RcppMLPACK/0000755000176200001440000000000013647521523012056 5ustar liggesusersRcppMLPACK/NAMESPACE0000644000176200001440000000025213647512751013277 0ustar liggesusersuseDynLib(RcppMLPACK, .registration=TRUE)
export(RcppMLPACK.package.skeleton, mlKmeans)
importFrom(Rcpp, evalCpp)
importFrom(utils, packageDescription, package.skeleton)
RcppMLPACK/README.md0000644000176200001440000000340513647512751013342 0ustar liggesusersRcppMLPACK
==========

[![Build Status](https://travis-ci.org/thirdwing/RcppMLPACK1.svg?branch=master)](https://travis-ci.org/thirdwing/RcppMLPACK1)

[MLPACK](http://www.mlpack.org/) is a C++ machine learning library with an emphasis on scalability, speed, and ease of use. Its aim is to make machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and maximum flexibility for expert users. MLPACK outperforms competing machine learning libraries by large margins; see the [BigLearning workshop paper](http://www.mlpack.org/papers/mlpack2011.pdf) for details.

The algorithms implemented in MLPACK include:

* Fast Hierarchical Clustering (Euclidean Minimum Spanning Trees)
* Gaussian Mixture Models (trained via EM)
* Hidden Markov Models (training, prediction, and classification)
* Kernel Principal Components Analysis
* K-Means clustering
* LARS/Lasso Regression
* Least-squares Linear Regression
* Maximum Variance Unfolding (using LRSDP)
* Naive Bayes Classifier
* Neighborhood Components Analysis (using LRSDP)
* RADICAL (Robust, Accurate, Direct ICA aLgorithm)
* Tree-based k-nearest-neighbors search and classifier
* Tree-based range search

The RcppMLPACK package includes the source code of the MLPACK library, so users do not need to install MLPACK itself in order to use RcppMLPACK.

This MLPACK integration relies heavily on the [Rcpp](http://www.rcpp.org) and RcppArmadillo packages.

The version number of MLPACK is used as the version number of this package.

Testing and bug reports are very welcome. You can find examples on the [wiki page](https://github.com/thirdwing/RcppMLPACK1/wiki). You can always reach me by email (qkou@umail.iu.edu) if you have any questions.
RcppMLPACK/man/0000755000176200001440000000000013647512751012634 5ustar liggesusersRcppMLPACK/man/mlKmeans.Rd0000644000176200001440000000112213647512751014666 0ustar liggesusers\name{mlKmeans}
\alias{mlKmeans}
\title{kmeans from MLPACK}
\description{
  A k-means example for using MLPACK with R.
}
\usage{
mlKmeans(X, y)
}
\arguments{
  \item{X}{data matrix.}
  \item{y}{number of clusters.}
}
\details{
  This is a k-means example using RcppMLPACK. It calls the KMeans method from MLPACK and returns the result to R.
}
\value{
  \code{mlKmeans} returns a list containing the cluster assignments.
}
\references{MLPACK project: \url{http://www.mlpack.org/}}
\author{
  For RcppMLPACK: Qiang Kou

  For MLPACK: Ryan Curtin
}
\examples{
data(trees, package="datasets")
mlKmeans(t(trees), 3)
}
RcppMLPACK/man/RcppMLPACK-package.Rd0000644000176200001440000000110713647512751016307 0ustar liggesusers\name{RcppMLPACK-package}
\alias{RcppMLPACK-package}
\alias{RcppMLPACK}
\docType{package}
\title{
  Rcpp Integration for the MLPACK Library
}
\description{
  The package eases the integration of MLPACK types with R. MLPACK is an intuitive, fast, scalable C++ machine learning library, meant to be a machine learning analog to LAPACK.
}
\author{
  For RcppMLPACK: Qiang Kou

  For MLPACK: Ryan Curtin

  Maintainer: Qiang Kou <qkou@umail.iu.edu>
}
\references{
  MLPACK project: \url{http://www.mlpack.org/}
}
\keyword{ package }
\keyword{ programming }
\keyword{ interface }
RcppMLPACK/man/RcppMLPACK.package.skeleton.Rd0000644000176200001440000000456013647512751020141 0ustar liggesusers\name{RcppMLPACK.package.skeleton}
\alias{RcppMLPACK.package.skeleton}
\title{
  Create a skeleton for a new package that intends to use RcppMLPACK
}
\description{
  \code{RcppMLPACK.package.skeleton} automates the creation of a new source package that intends to use features of RcppMLPACK. It is based on the \link[utils]{package.skeleton} function, which it executes first.
}
\usage{
RcppMLPACK.package.skeleton(name = "anRpackage", list = character(),
  environment = .GlobalEnv, path = ".", force = FALSE,
  code_files = character(), example_code = TRUE)
}
\arguments{
  \item{name}{See \link[utils]{package.skeleton}}
  \item{list}{See \link[utils]{package.skeleton}}
  \item{environment}{See \link[utils]{package.skeleton}}
  \item{path}{See \link[utils]{package.skeleton}}
  \item{force}{See \link[utils]{package.skeleton}}
  \item{code_files}{See \link[utils]{package.skeleton}}
  \item{example_code}{If TRUE, example C++ code using RcppMLPACK is added to the package}
}
\details{
  In addition to \link[utils]{package.skeleton}:

  The \samp{DESCRIPTION} file gains a Depends line requesting that the package depend on Rcpp and RcppArmadillo, and a LinkingTo line so that the package finds the Rcpp and RcppArmadillo header files.

  The \samp{NAMESPACE}, if any, gains a \code{useDynLib} directive.

  The \samp{src} directory is created if it does not exist, and a \samp{Makevars} file is added setting the environment variable \samp{PKG_LIBS} to accommodate the necessary flags to link with the Rcpp library.

  If the \code{example_code} argument is set to \code{TRUE}, example files \samp{RcppMLPACK.h} and \samp{kmeans.cpp} are also created in the \samp{src} directory. An R file \samp{RcppExports.R} is expanded in the \samp{R} directory; the \code{mlkmeans} function defined in this file makes use of the C++ function \samp{mlkmeans} defined in the C++ file. These files are given as an example and should eventually be removed from the generated package.
}
\value{
  Nothing, used for its side effects
}
\seealso{
  \link[utils]{package.skeleton}
}
\references{
  Read the \emph{Writing R Extensions} manual for more details.

  Once you have created a \emph{source} package you need to install it: see the \emph{R Installation and Administration} manual, \code{\link{INSTALL}} and \code{\link{install.packages}}.
}
\examples{
\dontrun{
RcppMLPACK.package.skeleton("foobar")
}
}
\keyword{ programming }
RcppMLPACK/DESCRIPTION0000644000176200001440000000177113647521523013572 0ustar liggesusersPackage: RcppMLPACK
Type: Package
Title: 'Rcpp' Integration for the 'MLPACK' Library
Version: 1.0.10-7
Date: 2020-04-17
Author: Qiang Kou, Ryan Curtin
Maintainer: Qiang Kou <qkou@umail.iu.edu>
Description: 'MLPACK' is an intuitive, fast, scalable C++ machine learning
 library, meant to be a machine learning analog to 'LAPACK'. It aims to
 implement a wide array of machine learning methods and function as a Swiss
 army knife for machine learning researchers. 'MLPACK' is available from
 <http://www.mlpack.org/>; sources are included in the package.
SystemRequirements: A C++11 compiler. Versions 4.8.*, 4.9.* or later of GCC
 will be fine.
License: LGPL (>= 2)
Depends: R (>= 3.3.0)
Imports: Rcpp (>= 0.12.8)
LinkingTo: Rcpp, RcppArmadillo, BH
URL: https://github.com/thirdwing/RcppMLPACK1, http://www.mlpack.org/
BugReports: https://github.com/thirdwing/RcppMLPACK1/issues
NeedsCompilation: yes
Packaged: 2020-04-21 06:55:31 UTC; qkou
Repository: CRAN
Date/Publication: 2020-04-21 07:40:03 UTC
RcppMLPACK/build/0000755000176200001440000000000013647514343013157 5ustar liggesusersRcppMLPACK/build/vignette.rds0000644000176200001440000000037113647514343015517 0ustar liggesusersRcppMLPACK/configure.ac0000644000176200001440000000267413647512751014350 0ustar liggesusers## RcppMLPACK configure.ac
##
## RcppMLPACK compiler check modified from RcppArmadillo to check for gcc > 4.6
##
## Copyright (C) 2016 Dirk Eddelbuettel, James Balamuta
##
## Licensed under GPL-2 or later

## require at least autoconf 2.61
AC_PREREQ(2.61)

## Process this file with autoconf to produce a configure script.
AC_INIT(RcppMLPACK, m4_esyscmd_s([awk -e '/^Version:/ {print $2}' DESCRIPTION]))

## Set R_HOME, respecting an environment variable if one is set
: ${R_HOME=$(R RHOME)}
if test -z "${R_HOME}"; then
    AC_MSG_ERROR([Could not determine R_HOME.])
fi

## Use R to set CXX and CXXFLAGS
CXX=$(${R_HOME}/bin/R CMD config CXX)
CXXFLAGS=$("${R_HOME}/bin/R" CMD config CXXFLAGS)

## We are using C++
AC_LANG(C++)
AC_REQUIRE_CPP

## Check the C++ compiler using the CXX value set
AC_PROG_CXX

## If it is g++, we have GXX set so let's examine it
if test "${GXX}" = yes; then
    AC_MSG_CHECKING([whether g++ version is sufficient])
    gxx_version=$(${CXX} -v 2>&1 | awk '/^.*g.. version/ {print $3}')
    case ${gxx_version} in
        1.*|2.*|3.*|4.0.*|4.1.*|4.2.*|4.3.*|4.4.*|4.5.*|4.6.*)
            AC_MSG_RESULT([no])
            AC_MSG_WARN([Only g++ version 4.7 or greater can be used with RcppMLPACK.])
            AC_MSG_ERROR([Please use a different compiler.])
        ;;
        4.7.*|4.8.*|4.9.*|5.*|6.*)  ## Removes gcc 4.6.* from this line
            gxx_newer_than_45="-fpermissive"
            AC_MSG_RESULT([(${gxx_version}) yes])
        ;;
    esac
fi
RcppMLPACK/src/0000755000176200001440000000000013647514343012647 5ustar liggesusersRcppMLPACK/src/init.c0000644000176200001440000000077513647514343013763 0ustar liggesusers#include <R.h>
#include <Rinternals.h>
#include <stdlib.h> // for NULL
#include <R_ext/Rdynload.h>

/* FIXME: Check these declarations against the C/Fortran source code.
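   These declarations correspond to the Rcpp glue generated in
   RcppExports.cpp: RcppMLPACK_kmeans is the native routine behind the
   exported mlKmeans() R function, and its two registered arguments are
   the data matrix and the number of clusters passed through .Call().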
*/ /* .Call calls */ extern SEXP RcppMLPACK_kmeans(SEXP, SEXP); static const R_CallMethodDef CallEntries[] = { {"RcppMLPACK_kmeans", (DL_FUNC) &RcppMLPACK_kmeans, 2}, {NULL, NULL, 0} }; void R_init_RcppMLPACK(DllInfo *dll) { R_registerRoutines(dll, NULL, CallEntries, NULL, NULL); R_useDynamicSymbols(dll, FALSE); } RcppMLPACK/src/Makevars0000644000176200001440000000454613647514343014354 0ustar liggesusersCXX_STD = CXX11 MLPACKOBJECTS=\ ./mlpack/methods/pca/pca.o \ ./mlpack/methods/mvu/mvu.o \ ./mlpack/methods/det/dtree.o \ ./mlpack/methods/det/dt_utils.o \ ./mlpack/methods/linear_regression/linear_regression.o \ ./mlpack/methods/radical/radical.o \ ./mlpack/methods/neighbor_search/unmap.o \ ./mlpack/methods/neighbor_search/sort_policies/furthest_neighbor_sort.o \ ./mlpack/methods/neighbor_search/sort_policies/nearest_neighbor_sort.o \ ./mlpack/methods/lars/lars.o \ ./mlpack/methods/logistic_regression/logistic_regression_function.o \ ./mlpack/methods/regularized_svd/regularized_svd_function.o \ ./mlpack/methods/sparse_autoencoder/sparse_autoencoder_function.o \ ./mlpack/core/math/lin_alg.o \ ./mlpack/core/math/random.o \ ./mlpack/core/kernels/epanechnikov_kernel.o \ ./mlpack/core/kernels/pspectrum_string_kernel.o \ ./mlpack/core/optimizers/lbfgs/test_functions.o \ ./mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_test_functions.o \ ./mlpack/core/optimizers/sgd/test_function.o \ ./mlpack/core/optimizers/lrsdp/lrsdp_function.o \ ./mlpack/core/optimizers/lrsdp/lrsdp.o \ ./mlpack/core/dists/gaussian_distribution.o \ ./mlpack/core/dists/discrete_distribution.o \ ./mlpack/core/dists/laplace_distribution.o \ ./mlpack/core/tree/mrkd_statistic.o \ ./mlpack/core/tree/cosine_tree/cosine_tree.o \ ./mlpack/core/util/string_util.o PKGOBJECTS = RcppExports.o kmeans.o init.o OBJECTS= $(MLPACKOBJECTS) $(PKGOBJECTS) PKG_CPPFLAGS=-DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false -I. PKG_LIBS= $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) all: $(SHLIB) userLibrary ## we place it inside the inst/ directory so that it gets installed by the package USERDIR = ../inst/lib USERLIB = libRcppMLPACK$(DYLIB_EXT) USERLIBST = libRcppMLPACK.a userLibrary: $(USERLIB) $(USERLIBST) -@if test ! -e $(USERDIR)$(R_ARCH); then mkdir -p $(USERDIR)$(R_ARCH); fi cp $(USERLIB) $(USERDIR)$(R_ARCH) cp $(USERLIBST) $(USERDIR)$(R_ARCH) rm $(USERLIB) $(USERLIBST) $(USERLIB): $(OBJECTS) $(SHLIB_CXXLD) -o $(USERLIB) $(OBJECTS) $(SHLIB_CXXLDFLAGS) $(ALL_LIBS) @if test -e "/usr/bin/install_name_tool"; then /usr/bin/install_name_tool -id $(R_PACKAGE_DIR)/lib$(R_ARCH)/$(USERLIB) $(USERLIB); fi $(USERLIBST): $(OBJECTS) $(AR) qc $(USERLIBST) $(OBJECTS) @if test -n "$(RANLIB)"; then $(RANLIB) $(USERLIBST); fi .PHONY: all clean userLibrary clean: rm -f $(OBJECTS) $(SHLIB) $(USERLIB) $(USERLIBST) RcppMLPACK/src/mlpack/0000755000176200001440000000000013647512751014117 5ustar liggesusersRcppMLPACK/src/mlpack/methods/0000755000176200001440000000000013647512751015562 5ustar liggesusersRcppMLPACK/src/mlpack/methods/nystroem_method/0000755000176200001440000000000013647512751021002 5ustar liggesusersRcppMLPACK/src/mlpack/methods/nystroem_method/ordered_selection.hpp0000644000176200001440000000321313647512751025203 0ustar liggesusers/** * @file ordered_selection.hpp * @author Ryan Curtin * * Select the first points of the dataset for use in the Nystroem method of * kernel matrix approximation. This is mostly for testing, but might have * other uses. * * This file is part of MLPACK 1.0.10. 
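 *
 * Note that the selection below is deterministic: Select() simply returns
 * the index sequence [0, m - 1], which makes this policy handy for unit
 * tests and fully reproducible runs.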
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NYSTROEM_METHOD_ORDERED_SELECTION_HPP #define __MLPACK_METHODS_NYSTROEM_METHOD_ORDERED_SELECTION_HPP #include namespace mlpack { namespace kernel { class OrderedSelection { public: /** * Select the specified number of points in the dataset. * * @param data Dataset to sample from. * @param m Number of points to select. * @return Indices of selected points from the dataset. */ const static arma::Col Select(const arma::mat& /* unused */, const size_t m) { // This generates [0 1 2 3 ... (m - 1)]. return arma::linspace >(0, m - 1, m); } }; }; // namespace kernel }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/nystroem_method/nystroem_method.hpp0000644000176200001440000000574013647512751024741 0ustar liggesusers/** * @file nystroem_method.hpp * @author Ryan Curtin * @author Marcus Edel * * Implementation of the Nystroem method for approximating a kernel matrix. * There are many variations on how to do this, so template parameters allow the * selection of many different techniques. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NYSTROEM_METHOD_NYSTROEM_METHOD_HPP #define __MLPACK_METHODS_NYSTROEM_METHOD_NYSTROEM_METHOD_HPP #include #include "kmeans_selection.hpp" namespace mlpack { namespace kernel { template< typename KernelType, typename PointSelectionPolicy = KMeansSelection<> > class NystroemMethod { public: /** * Create the NystroemMethod object. The constructor here does not really do * anything. * * @param data Data matrix. * @param kernel Kernel to be used for computation. * @param rank Rank to be used for matrix approximation. */ NystroemMethod(const arma::mat& data, KernelType& kernel, const size_t rank); /** * Apply the low-rank factorization to obtain an output matrix G such that * K' = G * G^T. * * @param output Matrix to store kernel approximation into. */ void Apply(arma::mat& output); /** * Construct the kernel matrix with matrix that contains the selected points. * * @param data Data matrix pointer. * @param miniKernel to store the constructed mini-kernel matrix in. * @param miniKernel to store the constructed semi-kernel matrix in. */ void GetKernelMatrix(const arma::mat* data, arma::mat& miniKernel, arma::mat& semiKernel); /** * Construct the kernel matrix with the selected points. 
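   * This index-based overload serves selection policies such as
   * RandomSelection and OrderedSelection, whose Select() returns point
   * indices; the pointer-based overload above serves KMeansSelection,
   * whose Select() returns a newly allocated centroid matrix that is
   * freed after use.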
* * @param points Indices of selected points. * @param miniKernel to store the constructed mini-kernel matrix in. * @param miniKernel to store the constructed semi-kernel matrix in. */ void GetKernelMatrix(const arma::Col& selectedPoints, arma::mat& miniKernel, arma::mat& semiKernel); private: //! The reference dataset. const arma::mat& data; //! The locally stored kernel, if it is necessary. KernelType& kernel; //! Rank used for matrix approximation. const size_t rank; }; }; // namespace kernel }; // namespace mlpack // Include implementation. #include "nystroem_method_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/nystroem_method/nystroem_method_impl.hpp0000644000176200001440000000703413647512751025760 0ustar liggesusers/** * @file nystroem_method_impl.hpp * @author Ryan Curtin * @author Marcus Edel * * Implementation of the Nystroem method for approximating a kernel matrix. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NYSTROEM_METHOD_NYSTROEM_METHOD_IMPL_HPP #define __MLPACK_METHODS_NYSTROEM_METHOD_NYSTROEM_METHOD_IMPL_HPP // In case it hasn't been included yet. #include "nystroem_method.hpp" namespace mlpack { namespace kernel { template NystroemMethod::NystroemMethod( const arma::mat& data, KernelType& kernel, const size_t rank) : data(data), kernel(kernel), rank(rank) { } template void NystroemMethod::GetKernelMatrix( const arma::mat* selectedData, arma::mat& miniKernel, arma::mat& semiKernel) { // Assemble mini-kernel matrix. for (size_t i = 0; i < rank; ++i) for (size_t j = 0; j < rank; ++j) miniKernel(i, j) = kernel.Evaluate(selectedData->col(i), selectedData->col(j)); // Construct semi-kernel matrix with interactions between selected data and // all points. for (size_t i = 0; i < data.n_cols; ++i) for (size_t j = 0; j < rank; ++j) semiKernel(i, j) = kernel.Evaluate(data.col(i), selectedData->col(j)); // Clean the memory. delete selectedData; } template void NystroemMethod::GetKernelMatrix( const arma::Col& selectedPoints, arma::mat& miniKernel, arma::mat& semiKernel) { // Assemble mini-kernel matrix. for (size_t i = 0; i < rank; ++i) for (size_t j = 0; j < rank; ++j) miniKernel(i, j) = kernel.Evaluate(data.col(selectedPoints(i)), data.col(selectedPoints(j))); // Construct semi-kernel matrix with interactions between selected points and // all points. for (size_t i = 0; i < data.n_cols; ++i) for (size_t j = 0; j < rank; ++j) semiKernel(i, j) = kernel.Evaluate(data.col(i), data.col(selectedPoints(j))); } template void NystroemMethod::Apply(arma::mat& output) { arma::mat miniKernel(rank, rank); arma::mat semiKernel(data.n_cols, rank); GetKernelMatrix(PointSelectionPolicy::Select(data, rank), miniKernel, semiKernel); // Singular value decomposition mini-kernel matrix. arma::mat U, V; arma::vec s; arma::svd(U, s, V, miniKernel); // Construct the output matrix. 
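  // Form G = semiKernel * U * diag(1 / sqrt(s)) * V from the SVD of the
  // mini-kernel, so that G * G^T approximates the full kernel matrix K'.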
arma::mat normalization = arma::diagmat(1.0 / sqrt(s)); output = semiKernel * U * normalization * V; } }; // namespace kernel }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/nystroem_method/random_selection.hpp0000644000176200001440000000323213647512751025040 0ustar liggesusers/** * @file random_selection.hpp * @author Ryan Curtin * * Randomly select some points (with replacement) to use for the Nystroem * method. Replacement is suboptimal, but for rank << number of points, this is * unlikely. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NYSTROEM_METHOD_RANDOM_SELECTION_HPP #define __MLPACK_METHODS_NYSTROEM_METHOD_RANDOM_SELECTION_HPP #include namespace mlpack { namespace kernel { class RandomSelection { public: /** * Randomly select the specified number of points in the dataset. * * @param data Dataset to sample from. * @param m Number of points to select. * @return Indices of selected points from the dataset. */ const static arma::Col Select(const arma::mat& data, const size_t m) { arma::Col selectedPoints(m); for (size_t i = 0; i < m; ++i) selectedPoints(i) = math::RandInt(0, data.n_cols); return selectedPoints; } }; }; // namespace kernel }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/nystroem_method/kmeans_selection.hpp0000644000176200001440000000372213647512751025042 0ustar liggesusers/** * @file kmeans_selection.hpp * @author Marcus Edel * * Use the centroids of the K-Means clustering method for use in the Nystroem * method of kernel matrix approximation. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NYSTROEM_METHOD_KMEANS_SELECTION_HPP #define __MLPACK_METHODS_NYSTROEM_METHOD_KMEANS_SELECTION_HPP #include #include namespace mlpack { namespace kernel { template > class KMeansSelection { public: /** * Use the K-Means clustering method to select the specified number of points * in the dataset. You are responsible for deleting the returned matrix! * * @param data Dataset to sample from. * @param m Number of points to select. * @return Matrix pointer in which centroids are stored. */ const static arma::mat* Select(const arma::mat& data, const size_t m, const size_t maxIterations = 5) { arma::Col assignments; arma::mat* centroids = new arma::mat; // Perform the K-Means clustering method. 
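    // Only the centroids are handed back to the Nystroem method as the
    // "selected" points; the per-point cluster assignments are discarded.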
ClusteringType kmeans(maxIterations); kmeans.Cluster(data, m, assignments, *centroids); return centroids; } }; }; // namespace kernel }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/amf/0000755000176200001440000000000013647512751016325 5ustar liggesusersRcppMLPACK/src/mlpack/methods/amf/amf_impl.hpp0000644000176200001440000000501313647512751020621 0ustar liggesusers/** * @file amf_impl.hpp * @author Sumedh Ghaisas * @author Mohan Rajendran * @author Ryan Curtin * * Implementation of AMF class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ namespace mlpack { namespace amf { /** * Construct the AMF object. */ template AMF::AMF( const TerminationPolicyType& terminationPolicy, const InitializationRuleType& initializationRule, const UpdateRuleType& update) : terminationPolicy(terminationPolicy), initializationRule(initializationRule), update(update) { } /** * Apply Alternating Matrix Factorization to the provided matrix. * * @param V Input matrix to be factorized * @param W Basis matrix to be output * @param H Encoding matrix to output * @param r Rank r of the factorization */ template template double AMF:: Apply(const MatType& V, const size_t r, arma::mat& W, arma::mat& H) { // Initialize W and H. initializationRule.Initialize(V, r, W, H); Rcpp::Rcout << "Initialized W and H." << std::endl; update.Initialize(V, r); terminationPolicy.Initialize(V); while (!terminationPolicy.IsConverged(W, H)) { // Update the values of W and H based on the update rules provided. update.WUpdate(V, W, H); update.HUpdate(V, W, H); } const double residue = terminationPolicy.Index(); const size_t iteration = terminationPolicy.Iteration(); Rcpp::Rcout << "AMF converged to residue of " << residue << " in " << iteration << " iterations." << std::endl; return residue; } }; // namespace amf }; // namespace mlpack RcppMLPACK/src/mlpack/methods/amf/update_rules/0000755000176200001440000000000013647512751021021 5ustar liggesusersRcppMLPACK/src/mlpack/methods/amf/update_rules/svd_complete_incremental_learning.hpp0000644000176200001440000001370413647512751030463 0ustar liggesusers#ifndef SVD_COMPLETE_INCREMENTAL_LEARNING_HPP_INCLUDED #define SVD_COMPLETE_INCREMENTAL_LEARNING_HPP_INCLUDED #include namespace mlpack { namespace amf { template class SVDCompleteIncrementalLearning { public: SVDCompleteIncrementalLearning(double u = 0.0001, double kw = 0, double kh = 0) : u(u), kw(kw), kh(kh) {} void Initialize(const MatType& dataset, const size_t rank) { (void)rank; n = dataset.n_rows; m = dataset.n_cols; currentUserIndex = 0; currentItemIndex = 0; } /** * The update rule for the basis matrix W. * The function takes in all the matrices and only changes the * value of the W matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix to be updated. * @param H Encoding matrix. * * This file is part of MLPACK 1.0.10. 
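 *
 * Each call performs one stochastic gradient step on a single nonzero
 * entry V(i, j): the residual (V(i, j) - W.row(i) * H.col(j)) is used to
 * update W.row(i) with step size u and optional regularization kw.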
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ inline void WUpdate(const MatType& V, arma::mat& W, const arma::mat& H) { arma::mat deltaW(1, W.n_cols); deltaW.zeros(); while(true) { double val; if((val = V(currentItemIndex, currentUserIndex)) != 0) { deltaW += (val - arma::dot(W.row(currentItemIndex), H.col(currentUserIndex))) * arma::trans(H.col(currentUserIndex)); if(kw != 0) deltaW -= kw * W.row(currentItemIndex); break; } currentUserIndex = currentUserIndex + 1; if(currentUserIndex == n) { currentUserIndex = 0; currentItemIndex = (currentItemIndex + 1) % m; } } W.row(currentItemIndex) += u*deltaW; } /** * The update rule for the encoding matrix H. * The function takes in all the matrices and only changes the * value of the H matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix. * @param H Encoding matrix to be updated. */ inline void HUpdate(const MatType& V, const arma::mat& W, arma::mat& H) { arma::mat deltaH(H.n_rows, 1); deltaH.zeros(); while(true) { double val; if((val = V(currentItemIndex, currentUserIndex)) != 0) deltaH += (val - arma::dot(W.row(currentItemIndex), H.col(currentUserIndex))) * arma::trans(W.row(currentItemIndex)); if(kh != 0) deltaH -= kh * H.col(currentUserIndex); currentUserIndex = currentUserIndex + 1; if(currentUserIndex == n) { currentUserIndex = 0; currentItemIndex = (currentItemIndex + 1) % m; } } H.col(currentUserIndex++) += u * deltaH; } private: double u; double kw; double kh; size_t n; size_t m; size_t currentUserIndex; size_t currentItemIndex; }; template<> class SVDCompleteIncrementalLearning { public: SVDCompleteIncrementalLearning(double u = 0.01, double kw = 0, double kh = 0) : u(u), kw(kw), kh(kh), it(NULL) {} ~SVDCompleteIncrementalLearning() { delete it; } void Initialize(const arma::sp_mat& dataset, const size_t rank) { (void)rank; n = dataset.n_rows; m = dataset.n_cols; it = new arma::sp_mat::const_iterator(dataset.begin()); isStart = true; } /** * The update rule for the basis matrix W. * The function takes in all the matrices and only changes the * value of the W matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix to be updated. * @param H Encoding matrix. */ inline void WUpdate(const arma::sp_mat& V, arma::mat& W, const arma::mat& H) { if(!isStart) (*it)++; else isStart = false; if(*it == V.end()) { delete it; it = new arma::sp_mat::const_iterator(V.begin()); } size_t currentUserIndex = it->col(); size_t currentItemIndex = it->row(); arma::mat deltaW(1, W.n_cols); deltaW.zeros(); deltaW += (**it - arma::dot(W.row(currentItemIndex), H.col(currentUserIndex))) * arma::trans(H.col(currentUserIndex)); if(kw != 0) deltaW -= kw * W.row(currentItemIndex); W.row(currentItemIndex) += u*deltaW; } /** * The update rule for the encoding matrix H. * The function takes in all the matrices and only changes the * value of the H matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix. * @param H Encoding matrix to be updated. 
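   * As with WUpdate(), this step is meant to touch one nonzero entry per
   * call, advancing currentUserIndex and currentItemIndex so that
   * successive calls sweep over the matrix entry by entry.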
*/ inline void HUpdate(const arma::sp_mat& V, const arma::mat& W, arma::mat& H) { (void)V; arma::mat deltaH(H.n_rows, 1); deltaH.zeros(); size_t currentUserIndex = it->col(); size_t currentItemIndex = it->row(); deltaH += (**it - arma::dot(W.row(currentItemIndex), H.col(currentUserIndex))) * arma::trans(W.row(currentItemIndex)); if(kh != 0) deltaH -= kh * H.col(currentUserIndex); H.col(currentUserIndex++) += u * deltaH; } private: double u; double kw; double kh; size_t n; size_t m; arma::sp_mat dummy; arma::sp_mat::const_iterator* it; bool isStart; }; } } #endif // SVD_COMPLETE_INCREMENTAL_LEARNING_HPP_INCLUDED RcppMLPACK/src/mlpack/methods/amf/update_rules/nmf_mult_dist.hpp0000644000176200001440000000571313647512751024404 0ustar liggesusers/** * @file nmf_mult_dist.hpp * @author Mohan Rajendran * * Update rules for the Non-negative Matrix Factorization. This follows a method * described in the paper 'Algorithms for Non-negative Matrix Factorization' * by D. D. Lee and H. S. Seung. This is a multiplicative rule that ensures * that the Frobenius norm \f$ \sqrt{\sum_i \sum_j(V-WH)^2} \f$ is * non-increasing between subsequent iterations. Both of the update rules * for W and H are defined in this file. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_MULT_DIST_UPDATE_RULES_HPP #define __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_MULT_DIST_UPDATE_RULES_HPP #include namespace mlpack { namespace amf { /** * The multiplicative distance update rules for matrices W and H. */ class NMFMultiplicativeDistanceUpdate { public: // Empty constructor required for the UpdateRule template. NMFMultiplicativeDistanceUpdate() { } template void Initialize(const MatType& dataset, const size_t rank) { (void)dataset; (void)rank; } /** * The update rule for the basis matrix W. The formula used is * \f[ * W_{ia} \leftarrow W_{ia} \frac{(VH^T)_{ia}}{(WHH^T)_{ia}} * \f] * The function takes in all the matrices and only changes the * value of the W matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix to be updated. * @param H Encoding matrix. */ template inline static void WUpdate(const MatType& V, arma::mat& W, const arma::mat& H) { W = (W % (V * H.t())) / (W * H * H.t()); } /** * The update rule for the encoding matrix H. The formula used is * \f[ * H_{a\mu} \leftarrow H_{a\mu} \frac{(W^T V)_{a\mu}}{(W^T WH)_{a\mu}} * \f] * The function takes in all the matrices and only changes the * value of the H matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix. * @param H Encoding matrix to be updated. 
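   * Because both rules only multiply and divide elementwise by nonnegative
   * quantities, W and H remain nonnegative whenever they are initialized
   * with nonnegative values and V is nonnegative.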
*/ template inline static void HUpdate(const MatType& V, const arma::mat& W, arma::mat& H) { H = (H % (W.t() * V)) / (W.t() * W * H); } }; }; // namespace amf }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/amf/update_rules/svd_batch_learning.hpp0000644000176200001440000001177013647512751025354 0ustar liggesusers/** * @file simple_residue_termination.hpp * @author Sumedh Ghaisas * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_AMF_UPDATE_RULES_SVD_BATCHLEARNING_HPP #define __MLPACK_METHODS_AMF_UPDATE_RULES_SVD_BATCHLEARNING_HPP #include namespace mlpack { namespace amf { class SVDBatchLearning { public: SVDBatchLearning(double u = 0.0002, double kw = 0, double kh = 0, double momentum = 0.9, double min = -DBL_MIN, double max = DBL_MAX) : u(u), kw(kw), kh(kh), min(min), max(max), momentum(momentum) {} template void Initialize(const MatType& dataset, const size_t rank) { const size_t n = dataset.n_rows; const size_t m = dataset.n_cols; mW.zeros(n, rank); mH.zeros(rank, m); } /** * The update rule for the basis matrix W. * The function takes in all the matrices and only changes the * value of the W matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix to be updated. * @param H Encoding matrix. */ template inline void WUpdate(const MatType& V, arma::mat& W, const arma::mat& H) { size_t n = V.n_rows; size_t m = V.n_cols; size_t r = W.n_cols; mW = momentum * mW; arma::mat deltaW(n, r); deltaW.zeros(); for(size_t i = 0;i < n;i++) { for(size_t j = 0;j < m;j++) { double val; if((val = V(i, j)) != 0) deltaW.row(i) += (val - arma::dot(W.row(i), H.col(j))) * arma::trans(H.col(j)); } if(kw != 0) deltaW.row(i) -= kw * W.row(i); } mW += u * deltaW; W += mW; } /** * The update rule for the encoding matrix H. * The function takes in all the matrices and only changes the * value of the H matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix. * @param H Encoding matrix to be updated. 
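   * Both updates are batch gradient descent steps with momentum: the
   * previous step is scaled by the momentum term and the new gradient,
   * scaled by u, is added (mH = momentum * mH + u * deltaH; H += mH).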
*/ template inline void HUpdate(const MatType& V, const arma::mat& W, arma::mat& H) { size_t n = V.n_rows; size_t m = V.n_cols; size_t r = W.n_cols; mH = momentum * mH; arma::mat deltaH(r, m); deltaH.zeros(); for(size_t j = 0;j < m;j++) { for(size_t i = 0;i < n;i++) { double val; if((val = V(i, j)) != 0) deltaH.col(j) += (val - arma::dot(W.row(i), H.col(j))) * arma::trans(W.row(i)); } if(kh != 0) deltaH.col(j) -= kh * H.col(j); } mH += u*deltaH; H += mH; } private: double u; double kw; double kh; double min; double max; double momentum; arma::mat mW; arma::mat mH; }; template<> inline void SVDBatchLearning::WUpdate(const arma::sp_mat& V, arma::mat& W, const arma::mat& H) { size_t n = V.n_rows; size_t r = W.n_cols; mW = momentum * mW; arma::mat deltaW(n, r); deltaW.zeros(); for(arma::sp_mat::const_iterator it = V.begin();it != V.end();it++) { size_t row = it.row(); size_t col = it.col(); deltaW.row(it.row()) += (*it - arma::dot(W.row(row), H.col(col))) * arma::trans(H.col(col)); } if(kw != 0) for(size_t i = 0; i < n; i++) { deltaW.row(i) -= kw * W.row(i); } mW += u * deltaW; W += mW; } template<> inline void SVDBatchLearning::HUpdate(const arma::sp_mat& V, const arma::mat& W, arma::mat& H) { size_t m = V.n_cols; size_t r = W.n_cols; mH = momentum * mH; arma::mat deltaH(r, m); deltaH.zeros(); for(arma::sp_mat::const_iterator it = V.begin();it != V.end();it++) { size_t row = it.row(); size_t col = it.col(); deltaH.col(col) += (*it - arma::dot(W.row(row), H.col(col))) * arma::trans(W.row(row)); } if(kh != 0) for(size_t j = 0; j < m; j++) { deltaH.col(j) -= kh * H.col(j); } mH += u*deltaH; H += mH; } } // namespace amf } // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/amf/update_rules/nmf_als.hpp0000644000176200001440000000644013647512751023155 0ustar liggesusers/** * @file nmf_als.hpp * @author Mohan Rajendran * * Update rules for the Non-negative Matrix Factorization. This follows a method * titled 'Alternating Least Squares' described in the paper 'Positive Matrix * Factorization: A Non-negative Factor Model with Optimal Utilization of * Error Estimates of Data Values' by P. Paatero and U. Tapper. It uses least * squares projection formula to reduce the error value of * \f$ \sqrt{\sum_i \sum_j(V-WH)^2} \f$ by alternately calculating W and H * respectively while holding the other matrix constant. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_ALS_HPP #define __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_ALS_HPP #include namespace mlpack { namespace amf { /** * The alternating least square update rules of matrices W and H. */ class NMFALSUpdate { public: // Empty constructor required for the UpdateRule template. NMFALSUpdate() { } template void Initialize(const MatType& dataset, const size_t rank) { (void)dataset; (void)rank; } /** * The update rule for the basis matrix W. 
The formula used is * \f[ * W^T = \frac{HV^T}{HH^T} * \f] * The function takes in all the matrices and only changes the * value of the W matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix to be updated. * @param H Encoding matrix. */ template inline static void WUpdate(const MatType& V, arma::mat& W, const arma::mat& H) { // The call to inv() sometimes fails; so we are using the psuedoinverse. // W = (inv(H * H.t()) * H * V.t()).t(); W = V * H.t() * pinv(H * H.t()); // Set all negative numbers to machine epsilon for (size_t i = 0; i < W.n_elem; i++) { if (W(i) < 0.0) { W(i) = 0.0; } } } /** * The update rule for the encoding matrix H. The formula used is * \f[ * H = \frac{W^TV}{W^TW} * \f] * The function takes in all the matrices and only changes the * value of the H matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix. * @param H Encoding matrix to be updated. */ template inline static void HUpdate(const MatType& V, const arma::mat& W, arma::mat& H) { H = pinv(W.t() * W) * W.t() * V; // Set all negative numbers to 0. for (size_t i = 0; i < H.n_elem; i++) { if (H(i) < 0.0) { H(i) = 0.0; } } } }; }; // namespace amf }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/amf/update_rules/nmf_mult_div.hpp0000644000176200001440000001056213647512751024221 0ustar liggesusers/** * @file mult_div_update_rules.hpp * @author Mohan Rajendran * * Update rules for the Non-negative Matrix Factorization. This follows a method * described in the paper 'Algorithms for Non-negative Matrix Factorization' * by D. D. Lee and H. S. Seung. This is a multiplicative rule that ensures * that the Kullback–Leibler divergence * \f$ \sum_i \sum_j (V_{ij} log\frac{V_{ij}}{(WH)_{ij}}-V_{ij}+(WH)_{ij}) \f$is * non-increasing between subsequent iterations. Both of the update rules * for W and H are defined in this file. * * This set of update rules is not meant to work with sparse matrices. Using * sparse matrices often causes NaNs in the output, so other choices of update * rules are better in that situation. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_MULT_DIV_HPP #define __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_MULT_DIV_HPP #include namespace mlpack { namespace amf { class NMFMultiplicativeDivergenceUpdate { public: // Empty constructor required for the WUpdateRule template. NMFMultiplicativeDivergenceUpdate() { } template void Initialize(const MatType& dataset, const size_t rank) { (void)dataset; (void)rank; } /** * The update rule for the basis matrix W. The formula used is * \f[ * W_{ia} \leftarrow W_{ia} \frac{\sum_{\mu} H_{a\mu} V_{i\mu}/(WH)_{i\mu}} * {\sum_{\nu} H_{a\nu}} * \f] * The function takes in all the matrices and only changes the * value of the W matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix to be updated. * @param H Encoding matrix. 
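   * As the inline comments below note, the factors are accumulated
   * entrywise to work around an expression-evaluation problem in
   * Armadillo 3.920, so each update makes one full pass over W and H.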
*/ template inline static void WUpdate(const MatType& V, arma::mat& W, const arma::mat& H) { // Simple implementation left in the header file. arma::mat t1; arma::rowvec t2; t1 = W * H; for (size_t i = 0; i < W.n_rows; ++i) { for (size_t j = 0; j < W.n_cols; ++j) { // Writing this as a single expression does not work as of Armadillo // 3.920. This should be fixed in a future release, and then the code // below can be fixed. //t2 = H.row(j) % V.row(i) / t1.row(i); t2.set_size(H.n_cols); for (size_t k = 0; k < t2.n_elem; ++k) { t2(k) = H(j, k) * V(i, k) / t1(i, k); } W(i, j) = W(i, j) * sum(t2) / sum(H.row(j)); } } } /** * The update rule for the encoding matrix H. The formula used is * \f[ * H_{a\mu} \leftarrow H_{a\mu} \frac{\sum_{i} W_{ia} V_{i\mu}/(WH)_{i\mu}} * {\sum_{k} H_{ka}} * \f] * The function takes in all the matrices and only changes the value * of the H matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix. * @param H Encoding matrix to updated. */ template inline static void HUpdate(const MatType& V, const arma::mat& W, arma::mat& H) { // Simple implementation left in the header file. arma::mat t1; arma::colvec t2; t1 = W * H; for (size_t i = 0; i < H.n_rows; i++) { for (size_t j = 0; j < H.n_cols; j++) { // Writing this as a single expression does not work as of Armadillo // 3.920. This should be fixed in a future release, and then the code // below can be fixed. //t2 = W.col(i) % V.col(j) / t1.col(j); t2.set_size(W.n_rows); for (size_t k = 0; k < t2.n_elem; ++k) { t2(k) = W(k, i) * V(k, j) / t1(k, j); } H(i,j) = H(i,j) * sum(t2) / sum(W.col(i)); } } } }; }; // namespace amf }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/amf/update_rules/svd_incomplete_incremental_learning.hpp0000644000176200001440000001131413647512751031005 0ustar liggesusers#ifndef SVD_INCREMENTAL_LEARNING_HPP_INCLUDED #define SVD_INCREMENTAL_LEARNING_HPP_INCLUDED namespace mlpack { namespace amf { class SVDIncompleteIncrementalLearning { public: SVDIncompleteIncrementalLearning(double u = 0.001, double kw = 0, double kh = 0) : u(u), kw(kw), kh(kh) {} template void Initialize(const MatType& dataset, const size_t rank) { (void)rank; n = dataset.n_rows; m = dataset.n_cols; currentUserIndex = 0; } /** * The update rule for the basis matrix W. * The function takes in all the matrices and only changes the * value of the W matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix to be updated. * @param H Encoding matrix. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
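 *
 * In this incomplete incremental scheme, each call consumes a single
 * column (user) of V: every row i with a nonzero V(i, currentUserIndex)
 * contributes a gradient term, and HUpdate() then advances to the next
 * column.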
*/ template inline void WUpdate(const MatType& V, arma::mat& W, const arma::mat& H) { arma::mat deltaW(n, W.n_cols); deltaW.zeros(); for(size_t i = 0;i < n;i++) { double val; if((val = V(i, currentUserIndex)) != 0) deltaW.row(i) += (val - arma::dot(W.row(i), H.col(currentUserIndex))) * arma::trans(H.col(currentUserIndex)); if(kw != 0) deltaW.row(i) -= kw * W.row(i); } W += u*deltaW; } /** * The update rule for the encoding matrix H. * The function takes in all the matrices and only changes the * value of the H matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix. * @param H Encoding matrix to be updated. */ template inline void HUpdate(const MatType& V, const arma::mat& W, arma::mat& H) { arma::mat deltaH(H.n_rows, 1); deltaH.zeros(); for(size_t i = 0;i < n;i++) { double val; if((val = V(i, currentUserIndex)) != 0) deltaH += (val - arma::dot(W.row(i), H.col(currentUserIndex))) * arma::trans(W.row(i)); } if(kh != 0) deltaH -= kh * H.col(currentUserIndex); H.col(currentUserIndex++) += u * deltaH; currentUserIndex = currentUserIndex % m; } private: double u; double kw; double kh; size_t n; size_t m; size_t currentUserIndex; }; template<> inline void SVDIncompleteIncrementalLearning:: WUpdate(const arma::sp_mat& V, arma::mat& W, const arma::mat& H) { arma::mat deltaW(n, W.n_cols); deltaW.zeros(); for(arma::sp_mat::const_iterator it = V.begin_col(currentUserIndex); it != V.end_col(currentUserIndex);it++) { double val = *it; size_t i = it.row(); deltaW.row(i) += (val - arma::dot(W.row(i), H.col(currentUserIndex))) * arma::trans(H.col(currentUserIndex)); if(kw != 0) deltaW.row(i) -= kw * W.row(i); } W += u*deltaW; } template<> inline void SVDIncompleteIncrementalLearning:: HUpdate(const arma::sp_mat& V, const arma::mat& W, arma::mat& H) { arma::mat deltaH(H.n_rows, 1); deltaH.zeros(); for(arma::sp_mat::const_iterator it = V.begin_col(currentUserIndex); it != V.end_col(currentUserIndex);it++) { double val = *it; size_t i = it.row(); if((val = V(i, currentUserIndex)) != 0) deltaH += (val - arma::dot(W.row(i), H.col(currentUserIndex))) * arma::trans(W.row(i)); } if(kh != 0) deltaH -= kh * H.col(currentUserIndex); H.col(currentUserIndex++) += u * deltaH; currentUserIndex = currentUserIndex % m; } }; // namepsace amf }; // namespace mlpack #endif // SVD_INCREMENTAL_LEARNING_HPP_INCLUDED RcppMLPACK/src/mlpack/methods/amf/init_rules/0000755000176200001440000000000013647512751020502 5ustar liggesusersRcppMLPACK/src/mlpack/methods/amf/init_rules/random_init.hpp0000644000176200001440000000322613647512751023521 0ustar liggesusers/** * @file random_init.hpp * @author Mohan Rajendran * * Intialization rule for Non-Negative Matrix Factorization (NMF). This simple * initialization is performed by assigning a random matrix to W and H. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
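 *
 * A minimal usage sketch (W and H are overwritten; V is only read):
 *
 * @code
 * arma::mat V(100, 50, arma::fill::randu);
 * arma::mat W, H;
 * RandomInitialization::Initialize(V, 5, W, H); // W is 100x5, H is 5x50.
 * @endcode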
*/ #ifndef __MLPACK_METHODS_LMF_RANDOM_INIT_HPP #define __MLPACK_METHODS_LMF_RANDOM_INIT_HPP #include namespace mlpack { namespace amf { class RandomInitialization { public: // Empty constructor required for the InitializeRule template RandomInitialization() { } template inline static void Initialize(const MatType& V, const size_t r, arma::mat& W, arma::mat& H) { // Simple implementation (left in the header file due to its simplicity). size_t n = V.n_rows; size_t m = V.n_cols; // Intialize to random values. W.randu(n, r); H.randu(r, m); } }; }; // namespace amf }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/amf/init_rules/random_acol_init.hpp0000644000176200001440000000515013647512751024515 0ustar liggesusers/** * @file random_acol_init.hpp * @author Mohan Rajendran * * Intialization rule for Non-Negative Matrix Factorization. This simple * initialization is performed by the random Acol initialization introduced in * the paper 'Algorithms, Initializations and Convergence' by Langville et al. * This method sets each of the columns of W by averaging p randomly chosen * columns of V. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LMF_RANDOM_ACOL_INIT_HPP #define __MLPACK_METHODS_LMF_RANDOM_ACOL_INIT_HPP #include namespace mlpack { namespace amf { /** * This class initializes the W matrix of the NMF algorithm by averaging p * randomly chosen columns of V. In this case, p is a template parameter. H is * then set randomly. * * @tparam The number of random columns to average for each column of W. */ template class RandomAcolInitialization { public: // Empty constructor required for the InitializeRule template RandomAcolInitialization() { } template inline static void Initialize(const MatType& V, const size_t r, arma::mat& W, arma::mat& H) { const size_t n = V.n_rows; const size_t m = V.n_cols; if (p > m) { Log::Warn << "Number of random columns is more than the number of columns" << "available in the V matrix; weird results may ensue!" << std::endl; } W.zeros(n, r); // Initialize W matrix with random columns. for (size_t col = 0; col < r; col++) { for (size_t randCol = 0; randCol < p; randCol++) { // .col() does not work in this case, as of Armadillo 3.920. W.unsafe_col(col) += V.col(math::RandInt(0, m)); } } // Now divide by p. W /= p; // Initialize H to random values. H.randu(r, m); } }; // Class RandomAcolInitialization }; // namespace amf }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/amf/termination_policies/0000755000176200001440000000000013647512751022545 5ustar liggesusersRcppMLPACK/src/mlpack/methods/amf/termination_policies/simple_tolerance_termination.hpp0000644000176200001440000000676013647512751031225 0ustar liggesusers/** * @file simple_tolerance_termination.hpp * @author Sumedh Ghaisas * * This file is part of MLPACK 1.0.10. 
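 *
 * This policy reports convergence once the relative decrease of the
 * average residue over the nonzero entries of V stays below the given
 * tolerance for reverseStepTolerance consecutive iterations (or when
 * maxIterations is reached), restoring the best W and H seen meanwhile.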
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef _MLPACK_METHODS_AMF_SIMPLE_TOLERANCE_TERMINATION_HPP_INCLUDED #define _MLPACK_METHODS_AMF_SIMPLE_TOLERANCE_TERMINATION_HPP_INCLUDED #include namespace mlpack { namespace amf { template class SimpleToleranceTermination { public: SimpleToleranceTermination(const double tolerance = 1e-5, const size_t maxIterations = 10000, const size_t reverseStepTolerance = 3) : tolerance(tolerance), maxIterations(maxIterations), reverseStepTolerance(reverseStepTolerance) {} void Initialize(const MatType& V) { residueOld = DBL_MAX; iteration = 1; residue = DBL_MIN; reverseStepCount = 0; this->V = &V; c_index = 0; c_indexOld = 0; reverseStepCount = 0; } bool IsConverged(arma::mat& W, arma::mat& H) { // Calculate norm of WH after each iteration. arma::mat WH; WH = W * H; residueOld = residue; size_t n = V->n_rows; size_t m = V->n_cols; double sum = 0; size_t count = 0; for(size_t i = 0;i < n;i++) { for(size_t j = 0;j < m;j++) { double temp = 0; if((temp = (*V)(i,j)) != 0) { temp = (temp - WH(i, j)); temp = temp * temp; sum += temp; count++; } } } residue = sum / count; residue = sqrt(residue); iteration++; if((residueOld - residue) / residueOld < tolerance && iteration > 4) { if(reverseStepCount == 0 && isCopy == false) { isCopy = true; this->W = W; this->H = H; c_index = residue; c_indexOld = residueOld; } reverseStepCount++; } else { reverseStepCount = 0; if(residue <= c_indexOld && isCopy == true) { isCopy = false; } } if(reverseStepCount == reverseStepTolerance || iteration > maxIterations) { if(isCopy) { W = this->W; H = this->H; residue = c_index; } return true; } else return false; } const double& Index() { return residue; } const size_t& Iteration() { return iteration; } const size_t& MaxIterations() { return maxIterations; } private: double tolerance; size_t maxIterations; const MatType* V; size_t iteration; double residueOld; double residue; double normOld; size_t reverseStepTolerance; size_t reverseStepCount; bool isCopy; arma::mat W; arma::mat H; double c_indexOld; double c_index; }; // class SimpleToleranceTermination }; // namespace amf }; // namespace mlpack #endif // _MLPACK_METHODS_AMF_SIMPLE_TOLERANCE_TERMINATION_HPP_INCLUDED RcppMLPACK/src/mlpack/methods/amf/termination_policies/simple_residue_termination.hpp0000644000176200001440000000434113647512751030702 0ustar liggesusers/** * @file simple_residue_termination.hpp * @author Sumedh Ghaisas * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. 
See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef _MLPACK_METHODS_AMF_SIMPLERESIDUETERMINATION_HPP_INCLUDED #define _MLPACK_METHODS_AMF_SIMPLERESIDUETERMINATION_HPP_INCLUDED #include namespace mlpack { namespace amf { class SimpleResidueTermination { public: SimpleResidueTermination(const double minResidue = 1e-10, const size_t maxIterations = 10000) : minResidue(minResidue), maxIterations(maxIterations) { } template void Initialize(const MatType& V) { residue = minResidue; iteration = 1; normOld = 0; const size_t n = V.n_rows; const size_t m = V.n_cols; nm = n * m; } bool IsConverged(arma::mat& W, arma::mat& H) { // Calculate norm of WH after each iteration. arma::mat WH; WH = W * H; double norm = sqrt(accu(WH % WH) / nm); if (iteration != 0) { residue = fabs(normOld - norm); residue /= normOld; } normOld = norm; iteration++; if(residue < minResidue || iteration > maxIterations) return true; else return false; } const double& Index() { return residue; } const size_t& Iteration() { return iteration; } const size_t& MaxIterations() { return maxIterations; } public: double minResidue; size_t maxIterations; double residue; size_t iteration; double normOld; size_t nm; }; // class SimpleResidueTermination }; // namespace amf }; // namespace mlpack #endif // _MLPACK_METHODS_AMF_SIMPLERESIDUETERMINATION_HPP_INCLUDED RcppMLPACK/src/mlpack/methods/amf/termination_policies/incomplete_incremental_termination.hpp0000644000176200001440000000346613647512751032420 0ustar liggesusers/** * @file incomplete_incremental_termination.hpp * @author Sumedh Ghaisas * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef _INCOMPLETE_INCREMENTAL_TERMINATION_HPP_INCLUDED #define _INCOMPLETE_INCREMENTAL_TERMINATION_HPP_INCLUDED #include namespace mlpack { namespace amf { template class IncompleteIncrementalTermination { public: IncompleteIncrementalTermination(TerminationPolicy t_policy = TerminationPolicy()) : t_policy(t_policy) {} template void Initialize(const MatType& V) { t_policy.Initialize(V); incrementalIndex = V.n_rows; iteration = 0; } bool IsConverged(arma::mat& W, arma::mat& H) { iteration++; if(iteration % incrementalIndex == 0) return t_policy.IsConverged(W, H); else return false; } const double& Index() { return t_policy.Index(); } const size_t& Iteration() { return iteration; } const size_t& MaxIterations() { return t_policy.MaxIterations(); } private: TerminationPolicy t_policy; size_t incrementalIndex; size_t iteration; }; }; // namespace amf }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/amf/termination_policies/validation_RMSE_termination.hpp0000644000176200001440000000757713647512751030667 0ustar liggesusers/** * @file validation_RMSE_termination.hpp * @author Sumedh Ghaisas * * Termination policy that checks validation RMSE. 
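 *
 * The constructor removes num_test_points randomly chosen nonzero entries
 * from V and holds them out as a validation set; iteration stops once the
 * RMSE on these held-out entries stops improving, and the best W and H
 * seen so far are restored.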
* * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef VALIDATION_RMSE_TERMINATION_HPP_INCLUDED #define VALIDATION_RMSE_TERMINATION_HPP_INCLUDED #include namespace mlpack { namespace amf { template class ValidationRMSETermination { public: ValidationRMSETermination(MatType& V, size_t num_test_points, double tolerance = 1e-5, size_t maxIterations = 10000, size_t reverseStepTolerance = 3) : tolerance(tolerance), maxIterations(maxIterations), num_test_points(num_test_points), reverseStepTolerance(reverseStepTolerance) { size_t n = V.n_rows; size_t m = V.n_cols; test_points.zeros(num_test_points, 3); for(size_t i = 0; i < num_test_points; i++) { double t_val; size_t t_row; size_t t_col; do { t_row = rand() % n; t_col = rand() % m; } while((t_val = V(t_row, t_col)) == 0); test_points(i, 0) = t_row; test_points(i, 1) = t_col; test_points(i, 2) = t_val; V(t_row, t_col) = 0; } } void Initialize(const MatType& /* V */) { iteration = 1; rmse = DBL_MAX; rmseOld = DBL_MAX; c_index = 0; c_indexOld = 0; reverseStepCount = 0; isCopy = false; } bool IsConverged(arma::mat& W, arma::mat& H) { // Calculate norm of WH after each iteration. arma::mat WH; WH = W * H; if (iteration != 0) { rmseOld = rmse; rmse = 0; for(size_t i = 0; i < num_test_points; i++) { size_t t_row = test_points(i, 0); size_t t_col = test_points(i, 1); double t_val = test_points(i, 2); double temp = (t_val - WH(t_row, t_col)); temp *= temp; rmse += temp; } rmse /= num_test_points; rmse = sqrt(rmse); } iteration++; if((rmseOld - rmse) / rmseOld < tolerance && iteration > 4) { if(reverseStepCount == 0 && isCopy == false) { isCopy = true; this->W = W; this->H = H; c_indexOld = rmseOld; c_index = rmse; } reverseStepCount++; } else { reverseStepCount = 0; if(rmse <= c_indexOld && isCopy == true) { isCopy = false; } } if(reverseStepCount == reverseStepTolerance || iteration > maxIterations) { if(isCopy) { W = this->W; H = this->H; rmse = c_index; } return true; } else return false; } const double& Index() { return rmse; } const size_t& Iteration() { return iteration; } const size_t& MaxIterations() { return maxIterations; } private: double tolerance; size_t maxIterations; size_t num_test_points; size_t iteration; arma::Mat test_points; double rmseOld; double rmse; size_t reverseStepTolerance; size_t reverseStepCount; bool isCopy; arma::mat W; arma::mat H; double c_indexOld; double c_index; }; } // namespace amf } // namespace mlpack #endif // VALIDATION_RMSE_TERMINATION_HPP_INCLUDED RcppMLPACK/src/mlpack/methods/amf/termination_policies/complete_incremental_termination.hpp0000644000176200001440000000377713647512751032076 0ustar liggesusers/** * @file complete_incremental_termination_hpp * @author Sumedh Ghaisas * * Complete incremental learning termination policy. * * This file is part of MLPACK 1.0.10. 
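 *
 * A minimal sketch of the intended wiring (hedged: wrapping
 * SimpleResidueTermination below is an illustrative choice; any policy
 * exposing Initialize() and IsConverged() fits):
 *
 * @code
 * CompleteIncrementalTermination<SimpleResidueTermination> t;
 * // t.IsConverged(W, H) defers to the wrapped policy only once per full
 * // pass over the nonzero entries of V, as counted in Initialize().
 * @endcode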
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef COMPLETE_INCREMENTAL_TERMINATION_HPP_INCLUDED #define COMPLETE_INCREMENTAL_TERMINATION_HPP_INCLUDED namespace mlpack { namespace amf { template class CompleteIncrementalTermination { public: CompleteIncrementalTermination(TerminationPolicy t_policy = TerminationPolicy()) : t_policy(t_policy) {} template void Initialize(const MatType& V) { t_policy.Initialize(V); incrementalIndex = accu(V != 0); iteration = 0; } void Initialize(const arma::sp_mat& V) { t_policy.Initialize(V); incrementalIndex = V.n_nonzero; iteration = 0; } bool IsConverged(arma::mat& W, arma::mat& H) { iteration++; if(iteration % incrementalIndex == 0) return t_policy.IsConverged(W, H); else return false; } const double& Index() { return t_policy.Index(); } const size_t& Iteration() { return iteration; } const size_t& MaxIterations() { return t_policy.MaxIterations(); } private: TerminationPolicy t_policy; size_t incrementalIndex; size_t iteration; }; } // namespace amf } // namespace mlpack #endif // COMPLETE_INCREMENTAL_TERMINATION_HPP_INCLUDED RcppMLPACK/src/mlpack/methods/amf/amf.hpp0000644000176200001440000001315513647512751017606 0ustar liggesusers/** * @file amf.hpp * @author Sumedh Ghaisas * @author Mohan Rajendran * @author Ryan Curtin * * The AMF (alternating matrix factorization) class, from which more commonly * known techniques such as incremental SVD, NMF, and batch-learning SVD can be * derived. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_AMF_AMF_HPP #define __MLPACK_METHODS_AMF_AMF_HPP #include #include #include #include namespace mlpack { namespace amf { /** * This class implements AMF (alternating matrix factorization) on the given * matrix V. Alternating matrix factorization decomposes V in the form * \f$ V \approx WH \f$ where W is called the basis matrix and H is called the * encoding matrix. V is taken to be of size n x m and the obtained W is n x r * and H is r x m. The size r is called the rank of the factorization. * * The implementation requires three template types; the first contains the * policy used to determine when the algorithm has converged; the second * contains the initialization rule for the W and H matrix; the last contains * the update rule to be used during each iteration. 
This templatization allows
 * the user to try various update rules, initialization rules, and termination
 * policies (including ones not supplied with MLPACK) for factorization. By
 * default, the template parameters to AMF implement non-negative matrix
 * factorization with the multiplicative distance update.
 *
 * A simple example of how to run AMF (or NMF) is shown below.
 *
 * @code
 * extern arma::mat V; // Matrix that we want to factorize.
 * size_t r = 10; // Rank of decomposition.
 * arma::mat W; // Basis matrix.
 * arma::mat H; // Encoding matrix.
 *
 * AMF<> amf; // Default options: NMF with multiplicative distance update rules.
 * amf.Apply(V, r, W, H); // Note the argument order: the rank comes second.
 * @endcode
 *
 * @tparam TerminationPolicyType The policy to use for determining when the
 *     factorization has converged.
 * @tparam InitializationRuleType The initialization rule for initializing the
 *     W and H matrices.
 * @tparam UpdateRuleType The update rule for calculating the W and H matrices
 *     at each iteration.
 *
 * @see NMFMultiplicativeDistanceUpdate
 */
template<typename TerminationPolicyType = SimpleResidueTermination,
         typename InitializationRuleType = RandomInitialization,
         typename UpdateRuleType = NMFMultiplicativeDistanceUpdate>
class AMF
{
 public:
  /**
   * Create the AMF object and (optionally) set the parameters which AMF will
   * run with. The minimum residue refers to the root mean square of the
   * difference between two subsequent iterations of the product W * H. A low
   * residue indicates that subsequent iterations are not producing much change
   * in W and H. Once the residue goes below the specified minimum residue, the
   * algorithm terminates.
   *
   * @param terminationPolicy Optional instantiated TerminationPolicy object.
   * @param initializeRule Optional instantiated InitializationRule object for
   *     initializing the W and H matrices.
   * @param update Optional instantiated UpdateRule object; this parameter is
   *     useful when the update rule for the W and H matrices has state that
   *     it needs to store (i.e. HUpdate() and WUpdate() are not static
   *     functions).
   */
  AMF(const TerminationPolicyType& terminationPolicy = TerminationPolicyType(),
      const InitializationRuleType& initializeRule = InitializationRuleType(),
      const UpdateRuleType& update = UpdateRuleType());

  /**
   * Apply Alternating Matrix Factorization to the provided matrix.
   *
   * @param V Input matrix to be factorized.
   * @param r Rank r of the factorization.
   * @param W Basis matrix to be output.
   * @param H Encoding matrix to output.
   */
  template<typename MatType>
  double Apply(const MatType& V, const size_t r, arma::mat& W, arma::mat& H);

  //! Access the termination policy.
  const TerminationPolicyType& TerminationPolicy() const
  { return terminationPolicy; }
  //! Modify the termination policy.
  TerminationPolicyType& TerminationPolicy() { return terminationPolicy; }

  //! Access the initialization rule.
  const InitializationRuleType& InitializeRule() const
  { return initializationRule; }
  //! Modify the initialization rule.
  InitializationRuleType& InitializeRule() { return initializationRule; }

  //! Access the update rule.
  const UpdateRuleType& Update() const { return update; }
  //! Modify the update rule.
  UpdateRuleType& Update() { return update; }

 private:
  //! Termination policy.
  TerminationPolicyType terminationPolicy;
  //! Instantiated initialization rule.
  InitializationRuleType initializationRule;
  //! Instantiated update rule.
  UpdateRuleType update;
}; // class AMF

}; // namespace amf
}; // namespace mlpack

// Include implementation.
#include "amf_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/radical/0000755000176200001440000000000013647514343017160 5ustar liggesusersRcppMLPACK/src/mlpack/methods/radical/radical.cpp0000644000176200001440000001404313647514343021265 0ustar liggesusers/** * @file radical.cpp * @author Nishant Mehta * * Implementation of Radical class * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "radical.hpp" using namespace std; using namespace arma; using namespace mlpack; using namespace mlpack::radical; // Set the parameters to RADICAL. Radical::Radical(const double noiseStdDev, const size_t replicates, const size_t angles, const size_t sweeps, const size_t m) : noiseStdDev(noiseStdDev), replicates(replicates), angles(angles), sweeps(sweeps), m(m) { // Nothing to do here. } void Radical::CopyAndPerturb(mat& xNew, const mat& x) const { //Timer::Start("radical_copy_and_perturb"); xNew = repmat(x, replicates, 1) + noiseStdDev * randn(replicates * x.n_rows, x.n_cols); //Timer::Stop("radical_copy_and_perturb"); } double Radical::Vasicek(vec& z) const { z = sort(z); // Apparently slower. /* vec logs = log(z.subvec(m, z.n_elem - 1) - z.subvec(0, z.n_elem - 1 - m)); //vec val = sum(log(z.subvec(m, nPoints - 1) - z.subvec(0, nPoints - 1 - m))); return (double) sum(logs); */ // Apparently faster. double sum = 0; uword range = z.n_elem - m; for (uword i = 0; i < range; i++) { sum += log(z(i + m) - z(i)); } return sum; } double Radical::DoRadical2D(const mat& matX) { CopyAndPerturb(perturbed, matX); mat::fixed<2, 2> matJacobi; vec values(angles); for (size_t i = 0; i < angles; i++) { const double theta = (i / (double) angles) * M_PI / 2.0; const double cosTheta = cos(theta); const double sinTheta = sin(theta); matJacobi(0, 0) = cosTheta; matJacobi(1, 0) = -sinTheta; matJacobi(0, 1) = sinTheta; matJacobi(1, 1) = cosTheta; candidate = perturbed * matJacobi; vec candidateY1 = candidate.unsafe_col(0); vec candidateY2 = candidate.unsafe_col(1); values(i) = Vasicek(candidateY1) + Vasicek(candidateY2); } uword indOpt; values.min(indOpt); // we ignore the return value; we don't care about it return (indOpt / (double) angles) * M_PI / 2.0; } void Radical::DoRadical(const mat& matXT, mat& matY, mat& matW) { // matX is nPoints by nDims (although less intuitive than columns being // points, and although this is the transpose of the ICA literature, this // choice is for computational efficiency when repeatedly generating // two-dimensional coordinate projections for Radical2D). //Timer::Start("radical_transpose_data"); mat matX = trans(matXT); //Timer::Stop("radical_transpose_data"); // If m was not specified, initialize m as recommended in // (Learned-Miller and Fisher, 2003). 
if (m < 1) m = floor(sqrt((double) matX.n_rows)); const size_t nDims = matX.n_cols; const size_t nPoints = matX.n_rows; //Timer::Start("radical_whiten_data"); mat matXWhitened; mat matWhitening; WhitenFeatureMajorMatrix(matX, matY, matWhitening); //Timer::Stop("radical_whiten_data"); // matY is now the whitened form of matX. // In the RADICAL code, they do not copy and perturb initially, although the // paper does. We follow the code as it should match their reported results // and likely does a better job bouncing out of local optima. //GeneratePerturbedX(X, X); // Initialize the unmixing matrix to the whitening matrix. //Timer::Start("radical_do_radical"); matW = matWhitening; mat matYSubspace(nPoints, 2); mat matJ = eye(nDims, nDims); for (size_t sweepNum = 0; sweepNum < sweeps; sweepNum++) { Rcpp::Rcout << "RADICAL: sweep " << sweepNum << "." << std::endl; for (size_t i = 0; i < nDims - 1; i++) { for (size_t j = i + 1; j < nDims; j++) { Rcpp::Rcout << "RADICAL 2D on dimensions " << i << " and " << j << "." << std::endl; matYSubspace.col(0) = matY.col(i); matYSubspace.col(1) = matY.col(j); const double thetaOpt = DoRadical2D(matYSubspace); const double cosThetaOpt = cos(thetaOpt); const double sinThetaOpt = sin(thetaOpt); // Set elements of transformation matrix. matJ(i, i) = cosThetaOpt; matJ(j, i) = -sinThetaOpt; matJ(i, j) = sinThetaOpt; matJ(j, j) = cosThetaOpt; matY *= matJ; // Unset elements of transformation matrix, so matJ = eye() again. matJ(i, i) = 1; matJ(j, i) = 0; matJ(i, j) = 0; matJ(j, j) = 1; } } } //Timer::Stop("radical_do_radical"); // The final transposes provide W and Y in the typical form from the ICA // literature. //Timer::Start("radical_transpose_data"); matW = trans(matW); matY = trans(matY); //Timer::Stop("radical_transpose_data"); } void mlpack::radical::WhitenFeatureMajorMatrix(const mat& matX, mat& matXWhitened, mat& matWhitening) { mat matU, matV; vec s; svd(matU, s, matV, cov(matX)); matWhitening = matU * diagmat(1 / sqrt(s)) * trans(matV); matXWhitened = matX * matWhitening; } // return a string of this object. std::string Radical::ToString() const { std::ostringstream convert; convert << "Radical [" << this << "]" << std::endl; convert << " StdDev of Noise: " << noiseStdDev << std::endl; convert << " Number of Replicates: " << replicates << std::endl; convert << " Number of Angles: " << angles << std::endl; convert << " M value: " << m << std::endl; return convert.str(); } RcppMLPACK/src/mlpack/methods/radical/radical.hpp0000644000176200001440000001243113647512751021272 0ustar liggesusers/** * @file radical.hpp * @author Nishant Mehta * * Declaration of Radical class (RADICAL is Robust, Accurate, Direct ICA * aLgorithm). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
*/ #ifndef __MLPACK_METHODS_RADICAL_RADICAL_HPP #define __MLPACK_METHODS_RADICAL_RADICAL_HPP #include namespace mlpack { namespace radical { /** * An implementation of RADICAL, an algorithm for independent component * analysis (ICA). * * Let X be a matrix where each column is a point and each row a dimension. * The goal is to find a square unmixing matrix W such that Y = W X and * the rows of Y are independent components. * * For more details, see the following paper: * * @code * @article{learned2003ica, * title = {ICA Using Spacings Estimates of Entropy}, * author = {Learned-Miller, E.G. and Fisher III, J.W.}, * journal = {Journal of Machine Learning Research}, * volume = {4}, * pages = {1271--1295}, * year = {2003} * } * @endcode */ class Radical { public: /** * Set the parameters to RADICAL. * * @param noiseStdDev Standard deviation of the Gaussian noise added to the * replicates of the data points during Radical2D * @param replicates Number of Gaussian-perturbed replicates to use (per * point) in Radical2D * @param angles Number of angles to consider in brute-force search during * Radical2D * @param sweeps Number of sweeps. Each sweep calls Radical2D once for each * pair of dimensions * @param m The variable m from Vasicek's m-spacing estimator of entropy. */ Radical(const double noiseStdDev = 0.175, const size_t replicates = 30, const size_t angles = 150, const size_t sweeps = 0, const size_t m = 0); /** * Run RADICAL. * * @param matX Input data into the algorithm - a matrix where each column is * a point and each row is a dimension. * @param matY Estimated independent components - a matrix where each column * is a point and each row is an estimated independent component. * @param matW Estimated unmixing matrix, where matY = matW * matX. */ void DoRadical(const arma::mat& matX, arma::mat& matY, arma::mat& matW); /** * Vasicek's m-spacing estimator of entropy, with overlap modification from * (Learned-Miller and Fisher, 2003). * * @param x Empirical sample (one-dimensional) over which to estimate entropy. */ double Vasicek(arma::vec& x) const; /** * Make replicates of each data point (the number of replicates is set in * either the constructor or with Replicates()) and perturb data with Gaussian * noise with standard deviation noiseStdDev. */ void CopyAndPerturb(arma::mat& xNew, const arma::mat& x) const; //! Two-dimensional version of RADICAL. double DoRadical2D(const arma::mat& matX); //! Get the standard deviation of the additive Gaussian noise. double NoiseStdDev() const { return noiseStdDev; } //! Modify the standard deviation of the additive Gaussian noise. double& NoiseStdDev() { return noiseStdDev; } //! Get the number of Gaussian-perturbed replicates used per point. size_t Replicates() const { return replicates; } //! Modify the number of Gaussian-perturbed replicates used per point. size_t& Replicates() { return replicates; } //! Get the number of angles considered during brute-force search. size_t Angles() const { return angles; } //! Modify the number of angles considered during brute-force search. size_t& Angles() { return angles; } //! Get the number of sweeps. size_t Sweeps() const { return sweeps; } //! Modify the number of sweeps. size_t& Sweeps() { return sweeps; } // Returns a string representation of this object. std::string ToString() const; private: //! Standard deviation of the Gaussian noise added to the replicates of //! the data points during Radical2D. double noiseStdDev; //! Number of Gaussian-perturbed replicates to use (per point) in Radical2D. 
  size_t replicates;

  //! Number of angles to consider in brute-force search during Radical2D.
  size_t angles;

  //! Number of sweeps; each sweep calls Radical2D once for each pair of
  //! dimensions.
  size_t sweeps;

  //! Value of m to use for Vasicek's m-spacing estimator of entropy.
  size_t m;

  //! Internal matrix, held as member variable to prevent memory reallocations.
  arma::mat perturbed;
  //! Internal matrix, held as member variable to prevent memory reallocations.
  arma::mat candidate;
};

void WhitenFeatureMajorMatrix(const arma::mat& matX,
                              arma::mat& matXWhitened,
                              arma::mat& matWhitening);

}; // namespace radical
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/methods/naive_bayes/0000755000176200001440000000000013647512751020047 5ustar liggesusersRcppMLPACK/src/mlpack/methods/naive_bayes/naive_bayes_classifier.hpp0000644000176200001440000001112713647512751025253 0ustar liggesusers/**
 * @file naive_bayes_classifier.hpp
 * @author Parikshit Ram (pram@cc.gatech.edu)
 *
 * A Naive Bayes Classifier which parametrically estimates the distribution of
 * the features. It is assumed that the features have been sampled from a
 * Gaussian PDF.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU Lesser General Public License
 * along with MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_NAIVE_BAYES_NAIVE_BAYES_CLASSIFIER_HPP
#define __MLPACK_METHODS_NAIVE_BAYES_NAIVE_BAYES_CLASSIFIER_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace naive_bayes /** The Naive Bayes Classifier. */ {

/**
 * The simple Naive Bayes classifier. This class trains on the data by
 * calculating the sample mean and variance of the features with respect to
 * each of the labels, and also the class probabilities. The class labels are
 * assumed to be non-negative integers (starting with 0), and are passed to the
 * constructor in a separate labels vector.
 *
 * Mathematically, it computes P(X_i = x_i | Y = y_j) for each feature X_i for
 * each of the labels y_j. Along with this, it also computes the class
 * probabilities P(Y = y_j).
 *
 * For classifying a data point (x_1, x_2, ..., x_n), it computes the following:
 * arg max_y(P(Y = y)*P(X_1 = x_1 | Y = y) * ... * P(X_n = x_n | Y = y))
 *
 * Example use:
 *
 * @code
 * extern arma::mat training_data, testing_data;
 * extern arma::Col<size_t> labels; // One label per training point.
 * NaiveBayesClassifier<> nbc(training_data, labels, 5);
 * arma::Col<size_t> results;
 *
 * nbc.Classify(testing_data, results);
 * @endcode
 */
template<typename MatType = arma::mat>
class NaiveBayesClassifier
{
 private:
  //! Sample mean for each class.
  MatType means;

  //! Sample variances for each class.
  MatType variances;

  //! Class probabilities.
  arma::vec probabilities;

 public:
  /**
   * Initializes the classifier as per the input and then trains it by
   * calculating the sample mean and variances. The labels are expected to be
   * integers (starting with 0 and less than
   * the number of classes).
* * Example use: * @code * extern arma::mat training_data, testing_data; * extern arma::Col labels; * NaiveBayesClassifier nbc(training_data, labels, 5); * @endcode * * @param data Training data points. * @param labels Labels corresponding to training data points. * @param classes Number of classes in this classifier. * @param incrementalVariance If true, an incremental algorithm is used to * calculate the variance; this can prevent loss of precision in some * cases, but will be somewhat slower to calculate. */ NaiveBayesClassifier(const MatType& data, const arma::Col& labels, const size_t classes, const bool incrementalVariance = false); /** * Given a bunch of data points, this function evaluates the class of each of * those data points, and puts it in the vector 'results'. * * @code * arma::mat test_data; // each column is a test point * arma::Col results; * ... * nbc.Classify(test_data, &results); * @endcode * * @param data List of data points. * @param results Vector that class predictions will be placed into. */ void Classify(const MatType& data, arma::Col& results); //! Get the sample means for each class. const MatType& Means() const { return means; } //! Modify the sample means for each class. MatType& Means() { return means; } //! Get the sample variances for each class. const MatType& Variances() const { return variances; } //! Modify the sample variances for each class. MatType& Variances() { return variances; } //! Get the prior probabilities for each class. const arma::vec& Probabilities() const { return probabilities; } //! Modify the prior probabilities for each class. arma::vec& Probabilities() { return probabilities; } }; }; // namespace naive_bayes }; // namespace mlpack // Include implementation. #include "naive_bayes_classifier_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/naive_bayes/naive_bayes_classifier_impl.hpp0000644000176200001440000001313113647512751026271 0ustar liggesusers/** * @file naive_bayes_classifier_impl.hpp * @author Parikshit Ram (pram@cc.gatech.edu) * @author Vahab Akbarzadeh (v.akbarzadeh@gmail.com) * * A Naive Bayes Classifier which parametrically estimates the distribution of * the features. This classifier makes its predictions based on the assumption * that the features have been sampled from a set of Gaussians with diagonal * covariance. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NAIVE_BAYES_NAIVE_BAYES_CLASSIFIER_IMPL_HPP #define __MLPACK_METHODS_NAIVE_BAYES_NAIVE_BAYES_CLASSIFIER_IMPL_HPP #include // In case it hasn't been included already. #include "naive_bayes_classifier.hpp" namespace mlpack { namespace naive_bayes { template NaiveBayesClassifier::NaiveBayesClassifier( const MatType& data, const arma::Col& labels, const size_t classes, const bool incrementalVariance) { const size_t dimensionality = data.n_rows; // Update the variables according to the number of features and classes // present in the data. 
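  // (Comment added for clarity: means and variances are laid out with one
  // dimensionality-length column per class, and probabilities first
  // accumulates raw per-class counts; it is normalized into priors at the
  // end of this constructor.)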
probabilities.zeros(classes); means.zeros(dimensionality, classes); variances.zeros(dimensionality, classes); Log::Info << "Training Naive Bayes classifier on " << data.n_cols << " examples with " << dimensionality << " features each." << std::endl; // Calculate the class probabilities as well as the sample mean and variance // for each of the features with respect to each of the labels. if (incrementalVariance) { // Use incremental algorithm. for (size_t j = 0; j < data.n_cols; ++j) { const size_t label = labels[j]; ++probabilities[label]; arma::vec delta = data.col(j) - means.col(label); means.col(label) += delta / probabilities[label]; variances.col(label) += delta % (data.col(j) - means.col(label)); } for (size_t i = 0; i < classes; ++i) { if (probabilities[i] > 2) variances.col(i) /= (probabilities[i] - 1); } } else { // Don't use incremental algorithm. This is a two-pass algorithm. It is // possible to calculate the means and variances using a faster one-pass // algorithm but there are some precision and stability issues. If this is // too slow, it's an option to use the faster algorithm by default and then // have this (and the incremental algorithm) be other options. // Calculate the means. for (size_t j = 0; j < data.n_cols; ++j) { const size_t label = labels[j]; ++probabilities[label]; means.col(label) += data.col(j); } // Normalize means. for (size_t i = 0; i < classes; ++i) if (probabilities[i] != 0.0) means.col(i) /= probabilities[i]; // Calculate variances. for (size_t j = 0; j < data.n_cols; ++j) { const size_t label = labels[j]; variances.col(label) += square(data.col(j) - means.col(label)); } // Normalize variances. for (size_t i = 0; i < classes; ++i) if (probabilities[i] > 1) variances.col(i) /= (probabilities[i] - 1); } // Ensure that the variances are invertible. for (size_t i = 0; i < variances.n_elem; ++i) if (variances[i] == 0.0) variances[i] = 1e-50; probabilities /= data.n_cols; } template void NaiveBayesClassifier::Classify(const MatType& data, arma::Col& results) { // Check that the number of features in the test data is same as in the // training data. Log::Assert(data.n_rows == means.n_rows); arma::vec probs = arma::log(probabilities); arma::mat invVar = 1.0 / variances; arma::mat testProbs = arma::repmat(probs.t(), data.n_cols, 1); results.set_size(data.n_cols); // No need to fill with anything yet. Log::Info << "Running Naive Bayes classifier on " << data.n_cols << " data points with " << data.n_rows << " features each." << std::endl; // Calculate the joint probability for each of the data points for each of the // means.n_cols. // Loop over every class. for (size_t i = 0; i < means.n_cols; i++) { // This is an adaptation of gmm::phi() for the case where the covariance is // a diagonal matrix. arma::mat diffs = data - arma::repmat(means.col(i), 1, data.n_cols); arma::mat rhs = -0.5 * arma::diagmat(invVar.col(i)) * diffs; arma::vec exponents(diffs.n_cols); for (size_t j = 0; j < diffs.n_cols; ++j) exponents(j) = std::exp(arma::accu(diffs.col(j) % rhs.unsafe_col(j))); testProbs.col(i) += log(pow(2 * M_PI, (double) data.n_rows / -2.0) * pow(det(arma::diagmat(invVar.col(i))), -0.5) * exponents); } // Now calculate the label. for (size_t i = 0; i < data.n_cols; ++i) { // Find the index of the class with maximum probability for this point. 
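  // (testProbs holds log-probabilities, but the argmax over classes is
  // unaffected by the monotone log transform.)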
arma::uword maxIndex = 0; arma::vec pointProbs = testProbs.row(i).t(); pointProbs.max(maxIndex); results[i] = maxIndex; } return; } }; // namespace naive_bayes }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/nca/0000755000176200001440000000000013647512751016323 5ustar liggesusersRcppMLPACK/src/mlpack/methods/nca/nca_softmax_error_function_impl.hpp0000644000176200001440000002416713647512751025507 0ustar liggesusers/** * @file nca_softmax_impl.h * @author Ryan Curtin * * Implementation of the Softmax error function. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NCA_NCA_SOFTMAX_ERROR_FUNCTCLIN_IMPL_H #define __MLPACK_METHODS_NCA_NCA_SOFTMAX_ERROR_FUNCTCLIN_IMPL_H // In case it hasn't been included already. #include "nca_softmax_error_function.hpp" namespace mlpack { namespace nca { // Initialize with the given kernel. template SoftmaxErrorFunction::SoftmaxErrorFunction( const arma::mat& dataset, const arma::Col& labels, MetricType metric) : dataset(dataset), labels(labels), metric(metric), precalculated(false) { /* nothing to do */ } //! The non-separable implementation, which uses Precalculate() to save time. template double SoftmaxErrorFunction::Evaluate(const arma::mat& coordinates) { // Calculate the denominators and numerators, if necessary. Precalculate(coordinates); return -accu(p); // Sum of p_i for all i. We negate because our solver // minimizes, not maximizes. }; //! The separated objective function, which does not use Precalculate(). template double SoftmaxErrorFunction::Evaluate(const arma::mat& coordinates, const size_t i) { // Unfortunately each evaluation will take O(N) time because it requires a // scan over all points in the dataset. Our objective is to compute p_i. double denominator = 0; double numerator = 0; // It's quicker to do this now than one point at a time later. stretchedDataset = coordinates * dataset; for (size_t k = 0; k < dataset.n_cols; ++k) { // Don't consider the case where the points are the same. if (k == i) continue; // We want to evaluate exp(-D(A x_i, A x_k)). double eval = std::exp(-metric.Evaluate(stretchedDataset.unsafe_col(i), stretchedDataset.unsafe_col(k))); // If they are in the same if (labels[i] == labels[k]) numerator += eval; denominator += eval; } // Now the result is just a simple division, but we have to be sure that the // denominator is not 0. if (denominator == 0.0) { Rcpp::Rcout << "Denominator of p_" << i << " is 0!" << std::endl; return 0; } return -(numerator / denominator); // Negate because the optimizer is a // minimizer. } //! The non-separable implementation, where Precalculate() is used. template void SoftmaxErrorFunction::Gradient(const arma::mat& coordinates, arma::mat& gradient) { // Calculate the denominators and numerators, if necessary. 
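  // Precalculate() caches p and the per-point denominators for this exact
  // coordinate matrix, so an Evaluate() followed by Gradient() on the same
  // coordinates costs only one pass over the point pairs.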
Precalculate(coordinates); // Now, we handle the summation over i: // sum_i (p_i sum_k (p_ik x_ik x_ik^T) - // sum_{j in class of i} (p_ij x_ij x_ij^T) // We can algebraically manipulate the whole thing to produce a more // memory-friendly way to calculate this. Looping over each i and k (again // O((n * (n + 1)) / 2) as with the last step, we can add the following to the // sum: // // if class of i is the same as the class of k, add // (((p_i - (1 / p_i)) p_ik) + ((p_k - (1 / p_k)) p_ki)) x_ik x_ik^T // otherwise, add // (p_i p_ik + p_k p_ki) x_ik x_ik^T arma::mat sum; sum.zeros(stretchedDataset.n_rows, stretchedDataset.n_rows); for (size_t i = 0; i < stretchedDataset.n_cols; i++) { for (size_t k = (i + 1); k < stretchedDataset.n_cols; k++) { // Calculate p_ik and p_ki first. double eval = exp(-metric.Evaluate(stretchedDataset.unsafe_col(i), stretchedDataset.unsafe_col(k))); double p_ik = 0, p_ki = 0; p_ik = eval / denominators(i); p_ki = eval / denominators(k); // Subtract x_i from x_k. We are not using stretched points here. arma::vec x_ik = dataset.col(i) - dataset.col(k); arma::mat secondTerm = (x_ik * trans(x_ik)); if (labels[i] == labels[k]) sum += ((p[i] - 1) * p_ik + (p[k] - 1) * p_ki) * secondTerm; else sum += (p[i] * p_ik + p[k] * p_ki) * secondTerm; } } // Assemble the final gradient. gradient = -2 * coordinates * sum; } //! The separable implementation. template void SoftmaxErrorFunction::Gradient(const arma::mat& coordinates, const size_t i, arma::mat& gradient) { // We will need to calculate p_i before this evaluation is done, so these two // variables will hold the information necessary for that. double numerator = 0; double denominator = 0; // The gradient involves two matrix terms which are eventually combined into // one. arma::mat firstTerm; arma::mat secondTerm; firstTerm.zeros(coordinates.n_rows, coordinates.n_cols); secondTerm.zeros(coordinates.n_rows, coordinates.n_cols); // Compute the stretched dataset. stretchedDataset = coordinates * dataset; for (size_t k = 0; k < dataset.n_cols; ++k) { // Don't consider the case where the points are the same. if (i == k) continue; // Calculate the numerator of p_ik. double eval = exp(-metric.Evaluate(stretchedDataset.unsafe_col(i), stretchedDataset.unsafe_col(k))); // If the points are in the same class, we must add to the second term of // the gradient as well as the numerator of p_i. We will divide by the // denominator of p_ik later. For x_ik we are not using stretched points. arma::vec x_ik = dataset.col(i) - dataset.col(k); if (labels[i] == labels[k]) { numerator += eval; secondTerm += eval * x_ik * trans(x_ik); } // We always have to add to the denominator of p_i and the first term of the // gradient computation. We will divide by the denominator of p_ik later. denominator += eval; firstTerm += eval * x_ik * trans(x_ik); } // Calculate p_i. double p = 0; if (denominator == 0) { Rcpp::Rcout << "Denominator of p_" << i << " is 0!" << std::endl; // If the denominator is zero, then all p_ik should be zero and there is // no gradient contribution from this point. gradient.zeros(coordinates.n_rows, coordinates.n_rows); return; } else { p = numerator / denominator; firstTerm /= denominator; secondTerm /= denominator; } // Now multiply the first term by p_i, and add the two together and multiply // all by 2 * A. We negate it though, because our optimizer is a minimizer. 
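  // Equivalently, in matrix form:
  //   gradient = -2 A (p_i * firstTerm - secondTerm),
  // where firstTerm = sum_k p_ik x_ik x_ik^T over all k, and secondTerm is
  // the same sum restricted to k in the class of point i.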
gradient = -2 * coordinates * (p * firstTerm - secondTerm); } template const arma::mat SoftmaxErrorFunction::GetInitialPoint() const { return arma::eye(dataset.n_rows, dataset.n_rows); } template void SoftmaxErrorFunction::Precalculate( const arma::mat& coordinates) { // Ensure it is the right size. lastCoordinates.set_size(coordinates.n_rows, coordinates.n_cols); // Make sure the calculation is necessary. if ((accu(coordinates == lastCoordinates) == coordinates.n_elem) && precalculated) return; // No need to calculate; we already have this stuff saved. // Coordinates are different; save the new ones, and stretch the dataset. lastCoordinates = coordinates; stretchedDataset = coordinates * dataset; // For each point i, we must evaluate the softmax function: // p_ij = exp( -K(x_i, x_j) ) / ( sum_{k != i} ( exp( -K(x_i, x_k) ))) // p_i = sum_{j in class of i} p_ij // We will do this by keeping track of the denominators for each i as well as // the numerators (the sum for all j in class of i). This will be on the // order of O((n * (n + 1)) / 2), which really isn't all that great. p.zeros(stretchedDataset.n_cols); denominators.zeros(stretchedDataset.n_cols); for (size_t i = 0; i < stretchedDataset.n_cols; i++) { for (size_t j = (i + 1); j < stretchedDataset.n_cols; j++) { // Evaluate exp(-d(x_i, x_j)). double eval = exp(-metric.Evaluate(stretchedDataset.unsafe_col(i), stretchedDataset.unsafe_col(j))); // Add this to the denominators of both p_i and p_j: K(i, j) = K(j, i). denominators[i] += eval; denominators[j] += eval; // If i and j are the same class, add to numerator of both. if (labels[i] == labels[j]) { p[i] += eval; p[j] += eval; } } } // Divide p_i by their denominators. p /= denominators; // Clean up any bad values. for (size_t i = 0; i < stretchedDataset.n_cols; i++) { if (denominators[i] == 0.0) { Rcpp::Rcout << "Denominator of p_{" << i << ", j} is 0." << std::endl; // Set to usable values. denominators[i] = std::numeric_limits::infinity(); p[i] = 0; } } // We've done a precalculation. Mark it as done. precalculated = true; } template std::string SoftmaxErrorFunction::ToString() const{ std::ostringstream convert; convert << "Sofmax Error Function [" << this << "]" << std::endl; convert << " Dataset: " << dataset.n_rows << "x" << dataset.n_cols << std::endl; convert << " Labels: " << labels.n_elem << std::endl; //convert << "Metric: " << metric << std::endl; convert << " Precalculated: " << precalculated << std::endl; return convert.str(); } }; // namespace nca }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/nca/nca_impl.hpp0000644000176200001440000000461113647512751020620 0ustar liggesusers/** * @file nca_impl.hpp * @author Ryan Curtin * * Implementation of templated NCA class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NCA_NCA_IMPL_HPP #define __MLPACK_METHODS_NCA_NCA_IMPL_HPP // In case it was not already included. 
#include "nca.hpp" namespace mlpack { namespace nca { // Just set the internal matrix reference. template class OptimizerType> NCA::NCA(const arma::mat& dataset, const arma::Col& labels, MetricType metric) : dataset(dataset), labels(labels), metric(metric), errorFunction(dataset, labels, metric), optimizer(OptimizerType >(errorFunction)) { /* Nothing to do. */ } template class OptimizerType> void NCA::LearnDistance(arma::mat& outputMatrix) { // See if we were passed an initialized matrix. if ((outputMatrix.n_rows != dataset.n_rows) || (outputMatrix.n_cols != dataset.n_rows)) outputMatrix.eye(dataset.n_rows, dataset.n_rows); //Timer::Start("nca_sgd_optimization"); optimizer.Optimize(outputMatrix); //Timer::Stop("nca_sgd_optimization"); } template class OptimizerType> std::string NCA::ToString() const { std::ostringstream convert; convert << "NCA [" << this << "]" << std::endl; convert << " Dataset: " << dataset.n_rows << "x" << dataset.n_cols << std::endl; convert << " Metric: " << std::endl << mlpack::util::Indent(metric.ToString(),2); return convert.str(); } }; // namespace nca }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/nca/nca.hpp0000644000176200001440000001040313647512751017573 0ustar liggesusers/** * @file nca.hpp * @author Ryan Curtin * * Declaration of NCA class (Neighborhood Components Analysis). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NCA_NCA_HPP #define __MLPACK_METHODS_NCA_NCA_HPP #include #include #include #include "nca_softmax_error_function.hpp" namespace mlpack { namespace nca /** Neighborhood Components Analysis. */ { /** * An implementation of Neighborhood Components Analysis, both a linear * dimensionality reduction technique and a distance learning technique. The * method seeks to improve k-nearest-neighbor classification on a dataset by * scaling the dimensions. The method is nonparametric, and does not require a * value of k. It works by using stochastic ("soft") neighbor assignments and * using optimization techniques over the gradient of the accuracy of the * neighbor assignments. * * For more details, see the following published paper: * * @code * @inproceedings{Goldberger2004, * author = {Goldberger, Jacob and Roweis, Sam and Hinton, Geoff and * Salakhutdinov, Ruslan}, * booktitle = {Advances in Neural Information Processing Systems 17}, * pages = {513--520}, * publisher = {MIT Press}, * title = {{Neighbourhood Components Analysis}}, * year = {2004} * } * @endcode */ template class OptimizerType = optimization::SGD> class NCA { public: /** * Construct the Neighborhood Components Analysis object. This simply stores * the reference to the dataset and labels as well as the parameters for * optimization before the actual optimization is performed. * * @param dataset Input dataset. * @param labels Input dataset labels. * @param stepSize Step size for stochastic gradient descent. 
* @param maxIterations Maximum iterations for stochastic gradient descent. * @param tolerance Tolerance for termination of stochastic gradient descent. * @param shuffle Whether or not to shuffle the dataset during SGD. * @param metric Instantiated metric to use. */ NCA(const arma::mat& dataset, const arma::Col& labels, MetricType metric = MetricType()); /** * Perform Neighborhood Components Analysis. The output distance learning * matrix is written into the passed reference. If LearnDistance() is called * with an outputMatrix which has the correct size (dataset.n_rows x * dataset.n_rows), that matrix will be used as the starting point for * optimization. * * @param output_matrix Covariance matrix of Mahalanobis distance. */ void LearnDistance(arma::mat& outputMatrix); //! Get the dataset reference. const arma::mat& Dataset() const { return dataset; } //! Get the labels reference. const arma::Col& Labels() const { return labels; } //! Get the optimizer. const OptimizerType >& Optimizer() const { return optimizer; } OptimizerType >& Optimizer() { return optimizer; } // Returns a string representation of this object. std::string ToString() const; private: //! Dataset reference. const arma::mat& dataset; //! Labels reference. const arma::Col& labels; //! Metric to be used. MetricType metric; //! The function to optimize. SoftmaxErrorFunction errorFunction; //! The optimizer to use. OptimizerType > optimizer; }; }; // namespace nca }; // namespace mlpack // Include the implementation. #include "nca_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/nca/nca_softmax_error_function.hpp0000644000176200001440000001466113647512751024464 0ustar liggesusers/** * @file nca_softmax_error_function.hpp * @author Ryan Curtin * * Implementation of the stochastic neighbor assignment probability error * function (the "softmax error"). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NCA_NCA_SOFTMAX_ERROR_FUNCTION_HPP #define __MLPACK_METHODS_NCA_NCA_SOFTMAX_ERROR_FUNCTION_HPP #include namespace mlpack { namespace nca { /** * The "softmax" stochastic neighbor assignment probability function. * * The actual function is * * p_ij = (exp(-|| A x_i - A x_j || ^ 2)) / * (sum_{k != i} (exp(-|| A x_i - A x_k || ^ 2))) * * where x_n represents a point and A is the current scaling matrix. * * This class is more flexible than the original paper, allowing an arbitrary * metric function to be used in place of || A x_i - A x_j ||^2, meaning that * the squared Euclidean distance is not the only allowed metric for NCA. * However, that is probably the best way to use this class. * * In addition to the standard Evaluate() and Gradient() functions which MLPACK * optimizers use, overloads of Evaluate() and Gradient() are given which only * operate on one point in the dataset. This is useful for optimizers like * stochastic gradient descent (see mlpack::optimization::SGD). 
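 *
 * A brief standalone sketch (hedged: the dataset and label names are
 * illustrative, and this class is normally driven through NCA rather than
 * used directly):
 *
 * @code
 * extern arma::mat data;           // One point per column.
 * extern arma::Col<size_t> labels; // One label per column of data.
 * SoftmaxErrorFunction<metric::SquaredEuclideanDistance> f(data, labels);
 * arma::mat coordinates = f.GetInitialPoint(); // Identity matrix.
 * double objective = f.Evaluate(coordinates);  // Returns -sum_i p_i.
 * arma::mat gradient;
 * f.Gradient(coordinates, gradient);
 * @endcode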
 */
template<typename MetricType>
class SoftmaxErrorFunction
{
 public:
  /**
   * Initialize with the given metric; useful when the metric has some state to
   * store, which is set elsewhere. If no metric is given, an empty metric is
   * used; this way, you can call the constructor with no arguments. A
   * reference to the dataset we will be optimizing over is also required.
   *
   * @param dataset Matrix containing the dataset.
   * @param labels Vector of class labels for each point in the dataset.
   * @param metric Instantiated metric (optional).
   */
  SoftmaxErrorFunction(const arma::mat& dataset,
                       const arma::Col<size_t>& labels,
                       MetricType metric = MetricType());

  /**
   * Evaluate the softmax function for the given covariance matrix. This is the
   * non-separable implementation, where the objective function is not
   * decomposed into the sum of several objective functions.
   *
   * @param covariance Covariance matrix of Mahalanobis distance.
   */
  double Evaluate(const arma::mat& covariance);

  /**
   * Evaluate the softmax objective function for the given covariance matrix on
   * only one point of the dataset. This is the separable implementation, where
   * the objective function is decomposed into the sum of many objective
   * functions, and here, only one of those constituent objective functions is
   * returned.
   *
   * @param covariance Covariance matrix of Mahalanobis distance.
   * @param i Index of point to use for objective function.
   */
  double Evaluate(const arma::mat& covariance, const size_t i);

  /**
   * Evaluate the gradient of the softmax function for the given covariance
   * matrix. This is the non-separable implementation, where the objective
   * function is not decomposed into the sum of several objective functions.
   *
   * @param covariance Covariance matrix of Mahalanobis distance.
   * @param gradient Matrix to store the calculated gradient in.
   */
  void Gradient(const arma::mat& covariance, arma::mat& gradient);

  /**
   * Evaluate the gradient of the softmax function for the given covariance
   * matrix on only one point of the dataset. This is the separable
   * implementation, where the objective function is decomposed into the sum of
   * many objective functions, and here, only one of those constituent
   * objective functions is returned.
   *
   * @param covariance Covariance matrix of Mahalanobis distance.
   * @param i Index of point to use for objective function.
   * @param gradient Matrix to store the calculated gradient in.
   */
  void Gradient(const arma::mat& covariance,
                const size_t i,
                arma::mat& gradient);

  /**
   * Get the initial point.
   */
  const arma::mat GetInitialPoint() const;

  /**
   * Get the number of functions the objective function can be decomposed into.
   * This is just the number of points in the dataset.
   */
  size_t NumFunctions() const { return dataset.n_cols; }

  // Convert the object into a string.
  std::string ToString() const;

 private:
  //! The dataset.
  const arma::mat& dataset;
  //! Labels for each point in the dataset.
  const arma::Col<size_t>& labels;

  //! The instantiated metric.
  MetricType metric;

  //! Last coordinates. Used for the non-separable Evaluate() and Gradient().
  arma::mat lastCoordinates;
  //! Stretched dataset. Kept internal to avoid memory reallocations.
  arma::mat stretchedDataset;
  //! Holds calculated p_i, for the non-separable Evaluate() and Gradient().
  arma::vec p;
  //! Holds denominators for calculation of p_ij, for the non-separable
  //! Evaluate() and Gradient().
  arma::vec denominators;

  //! False if nothing has ever been precalculated (only at construction time).
bool precalculated; /** * Precalculate the denominators and numerators that will make up the p_ij, * but only if the coordinates matrix is different than the last coordinates * the Precalculate() method was run with. This method is only called by the * non-separable Evaluate() and Gradient(). * * This will update last_coordinates_ and stretched_dataset_, and also * calculate the p_i and denominators_ which are used in the calculation of * p_i or p_ij. The calculation will be O((n * (n + 1)) / 2), which is not * great. * * @param coordinates Coordinates matrix to use for precalculation. */ void Precalculate(const arma::mat& coordinates); }; }; // namespace nca }; // namespace mlpack // Include implementation. #include "nca_softmax_error_function_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/hmm/0000755000176200001440000000000013647512751016343 5ustar liggesusersRcppMLPACK/src/mlpack/methods/hmm/hmm.hpp0000644000176200001440000003531713647512751017646 0ustar liggesusers/** * @file hmm.hpp * @author Ryan Curtin * @author Tran Quoc Long * * Definition of HMM class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_HMM_HMM_HPP #define __MLPACK_METHODS_HMM_HMM_HPP #include namespace mlpack { namespace hmm /** Hidden Markov Models. */ { /** * A class that represents a Hidden Markov Model with an arbitrary type of * emission distribution. This HMM class supports training (supervised and * unsupervised), prediction of state sequences via the Viterbi algorithm, * estimation of state probabilities, generation of random sequences, and * calculation of the log-likelihood of a given sequence. * * The template parameter, Distribution, specifies the distribution which the * emissions follow. The class should implement the following functions: * * @code * class Distribution * { * public: * // The type of observation used by this distribution. * typedef something DataType; * * // Return the probability of the given observation. * double Probability(const DataType& observation) const; * * // Estimate the distribution based on the given observations. * void Estimate(const std::vector& observations); * * // Estimate the distribution based on the given observations, given also * // the probability of each observation coming from this distribution. * void Estimate(const std::vector& observations, * const std::vector& probabilities); * }; * @endcode * * See the mlpack::distribution::DiscreteDistribution class for an example. One * would use the DiscreteDistribution class when the observations are * non-negative integers. Other distributions could be Gaussians, a mixture of * Gaussians (GMM), or any other probability distribution implementing the * four Distribution functions. * * Usage of the HMM class generally involves either training an HMM or loading * an already-known HMM and taking probability measurements of sequences. 
* Example code for supervised training of a Gaussian HMM (that is, where the * emission output distribution is a single Gaussian for each hidden state) is * given below. * * @code * extern arma::mat observations; // Each column is an observation. * extern arma::Col states; // Hidden states for each observation. * // Create an untrained HMM with 5 hidden states and default (N(0, 1)) * // Gaussian distributions with the dimensionality of the dataset. * HMM hmm(5, GaussianDistribution(observations.n_rows)); * * // Train the HMM (the labels could be omitted to perform unsupervised * // training). * hmm.Train(observations, states); * @endcode * * Once initialized, the HMM can evaluate the probability of a certain sequence * (with LogLikelihood()), predict the most likely sequence of hidden states * (with Predict()), generate a sequence (with Generate()), or estimate the * probabilities of each state for a sequence of observations (with Estimate()). * * @tparam Distribution Type of emission distribution for this HMM. */ template class HMM { public: /** * Create the Hidden Markov Model with the given number of hidden states and * the given default distribution for emissions. The dimensionality of the * observations is taken from the emissions variable, so it is important that * the given default emission distribution is set with the correct * dimensionality. Alternately, set the dimensionality with Dimensionality(). * Optionally, the tolerance for convergence of the Baum-Welch algorithm can * be set. * * By default, the transition matrix and initial probability vector are set to * contain equal probability for each state. * * @param states Number of states. * @param emissions Default distribution for emissions. * @param tolerance Tolerance for convergence of training algorithm * (Baum-Welch). */ HMM(const size_t states, const Distribution emissions, const double tolerance = 1e-5); /** * Create the Hidden Markov Model with the given initial probability vector, * the given transition matrix, and the given emission distributions. The * dimensionality of the observations of the HMM are taken from the given * emission distributions. Alternately, the dimensionality can be set with * Dimensionality(). * * The initial state probability vector should have length equal to the number * of states, and each entry represents the probability of being in the given * state at time T = 0 (the beginning of a sequence). * * The transition matrix should be such that T(i, j) is the probability of * transition to state i from state j. The columns of the matrix should sum * to 1. * * The emission matrix should be such that E(i, j) is the probability of * emission i while in state j. The columns of the matrix should sum to 1. * * Optionally, the tolerance for convergence of the Baum-Welch algorithm can * be set. * * @param initial Initial state probabilities. * @param transition Transition matrix. * @param emission Emission distributions. * @param tolerance Tolerance for convergence of training algorithm * (Baum-Welch). */ HMM(const arma::vec& initial, const arma::mat& transition, const std::vector& emission, const double tolerance = 1e-5); /** * Train the model using the Baum-Welch algorithm, with only the given * unlabeled observations. Instead of giving a guess transition and emission * matrix here, do that in the constructor. Each matrix in the vector of data * sequences holds an individual data sequence; each point in each individual * data sequence should be a column in the matrix. 
The number of rows in each * matrix should be equal to the dimensionality of the HMM (which is set in * the constructor). * * It is preferable to use the other overload of Train(), with labeled data. * That will produce much better results. However, if labeled data is * unavailable, this will work. In addition, it is possible to use Train() * with labeled data first, and then continue to train the model using this * overload of Train() with unlabeled data. * * The tolerance of the Baum-Welch algorithm can be set either in the * constructor or with the Tolerance() method. When the change in * log-likelihood of the model between iterations is less than the tolerance, * the Baum-Welch algorithm terminates. * * @note * Train() can be called multiple times with different sequences; each time it * is called, it uses the current parameters of the HMM as a starting point * for training. * @endnote * * @param dataSeq Vector of observation sequences. */ void Train(const std::vector& dataSeq); /** * Train the model using the given labeled observations; the transition and * emission matrices are directly estimated. Each matrix in the vector of * data sequences corresponds to a vector in the vector of state sequences. * Each point in each individual data sequence should be a column in the * matrix, and its state should be the corresponding element in the state * sequence vector. For instance, dataSeq[0].col(3) corresponds to the fourth * observation in the first data sequence, and its state is stateSeq[0][3]. * The number of rows in each matrix should be equal to the dimensionality of * the HMM (which is set in the constructor). * * @note * Train() can be called multiple times with different sequences; each time it * is called, it uses the current parameters of the HMM as a starting point * for training. * @endnote * * @param dataSeq Vector of observation sequences. * @param stateSeq Vector of state sequences, corresponding to each * observation. */ void Train(const std::vector& dataSeq, const std::vector >& stateSeq); /** * Estimate the probabilities of each hidden state at each time step for each * given data observation, using the Forward-Backward algorithm. Each matrix * which is returned has columns equal to the number of data observations, and * rows equal to the number of hidden states in the model. The log-likelihood * of the most probable sequence is returned. * * @param dataSeq Sequence of observations. * @param stateProb Matrix in which the probabilities of each state at each * time interval will be stored. * @param forwardProb Matrix in which the forward probabilities of each state * at each time interval will be stored. * @param backwardProb Matrix in which the backward probabilities of each * state at each time interval will be stored. * @param scales Vector in which the scaling factors at each time interval * will be stored. * @return Log-likelihood of most likely state sequence. */ double Estimate(const arma::mat& dataSeq, arma::mat& stateProb, arma::mat& forwardProb, arma::mat& backwardProb, arma::vec& scales) const; /** * Estimate the probabilities of each hidden state at each time step of each * given data observation, using the Forward-Backward algorithm. The returned * matrix of state probabilities has columns equal to the number of data * observations, and rows equal to the number of hidden states in the model. * The log-likelihood of the most probable sequence is returned. * * @param dataSeq Sequence of observations. 
* @param stateProb Probabilities of each state at each time interval. * @return Log-likelihood of most likely state sequence. */ double Estimate(const arma::mat& dataSeq, arma::mat& stateProb) const; /** * Generate a random data sequence of the given length. The data sequence is * stored in the dataSequence parameter, and the state sequence is stored in * the stateSequence parameter. Each column of dataSequence represents a * random observation. * * @param length Length of random sequence to generate. * @param dataSequence Vector to store data in. * @param stateSequence Vector to store states in. * @param startState Hidden state to start sequence in (default 0). */ void Generate(const size_t length, arma::mat& dataSequence, arma::Col& stateSequence, const size_t startState = 0) const; /** * Compute the most probable hidden state sequence for the given data * sequence, using the Viterbi algorithm, returning the log-likelihood of the * most likely state sequence. * * @param dataSeq Sequence of observations. * @param stateSeq Vector in which the most probable state sequence will be * stored. * @return Log-likelihood of most probable state sequence. */ double Predict(const arma::mat& dataSeq, arma::Col& stateSeq) const; /** * Compute the log-likelihood of the given data sequence. * * @param dataSeq Data sequence to evaluate the likelihood of. * @return Log-likelihood of the given sequence. */ double LogLikelihood(const arma::mat& dataSeq) const; //! Return the vector of initial state probabilities. const arma::vec& Initial() const { return initial; } //! Modify the vector of initial state probabilities. arma::vec& Initial() { return initial; } //! Return the transition matrix. const arma::mat& Transition() const { return transition; } //! Return a modifiable transition matrix reference. arma::mat& Transition() { return transition; } //! Return the emission distributions. const std::vector& Emission() const { return emission; } //! Return a modifiable emission probability matrix reference. std::vector& Emission() { return emission; } //! Get the dimensionality of observations. size_t Dimensionality() const { return dimensionality; } //! Set the dimensionality of observations. size_t& Dimensionality() { return dimensionality; } //! Get the tolerance of the Baum-Welch algorithm. double Tolerance() const { return tolerance; } //! Modify the tolerance of the Baum-Welch algorithm. double& Tolerance() { return tolerance; } /** * Returns a string representation of this object. */ std::string ToString() const; private: // Helper functions. /** * The Forward algorithm (part of the Forward-Backward algorithm). Computes * forward probabilities for each state for each observation in the given data * sequence. The returned matrix has rows equal to the number of hidden * states and columns equal to the number of observations. * * @param dataSeq Data sequence to compute probabilities for. * @param scales Vector in which scaling factors will be saved. * @param forwardProb Matrix in which forward probabilities will be saved. */ void Forward(const arma::mat& dataSeq, arma::vec& scales, arma::mat& forwardProb) const; /** * The Backward algorithm (part of the Forward-Backward algorithm). Computes * backward probabilities for each state for each observation in the given * data sequence, using the scaling factors found (presumably) by Forward(). * The returned matrix has rows equal to the number of hidden states and * columns equal to the number of observations. 
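 *
 * For reference, the scaled recurrences implemented by Forward() and
 * Backward() are as follows, with T(i, j) the probability of moving to state
 * i from state j, E_j(o_t) the emission probability, and c_t the scaling
 * factor computed by Forward():
 *
 *   f(j, 0) = pi_j E_j(o_0), then f(., 0) /= c_0
 *   f(j, t) = E_j(o_t) sum_i T(j, i) f(i, t - 1), then f(., t) /= c_t
 *   b(j, T - 1) = 1
 *   b(j, t) = (1 / c_{t + 1}) sum_i T(i, j) E_i(o_{t + 1}) b(i, t + 1)
 *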
* * @param dataSeq Data sequence to compute probabilities for. * @param scales Vector of scaling factors. * @param backwardProb Matrix in which backward probabilities will be saved. */ void Backward(const arma::mat& dataSeq, const arma::vec& scales, arma::mat& backwardProb) const; //! Initial state probability vector. arma::vec initial; //! Transition probability matrix. arma::mat transition; //! Set of emission probability distributions; one for each state. std::vector emission; //! Dimensionality of observations. size_t dimensionality; //! Tolerance of Baum-Welch algorithm. double tolerance; }; }; // namespace hmm }; // namespace mlpack // Include implementation. #include "hmm_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/hmm/hmm_util_impl.hpp0000644000176200001440000001560113647512751021716 0ustar liggesusers/** * @file hmm_util_impl.hpp * @author Ryan Curtin * * Implementation of HMM load/save functions. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_HMM_HMM_UTIL_IMPL_HPP #define __MLPACK_METHODS_HMM_HMM_UTIL_IMPL_HPP // In case it hasn't already been included. #include "hmm_util.hpp" #include namespace mlpack { namespace hmm { template void SaveHMM(const HMM& hmm, util::SaveRestoreUtility& sr) { Rcpp::Rcout << "HMM save not implemented for arbitrary distributions." << std::endl; } template<> void SaveHMM(const HMM& hmm, util::SaveRestoreUtility& sr) { std::string type = "discrete"; size_t states = hmm.Transition().n_rows; sr.SaveParameter(type, "hmm_type"); sr.SaveParameter(states, "hmm_states"); sr.SaveParameter(hmm.Transition(), "hmm_transition"); // Now the emissions. for (size_t i = 0; i < states; ++i) { // Generate name. std::stringstream s; s << "hmm_emission_distribution_" << i; sr.SaveParameter(hmm.Emission()[i].Probabilities(), s.str()); } } template<> void SaveHMM(const HMM& hmm, util::SaveRestoreUtility& sr) { std::string type = "gaussian"; size_t states = hmm.Transition().n_rows; sr.SaveParameter(type, "hmm_type"); sr.SaveParameter(states, "hmm_states"); sr.SaveParameter(hmm.Transition(), "hmm_transition"); // Now the emissions. for (size_t i = 0; i < states; ++i) { // Generate name. std::stringstream s; s << "hmm_emission_mean_" << i; sr.SaveParameter(hmm.Emission()[i].Mean(), s.str()); s.str(""); s << "hmm_emission_covariance_" << i; sr.SaveParameter(hmm.Emission()[i].Covariance(), s.str()); } } template<> void SaveHMM(const HMM >& hmm, util::SaveRestoreUtility& sr) { std::string type = "gmm"; size_t states = hmm.Transition().n_rows; sr.SaveParameter(type, "hmm_type"); sr.SaveParameter(states, "hmm_states"); sr.SaveParameter(hmm.Transition(), "hmm_transition"); // Now the emissions. for (size_t i = 0; i < states; ++i) { // Generate name. 
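    // The keys written here ("hmm_emission_<i>_gaussians", "..._weights", and
    // "..._gaussian_<g>_mean" / "..._covariance") must match the keys that
    // the corresponding LoadHMM() specialization below reads back.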
std::stringstream s; s << "hmm_emission_" << i << "_gaussians"; sr.SaveParameter(hmm.Emission()[i].Gaussians(), s.str()); s.str(""); s << "hmm_emission_" << i << "_weights"; sr.SaveParameter(hmm.Emission()[i].Weights(), s.str()); for (size_t g = 0; g < hmm.Emission()[i].Gaussians(); ++g) { s.str(""); s << "hmm_emission_" << i << "_gaussian_" << g << "_mean"; sr.SaveParameter(hmm.Emission()[i].Means()[g], s.str()); s.str(""); s << "hmm_emission_" << i << "_gaussian_" << g << "_covariance"; sr.SaveParameter(hmm.Emission()[i].Covariances()[g], s.str()); } } } template void LoadHMM(HMM& hmm, util::SaveRestoreUtility& sr) { Rcpp::Rcout << "HMM load not implemented for arbitrary distributions." << std::endl; } template<> void LoadHMM(HMM& hmm, util::SaveRestoreUtility& sr) { std::string type; size_t states; sr.LoadParameter(type, "hmm_type"); if (type != "discrete") { Rcpp::Rcout << "Cannot load non-discrete HMM (of type " << type << ") as " << "discrete HMM!" << std::endl; } sr.LoadParameter(states, "hmm_states"); // Load transition matrix. sr.LoadParameter(hmm.Transition(), "hmm_transition"); // Now each emission distribution. hmm.Emission().resize(states); for (size_t i = 0; i < states; ++i) { std::stringstream s; s << "hmm_emission_distribution_" << i; sr.LoadParameter(hmm.Emission()[i].Probabilities(), s.str()); } hmm.Dimensionality() = 1; } template<> void LoadHMM(HMM& hmm, util::SaveRestoreUtility& sr) { std::string type; size_t states; sr.LoadParameter(type, "hmm_type"); if (type != "gaussian") { Rcpp::Rcout << "Cannot load non-Gaussian HMM (of type " << type << ") as " << "a Gaussian HMM!" << std::endl; } sr.LoadParameter(states, "hmm_states"); // Load transition matrix. sr.LoadParameter(hmm.Transition(), "hmm_transition"); // Now each emission distribution. hmm.Emission().resize(states); for (size_t i = 0; i < states; ++i) { std::stringstream s; s << "hmm_emission_mean_" << i; sr.LoadParameter(hmm.Emission()[i].Mean(), s.str()); s.str(""); s << "hmm_emission_covariance_" << i; sr.LoadParameter(hmm.Emission()[i].Covariance(), s.str()); } hmm.Dimensionality() = hmm.Emission()[0].Mean().n_elem; } template<> void LoadHMM(HMM >& hmm, util::SaveRestoreUtility& sr) { std::string type; size_t states; sr.LoadParameter(type, "hmm_type"); if (type != "gmm") { Rcpp::Rcout << "Cannot load non-GMM HMM (of type " << type << ") as " << "a Gaussian Mixture Model HMM!" << std::endl; } sr.LoadParameter(states, "hmm_states"); // Load transition matrix. sr.LoadParameter(hmm.Transition(), "hmm_transition"); // Now each emission distribution. hmm.Emission().resize(states, gmm::GMM<>(1, 1)); for (size_t i = 0; i < states; ++i) { std::stringstream s; s << "hmm_emission_" << i << "_gaussians"; size_t gaussians; sr.LoadParameter(gaussians, s.str()); s.str(""); // Extract dimensionality. arma::vec meanzero; s << "hmm_emission_" << i << "_gaussian_0_mean"; sr.LoadParameter(meanzero, s.str()); size_t dimensionality = meanzero.n_elem; // Initialize GMM correctly. 
hmm.Emission()[i].Gaussians() = gaussians; hmm.Emission()[i].Dimensionality() = dimensionality; for (size_t g = 0; g < gaussians; ++g) { s.str(""); s << "hmm_emission_" << i << "_gaussian_" << g << "_mean"; sr.LoadParameter(hmm.Emission()[i].Means()[g], s.str()); s.str(""); s << "hmm_emission_" << i << "_gaussian_" << g << "_covariance"; sr.LoadParameter(hmm.Emission()[i].Covariances()[g], s.str()); } s.str(""); s << "hmm_emission_" << i << "_weights"; sr.LoadParameter(hmm.Emission()[i].Weights(), s.str()); } hmm.Dimensionality() = hmm.Emission()[0].Dimensionality(); } }; // namespace hmm }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/hmm/hmm_util.hpp0000644000176200001440000000327313647512751020677 0ustar liggesusers/** * @file hmm_util.hpp * @author Ryan Curtin * * Save/load utilities for HMMs. This should be eventually merged into the HMM * class itself. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_HMM_HMM_UTIL_HPP #define __MLPACK_METHODS_HMM_HMM_UTIL_HPP #include "hmm.hpp" namespace mlpack { namespace hmm { /** * Save an HMM to file. This only works for GMMs, DiscreteDistributions, and * GaussianDistributions. * * @tparam Distribution Distribution type of HMM. * @param sr SaveRestoreUtility to use. */ template void SaveHMM(const HMM& hmm, util::SaveRestoreUtility& sr); /** * Load an HMM from file. This only works for GMMs, DiscreteDistributions, and * GaussianDistributions. * * @tparam Distribution Distribution type of HMM. * @param sr SaveRestoreUtility to use. */ template void LoadHMM(HMM& hmm, util::SaveRestoreUtility& sr); }; // namespace hmm }; // namespace mlpack // Include implementation. #include "hmm_util_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/hmm/hmm_impl.hpp0000644000176200001440000004601113647512751020660 0ustar liggesusers/** * @file hmm_impl.hpp * @author Ryan Curtin * @author Tran Quoc Long * * Implementation of HMM class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_HMM_HMM_IMPL_HPP #define __MLPACK_METHODS_HMM_HMM_IMPL_HPP // Just in case... #include "hmm.hpp" namespace mlpack { namespace hmm { /** * Create the Hidden Markov Model with the given number of hidden states and the * given number of emission states. 
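 *
 * By default every state is given equal initial and transition probability.
 * For example (an illustrative sketch, not from the original sources), a
 * two-state HMM over a discrete alphabet of four symbols could be created
 * with:
 *
 * @code
 * HMM<distribution::DiscreteDistribution> hmm(2,
 *     distribution::DiscreteDistribution(4));
 * @endcode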
*/ template HMM::HMM(const size_t states, const Distribution emissions, const double tolerance) : initial(arma::ones(states) / (double) states), transition(arma::ones(states, states) / (double) states), emission(states, /* default distribution */ emissions), dimensionality(emissions.Dimensionality()), tolerance(tolerance) { /* nothing to do */ } /** * Create the Hidden Markov Model with the given transition matrix and the given * emission probability matrix. */ template HMM::HMM(const arma::vec& initial, const arma::mat& transition, const std::vector& emission, const double tolerance) : initial(initial), transition(transition), emission(emission), tolerance(tolerance) { // Set the dimensionality, if we can. if (emission.size() > 0) dimensionality = emission[0].Dimensionality(); else { Rcpp::Rcout << "HMM::HMM(): no emission distributions given; assuming a " << "dimensionality of 0 and hoping it gets set right later." << std::endl; dimensionality = 0; } } /** * Train the model using the Baum-Welch algorithm, with only the given unlabeled * observations. Each matrix in the vector of data sequences holds an * individual data sequence; each point in each individual data sequence should * be a column in the matrix. The number of rows in each matrix should be equal * to the dimensionality of the HMM. * * It is preferable to use the other overload of Train(), with labeled data. * That will produce much better results. However, if labeled data is * unavailable, this will work. In addition, it is possible to use Train() with * labeled data first, and then continue to train the model using this overload * of Train() with unlabeled data. * * @param dataSeq Set of data sequences to train on. */ template void HMM::Train(const std::vector& dataSeq) { // We should allow a guess at the transition and emission matrices. double loglik = 0; double oldLoglik = 0; // Maximum iterations? size_t iterations = 1000; // Find length of all sequences and ensure they are the correct size. size_t totalLength = 0; for (size_t seq = 0; seq < dataSeq.size(); seq++) { totalLength += dataSeq[seq].n_cols; if (dataSeq[seq].n_rows != dimensionality) Rcpp::Rcout << "HMM::Train(): data sequence " << seq << " has " << "dimensionality " << dataSeq[seq].n_rows << " (expected " << dimensionality << " dimensions)." << std::endl; } // These are used later for training of each distribution. We initialize it // all now so we don't have to do any allocation later on. std::vector emissionProb(transition.n_cols, arma::vec(totalLength)); arma::mat emissionList(dimensionality, totalLength); // This should be the Baum-Welch algorithm (EM for HMM estimation). This // follows the procedure outlined in Elliot, Aggoun, and Moore's book "Hidden // Markov Models: Estimation and Control", pp. 36-40. for (size_t iter = 0; iter < iterations; iter++) { // Clear new transition matrix and emission probabilities. arma::vec newInitial(transition.n_rows); newInitial.zeros(); arma::mat newTransition(transition.n_rows, transition.n_cols); newTransition.zeros(); // Reset log likelihood. loglik = 0; // Sum over time. size_t sumTime = 0; // Loop over each sequence. for (size_t seq = 0; seq < dataSeq.size(); seq++) { arma::mat stateProb; arma::mat forward; arma::mat backward; arma::vec scales; // Add the log-likelihood of this sequence. This is the E-step. loglik += Estimate(dataSeq[seq], stateProb, forward, backward, scales); // Now re-estimate the parameters. This is the M-step. 
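      // Because Forward() and Backward() return scaled probabilities, the
      // 1 / P(seq[d]) factors in the update equations below are already
      // accounted for by the scales[] values and do not appear explicitly in
      // the code.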
      // pi_i = sum_d ((1 / P(seq[d])) f(i, 0) b(i, 0))
      // T_ij = sum_d ((1 / P(seq[d])) sum_t (f(j, t) T_ij E_i(seq[d][t + 1])
      //     b(i, t + 1)))
      // E_ij = sum_d ((1 / P(seq[d])) sum_{t | seq[d][t] = j} f(i, t) b(i, t))
      // We store the new estimates in a different matrix.
      for (size_t t = 0; t < dataSeq[seq].n_cols; t++)
      {
        for (size_t j = 0; j < transition.n_cols; j++)
        {
          // Add to estimate of initial probability for state j (only the
          // first time step contributes).
          if (t == 0)
            newInitial[j] += stateProb(j, 0);

          if (t < dataSeq[seq].n_cols - 1)
          {
            // Estimate of T_ij (probability of transition from state j to
            // state i). We postpone multiplication of the old T_ij until
            // later.
            for (size_t i = 0; i < transition.n_rows; i++)
              newTransition(i, j) += forward(j, t) * backward(i, t + 1) *
                  emission[i].Probability(dataSeq[seq].unsafe_col(t + 1)) /
                  scales[t + 1];
          }

          // Add to list of emission observations, for
          // Distribution::Estimate().
          emissionList.col(sumTime) = dataSeq[seq].col(t);
          emissionProb[j][sumTime] = stateProb(j, t);
        }
        sumTime++;
      }
    }

    // Normalize the new initial probabilities.
    if (dataSeq.size() > 0)
      initial = newInitial / dataSeq.size();

    // Assign the new transition matrix. We use %= (element-wise
    // multiplication) because every element of the new transition matrix must
    // still be multiplied by the old elements (this is the multiplication we
    // earlier postponed).
    transition %= newTransition;

    // Now we normalize the transition matrix.
    for (size_t i = 0; i < transition.n_cols; i++)
      transition.col(i) /= accu(transition.col(i));

    // Now estimate emission probabilities.
    for (size_t state = 0; state < transition.n_cols; state++)
      emission[state].Estimate(emissionList, emissionProb[state]);

    Rcpp::Rcout << "Iteration " << iter << ": log-likelihood " << loglik
        << std::endl;

    if (std::abs(oldLoglik - loglik) < tolerance)
    {
      Rcpp::Rcout << "Converged after " << iter << " iterations." << std::endl;
      break;
    }

    oldLoglik = loglik;
  }
}

/**
 * Train the model using the given labeled observations; the transition and
 * emission matrices are directly estimated.
 */
template<typename Distribution>
void HMM<Distribution>::Train(const std::vector<arma::mat>& dataSeq,
                              const std::vector<arma::Col<size_t> >& stateSeq)
{
  // Simple error checking.
  if (dataSeq.size() != stateSeq.size())
  {
    Rcpp::Rcout << "HMM::Train(): number of data sequences ("
        << dataSeq.size() << ") not equal to number of state sequences ("
        << stateSeq.size() << ")." << std::endl;
  }

  initial.zeros();
  transition.zeros();

  // Estimate the transition and emission matrices directly from the
  // observations. The emission list holds the time indices for observations
  // from each state.
  std::vector<std::vector<std::pair<size_t, size_t> > >
      emissionList(transition.n_cols);
  for (size_t seq = 0; seq < dataSeq.size(); seq++)
  {
    // Simple error checking.
    if (dataSeq[seq].n_cols != stateSeq[seq].n_elem)
    {
      Rcpp::Rcout << "HMM::Train(): number of observations ("
          << dataSeq[seq].n_cols << ") in sequence " << seq
          << " not equal to number of states (" << stateSeq[seq].n_elem
          << ") in sequence " << seq << "." << std::endl;
    }

    if (dataSeq[seq].n_rows != dimensionality)
    {
      Rcpp::Rcout << "HMM::Train(): data sequence " << seq << " has "
          << "dimensionality " << dataSeq[seq].n_rows << " (expected "
          << dimensionality << " dimensions)." << std::endl;
    }

    // Loop over each observation in the sequence. For estimation of the
    // transition matrix, we must ignore the last observation.
    initial[stateSeq[seq][0]]++;
    for (size_t t = 0; t < dataSeq[seq].n_cols - 1; t++)
    {
      transition(stateSeq[seq][t + 1], stateSeq[seq][t])++;
      emissionList[stateSeq[seq][t]].push_back(std::make_pair(seq, t));
    }

    // Last observation.
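    // The final time step contributes an emission observation but no outgoing
    // transition, which is why it is handled separately from the loop above.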
emissionList[stateSeq[seq][stateSeq[seq].n_elem - 1]].push_back( std::make_pair(seq, stateSeq[seq].n_elem - 1)); } // Normalize initial weights. initial /= accu(initial); // Normalize transition matrix. for (size_t col = 0; col < transition.n_cols; col++) { // If the transition probability sum is greater than 0 in this column, the // emission probability sum will also be greater than 0. We want to avoid // division by 0. double sum = accu(transition.col(col)); if (sum > 0) transition.col(col) /= sum; } // Estimate emission matrix. for (size_t state = 0; state < transition.n_cols; state++) { // Generate full sequence of observations for this state from the list of // emissions that are from this state. arma::mat emissions(dimensionality, emissionList[state].size()); for (size_t i = 0; i < emissions.n_cols; i++) { emissions.col(i) = dataSeq[emissionList[state][i].first].col( emissionList[state][i].second); } emission[state].Estimate(emissions); } } /** * Estimate the probabilities of each hidden state at each time step for each * given data observation. */ template double HMM::Estimate(const arma::mat& dataSeq, arma::mat& stateProb, arma::mat& forwardProb, arma::mat& backwardProb, arma::vec& scales) const { // First run the forward-backward algorithm. Forward(dataSeq, scales, forwardProb); Backward(dataSeq, scales, backwardProb); // Now assemble the state probability matrix based on the forward and backward // probabilities. stateProb = forwardProb % backwardProb; // Finally assemble the log-likelihood and return it. return accu(log(scales)); } /** * Estimate the probabilities of each hidden state at each time step for each * given data observation. */ template double HMM::Estimate(const arma::mat& dataSeq, arma::mat& stateProb) const { // We don't need to save these. arma::mat forwardProb, backwardProb; arma::vec scales; return Estimate(dataSeq, stateProb, forwardProb, backwardProb, scales); } /** * Generate a random data sequence of a given length. The data sequence is * stored in the dataSequence parameter, and the state sequence is stored in * the stateSequence parameter. */ template void HMM::Generate(const size_t length, arma::mat& dataSequence, arma::Col& stateSequence, const size_t startState) const { // Set vectors to the right size. stateSequence.set_size(length); dataSequence.set_size(dimensionality, length); // Set start state (default is 0). stateSequence[0] = startState; // Choose first emission state. double randValue = math::Random(); // We just have to find where our random value sits in the probability // distribution of emissions for our starting state. dataSequence.col(0) = emission[startState].Random(); // Now choose the states and emissions for the rest of the sequence. for (size_t t = 1; t < length; t++) { // First choose the hidden state. randValue = math::Random(); // Now find where our random value sits in the probability distribution of // state changes. double probSum = 0; for (size_t st = 0; st < transition.n_rows; st++) { probSum += transition(st, stateSequence[t - 1]); if (randValue <= probSum) { stateSequence[t] = st; break; } } // Now choose the emission. dataSequence.col(t) = emission[stateSequence[t]].Random(); } } /** * Compute the most probable hidden state sequence for the given observation * using the Viterbi algorithm. Returns the log-likelihood of the most likely * sequence. 
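 *
 * A minimal usage sketch (assuming a trained hmm and an observation matrix
 * obs, one observation per column):
 *
 * @code
 * arma::Col<size_t> stateSeq;
 * const double logLik = hmm.Predict(obs, stateSeq);
 * @endcode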
*/ template double HMM::Predict(const arma::mat& dataSeq, arma::Col& stateSeq) const { // This is an implementation of the Viterbi algorithm for finding the most // probable sequence of states to produce the observed data sequence. We // don't use log-likelihoods to save that little bit of time, but we'll // calculate the log-likelihood at the end of it all. stateSeq.set_size(dataSeq.n_cols); arma::mat logStateProb(transition.n_rows, dataSeq.n_cols); arma::mat stateSeqBack(transition.n_rows, dataSeq.n_cols); // Store the logs of the transposed transition matrix. This is because we // will be using the rows of the transition matrix. arma::mat logTrans(log(trans(transition))); // The calculation of the first state is slightly different; the probability // of the first state being state j is the maximum probability that the state // came to be j from another state. logStateProb.col(0).zeros(); for (size_t state = 0; state < transition.n_rows; state++) { logStateProb(state, 0) = log(initial[state] * emission[state].Probability(dataSeq.unsafe_col(0))); stateSeqBack(state, 0) = state; } // Store the best first state. arma::uword index; for (size_t t = 1; t < dataSeq.n_cols; t++) { // Assemble the state probability for this element. // Given that we are in state j, we use state with the highest probability // of being the previous state. for (size_t j = 0; j < transition.n_rows; j++) { arma::vec prob = logStateProb.col(t - 1) + logTrans.col(j); logStateProb(j, t) = prob.max(index) + log(emission[j].Probability(dataSeq.unsafe_col(t))); stateSeqBack(j, t) = index; } } // Backtrack to find the most probable state sequence. logStateProb.unsafe_col(dataSeq.n_cols - 1).max(index); stateSeq[dataSeq.n_cols - 1] = index; for (size_t t = 2; t <= dataSeq.n_cols; t++) stateSeq[dataSeq.n_cols - t] = stateSeqBack(stateSeq[dataSeq.n_cols - t + 1], dataSeq.n_cols - t + 1); return logStateProb(stateSeq(dataSeq.n_cols - 1), dataSeq.n_cols - 1); } /** * Compute the log-likelihood of the given data sequence. */ template double HMM::LogLikelihood(const arma::mat& dataSeq) const { arma::mat forward; arma::vec scales; Forward(dataSeq, scales, forward); // The log-likelihood is the log of the scales for each time step. return accu(log(scales)); } /** * The Forward procedure (part of the Forward-Backward algorithm). */ template void HMM::Forward(const arma::mat& dataSeq, arma::vec& scales, arma::mat& forwardProb) const { // Our goal is to calculate the forward probabilities: // P(X_k | o_{1:k}) for all possible states X_k, for each time point k. forwardProb.zeros(transition.n_rows, dataSeq.n_cols); scales.zeros(dataSeq.n_cols); // The first entry in the forward algorithm uses the initial state // probabilities. Note that MATLAB assumes that the starting state (at // t = -1) is state 0; this is not our assumption here. To force that // behavior, you could append a single starting state to every single data // sequence and that should produce results in line with MATLAB. for (size_t state = 0; state < transition.n_rows; state++) forwardProb(state, 0) = initial(state) * emission[state].Probability(dataSeq.unsafe_col(0)); // Then normalize the column. scales[0] = accu(forwardProb.col(0)); forwardProb.col(0) /= scales[0]; // Now compute the probabilities for each successive observation. 
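  // Each column is filled in as f(j, t) = E_j(o_t) * sum_i T(j, i) f(i, t - 1)
  // and then normalized to sum to one; the normalization constants are stored
  // in scales, whose logs later yield the sequence log-likelihood.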
  for (size_t t = 1; t < dataSeq.n_cols; t++)
  {
    for (size_t j = 0; j < transition.n_rows; j++)
    {
      // The forward probability of state j at time t is the sum over all
      // states of the probability of the previous state transitioning to the
      // current state and emitting the given observation.
      forwardProb(j, t) = accu(forwardProb.col(t - 1) %
          trans(transition.row(j))) *
          emission[j].Probability(dataSeq.unsafe_col(t));
    }

    // Normalize probability.
    scales[t] = accu(forwardProb.col(t));
    forwardProb.col(t) /= scales[t];
  }
}

template<typename Distribution>
void HMM<Distribution>::Backward(const arma::mat& dataSeq,
                                 const arma::vec& scales,
                                 arma::mat& backwardProb) const
{
  // Our goal is to calculate the backward probabilities:
  // P(X_k | o_{k + 1:T}) for all possible states X_k, for each time point k.
  backwardProb.zeros(transition.n_rows, dataSeq.n_cols);

  // The last element probability is 1.
  backwardProb.col(dataSeq.n_cols - 1).fill(1);

  // Now step backwards through all other observations.
  for (size_t t = dataSeq.n_cols - 2; t + 1 > 0; t--)
  {
    for (size_t j = 0; j < transition.n_rows; j++)
    {
      // The backward probability of state j at time t is the sum over all
      // states of the probability of the next state having been a transition
      // from the current state multiplied by the probability of each of those
      // states emitting the given observation.
      for (size_t state = 0; state < transition.n_rows; state++)
        backwardProb(j, t) += transition(state, j) * backwardProb(state, t + 1)
            * emission[state].Probability(dataSeq.unsafe_col(t + 1));

      // Normalize by the weights from the forward algorithm.
      backwardProb(j, t) /= scales[t + 1];
    }
  }
}

template<typename Distribution>
std::string HMM<Distribution>::ToString() const
{
  std::ostringstream convert;
  convert << "HMM [" << this << "]" << std::endl;
  convert << "  Dimensionality: " << dimensionality << std::endl;
  return convert.str();
}

}; // namespace hmm
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/methods/sparse_autoencoder/sparse_autoencoder.hpp/**
 * @file sparse_autoencoder.hpp
 * @author Siddharth Agrawal
 *
 * An implementation of sparse autoencoders.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option)
 * any later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
 * FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for
 * more details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along
 * with MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_SPARSE_AUTOENCODER_SPARSE_AUTOENCODER_HPP
#define __MLPACK_METHODS_SPARSE_AUTOENCODER_SPARSE_AUTOENCODER_HPP

#include <mlpack/core.hpp>
#include <mlpack/core/optimizers/lbfgs/lbfgs.hpp>

#include "sparse_autoencoder_function.hpp"

namespace mlpack {
namespace nn {

/**
 * A sparse autoencoder is a neural network whose aim is to learn compressed
 * representations of the data, typically for dimensionality reduction, with a
 * constraint on the activity of the neurons in the network. Sparse
 * autoencoders can be stacked together to learn a hierarchy of features,
 * which provide a better representation of the data for classification. This
 * is a method used in the recently developed field of deep learning. More
 * technical details about the model can be found on the following webpage:
 *
 * http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial
 *
 * An example of how to use the interface is shown below:
 *
 * @code
 * arma::mat data; // Data matrix.
 * const size_t vSize = 64; // Size of visible layer, depends on the data.
 * const size_t hSize = 25; // Size of hidden layer, depends on requirements.
 *
 * // Train the model using default options.
 * SparseAutoencoder<> encoder1(data, vSize, hSize);
 *
 * const size_t numBasis = 5; // Parameter required for L-BFGS algorithm.
 * const size_t numIterations = 100; // Maximum number of iterations.
 *
 * // Use an instantiated optimizer for the training.
 * SparseAutoencoderFunction saf(data, vSize, hSize);
 * L_BFGS<SparseAutoencoderFunction> optimizer(saf, numBasis, numIterations);
 * SparseAutoencoder<L_BFGS> encoder2(optimizer);
 *
 * arma::mat features1, features2; // Matrices for storing new representations.
 *
 * // Get new representations from the trained models.
* encoder1.GetNewFeatures(data, features1); * encoder2.GetNewFeatures(data, features2); * @endcode * * This implementation allows the use of arbitrary mlpack optimizers via the * OptimizerType template parameter. * * @tparam OptimizerType The optimizer to use; by default this is L-BFGS. Any * mlpack optimizer can be used here. */ template< template class OptimizerType = mlpack::optimization::L_BFGS > class SparseAutoencoder { public: /** * Construct the sparse autoencoder model with the given training data. This * will train the model. The parameters 'lambda', 'beta' and 'rho' can be set * optionally. Changing these parameters will have an effect on regularization * and sparsity of the model. * * @param data Input data with each column as one example. * @param visibleSize Size of input vector expected at the visible layer. * @param hiddenSize Size of input vector expected at the hidden layer. * @param lambda L2-regularization parameter. * @param beta KL divergence parameter. * @param rho Sparsity parameter. */ SparseAutoencoder(const arma::mat& data, const size_t visibleSize, const size_t hiddenSize, const double lambda = 0.0001, const double beta = 3, const double rho = 0.01); /** * Construct the sparse autoencoder model with the given training data. This * will train the model. This overload takes an already instantiated optimizer * and uses it to train the model. The optimizer should hold an instantiated * SparseAutoencoderFunction object for the function to operate upon. This * option should be preferred when the optimizer options are to be changed. * * @param optimizer Instantiated optimizer with instantiated error function. */ SparseAutoencoder(OptimizerType& optimizer); /** * Transforms the provided data into the representation learned by the sparse * autoencoder. The function basically performs a feedforward computation * using the learned weights, and returns the hidden layer activations. * * @param data Matrix of the provided data. * @param features The hidden layer representation of the provided data. */ void GetNewFeatures(arma::mat& data, arma::mat& features); /** * Returns the elementwise sigmoid of the passed matrix, where the sigmoid * function of a real number 'x' is [1 / (1 + exp(-x))]. * * @param x Matrix of real values for which we require the sigmoid activation. */ void Sigmoid(const arma::mat& x, arma::mat& output) const { output = (1.0 / (1 + arma::exp(-x))); } //! Sets size of the visible layer. void VisibleSize(const size_t visible) { this->visibleSize = visible; } //! Gets size of the visible layer. size_t VisibleSize() const { return visibleSize; } //! Sets size of the hidden layer. void HiddenSize(const size_t hidden) { this->hiddenSize = hidden; } //! Gets the size of the hidden layer. size_t HiddenSize() const { return hiddenSize; } //! Sets the L2-regularization parameter. void Lambda(const double l) { this->lambda = l; } //! Gets the L2-regularization parameter. double Lambda() const { return lambda; } //! Sets the KL divergence parameter. void Beta(const double b) { this->beta = b; } //! Gets the KL divergence parameter. double Beta() const { return beta; } //! Sets the sparsity parameter. void Rho(const double r) { this->rho = r; } //! Gets the sparsity parameter. double Rho() const { return rho; } private: //! Parameters after optimization. arma::mat parameters; //! Size of the visible layer. size_t visibleSize; //! Size of the hidden layer. size_t hiddenSize; //! L2-regularization parameter. double lambda; //! KL divergence parameter. double beta; //! 
Sparsity parameter. double rho; }; }; // namespace nn }; // namespace mlpack // Include implementation. #include "sparse_autoencoder_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/sparse_autoencoder/sparse_autoencoder_impl.hpp0000644000176200001440000000655413647512751027100 0ustar liggesusers/** * @file sparse_autoencoder_impl.hpp * @author Siddharth Agrawal * * Implementation of sparse autoencoders. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_SPARSE_AUTOENCODER_SPARSE_AUTOENCODER_IMPL_HPP #define __MLPACK_METHODS_SPARSE_AUTOENCODER_SPARSE_AUTOENCODER_IMPL_HPP // In case it hasn't been included yet. #include "sparse_autoencoder.hpp" namespace mlpack { namespace nn { template class OptimizerType> SparseAutoencoder::SparseAutoencoder(const arma::mat& data, const size_t visibleSize, const size_t hiddenSize, double lambda, double beta, double rho) : visibleSize(visibleSize), hiddenSize(hiddenSize), lambda(lambda), beta(beta), rho(rho) { SparseAutoencoderFunction encoderFunction(data, visibleSize, hiddenSize, lambda, beta, rho); OptimizerType optimizer(encoderFunction); parameters = encoderFunction.GetInitialPoint(); // Train the model. Timer::Start("sparse_autoencoder_optimization"); const double out = optimizer.Optimize(parameters); Timer::Stop("sparse_autoencoder_optimization"); Log::Info << "SparseAutoencoder::SparseAutoencoder(): final objective of " << "trained model is " << out << "." << std::endl; } template class OptimizerType> SparseAutoencoder::SparseAutoencoder( OptimizerType &optimizer) : parameters(optimizer.Function().GetInitialPoint()), visibleSize(optimizer.Function().VisibleSize()), hiddenSize(optimizer.Function().HiddenSize()), lambda(optimizer.Function().Lambda()), beta(optimizer.Function().Beta()), rho(optimizer.Function().Rho()) { Timer::Start("sparse_autoencoder_optimization"); const double out = optimizer.Optimize(parameters); Timer::Stop("sparse_autoencoder_optimization"); Log::Info << "SparseAutoencoder::SparseAutoencoder(): final objective of " << "trained model is " << out << "." << std::endl; } template class OptimizerType> void SparseAutoencoder::GetNewFeatures(arma::mat& data, arma::mat& features) { const size_t l1 = hiddenSize; const size_t l2 = visibleSize; Sigmoid(parameters.submat(0, 0, l1 - 1, l2 - 1) * data + arma::repmat(parameters.submat(0, l2, l1 - 1, l2), 1, data.n_cols), features); } }; // namespace nn }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/sparse_autoencoder/sparse_autoencoder_function.cpp0000644000176200001440000002220713647514343027747 0ustar liggesusers/** * @file sparse_autoencoder_function.cpp * @author Siddharth Agrawal * * Implementation of function to be optimized for sparse autoencoders. * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "sparse_autoencoder_function.hpp" using namespace mlpack; using namespace mlpack::nn; using namespace std; SparseAutoencoderFunction::SparseAutoencoderFunction(const arma::mat& data, const size_t visibleSize, const size_t hiddenSize, const double lambda, const double beta, const double rho) : data(data), visibleSize(visibleSize), hiddenSize(hiddenSize), lambda(lambda), beta(beta), rho(rho) { // Initialize the parameters to suitable values. initialPoint = InitializeWeights(); } /** Initializes the parameter weights if the initial point is not passed to the * constructor. The weights w1, w2 are initialized to randomly in the range * [-r, r] where 'r' is decided using the sizes of the visible and hidden * layers. The biases b1, b2 are initialized to 0. */ const arma::mat SparseAutoencoderFunction::InitializeWeights() { // The module uses a matrix to store the parameters, its structure looks like: // vSize 1 // | | | // hSize| w1 |b1| // |________|__| // | | | // hSize| w2' | | // |________|__| // 1| b2' | | // // There are (hiddenSize + 1) empty cells in the matrix, but it is small // compared to the matrix size. The above structure allows for smooth matrix // operations without making the code too ugly. // Initialize w1 and w2 to random values in the range [0, 1], then set b1 and // b2 to 0. arma::mat parameters; parameters.randu(2 * hiddenSize + 1, visibleSize + 1); parameters.row(2 * hiddenSize).zeros(); parameters.col(visibleSize).zeros(); // Decide the parameter 'r' depending on the size of the visible and hidden // layers. The formula used is r = sqrt(6) / sqrt(vSize + hSize + 1). const double range = sqrt(6) / sqrt(visibleSize + hiddenSize + 1); //Shift range of w1 and w2 values from [0, 1] to [-r, r]. parameters.submat(0, 0, 2 * hiddenSize - 1, visibleSize - 1) = 2 * range * (parameters.submat(0, 0, 2 * hiddenSize - 1, visibleSize - 1) - 0.5); return parameters; } /** Evaluates the objective function given the parameters. */ double SparseAutoencoderFunction::Evaluate(const arma::mat& parameters) const { // The objective function is the average squared reconstruction error of the // network. w1 and b1 are the weights and biases associated with the hidden // layer, whereas w2 and b2 are associated with the output layer. // f(w1,w2,b1,b2) = sum((data - sigmoid(w2*sigmoid(w1data + b1) + b2))^2) / 2m // 'm' is the number of training examples. // The cost also takes into account the regularization and KL divergence terms // to control the parameter weights and sparsity of the model respectively. // Compute the limits for the parameters w1, w2, b1 and b2. const size_t l1 = hiddenSize; const size_t l2 = visibleSize; const size_t l3 = 2 * hiddenSize; // w1, w2, b1 and b2 are not extracted separately, 'parameters' is directly // used in their place to avoid copying data. 
The following representations // are used: // w1 <- parameters.submat(0, 0, l1-1, l2-1) // w2 <- parameters.submat(l1, 0, l3-1, l2-1).t() // b1 <- parameters.submat(0, l2, l1-1, l2) // b2 <- parameters.submat(l3, 0, l3, l2-1).t() arma::mat hiddenLayer, outputLayer; // Compute activations of the hidden and output layers. Sigmoid(parameters.submat(0, 0, l1 - 1, l2 - 1) * data + arma::repmat(parameters.submat(0, l2, l1 - 1, l2), 1, data.n_cols), hiddenLayer); Sigmoid(parameters.submat(l1, 0, l3 - 1, l2 - 1).t() * hiddenLayer + arma::repmat(parameters.submat(l3, 0, l3, l2 - 1).t(), 1, data.n_cols), outputLayer); arma::mat rhoCap, diff; // Average activations of the hidden layer. rhoCap = arma::sum(hiddenLayer, 1) / data.n_cols; // Difference between the reconstructed data and the original data. diff = outputLayer - data; double wL2SquaredNorm; // Calculate squared L2-norms of w1 and w2. wL2SquaredNorm = arma::accu(parameters.submat(0, 0, l3 - 1, l2 - 1) % parameters.submat(0, 0, l3 - 1, l2 - 1)); double sumOfSquaresError, weightDecay, klDivergence, cost; // Calculate the reconstruction error, the regularization cost and the KL // divergence cost terms. 'sumOfSquaresError' is the average squared l2-norm // of the reconstructed data difference. 'weightDecay' is the squared l2-norm // of the weights w1 and w2. 'klDivergence' is the cost of the hidden layer // activations not being low. It is given by the following formula: // KL = sum_over_hSize(rho*log(rho/rhoCaq) + (1-rho)*log((1-rho)/(1-rhoCap))) sumOfSquaresError = 0.5 * arma::accu(diff % diff) / data.n_cols; weightDecay = 0.5 * lambda * wL2SquaredNorm; klDivergence = beta * arma::accu(rho * arma::log(rho / rhoCap) + (1 - rho) * arma::log((1 - rho) / (1 - rhoCap))); // The cost is the sum of the terms calculated above. cost = sumOfSquaresError + weightDecay + klDivergence; return cost; } /** Calculates and stores the gradient values given a set of parameters. */ void SparseAutoencoderFunction::Gradient(const arma::mat& parameters, arma::mat& gradient) const { // Performs a feedforward pass of the neural network, and computes the // activations of the output layer as in the Evaluate() method. It uses the // Backpropagation algorithm to calculate the delta values at each layer, // except for the input layer. The delta values are then used with input layer // and hidden layer activations to get the parameter gradients. // Compute the limits for the parameters w1, w2, b1 and b2. const size_t l1 = hiddenSize; const size_t l2 = visibleSize; const size_t l3 = 2 * hiddenSize; // w1, w2, b1 and b2 are not extracted separately, 'parameters' is directly // used in their place to avoid copying data. The following representations // are used: // w1 <- parameters.submat(0, 0, l1-1, l2-1) // w2 <- parameters.submat(l1, 0, l3-1, l2-1).t() // b1 <- parameters.submat(0, l2, l1-1, l2) // b2 <- parameters.submat(l3, 0, l3, l2-1).t() arma::mat hiddenLayer, outputLayer; // Compute activations of the hidden and output layers. Sigmoid(parameters.submat(0, 0, l1 - 1, l2 - 1) * data + arma::repmat(parameters.submat(0, l2, l1 - 1, l2), 1, data.n_cols), hiddenLayer); Sigmoid(parameters.submat(l1, 0, l3 - 1, l2 - 1).t() * hiddenLayer + arma::repmat(parameters.submat(l3, 0, l3, l2 - 1).t(), 1, data.n_cols), outputLayer); arma::mat rhoCap, diff; // Average activations of the hidden layer. rhoCap = arma::sum(hiddenLayer, 1) / data.n_cols; // Difference between the reconstructed data and the original data. 
diff = outputLayer - data; arma::mat klDivGrad, delOut, delHid; // The delta vector for the output layer is given by diff * f'(z), where z is // the preactivation and f is the activation function. The derivative of the // sigmoid function turns out to be f(z) * (1 - f(z)). For every other layer // in the neural network which comes before the output layer, the delta values // are given del_n = w_n' * del_(n+1) * f'(z_n). Since our cost function also // includes the KL divergence term, we adjust for that in the formula below. klDivGrad = beta * (-(rho / rhoCap) + (1 - rho) / (1 - rhoCap)); delOut = diff % outputLayer % (1 - outputLayer); delHid = (parameters.submat(l1, 0, l3 - 1, l2 - 1) * delOut + arma::repmat(klDivGrad, 1, data.n_cols)) % hiddenLayer % (1 - hiddenLayer); gradient.zeros(2 * hiddenSize + 1, visibleSize + 1); // Compute the gradient values using the activations and the delta values. The // formula also accounts for the regularization terms in the objective. // function. gradient.submat(0, 0, l1 - 1, l2 - 1) = delHid * data.t() / data.n_cols + lambda * parameters.submat(0, 0, l1 - 1, l2 - 1); gradient.submat(l1, 0, l3 - 1, l2 - 1) = (delOut * hiddenLayer.t() / data.n_cols + lambda * parameters.submat(l1, 0, l3 - 1, l2 - 1).t()).t(); gradient.submat(0, l2, l1 - 1, l2) = arma::sum(delHid, 1) / data.n_cols; gradient.submat(l3, 0, l3, l2 - 1) = (arma::sum(delOut, 1) / data.n_cols).t(); } RcppMLPACK/src/mlpack/methods/sparse_autoencoder/sparse_autoencoder_function.hpp0000644000176200001440000001211113647512751027746 0ustar liggesusers/** * @file sparse_autoencoder_function.hpp * @author Siddharth Agrawal * * The function to be optimized for sparse autoencoders. Any mlpack optimizer * can be used. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_SPARSE_AUTOENCODER_SPARSE_AUTOENCODER_FUNCTION_HPP #define __MLPACK_METHODS_SPARSE_AUTOENCODER_SPARSE_AUTOENCODER_FUNCTION_HPP #include namespace mlpack { namespace nn { /** * This is a class for the sparse autoencoder objective function. It can be used * to create learning models like self-taught learning, stacked autoencoders, * conditional random fields (CRFs), and so forth. */ class SparseAutoencoderFunction { public: /** * Construct the sparse autoencoder objective function with the given * parameters. * * @param data The data matrix. * @param visibleSize Size of input vector expected at the visible layer. * @param hiddenSize Size of input vector expected at the hidden layer. * @param lambda L2-regularization parameter. * @param beta KL divergence parameter. * @param rho Sparsity parameter. */ SparseAutoencoderFunction(const arma::mat& data, const size_t visibleSize, const size_t hiddenSize, const double lambda = 0.0001, const double beta = 3, const double rho = 0.01); //! Initializes the parameters of the model to suitable values. 
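//! Weights w1 and w2 are drawn uniformly from [-r, r], where
  //! r = sqrt(6) / sqrt(visibleSize + hiddenSize + 1), and the biases b1 and
  //! b2 are initialized to zero (see sparse_autoencoder_function.cpp).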
const arma::mat InitializeWeights(); /** * Evaluates the objective function of the sparse autoencoder model using the * given parameters. The cost function has terms for the reconstruction * error, regularization cost and the sparsity cost. The objective function * takes a low value when the model is able to reconstruct the data well * using weights which are low in value and when the average activations of * neurons in the hidden layers agrees well with the sparsity parameter 'rho'. * * @param parameters Current values of the model parameters. */ double Evaluate(const arma::mat& parameters) const; /** * Evaluates the gradient values of the objective function given the current * set of parameters. The function performs a feedforward pass and computes * the error in reconstructing the data points. It then uses the * backpropagation algorithm to compute the gradient values. * * @param parameters Current values of the model parameters. * @param gradient Matrix where gradient values will be stored. */ void Gradient(const arma::mat& parameters, arma::mat& gradient) const; /** * Returns the elementwise sigmoid of the passed matrix, where the sigmoid * function of a real number 'x' is [1 / (1 + exp(-x))]. * * @param x Matrix of real values for which we require the sigmoid activation. */ void Sigmoid(const arma::mat& x, arma::mat& output) const { output = (1.0 / (1 + arma::exp(-x))); } //! Return the initial point for the optimization. const arma::mat& GetInitialPoint() const { return initialPoint; } //! Sets size of the visible layer. void VisibleSize(const size_t visible) { this->visibleSize = visible; } //! Gets size of the visible layer. size_t VisibleSize() const { return visibleSize; } //! Sets size of the hidden layer. void HiddenSize(const size_t hidden) { this->hiddenSize = hidden; } //! Gets the size of the hidden layer. size_t HiddenSize() const { return hiddenSize; } //! Sets the L2-regularization parameter. void Lambda(const double l) { this->lambda = l; } //! Gets the L2-regularization parameter. double Lambda() const { return lambda; } //! Sets the KL divergence parameter. void Beta(const double b) { this->beta = b; } //! Gets the KL divergence parameter. double Beta() const { return beta; } //! Sets the sparsity parameter. void Rho(const double r) { this->rho = r; } //! Gets the sparsity parameter. double Rho() const { return rho; } private: //! The matrix of data points. const arma::mat& data; //! Intial parameter vector. arma::mat initialPoint; //! Size of the visible layer. size_t visibleSize; //! Size of the hidden layer. size_t hiddenSize; //! L2-regularization parameter. double lambda; //! KL divergence parameter. double beta; //! Sparsity parameter. double rho; }; }; // namespace nn }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/regularized_svd/0000755000176200001440000000000013647514343020752 5ustar liggesusersRcppMLPACK/src/mlpack/methods/regularized_svd/regularized_svd_function.cpp0000644000176200001440000001617413647514343026565 0ustar liggesusers/** * @file regularized_svd_function.cpp * @author Siddharth Agrawal * * An implementation of the RegularizedSVDFunction class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "regularized_svd_function.hpp" namespace mlpack { namespace svd { RegularizedSVDFunction::RegularizedSVDFunction(const arma::mat& data, const size_t rank, const double lambda) : data(data), rank(rank), lambda(lambda) { // Number of users and items in the data. numUsers = max(data.row(0)) + 1; numItems = max(data.row(1)) + 1; // Initialize the parameters. initialPoint.randu(rank, numUsers + numItems); } double RegularizedSVDFunction::Evaluate(const arma::mat& parameters) const { // The cost for the optimization is as follows: // f(u, v) = sum((rating(i, j) - u(i).t() * v(j))^2) // The sum is over all the ratings in the rating matrix. // 'i' points to the user and 'j' points to the item being considered. // The regularization term is added to the above cost, where the vectors u(i) // and v(j) are regularized for each rating they contribute to. double cost = 0.0; for(size_t i = 0; i < data.n_cols; i++) { // Indices for accessing the the correct parameter columns. const size_t user = data(0, i); const size_t item = data(1, i) + numUsers; // Calculate the squared error in the prediction. const double rating = data(2, i); double ratingError = rating - arma::dot(parameters.col(user), parameters.col(item)); double ratingErrorSquared = ratingError * ratingError; // Calculate the regularization penalty corresponding to the parameters. double userVecNorm = arma::norm(parameters.col(user), 2); double itemVecNorm = arma::norm(parameters.col(item), 2); double regularizationError = lambda * (userVecNorm * userVecNorm + itemVecNorm * itemVecNorm); cost += (ratingErrorSquared + regularizationError); } return cost; } double RegularizedSVDFunction::Evaluate(const arma::mat& parameters, const size_t i) const { // Indices for accessing the the correct parameter columns. const size_t user = data(0, i); const size_t item = data(1, i) + numUsers; // Calculate the squared error in the prediction. const double rating = data(2, i); double ratingError = rating - arma::dot(parameters.col(user), parameters.col(item)); double ratingErrorSquared = ratingError * ratingError; // Calculate the regularization penalty corresponding to the parameters. double userVecNorm = arma::norm(parameters.col(user), 2); double itemVecNorm = arma::norm(parameters.col(item), 2); double regularizationError = lambda * (userVecNorm * userVecNorm + itemVecNorm * itemVecNorm); return (ratingErrorSquared + regularizationError); } void RegularizedSVDFunction::Gradient(const arma::mat& parameters, arma::mat& gradient) const { // For an example with rating corresponding to user 'i' and item 'j', the // gradients for the parameters is as follows: // grad(u(i)) = lambda * u(i) - error * v(j) // grad(v(j)) = lambda * v(j) - error * u(i) // 'error' is the prediction error for that example, which is: // rating(i, j) - u(i).t() * v(j) // The full gradient is calculated by summing the contributions over all the // training examples. gradient.zeros(rank, numUsers + numItems); for(size_t i = 0; i < data.n_cols; i++) { // Indices for accessing the the correct parameter columns. const size_t user = data(0, i); const size_t item = data(1, i) + numUsers; // Prediction error for the example. 
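    // (Recall the layout set up in the constructor: row 0 of 'data' holds the
    // user ids, row 1 the item ids, and row 2 the observed ratings.)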
    const double rating = data(2, i);
    double ratingError = rating - arma::dot(parameters.col(user),
                                            parameters.col(item));

    // Gradient is non-zero only for the parameter columns corresponding to the
    // example.
    gradient.col(user) += 2 * (lambda * parameters.col(user) -
                               ratingError * parameters.col(item));
    gradient.col(item) += 2 * (lambda * parameters.col(item) -
                               ratingError * parameters.col(user));
  }
}

}; // namespace svd
}; // namespace mlpack

// Template specialization for the SGD optimizer.
namespace mlpack {
namespace optimization {

template<>
double SGD<mlpack::svd::RegularizedSVDFunction>::Optimize(
    arma::mat& parameters)
{
  // Find the number of functions to use.
  const size_t numFunctions = function.NumFunctions();

  // To keep track of where we are and how things are going.
  size_t currentFunction = 0;
  double overallObjective = 0;

  // Calculate the first objective function.
  for(size_t i = 0; i < numFunctions; i++)
    overallObjective += function.Evaluate(parameters, i);

  const arma::mat data = function.Dataset();

  // Now iterate!
  for(size_t i = 1; i != maxIterations; i++, currentFunction++)
  {
    // Is this iteration the start of a sequence?
    if((currentFunction % numFunctions) == 0)
    {
      // Reset the counter variables.
      overallObjective = 0;
      currentFunction = 0;
    }

    const size_t numUsers = function.NumUsers();

    // Indices for accessing the correct parameter columns.
    const size_t user = data(0, currentFunction);
    const size_t item = data(1, currentFunction) + numUsers;

    // Prediction error for the example.
    const double rating = data(2, currentFunction);
    double ratingError = rating - arma::dot(parameters.col(user),
                                            parameters.col(item));

    double lambda = function.Lambda();

    // Gradient is non-zero only for the parameter columns corresponding to the
    // example.
    parameters.col(user) -= stepSize * (lambda * parameters.col(user) -
                                        ratingError * parameters.col(item));
    parameters.col(item) -= stepSize * (lambda * parameters.col(item) -
                                        ratingError * parameters.col(user));

    // Now add that to the overall objective function.
    overallObjective += function.Evaluate(parameters, currentFunction);
  }

  return overallObjective;
}

}; // namespace optimization
}; // namespace mlpack
RcppMLPACK/src/mlpack/methods/regularized_svd/regularized_svd.hpp0000644000176200001440000000532413647512751024661 0ustar liggesusers/**
 * @file regularized_svd.hpp
 * @author Siddharth Agrawal
 *
 * An implementation of Regularized SVD.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_REGULARIZED_SVD_REGULARIZED_SVD_HPP
#define __MLPACK_METHODS_REGULARIZED_SVD_REGULARIZED_SVD_HPP

#include <mlpack/core.hpp>
#include <mlpack/core/optimizers/sgd/sgd.hpp>

#include "regularized_svd_function.hpp"

namespace mlpack {
namespace svd {

template<
  template<typename> class OptimizerType = mlpack::optimization::SGD
>
class RegularizedSVD
{
 public:
  /**
   * Constructor for Regularized SVD. Obtains the user and item matrices after
   * training on the passed data. The constructor initiates an object of class
   * RegularizedSVDFunction for optimization.
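   *
   * A minimal usage sketch (illustrative only, not from the original source;
   * `ratings` is a hypothetical 3 x N matrix whose rows hold user id, item id
   * and rating value, the layout RegularizedSVDFunction expects):
   *
   * @code
   * arma::mat ratings;   // 3 x N: row 0 = user id, row 1 = item id,
   *                      // row 2 = rating value.
   * arma::mat u, v;      // Receive the user and item factor matrices.
   * RegularizedSVD<> rsvd(ratings, u, v, 10); // rank = 10.
   * @endcode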
   * It uses the SGD optimizer by default. The optimizer uses a template
   * specialization of Optimize().
   *
   * @param data Dataset for which SVD is calculated.
   * @param u User matrix in the matrix decomposition.
   * @param v Item matrix in the matrix decomposition.
   * @param rank Rank used for matrix factorization.
   * @param iterations Number of optimization iterations.
   * @param alpha Learning rate for the SGD optimizer.
   * @param lambda Regularization parameter for the optimization.
   */
  RegularizedSVD(const arma::mat& data,
                 arma::mat& u,
                 arma::mat& v,
                 const size_t rank,
                 const size_t iterations = 10,
                 const double alpha = 0.01,
                 const double lambda = 0.02);

 private:
  //! Rating data.
  const arma::mat& data;
  //! Rank used for matrix factorization.
  size_t rank;
  //! Number of optimization iterations.
  size_t iterations;
  //! Learning rate for the SGD optimizer.
  double alpha;
  //! Regularization parameter for the optimization.
  double lambda;
  //! Function that will be held by the optimizer.
  RegularizedSVDFunction rSVDFunc;
  //! Default SGD optimizer for the class.
  mlpack::optimization::SGD<RegularizedSVDFunction> optimizer;
};

}; // namespace svd
}; // namespace mlpack

// Include implementation.
#include "regularized_svd_impl.hpp"

#endif
RcppMLPACK/src/mlpack/methods/regularized_svd/regularized_svd_function.hpp0000644000176200001440000001015313647512751026562 0ustar liggesusers/**
 * @file regularized_svd_function.hpp
 * @author Siddharth Agrawal
 *
 * An implementation of the RegularizedSVDFunction class.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_REGULARIZED_SVD_REGULARIZED_FUNCTION_SVD_HPP
#define __MLPACK_METHODS_REGULARIZED_SVD_REGULARIZED_FUNCTION_SVD_HPP

#include <mlpack/core.hpp>
#include <mlpack/core/optimizers/sgd/sgd.hpp>

namespace mlpack {
namespace svd {

class RegularizedSVDFunction
{
 public:
  /**
   * Constructor for RegularizedSVDFunction class. The constructor calculates
   * the number of users and items in the passed data. It also randomly
   * initializes the parameter values.
   *
   * @param data Dataset for which SVD is calculated.
   * @param rank Rank used for matrix factorization.
   * @param lambda Regularization parameter used for optimization.
   */
  RegularizedSVDFunction(const arma::mat& data,
                         const size_t rank,
                         const double lambda);

  /**
   * Evaluates the cost function over all examples in the data.
   *
   * @param parameters Parameters (user/item matrices) of the decomposition.
   */
  double Evaluate(const arma::mat& parameters) const;

  /**
   * Evaluates the cost function for one training example. Useful for the SGD
   * optimizer abstraction which uses one training example at a time.
   *
   * @param parameters Parameters (user/item matrices) of the decomposition.
   * @param i Index of the training example to be used.
   */
  double Evaluate(const arma::mat& parameters, const size_t i) const;

  /**
   * Evaluates the full gradient of the cost function over all the training
   * examples.
   *
   * @param parameters Parameters (user/item matrices) of the decomposition.
   * @param gradient Calculated gradient for the parameters.
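   *
   * (Illustrative sketch, not part of the original source: an optimizer
   * drives this FunctionType roughly as
   * @code
   * RegularizedSVDFunction f(ratings, rank, lambda);
   * arma::mat params = f.GetInitialPoint();
   * double cost = f.Evaluate(params); // Full objective.
   * arma::mat g;
   * f.Gradient(params, g);            // Full gradient.
   * @endcode
   * where `ratings`, `rank` and `lambda` are assumptions of the sketch.)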
   */
  void Gradient(const arma::mat& parameters, arma::mat& gradient) const;

  //! Return the initial point for the optimization.
  const arma::mat& GetInitialPoint() const { return initialPoint; }

  //! Return the dataset passed into the constructor.
  const arma::mat& Dataset() const { return data; }

  //! Return the number of training examples. Useful for SGD optimizer.
  size_t NumFunctions() const { return data.n_cols; }

  //! Return the number of users in the data.
  size_t NumUsers() const { return numUsers; }

  //! Return the number of items in the data.
  size_t NumItems() const { return numItems; }

  //! Return the regularization parameter.
  double Lambda() const { return lambda; }

  //! Return the rank used for the factorization.
  size_t Rank() const { return rank; }

 private:
  //! Rating data.
  const arma::mat& data;
  //! Initial parameter point.
  arma::mat initialPoint;
  //! Rank used for matrix factorization.
  size_t rank;
  //! Regularization parameter for the optimization.
  double lambda;
  //! Number of users in the given dataset.
  size_t numUsers;
  //! Number of items in the given dataset.
  size_t numItems;
};

}; // namespace svd
}; // namespace mlpack

namespace mlpack {
namespace optimization {

/**
 * Template specialization for SGD optimizer. Used because the gradient
 * affects only a small number of parameters per example, and thus the normal
 * abstraction does not work as fast as we might like it to.
 */
template<>
double SGD<mlpack::svd::RegularizedSVDFunction>::Optimize(
    arma::mat& parameters);

}; // namespace optimization
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/methods/regularized_svd/regularized_svd_impl.hpp0000644000176200001440000000422213647512751025676 0ustar liggesusers/**
 * @file regularized_svd_impl.hpp
 * @author Siddharth Agrawal
 *
 * An implementation of Regularized SVD.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_REGULARIZED_SVD_REGULARIZED_SVD_IMPL_HPP
#define __MLPACK_METHODS_REGULARIZED_SVD_REGULARIZED_SVD_IMPL_HPP

namespace mlpack {
namespace svd {

template<template<typename> class OptimizerType>
RegularizedSVD<OptimizerType>::RegularizedSVD(const arma::mat& data,
                                              arma::mat& u,
                                              arma::mat& v,
                                              const size_t rank,
                                              const size_t iterations,
                                              const double alpha,
                                              const double lambda) :
    data(data),
    rank(rank),
    iterations(iterations),
    alpha(alpha),
    lambda(lambda),
    rSVDFunc(data, rank, lambda),
    optimizer(rSVDFunc, alpha, iterations * data.n_cols)
{
  arma::mat parameters = rSVDFunc.GetInitialPoint();

  // Train the model.
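  // (Exposition note, not in the original source: because of the
  // SGD<RegularizedSVDFunction>::Optimize() template specialization declared
  // in regularized_svd_function.hpp, the call below updates only the two
  // parameter columns touched by each sampled rating instead of forming a
  // full gradient matrix.)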
Timer::Start("regularized_svd_optimization"); const double out = optimizer.Optimize(parameters); Timer::Stop("regularized_svd_optimization"); const size_t numUsers = max(data.row(0)) + 1; const size_t numItems = max(data.row(1)) + 1; u = parameters.submat(0, 0, rank - 1, numUsers - 1); v = parameters.submat(0, numUsers, rank - 1, numUsers + numItems - 1); } }; // namespace svd }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/logistic_regression/0000755000176200001440000000000013647514343021636 5ustar liggesusersRcppMLPACK/src/mlpack/methods/logistic_regression/logistic_regression_function.hpp0000644000176200001440000001254613647512751030342 0ustar liggesusers/** * @file logistic_regression_function.hpp * @author Sumedh Ghaisas * * Implementation of the logistic regression function, which is meant to be * optimized by a separate optimizer class that takes LogisticRegressionFunction * as its FunctionType class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_FUNCTION_HPP #define __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_FUNCTION_HPP #include namespace mlpack { namespace regression { /** * The log-likelihood function for the logistic regression objective function. * This is used by various mlpack optimizers to train a logistic regression * model. */ class LogisticRegressionFunction { public: LogisticRegressionFunction(const arma::mat& predictors, const arma::vec& responses, const double lambda = 0); LogisticRegressionFunction(const arma::mat& predictors, const arma::vec& responses, const arma::mat& initialPoint, const double lambda = 0); //! Return the initial point for the optimization. const arma::mat& InitialPoint() const { return initialPoint; } //! Modify the initial point for the optimization. arma::mat& InitialPoint() { return initialPoint; } //! Return the regularization parameter (lambda). const double& Lambda() const { return lambda; } //! Modify the regularization parameter (lambda). double& Lambda() { return lambda; } //! Return the matrix of predictors. const arma::mat& Predictors() const { return predictors; } //! Return the vector of responses. const arma::vec& Responses() const { return responses; } /** * Evaluate the logistic regression log-likelihood function with the given * parameters. Note that if a point has 0 probability of being classified * directly with the given parameters, then Evaluate() will return nan (this * is kind of a corner case and should not happen for reasonable models). * * The optimum (minimum) of this function is 0.0, and occurs when each point * is classified correctly with very high probability. * * @param parameters Vector of logistic regression parameters. */ double Evaluate(const arma::mat& parameters) const; /** * Evaluate the logistic regression log-likelihood function with the given * parameters, but using only one data point. 
   * This is useful for optimizers
   * such as SGD, which require a separable objective function. Note that if
   * the point has 0 probability of being classified correctly with the given
   * parameters, then Evaluate() will return nan (this is kind of a corner case
   * and should not happen for reasonable models).
   *
   * The optimum (minimum) of this function is 0.0, and occurs when the point is
   * classified correctly with very high probability.
   *
   * @param parameters Vector of logistic regression parameters.
   * @param i Index of point to use for objective function evaluation.
   */
  double Evaluate(const arma::mat& parameters, const size_t i) const;

  /**
   * Evaluate the gradient of the logistic regression log-likelihood function
   * with the given parameters.
   *
   * @param parameters Vector of logistic regression parameters.
   * @param gradient Vector to output gradient into.
   */
  void Gradient(const arma::mat& parameters, arma::mat& gradient) const;

  /**
   * Evaluate the gradient of the logistic regression log-likelihood function
   * with the given parameters, and with respect to only one point in the
   * dataset. This is useful for optimizers such as SGD, which require a
   * separable objective function.
   *
   * @param parameters Vector of logistic regression parameters.
   * @param i Index of point to use for objective function gradient evaluation.
   * @param gradient Vector to output gradient into.
   */
  void Gradient(const arma::mat& parameters,
                const size_t i,
                arma::mat& gradient) const;

  //! Return the initial point for the optimization.
  const arma::mat& GetInitialPoint() const { return initialPoint; }

  //! Return the number of separable functions (the number of predictor points).
  size_t NumFunctions() const { return predictors.n_cols; }

 private:
  //! The initial point, from which to start the optimization.
  arma::mat initialPoint;
  //! The matrix of data points (predictors).
  const arma::mat& predictors;
  //! The vector of responses to the input data points.
  const arma::vec& responses;
  //! The regularization parameter for L2-regularization.
  double lambda;
};

}; // namespace regression
}; // namespace mlpack

#endif // __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_FUNCTION_HPP
RcppMLPACK/src/mlpack/methods/logistic_regression/logistic_regression_function.cpp0000644000176200001440000001407413647514343030330 0ustar liggesusers/**
 * @file logistic_regression_function.cpp
 * @author Sumedh Ghaisas
 *
 * Implementation of the LogisticRegressionFunction class.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
*/ #include "logistic_regression_function.hpp" using namespace mlpack; using namespace mlpack::regression; LogisticRegressionFunction::LogisticRegressionFunction( const arma::mat& predictors, const arma::vec& responses, const double lambda) : predictors(predictors), responses(responses), lambda(lambda) { initialPoint = arma::zeros(predictors.n_rows + 1, 1); } LogisticRegressionFunction::LogisticRegressionFunction( const arma::mat& predictors, const arma::vec& responses, const arma::mat& initialPoint, const double lambda) : initialPoint(initialPoint), predictors(predictors), responses(responses), lambda(lambda) { //to check if initialPoint is compatible with predictors if (initialPoint.n_rows != (predictors.n_rows + 1) || initialPoint.n_cols != 1) this->initialPoint = arma::zeros(predictors.n_rows + 1, 1); } /** * Evaluate the logistic regression objective function given the estimated * parameters. */ double LogisticRegressionFunction::Evaluate(const arma::mat& parameters) const { // The objective function is the log-likelihood function (w is the parameters // vector for the model; y is the responses; x is the predictors; sig() is the // sigmoid function): // f(w) = sum(y log(sig(w'x)) + (1 - y) log(sig(1 - w'x))). // We want to minimize this function. L2-regularization is just lambda // multiplied by the squared l2-norm of the parameters then divided by two. // For the regularization, we ignore the first term, which is the intercept // term. const double regularization = 0.5 * lambda * arma::dot(parameters.col(0).subvec(1, parameters.n_elem - 1), parameters.col(0).subvec(1, parameters.n_elem - 1)); // Calculate vectors of sigmoids. The intercept term is parameters(0, 0) and // does not need to be multiplied by any of the predictors. const arma::vec exponents = parameters(0, 0) + predictors.t() * parameters.col(0).subvec(1, parameters.n_elem - 1); const arma::vec sigmoid = 1.0 / (1.0 + arma::exp(-exponents)); // Assemble full objective function. Often the objective function and the // regularization as given are divided by the number of features, but this // doesn't actually affect the optimization result, so we'll just ignore those // terms for computational efficiency. double result = 0.0; for (size_t i = 0; i < responses.n_elem; ++i) { if (responses[i] == 1) result += log(sigmoid[i]); else result += log(1.0 - sigmoid[i]); } // Invert the result, because it's a minimization. return -result + regularization; } /** * Evaluate the logistic regression objective function, but with only one point. * This is useful for optimizers that use a separable objective function, such * as SGD. */ double LogisticRegressionFunction::Evaluate(const arma::mat& parameters, const size_t i) const { // Calculate the regularization term. We must divide by the number of points, // so that sum(Evaluate(parameters, [1:points])) == Evaluate(parameters). const double regularization = lambda * (1.0 / (2.0 * predictors.n_cols)) * arma::dot(parameters.col(0).subvec(1, parameters.n_elem - 1), parameters.col(0).subvec(1, parameters.n_elem - 1)); // Calculate sigmoid. const double exponent = parameters(0, 0) + arma::dot(predictors.col(i), parameters.col(0).subvec(1, parameters.n_elem - 1)); const double sigmoid = 1.0 / (1.0 + std::exp(-exponent)); if (responses[i] == 1) return -log(sigmoid) + regularization; else return -log(1.0 - sigmoid) + regularization; } //! Evaluate the gradient of the logistic regression objective function. 
void LogisticRegressionFunction::Gradient(const arma::mat& parameters, arma::mat& gradient) const { // Regularization term. arma::mat regularization; regularization = lambda * parameters.col(0).subvec(1, parameters.n_elem - 1); const arma::vec sigmoids = 1 / (1 + arma::exp(-parameters(0, 0) - predictors.t() * parameters.col(0).subvec(1, parameters.n_elem - 1))); gradient.set_size(parameters.n_elem); gradient[0] = -arma::accu(responses - sigmoids); gradient.col(0).subvec(1, parameters.n_elem - 1) = -predictors * (responses - sigmoids) + regularization; } /** * Evaluate the individual gradients of the logistic regression objective * function with respect to individual points. This is useful for optimizers * that use a separable objective function, such as SGD. */ void LogisticRegressionFunction::Gradient(const arma::mat& parameters, const size_t i, arma::mat& gradient) const { // Calculate the regularization term. arma::mat regularization; regularization = lambda * parameters.col(0).subvec(1, parameters.n_elem - 1) / predictors.n_cols; const double sigmoid = 1.0 / (1.0 + std::exp(-parameters(0, 0) - arma::dot(predictors.col(i), parameters.col(0).subvec(1, parameters.n_elem - 1)))); gradient.set_size(parameters.n_elem); gradient[0] = -(responses[i] - sigmoid); gradient.col(0).subvec(1, parameters.n_elem - 1) = -predictors.col(i) * (responses[i] - sigmoid) + regularization; } RcppMLPACK/src/mlpack/methods/logistic_regression/logistic_regression_impl.hpp0000644000176200001440000001267513647512751027461 0ustar liggesusers/** * @file logistic_regression_impl.hpp * @author Sumedh Ghaisas * * Implementation of the LogisticRegression class. This implementation supports * L2-regularization. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_IMPL_HPP #define __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_IMPL_HPP // In case it hasn't been included yet. #include "logistic_regression.hpp" namespace mlpack { namespace regression { template class OptimizerType> LogisticRegression::LogisticRegression( const arma::mat& predictors, const arma::vec& responses, const double lambda) : parameters(arma::zeros(predictors.n_rows + 1)), lambda(lambda) { LogisticRegressionFunction errorFunction(predictors, responses, lambda); OptimizerType optimizer(errorFunction); // Train the model. //Timer::Start("logistic_regression_optimization"); const double out = optimizer.Optimize(parameters); //Timer::Stop("logistic_regression_optimization"); Rcpp::Rcout << "LogisticRegression::LogisticRegression(): final objective of " << "trained model is " << out << "." 
<< std::endl; } template class OptimizerType> LogisticRegression::LogisticRegression( const arma::mat& predictors, const arma::vec& responses, const arma::mat& initialPoint, const double lambda) : parameters(arma::zeros(predictors.n_rows + 1)), lambda(lambda) { LogisticRegressionFunction errorFunction(predictors, responses, lambda); errorFunction.InitialPoint() = initialPoint; OptimizerType optimizer(errorFunction); // Train the model. //Timer::Start("logistic_regression_optimization"); const double out = optimizer.Optimize(parameters); //Timer::Stop("logistic_regression_optimization"); Rcpp::Rcout << "LogisticRegression::LogisticRegression(): final objective of " << "trained model is " << out << "." << std::endl; } template class OptimizerType> LogisticRegression::LogisticRegression( OptimizerType& optimizer) : parameters(optimizer.Function().GetInitialPoint()), lambda(optimizer.Function().Lambda()) { //Timer::Start("logistic_regression_optimization"); const double out = optimizer.Optimize(parameters); //Timer::Stop("logistic_regression_optimization"); Rcpp::Rcout << "LogisticRegression::LogisticRegression(): final objective of " << "trained model is " << out << "." << std::endl; } template class OptimizerType> LogisticRegression::LogisticRegression( const arma::vec& parameters, const double lambda) : parameters(parameters), lambda(lambda) { // Nothing to do. } template class OptimizerType> void LogisticRegression::Predict(const arma::mat& predictors, arma::vec& responses, const double decisionBoundary) const { // Calculate sigmoid function for each point. The (1.0 - decisionBoundary) // term correctly sets an offset so that floor() returns 0 or 1 correctly. responses = arma::floor((1.0 / (1.0 + arma::exp(-parameters(0) - predictors.t() * parameters.subvec(1, parameters.n_elem - 1)))) + (1.0 - decisionBoundary)); } template class OptimizerType> double LogisticRegression::ComputeError( const arma::mat& predictors, const arma::vec& responses) const { // Construct a new error function. LogisticRegressionFunction newErrorFunction(predictors, responses, lambda); return newErrorFunction.Evaluate(parameters); } template class OptimizerType> double LogisticRegression::ComputeAccuracy( const arma::mat& predictors, const arma::vec& responses, const double decisionBoundary) const { // Predict responses using the current model. arma::vec tempResponses; Predict(predictors, tempResponses, decisionBoundary); // Count the number of responses that were correct. size_t count = 0; for (size_t i = 0; i < responses.n_elem; i++) if (responses(i) == tempResponses(i)) count++; return (double) (count * 100) / responses.n_rows; } template class OptimizerType> std::string LogisticRegression::ToString() const { std::ostringstream convert; convert << "Logistic Regression [" << this << "]" << std::endl; convert << " Parameters: " << parameters.n_rows << std::endl; convert << " Lambda: " << lambda << std::endl; return convert.str(); } }; // namespace regression }; // namespace mlpack #endif // __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_IMPL_HPP RcppMLPACK/src/mlpack/methods/logistic_regression/logistic_regression.hpp0000644000176200001440000001473513647512751026437 0ustar liggesusers/** * @file logistic_regression.hpp * @author Sumedh Ghaisas * * The LogisticRegression class, which implements logistic regression. This * implements supports L2-regularization. * * This file is part of MLPACK 1.0.10. 
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_HPP
#define __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_HPP

#include <mlpack/core.hpp>
#include <mlpack/core/optimizers/lbfgs/lbfgs.hpp>

#include "logistic_regression_function.hpp"

namespace mlpack {
namespace regression {

template<
  template<typename> class OptimizerType = mlpack::optimization::L_BFGS
>
class LogisticRegression
{
 public:
  /**
   * Construct the LogisticRegression class with the given labeled training
   * data. This will train the model. Optionally, specify lambda, which is the
   * penalty parameter for L2-regularization. If not specified, it is set to 0,
   * which results in standard (unregularized) logistic regression.
   *
   * @param predictors Input training variables.
   * @param responses Outputs resulting from input training variables.
   * @param lambda L2-regularization parameter.
   */
  LogisticRegression(const arma::mat& predictors,
                     const arma::vec& responses,
                     const double lambda = 0);

  /**
   * Construct the LogisticRegression class with the given labeled training
   * data. This will train the model. Optionally, specify lambda, which is the
   * penalty parameter for L2-regularization. If not specified, it is set to 0,
   * which results in standard (unregularized) logistic regression.
   *
   * @param predictors Input training variables.
   * @param responses Outputs resulting from input training variables.
   * @param initialPoint Initial model to train with.
   * @param lambda L2-regularization parameter.
   */
  LogisticRegression(const arma::mat& predictors,
                     const arma::vec& responses,
                     const arma::mat& initialPoint,
                     const double lambda = 0);

  /**
   * Construct the LogisticRegression class with the given labeled training
   * data. This will train the model. This overload takes an already
   * instantiated optimizer (which holds the LogisticRegressionFunction error
   * function, which must also be instantiated), so that the optimizer can be
   * configured before the training is run by this constructor. The predictors
   * and responses and initial point are all taken from the error function
   * contained in the optimizer.
   *
   * @param optimizer Instantiated optimizer with instantiated error function.
   */
  LogisticRegression(OptimizerType<LogisticRegressionFunction>& optimizer);

  /**
   * Construct a logistic regression model from the given parameters, without
   * performing any training. The lambda parameter is used for the
   * ComputeAccuracy() and ComputeError() functions; this constructor does not
   * train the model itself.
   *
   * @param parameters Parameters making up the model.
   * @param lambda L2-regularization penalty parameter.
   */
  LogisticRegression(const arma::vec& parameters, const double lambda = 0);

  //! Return the parameters (the b vector).
  const arma::vec& Parameters() const { return parameters; }
  //! Modify the parameters (the b vector).
  arma::vec& Parameters() { return parameters; }

  //! Return the lambda value for L2-regularization.
  const double& Lambda() const { return lambda; }
  //! Modify the lambda value for L2-regularization.
double& Lambda() { return lambda; } /** * Predict the responses to a given set of predictors. The responses will be * either 0 or 1. Optionally, specify the decision boundary; logistic * regression returns a value between 0 and 1. If the value is greater than * the decision boundary, the response is taken to be 1; otherwise, it is 0. * By default the decision boundary is 0.5. * * @param predictors Input predictors. * @param responses Vector to put output predictions of responses into. * @param decisionBoundary Decision boundary (default 0.5). */ void Predict(const arma::mat& predictors, arma::vec& responses, const double decisionBoundary = 0.5) const; /** * Compute the accuracy of the model on the given predictors and responses, * optionally using the given decision boundary. The responses should be * either 0 or 1. Logistic regression returns a value between 0 and 1. If * the value is greater than the decision boundary, the response is taken to * be 1; otherwise, it is 0. By default, the decision boundary is 0.5. * * The accuracy is returned as a percentage, between 0 and 100. * * @param predictors Input predictors. * @param responses Vector of responses. * @param decisionBoundary Decision boundary (default 0.5). * @return Percentage of responses that are predicted correctly. */ double ComputeAccuracy(const arma::mat& predictors, const arma::vec& responses, const double decisionBoundary = 0.5) const; /** * Compute the error of the model. This returns the negative objective * function of the logistic regression log-likelihood function. For the model * to be optimal, the negative log-likelihood function should be minimized. * * @param predictors Input predictors. * @param responses Vector of responses. */ double ComputeError(const arma::mat& predictors, const arma::vec& responses) const; // Returns a string representation of this object. std::string ToString() const; private: //! Vector of trained parameters. arma::vec parameters; //! L2-regularization penalty parameter. double lambda; }; }; // namespace regression }; // namespace mlpack // Include implementation. #include "logistic_regression_impl.hpp" #endif // __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_HPP RcppMLPACK/src/mlpack/methods/rann/0000755000176200001440000000000013647512751016520 5ustar liggesusersRcppMLPACK/src/mlpack/methods/rann/ra_search.hpp0000644000176200001440000003247013647512751021166 0ustar liggesusers/** * @file ra_search.hpp * @author Parikshit Ram * * Defines the RASearch class, which performs an abstract rank-approximate * nearest/farthest neighbor query on two datasets. * * The details of this method can be found in the following paper: * * @inproceedings{ram2009rank, * title={{Rank-Approximate Nearest Neighbor Search: Retaining Meaning and * Speed in High Dimensions}}, * author={{Ram, P. and Lee, D. and Ouyang, H. and Gray, A. G.}}, * booktitle={{Advances of Neural Information Processing Systems}}, * year={2009} * } * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). 
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_RANN_RA_SEARCH_HPP
#define __MLPACK_METHODS_RANN_RA_SEARCH_HPP

#include <mlpack/core.hpp>
#include <mlpack/core/tree/binary_space_tree.hpp>
#include <mlpack/core/metrics/lmetric.hpp>
#include <mlpack/methods/neighbor_search/sort_policies/nearest_neighbor_sort.hpp>

#include "ra_query_stat.hpp"

namespace mlpack {
namespace neighbor {

/**
 * The RASearch class: This class provides a generic manner to perform
 * rank-approximate search via random-sampling. If the 'naive' option is
 * chosen, this rank-approximate search will be done by randomly sampling from
 * the whole set. If the 'naive' option is not chosen, the sampling is done in
 * a stratified manner in the tree as mentioned in the algorithms in Figure 2
 * of the following paper:
 *
 * @inproceedings{ram2009rank,
 *   title={{Rank-Approximate Nearest Neighbor Search: Retaining Meaning and
 *       Speed in High Dimensions}},
 *   author={{Ram, P. and Lee, D. and Ouyang, H. and Gray, A. G.}},
 *   booktitle={{Advances of Neural Information Processing Systems}},
 *   year={2009}
 * }
 *
 * RASearch is currently known to not work with ball trees (#356).
 *
 * @tparam SortPolicy The sort policy for distances; see NearestNeighborSort.
 * @tparam MetricType The metric to use for computation.
 * @tparam TreeType The tree type to use.
 */
template<typename SortPolicy = NearestNeighborSort,
         typename MetricType = mlpack::metric::SquaredEuclideanDistance,
         typename TreeType = tree::BinarySpaceTree<bound::HRectBound<2>,
                                                   RAQueryStat<SortPolicy> > >
class RASearch
{
 public:
  /**
   * Initialize the RASearch object, passing both a query and reference dataset.
   * Optionally, perform the computation in naive mode or single-tree mode. An
   * initialized distance metric can be given, for cases where the metric has
   * internal data (i.e. the distance::MahalanobisDistance class).
   *
   * This method will copy the matrices to internal copies, which are rearranged
   * during tree-building. You can avoid this extra copy by pre-constructing
   * the trees and passing them using a different constructor.
   *
   * @param referenceSet Set of reference points.
   * @param querySet Set of query points.
   * @param naive If true, the rank-approximate search will be performed by
   *     directly sampling the whole set instead of using the stratified
   *     sampling on the tree.
   * @param singleMode If true, single-tree search will be used (as opposed to
   *     dual-tree search).
   * @param metric An optional instance of the MetricType class.
   */
  RASearch(const typename TreeType::Mat& referenceSet,
           const typename TreeType::Mat& querySet,
           const bool naive = false,
           const bool singleMode = false,
           const MetricType metric = MetricType());

  /**
   * Initialize the RASearch object, passing only one dataset, which is
   * used as both the query and the reference dataset. Optionally, perform the
   * computation in naive mode or single-tree mode. An initialized distance
   * metric can be given, for cases where the metric has internal data (i.e.
   * the distance::MahalanobisDistance class).
   *
   * If naive mode is being used and a pre-built tree is given, it may not work:
   * naive mode operates by building a one-node tree (the root node holds all
   * the points). If that condition is not satisfied with the pre-built tree,
   * then naive mode will not work.
   *
   * @param referenceSet Set of reference points.
   * @param naive If true, the rank-approximate search will be performed
   *     by directly sampling the whole set instead of using the stratified
   *     sampling on the tree.
   * @param singleMode If true, single-tree search will be used (as opposed to
   *     dual-tree search).
* @param metric An optional instance of the MetricType class. */ RASearch(const typename TreeType::Mat& referenceSet, const bool naive = false, const bool singleMode = false, const MetricType metric = MetricType()); /** * Initialize the RASearch object with the given datasets and * pre-constructed trees. It is assumed that the points in referenceSet and * querySet correspond to the points in referenceTree and queryTree, * respectively. Optionally, choose to use single-tree mode. Naive mode is * not available as an option for this constructor; instead, to run naive * computation, construct a tree with all of the points in one leaf (i.e. * leafSize = number of points). Additionally, an instantiated distance * metric can be given, for cases where the distance metric holds data. * * There is no copying of the data matrices in this constructor (because * tree-building is not necessary), so this is the constructor to use when * copies absolutely must be avoided. * * @note * Because tree-building (at least with BinarySpaceTree) modifies the ordering * of a matrix, be sure you pass the modified matrix to this object! In * addition, mapping the points of the matrix back to their original indices * is not done when this constructor is used. * @endnote * * @param referenceTree Pre-built tree for reference points. * @param queryTree Pre-built tree for query points. * @param referenceSet Set of reference points corresponding to referenceTree. * @param querySet Set of query points corresponding to queryTree. * @param singleMode Whether single-tree computation should be used (as * opposed to dual-tree computation). * @param metric Instantiated distance metric. */ RASearch(TreeType* referenceTree, TreeType* queryTree, const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool singleMode = false, const MetricType metric = MetricType()); /** * Initialize the RASearch object with the given reference dataset and * pre-constructed tree. It is assumed that the points in referenceSet * correspond to the points in referenceTree. Optionally, choose to use * single-tree mode. Naive mode is not available as an option for this * constructor; instead, to run naive computation, construct a tree with all * the points in one leaf (i.e. leafSize = number of points). Additionally, * an instantiated distance metric can be given, for the case where the * distance metric holds data. * * There is no copying of the data matrices in this constructor (because * tree-building is not necessary), so this is the constructor to use when * copies absolutely must be avoided. * * @note * Because tree-building (at least with BinarySpaceTree) modifies the ordering * of a matrix, be sure you pass the modified matrix to this object! In * addition, mapping the points of the matrix back to their original indices * is not done when this constructor is used. * @endnote * * @param referenceTree Pre-built tree for reference points. * @param referenceSet Set of reference points corresponding to referenceTree. * @param singleMode Whether single-tree computation should be used (as * opposed to dual-tree computation). * @param metric Instantiated distance metric. */ RASearch(TreeType* referenceTree, const typename TreeType::Mat& referenceSet, const bool singleMode = false, const MetricType metric = MetricType()); /** * Delete the RASearch object. The tree is the only member we are * responsible for deleting. The others will take care of themselves. 
   */
  ~RASearch();

  /**
   * Compute the rank approximate nearest neighbors and store the output in the
   * given matrices. The matrices will be set to the size of n columns by k
   * rows, where n is the number of points in the query dataset and k is the
   * number of neighbors being searched for.
   *
   * Note that tau, the rank-approximation parameter, specifies that we are
   * looking for k neighbors with probability alpha of being in the top tau
   * percent of nearest neighbors. So, as an example, if our dataset has 1000
   * points, and we want 5 nearest neighbors with 95% probability of being in
   * the top 5% of nearest neighbors (or, the top 50 nearest neighbors), we set
   * k = 5, tau = 5, and alpha = 0.95.
   *
   * The method will fail (and issue a failure message) if the value of tau is
   * too low: tau must be set such that the number of points in the
   * corresponding percentile of the data is greater than k. Thus, if we choose
   * tau = 0.1 with a dataset of 1000 points and k = 5, then we are attempting
   * to choose 5 nearest neighbors out of the closest 1 point -- this is
   * invalid.
   *
   * @param k Number of neighbors to search for.
   * @param resultingNeighbors Matrix storing lists of neighbors for each query
   *     point.
   * @param distances Matrix storing distances of neighbors for each query
   *     point.
   * @param tau The rank-approximation in percentile of the data. The default
   *     value is 5%.
   * @param alpha The desired success probability. The default value is 0.95.
   * @param sampleAtLeaves Sample at leaves for faster but less accurate
   *     computation. This defaults to 'false'.
   * @param firstLeafExact Traverse to the first leaf without approximation.
   *     This can ensure that the query definitely finds its (near) duplicate
   *     if there exists one. This defaults to 'false' for now.
   * @param singleSampleLimit The limit on the largest node that can be
   *     approximated by sampling. This defaults to 20.
   */
  void Search(const size_t k,
              arma::Mat<size_t>& resultingNeighbors,
              arma::mat& distances,
              const double tau = 5,
              const double alpha = 0.95,
              const bool sampleAtLeaves = false,
              const bool firstLeafExact = false,
              const size_t singleSampleLimit = 20);

  /**
   * This function recursively resets the RAQueryStat of the queryTree to set
   * 'bound' to WorstDistance and the 'numSamplesMade' to 0. This allows a user
   * to perform multiple searches on the same pair of trees, possibly with
   * different levels of approximation without requiring to build a new pair of
   * trees for every new (approximate) search.
   */
  void ResetQueryTree();

  // Returns a string representation of this object.
  std::string ToString() const;

 private:
  //! Copy of reference dataset (if we need it, because tree building modifies
  //! it).
  arma::mat referenceCopy;
  //! Copy of query dataset (if we need it, because tree building modifies it).
  arma::mat queryCopy;

  //! Reference dataset.
  const arma::mat& referenceSet;
  //! Query dataset (may not be given).
  const arma::mat& querySet;

  //! Pointer to the root of the reference tree.
  TreeType* referenceTree;
  //! Pointer to the root of the query tree (might not exist).
  TreeType* queryTree;

  //! If true, this object created the trees and is responsible for them.
  bool treeOwner;
  //! Indicates if a separate query set was passed.
  bool hasQuerySet;

  //! Indicates if naive random sampling on the set is being used.
  bool naive;
  //! Indicates if single-tree search is being used (opposed to dual-tree).
  bool singleMode;

  //! Instantiation of kernel.
  MetricType metric;

  //! Permutations of reference points during tree building.
  std::vector<size_t> oldFromNewReferences;
  //! Permutations of query points during tree building.
  std::vector<size_t> oldFromNewQueries;

  //! Total number of pruned nodes during the neighbor search.
  size_t numberOfPrunes;

  /**
   * @param treeNode The node of the tree whose RAQueryStat is reset and whose
   *     children are to be explored recursively.
   */
  void ResetRAQueryStat(TreeType* treeNode);
}; // class RASearch

}; // namespace neighbor
}; // namespace mlpack

// Include implementation.
#include "ra_search_impl.hpp"

// Include convenient typedefs.
#include "ra_typedef.hpp"

#endif
RcppMLPACK/src/mlpack/methods/rann/ra_query_stat.hpp0000644000176200001440000000474113647512751022115 0ustar liggesusers/**
 * @file ra_query_stat.hpp
 * @author Parikshit Ram
 *
 * Defines the RAQueryStat class, which is the statistic used for
 * rank-approximate nearest neighbor search (RASearch).
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_RANN_RA_QUERY_STAT_HPP
#define __MLPACK_METHODS_RANN_RA_QUERY_STAT_HPP

#include <mlpack/core.hpp>
#include <mlpack/core/tree/binary_space_tree.hpp>
#include <mlpack/core/metrics/lmetric.hpp>
#include <mlpack/methods/neighbor_search/sort_policies/nearest_neighbor_sort.hpp>

namespace mlpack {
namespace neighbor {

/**
 * Extra data for each node in the tree. For neighbor searches, each node only
 * needs to store a bound on neighbor distances.
 *
 * Every query is required to make a minimum number of samples to guarantee the
 * desired approximation error. The 'numSamplesMade' keeps track of the minimum
 * number of samples made by all queries in the node in question.
 */
template<typename SortPolicy>
class RAQueryStat
{
 public:
  /**
   * Initialize the statistic with the worst possible distance according to our
   * sorting policy.
   */
  RAQueryStat() : bound(SortPolicy::WorstDistance()), numSamplesMade(0) { }

  /**
   * Initialization for a node.
   */
  template<typename TreeType>
  RAQueryStat(const TreeType& /* node */) :
      bound(SortPolicy::WorstDistance()),
      numSamplesMade(0)
  { }

  //! Get the bound.
  double Bound() const { return bound; }
  //! Modify the bound.
  double& Bound() { return bound; }

  //! Get the number of samples made.
  size_t NumSamplesMade() const { return numSamplesMade; }
  //! Modify the number of samples made.
  size_t& NumSamplesMade() { return numSamplesMade; }

 private:
  //! The bound on the node's neighbor distances.
  double bound;
  //! The minimum number of samples made by any query in this node.
  size_t numSamplesMade;
};

}; // namespace neighbor
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/methods/rann/ra_search_impl.hpp0000644000176200001440000003420713647512751022203 0ustar liggesusers/**
 * @file ra_search_impl.hpp
 * @author Parikshit Ram
 *
 * Implementation of RASearch class to perform rank-approximate
 * all-nearest-neighbors on two specified data sets.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANN_RA_SEARCH_IMPL_HPP #define __MLPACK_METHODS_RANN_RA_SEARCH_IMPL_HPP #include #include "ra_search_rules.hpp" namespace mlpack { namespace neighbor { namespace aux { //! Call the tree constructor that does mapping. template TreeType* BuildTree( typename TreeType::Mat& dataset, std::vector& oldFromNew, typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == true, TreeType* >::type = 0) { return new TreeType(dataset, oldFromNew); } //! Call the tree constructor that does not do mapping. template TreeType* BuildTree( const typename TreeType::Mat& dataset, const std::vector& /* oldFromNew */, const typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == false, TreeType* >::type = 0) { return new TreeType(dataset); } }; // namespace aux // Construct the object. template RASearch:: RASearch(const typename TreeType::Mat& referenceSetIn, const typename TreeType::Mat& querySetIn, const bool naive, const bool singleMode, const MetricType metric) : referenceSet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), querySet((tree::TreeTraits::RearrangesDataset && !singleMode) ? queryCopy : querySetIn), referenceTree(NULL), queryTree(NULL), treeOwner(!naive), hasQuerySet(true), naive(naive), singleMode(!naive && singleMode), // No single mode if naive. metric(metric), numberOfPrunes(0) { // We'll time tree building. //Timer::Start("tree_building"); if (tree::TreeTraits::RearrangesDataset) { referenceCopy = referenceSetIn; if (!singleMode) queryCopy = querySetIn; } // Construct as a naive object if we need to. if (!naive) { referenceTree = aux::BuildTree(const_cast(referenceSet), oldFromNewReferences); if (!singleMode) queryTree = aux::BuildTree(const_cast(querySet), oldFromNewQueries); } // Stop the timer we started above. //Timer::Stop("tree_building"); } // Construct the object. template RASearch:: RASearch(const typename TreeType::Mat& referenceSetIn, const bool naive, const bool singleMode, const MetricType metric) : referenceSet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), querySet(tree::TreeTraits::RearrangesDataset && !singleMode ? referenceCopy : referenceSetIn), referenceTree(NULL), queryTree(NULL), treeOwner(!naive), hasQuerySet(false), naive(naive), singleMode(!naive && singleMode), // No single mode if naive. metric(metric), numberOfPrunes(0) { // We'll time tree building. //Timer::Start("tree_building"); if (tree::TreeTraits::RearrangesDataset) referenceCopy = referenceSetIn; // Construct as a naive object if we need to. if (!naive) referenceTree = aux::BuildTree(const_cast(referenceSet), oldFromNewReferences); // Stop the timer we started above. //Timer::Stop("tree_building"); } // Construct the object. 
template RASearch:: RASearch(TreeType* referenceTree, TreeType* queryTree, const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool singleMode, const MetricType metric) : referenceSet(referenceSet), querySet(querySet), referenceTree(referenceTree), queryTree(queryTree), treeOwner(false), hasQuerySet(true), naive(false), singleMode(singleMode), metric(metric), numberOfPrunes(0) // Nothing else to initialize. { } // Construct the object. template RASearch:: RASearch(TreeType* referenceTree, const typename TreeType::Mat& referenceSet, const bool singleMode, const MetricType metric) : referenceSet(referenceSet), querySet(referenceSet), referenceTree(referenceTree), queryTree(NULL), treeOwner(false), hasQuerySet(false), naive(false), singleMode(singleMode), metric(metric), numberOfPrunes(0) // Nothing else to initialize. { } /** * The tree is the only member we may be responsible for deleting. The others * will take care of themselves. */ template RASearch:: ~RASearch() { if (treeOwner) { if (referenceTree) delete referenceTree; if (queryTree) delete queryTree; } } /** * Computes the best neighbors and stores them in resultingNeighbors and * distances. */ template void RASearch:: Search(const size_t k, arma::Mat& resultingNeighbors, arma::mat& distances, const double tau, const double alpha, const bool sampleAtLeaves, const bool firstLeafExact, const size_t singleSampleLimit) { //Timer::Start("computing_neighbors"); // If we have built the trees ourselves, then we will have to map all the // indices back to their original indices when this computation is finished. // To avoid an extra copy, we will store the neighbors and distances in a // separate matrix. arma::Mat* neighborPtr = &resultingNeighbors; arma::mat* distancePtr = &distances; // Mapping is only required if this tree type rearranges points and we are not // in naive mode. if (tree::TreeTraits::RearrangesDataset) { if (treeOwner && !(singleMode && hasQuerySet)) distancePtr = new arma::mat; // Query indices need to be mapped. if (treeOwner) neighborPtr = new arma::Mat; // All indices need mapping. } // Set the size of the neighbor and distance matrices. neighborPtr->set_size(k, querySet.n_cols); distancePtr->set_size(k, querySet.n_cols); distancePtr->fill(SortPolicy::WorstDistance()); size_t numPrunes = 0; if (naive) { // We don't need to run the base case on every possible combination of // points; we can achieve the rank approximation guarantee with probability // alpha by sampling the reference set. typedef RASearchRules RuleType; RuleType rules(referenceSet, querySet, *neighborPtr, *distancePtr, metric, tau, alpha, naive, sampleAtLeaves, firstLeafExact, singleSampleLimit); // Find how many samples from the reference set we need and sample uniformly // from the reference set without replacement. const size_t numSamples = rules.MinimumSamplesReqd(referenceSet.n_cols, k, tau, alpha); arma::uvec distinctSamples; rules.ObtainDistinctSamples(numSamples, referenceSet.n_cols, distinctSamples); // Run the base case on each combination of query point and sampled // reference point. for (size_t i = 0; i < querySet.n_cols; ++i) for (size_t j = 0; j < distinctSamples.n_elem; ++j) rules.BaseCase(i, (size_t) distinctSamples[j]); } else if (singleMode) { // Create the helper object for the tree traversal. Initialization of // RASearchRules already implicitly performs the naive tree traversal. 
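    // (Exposition note, not in the original source: in single-tree mode each
    // query point descends the reference tree on its own, and the rules
    // object decides per node whether to prune by distance, finish the node
    // by sampling, or keep recursing, so the (tau, alpha) guarantee is
    // enforced independently for every query.)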
typedef RASearchRules RuleType; RuleType rules(referenceSet, querySet, *neighborPtr, *distancePtr, metric, tau, alpha, naive, sampleAtLeaves, firstLeafExact, singleSampleLimit); // If the reference root node is a leaf, then the sampling has already been // done in the RASearchRules constructor. This happens when naive = true. if (!referenceTree->IsLeaf()) { Rcpp::Rcout << "Performing single-tree traversal..." << std::endl; // Create the traverser. typename TreeType::template SingleTreeTraverser traverser(rules); // Now have it traverse for each point. for (size_t i = 0; i < querySet.n_cols; ++i) traverser.Traverse(i, *referenceTree); numPrunes = traverser.NumPrunes(); Rcpp::Rcout << "Single-tree traversal complete." << std::endl; Rcpp::Rcout << "Average number of distance calculations per query point: " << (rules.NumDistComputations() / querySet.n_cols) << "." << std::endl; } } else // Dual-tree recursion. { Rcpp::Rcout << "Performing dual-tree traversal..." << std::endl; typedef RASearchRules RuleType; RuleType rules(referenceSet, querySet, *neighborPtr, *distancePtr, metric, tau, alpha, sampleAtLeaves, firstLeafExact, singleSampleLimit); typename TreeType::template DualTreeTraverser traverser(rules); if (queryTree) { Rcpp::Rcout << "Query statistic pre-search: " << queryTree->Stat().NumSamplesMade() << std::endl; traverser.Traverse(*queryTree, *referenceTree); } else { Rcpp::Rcout << "Query statistic pre-search: " << referenceTree->Stat().NumSamplesMade() << std::endl; traverser.Traverse(*referenceTree, *referenceTree); } numPrunes = traverser.NumPrunes(); Rcpp::Rcout << "Dual-tree traversal complete." << std::endl; Rcpp::Rcout << "Average number of distance calculations per query point: " << (rules.NumDistComputations() / querySet.n_cols) << "." << std::endl; } //Timer::Stop("computing_neighbors"); Rcpp::Rcout << "Pruned " << numPrunes << " nodes." << std::endl; // Now, do we need to do mapping of indices? if (!treeOwner || !tree::TreeTraits::RearrangesDataset) { // No mapping needed. We are done. return; } else if (treeOwner && hasQuerySet && !singleMode) // Map both sets. { // Set size of output matrices correctly. resultingNeighbors.set_size(k, querySet.n_cols); distances.set_size(k, querySet.n_cols); for (size_t i = 0; i < distances.n_cols; i++) { // Map distances (copy a column). distances.col(oldFromNewQueries[i]) = distancePtr->col(i); // Map indices of neighbors. for (size_t j = 0; j < distances.n_rows; j++) { resultingNeighbors(j, oldFromNewQueries[i]) = oldFromNewReferences[(*neighborPtr)(j, i)]; } } // Finished with temporary matrices. delete neighborPtr; delete distancePtr; } else if (treeOwner && !hasQuerySet) { // No query tree -- map both references and queries. resultingNeighbors.set_size(k, querySet.n_cols); distances.set_size(k, querySet.n_cols); for (size_t i = 0; i < distances.n_cols; i++) { // Map distances (copy a column). distances.col(oldFromNewReferences[i]) = distancePtr->col(i); // Map indices of neighbors. for (size_t j = 0; j < distances.n_rows; j++) { resultingNeighbors(j, oldFromNewReferences[i]) = oldFromNewReferences[(*neighborPtr)(j, i)]; } } } else if (treeOwner && hasQuerySet && singleMode) // Map only references. { // Set size of neighbor indices matrix correctly. resultingNeighbors.set_size(k, querySet.n_cols); // Map indices of neighbors. 
for (size_t i = 0; i < resultingNeighbors.n_cols; i++) { for (size_t j = 0; j < resultingNeighbors.n_rows; j++) { resultingNeighbors(j, i) = oldFromNewReferences[(*neighborPtr)(j, i)]; } } // Finished with temporary matrix. delete neighborPtr; } } // Search template void RASearch::ResetQueryTree() { if (!singleMode) { if (queryTree) ResetRAQueryStat(queryTree); else ResetRAQueryStat(referenceTree); } } template void RASearch::ResetRAQueryStat( TreeType* treeNode) { treeNode->Stat().Bound() = SortPolicy::WorstDistance(); treeNode->Stat().NumSamplesMade() = 0; for (size_t i = 0; i < treeNode->NumChildren(); i++) ResetRAQueryStat(&treeNode->Child(i)); } // Returns a String of the Object. template std::string RASearch::ToString() const { std::ostringstream convert; convert << "RA Search [" << this << "]" << std::endl; convert << " Reference Set: " << referenceSet.n_rows << "x" ; convert << referenceSet.n_cols << std::endl; if (&referenceSet != &querySet) convert << " QuerySet: " << querySet.n_rows << "x" << querySet.n_cols << std::endl; if (naive) convert << " Naive: TRUE" << std::endl; if (singleMode) convert << " Single Node: TRUE" << std::endl; convert << " Metric: " << std::endl << mlpack::util::Indent(metric.ToString(),2); return convert.str(); } }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/rann/ra_typedef.hpp0000644000176200001440000000503513647512751021356 0ustar liggesusers/** * @file ra_typedef.hpp * @author Parikshit Ram * * Simple typedefs describing template instantiations of the RASearch * class which are commonly used. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_RANN_RA_TYPEDEF_HPP #define __MLPACK_RANN_RA_TYPEDEF_HPP // In case someone included this directly. #include "ra_search.hpp" #include #include #include namespace mlpack { namespace neighbor { /** * The AllkRANN class is the all-k-rank-approximate-nearest-neighbors method. * It returns squared L2 distances (squared Euclidean distances) for each of the * k rank-approximate nearest-neighbors. Squared distances are used because * they are slightly faster than non-squared distances (they have one fewer call * to sqrt()). * * The approximation is controlled with two parameters (see allkrann_main.cpp) * which can be specified at search time. So the tree building is done only once * while the search can be performed multiple times with different approximation * levels. */ typedef RASearch<> AllkRANN; /** * The AllkRAFN class is the all-k-rank-approximate-farthest-neighbors method. * It returns squared L2 distances (squared Euclidean distances) for each of the * k rank-approximate farthest-neighbors. Squared distances are used because * they are slightly faster than non-squared distances (they have one fewer * call to sqrt()). * * The approximation is controlled with two parameters (see allkrann_main.cpp) * which can be specified at search time. 
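 * A minimal usage sketch (hedged: it assumes the matrix-based RASearch
 * constructor and the Search() overload defined in ra_search_impl.hpp, and
 * the variable names are hypothetical):
 *
 * @code
 * arma::mat data; // One point per column, filled elsewhere.
 * AllkRAFN rafn(data);
 * arma::Mat<size_t> neighbors;
 * arma::mat distances;
 * rafn.Search(1, neighbors, distances, 5.0, 0.95); // k, tau, alpha.
 * @endcode
 *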
So the tree building is done only once * while the search can be performed multiple times with different approximation * levels. */ typedef RASearch AllkRAFN; }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/rann/ra_search_rules.hpp0000644000176200001440000003110413647512751022371 0ustar liggesusers/** * @file ra_search_rules.hpp * @author Parikshit Ram * * Defines the pruning rules and base case rules necessary to perform a * tree-based rank-approximate search (with an arbitrary tree) for the RASearch * class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANN_RA_SEARCH_RULES_HPP #define __MLPACK_METHODS_RANN_RA_SEARCH_RULES_HPP #include "../neighbor_search/ns_traversal_info.hpp" #include "ra_search.hpp" // For friend declaration. namespace mlpack { namespace neighbor { template class RASearchRules { public: RASearchRules(const arma::mat& referenceSet, const arma::mat& querySet, arma::Mat& neighbors, arma::mat& distances, MetricType& metric, const double tau = 5, const double alpha = 0.95, const bool naive = false, const bool sampleAtLeaves = false, const bool firstLeafExact = false, const size_t singleSampleLimit = 20); double BaseCase(const size_t queryIndex, const size_t referenceIndex); /** * Get the score for recursion order. A low score indicates priority for * recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). * * For rank-approximation, the scoring function first checks if pruning * by distance is possible. * If yes, then the node is given the score of * 'DBL_MAX' and the expected number of samples from that node are * added to the number of samples made for the query. * * If no, then the function tries to see if the node can be pruned by * approximation. If number of samples required from this node is small * enough, then that number of samples are acquired from this node * and the score is set to be 'DBL_MAX'. * * If the pruning by approximation is not possible either, the algorithm * continues with the usual tree-traversal. * * @param queryIndex Index of query point. * @param referenceNode Candidate node to be recursed into. */ double Score(const size_t queryIndex, TreeType& referenceNode); /** * Get the score for recursion order. A low score indicates priority for * recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). * * For rank-approximation, the scoring function first checks if pruning * by distance is possible. * If yes, then the node is given the score of * 'DBL_MAX' and the expected number of samples from that node are * added to the number of samples made for the query. * * If no, then the function tries to see if the node can be pruned by * approximation. 
   * If the number of samples required from this node is small
   * enough, then that many samples are acquired from this node
   * and the score is set to 'DBL_MAX'.
   *
   * If pruning by approximation is not possible either, the algorithm
   * continues with the usual tree-traversal.
   *
   * @param queryIndex Index of query point.
   * @param referenceNode Candidate node to be recursed into.
   * @param baseCaseResult Result of BaseCase(queryIndex, referenceNode).
   */
  double Score(const size_t queryIndex,
               TreeType& referenceNode,
               const double baseCaseResult);

  /**
   * Re-evaluate the score for recursion order.  A low score indicates priority
   * for recursion, while DBL_MAX indicates that the node should not be
   * recursed into at all (it should be pruned).  This is used when the score
   * has already been calculated, but another recursion may have modified the
   * bounds for pruning.  So the old score is checked against the new pruning
   * bound.
   *
   * For rank-approximation, it also checks whether the number of samples left
   * for a query to satisfy the rank constraint is small enough at this
   * point of the algorithm; if so, this node is approximated by sampling
   * and given a new score of 'DBL_MAX'.
   *
   * @param queryIndex Index of query point.
   * @param referenceNode Candidate node to be recursed into.
   * @param oldScore Old score produced by Score() (or Rescore()).
   */
  double Rescore(const size_t queryIndex,
                 TreeType& referenceNode,
                 const double oldScore);

  /**
   * Get the score for recursion order.  A low score indicates priority for
   * recursion, while DBL_MAX indicates that the node should not be recursed
   * into at all (it should be pruned).
   *
   * For the rank-approximation, we check if the referenceNode can be
   * approximated by sampling.  If it can be, enough samples are made for
   * every query in the queryNode.  No further query-tree traversal is
   * performed.
   *
   * The 'NumSamplesMade' query stat is propagated up the tree.  Then,
   * if pruning occurs (by distance or by sampling), the 'NumSamplesMade'
   * stat is not propagated down the tree.  If no pruning occurs, the
   * stat is propagated down the tree.
   *
   * @param queryNode Candidate query node to recurse into.
   * @param referenceNode Candidate reference node to recurse into.
   */
  double Score(TreeType& queryNode, TreeType& referenceNode);

  /**
   * Get the score for recursion order, passing the base case result (in the
   * situation where it may be needed to calculate the recursion order).  A low
   * score indicates priority for recursion, while DBL_MAX indicates that the
   * node should not be recursed into at all (it should be pruned).
   *
   * For the rank-approximation, we check if the referenceNode can be
   * approximated by sampling.  If it can be, enough samples are made for
   * every query in the queryNode.  No further query-tree traversal is
   * performed.
   *
   * The 'NumSamplesMade' query stat is propagated up the tree.  Then,
   * if pruning occurs (by distance or by sampling), the 'NumSamplesMade'
   * stat is not propagated down the tree.  If no pruning occurs, the
   * stat is propagated down the tree.
   *
   * @param queryNode Candidate query node to recurse into.
   * @param referenceNode Candidate reference node to recurse into.
   * @param baseCaseResult Result of BaseCase(queryIndex, referenceNode).
   */
  double Score(TreeType& queryNode,
               TreeType& referenceNode,
               const double baseCaseResult);

  /**
   * Re-evaluate the score for recursion order.  A low score indicates priority
   * for recursion, while DBL_MAX indicates that the node should not be
   * recursed into at all (it should be pruned).
   * This is used when the score has already been calculated, but another
   * recursion may have modified the bounds for pruning.  So the old score is
   * checked against the new pruning bound.
   *
   * For the rank-approximation, we check if the referenceNode can be
   * approximated by sampling.  If it can be, enough samples are made for
   * every query in the queryNode.  No further query-tree traversal is
   * performed.
   *
   * The 'NumSamplesMade' query stat is propagated up the tree.  Then,
   * if pruning occurs (by distance or by sampling), the 'NumSamplesMade'
   * stat is not propagated down the tree.  If no pruning occurs, the
   * stat is propagated down the tree.
   *
   * @param queryNode Candidate query node to recurse into.
   * @param referenceNode Candidate reference node to recurse into.
   * @param oldScore Old score produced by Score() (or Rescore()).
   */
  double Rescore(TreeType& queryNode,
                 TreeType& referenceNode,
                 const double oldScore);

  size_t NumDistComputations() { return numDistComputations; }

  size_t NumEffectiveSamples()
  {
    if (numSamplesMade.n_elem == 0)
      return 0;
    else
      return arma::sum(numSamplesMade);
  }

  typedef neighbor::NeighborSearchTraversalInfo<TreeType> TraversalInfoType;

  const TraversalInfoType& TraversalInfo() const { return traversalInfo; }
  TraversalInfoType& TraversalInfo() { return traversalInfo; }

 private:
  //! The reference set.
  const arma::mat& referenceSet;
  //! The query set.
  const arma::mat& querySet;

  //! The matrix the resultant neighbor indices should be stored in.
  arma::Mat<size_t>& neighbors;
  //! The matrix the resultant neighbor distances should be stored in.
  arma::mat& distances;

  //! The instantiated metric.
  MetricType& metric;

  //! Whether to sample at leaves or just use all of them.
  bool sampleAtLeaves;
  //! Whether to do exact computation on the first leaf before any sampling.
  bool firstLeafExact;
  //! The limit on the largest node that can be approximated by sampling.
  size_t singleSampleLimit;
  //! The minimum number of samples required per query.
  size_t numSamplesReqd;
  //! The number of samples made for every query.
  arma::Col<size_t> numSamplesMade;
  //! The sampling ratio.
  double samplingRatio;

  // TO REMOVE: just for testing.
  size_t numDistComputations;

  TraversalInfoType traversalInfo;

  /**
   * Insert a point into the neighbors and distances matrices; this is a helper
   * function.
   *
   * @param queryIndex Index of point whose neighbors we are inserting into.
   * @param pos Position in list to insert into.
   * @param neighbor Index of reference point which is being inserted.
   * @param distance Distance from query point to reference point.
   */
  void InsertNeighbor(const size_t queryIndex,
                      const size_t pos,
                      const size_t neighbor,
                      const double distance);

  /**
   * Compute the minimum number of samples required to guarantee
   * the given rank-approximation and success probability.
   *
   * @param n Size of the set to be sampled from.
   * @param k The number of neighbors required within the rank-approximation.
   * @param tau The rank-approximation in percentile of the data.
   * @param alpha The success probability desired.
   */
  size_t MinimumSamplesReqd(const size_t n,
                            const size_t k,
                            const double tau,
                            const double alpha) const;

  /**
   * Compute the success probability of obtaining 'k' neighbors from a
   * set of size 'n' within the top 't' neighbors if 'm' samples are made.
   *
   * @param n Size of the set being sampled from.
   * @param k The number of neighbors required within the rank-approximation.
   * @param m The number of random samples.
   * @param t The desired rank-approximation.
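   *
   * (A hedged worked example for the k = 1 closed form implemented in
   * ra_search_rules_impl.hpp, with hypothetical numbers: for n = 10000 and
   * t = 100, eps = t / n = 0.01 and the success probability is
   * 1 - (1 - eps)^m = 1 - 0.99^m; requiring this to exceed alpha = 0.95 gives
   * m >= log(0.05) / log(0.99), i.e. roughly 299 samples.)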
*/ double SuccessProbability(const size_t n, const size_t k, const size_t m, const size_t t) const; /** * Pick up desired number of samples (with replacement) from a given range * of integers so that only the distinct samples are returned from * the range [0 - specified upper bound) * * @param numSamples Number of random samples. * @param rangeUpperBound The upper bound on the range of integers. * @param distinctSamples The list of the distinct samples. */ void ObtainDistinctSamples(const size_t numSamples, const size_t rangeUpperBound, arma::uvec& distinctSamples) const; /** * Perform actual scoring for single-tree case. */ double Score(const size_t queryIndex, TreeType& referenceNode, const double distance, const double bestDistance); /** * Perform actual scoring for dual-tree case. */ double Score(TreeType& queryNode, TreeType& referenceNode, const double distance, const double bestDistance); // So that RASearch can access ObtainDistinctSamples() and // MinimumSamplesReqd(). Maybe refactoring is a better solution but this is // okay for now. friend class RASearch; }; // class RASearchRules }; // namespace neighbor }; // namespace mlpack // Include implementation. #include "ra_search_rules_impl.hpp" #endif // __MLPACK_METHODS_RANN_RA_SEARCH_RULES_HPP RcppMLPACK/src/mlpack/methods/rann/ra_search_rules_impl.hpp0000644000176200001440000010657713647512751023433 0ustar liggesusers/** * @file ra_search_rules_impl.hpp * @author Parikshit Ram * * Implementation of RASearchRules. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANN_RA_SEARCH_RULES_IMPL_HPP #define __MLPACK_METHODS_RANN_RA_SEARCH_RULES_IMPL_HPP // In case it hasn't been included yet. #include "ra_search_rules.hpp" namespace mlpack { namespace neighbor { template RASearchRules:: RASearchRules(const arma::mat& referenceSet, const arma::mat& querySet, arma::Mat& neighbors, arma::mat& distances, MetricType& metric, const double tau, const double alpha, const bool naive, const bool sampleAtLeaves, const bool firstLeafExact, const size_t singleSampleLimit) : referenceSet(referenceSet), querySet(querySet), neighbors(neighbors), distances(distances), metric(metric), sampleAtLeaves(sampleAtLeaves), firstLeafExact(firstLeafExact), singleSampleLimit(singleSampleLimit) { // Validate tau to make sure that the rank approximation is greater than the // number of neighbors requested. // The rank approximation. const size_t n = referenceSet.n_cols; const size_t k = neighbors.n_rows; const size_t t = (size_t) std::ceil(tau * (double) n / 100.0); if (t < k) { Rcpp::Rcout << "Rank-approximation percentile " << tau << " corresponds to " << t << " points, which is less than k (" << k << ")."; Rcpp::Rcout << "Cannot return " << k << " approximate nearest neighbors " << "from the nearest " << t << " points. Increase tau!" 
<< std::endl; } else if (t == k) Rcpp::Rcout << "Rank-approximation percentile " << tau << " corresponds to " << t << " points; because k = " << k << ", this is exact search!" << std::endl; //Timer::Start("computing_number_of_samples_reqd"); numSamplesReqd = MinimumSamplesReqd(n, k, tau, alpha); //Timer::Stop("computing_number_of_samples_reqd"); // Initialize some statistics to be collected during the search. numSamplesMade = arma::zeros >(querySet.n_cols); numDistComputations = 0; samplingRatio = (double) numSamplesReqd / (double) n; Rcpp::Rcout << "Minimum samples required per query: " << numSamplesReqd << ", sampling ratio: " << samplingRatio << std::endl; if (naive) // No tree traversal; just do naive sampling here. { // Sample enough points. for (size_t i = 0; i < querySet.n_cols; ++i) { arma::uvec distinctSamples; ObtainDistinctSamples(numSamplesReqd, n, distinctSamples); for (size_t j = 0; j < distinctSamples.n_elem; j++) BaseCase(i, (size_t) distinctSamples[j]); } } } template inline force_inline void RASearchRules:: ObtainDistinctSamples(const size_t numSamples, const size_t rangeUpperBound, arma::uvec& distinctSamples) const { // Keep track of the points that are sampled. arma::Col sampledPoints; sampledPoints.zeros(rangeUpperBound); for (size_t i = 0; i < numSamples; i++) sampledPoints[(size_t) math::RandInt(rangeUpperBound)]++; distinctSamples = arma::find(sampledPoints > 0); return; } template size_t RASearchRules:: MinimumSamplesReqd(const size_t n, const size_t k, const double tau, const double alpha) const { size_t ub = n; // The upper bound on the binary search. size_t lb = k; // The lower bound on the binary search. size_t m = lb; // The minimum number of random samples. // The rank-approximation. const size_t t = (size_t) std::ceil(tau * (double) n / 100.0); double prob; //Log::Assert(alpha <= 1.0); // going through all values of sample sizes // to find the minimum samples required to satisfy the // desired bound bool done = false; // This performs a binary search on the integer values between 'lb = k' // and 'ub = n' to find the minimum number of samples 'm' required to obtain // the desired success probability 'alpha'. do { prob = SuccessProbability(n, k, m, t); if (prob > alpha) { if (prob - alpha < 0.001 || ub < lb + 2) { done = true; break; } else ub = m; } else { if (prob < alpha) { if (m == lb) { m++; continue; } else lb = m; } else { done = true; break; } } m = (ub + lb) / 2; } while (!done); return (std::min(m + 1, n)); } template double RASearchRules::SuccessProbability( const size_t n, const size_t k, const size_t m, const size_t t) const { if (k == 1) { if (m > n - t) return 1.0; double eps = (double) t / (double) n; return 1.0 - std::pow(1.0 - eps, (double) m); } // Faster implementation for topK = 1. else { if (m < k) return 0.0; if (m > n - t + k - 1) return 1.0; double eps = (double) t / (double) n; double sum = 0.0; // The probability that 'k' of the 'm' samples lie within the top 't' // of the neighbors is given by: // sum_{j = k}^m Choose(m, j) (t/n)^j (1 - t/n)^{m - j} // which is also equal to // 1 - sum_{j = 0}^{k - 1} Choose(m, j) (t/n)^j (1 - t/n)^{m - j} // // So this is a m - k term summation or a k term summation. So if // m > 2k, do the k term summation, otherwise do the m term summation. size_t lb; size_t ub; bool topHalf; if (2 * k < m) { // Compute 1 - sum_{j = 0}^{k - 1} Choose(m, j) eps^j (1 - eps)^{m - j} // eps = t/n. // // Choosing 'lb' as 1 and 'ub' as k so as to sum from 1 to (k - 1), and // add the term (1 - eps)^m term separately. 
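    // (Illustrative aside, not part of the original source, with hypothetical
    // numbers: for m = 10, k = 2 and eps = 0.1, the summation below gives
    // 1 - (0.9)^10 - 10 * 0.1 * (0.9)^9 = 1 - 0.3487 - 0.3874, roughly 0.264,
    // i.e. about a 26% chance that at least 2 of the 10 samples land in the
    // top t ranks.)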
lb = 1; ub = k; topHalf = true; sum = std::pow(1 - eps, (double) m); } else { // Compute sum_{j = k}^m Choose(m, j) eps^j (1 - eps)^{m - j} // eps = t/n. // // Choosing 'lb' as k and 'ub' as m so as to sum from k to (m - 1), and // add the term eps^m term separately. lb = k; ub = m; topHalf = false; sum = std::pow(eps, (double) m); } for (size_t j = lb; j < ub; j++) { // Compute Choose(m, j). double mCj = (double) m; size_t jTrans; // If j < m - j, compute Choose(m, j). // If j > m - j, compute Choose(m, m - j). if (topHalf) jTrans = j; else jTrans = m - j; for(size_t i = 2; i <= jTrans; i++) { mCj *= (double) (m - (i - 1)); mCj /= (double) i; } sum += (mCj * std::pow(eps, (double) j) * std::pow(1.0 - eps, (double) (m - j))); } if (topHalf) sum = 1.0 - sum; return sum; } // For k > 1. } template inline force_inline double RASearchRules::BaseCase( const size_t queryIndex, const size_t referenceIndex) { // If the datasets are the same, then this search is only using one dataset // and we should not return identical points. if ((&querySet == &referenceSet) && (queryIndex == referenceIndex)) return 0.0; double distance = metric.Evaluate(querySet.unsafe_col(queryIndex), referenceSet.unsafe_col(referenceIndex)); // If this distance is better than any of the current candidates, the // SortDistance() function will give us the position to insert it into. arma::vec queryDist = distances.unsafe_col(queryIndex); arma::Col queryIndices = neighbors.unsafe_col(queryIndex); size_t insertPosition = SortPolicy::SortDistance(queryDist, queryIndices, distance); // SortDistance() returns (size_t() - 1) if we shouldn't add it. if (insertPosition != (size_t() - 1)) InsertNeighbor(queryIndex, insertPosition, referenceIndex, distance); numSamplesMade[queryIndex]++; // TO REMOVE numDistComputations++; return distance; } template inline double RASearchRules::Score( const size_t queryIndex, TreeType& referenceNode) { const arma::vec queryPoint = querySet.unsafe_col(queryIndex); const double distance = SortPolicy::BestPointToNodeDistance(queryPoint, &referenceNode); const double bestDistance = distances(distances.n_rows - 1, queryIndex); return Score(queryIndex, referenceNode, distance, bestDistance); } template inline double RASearchRules::Score( const size_t queryIndex, TreeType& referenceNode, const double baseCaseResult) { const arma::vec queryPoint = querySet.unsafe_col(queryIndex); const double distance = SortPolicy::BestPointToNodeDistance(queryPoint, &referenceNode, baseCaseResult); const double bestDistance = distances(distances.n_rows - 1, queryIndex); return Score(queryIndex, referenceNode, distance, bestDistance); } template inline double RASearchRules::Score( const size_t queryIndex, TreeType& referenceNode, const double distance, const double bestDistance) { // If this is better than the best distance we've seen so far, maybe there // will be something down this node. Also check if enough samples are already // made for this query. if (SortPolicy::IsBetter(distance, bestDistance) && numSamplesMade[queryIndex] < numSamplesReqd) { // We cannot prune this node; try approximating it by sampling. // If we are required to visit the first leaf (to find possible duplicates), // make sure we do not approximate. if (numSamplesMade[queryIndex] > 0 || !firstLeafExact) { // Check if this node can be approximated by sampling. 
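      // (Illustrative aside with hypothetical numbers: if samplingRatio is
      // 0.03 and the reference node holds 200 descendants, samplesReqd =
      // ceil(0.03 * 200) = 6; since 6 <= the default singleSampleLimit of 20,
      // the node can be approximated by 6 random base cases and then pruned
      // with a score of DBL_MAX.)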
size_t samplesReqd = (size_t) std::ceil(samplingRatio * (double) referenceNode.NumDescendants()); samplesReqd = std::min(samplesReqd, numSamplesReqd - numSamplesMade[queryIndex]); if (samplesReqd > singleSampleLimit && !referenceNode.IsLeaf()) { // If too many samples required and not at a leaf, then can't prune. return distance; } else { if (!referenceNode.IsLeaf()) { // Then samplesReqd <= singleSampleLimit. // Hence, approximate the node by sampling enough number of points. arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t i = 0; i < distinctSamples.n_elem; i++) // The counting of the samples are done in the 'BaseCase' function // so no book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[i])); // Node approximated, so we can prune it. return DBL_MAX; } else // We are at a leaf. { if (sampleAtLeaves) // If allowed to sample at leaves. { // Approximate node by sampling enough number of points. arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t i = 0; i < distinctSamples.n_elem; i++) // The counting of the samples are done in the 'BaseCase' function // so no book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[i])); // (Leaf) node approximated, so we can prune it. return DBL_MAX; } else { // Not allowed to sample from leaves, so cannot prune. return distance; } } } } else { // Try first to visit the first leaf to boost your accuracy and find // (near) duplicates if they exist. return distance; } } else { // Either there cannot be anything better in this node, or enough number of // samples are already made. So prune it. // Add 'fake' samples from this node; they are fake because the distances to // these samples need not be computed. // If enough samples are already made, this step does not change the result // of the search. numSamplesMade[queryIndex] += (size_t) std::floor( samplingRatio * (double) referenceNode.NumDescendants()); return DBL_MAX; } } template inline double RASearchRules:: Rescore(const size_t queryIndex, TreeType& referenceNode, const double oldScore) { // If we are already pruning, still prune. if (oldScore == DBL_MAX) return oldScore; // Just check the score again against the distances. const double bestDistance = distances(distances.n_rows - 1, queryIndex); // If this is better than the best distance we've seen so far, // maybe there will be something down this node. // Also check if enough samples are already made for this query. if (SortPolicy::IsBetter(oldScore, bestDistance) && numSamplesMade[queryIndex] < numSamplesReqd) { // We cannot prune this node; thus, we try approximating this node by // sampling. // Here, we assume that since we are re-scoring, the algorithm has already // sampled some candidates, and if specified, also traversed to the first // leaf. So no check regarding that is made any more. // Check if this node can be approximated by sampling. size_t samplesReqd = (size_t) std::ceil(samplingRatio * (double) referenceNode.NumDescendants()); samplesReqd = std::min(samplesReqd, numSamplesReqd - numSamplesMade[queryIndex]); if (samplesReqd > singleSampleLimit && !referenceNode.IsLeaf()) { // If too many samples are required and we are not at a leaf, then we // can't prune. return oldScore; } else { if (!referenceNode.IsLeaf()) { // Then, samplesReqd <= singleSampleLimit. 
Hence, approximate the node // by sampling enough number of points. arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t i = 0; i < distinctSamples.n_elem; i++) // The counting of the samples are done in the 'BaseCase' function so // no book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[i])); // Node approximated, so we can prune it. return DBL_MAX; } else // We are at a leaf. { if (sampleAtLeaves) { // Approximate node by sampling enough points. arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t i = 0; i < distinctSamples.n_elem; i++) // The counting of the samples are done in the 'BaseCase' function // so no book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[i])); // (Leaf) node approximated, so we can prune it. return DBL_MAX; } else { // We cannot sample from leaves, so we cannot prune. return oldScore; } } } } else { // Either there cannot be anything better in this node, or enough number of // samples are already made, so prune it. // Add 'fake' samples from this node; they are fake because the distances to // these samples need not be computed. If enough samples are already made, // this step does not change the result of the search. numSamplesMade[queryIndex] += (size_t) std::floor(samplingRatio * (double) referenceNode.NumDescendants()); return DBL_MAX; } } // Rescore(point, node, oldScore) template inline double RASearchRules::Score( TreeType& queryNode, TreeType& referenceNode) { // First try to find the distance bound to check if we can prune by distance. // Calculate the best node-to-node distance. const double distance = SortPolicy::BestNodeToNodeDistance(&queryNode, &referenceNode); double pointBound = DBL_MAX; double childBound = DBL_MAX; const double maxDescendantDistance = queryNode.FurthestDescendantDistance(); for (size_t i = 0; i < queryNode.NumPoints(); i++) { const double bound = distances(distances.n_rows - 1, queryNode.Point(i)) + maxDescendantDistance; if (bound < pointBound) pointBound = bound; } for (size_t i = 0; i < queryNode.NumChildren(); i++) { const double bound = queryNode.Child(i).Stat().Bound(); if (bound < childBound) childBound = bound; } // Update the bound. queryNode.Stat().Bound() = std::min(pointBound, childBound); const double bestDistance = queryNode.Stat().Bound(); return Score(queryNode, referenceNode, distance, bestDistance); } template inline double RASearchRules::Score( TreeType& queryNode, TreeType& referenceNode, const double baseCaseResult) { // First try to find the distance bound to check if we can prune // by distance. // Find the best node-to-node distance. 
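  // (Illustrative aside for the bound recomputed just below, with
  // hypothetical numbers: if the worst current k-th candidate distance among
  // the query node's own points is 4.0 and its furthest-descendant distance
  // is 1.5, then pointBound = 5.5; if the tightest child bound is 5.0, the
  // node's pruning bound becomes min(5.5, 5.0) = 5.0.)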
const double distance = SortPolicy::BestNodeToNodeDistance(&queryNode, &referenceNode, baseCaseResult); double pointBound = DBL_MAX; double childBound = DBL_MAX; const double maxDescendantDistance = queryNode.FurthestDescendantDistance(); for (size_t i = 0; i < queryNode.NumPoints(); i++) { const double bound = distances(distances.n_rows - 1, queryNode.Point(i)) + maxDescendantDistance; if (bound < pointBound) pointBound = bound; } for (size_t i = 0; i < queryNode.NumChildren(); i++) { const double bound = queryNode.Child(i).Stat().Bound(); if (bound < childBound) childBound = bound; } // update the bound queryNode.Stat().Bound() = std::min(pointBound, childBound); const double bestDistance = queryNode.Stat().Bound(); return Score(queryNode, referenceNode, distance, bestDistance); } template inline double RASearchRules::Score( TreeType& queryNode, TreeType& referenceNode, const double distance, const double bestDistance) { // Update the number of samples made for this node -- propagate up from child // nodes if child nodes have made samples that the parent node is not aware // of. Remember, we must propagate down samples made to the child nodes if // 'queryNode' descend is deemed necessary. // Only update from children if a non-leaf node, obviously. if (!queryNode.IsLeaf()) { size_t numSamplesMadeInChildNodes = std::numeric_limits::max(); // Find the minimum number of samples made among all children. for (size_t i = 0; i < queryNode.NumChildren(); i++) { const size_t numSamples = queryNode.Child(i).Stat().NumSamplesMade(); if (numSamples < numSamplesMadeInChildNodes) numSamplesMadeInChildNodes = numSamples; } // The number of samples made for a node is propagated up from the child // nodes if the child nodes have made samples that the parent (which is the // current 'queryNode') is not aware of. queryNode.Stat().NumSamplesMade() = std::max( queryNode.Stat().NumSamplesMade(), numSamplesMadeInChildNodes); } // Now check if the node-pair interaction can be pruned. // If this is better than the best distance we've seen so far, maybe there // will be something down this node. Also check if enough samples are already // made for this 'queryNode'. if (SortPolicy::IsBetter(distance, bestDistance) && queryNode.Stat().NumSamplesMade() < numSamplesReqd) { // We cannot prune this node; try approximating this node by sampling. // If we are required to visit the first leaf (to find possible duplicates), // make sure we do not approximate. if (queryNode.Stat().NumSamplesMade() > 0 || !firstLeafExact) { // Check if this node can be approximated by sampling. size_t samplesReqd = (size_t) std::ceil(samplingRatio * (double) referenceNode.NumDescendants()); samplesReqd = std::min(samplesReqd, numSamplesReqd - queryNode.Stat().NumSamplesMade()); if (samplesReqd > singleSampleLimit && !referenceNode.IsLeaf()) { // If too many samples are required and we are not at a leaf, then we // can't prune. Since query tree descent is necessary now, propagate // the number of samples made down to the children. // Iterate through all children and propagate the number of samples made // to the children. Only update if the parent node has made samples the // children have not seen. for (size_t i = 0; i < queryNode.NumChildren(); i++) queryNode.Child(i).Stat().NumSamplesMade() = std::max( queryNode.Stat().NumSamplesMade(), queryNode.Child(i).Stat().NumSamplesMade()); return distance; } else { if (!referenceNode.IsLeaf()) { // Then samplesReqd <= singleSampleLimit. 
Hence, approximate node by // sampling enough number of points for every query in the query node. for (size_t i = 0; i < queryNode.NumDescendants(); ++i) { const size_t queryIndex = queryNode.Descendant(i); arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t j = 0; j < distinctSamples.n_elem; j++) // The counting of the samples are done in the 'BaseCase' function // so no book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[j])); } // Update the number of samples made for the queryNode and also update // the number of sample made for the child nodes. queryNode.Stat().NumSamplesMade() += samplesReqd; // Since we are not going to descend down the query tree for this // reference node, there is no point updating the number of samples // made for the child nodes of this query node. // Node is approximated, so we can prune it. return DBL_MAX; } else { if (sampleAtLeaves) { // Approximate node by sampling enough number of points for every // query in the query node. for (size_t i = 0; i < queryNode.NumDescendants(); ++i) { const size_t queryIndex = queryNode.Descendant(i); arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t j = 0; j < distinctSamples.n_elem; j++) // The counting of the samples are done in the 'BaseCase' // function so no book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[j])); } // Update the number of samples made for the queryNode and also // update the number of sample made for the child nodes. queryNode.Stat().NumSamplesMade() += samplesReqd; // Since we are not going to descend down the query tree for this // reference node, there is no point updating the number of samples // made for the child nodes of this query node. // (Leaf) node is approximated, so we can prune it. return DBL_MAX; } else { // We cannot sample from leaves, so we cannot prune. Propagate the // number of samples made down to the children. // Go through all children and propagate the number of // samples made to the children. for (size_t i = 0; i < queryNode.NumChildren(); i++) queryNode.Child(i).Stat().NumSamplesMade() = std::max( queryNode.Stat().NumSamplesMade(), queryNode.Child(i).Stat().NumSamplesMade()); return distance; } } } } else { // We must first visit the first leaf to boost accuracy. // Go through all children and propagate the number of // samples made to the children. for (size_t i = 0; i < queryNode.NumChildren(); i++) queryNode.Child(i).Stat().NumSamplesMade() = std::max( queryNode.Stat().NumSamplesMade(), queryNode.Child(i).Stat().NumSamplesMade()); return distance; } } else { // Either there cannot be anything better in this node, or enough number of // samples are already made, so prune it. // Add 'fake' samples from this node; fake because the distances to // these samples need not be computed. If enough samples are already made, // this step does not change the result of the search since this queryNode // will never be descended anymore. queryNode.Stat().NumSamplesMade() += (size_t) std::floor(samplingRatio * (double) referenceNode.NumDescendants()); // Since we are not going to descend down the query tree for this reference // node, there is no point updating the number of samples made for the child // nodes of this query node. 
return DBL_MAX; } } template inline double RASearchRules:: Rescore(TreeType& queryNode, TreeType& referenceNode, const double oldScore) { if (oldScore == DBL_MAX) return oldScore; // First try to find the distance bound to check if we can prune by distance. double pointBound = DBL_MAX; double childBound = DBL_MAX; const double maxDescendantDistance = queryNode.FurthestDescendantDistance(); for (size_t i = 0; i < queryNode.NumPoints(); i++) { const double bound = distances(distances.n_rows - 1, queryNode.Point(i)) + maxDescendantDistance; if (bound < pointBound) pointBound = bound; } for (size_t i = 0; i < queryNode.NumChildren(); i++) { const double bound = queryNode.Child(i).Stat().Bound(); if (bound < childBound) childBound = bound; } // Update the bound. queryNode.Stat().Bound() = std::min(pointBound, childBound); const double bestDistance = queryNode.Stat().Bound(); // Now check if the node-pair interaction can be pruned by sampling. // Update the number of samples made for that node. Propagate up from child // nodes if child nodes have made samples that the parent node is not aware // of. Remember, we must propagate down samples made to the child nodes if // the parent samples. // Only update from children if a non-leaf node, obviously. if (!queryNode.IsLeaf()) { size_t numSamplesMadeInChildNodes = std::numeric_limits::max(); // Find the minimum number of samples made among all children for (size_t i = 0; i < queryNode.NumChildren(); i++) { const size_t numSamples = queryNode.Child(i).Stat().NumSamplesMade(); if (numSamples < numSamplesMadeInChildNodes) numSamplesMadeInChildNodes = numSamples; } // The number of samples made for a node is propagated up from the child // nodes if the child nodes have made samples that the parent (which is the // current 'queryNode') is not aware of. queryNode.Stat().NumSamplesMade() = std::max( queryNode.Stat().NumSamplesMade(), numSamplesMadeInChildNodes); } // Now check if the node-pair interaction can be pruned by sampling. // If this is better than the best distance we've seen so far, maybe there // will be something down this node. Also check if enough samples are already // made for this query. if (SortPolicy::IsBetter(oldScore, bestDistance) && queryNode.Stat().NumSamplesMade() < numSamplesReqd) { // We cannot prune this node, so approximate by sampling. // Here we assume that since we are re-scoring, the algorithm has already // sampled some candidates, and if specified, also traversed to the first // leaf. So no checks regarding that are made any more. size_t samplesReqd = (size_t) std::ceil( samplingRatio * (double) referenceNode.NumDescendants()); samplesReqd = std::min(samplesReqd, numSamplesReqd - queryNode.Stat().NumSamplesMade()); if (samplesReqd > singleSampleLimit && !referenceNode.IsLeaf()) { // If too many samples are required and we are not at a leaf, then we // can't prune. // Since query tree descent is necessary now, propagate the number of // samples made down to the children. // Go through all children and propagate the number of samples made to the // children. Only update if the parent node has made samples the children // have not seen. for (size_t i = 0; i < queryNode.NumChildren(); i++) queryNode.Child(i).Stat().NumSamplesMade() = std::max( queryNode.Stat().NumSamplesMade(), queryNode.Child(i).Stat().NumSamplesMade()); return oldScore; } else { if (!referenceNode.IsLeaf()) // If not a leaf, { // then samplesReqd <= singleSampleLimit. 
Hence, approximate the node // by sampling enough points for every query in the query node. for (size_t i = 0; i < queryNode.NumDescendants(); ++i) { const size_t queryIndex = queryNode.Descendant(i); arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t j = 0; j < distinctSamples.n_elem; j++) // The counting of the samples are done in the 'BaseCase' // function so no book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[j])); } // Update the number of samples made for the query node and also update // the number of samples made for the child nodes. queryNode.Stat().NumSamplesMade() += samplesReqd; // Since we are not going to descend down the query tree for this // reference node, there is no point updating the number of samples made // for the child nodes of this query node. // Node approximated, so we can prune it. return DBL_MAX; } else // We are at a leaf. { if (sampleAtLeaves) { // Approximate node by sampling enough points for every query in the // query node. for (size_t i = 0; i < queryNode.NumDescendants(); ++i) { const size_t queryIndex = queryNode.Descendant(i); arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t j = 0; j < distinctSamples.n_elem; j++) // The counting of the samples are done in BaseCase() so no // book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[j])); } // Update the number of samples made for the query node and also // update the number of samples made for the child nodes. queryNode.Stat().NumSamplesMade() += samplesReqd; // Since we are not going to descend down the query tree for this // reference node, there is no point updating the number of samples // made for the child nodes of this query node. // (Leaf) node approximated, so we can prune it. return DBL_MAX; } else { // We cannot sample from leaves, so we cannot prune. // Propagate the number of samples made down to the children. for (size_t i = 0; i < queryNode.NumChildren(); i++) queryNode.Child(i).Stat().NumSamplesMade() = std::max( queryNode.Stat().NumSamplesMade(), queryNode.Child(i).Stat().NumSamplesMade()); return oldScore; } } } } else { // Either there cannot be anything better in this node, or enough samples // are already made, so prune it. // Add 'fake' samples from this node; fake because the distances to // these samples need not be computed. If enough samples are already made, // this step does not change the result of the search since this query node // will never be descended anymore. queryNode.Stat().NumSamplesMade() += (size_t) std::floor(samplingRatio * (double) referenceNode.NumDescendants()); // Since we are not going to descend down the query tree for this reference // node, there is no point updating the number of samples made for the child // nodes of this query node. return DBL_MAX; } } // Rescore(node, node, oldScore) /** * Helper function to insert a point into the neighbors and distances matrices. * * @param queryIndex Index of point whose neighbors we are inserting into. * @param pos Position in list to insert into. * @param neighbor Index of reference point which is being inserted. * @param distance Distance from query point to reference point. */ template void RASearchRules::InsertNeighbor( const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance) { // We only memmove() if there is actually a need to shift something. 
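  // (Illustrative aside with hypothetical values, assuming a nearest-neighbor
  // sort policy: with k = 3 and a sorted distance column [1.2, 3.4, 5.6],
  // inserting 2.0 at pos = 1 shifts len = 1 trailing entry, giving
  // [1.2, 2.0, 3.4]; the previous worst candidate, 5.6, drops off the list.)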
if (pos < (distances.n_rows - 1)) { int len = (distances.n_rows - 1) - pos; memmove(distances.colptr(queryIndex) + (pos + 1), distances.colptr(queryIndex) + pos, sizeof(double) * len); memmove(neighbors.colptr(queryIndex) + (pos + 1), neighbors.colptr(queryIndex) + pos, sizeof(size_t) * len); } // Now put the new information in the right index. distances(pos, queryIndex) = distance; neighbors(pos, queryIndex) = neighbor; } }; // namespace neighbor }; // namespace mlpack #endif // __MLPACK_METHODS_RANN_RA_SEARCH_RULES_IMPL_HPP RcppMLPACK/src/mlpack/methods/fastmks/0000755000176200001440000000000013647512751017232 5ustar liggesusersRcppMLPACK/src/mlpack/methods/fastmks/fastmks_stat.hpp0000644000176200001440000000733213647512751022453 0ustar liggesusers/** * @file fastmks_stat.hpp * @author Ryan Curtin * * The statistic used in trees with FastMKS. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_FASTMKS_FASTMKS_STAT_HPP #define __MLPACK_METHODS_FASTMKS_FASTMKS_STAT_HPP #include #include namespace mlpack { namespace fastmks { /** * The statistic used in trees with FastMKS. This stores both the bound and the * self-kernels for each node in the tree. */ class FastMKSStat { public: /** * Default initialization. */ FastMKSStat() : bound(-DBL_MAX), selfKernel(0.0), lastKernel(0.0), lastKernelNode(NULL) { } /** * Initialize this statistic for the given tree node. The TreeType's metric * better be IPMetric with some kernel type (that is, Metric().Kernel() must * exist). * * @param node Node that this statistic is built for. */ template FastMKSStat(const TreeType& node) : bound(-DBL_MAX), lastKernel(0.0), lastKernelNode(NULL) { // Do we have to calculate the centroid? if (tree::TreeTraits::FirstPointIsCentroid) { // If this type of tree has self-children, then maybe the evaluation is // already done. These statistics are built bottom-up, so the child stat // should already be done. if ((tree::TreeTraits::HasSelfChildren) && (node.NumChildren() > 0) && (node.Point(0) == node.Child(0).Point(0))) { selfKernel = node.Child(0).Stat().SelfKernel(); } else { selfKernel = sqrt(node.Metric().Kernel().Evaluate( node.Dataset().unsafe_col(node.Point(0)), node.Dataset().unsafe_col(node.Point(0)))); } } else { // Calculate the centroid. arma::vec centroid; node.Centroid(centroid); selfKernel = sqrt(node.Metric().Kernel().Evaluate(centroid, centroid)); } } //! Get the self-kernel. double SelfKernel() const { return selfKernel; } //! Modify the self-kernel. double& SelfKernel() { return selfKernel; } //! Get the bound. double Bound() const { return bound; } //! Modify the bound. double& Bound() { return bound; } //! Get the last kernel evaluation. double LastKernel() const { return lastKernel; } //! Modify the last kernel evaluation. double& LastKernel() { return lastKernel; } //! Get the address of the node corresponding to the last distance evaluation. 
void* LastKernelNode() const { return lastKernelNode; } //! Modify the address of the node corresponding to the last distance //! evaluation. void*& LastKernelNode() { return lastKernelNode; } private: //! The bound for pruning. double bound; //! The self-kernel evaluation: sqrt(K(centroid, centroid)). double selfKernel; //! The last kernel evaluation. double lastKernel; //! The node corresponding to the last kernel evaluation. This has to be void //! otherwise we get recursive template arguments. void* lastKernelNode; }; }; // namespace fastmks }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/fastmks/fastmks_rules_impl.hpp0000644000176200001440000004517513647512751023662 0ustar liggesusers/** * @file fastmks_rules_impl.hpp * @author Ryan Curtin * * Implementation of FastMKSRules for cover tree search. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_FASTMKS_FASTMKS_RULES_IMPL_HPP #define __MLPACK_METHODS_FASTMKS_FASTMKS_RULES_IMPL_HPP // In case it hasn't already been included. #include "fastmks_rules.hpp" namespace mlpack { namespace fastmks { template FastMKSRules::FastMKSRules(const arma::mat& referenceSet, const arma::mat& querySet, arma::Mat& indices, arma::mat& products, KernelType& kernel) : referenceSet(referenceSet), querySet(querySet), indices(indices), products(products), kernel(kernel), lastQueryIndex(-1), lastReferenceIndex(-1), lastKernel(0.0), baseCases(0), scores(0) { // Precompute each self-kernel. queryKernels.set_size(querySet.n_cols); for (size_t i = 0; i < querySet.n_cols; ++i) queryKernels[i] = sqrt(kernel.Evaluate(querySet.unsafe_col(i), querySet.unsafe_col(i))); referenceKernels.set_size(referenceSet.n_cols); for (size_t i = 0; i < referenceSet.n_cols; ++i) referenceKernels[i] = sqrt(kernel.Evaluate(referenceSet.unsafe_col(i), referenceSet.unsafe_col(i))); // Set to invalid memory, so that the first node combination does not try to // dereference null pointers. traversalInfo.LastQueryNode() = (TreeType*) this; traversalInfo.LastReferenceNode() = (TreeType*) this; } template inline force_inline double FastMKSRules::BaseCase( const size_t queryIndex, const size_t referenceIndex) { // Score() always happens before BaseCase() for a given node combination. For // cover trees, the kernel evaluation between the two centroid points already // happened. So we don't need to do it. Note that this optimizes out if the // first conditional is false (its result is known at compile time). if (tree::TreeTraits::FirstPointIsCentroid) { if ((queryIndex == lastQueryIndex) && (referenceIndex == lastReferenceIndex)) return lastKernel; // Store new values. lastQueryIndex = queryIndex; lastReferenceIndex = referenceIndex; } ++baseCases; double kernelEval = kernel.Evaluate(querySet.unsafe_col(queryIndex), referenceSet.unsafe_col(referenceIndex)); // Update the last kernel value, if we need to. 
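  // (Illustrative aside, not part of the original source: for trees where the
  // first point is the centroid, Score(queryIndex, referenceNode) evaluates
  // K(q, referenceNode.Point(0)) through BaseCase(), which records
  // lastQueryIndex, lastReferenceIndex and lastKernel; if the traversal then
  // calls BaseCase() again on the same (query, reference) pair, the cached
  // lastKernel is returned by the early-exit check above without a second
  // kernel evaluation.)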
if (tree::TreeTraits::FirstPointIsCentroid) lastKernel = kernelEval; // If the reference and query sets are identical, we still need to compute the // base case (so that things can be bounded properly), but we won't add it to // the results. if ((&querySet == &referenceSet) && (queryIndex == referenceIndex)) return kernelEval; // If this is a better candidate, insert it into the list. if (kernelEval < products(products.n_rows - 1, queryIndex)) return kernelEval; size_t insertPosition = 0; for ( ; insertPosition < products.n_rows; ++insertPosition) if (kernelEval >= products(insertPosition, queryIndex)) break; InsertNeighbor(queryIndex, insertPosition, referenceIndex, kernelEval); return kernelEval; } template double FastMKSRules::Score(const size_t queryIndex, TreeType& referenceNode) { // Compare with the current best. const double bestKernel = products(products.n_rows - 1, queryIndex); // See if we can perform a parent-child prune. const double furthestDist = referenceNode.FurthestDescendantDistance(); if (referenceNode.Parent() != NULL) { double maxKernelBound; const double parentDist = referenceNode.ParentDistance(); const double combinedDistBound = parentDist + furthestDist; const double lastKernel = referenceNode.Parent()->Stat().LastKernel(); if (kernel::KernelTraits::IsNormalized) { const double squaredDist = std::pow(combinedDistBound, 2.0); const double delta = (1 - 0.5 * squaredDist); if (lastKernel <= delta) { const double gamma = combinedDistBound * sqrt(1 - 0.25 * squaredDist); maxKernelBound = lastKernel * delta + gamma * sqrt(1 - std::pow(lastKernel, 2.0)); } else { maxKernelBound = 1.0; } } else { maxKernelBound = lastKernel + combinedDistBound * queryKernels[queryIndex]; } if (maxKernelBound < bestKernel) return DBL_MAX; } // Calculate the maximum possible kernel value, either by calculating the // centroid or, if the centroid is a point, use that. ++scores; double kernelEval; if (tree::TreeTraits::FirstPointIsCentroid) { // Could it be that this kernel evaluation has already been calculated? if (tree::TreeTraits::HasSelfChildren && referenceNode.Parent() != NULL && referenceNode.Point(0) == referenceNode.Parent()->Point(0)) { kernelEval = referenceNode.Parent()->Stat().LastKernel(); } else { kernelEval = BaseCase(queryIndex, referenceNode.Point(0)); } } else { const arma::vec queryPoint = querySet.unsafe_col(queryIndex); arma::vec refCentroid; referenceNode.Centroid(refCentroid); kernelEval = kernel.Evaluate(queryPoint, refCentroid); } referenceNode.Stat().LastKernel() = kernelEval; double maxKernel; if (kernel::KernelTraits::IsNormalized) { const double squaredDist = std::pow(furthestDist, 2.0); const double delta = (1 - 0.5 * squaredDist); if (kernelEval <= delta) { const double gamma = furthestDist * sqrt(1 - 0.25 * squaredDist); maxKernel = kernelEval * delta + gamma * sqrt(1 - std::pow(kernelEval, 2.0)); } else { maxKernel = 1.0; } } else { maxKernel = kernelEval + furthestDist * queryKernels[queryIndex]; } // We return the inverse of the maximum kernel so that larger kernels are // recursed into first. return (maxKernel > bestKernel) ? (1.0 / maxKernel) : DBL_MAX; } template double FastMKSRules::Score(TreeType& queryNode, TreeType& referenceNode) { // Update and get the query node's bound. queryNode.Stat().Bound() = CalculateBound(queryNode); const double bestKernel = queryNode.Stat().Bound(); // First, see if we can make a parent-child or parent-parent prune. 
These // four bounds on the maximum kernel value are looser than the bound normally // used, but they can prevent a base case from needing to be calculated. // Convenience caching so lines are shorter. const double queryParentDist = queryNode.ParentDistance(); const double queryDescDist = queryNode.FurthestDescendantDistance(); const double refParentDist = referenceNode.ParentDistance(); const double refDescDist = referenceNode.FurthestDescendantDistance(); double adjustedScore = traversalInfo.LastBaseCase(); const double queryDistBound = (queryParentDist + queryDescDist); const double refDistBound = (refParentDist + refDescDist); double dualQueryTerm; double dualRefTerm; // The parent-child and parent-parent prunes work by applying the same pruning // condition as when the parent node was used, except they are tighter because // queryDistBound < queryNode.Parent()->FurthestDescendantDistance() // and // refDistBound < referenceNode.Parent()->FurthestDescendantDistance() // so we construct the same bounds that were used when Score() was called with // the parents, except with the tighter distance bounds. Sometimes this // allows us to prune nodes without evaluating the base cases between them. if (traversalInfo.LastQueryNode() == queryNode.Parent()) { // We can assume that queryNode.Parent() != NULL, because at the root node // combination, the traversalInfo.LastQueryNode() pointer will _not_ be // NULL. We also should be guaranteed that // traversalInfo.LastReferenceNode() is either the reference node or the // parent of the reference node. adjustedScore += queryDistBound * traversalInfo.LastReferenceNode()->Stat().SelfKernel(); dualQueryTerm = queryDistBound; } else { // The query parent could be NULL, which does weird things and we have to // consider. if (traversalInfo.LastReferenceNode() != NULL) { adjustedScore += queryDescDist * traversalInfo.LastReferenceNode()->Stat().SelfKernel(); dualQueryTerm = queryDescDist; } else { // This makes it so a child-parent (or parent-parent) prune is not // possible. dualQueryTerm = 0.0; adjustedScore = bestKernel; } } if (traversalInfo.LastReferenceNode() == referenceNode.Parent()) { // We can assume that referenceNode.Parent() != NULL, because at the root // node combination, the traversalInfo.LastReferenceNode() pointer will // _not_ be NULL. adjustedScore += refDistBound * traversalInfo.LastQueryNode()->Stat().SelfKernel(); dualRefTerm = refDistBound; } else { // The reference parent could be NULL, which does weird things and we have // to consider. if (traversalInfo.LastQueryNode() != NULL) { adjustedScore += refDescDist * traversalInfo.LastQueryNode()->Stat().SelfKernel(); dualRefTerm = refDescDist; } else { // This makes it so a child-parent (or parent-parent) prune is not // possible. dualRefTerm = 0.0; adjustedScore = bestKernel; } } // Now add the dual term. adjustedScore += (dualQueryTerm * dualRefTerm); if (adjustedScore < bestKernel) { // It is not possible that this node combination can contain a point // combination with kernel value better than the minimum kernel value to // improve any of the results, so we can prune it. return DBL_MAX; } // We were unable to perform a parent-child or parent-parent prune, so now we // must calculate kernel evaluation, if necessary. double kernelEval = 0.0; if (tree::TreeTraits::FirstPointIsCentroid) { // For this type of tree, we may have already calculated the base case in // the parents. 
    if ((traversalInfo.LastQueryNode() != NULL) &&
        (traversalInfo.LastReferenceNode() != NULL) &&
        (traversalInfo.LastQueryNode()->Point(0) == queryNode.Point(0)) &&
        (traversalInfo.LastReferenceNode()->Point(0) == referenceNode.Point(0)))
    {
      // Base case already done.
      kernelEval = traversalInfo.LastBaseCase();

      // When BaseCase() is called after Score(), these must be correct so that
      // another kernel evaluation is not performed.
      lastQueryIndex = queryNode.Point(0);
      lastReferenceIndex = referenceNode.Point(0);
    }
    else
    {
      // The kernel must be evaluated, but it is between points in the dataset,
      // so we can call BaseCase().  BaseCase() will set lastQueryIndex and
      // lastReferenceIndex correctly.
      kernelEval = BaseCase(queryNode.Point(0), referenceNode.Point(0));
    }
    traversalInfo.LastBaseCase() = kernelEval;
  }
  else
  {
    // Calculate the maximum possible kernel value.
    arma::vec queryCentroid;
    arma::vec refCentroid;
    queryNode.Centroid(queryCentroid);
    referenceNode.Centroid(refCentroid);

    kernelEval = kernel.Evaluate(queryCentroid, refCentroid);

    traversalInfo.LastBaseCase() = kernelEval;
  }
  ++scores;

  double maxKernel;
  if (kernel::KernelTraits<KernelType>::IsNormalized)
  {
    // We have a tighter bound for normalized kernels.
    const double querySqDist = std::pow(queryDescDist, 2.0);
    const double refSqDist = std::pow(refDescDist, 2.0);
    const double bothSqDist = std::pow((queryDescDist + refDescDist), 2.0);

    if (kernelEval <= (1 - 0.5 * bothSqDist))
    {
      const double queryDelta = (1 - 0.5 * querySqDist);
      const double queryGamma = queryDescDist * sqrt(1 - 0.25 * querySqDist);
      const double refDelta = (1 - 0.5 * refSqDist);
      const double refGamma = refDescDist * sqrt(1 - 0.25 * refSqDist);

      maxKernel = kernelEval * (queryDelta * refDelta - queryGamma * refGamma)
          + sqrt(1 - std::pow(kernelEval, 2.0)) *
          (queryGamma * refDelta + queryDelta * refGamma);
    }
    else
    {
      maxKernel = 1.0;
    }
  }
  else
  {
    // Use standard bound; kernel is not normalized.
    const double refKernelTerm = queryDescDist *
        referenceNode.Stat().SelfKernel();
    const double queryKernelTerm = refDescDist * queryNode.Stat().SelfKernel();

    maxKernel = kernelEval + refKernelTerm + queryKernelTerm +
        (queryDescDist * refDescDist);
  }

  // Store relevant information for parent-child pruning.
  traversalInfo.LastQueryNode() = &queryNode;
  traversalInfo.LastReferenceNode() = &referenceNode;

  // We return the inverse of the maximum kernel so that larger kernels are
  // recursed into first.
  return (maxKernel > bestKernel) ? (1.0 / maxKernel) : DBL_MAX;
}

template<typename KernelType, typename TreeType>
double FastMKSRules<KernelType, TreeType>::Rescore(const size_t queryIndex,
                                                   TreeType& /*referenceNode*/,
                                                   const double oldScore) const
{
  const double bestKernel = products(products.n_rows - 1, queryIndex);

  return ((1.0 / oldScore) > bestKernel) ? oldScore : DBL_MAX;
}

template<typename KernelType, typename TreeType>
double FastMKSRules<KernelType, TreeType>::Rescore(TreeType& queryNode,
                                                   TreeType& /*referenceNode*/,
                                                   const double oldScore) const
{
  queryNode.Stat().Bound() = CalculateBound(queryNode);
  const double bestKernel = queryNode.Stat().Bound();

  return ((1.0 / oldScore) > bestKernel) ? oldScore : DBL_MAX;
}

/**
 * Calculate the bound for the given query node.  This bound represents the
 * minimum value which a node combination must achieve to guarantee an
 * improvement in the results.
 *
 * @param queryNode Query node to calculate bound for.
 */
template<typename KernelType, typename TreeType>
double FastMKSRules<KernelType, TreeType>::CalculateBound(TreeType& queryNode)
    const
{
  // We have four possible bounds -- just like NeighborSearchRules, but they
  // are slightly different in this context.
  //
  // (1) min ( min_{all points p in queryNode} P_p[k],
  //           min_{all children c in queryNode} B(c) );
  // (2) max_{all points p in queryNode} P_p[k] + (worst child distance +
  //           worst descendant distance) sqrt(K(I_p[k], I_p[k]));
  // (3) max_{all children c in queryNode} B(c) + <-- not done yet.  ignored.
  // (4) B(parent of queryNode);
  double worstPointKernel = DBL_MAX;
  double bestAdjustedPointKernel = -DBL_MAX;

  const double queryDescendantDistance = queryNode.FurthestDescendantDistance();

  // Loop over all points in this node to find the best and worst.
  for (size_t i = 0; i < queryNode.NumPoints(); ++i)
  {
    const size_t point = queryNode.Point(i);
    if (products(products.n_rows - 1, point) < worstPointKernel)
      worstPointKernel = products(products.n_rows - 1, point);

    if (products(products.n_rows - 1, point) == -DBL_MAX)
      continue; // Avoid underflow.

    // This should be (queryDescendantDistance + centroidDistance) for any tree
    // but it works for cover trees since centroidDistance = 0 for cover trees.
    const double candidateKernel = products(products.n_rows - 1, point) -
        queryDescendantDistance *
        referenceKernels[indices(indices.n_rows - 1, point)];

    if (candidateKernel > bestAdjustedPointKernel)
      bestAdjustedPointKernel = candidateKernel;
  }

  // Loop over all the children in the node.
  double worstChildKernel = DBL_MAX;

  for (size_t i = 0; i < queryNode.NumChildren(); ++i)
  {
    if (queryNode.Child(i).Stat().Bound() < worstChildKernel)
      worstChildKernel = queryNode.Child(i).Stat().Bound();
  }

  // Now assemble bound (1).
  const double firstBound = (worstPointKernel < worstChildKernel) ?
      worstPointKernel : worstChildKernel;

  // Bound (2) is bestAdjustedPointKernel.
  const double fourthBound = (queryNode.Parent() == NULL) ? -DBL_MAX :
      queryNode.Parent()->Stat().Bound();

  // Pick the best of these bounds.
  const double interA = (firstBound > bestAdjustedPointKernel) ? firstBound :
      bestAdjustedPointKernel;
  // const double interA = 0.0;
  const double interB = fourthBound;

  return (interA > interB) ? interA : interB;
}

/**
 * Helper function to insert a point into the neighbors and distances matrices.
 *
 * @param queryIndex Index of point whose neighbors we are inserting into.
 * @param pos Position in list to insert into.
 * @param neighbor Index of reference point which is being inserted.
 * @param distance Distance from query point to reference point.
 */
template<typename KernelType, typename TreeType>
void FastMKSRules<KernelType, TreeType>::InsertNeighbor(const size_t queryIndex,
                                                        const size_t pos,
                                                        const size_t neighbor,
                                                        const double distance)
{
  // We only memmove() if there is actually a need to shift something.
  if (pos < (products.n_rows - 1))
  {
    int len = (products.n_rows - 1) - pos;
    memmove(products.colptr(queryIndex) + (pos + 1),
        products.colptr(queryIndex) + pos,
        sizeof(double) * len);
    memmove(indices.colptr(queryIndex) + (pos + 1),
        indices.colptr(queryIndex) + pos,
        sizeof(size_t) * len);
  }

  // Now put the new information in the right index.
  products(pos, queryIndex) = distance;
  indices(pos, queryIndex) = neighbor;
}

}; // namespace fastmks
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/methods/fastmks/fastmks_impl.hpp0000644000176200001440000004433413647512751022444 0ustar  liggesusers/**
 * @file fastmks_impl.hpp
 * @author Ryan Curtin
 *
 * Implementation of the FastMKS class (fast max-kernel search).
 *
 * This file is part of MLPACK 1.0.10.
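 *
 * An illustrative usage sketch (added for clarity; not part of the original
 * file -- the kernel choice and matrix sizes are assumptions):
 *
 * @code
 * arma::mat data;
 * data.randu(10, 1000); // One point per column.
 * FastMKS<kernel::LinearKernel> mks(data);
 * arma::Mat<size_t> indices;
 * arma::mat products;
 * mks.Search(5, indices, products); // 5 best kernel values per query point.
 * @endcode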
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_FASTMKS_FASTMKS_IMPL_HPP
#define __MLPACK_METHODS_FASTMKS_FASTMKS_IMPL_HPP

// In case it hasn't yet been included.
#include "fastmks.hpp"

#include "fastmks_rules.hpp"

// (The targets of the next two includes were lost in extraction; they are
// restored here from what this file actually uses: the commented-out Gaussian
// specialization below needs the Gaussian kernel and std::priority_queue.)
#include <mlpack/core/kernels/gaussian_kernel.hpp>
#include <queue>

namespace mlpack {
namespace fastmks {

// Single dataset, no instantiated kernel.
template<typename KernelType, typename TreeType>
FastMKS<KernelType, TreeType>::FastMKS(const arma::mat& referenceSet,
                                       const bool single,
                                       const bool naive) :
    referenceSet(referenceSet),
    querySet(referenceSet),
    referenceTree(NULL),
    queryTree(NULL),
    treeOwner(true),
    single(single),
    naive(naive)
{
  //Timer::Start("tree_building");

  if (!naive)
    referenceTree = new TreeType(referenceSet);

  if (!naive && !single)
    queryTree = new TreeType(referenceSet);

  //Timer::Stop("tree_building");
}

// Two datasets, no instantiated kernel.
template<typename KernelType, typename TreeType>
FastMKS<KernelType, TreeType>::FastMKS(const arma::mat& referenceSet,
                                       const arma::mat& querySet,
                                       const bool single,
                                       const bool naive) :
    referenceSet(referenceSet),
    querySet(querySet),
    referenceTree(NULL),
    queryTree(NULL),
    treeOwner(true),
    single(single),
    naive(naive)
{
  //Timer::Start("tree_building");

  // If necessary, the trees should be built.
  if (!naive)
    referenceTree = new TreeType(referenceSet);

  if (!naive && !single)
    queryTree = new TreeType(querySet);

  //Timer::Stop("tree_building");
}

// One dataset, instantiated kernel.
template<typename KernelType, typename TreeType>
FastMKS<KernelType, TreeType>::FastMKS(const arma::mat& referenceSet,
                                       KernelType& kernel,
                                       const bool single,
                                       const bool naive) :
    referenceSet(referenceSet),
    querySet(referenceSet),
    referenceTree(NULL),
    queryTree(NULL),
    treeOwner(true),
    single(single),
    naive(naive),
    metric(kernel)
{
  //Timer::Start("tree_building");

  // If necessary, the reference tree should be built; a separate query tree
  // is only needed for dual-tree search.
  if (!naive)
    referenceTree = new TreeType(referenceSet, metric);

  if (!naive && !single)
    queryTree = new TreeType(referenceSet, metric);

  //Timer::Stop("tree_building");
}

// Two datasets, instantiated kernel.
template<typename KernelType, typename TreeType>
FastMKS<KernelType, TreeType>::FastMKS(const arma::mat& referenceSet,
                                       const arma::mat& querySet,
                                       KernelType& kernel,
                                       const bool single,
                                       const bool naive) :
    referenceSet(referenceSet),
    querySet(querySet),
    referenceTree(NULL),
    queryTree(NULL),
    treeOwner(true),
    single(single),
    naive(naive),
    metric(kernel)
{
  //Timer::Start("tree_building");

  // If necessary, the trees should be built.
  if (!naive)
    referenceTree = new TreeType(referenceSet, metric);

  if (!naive && !single)
    queryTree = new TreeType(querySet, metric);

  //Timer::Stop("tree_building");
}

// One dataset, pre-built tree.
template<typename KernelType, typename TreeType>
FastMKS<KernelType, TreeType>::FastMKS(const arma::mat& referenceSet,
                                       TreeType* referenceTree,
                                       const bool single,
                                       const bool naive) :
    referenceSet(referenceSet),
    querySet(referenceSet),
    referenceTree(referenceTree),
    queryTree(NULL),
    treeOwner(false),
    single(single),
    naive(naive),
    metric(referenceTree->Metric())
{
  // The query tree cannot be the same as the reference tree.
  if (referenceTree)
    queryTree = new TreeType(*referenceTree);
}
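// (Annotation added for clarity; not in the original source: the copy above is
// made because the per-node statistics play different roles on each side of a
// dual-tree traversal -- query nodes cache pruning bounds while reference
// nodes cache kernel evaluations -- so one tree object cannot safely serve
// both roles.  Although treeOwner is false here, this copied query tree is
// still freed by the destructor's querySet == referenceSet branch.)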
// Two datasets, pre-built trees.
template<typename KernelType, typename TreeType>
FastMKS<KernelType, TreeType>::FastMKS(const arma::mat& referenceSet,
                                       TreeType* referenceTree,
                                       const arma::mat& querySet,
                                       TreeType* queryTree,
                                       const bool single,
                                       const bool naive) :
    referenceSet(referenceSet),
    querySet(querySet),
    referenceTree(referenceTree),
    queryTree(queryTree),
    treeOwner(false),
    single(single),
    naive(naive),
    metric(referenceTree->Metric())
{
  // Nothing to do.
}

template<typename KernelType, typename TreeType>
FastMKS<KernelType, TreeType>::~FastMKS()
{
  // If we created the trees, we must delete them.
  if (treeOwner)
  {
    if (queryTree)
      delete queryTree;
    if (referenceTree)
      delete referenceTree;
  }
  else if (&querySet == &referenceSet)
  {
    // The user passed in a reference tree which we needed to copy.
    if (queryTree)
      delete queryTree;
  }
}

template<typename KernelType, typename TreeType>
void FastMKS<KernelType, TreeType>::Search(const size_t k,
                                           arma::Mat<size_t>& indices,
                                           arma::mat& products)
{
  // No remapping will be necessary because we are using the cover tree.
  indices.set_size(k, querySet.n_cols);
  products.set_size(k, querySet.n_cols);
  products.fill(-DBL_MAX);

  //Timer::Start("computing_products");

  // Naive implementation.
  if (naive)
  {
    // Simple double loop.  Stupid, slow, but a good benchmark.
    for (size_t q = 0; q < querySet.n_cols; ++q)
    {
      for (size_t r = 0; r < referenceSet.n_cols; ++r)
      {
        if ((&querySet == &referenceSet) && (q == r))
          continue;

        const double eval = metric.Kernel().Evaluate(querySet.unsafe_col(q),
            referenceSet.unsafe_col(r));

        size_t insertPosition;
        for (insertPosition = 0; insertPosition < indices.n_rows;
            ++insertPosition)
          if (eval > products(insertPosition, q))
            break;

        if (insertPosition < indices.n_rows)
          InsertNeighbor(indices, products, q, insertPosition, r, eval);
      }
    }

    //Timer::Stop("computing_products");
    return;
  }

  // Single-tree implementation.
  if (single)
  {
    // Create rules object (this will store the results).  This constructor
    // precalculates each self-kernel value.
    typedef FastMKSRules<KernelType, TreeType> RuleType;
    RuleType rules(referenceSet, querySet, indices, products, metric.Kernel());

    typename TreeType::template SingleTreeTraverser<RuleType> traverser(rules);

    for (size_t i = 0; i < querySet.n_cols; ++i)
      traverser.Traverse(i, *referenceTree);

    // Save the number of pruned nodes.
    const size_t numPrunes = traverser.NumPrunes();

    Rcpp::Rcout << "Pruned " << numPrunes << " nodes." << std::endl;
    Rcpp::Rcout << rules.BaseCases() << " base cases." << std::endl;
    Rcpp::Rcout << rules.Scores() << " scores." << std::endl;

    //Timer::Stop("computing_products");
    return;
  }

  // Dual-tree implementation.
  typedef FastMKSRules<KernelType, TreeType> RuleType;
  RuleType rules(referenceSet, querySet, indices, products, metric.Kernel());

  typename TreeType::template DualTreeTraverser<RuleType> traverser(rules);

  traverser.Traverse(*queryTree, *referenceTree);

  const size_t numPrunes = traverser.NumPrunes();

  Rcpp::Rcout << "Pruned " << numPrunes << " nodes." << std::endl;
  Rcpp::Rcout << rules.BaseCases() << " base cases." << std::endl;
  Rcpp::Rcout << rules.Scores() << " scores." << std::endl;

  //Timer::Stop("computing_products");
  return;
}

/**
 * Helper function to insert a point into the neighbors and distances matrices.
 *
 * @param queryIndex Index of point whose neighbors we are inserting into.
 * @param pos Position in list to insert into.
 * @param neighbor Index of reference point which is being inserted.
 * @param distance Distance from query point to reference point.
 */
template<typename KernelType, typename TreeType>
void FastMKS<KernelType, TreeType>::InsertNeighbor(arma::Mat<size_t>& indices,
                                                   arma::mat& products,
                                                   const size_t queryIndex,
                                                   const size_t pos,
                                                   const size_t neighbor,
                                                   const double distance)
{
  // We only memmove() if there is actually a need to shift something.
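  // (Annotation added for clarity; not in the original source: column
  // queryIndex holds the k best kernel values in descending order, so
  // inserting at `pos` shifts the tail of the column down one slot and drops
  // the previous worst entry in the last row.)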
if (pos < (products.n_rows - 1)) { int len = (products.n_rows - 1) - pos; memmove(products.colptr(queryIndex) + (pos + 1), products.colptr(queryIndex) + pos, sizeof(double) * len); memmove(indices.colptr(queryIndex) + (pos + 1), indices.colptr(queryIndex) + pos, sizeof(size_t) * len); } // Now put the new information in the right index. products(pos, queryIndex) = distance; indices(pos, queryIndex) = neighbor; } // Return string of object. template std::string FastMKS::ToString() const { std::ostringstream convert; convert << "FastMKS [" << this << "]" << std::endl; convert << " Naive: " << naive << std::endl; convert << " Single: " << single << std::endl; convert << " Metric: " << std::endl; convert << mlpack::util::Indent(metric.ToString(),2); convert << std::endl; return convert.str(); } // Specialized implementation for tighter bounds for Gaussian. /* template<> void FastMKS::Search(const size_t k, arma::Mat& indices, arma::mat& products) { Log::Warn << "Alternate implementation!" << std::endl; // Terrible copypasta. Bad bad bad. // No remapping will be necessary. indices.set_size(k, querySet.n_cols); products.set_size(k, querySet.n_cols); products.fill(-1.0); Timer::Start("computing_products"); size_t kernelEvaluations = 0; // Naive implementation. if (naive) { // Simple double loop. Stupid, slow, but a good benchmark. for (size_t q = 0; q < querySet.n_cols; ++q) { for (size_t r = 0; r < referenceSet.n_cols; ++r) { const double eval = metric.Kernel().Evaluate(querySet.unsafe_col(q), referenceSet.unsafe_col(r)); ++kernelEvaluations; size_t insertPosition; for (insertPosition = 0; insertPosition < indices.n_rows; ++insertPosition) if (eval > products(insertPosition, q)) break; if (insertPosition < indices.n_rows) InsertNeighbor(indices, products, q, insertPosition, r, eval); } } Timer::Stop("computing_products"); Rcpp::Rcout << "Kernel evaluations: " << kernelEvaluations << "." << std::endl; return; } // Single-tree implementation. if (single) { // Calculate number of pruned nodes. size_t numPrunes = 0; // Precalculate query products ( || q || for all q). arma::vec queryProducts(querySet.n_cols); for (size_t queryIndex = 0; queryIndex < querySet.n_cols; ++queryIndex) queryProducts[queryIndex] = sqrt(metric.Kernel().Evaluate( querySet.unsafe_col(queryIndex), querySet.unsafe_col(queryIndex))); kernelEvaluations += querySet.n_cols; // Screw the CoverTreeTraverser, we'll implement it by hand. for (size_t queryIndex = 0; queryIndex < querySet.n_cols; ++queryIndex) { // Use an array of priority queues? std::priority_queue< SearchFrame > >, std::vector > > >, SearchFrameCompare > > > frameQueue; // Add initial frame. SearchFrame > > nextFrame; nextFrame.node = referenceTree; nextFrame.eval = metric.Kernel().Evaluate(querySet.unsafe_col(queryIndex), referenceSet.unsafe_col(referenceTree->Point())); Log::Assert(nextFrame.eval <= 1); ++kernelEvaluations; // The initial evaluation will be the best so far. indices(0, queryIndex) = referenceTree->Point(); products(0, queryIndex) = nextFrame.eval; frameQueue.push(nextFrame); tree::CoverTree >* referenceNode; double eval; double maxProduct; while (!frameQueue.empty()) { // Get the information for this node. const SearchFrame > >& frame = frameQueue.top(); referenceNode = frame.node; eval = frame.eval; // Loop through the children, seeing if we can prune them; if not, add // them to the queue. The self-child is different -- it has the same // parent (and therefore the same kernel evaluation). 
if (referenceNode->NumChildren() > 0) { SearchFrame > > childFrame; // We must handle the self-child differently, to avoid adding it to // the results twice. childFrame.node = &(referenceNode->Child(0)); childFrame.eval = eval; // Alternate pruning rule. const double mdd = childFrame.node->FurthestDescendantDistance(); if (eval >= (1 - std::pow(mdd, 2.0) / 2.0)) maxProduct = 1; else maxProduct = eval * (1 - std::pow(mdd, 2.0) / 2.0) + sqrt(1 - std::pow(eval, 2.0)) * mdd * sqrt(1 - std::pow(mdd, 2.0) / 4.0); // Add self-child if we can't prune it. if (maxProduct > products(products.n_rows - 1, queryIndex)) { // But only if it has children of its own. if (childFrame.node->NumChildren() > 0) frameQueue.push(childFrame); } else ++numPrunes; for (size_t i = 1; i < referenceNode->NumChildren(); ++i) { // Before we evaluate the child, let's see if it can possibly have // a better evaluation. const double mpdd = std::min( referenceNode->Child(i).ParentDistance() + referenceNode->Child(i).FurthestDescendantDistance(), 2.0); double maxChildEval = 1; if (eval < (1 - std::pow(mpdd, 2.0) / 2.0)) maxChildEval = eval * (1 - std::pow(mpdd, 2.0) / 2.0) + sqrt(1 - std::pow(eval, 2.0)) * mpdd * sqrt(1 - std::pow(mpdd, 2.0) / 4.0); if (maxChildEval > products(products.n_rows - 1, queryIndex)) { // Evaluate child. childFrame.node = &(referenceNode->Child(i)); childFrame.eval = metric.Kernel().Evaluate( querySet.unsafe_col(queryIndex), referenceSet.unsafe_col(referenceNode->Child(i).Point())); ++kernelEvaluations; // Can we prune it? If we can, we can avoid putting it in the // queue (saves time). const double cmdd = childFrame.node->FurthestDescendantDistance(); if (childFrame.eval > (1 - std::pow(cmdd, 2.0) / 2.0)) maxProduct = 1; else maxProduct = childFrame.eval * (1 - std::pow(cmdd, 2.0) / 2.0) + sqrt(1 - std::pow(eval, 2.0)) * cmdd * sqrt(1 - std::pow(cmdd, 2.0) / 4.0); if (maxProduct > products(products.n_rows - 1, queryIndex)) { // Good enough to recurse into. While we're at it, check the // actual evaluation and see if it's an improvement. if (childFrame.eval > products(products.n_rows - 1, queryIndex)) { // This is a better result. Find out where to insert. size_t insertPosition = 0; for ( ; insertPosition < products.n_rows - 1; ++insertPosition) if (childFrame.eval > products(insertPosition, queryIndex)) break; // Insert into the correct position; we are guaranteed that // insertPosition is valid. InsertNeighbor(indices, products, queryIndex, insertPosition, childFrame.node->Point(), childFrame.eval); } // Now add this to the queue (if it has any children which may // need to be recursed into). if (childFrame.node->NumChildren() > 0) frameQueue.push(childFrame); } else ++numPrunes; } else ++numPrunes; } } frameQueue.pop(); } } Rcpp::Rcout << "Pruned " << numPrunes << " nodes." << std::endl; Rcpp::Rcout << "Kernel evaluations: " << kernelEvaluations << "." << std::endl; Rcpp::Rcout << "Distance evaluations: " << distanceEvaluations << "." << std::endl; Timer::Stop("computing_products"); return; } // Double-tree implementation. Log::Fatal << "Dual-tree search not implemented yet... oops..." << std::endl; } */ }; // namespace fastmks }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/fastmks/fastmks.hpp0000644000176200001440000002172113647512751021416 0ustar liggesusers/** * @file fastmks.hpp * @author Ryan Curtin * * Definition of the FastMKS class, which implements fast exact max-kernel * search. * * This file is part of MLPACK 1.0.10. 
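 *
 * An illustrative sketch of typical use (added for clarity; not part of the
 * original header -- the kernel choice and bandwidth are assumptions):
 *
 * @code
 * arma::mat referenceData; // Populated elsewhere; one point per column.
 * kernel::GaussianKernel gk(0.5); // Stateful kernel with bandwidth 0.5.
 * FastMKS<kernel::GaussianKernel> mks(referenceData, gk);
 * arma::Mat<size_t> indices;
 * arma::mat kernels;
 * mks.Search(3, indices, kernels); // Top 3 kernel values per point.
 * @endcode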
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_FASTMKS_FASTMKS_HPP
#define __MLPACK_METHODS_FASTMKS_FASTMKS_HPP

// (The targets of three of these includes were lost in extraction; they are
// restored here from what this header uses: the core library, the
// inner-product metric, and the cover tree used as the default TreeType.)
#include <mlpack/core.hpp>
#include <mlpack/core/metrics/ip_metric.hpp>
#include "fastmks_stat.hpp"
#include <mlpack/core/tree/cover_tree.hpp>

namespace mlpack {
namespace fastmks /** Fast max-kernel search. */ {

/**
 * An implementation of fast exact max-kernel search.  Given a query dataset
 * and a reference dataset (or optionally just a reference dataset which is
 * also used as the query dataset), fast exact max-kernel search finds, for
 * each point in the query dataset, the k points in the reference set with
 * maximum kernel value K(p_q, p_r), where k is a specified parameter and K()
 * is a Mercer kernel.
 *
 * For more information, see the following paper.
 *
 * @code
 * @inproceedings{curtin2013fast,
 *   title={Fast Exact Max-Kernel Search},
 *   author={Curtin, Ryan R. and Ram, Parikshit and Gray, Alexander G.},
 *   booktitle={Proceedings of the 2013 SIAM International Conference on Data
 *       Mining (SDM 13)},
 *   year={2013}
 * }
 * @endcode
 *
 * This class allows specification of the type of kernel and also of the type
 * of tree.  FastMKS can be run on kernels that work on arbitrary objects --
 * however, this only works with cover trees and other trees that are built
 * only on points in the dataset (and not centroids of regions or anything like
 * that).
 *
 * @tparam KernelType Type of kernel to run FastMKS with.
 * @tparam TreeType Type of tree to run FastMKS with; it must have metric
 *     IPMetric<KernelType>.
 */
template<
    typename KernelType,
    typename TreeType = tree::CoverTree<metric::IPMetric<KernelType>,
        tree::FirstPointIsRoot, FastMKSStat>
>
class FastMKS
{
 public:
  /**
   * Create the FastMKS object using the reference set as the query set.
   * Optionally, specify whether or not single-tree search or naive
   * (brute-force) search should be used.
   *
   * @param referenceSet Set of data to run FastMKS on.
   * @param single Whether or not to run single-tree search.
   * @param naive Whether or not to run brute-force (naive) search.
   */
  FastMKS(const arma::mat& referenceSet,
          const bool single = false,
          const bool naive = false);

  /**
   * Create the FastMKS object using separate reference and query sets.
   * Optionally, specify whether or not single-tree search or naive
   * (brute-force) search should be used.
   *
   * @param referenceSet Reference set of data for FastMKS.
   * @param querySet Set of query points for FastMKS.
   * @param single Whether or not to run single-tree search.
   * @param naive Whether or not to run brute-force (naive) search.
   */
  FastMKS(const arma::mat& referenceSet,
          const arma::mat& querySet,
          const bool single = false,
          const bool naive = false);

  /**
   * Create the FastMKS object using the reference set as the query set, and
   * with an initialized kernel.  This is useful for when the kernel stores
   * state.  Optionally, specify whether or not single-tree search or naive
   * (brute-force) search should be used.
   *
   * @param referenceSet Reference set of data for FastMKS.
   * @param kernel Initialized kernel.
* @param single Whether or not to run single-tree search. * @param naive Whether or not to run brute-force (naive) search. */ FastMKS(const arma::mat& referenceSet, KernelType& kernel, const bool single = false, const bool naive = false); /** * Create the FastMKS object using separate reference and query sets, and with * an initialized kernel. This is useful for when the kernel stores state. * Optionally, specify whether or not single-tree search or naive * (brute-force) search should be used. * * @param referenceSet Reference set of data for FastMKS. * @param querySet Set of query points for FastMKS. * @param kernel Initialized kernel. * @param single Whether or not to run single-tree search. * @param naive Whether or not to run brute-force (naive) search. */ FastMKS(const arma::mat& referenceSet, const arma::mat& querySet, KernelType& kernel, const bool single = false, const bool naive = false); /** * Create the FastMKS object with an already-initialized tree built on the * reference points. Be sure that the tree is built with the metric type * IPMetric. For this constructor, the reference set and the * query set are the same points. Optionally, whether or not to run * single-tree search or brute-force (naive) search can be specified. * * @param referenceSet Reference set of data for FastMKS. * @param referenceTree Tree built on reference data. * @param single Whether or not to run single-tree search. * @param naive Whether or not to run brute-force (naive) search. */ FastMKS(const arma::mat& referenceSet, TreeType* referenceTree, const bool single = false, const bool naive = false); /** * Create the FastMKS object with already-initialized trees built on the * reference and query points. Be sure that the trees are built with the * metric type IPMetric. Optionally, whether or not to run * single-tree search or naive (brute-force) search can be specified. * * @param referenceSet Reference set of data for FastMKS. * @param referenceTree Tree built on reference data. * @param querySet Set of query points for FastMKS. * @param queryTree Tree built on query data. * @param single Whether or not to use single-tree search. * @param naive Whether or not to use naive (brute-force) search. */ FastMKS(const arma::mat& referenceSet, TreeType* referenceTree, const arma::mat& querySet, TreeType* queryTree, const bool single = false, const bool naive = false); //! Destructor for the FastMKS object. ~FastMKS(); /** * Search for the maximum inner products of the query set (or if no query set * was passed, the reference set is used). The resulting maximum inner * products are stored in the products matrix and the corresponding point * indices are stores in the indices matrix. The results for each point in * the query set are stored in the corresponding column of the indices and * products matrices; for instance, the index of the point with maximum inner * product to point 4 in the query set will be stored in row 0 and column 4 of * the indices matrix. * * @param k The number of maximum kernels to find. * @param indices Matrix to store resulting indices of max-kernel search in. * @param products Matrix to store resulting max-kernel values in. */ void Search(const size_t k, arma::Mat& indices, arma::mat& products); //! Get the inner-product metric induced by the given kernel. const metric::IPMetric& Metric() const { return metric; } //! Modify the inner-product metric induced by the given kernel. metric::IPMetric& Metric() { return metric; } /** * Returns a string representation of this object. 
*/ std::string ToString() const; private: //! The reference dataset. const arma::mat& referenceSet; //! The query dataset. const arma::mat& querySet; //! The tree built on the reference dataset. TreeType* referenceTree; //! The tree built on the query dataset. This is NULL if there is no query //! set. TreeType* queryTree; //! If true, this object created the trees and is responsible for them. bool treeOwner; //! If true, single-tree search is used. bool single; //! If true, naive (brute-force) search is used. bool naive; //! The instantiated inner-product metric induced by the given kernel. metric::IPMetric metric; //! Utility function. Copied too many times from too many places. void InsertNeighbor(arma::Mat& indices, arma::mat& products, const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance); }; }; // namespace fastmks }; // namespace mlpack // Include implementation. #include "fastmks_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/fastmks/fastmks_rules.hpp0000644000176200001440000001323413647512751022630 0ustar liggesusers/** * @file fastmks_rules.hpp * @author Ryan Curtin * * Rules for the single or dual tree traversal for fast max-kernel search. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_FASTMKS_FASTMKS_RULES_HPP #define __MLPACK_METHODS_FASTMKS_FASTMKS_RULES_HPP #include #include #include "../neighbor_search/ns_traversal_info.hpp" namespace mlpack { namespace fastmks { /** * The base case and pruning rules for FastMKS (fast max-kernel search). */ template class FastMKSRules { public: FastMKSRules(const arma::mat& referenceSet, const arma::mat& querySet, arma::Mat& indices, arma::mat& products, KernelType& kernel); //! Compute the base case (kernel value) between two points. double BaseCase(const size_t queryIndex, const size_t referenceIndex); /** * Get the score for recursion order. A low score indicates priority for * recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). * * @param queryIndex Index of query point. * @param referenceNode Candidate to be recursed into. */ double Score(const size_t queryIndex, TreeType& referenceNode); /** * Get the score for recursion order. A low score indicates priority for * recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). * * @param queryNode Candidate query node to be recursed into. * @param referenceNode Candidate reference node to be recursed into. */ double Score(TreeType& queryNode, TreeType& referenceNode); /** * Re-evaluate the score for recursion order. A low score indicates priority * for recursion, while DBL_MAX indicates that a node should not be recursed * into at all (it should be pruned). This is used when the score has already * been calculated, but another recursion may have modified the bounds for * pruning. 
So the old score is checked against the new pruning bound. * * @param queryIndex Index of query point. * @param referenceNode Candidate node to be recursed into. * @param oldScore Old score produced by Score() (or Rescore()). */ double Rescore(const size_t queryIndex, TreeType& referenceNode, const double oldScore) const; /** * Re-evaluate the score for recursion order. A low score indicates priority * for recursion, while DBL_MAX indicates that a node should not be recursed * into at all (it should be pruned). This is used when the score has already * been calculated, but another recursion may have modified the bounds for * pruning. So the old score is checked against the new pruning bound. * * @param queryNode Candidate query node to be recursed into. * @param referenceNode Candidate reference node to be recursed into. * @param oldScore Old score produced by Score() (or Rescore()). */ double Rescore(TreeType& queryNode, TreeType& referenceNode, const double oldScore) const; //! Get the number of times BaseCase() was called. size_t BaseCases() const { return baseCases; } //! Modify the number of times BaseCase() was called. size_t& BaseCases() { return baseCases; } //! Get the number of times Score() was called. size_t Scores() const { return scores; } //! Modify the number of times Score() was called. size_t& Scores() { return scores; } typedef neighbor::NeighborSearchTraversalInfo TraversalInfoType; const TraversalInfoType& TraversalInfo() const { return traversalInfo; } TraversalInfoType& TraversalInfo() { return traversalInfo; } private: //! The reference dataset. const arma::mat& referenceSet; //! The query dataset. const arma::mat& querySet; //! The indices of the maximum kernel results. arma::Mat& indices; //! The maximum kernels. arma::mat& products; //! Cached query set self-kernels (|| q || for each q). arma::vec queryKernels; //! Cached reference set self-kernels (|| r || for each r). arma::vec referenceKernels; //! The instantiated kernel. KernelType& kernel; //! The last query index BaseCase() was called with. size_t lastQueryIndex; //! The last reference index BaseCase() was called with. size_t lastReferenceIndex; //! The last kernel evaluation resulting from BaseCase(). double lastKernel; //! Calculate the bound for a given query node. double CalculateBound(TreeType& queryNode) const; //! Utility function to insert neighbor into list of results. void InsertNeighbor(const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance); //! For benchmarking. size_t baseCases; //! For benchmarking. size_t scores; TraversalInfoType traversalInfo; }; }; // namespace fastmks }; // namespace mlpack // Include implementation. #include "fastmks_rules_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/gmm/0000755000176200001440000000000013647512751016342 5ustar liggesusersRcppMLPACK/src/mlpack/methods/gmm/gmm_impl.hpp0000644000176200001440000003763213647512751020667 0ustar liggesusers/** * @file gmm_impl.hpp * @author Parikshit Ram (pram@cc.gatech.edu) * @author Ryan Curtin * * Implementation of template-based GMM methods. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_GMM_GMM_IMPL_HPP
#define __MLPACK_METHODS_GMM_GMM_IMPL_HPP

// In case it hasn't already been included.
#include "gmm.hpp"

namespace mlpack {
namespace gmm {

/**
 * Create a GMM with the given number of Gaussians, each of which have the
 * specified dimensionality.  The means and covariances will be set to 0.
 *
 * @param gaussians Number of Gaussians in this GMM.
 * @param dimensionality Dimensionality of each Gaussian.
 */
template<typename FittingType>
GMM<FittingType>::GMM(const size_t gaussians, const size_t dimensionality) :
    gaussians(gaussians),
    dimensionality(dimensionality),
    means(gaussians, arma::vec(dimensionality)),
    covariances(gaussians, arma::mat(dimensionality, dimensionality)),
    weights(gaussians),
    localFitter(FittingType()),
    fitter(localFitter)
{
  // Clear the memory; set it to 0.  Technically this model is still valid,
  // but only barely.
  weights.fill(1.0 / gaussians);
  for (size_t i = 0; i < gaussians; ++i)
  {
    means[i].zeros();
    covariances[i].eye();
  }
}

/**
 * Create a GMM with the given number of Gaussians, each of which have the
 * specified dimensionality.  Also, pass in an initialized FittingType class;
 * this is useful in cases where the FittingType class needs to store some
 * state.
 *
 * @param gaussians Number of Gaussians in this GMM.
 * @param dimensionality Dimensionality of each Gaussian.
 * @param fitter Initialized fitting mechanism.
 */
template<typename FittingType>
GMM<FittingType>::GMM(const size_t gaussians,
                      const size_t dimensionality,
                      FittingType& fitter) :
    gaussians(gaussians),
    dimensionality(dimensionality),
    means(gaussians, arma::vec(dimensionality)),
    covariances(gaussians, arma::mat(dimensionality, dimensionality)),
    weights(gaussians),
    fitter(fitter)
{
  // Clear the memory; set it to 0.  Technically this model is still valid,
  // but only barely.
  weights.fill(1.0 / gaussians);
  for (size_t i = 0; i < gaussians; ++i)
  {
    means[i].zeros();
    covariances[i].eye();
  }
}

// Copy constructor.
template<typename FittingType>
template<typename OtherFittingType>
GMM<FittingType>::GMM(const GMM<OtherFittingType>& other) :
    gaussians(other.Gaussians()),
    dimensionality(other.Dimensionality()),
    means(other.Means()),
    covariances(other.Covariances()),
    weights(other.Weights()),
    localFitter(FittingType()),
    fitter(localFitter)
{ /* Nothing to do. */ }

// Copy constructor for when the other GMM uses the same fitting type.
template<typename FittingType>
GMM<FittingType>::GMM(const GMM<FittingType>& other) :
    gaussians(other.Gaussians()),
    dimensionality(other.Dimensionality()),
    means(other.Means()),
    covariances(other.Covariances()),
    weights(other.Weights()),
    localFitter(other.Fitter()),
    fitter(localFitter)
{ /* Nothing to do. */ }

template<typename FittingType>
template<typename OtherFittingType>
GMM<FittingType>& GMM<FittingType>::operator=(
    const GMM<OtherFittingType>& other)
{
  gaussians = other.Gaussians();
  dimensionality = other.Dimensionality();
  means = other.Means();
  covariances = other.Covariances();
  weights = other.Weights();

  return *this;
}

template<typename FittingType>
GMM<FittingType>& GMM<FittingType>::operator=(const GMM<FittingType>& other)
{
  gaussians = other.Gaussians();
  dimensionality = other.Dimensionality();
  means = other.Means();
  covariances = other.Covariances();
  weights = other.Weights();
  localFitter = other.Fitter();

  return *this;
}

// Load a GMM from file.
/* template void GMM::Load(const std::string& filename) { util::SaveRestoreUtility load; if (!load.ReadFile(filename)) Rcpp::Rcout << "GMM::Load(): could not read file '" << filename << "'!\n"; load.LoadParameter(gaussians, "gaussians"); load.LoadParameter(dimensionality, "dimensionality"); load.LoadParameter(weights, "weights"); // We need to do a little error checking here. if (weights.n_elem != gaussians) { Rcpp::Rcout << "GMM::Load('" << filename << "'): file reports " << gaussians << " gaussians but weights vector only contains " << weights.n_elem << " elements!" << std::endl; } means.resize(gaussians); covariances.resize(gaussians); for (size_t i = 0; i < gaussians; ++i) { std::stringstream o; o << i; std::string meanName = "mean" + o.str(); std::string covName = "covariance" + o.str(); load.LoadParameter(means[i], meanName); load.LoadParameter(covariances[i], covName); } }*/ // Save a GMM to a file. /* template void GMM::Save(const std::string& filename) const { util::SaveRestoreUtility save; save.SaveParameter(gaussians, "gaussians"); save.SaveParameter(dimensionality, "dimensionality"); save.SaveParameter(weights, "weights"); for (size_t i = 0; i < gaussians; ++i) { // Generate names for the XML nodes. std::stringstream o; o << i; std::string meanName = "mean" + o.str(); std::string covName = "covariance" + o.str(); // Now save them. save.SaveParameter(means[i], meanName); save.SaveParameter(covariances[i], covName); } if (!save.WriteFile(filename)) Rcpp::Rcout << "GMM::Save(): error saving to '" << filename << "'.\n"; }*/ /** * Return the probability of the given observation being from this GMM. */ template double GMM::Probability(const arma::vec& observation) const { // Sum the probability for each Gaussian in our mixture (and we have to // multiply by the prior for each Gaussian too). double sum = 0; for (size_t i = 0; i < gaussians; i++) sum += weights[i] * phi(observation, means[i], covariances[i]); return sum; } /** * Return the probability of the given observation being from the given * component in the mixture. */ template double GMM::Probability(const arma::vec& observation, const size_t component) const { // We are only considering one Gaussian component -- so we only need to call // phi() once. We do consider the prior probability! return weights[component] * phi(observation, means[component], covariances[component]); } /** * Return a randomly generated observation according to the probability * distribution defined by this object. */ template arma::vec GMM::Random() const { // Determine which Gaussian it will be coming from. double gaussRand = math::Random(); size_t gaussian = 0; double sumProb = 0; for (size_t g = 0; g < gaussians; g++) { sumProb += weights(g); if (gaussRand <= sumProb) { gaussian = g; break; } } return trans(chol(covariances[gaussian])) * arma::randn(dimensionality) + means[gaussian]; } /** * Fit the GMM to the given observations. */ template double GMM::Estimate(const arma::mat& observations, const size_t trials, const bool useExistingModel) { double bestLikelihood; // This will be reported later. // We don't need to store temporary models if we are only doing one trial. if (trials == 1) { // Train the model. The user will have been warned earlier if the GMM was // initialized with no parameters (0 gaussians, dimensionality of 0). 
fitter.Estimate(observations, means, covariances, weights, useExistingModel); bestLikelihood = LogLikelihood(observations, means, covariances, weights); } else { if (trials == 0) return -DBL_MAX; // It's what they asked for... // If each trial must start from the same initial location, we must save it. std::vector meansOrig; std::vector covariancesOrig; arma::vec weightsOrig; if (useExistingModel) { meansOrig = means; covariancesOrig = covariances; weightsOrig = weights; } // We need to keep temporary copies. We'll do the first training into the // actual model position, so that if it's the best we don't need to copy it. fitter.Estimate(observations, means, covariances, weights, useExistingModel); bestLikelihood = LogLikelihood(observations, means, covariances, weights); Rcpp::Rcout << "GMM::Estimate(): Log-likelihood of trial 0 is " << bestLikelihood << "." << std::endl; // Now the temporary model. std::vector meansTrial(gaussians, arma::vec(dimensionality)); std::vector covariancesTrial(gaussians, arma::mat(dimensionality, dimensionality)); arma::vec weightsTrial(gaussians); for (size_t trial = 1; trial < trials; ++trial) { if (useExistingModel) { meansTrial = meansOrig; covariancesTrial = covariancesOrig; weightsTrial = weightsOrig; } fitter.Estimate(observations, meansTrial, covariancesTrial, weightsTrial, useExistingModel); // Check to see if the log-likelihood of this one is better. double newLikelihood = LogLikelihood(observations, meansTrial, covariancesTrial, weightsTrial); Rcpp::Rcout << "GMM::Estimate(): Log-likelihood of trial " << trial << " is " << newLikelihood << "." << std::endl; if (newLikelihood > bestLikelihood) { // Save new likelihood and copy new model. bestLikelihood = newLikelihood; means = meansTrial; covariances = covariancesTrial; weights = weightsTrial; } } } // Report final log-likelihood and return it. Rcpp::Rcout << "GMM::Estimate(): log-likelihood of trained GMM is " << bestLikelihood << "." << std::endl; return bestLikelihood; } /** * Fit the GMM to the given observations, each of which has a certain * probability of being from this distribution. */ template double GMM::Estimate(const arma::mat& observations, const arma::vec& probabilities, const size_t trials, const bool useExistingModel) { double bestLikelihood; // This will be reported later. // We don't need to store temporary models if we are only doing one trial. if (trials == 1) { // Train the model. The user will have been warned earlier if the GMM was // initialized with no parameters (0 gaussians, dimensionality of 0). fitter.Estimate(observations, probabilities, means, covariances, weights, useExistingModel); bestLikelihood = LogLikelihood(observations, means, covariances, weights); } else { if (trials == 0) return -DBL_MAX; // It's what they asked for... // If each trial must start from the same initial location, we must save it. std::vector meansOrig; std::vector covariancesOrig; arma::vec weightsOrig; if (useExistingModel) { meansOrig = means; covariancesOrig = covariances; weightsOrig = weights; } // We need to keep temporary copies. We'll do the first training into the // actual model position, so that if it's the best we don't need to copy it. fitter.Estimate(observations, probabilities, means, covariances, weights, useExistingModel); bestLikelihood = LogLikelihood(observations, means, covariances, weights); Rcpp::Rcout << "GMM::Estimate(): Log-likelihood of trial 0 is " << bestLikelihood << "." << std::endl; // Now the temporary model. 
std::vector meansTrial(gaussians, arma::vec(dimensionality)); std::vector covariancesTrial(gaussians, arma::mat(dimensionality, dimensionality)); arma::vec weightsTrial(gaussians); for (size_t trial = 1; trial < trials; ++trial) { if (useExistingModel) { meansTrial = meansOrig; covariancesTrial = covariancesOrig; weightsTrial = weightsOrig; } fitter.Estimate(observations, meansTrial, covariancesTrial, weightsTrial, useExistingModel); // Check to see if the log-likelihood of this one is better. double newLikelihood = LogLikelihood(observations, meansTrial, covariancesTrial, weightsTrial); Rcpp::Rcout << "GMM::Estimate(): Log-likelihood of trial " << trial << " is " << newLikelihood << "." << std::endl; if (newLikelihood > bestLikelihood) { // Save new likelihood and copy new model. bestLikelihood = newLikelihood; means = meansTrial; covariances = covariancesTrial; weights = weightsTrial; } } } // Report final log-likelihood and return it. Rcpp::Rcout << "GMM::Estimate(): log-likelihood of trained GMM is " << bestLikelihood << "." << std::endl; return bestLikelihood; } /** * Classify the given observations as being from an individual component in this * GMM. */ template void GMM::Classify(const arma::mat& observations, arma::Col& labels) const { // This is not the best way to do this! // We should not have to fill this with values, because each one should be // overwritten. labels.set_size(observations.n_cols); for (size_t i = 0; i < observations.n_cols; ++i) { // Find maximum probability component. double probability = 0; for (size_t j = 0; j < gaussians; ++j) { double newProb = Probability(observations.unsafe_col(i), j); if (newProb >= probability) { probability = newProb; labels[i] = j; } } } } /** * Get the log-likelihood of this data's fit to the model. */ template double GMM::LogLikelihood( const arma::mat& data, const std::vector& meansL, const std::vector& covariancesL, const arma::vec& weightsL) const { double loglikelihood = 0; arma::vec phis; arma::mat likelihoods(gaussians, data.n_cols); for (size_t i = 0; i < gaussians; i++) { phi(data, meansL[i], covariancesL[i], phis); likelihoods.row(i) = weightsL(i) * trans(phis); } // Now sum over every point. for (size_t j = 0; j < data.n_cols; j++) loglikelihood += log(accu(likelihoods.col(j))); return loglikelihood; } template std::string GMM::ToString() const { std::ostringstream convert; std::ostringstream data; convert << "GMM [" << this << "]" << std::endl; convert << " Gaussians: " << gaussians << std::endl; convert << " Dimensionality: "<(). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_GMM_EM_FIT_HPP #define __MLPACK_METHODS_GMM_EM_FIT_HPP #include // Default clustering mechanism. #include // Default covariance matrix constraint. #include "positive_definite_constraint.hpp" namespace mlpack { namespace gmm { /** * This class contains methods which can fit a GMM to observations using the EM * algorithm. 
It requires an initial clustering mechanism, which is by default * the KMeans algorithm. The clustering mechanism must implement the following * method: * * - void Cluster(const arma::mat& observations, * const size_t clusters, * arma::Col& assignments); * * This method should create 'clusters' clusters, and return the assignment of * each point to a cluster. */ template, typename CovarianceConstraintPolicy = PositiveDefiniteConstraint> class EMFit { public: /** * Construct the EMFit object, optionally passing an InitialClusteringType * object (just in case it needs to store state). Setting the maximum number * of iterations to 0 means that the EM algorithm will iterate until * convergence (with the given tolerance). * * The parameter forcePositive controls whether or not the covariance matrices * are checked for positive definiteness at each iteration. This could be a * time-consuming task, so, if you know your data is well-behaved, you can set * it to false and save some runtime. * * @param maxIterations Maximum number of iterations for EM. * @param tolerance Log-likelihood tolerance required for convergence. * @param forcePositive Check for positive-definiteness of each covariance * matrix at each iteration. * @param clusterer Object which will perform the initial clustering. */ EMFit(const size_t maxIterations = 300, const double tolerance = 1e-10, InitialClusteringType clusterer = InitialClusteringType(), CovarianceConstraintPolicy constraint = CovarianceConstraintPolicy()); /** * Fit the observations to a Gaussian mixture model (GMM) using the EM * algorithm. The size of the vectors (indicating the number of components) * must already be set. Optionally, if useInitialModel is set to true, then * the model given in the means, covariances, and weights parameters is used * as the initial model, instead of using the InitialClusteringType::Cluster() * option. * * @param observations List of observations to train on. * @param means Vector to store trained means in. * @param covariances Vector to store trained covariances in. * @param weights Vector to store a priori weights in. * @param useInitialModel If true, the given model is used for the initial * clustering. */ void Estimate(const arma::mat& observations, std::vector& means, std::vector& covariances, arma::vec& weights, const bool useInitialModel = false); /** * Fit the observations to a Gaussian mixture model (GMM) using the EM * algorithm, taking into account the probabilities of each point being from * this mixture. The size of the vectors (indicating the number of * components) must already be set. Optionally, if useInitialModel is set to * true, then the model given in the means, covariances, and weights * parameters is used as the initial model, instead of using the * InitialClusteringType::Cluster() option. * * @param observations List of observations to train on. * @param probabilities Probability of each point being from this model. * @param means Vector to store trained means in. * @param covariances Vector to store trained covariances in. * @param weights Vector to store a priori weights in. * @param useInitialModel If true, the given model is used for the initial * clustering. */ void Estimate(const arma::mat& observations, const arma::vec& probabilities, std::vector& means, std::vector& covariances, arma::vec& weights, const bool useInitialModel = false); //! Get the clusterer. const InitialClusteringType& Clusterer() const { return clusterer; } //! Modify the clusterer. 
InitialClusteringType& Clusterer() { return clusterer; } //! Get the covariance constraint policy class. const CovarianceConstraintPolicy& Constraint() const { return constraint; } //! Modify the covariance constraint policy class. CovarianceConstraintPolicy& Constraint() { return constraint; } //! Get the maximum number of iterations of the EM algorithm. size_t MaxIterations() const { return maxIterations; } //! Modify the maximum number of iterations of the EM algorithm. size_t& MaxIterations() { return maxIterations; } //! Get the tolerance for the convergence of the EM algorithm. double Tolerance() const { return tolerance; } //! Modify the tolerance for the convergence of the EM algorithm. double& Tolerance() { return tolerance; } private: /** * Run the clusterer, and then turn the cluster assignments into Gaussians. * This is a helper function for both overloads of Estimate(). The vectors * must be already set to the number of clusters. * * @param observations List of observations. * @param means Vector to store means in. * @param covariances Vector to store covariances in. * @param weights Vector to store a priori weights in. */ void InitialClustering(const arma::mat& observations, std::vector& means, std::vector& covariances, arma::vec& weights); /** * Calculate the log-likelihood of a model. Yes, this is reimplemented in the * GMM code. Intuition suggests that the log-likelihood is not the best way * to determine if the EM algorithm has converged. * * @param data Data matrix. * @param means Vector of means. * @param covariances Vector of covariance matrices. * @param weights Vector of a priori weights. */ double LogLikelihood(const arma::mat& data, const std::vector& means, const std::vector& covariances, const arma::vec& weights) const; //! Maximum iterations of EM algorithm. size_t maxIterations; //! Tolerance for convergence of EM. double tolerance; //! Object which will perform the clustering. InitialClusteringType clusterer; //! Object which applies constraints to the covariance matrix. CovarianceConstraintPolicy constraint; }; }; // namespace gmm }; // namespace mlpack // Include implementation. #include "em_fit_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/gmm/positive_definite_constraint.hpp0000644000176200001440000000337113647512751025034 0ustar liggesusers/** * @file positive_definite_constraint.hpp * @author Ryan Curtin * * Restricts a covariance matrix to being positive definite. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_GMM_POSITIVE_DEFINITE_CONSTRAINT_HPP #define __MLPACK_METHODS_GMM_POSITIVE_DEFINITE_CONSTRAINT_HPP #include namespace mlpack { namespace gmm { /** * Given a covariance matrix, force the matrix to be positive definite. */ class PositiveDefiniteConstraint { public: /** * Apply the positive definiteness constraint to the given covariance matrix. * * @param covariance Covariance matrix. 
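   *
   * An illustrative sketch of the effect (added for clarity; not part of the
   * original documentation):
   *
   * @code
   * arma::mat cov = arma::zeros<arma::mat>(3, 3); // Degenerate covariance.
   * PositiveDefiniteConstraint::ApplyConstraint(cov);
   * // cov now has a small positive diagonal, so det(cov) > 1e-50.
   * @endcode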
   */
  static void ApplyConstraint(arma::mat& covariance)
  {
    // TODO: make this more efficient.
    if (arma::det(covariance) <= 1e-50)
    {
      Rcpp::Rcout << "Covariance matrix is not positive definite.  Adding "
          << "perturbation." << std::endl;

      double perturbation = 1e-30;
      while (arma::det(covariance) <= 1e-50)
      {
        covariance.diag() += perturbation;
        perturbation *= 10;
      }
    }
  }
};

}; // namespace gmm
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/methods/gmm/em_fit_impl.hpp0000644000176200001440000002501113647512751021336 0ustar  liggesusers/**
 * @file em_fit_impl.hpp
 * @author Ryan Curtin
 *
 * Implementation of EM algorithm for fitting GMMs.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_GMM_EM_FIT_IMPL_HPP
#define __MLPACK_METHODS_GMM_EM_FIT_IMPL_HPP

// In case it hasn't been included yet.
#include "em_fit.hpp"

// Definition of phi().
#include "phi.hpp"

namespace mlpack {
namespace gmm {

//! Constructor.
template<typename InitialClusteringType, typename CovarianceConstraintPolicy>
EMFit<InitialClusteringType, CovarianceConstraintPolicy>::EMFit(
    const size_t maxIterations,
    const double tolerance,
    InitialClusteringType clusterer,
    CovarianceConstraintPolicy constraint) :
    maxIterations(maxIterations),
    tolerance(tolerance),
    clusterer(clusterer),
    constraint(constraint)
{ /* Nothing to do. */ }

template<typename InitialClusteringType, typename CovarianceConstraintPolicy>
void EMFit<InitialClusteringType, CovarianceConstraintPolicy>::Estimate(
    const arma::mat& observations,
    std::vector<arma::vec>& means,
    std::vector<arma::mat>& covariances,
    arma::vec& weights,
    const bool useInitialModel)
{
  // Only perform initial clustering if the user wanted it.
  if (!useInitialModel)
    InitialClustering(observations, means, covariances, weights);

  double l = LogLikelihood(observations, means, covariances, weights);

  Rcpp::Rcout << "EMFit::Estimate(): initial clustering log-likelihood: "
      << l << std::endl;

  double lOld = -DBL_MAX;
  arma::mat condProb(observations.n_cols, means.size());

  // Iterate to update the model until no more improvement is found.
  size_t iteration = 1;
  while (std::abs(l - lOld) > tolerance && iteration != maxIterations)
  {
    Rcpp::Rcout << "EMFit::Estimate(): iteration " << iteration << ", "
        << "log-likelihood " << l << "." << std::endl;

    // Calculate the conditional probabilities of choosing a particular
    // Gaussian given the observations and the present theta value.
    for (size_t i = 0; i < means.size(); i++)
    {
      // Store conditional probabilities into condProb vector for each
      // Gaussian.  First we make an alias of the condProb vector.
      arma::vec condProbAlias = condProb.unsafe_col(i);
      phi(observations, means[i], covariances[i], condProbAlias);
      condProbAlias *= weights[i];
    }

    // Normalize row-wise.
    for (size_t i = 0; i < condProb.n_rows; i++)
    {
      // Avoid dividing by zero; if the probability for everything is 0, we
      // don't want to make it NaN.
      const double probSum = accu(condProb.row(i));
      if (probSum != 0.0)
        condProb.row(i) /= probSum;
    }
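    // (Annotation added for clarity; not in the original source: the two loops
    // above are the E-step -- condProb(i, j) now holds the responsibility
    // p(component j | observation i) -- and the remainder of this iteration is
    // the M-step, which re-estimates the means, covariances, and weights from
    // these responsibilities.)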
arma::vec probRowSums = trans(arma::sum(condProb, 0 /* columnwise */)); // Calculate the new value of the means using the updated conditional // probabilities. for (size_t i = 0; i < means.size(); i++) { // Don't update if there's no probability of the Gaussian having points. if (probRowSums[i] != 0) means[i] = (observations * condProb.col(i)) / probRowSums[i]; // Calculate the new value of the covariances using the updated // conditional probabilities and the updated means. arma::mat tmp = observations - (means[i] * arma::ones(observations.n_cols)); arma::mat tmpB = tmp % (arma::ones(observations.n_rows) * trans(condProb.col(i))); // Don't update if there's no probability of the Gaussian having points. if (probRowSums[i] != 0.0) covariances[i] = (tmp * trans(tmpB)) / probRowSums[i]; // Apply covariance constraint. constraint.ApplyConstraint(covariances[i]); } // Calculate the new values for omega using the updated conditional // probabilities. weights = probRowSums / observations.n_cols; // Update values of l; calculate new log-likelihood. lOld = l; l = LogLikelihood(observations, means, covariances, weights); iteration++; } } template void EMFit::Estimate( const arma::mat& observations, const arma::vec& probabilities, std::vector& means, std::vector& covariances, arma::vec& weights, const bool useInitialModel) { if (!useInitialModel) InitialClustering(observations, means, covariances, weights); double l = LogLikelihood(observations, means, covariances, weights); Rcpp::Rcout << "EMFit::Estimate(): initial clustering log-likelihood: " << l << std::endl; double lOld = -DBL_MAX; arma::mat condProb(observations.n_cols, means.size()); // Iterate to update the model until no more improvement is found. size_t iteration = 1; while (std::abs(l - lOld) > tolerance && iteration != maxIterations) { // Calculate the conditional probabilities of choosing a particular // Gaussian given the observations and the present theta value. for (size_t i = 0; i < means.size(); i++) { // Store conditional probabilities into condProb vector for each // Gaussian. First we make an alias of the condProb vector. arma::vec condProbAlias = condProb.unsafe_col(i); phi(observations, means[i], covariances[i], condProbAlias); condProbAlias *= weights[i]; } // Normalize row-wise. for (size_t i = 0; i < condProb.n_rows; i++) { // Avoid dividing by zero; if the probability for everything is 0, we // don't want to make it NaN. const double probSum = accu(condProb.row(i)); if (probSum != 0.0) condProb.row(i) /= probSum; } // This will store the sum of probabilities of each state over all the // observations. arma::vec probRowSums(means.size()); // Calculate the new value of the means using the updated conditional // probabilities. for (size_t i = 0; i < means.size(); i++) { // Calculate the sum of probabilities of points, which is the // conditional probability of each point being from Gaussian i // multiplied by the probability of the point being from this mixture // model. probRowSums[i] = accu(condProb.col(i) % probabilities); means[i] = (observations * (condProb.col(i) % probabilities)) / probRowSums[i]; // Calculate the new value of the covariances using the updated // conditional probabilities and the updated means. arma::mat tmp = observations - (means[i] * arma::ones(observations.n_cols)); arma::mat tmpB = tmp % (arma::ones(observations.n_rows) * trans(condProb.col(i) % probabilities)); covariances[i] = (tmp * trans(tmpB)) / probRowSums[i]; // Apply covariance constraint. 
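      // Here each responsibility is additionally weighted by the caller's
      // per-point probability of belonging to this mixture; the constraint
      // policy is then applied exactly as in the unweighted overload.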
constraint.ApplyConstraint(covariances[i]); } // Calculate the new values for omega using the updated conditional // probabilities. weights = probRowSums / accu(probabilities); // Update values of l; calculate new log-likelihood. lOld = l; l = LogLikelihood(observations, means, covariances, weights); iteration++; } } template void EMFit:: InitialClustering(const arma::mat& observations, std::vector& means, std::vector& covariances, arma::vec& weights) { // Assignments from clustering. arma::Col assignments; // Run clustering algorithm. clusterer.Cluster(observations, means.size(), assignments); // Now calculate the means, covariances, and weights. weights.zeros(); for (size_t i = 0; i < means.size(); ++i) { means[i].zeros(); covariances[i].zeros(); } // From the assignments, generate our means, covariances, and weights. for (size_t i = 0; i < observations.n_cols; ++i) { const size_t cluster = assignments[i]; // Add this to the relevant mean. means[cluster] += observations.col(i); // Add this to the relevant covariance. // covariances[cluster] += observations.col(i) * trans(observations.col(i)); // Now add one to the weights (we will normalize). weights[cluster]++; } // Now normalize the mean and covariance. for (size_t i = 0; i < means.size(); ++i) { // covariances[i] -= means[i] * trans(means[i]); means[i] /= (weights[i] > 1) ? weights[i] : 1; // covariances[i] /= (weights[i] > 1) ? weights[i] : 1; } for (size_t i = 0; i < observations.n_cols; ++i) { const size_t cluster = assignments[i]; const arma::vec normObs = observations.col(i) - means[cluster]; covariances[cluster] += normObs * normObs.t(); } for (size_t i = 0; i < means.size(); ++i) { covariances[i] /= (weights[i] > 1) ? weights[i] : 1; // Apply constraints to covariance matrix. constraint.ApplyConstraint(covariances[i]); } // Finally, normalize weights. weights /= accu(weights); } template double EMFit::LogLikelihood( const arma::mat& observations, const std::vector& means, const std::vector& covariances, const arma::vec& weights) const { double logLikelihood = 0; arma::vec phis; arma::mat likelihoods(means.size(), observations.n_cols); for (size_t i = 0; i < means.size(); ++i) { phi(observations, means[i], covariances[i], phis); likelihoods.row(i) = weights(i) * trans(phis); } // Now sum over every point. for (size_t j = 0; j < observations.n_cols; ++j) { if (accu(likelihoods.col(j)) == 0) Rcpp::Rcout << "Likelihood of point " << j << " is 0! It is probably an " << "outlier." << std::endl; logLikelihood += log(accu(likelihoods.col(j))); } return logLikelihood; } }; // namespace gmm }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/gmm/eigenvalue_ratio_constraint.hpp0000644000176200001440000000656613647512751024656 0ustar liggesusers/** * @file eigenvalue_ratio_constraint.hpp * @author Ryan Curtin * * Constrain a covariance matrix to have a certain ratio of eigenvalues. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
*/ #ifndef __MLPACK_METHODS_GMM_EIGENVALUE_RATIO_CONSTRAINT_HPP #define __MLPACK_METHODS_GMM_EIGENVALUE_RATIO_CONSTRAINT_HPP #include namespace mlpack { namespace gmm { /** * Given a vector of eigenvalue ratios, ensure that the covariance matrix always * has those eigenvalue ratios. When you create this object, make sure that the * vector of ratios that you pass does not go out of scope, because this object * holds a reference to that vector instead of copying it. */ class EigenvalueRatioConstraint { public: /** * Create the EigenvalueRatioConstraint object with the given vector of * eigenvalue ratios. These ratios are with respect to the first eigenvalue, * which is the largest eigenvalue, so the first element of the vector should * be 1. In addition, all other elements should be less than or equal to 1. */ EigenvalueRatioConstraint(const arma::vec& ratios) : ratios(ratios) { // Check validity of ratios. if (std::abs(ratios[0] - 1.0) > 1e-20) Rcpp::Rcout << "EigenvalueRatioConstraint::EigenvalueRatioConstraint(): " << "first element of ratio vector is not 1.0!" << std::endl; for (size_t i = 1; i < ratios.n_elem; ++i) { if (ratios[i] > 1.0) Rcpp::Rcout << "EigenvalueRatioConstraint::EigenvalueRatioConstraint(): " << "element " << i << " of ratio vector is greater than 1.0!" << std::endl; if (ratios[i] < 0.0) Rcpp::Rcout << "EigenvalueRatioConstraint::EigenvalueRatioConstraint(): " << "element " << i << " of ratio vectors is negative and will " << "probably cause the covariance to be non-invertible..." << std::endl; } } /** * Apply the eigenvalue ratio constraint to the given covariance matrix. */ void ApplyConstraint(arma::mat& covariance) const { // Eigendecompose the matrix. arma::vec eigenvalues; arma::mat eigenvectors; arma::eig_sym(eigenvalues, eigenvectors, covariance); // Change the eigenvalues to what we are forcing them to be. There // shouldn't be any negative eigenvalues anyway, so it doesn't matter if we // are suddenly forcing them to be positive. If the first eigenvalue is // negative, well, there are going to be some problems later... eigenvalues = (eigenvalues[0] * ratios); // Reassemble the matrix. covariance = eigenvectors * arma::diagmat(eigenvalues) * eigenvectors.t(); } private: //! Ratios for eigenvalues. const arma::vec& ratios; }; }; // namespace gmm }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/gmm/phi.hpp0000644000176200001440000001174413647512751017642 0ustar liggesusers/** * @author Parikshit Ram (pram@cc.gatech.edu) * @file phi.hpp * * This file computes the Gaussian probability * density function * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_MOG_PHI_HPP #define __MLPACK_METHODS_MOG_PHI_HPP #include namespace mlpack { namespace gmm { /** * Calculates the univariate Gaussian probability density function. * * Example use: * @code * double x, mean, var; * .... 
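 * // With x = 0.0, mean = 0.0 and var = 1.0, the call below returns the
 * // standard normal density at its mode: 1 / sqrt(2 * pi), about 0.3989.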
* double f = phi(x, mean, var); * @endcode * * @param x Observation. * @param mean Mean of univariate Gaussian. * @param var Variance of univariate Gaussian. * @return Probability of x being observed from the given univariate Gaussian. */ inline double phi(const double x, const double mean, const double var) { return exp(-1.0 * ((x - mean) * (x - mean) / (2 * var))) / sqrt(2 * M_PI * var); } /** * Calculates the multivariate Gaussian probability density function. * * Example use: * @code * extern arma::vec x, mean; * extern arma::mat cov; * .... * double f = phi(x, mean, cov); * @endcode * * @param x Observation. * @param mean Mean of multivariate Gaussian. * @param cov Covariance of multivariate Gaussian. * @return Probability of x being observed from the given multivariate Gaussian. */ inline double phi(const arma::vec& x, const arma::vec& mean, const arma::mat& cov) { arma::vec diff = mean - x; // Parentheses required for Armadillo 3.0.0 bug. arma::vec exponent = -0.5 * (trans(diff) * inv(cov) * diff); // TODO: What if det(cov) < 0? return pow(2 * M_PI, (double) x.n_elem / -2.0) * pow(det(cov), -0.5) * exp(exponent[0]); } /** * Calculates the multivariate Gaussian probability density function and also * the gradients with respect to the mean and the variance. * * Example use: * @code * extern arma::vec x, mean, g_mean, g_cov; * std::vector d_cov; // the dSigma * .... * double f = phi(x, mean, cov, d_cov, &g_mean, &g_cov); * @endcode */ inline double phi(const arma::vec& x, const arma::vec& mean, const arma::mat& cov, const std::vector& d_cov, arma::vec& g_mean, arma::vec& g_cov) { // We don't call out to another version of the function to avoid inverting the // covariance matrix more than once. arma::mat cinv = inv(cov); arma::vec diff = mean - x; // Parentheses required for Armadillo 3.0.0 bug. arma::vec exponent = -0.5 * (trans(diff) * inv(cov) * diff); long double f = pow(2 * M_PI, (double) x.n_elem / 2) * pow(det(cov), -0.5) * exp(exponent[0]); // Calculate the g_mean values; this is a (1 x dim) vector. arma::vec invDiff = cinv * diff; g_mean = f * invDiff; // Calculate the g_cov values; this is a (1 x (dim * (dim + 1) / 2)) vector. for (size_t i = 0; i < d_cov.size(); i++) { arma::mat inv_d = cinv * d_cov[i]; g_cov[i] = f * dot(d_cov[i] * invDiff, invDiff) + accu(inv_d.diag()) / 2; } return f; } /** * Calculates the multivariate Gaussian probability density function for each * data point (column) in the given matrix, with respect to the given mean and * variance. * * @param x List of observations. * @param mean Mean of multivariate Gaussian. * @param cov Covariance of multivariate Gaussian. * @param probabilities Output probabilities for each input observation. */ inline void phi(const arma::mat& x, const arma::vec& mean, const arma::mat& cov, arma::vec& probabilities) { // Column i of 'diffs' is the difference between x.col(i) and the mean. arma::mat diffs = x - (mean * arma::ones(x.n_cols)); // Now, we only want to calculate the diagonal elements of (diffs' * cov^-1 * // diffs). We just don't need any of the other elements. We can calculate // the right hand part of the equation (instead of the left side) so that // later we are referencing columns, not rows -- that is faster. arma::mat rhs = -0.5 * inv(cov) * diffs; arma::vec exponents(diffs.n_cols); // We will now fill this. 
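  // Each entry becomes exp(-0.5 * d_i' * inv(cov) * d_i), where d_i is the
  // i-th column of 'diffs': the elementwise product with the matching column
  // of 'rhs', reduced by accu(), is exactly that quadratic form.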
for (size_t i = 0; i < diffs.n_cols; i++) exponents(i) = exp(accu(diffs.unsafe_col(i) % rhs.unsafe_col(i))); probabilities = pow(2 * M_PI, (double) mean.n_elem / -2.0) * pow(det(cov), -0.5) * exponents; } }; // namespace gmm }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/gmm/no_constraint.hpp0000644000176200001440000000253213647512751021735 0ustar liggesusers/** * @file no_constraint.hpp * @author Ryan Curtin * * No constraint on the covariance matrix. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_GMM_NO_CONSTRAINT_HPP #define __MLPACK_METHODS_GMM_NO_CONSTRAINT_HPP #include namespace mlpack { namespace gmm { /** * This class enforces no constraint on the covariance matrix. It's faster this * way, although depending on your situation you may end up with a * non-invertible covariance matrix. */ class NoConstraint { public: //! Do nothing, and do not modify the covariance matrix. static void ApplyConstraint(const arma::mat& /* covariance */) { } }; }; // namespace gmm }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/gmm/diagonal_constraint.hpp0000644000176200001440000000251613647512751023101 0ustar liggesusers/** * @file diagonal_constraint.hpp * @author Ryan Curtin * * Constrain a covariance matrix to be diagonal. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_GMM_DIAGONAL_CONSTRAINT_HPP #define __MLPACK_METHODS_GMM_DIAGONAL_CONSTRAINT_HPP #include namespace mlpack { namespace gmm { /** * Force a covariance matrix to be diagonal. */ class DiagonalConstraint { public: //! Force a covariance matrix to be diagonal. static void ApplyConstraint(arma::mat& covariance) { // Save the diagonal only. arma::vec diagonal = covariance.diag(); covariance = arma::diagmat(diagonal); } }; }; // namespace gmm }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/gmm/gmm.hpp0000644000176200001440000003372113647512751017641 0ustar liggesusers/** * @author Parikshit Ram (pram@cc.gatech.edu) * @file gmm.hpp * * Defines a Gaussian Mixture model and * estimates the parameters of the model * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_MOG_MOG_EM_HPP #define __MLPACK_METHODS_MOG_MOG_EM_HPP #include // This is the default fitting method class. #include "em_fit.hpp" namespace mlpack { namespace gmm /** Gaussian Mixture Models. */ { /** * A Gaussian Mixture Model (GMM). This class uses maximum likelihood loss * functions to estimate the parameters of the GMM on a given dataset via the * given fitting mechanism, defined by the FittingType template parameter. The * GMM can be trained using normal data, or data with probabilities of being * from this GMM (see GMM::Estimate() for more information). * * The FittingType template class must provide a way for the GMM to train on * data. It must provide the following two functions: * * @code * void Estimate(const arma::mat& observations, * std::vector& means, * std::vector& covariances, * arma::vec& weights); * * void Estimate(const arma::mat& observations, * const arma::vec& probabilities, * std::vector& means, * std::vector& covariances, * arma::vec& weights); * @endcode * * These functions should produce a trained GMM from the given observations and * probabilities. These may modify the size of the model (by increasing the * size of the mean and covariance vectors as well as the weight vectors), but * the method should expect that these vectors are already set to the size of * the GMM as specified in the constructor. * * For a sample implementation, see the EMFit class; this class uses the EM * algorithm to train a GMM, and is the default fitting type. * * The GMM, once trained, can be used to generate random points from the * distribution and estimate the probability of points being from the * distribution. The parameters of the GMM can be obtained through the * accessors and mutators. * * Example use: * * @code * // Set up a mixture of 5 gaussians in a 4-dimensional space (uses the default * // EM fitting mechanism). * GMM<> g(5, 4); * * // Train the GMM given the data observations. * g.Estimate(data); * * // Get the probability of 'observation' being observed from this GMM. * double probability = g.Probability(observation); * * // Get a random observation from the GMM. * arma::vec observation = g.Random(); * @endcode */ template > class GMM { private: //! The number of Gaussians in the model. size_t gaussians; //! The dimensionality of the model. size_t dimensionality; //! Vector of means; one for each Gaussian. std::vector means; //! Vector of covariances; one for each Gaussian. std::vector covariances; //! Vector of a priori weights for each Gaussian. arma::vec weights; public: /** * Create an empty Gaussian Mixture Model, with zero gaussians. */ GMM() : gaussians(0), dimensionality(0), localFitter(FittingType()), fitter(localFitter) { // Warn the user. They probably don't want to do this. If this constructor // is being used (because it is required by some template classes), the user // should know that it is potentially dangerous. 
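    // (The usual remedy is to set Gaussians() and Dimensionality(), then
    // resize Means(), Covariances() and Weights() to match, before calling
    // Estimate().)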
Rcpp::Rcout << "GMM::GMM(): no parameters given; Estimate() may fail " << "unless parameters are set." << std::endl; } /** * Create a GMM with the given number of Gaussians, each of which have the * specified dimensionality. The means and covariances will be set to 0. * * @param gaussians Number of Gaussians in this GMM. * @param dimensionality Dimensionality of each Gaussian. */ GMM(const size_t gaussians, const size_t dimensionality); /** * Create a GMM with the given number of Gaussians, each of which have the * specified dimensionality. Also, pass in an initialized FittingType class; * this is useful in cases where the FittingType class needs to store some * state. * * @param gaussians Number of Gaussians in this GMM. * @param dimensionality Dimensionality of each Gaussian. * @param fitter Initialized fitting mechanism. */ GMM(const size_t gaussians, const size_t dimensionality, FittingType& fitter); /** * Create a GMM with the given means, covariances, and weights. * * @param means Means of the model. * @param covariances Covariances of the model. * @param weights Weights of the model. */ GMM(const std::vector& means, const std::vector& covariances, const arma::vec& weights) : gaussians(means.size()), dimensionality((!means.empty()) ? means[0].n_elem : 0), means(means), covariances(covariances), weights(weights), localFitter(FittingType()), fitter(localFitter) { /* Nothing to do. */ } /** * Create a GMM with the given means, covariances, and weights, and use the * given initialized FittingType class. This is useful in cases where the * FittingType class needs to store some state. * * @param means Means of the model. * @param covariances Covariances of the model. * @param weights Weights of the model. */ GMM(const std::vector& means, const std::vector& covariances, const arma::vec& weights, FittingType& fitter) : gaussians(means.size()), dimensionality((!means.empty()) ? means[0].n_elem : 0), means(means), covariances(covariances), weights(weights), fitter(fitter) { /* Nothing to do. */ } /** * Copy constructor for GMMs which use different fitting types. */ template GMM(const GMM& other); /** * Copy constructor for GMMs using the same fitting type. This also copies * the fitter. */ GMM(const GMM& other); /** * Copy operator for GMMs which use different fitting types. */ template GMM& operator=(const GMM& other); /** * Copy operator for GMMs which use the same fitting type. This also copies * the fitter. */ GMM& operator=(const GMM& other); /** * Load a GMM from an XML file. The format of the XML file should be the same * as is generated by the Save() method. * * @param filename Name of XML file containing model to be loaded. */ //void Load(const std::string& filename); /** * Save a GMM to an XML file. * * @param filename Name of XML file to write to. */ //void Save(const std::string& filename) const; //! Return the number of gaussians in the model. size_t Gaussians() const { return gaussians; } //! Modify the number of gaussians in the model. Careful! You will have to //! resize the means, covariances, and weights yourself. size_t& Gaussians() { return gaussians; } //! Return the dimensionality of the model. size_t Dimensionality() const { return dimensionality; } //! Modify the dimensionality of the model. Careful! You will have to update //! each mean and covariance matrix yourself. size_t& Dimensionality() { return dimensionality; } //! Return a const reference to the vector of means (mu). const std::vector& Means() const { return means; } //! 
Return a reference to the vector of means (mu). std::vector& Means() { return means; } //! Return a const reference to the vector of covariance matrices (sigma). const std::vector& Covariances() const { return covariances; } //! Return a reference to the vector of covariance matrices (sigma). std::vector& Covariances() { return covariances; } //! Return a const reference to the a priori weights of each Gaussian. const arma::vec& Weights() const { return weights; } //! Return a reference to the a priori weights of each Gaussian. arma::vec& Weights() { return weights; } //! Return a const reference to the fitting type. const FittingType& Fitter() const { return fitter; } //! Return a reference to the fitting type. FittingType& Fitter() { return fitter; } /** * Return the probability that the given observation came from this * distribution. * * @param observation Observation to evaluate the probability of. */ double Probability(const arma::vec& observation) const; /** * Return the probability that the given observation came from the given * Gaussian component in this distribution. * * @param observation Observation to evaluate the probability of. * @param component Index of the component of the GMM to be considered. */ double Probability(const arma::vec& observation, const size_t component) const; /** * Return a randomly generated observation according to the probability * distribution defined by this object. * * @return Random observation from this GMM. */ arma::vec Random() const; /** * Estimate the probability distribution directly from the given observations, * using the given algorithm in the FittingType class to fit the data. * * The fitting will be performed 'trials' times; from these trials, the model * with the greatest log-likelihood will be selected. By default, only one * trial is performed. The log-likelihood of the best fitting is returned. * * Optionally, the existing model can be used as an initial model for the * estimation by setting 'useExistingModel' to true. If the fitting procedure * is deterministic after the initial position is given, then 'trials' should * be set to 1. * * @tparam FittingType The type of fitting method which should be used * (EMFit<> is suggested). * @param observations Observations of the model. * @param trials Number of trials to perform; the model in these trials with * the greatest log-likelihood will be selected. * @param useExistingModel If true, the existing model is used as an initial * model for the estimation. * @return The log-likelihood of the best fit. */ double Estimate(const arma::mat& observations, const size_t trials = 1, const bool useExistingModel = false); /** * Estimate the probability distribution directly from the given observations, * taking into account the probability of each observation actually being from * this distribution, and using the given algorithm in the FittingType class * to fit the data. * * The fitting will be performed 'trials' times; from these trials, the model * with the greatest log-likelihood will be selected. By default, only one * trial is performed. The log-likelihood of the best fitting is returned. * * Optionally, the existing model can be used as an initial model for the * estimation by setting 'useExistingModel' to true. If the fitting procedure * is deterministic after the initial position is given, then 'trials' should * be set to 1. * * @param observations Observations of the model. * @param probabilities Probability of each observation being from this * distribution. 
* @param trials Number of trials to perform; the model in these trials with * the greatest log-likelihood will be selected. * @param useExistingModel If true, the existing model is used as an initial * model for the estimation. * @return The log-likelihood of the best fit. */ double Estimate(const arma::mat& observations, const arma::vec& probabilities, const size_t trials = 1, const bool useExistingModel = false); /** * Classify the given observations as being from an individual component in * this GMM. The resultant classifications are stored in the 'labels' object, * and each label will be between 0 and (Gaussians() - 1). Supposing that a * point was classified with label 2, and that our GMM object was called * 'gmm', one could access the relevant Gaussian distribution as follows: * * @code * arma::vec mean = gmm.Means()[2]; * arma::mat covariance = gmm.Covariances()[2]; * double priorWeight = gmm.Weights()[2]; * @endcode * * @param observations List of observations to classify. * @param labels Object which will be filled with labels. */ void Classify(const arma::mat& observations, arma::Col& labels) const; /** * Returns a string representation of this object. */ std::string ToString() const; private: /** * This function computes the loglikelihood of the given model. This function * is used by GMM::Estimate(). * * @param dataPoints Observations to calculate the likelihood for. * @param means Means of the given mixture model. * @param covars Covariances of the given mixture model. * @param weights Weights of the given mixture model. */ double LogLikelihood(const arma::mat& dataPoints, const std::vector& means, const std::vector& covars, const arma::vec& weights) const; //! Locally-stored fitting object; in case the user did not pass one. FittingType localFitter; //! Reference to the fitting object we should use. FittingType& fitter; }; }; // namespace gmm }; // namespace mlpack // Include implementation. #include "gmm_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/det/0000755000176200001440000000000013647514343016335 5ustar liggesusersRcppMLPACK/src/mlpack/methods/det/dt_utils.cpp0000644000176200001440000002345013647514343020674 0ustar liggesusers/** * @file dt_utils.cpp * @author Parikshit Ram (pram@cc.gatech.edu) * * This file implements functions to perform different tasks with the Density * Tree class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "dt_utils.hpp" using namespace mlpack; using namespace det; void mlpack::det::PrintLeafMembership(DTree* dtree, const arma::mat& data, const arma::Mat& labels, const size_t numClasses, const std::string leafClassMembershipFile) { // Tag the leaves with numbers. 
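  // TagTree() walks the tree recursively, giving each leaf a consecutive
  // integer tag and returning the total number of leaves, so the table below
  // can be indexed directly as (leaf tag, class label).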
int numLeaves = dtree->TagTree(); arma::Mat table(numLeaves, (numClasses + 1)); table.zeros(); for (size_t i = 0; i < data.n_cols; i++) { const arma::vec testPoint = data.unsafe_col(i); const int leafTag = dtree->FindBucket(testPoint); const size_t label = labels[i]; table(leafTag, label) += 1; } if (leafClassMembershipFile == "") { Rcpp::Rcout << "Leaf membership; row represents leaf id, column represents " << "class id; value represents number of points in leaf in class." << std::endl << table; } else { // Create a stream for the file. std::ofstream outfile(leafClassMembershipFile.c_str()); if (outfile.good()) { outfile << table; Rcpp::Rcout << "Leaf membership printed to '" << leafClassMembershipFile << "'." << std::endl; } else { Rcpp::Rcout << "Can't open '" << leafClassMembershipFile << "' to write " << "leaf membership to." << std::endl; } outfile.close(); } return; } void mlpack::det::PrintVariableImportance(const DTree* dtree, const std::string viFile) { arma::vec imps; dtree->ComputeVariableImportance(imps); double max = 0.0; for (size_t i = 0; i < imps.n_elem; ++i) if (imps[i] > max) max = imps[i]; Rcpp::Rcout << "Maximum variable importance: " << max << "." << std::endl; if (viFile == "") { Rcpp::Rcout << "Variable importance: " << std::endl << imps.t() << std::endl; } else { std::ofstream outfile(viFile.c_str()); if (outfile.good()) { outfile << imps; Rcpp::Rcout << "Variable importance printed to '" << viFile << "'." << std::endl; } else { Rcpp::Rcout << "Can't open '" << viFile << "' to write variable importance " << "to." << std::endl; } outfile.close(); } } // This function trains the optimal decision tree using the given number of // folds. DTree* mlpack::det::Trainer(arma::mat& dataset, const size_t folds, const bool useVolumeReg, const size_t maxLeafSize, const size_t minLeafSize, const std::string unprunedTreeOutput) { // Initialize the tree. DTree* dtree = new DTree(dataset); // Prepare to grow the tree... arma::Col oldFromNew(dataset.n_cols); for (size_t i = 0; i < oldFromNew.n_elem; i++) oldFromNew[i] = i; // Save the dataset since it would be modified while growing the tree. arma::mat newDataset(dataset); // Growing the tree double oldAlpha = 0.0; double alpha = dtree->Grow(newDataset, oldFromNew, useVolumeReg, maxLeafSize, minLeafSize); Rcpp::Rcout << dtree->SubtreeLeaves() << " leaf nodes in the tree using full " << "dataset; minimum alpha: " << alpha << "." << std::endl; // Compute densities for the training points in the full tree, if we were // asked for this. if (unprunedTreeOutput != "") { std::ofstream outfile(unprunedTreeOutput.c_str()); if (outfile.good()) { for (size_t i = 0; i < dataset.n_cols; ++i) { arma::vec testPoint = dataset.unsafe_col(i); outfile << dtree->ComputeValue(testPoint) << std::endl; } } else { Rcpp::Rcout << "Can't open '" << unprunedTreeOutput << "' to write computed" << " densities to." << std::endl; } outfile.close(); } // Sequentially prune and save the alpha values and the values of c_t^2 * r_t. std::vector > prunedSequence; while (dtree->SubtreeLeaves() > 1) { std::pair treeSeq(oldAlpha, dtree->SubtreeLeavesLogNegError()); prunedSequence.push_back(treeSeq); oldAlpha = alpha; alpha = dtree->PruneAndUpdate(oldAlpha, dataset.n_cols, useVolumeReg); // Some sanity checks. 
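    // The pairs pushed onto prunedSequence trace out the cost-complexity
    // pruning path: alpha increases at every step while the subtree's
    // log-negative-error decreases, which is what the (disabled) assertions
    // below would verify.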
//Log::Assert((alpha < std::numeric_limits::max()) || // (dtree->SubtreeLeaves() == 1)); //Log::Assert(alpha > oldAlpha); //Log::Assert(dtree->SubtreeLeavesLogNegError() < treeSeq.second); } std::pair treeSeq(oldAlpha, dtree->SubtreeLeavesLogNegError()); prunedSequence.push_back(treeSeq); Rcpp::Rcout << prunedSequence.size() << " trees in the sequence; maximum alpha:" << " " << oldAlpha << "." << std::endl; delete dtree; arma::mat cvData(dataset); size_t testSize = dataset.n_cols / folds; std::vector regularizationConstants; regularizationConstants.resize(prunedSequence.size(), 0); // Go through each fold. for (size_t fold = 0; fold < folds; fold++) { // Break up data into train and test sets. size_t start = fold * testSize; size_t end = std::min((fold + 1) * testSize, (size_t) cvData.n_cols); arma::mat test = cvData.cols(start, end - 1); arma::mat train(cvData.n_rows, cvData.n_cols - test.n_cols); if (start == 0 && end < cvData.n_cols) { train.cols(0, train.n_cols - 1) = cvData.cols(end, cvData.n_cols - 1); } else if (start > 0 && end == cvData.n_cols) { train.cols(0, train.n_cols - 1) = cvData.cols(0, start - 1); } else { train.cols(0, start - 1) = cvData.cols(0, start - 1); train.cols(start, train.n_cols - 1) = cvData.cols(end, cvData.n_cols - 1); } // Initialize the tree. DTree* cvDTree = new DTree(train); // Getting ready to grow the tree... arma::Col cvOldFromNew(train.n_cols); for (size_t i = 0; i < cvOldFromNew.n_elem; i++) cvOldFromNew[i] = i; // Grow the tree. oldAlpha = 0.0; alpha = cvDTree->Grow(train, cvOldFromNew, useVolumeReg, maxLeafSize, minLeafSize); // Sequentially prune with all the values of available alphas and adding // values for test values. Don't enter this loop if there are less than two // trees in the pruned sequence. for (size_t i = 0; i < ((prunedSequence.size() < 2) ? 0 : prunedSequence.size() - 2); ++i) { // Compute test values for this state of the tree. double cvVal = 0.0; for (size_t j = 0; j < test.n_cols; j++) { arma::vec testPoint = test.unsafe_col(j); cvVal += cvDTree->ComputeValue(testPoint); } // Update the cv regularization constant. regularizationConstants[i] += 2.0 * cvVal / (double) dataset.n_cols; // Determine the new alpha value and prune accordingly. oldAlpha = 0.5 * (prunedSequence[i + 1].first + prunedSequence[i + 2].first); alpha = cvDTree->PruneAndUpdate(oldAlpha, train.n_cols, useVolumeReg); } // Compute test values for this state of the tree. double cvVal = 0.0; for (size_t i = 0; i < test.n_cols; ++i) { arma::vec testPoint = test.unsafe_col(i); cvVal += cvDTree->ComputeValue(testPoint); } if (prunedSequence.size() > 2) regularizationConstants[prunedSequence.size() - 2] += 2.0 * cvVal / (double) dataset.n_cols; test.reset(); delete cvDTree; } double optimalAlpha = -1.0; long double cvBestError = -std::numeric_limits::max(); for (size_t i = 0; i < prunedSequence.size() - 1; ++i) { // We can no longer work in the log-space for this because we have no // guarantee the quantity will be positive. long double thisError = -std::exp((long double) prunedSequence[i].second) + (long double) regularizationConstants[i]; if (thisError > cvBestError) { cvBestError = thisError; optimalAlpha = prunedSequence[i].first; } } Rcpp::Rcout << "Optimal alpha: " << optimalAlpha << "." << std::endl; // Initialize the tree. DTree* dtreeOpt = new DTree(dataset); // Getting ready to grow the tree... for (size_t i = 0; i < oldFromNew.n_elem; i++) oldFromNew[i] = i; // Save the dataset since it would be modified while growing the tree. 
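  // (Grow() reorders the columns of the matrix it is handed so that each
  // node's points are contiguous; growing on a copy preserves the caller's
  // column order in 'dataset'.)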
newDataset = dataset; // Grow the tree. oldAlpha = -DBL_MAX; alpha = dtreeOpt->Grow(newDataset, oldFromNew, useVolumeReg, maxLeafSize, minLeafSize); // Prune with optimal alpha. while ((oldAlpha < optimalAlpha) && (dtreeOpt->SubtreeLeaves() > 1)) { oldAlpha = alpha; alpha = dtreeOpt->PruneAndUpdate(oldAlpha, newDataset.n_cols, useVolumeReg); // Some sanity checks. //Log::Assert((alpha < std::numeric_limits::max()) || // (dtreeOpt->SubtreeLeaves() == 1)); //Log::Assert(alpha > oldAlpha); } Rcpp::Rcout << dtreeOpt->SubtreeLeaves() << " leaf nodes in the optimally " << "pruned tree; optimal alpha: " << oldAlpha << "." << std::endl; return dtreeOpt; } RcppMLPACK/src/mlpack/methods/det/dt_utils.hpp0000644000176200001440000000611513647512751020701 0ustar liggesusers/** * @file dt_utils.hpp * @author Parikshit Ram (pram@cc.gatech.edu) * * This file implements functions to perform different tasks with the Density * Tree class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_DET_DT_UTILS_HPP #define __MLPACK_METHODS_DET_DT_UTILS_HPP #include #include "dtree.hpp" namespace mlpack { namespace det { /** * Print the membership of leaves of a density estimation tree given the labels * and number of classes. Optionally, pass the name of a file to print this * information to (otherwise stdout is used). * * @param dtree Tree to print membership of. * @param data Dataset tree is built upon. * @param labels Class labels of dataset. * @param numClasses Number of classes in dataset. * @param leafClassMembershipFile Name of file to print to (optional). */ void PrintLeafMembership(DTree* dtree, const arma::mat& data, const arma::Mat& labels, const size_t numClasses, const std::string leafClassMembershipFile = ""); /** * Print the variable importance of each dimension of a density estimation tree. * Optionally, pass the name of a file to print this information to (otherwise * stdout is used). * * @param dtree Density tree to use. * @param viFile Name of file to print to (optional). */ void PrintVariableImportance(const DTree* dtree, const std::string viFile = ""); /** * Train the optimal decision tree using cross-validation with the given number * of folds. Optionally, give a filename to print the unpruned tree to. This * initializes a tree on the heap, so you are responsible for deleting it. * * @param dataset Dataset for the tree to use. * @param folds Number of folds to use for cross-validation. * @param useVolumeReg If true, use volume regularization. * @param maxLeafSize Maximum number of points allowed in a leaf. * @param minLeafSize Minimum number of points allowed in a leaf. * @param unprunedTreeOutput Filename to print unpruned tree to (optional). 
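 *
 * A typical call (parameter values here are illustrative only):
 *
 * @code
 * extern arma::mat data; // one observation per column
 * DTree* tree = mlpack::det::Trainer(data, 10); // 10-fold cross-validation
 * // ... use the tree ...
 * delete tree; // the caller owns the returned tree
 * @endcode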
*/ DTree* Trainer(arma::mat& dataset, const size_t folds, const bool useVolumeReg = false, const size_t maxLeafSize = 10, const size_t minLeafSize = 5, const std::string unprunedTreeOutput = ""); }; // namespace det }; // namespace mlpack #endif // __MLPACK_METHODS_DET_DT_UTILS_HPP RcppMLPACK/src/mlpack/methods/det/dtree.hpp0000644000176200001440000002503313647512751020155 0ustar liggesusers/** * @file dtree.hpp * @author Parikshit Ram (pram@cc.gatech.edu) * * Density Estimation Tree class * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_DET_DTREE_HPP #define __MLPACK_METHODS_DET_DTREE_HPP #include namespace mlpack { namespace det /** Density Estimation Trees */ { /** * A density estimation tree is similar to both a decision tree and a space * partitioning tree (like a kd-tree). Each leaf represents a constant-density * hyper-rectangle. The tree is constructed in such a way as to minimize the * integrated square error between the probability distribution of the tree and * the observed probability distribution of the data. Because the tree is * similar to a decision tree, the density estimation tree can provide very fast * density estimates for a given point. * * For more information, see the following paper: * * @code * @incollection{ram2011, * author = {Ram, Parikshit and Gray, Alexander G.}, * title = {Density estimation trees}, * booktitle = {{Proceedings of the 17th ACM SIGKDD International Conference * on Knowledge Discovery and Data Mining}}, * series = {KDD '11}, * year = {2011}, * pages = {627--635} * } * @endcode */ class DTree { public: /** * Create an empty density estimation tree. */ DTree(); /** * Create a density estimation tree with the given bounds and the given number * of total points. Children will not be created. * * @param maxVals Maximum values of the bounding box. * @param minVals Minimum values of the bounding box. * @param totalPoints Total number of points in the dataset. */ DTree(const arma::vec& maxVals, const arma::vec& minVals, const size_t totalPoints); /** * Create a density estimation tree on the given data. Children will be * created following the procedure outlined in the paper. The data will be * modified; it will be reordered similar to the way BinarySpaceTree modifies * datasets. * * @param data Dataset to build tree on. */ DTree(arma::mat& data); /** * Create a child node of a density estimation tree given the bounding box * specified by maxVals and minVals, using the size given in start and end and * the specified error. Children of this node will not be created * recursively. * * @param maxVals Upper bound of bounding box. * @param minVals Lower bound of bounding box. * @param start Start of points represented by this node in the data matrix. * @param end End of points represented by this node in the data matrix. * @param error log-negative error of this node. 
*/ DTree(const arma::vec& maxVals, const arma::vec& minVals, const size_t start, const size_t end, const double logNegError); /** * Create a child node of a density estimation tree given the bounding box * specified by maxVals and minVals, using the size given in start and end, * and calculating the error with the total number of points given. Children * of this node will not be created recursively. * * @param maxVals Upper bound of bounding box. * @param minVals Lower bound of bounding box. * @param start Start of points represented by this node in the data matrix. * @param end End of points represented by this node in the data matrix. */ DTree(const arma::vec& maxVals, const arma::vec& minVals, const size_t totalPoints, const size_t start, const size_t end); //! Clean up memory allocated by the tree. ~DTree(); /** * Greedily expand the tree. The points in the dataset will be reordered * during tree growth. * * @param data Dataset to build tree on. * @param oldFromNew Mappings from old points to new points. * @param useVolReg If true, volume regularization is used. * @param maxLeafSize Maximum size of a leaf. * @param minLeafSize Minimum size of a leaf. */ double Grow(arma::mat& data, arma::Col& oldFromNew, const bool useVolReg = false, const size_t maxLeafSize = 10, const size_t minLeafSize = 5); /** * Perform alpha pruning on a tree. Returns the new value of alpha. * * @param oldAlpha Old value of alpha. * @param points Total number of points in dataset. * @param useVolReg If true, volume regularization is used. * @return New value of alpha. */ double PruneAndUpdate(const double oldAlpha, const size_t points, const bool useVolReg = false); /** * Compute the logarithm of the density estimate of a given query point. * * @param query Point to estimate density of. */ double ComputeValue(const arma::vec& query) const; /** * Print the tree in a depth-first manner (this function is called * recursively). * * @param fp File to write the tree to. * @param level Level of the tree (should start at 0). */ //void WriteTree(FILE *fp, const size_t level = 0) const; /** * Index the buckets for possible usage later; this results in every leaf in * the tree having a specific tag (accessible with BucketTag()). This * function calls itself recursively. * * @param tag Tag for the next leaf; leave at 0 for the initial call. */ int TagTree(const int tag = 0); /** * Return the tag of the leaf containing the query. This is useful for * generating class memberships. * * @param query Query to search for. */ int FindBucket(const arma::vec& query) const; /** * Compute the variable importance of each dimension in the learned tree. * * @param importances Vector to store the calculated importances in. */ void ComputeVariableImportance(arma::vec& importances) const; /** * Compute the log-negative-error for this point, given the total number of * points in the dataset. * * @param totalPoints Total number of points in the dataset. */ double LogNegativeError(const size_t totalPoints) const; /** * Return whether a query point is within the range of this node. */ bool WithinRange(const arma::vec& query) const; private: // The indices in the complete set of points // (after all forms of swapping in the original data // matrix to align all the points in a node // consecutively in the matrix. The 'old_from_new' array // maps the points back to their original indices. //! The index of the first point in the dataset contained in this node (and //! its children). size_t start; //! 
The index of the last point in the dataset contained in this node (and its //! children). size_t end; //! Upper half of bounding box for this node. arma::vec maxVals; //! Lower half of bounding box for this node. arma::vec minVals; //! The splitting dimension for this node. size_t splitDim; //! The split value on the splitting dimension for this node. double splitValue; //! log-negative-L2-error of the node. double logNegError; //! Sum of the error of the leaves of the subtree. double subtreeLeavesLogNegError; //! Number of leaves of the subtree. size_t subtreeLeaves; //! If true, this node is the root of the tree. bool root; //! Ratio of the number of points in the node to the total number of points. double ratio; //! The logarithm of the volume of the node. double logVolume; //! The tag for the leaf, used for hashing points. int bucketTag; //! Upper part of alpha sum; used for pruning. double alphaUpper; //! The left child. DTree* left; //! The right child. DTree* right; public: //! Return the starting index of points contained in this node. size_t Start() const { return start; } //! Return the first index of a point not contained in this node. size_t End() const { return end; } //! Return the split dimension of this node. size_t SplitDim() const { return splitDim; } //! Return the split value of this node. double SplitValue() const { return splitValue; } //! Return the log negative error of this node. double LogNegError() const { return logNegError; } //! Return the log negative error of all descendants of this node. double SubtreeLeavesLogNegError() const { return subtreeLeavesLogNegError; } //! Return the number of leaves which are descendants of this node. size_t SubtreeLeaves() const { return subtreeLeaves; } //! Return the ratio of points in this node to the points in the whole //! dataset. double Ratio() const { return ratio; } //! Return the inverse of the volume of this node. double LogVolume() const { return logVolume; } //! Return the left child. DTree* Left() const { return left; } //! Return the right child. DTree* Right() const { return right; } //! Return whether or not this is the root of the tree. bool Root() const { return root; } //! Return the upper part of the alpha sum. double AlphaUpper() const { return alphaUpper; } //! Return the maximum values. const arma::vec& MaxVals() const { return maxVals; } //! Modify the maximum values. arma::vec& MaxVals() { return maxVals; } //! Return the minimum values. const arma::vec& MinVals() const { return minVals; } //! Modify the minimum values. arma::vec& MinVals() { return minVals; } /** * Returns a string representation of this object. */ std::string ToString() const; private: // Utility methods. /** * Find the dimension to split on. */ bool FindSplit(const arma::mat& data, size_t& splitDim, double& splitValue, double& leftError, double& rightError, const size_t minLeafSize = 5) const; /** * Split the data, returning the number of points left of the split. */ size_t SplitData(arma::mat& data, const size_t splitDim, const double splitValue, arma::Col& oldFromNew) const; }; }; // namespace det }; // namespace mlpack #endif // __MLPACK_METHODS_DET_DTREE_HPP RcppMLPACK/src/mlpack/methods/det/dtree.cpp0000644000176200001440000005106613647514343020154 0ustar liggesusers /** * @file dtree.cpp * @author Parikshit Ram (pram@cc.gatech.edu) * * Implementations of some declared functions in * the Density Estimation Tree class. * * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "dtree.hpp" #include using namespace mlpack; using namespace det; DTree::DTree() : start(0), end(0), logNegError(-DBL_MAX), root(true), bucketTag(-1), left(NULL), right(NULL) { /* Nothing to do. */ } // Root node initializers DTree::DTree(const arma::vec& maxVals, const arma::vec& minVals, const size_t totalPoints) : start(0), end(totalPoints), maxVals(maxVals), minVals(minVals), logNegError(LogNegativeError(totalPoints)), root(true), bucketTag(-1), left(NULL), right(NULL) { /* Nothing to do. */ } DTree::DTree(arma::mat& data) : start(0), end(data.n_cols), left(NULL), right(NULL) { maxVals.set_size(data.n_rows); minVals.set_size(data.n_rows); // Initialize to first column; values will be overwritten if necessary. maxVals = data.col(0); minVals = data.col(0); // Loop over data to extract maximum and minimum values in each dimension. for (size_t i = 1; i < data.n_cols; ++i) { for (size_t j = 0; j < data.n_rows; ++j) { if (data(j, i) > maxVals[j]) maxVals[j] = data(j, i); if (data(j, i) < minVals[j]) minVals[j] = data(j, i); } } logNegError = LogNegativeError(data.n_cols); bucketTag = -1; root = true; } // Non-root node initializers DTree::DTree(const arma::vec& maxVals, const arma::vec& minVals, const size_t start, const size_t end, const double logNegError) : start(start), end(end), maxVals(maxVals), minVals(minVals), logNegError(logNegError), root(false), bucketTag(-1), left(NULL), right(NULL) { /* Nothing to do. */ } DTree::DTree(const arma::vec& maxVals, const arma::vec& minVals, const size_t totalPoints, const size_t start, const size_t end) : start(start), end(end), maxVals(maxVals), minVals(minVals), logNegError(LogNegativeError(totalPoints)), root(false), bucketTag(-1), left(NULL), right(NULL) { /* Nothing to do. */ } DTree::~DTree() { if (left != NULL) delete left; if (right != NULL) delete right; } // This function computes the log-l2-negative-error of a given node from the // formula R(t) = log(|t|^2 / (N^2 V_t)). double DTree::LogNegativeError(const size_t totalPoints) const { // log(-|t|^2 / (N^2 V_t)) = log(-1) + 2 log(|t|) - 2 log(N) - log(V_t). return 2 * std::log((double) (end - start)) - 2 * std::log((double) totalPoints) - arma::accu(arma::log(maxVals - minVals)); } // This function finds the best split with respect to the L2-error, by trying // all possible splits. The dataset is the full data set but the start and // end are used to obtain the point in this node. bool DTree::FindSplit(const arma::mat& data, size_t& splitDim, double& splitValue, double& leftError, double& rightError, const size_t minLeafSize) const { // Ensure the dimensionality of the data is the same as the dimensionality of // the bounding rectangle. assert(data.n_rows == maxVals.n_elem); assert(data.n_rows == minVals.n_elem); const size_t points = end - start; double minError = logNegError; bool splitFound = false; // Loop through each dimension. 
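  // For every dimension we sort that coordinate, scan the admissible
  // midpoints, and keep the split maximizing |t_l|^2 / V_l + |t_r|^2 / V_r
  // (the negated L2 error); a split only counts if it also beats this node's
  // own error.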
for (size_t dim = 0; dim < maxVals.n_elem; dim++) { // Have to deal with REAL, INTEGER, NOMINAL data differently, so we have to // think of how to do that... const double min = minVals[dim]; const double max = maxVals[dim]; // If there is nothing to split in this dimension, move on. if (max - min == 0.0) continue; // Skip to next dimension. // Initializing all the stuff for this dimension. bool dimSplitFound = false; // Take an error estimate for this dimension. double minDimError = std::pow(points, 2.0) / (max - min); double dimLeftError; double dimRightError; double dimSplitValue; // Find the log volume of all the other dimensions. double volumeWithoutDim = logVolume - std::log(max - min); // Get the values for the dimension. arma::rowvec dimVec = data.row(dim).subvec(start, end - 1); // Sort the values in ascending order. dimVec = arma::sort(dimVec); // Find the best split for this dimension. We need to figure out why // there are spikes if this minLeafSize is enforced here... for (size_t i = minLeafSize - 1; i < dimVec.n_elem - minLeafSize; ++i) { // This makes sense for real continuous data. This kinda corrupts the // data and estimation if the data is ordinal. const double split = (dimVec[i] + dimVec[i + 1]) / 2.0; if (split == dimVec[i]) continue; // We can't split here (two points are the same). // Another way of picking split is using this: // split = leftsplit; if ((split - min > 0.0) && (max - split > 0.0)) { // Ensure that the right node will have at least the minimum number of // points. //Log::Assert((points - i - 1) >= minLeafSize); // Now we have to see if the error will be reduced. Simple manipulation // of the error function gives us the condition we must satisfy: // |t_l|^2 / V_l + |t_r|^2 / V_r >= |t|^2 / (V_l + V_r) // and because the volume is only dependent on the dimension we are // splitting, we can assume V_l is just the range of the left and V_r is // just the range of the right. double negLeftError = std::pow(i + 1, 2.0) / (split - min); double negRightError = std::pow(points - i - 1, 2.0) / (max - split); // If this is better, take it. if ((negLeftError + negRightError) >= minDimError) { minDimError = negLeftError + negRightError; dimLeftError = negLeftError; dimRightError = negRightError; dimSplitValue = split; dimSplitFound = true; } } } double actualMinDimError = std::log(minDimError) - 2 * std::log((double) data.n_cols) - volumeWithoutDim; if ((actualMinDimError > minError) && dimSplitFound) { // Calculate actual error (in logspace) by adding terms back to our // estimate. minError = actualMinDimError; splitDim = dim; splitValue = dimSplitValue; leftError = std::log(dimLeftError) - 2 * std::log((double) data.n_cols) - volumeWithoutDim; rightError = std::log(dimRightError) - 2 * std::log((double) data.n_cols) - volumeWithoutDim; splitFound = true; } // end if better split found in this dimension. } return splitFound; } size_t DTree::SplitData(arma::mat& data, const size_t splitDim, const double splitValue, arma::Col& oldFromNew) const { // Swap all columns such that any columns with value in dimension splitDim // less than or equal to splitValue are on the left side, and all others are // on the right side. A similar sort to this is also performed in // BinarySpaceTree construction (its comments are more detailed). 
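  // Two-index partition: 'left' advances over points already on the correct
  // side of the split, 'right' retreats likewise, and each mismatched pair is
  // swapped, with oldFromNew updated so densities computed later can be
  // mapped back to the input ordering.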
size_t left = start; size_t right = end - 1; for (;;) { while (data(splitDim, left) <= splitValue) ++left; while (data(splitDim, right) > splitValue) --right; if (left > right) break; data.swap_cols(left, right); // Store the mapping from old to new. const size_t tmp = oldFromNew[left]; oldFromNew[left] = oldFromNew[right]; oldFromNew[right] = tmp; } // This now refers to the first index of the "right" side. return left; } // Greedily expand the tree double DTree::Grow(arma::mat& data, arma::Col& oldFromNew, const bool useVolReg, const size_t maxLeafSize, const size_t minLeafSize) { //Log::Assert(data.n_rows == maxVals.n_elem); //Log::Assert(data.n_rows == minVals.n_elem); double leftG, rightG; // Compute points ratio. ratio = (double) (end - start) / (double) oldFromNew.n_elem; // Compute the log of the volume of the node. logVolume = 0; for (size_t i = 0; i < maxVals.n_elem; ++i) if (maxVals[i] - minVals[i] > 0.0) logVolume += std::log(maxVals[i] - minVals[i]); // Check if node is large enough to split. if ((size_t) (end - start) > maxLeafSize) { // Find the split. size_t dim; double splitValueTmp; double leftError, rightError; if (FindSplit(data, dim, splitValueTmp, leftError, rightError, minLeafSize)) { // Move the data around for the children to have points in a node lie // contiguously (to increase efficiency during the training). const size_t splitIndex = SplitData(data, dim, splitValueTmp, oldFromNew); // Make max and min vals for the children. arma::vec maxValsL(maxVals); arma::vec maxValsR(maxVals); arma::vec minValsL(minVals); arma::vec minValsR(minVals); maxValsL[dim] = splitValueTmp; minValsR[dim] = splitValueTmp; // Store split dim and split val in the node. splitValue = splitValueTmp; splitDim = dim; // Recursively grow the children. left = new DTree(maxValsL, minValsL, start, splitIndex, leftError); right = new DTree(maxValsR, minValsR, splitIndex, end, rightError); leftG = left->Grow(data, oldFromNew, useVolReg, maxLeafSize, minLeafSize); rightG = right->Grow(data, oldFromNew, useVolReg, maxLeafSize, minLeafSize); // Store values of R(T~) and |T~|. subtreeLeaves = left->SubtreeLeaves() + right->SubtreeLeaves(); // Find the log negative error of the subtree leaves. This is kind of an // odd one because we don't want to represent the error in non-log-space, // but we have to calculate log(E_l + E_r). So we multiply E_l and E_r by // V_t (remember E_l has an inverse relationship to the volume of the // nodes) and then subtract log(V_t) at the end of the whole expression. // As a result we do leave log-space, but the largest quantity we // represent is on the order of (V_t / V_i) where V_i is the smallest leaf // node below this node, which depends heavily on the depth of the tree. subtreeLeavesLogNegError = std::log( std::exp(logVolume + left->SubtreeLeavesLogNegError()) + std::exp(logVolume + right->SubtreeLeavesLogNegError())) - logVolume; } else { // No split found so make a leaf out of it. subtreeLeaves = 1; subtreeLeavesLogNegError = logNegError; } } else { // We can make this a leaf node. assert((size_t) (end - start) >= minLeafSize); subtreeLeaves = 1; subtreeLeavesLogNegError = logNegError; } // If this is a leaf, do not compute g_k(t); otherwise compute, store, and // propagate min(g_k(t_L), g_k(t_R), g_k(t)), unless t_L and/or t_R are // leaves. 
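// (In standard cost-complexity pruning notation, the quantity computed below
// for an internal node t with pruned subtree T_t is
//   g(t) = (R(t) - R(T_t)) / (|T_t| - 1),
// which in log-space becomes the alphaUpper - log(subtreeLeaves - 1)
// expression used here.  The minimum of g over all internal nodes, the
// "weakest link", is what gets propagated up to the root.)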
if (subtreeLeaves == 1) { return std::numeric_limits::max(); } else { const double range = maxVals[splitDim] - minVals[splitDim]; const double leftRatio = (splitValue - minVals[splitDim]) / range; const double rightRatio = (maxVals[splitDim] - splitValue) / range; const size_t leftPow = std::pow((double) (left->End() - left->Start()), 2); const size_t rightPow = std::pow((double) (right->End() - right->Start()), 2); const size_t thisPow = std::pow((double) (end - start), 2); double tmpAlphaSum = leftPow / leftRatio + rightPow / rightRatio - thisPow; if (left->SubtreeLeaves() > 1) { const double exponent = 2 * std::log((double) data.n_cols) + logVolume + left->AlphaUpper(); // Whether or not this will overflow is highly dependent on the depth of // the tree. tmpAlphaSum += std::exp(exponent); } if (right->SubtreeLeaves() > 1) { const double exponent = 2 * std::log((double) data.n_cols) + logVolume + right->AlphaUpper(); tmpAlphaSum += std::exp(exponent); } alphaUpper = std::log(tmpAlphaSum) - 2 * std::log((double) data.n_cols) - logVolume; double gT; if (useVolReg) { // This is wrong for now! gT = alphaUpper;// / (subtreeLeavesVTInv - vTInv); } else { gT = alphaUpper - std::log((double) (subtreeLeaves - 1)); } return std::min(gT, std::min(leftG, rightG)); } // We need to compute (c_t^2) * r_t for all subtree leaves; this is equal to // n_t ^ 2 / r_t * n ^ 2 = -error. Therefore the value we need is actually // -1.0 * subtreeLeavesError. } double DTree::PruneAndUpdate(const double oldAlpha, const size_t points, const bool useVolReg) { // Compute gT. if (subtreeLeaves == 1) // If we are a leaf... { return std::numeric_limits::max(); } else { // Compute gT value for node t. volatile double gT; if (useVolReg) gT = alphaUpper;// - std::log(subtreeLeavesVTInv - vTInv); else gT = alphaUpper - std::log((double) (subtreeLeaves - 1)); if (gT > oldAlpha) { // Go down the tree and update accordingly. Traverse the children. double leftG = left->PruneAndUpdate(oldAlpha, points, useVolReg); double rightG = right->PruneAndUpdate(oldAlpha, points, useVolReg); // Update values. subtreeLeaves = left->SubtreeLeaves() + right->SubtreeLeaves(); // Find the log negative error of the subtree leaves. This is kind of an // odd one because we don't want to represent the error in non-log-space, // but we have to calculate log(E_l + E_r). So we multiply E_l and E_r by // V_t (remember E_l has an inverse relationship to the volume of the // nodes) and then subtract log(V_t) at the end of the whole expression. // As a result we do leave log-space, but the largest quantity we // represent is on the order of (V_t / V_i) where V_i is the smallest leaf // node below this node, which depends heavily on the depth of the tree. subtreeLeavesLogNegError = std::log( std::exp(logVolume + left->SubtreeLeavesLogNegError()) + std::exp(logVolume + right->SubtreeLeavesLogNegError())) - logVolume; // Recalculate upper alpha. 
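// (The block below intentionally repeats the alphaUpper computation from
// DTree::Grow(), with 'points', the full dataset size, in place of
// data.n_cols; if either copy is changed, the other must be kept in sync.)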
const double range = maxVals[splitDim] - minVals[splitDim]; const double leftRatio = (splitValue - minVals[splitDim]) / range; const double rightRatio = (maxVals[splitDim] - splitValue) / range; const size_t leftPow = std::pow((double) (left->End() - left->Start()), 2); const size_t rightPow = std::pow((double) (right->End() - right->Start()), 2); const size_t thisPow = std::pow((double) (end - start), 2); double tmpAlphaSum = leftPow / leftRatio + rightPow / rightRatio - thisPow; if (left->SubtreeLeaves() > 1) { const double exponent = 2 * std::log((double) points) + logVolume + left->AlphaUpper(); // Whether or not this will overflow is highly dependent on the depth of // the tree. tmpAlphaSum += std::exp(exponent); } if (right->SubtreeLeaves() > 1) { const double exponent = 2 * std::log((double) points) + logVolume + right->AlphaUpper(); tmpAlphaSum += std::exp(exponent); } alphaUpper = std::log(tmpAlphaSum) - 2 * std::log((double) points) - logVolume; // Update gT value. if (useVolReg) { // This is incorrect. gT = alphaUpper; // / (subtreeLeavesVTInv - vTInv); } else { gT = alphaUpper - std::log((double) (subtreeLeaves - 1)); } //Log::Assert(gT < std::numeric_limits::max()); return std::min((double) gT, std::min(leftG, rightG)); } else { // Prune this subtree. // First, make this node a leaf node. subtreeLeaves = 1; subtreeLeavesLogNegError = logNegError; delete left; delete right; left = NULL; right = NULL; // Pass information upward. return std::numeric_limits::max(); } } } // Check whether a given point is within the bounding box of this node (check // generally done at the root, so its the bounding box of the data). // // Future improvement: Open up the range with epsilons on both sides where // epsilon depends on the density near the boundary. bool DTree::WithinRange(const arma::vec& query) const { for (size_t i = 0; i < query.n_elem; ++i) if ((query[i] < minVals[i]) || (query[i] > maxVals[i])) return false; return true; } double DTree::ComputeValue(const arma::vec& query) const { //Log::Assert(query.n_elem == maxVals.n_elem); if (root == 1) // If we are the root... { // Check if the query is within range. if (!WithinRange(query)) return 0.0; } if (subtreeLeaves == 1) // If we are a leaf... { return std::exp(std::log(ratio) - logVolume); } else { if (query[splitDim] <= splitValue) { // If left subtree, go to left child. return left->ComputeValue(query); } else // If right subtree, go to right child { return right->ComputeValue(query); } } return 0.0; } /* void DTree::WriteTree(FILE *fp, const size_t level) const { if (subtreeLeaves > 1) { fprintf(fp, "\n"); for (size_t i = 0; i < level; ++i) fprintf(fp, "|\t"); fprintf(fp, "Var. %zu > %lg", splitDim, splitValue); right->WriteTree(fp, level + 1); fprintf(fp, "\n"); for (size_t i = 0; i < level; ++i) fprintf(fp, "|\t"); fprintf(fp, "Var. %zu <= %lg ", splitDim, splitValue); left->WriteTree(fp, level); } else // If we are a leaf... { fprintf(fp, ": f(x)=%lg", std::exp(std::log(ratio) - logVolume)); if (bucketTag != -1) fprintf(fp, " BT:%d", bucketTag); } } */ // Index the buckets for possible usage later. int DTree::TagTree(const int tag) { if (subtreeLeaves == 1) { // Only label leaves. bucketTag = tag; return (tag + 1); } else { return right->TagTree(left->TagTree(tag)); } } int DTree::FindBucket(const arma::vec& query) const { //Log::Assert(query.n_elem == maxVals.n_elem); if (subtreeLeaves == 1) // If we are a leaf... { return bucketTag; } else if (query[splitDim] <= splitValue) { // If left subtree, go to left child. 
return left->FindBucket(query); } else { // If right subtree, go to right child. return right->FindBucket(query); } } void DTree::ComputeVariableImportance(arma::vec& importances) const { // Clear and set to right size. importances.zeros(maxVals.n_elem); std::stack nodes; nodes.push(this); while(!nodes.empty()) { const DTree& curNode = *nodes.top(); nodes.pop(); if (curNode.subtreeLeaves == 1) continue; // Do nothing for leaves. // The way to do this entirely in log-space is (at this time) somewhat // unclear. So this risks overflow. importances[curNode.SplitDim()] += (-std::exp(curNode.LogNegError()) - (-std::exp(curNode.Left()->LogNegError()) + -std::exp(curNode.Right()->LogNegError()))); nodes.push(curNode.Left()); nodes.push(curNode.Right()); } } // Return string of object. std::string DTree::ToString() const { std::ostringstream convert; convert << "Density Estimation Tree [" << this << "]" << std::endl; convert << " Start Node Index: " << start <. */ #ifndef __MLPACK_METHODS_PCA_PCA_HPP #define __MLPACK_METHODS_PCA_PCA_HPP #include namespace mlpack { namespace pca { /** * This class implements principal components analysis (PCA). This is a common, * widely-used technique that is often used for either dimensionality reduction * or transforming data into a better basis. Further information on PCA can be * found in almost any statistics or machine learning textbook, and all over the * internet. */ class PCA { public: /** * Create the PCA object, specifying if the data should be scaled in each * dimension by standard deviation when PCA is performed. * * @param scaleData Whether or not to scale the data. */ PCA(const bool scaleData = false); /** * Apply Principal Component Analysis to the provided data set. It is safe to * pass the same matrix reference for both data and transformedData. * * @param data Data matrix. * @param transformedData Matrix to put results of PCA into. * @param eigval Vector to put eigenvalues into. * @param eigvec Matrix to put eigenvectors (loadings) into. */ void Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval, arma::mat& eigvec) const; /** * Apply Principal Component Analysis to the provided data set. It is safe to * pass the same matrix reference for both data and transformedData. * * @param data Data matrix. * @param transformedData Matrix to store results of PCA in. * @param eigval Vector to put eigenvalues into. */ void Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigVal) const; /** * Use PCA for dimensionality reduction on the given dataset. This will save * the newDimension largest principal components of the data and remove the * rest. The parameter returned is the amount of variance of the data that is * retained; this is a value between 0 and 1. For instance, a value of 0.9 * indicates that 90% of the variance present in the data was retained. * * @param data Data matrix. * @param newDimension New dimension of the data. * @return Amount of the variance of the data retained (between 0 and 1). */ double Apply(arma::mat& data, const size_t newDimension) const; //! This overload is here to make sure int gets casted right to size_t. inline double Apply(arma::mat& data, const int newDimension) const { return Apply(data, size_t(newDimension)); } /** * Use PCA for dimensionality reduction on the given dataset. This will save * as many dimensions as necessary to retain at least the given amount of * variance (specified by parameter varRetained). 
The amount should be * between 0 and 1; if the amount is 0, then only 1 dimension will be * retained. If the amount is 1, then all dimensions will be retained. * * The method returns the actual amount of variance retained, which will * always be greater than or equal to the varRetained parameter. * * @param data Data matrix. * @param varRetained Lower bound on amount of variance to retain; should be * between 0 and 1. * @return Actual amount of variance retained (between 0 and 1). */ double Apply(arma::mat& data, const double varRetained) const; //! Get whether or not this PCA object will scale (by standard deviation) the //! data when PCA is performed. bool ScaleData() const { return scaleData; } //! Modify whether or not this PCA object will scale (by standard deviation) //! the data when PCA is performed. bool& ScaleData() { return scaleData; } // Returns a string representation of this object. std::string ToString() const; private: //! Whether or not the data will be scaled by standard deviation when PCA is //! performed. bool scaleData; }; // class PCA }; // namespace pca }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/pca/pca.cpp0000644000176200001440000001441213647514343017575 0ustar liggesusers/** * @file pca.cpp * @author Ajinkya Kale * * Implementation of PCA class to perform Principal Components Analysis on the * specified data set. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "pca.hpp" #include #include #include using namespace std; using namespace mlpack; using namespace mlpack::pca; PCA::PCA(const bool scaleData) : scaleData(scaleData) { } /** * Apply Principal Component Analysis to the provided data set. * * @param data - Data matrix * @param transformedData - Data with PCA applied * @param eigVal - contains eigen values in a column vector * @param coeff - PCA Loadings/Coeffs/EigenVectors */ void PCA::Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigVal, arma::mat& coeff) const { //Timer::Start("pca"); // This matrix will store the right singular values; we do not need them. arma::mat v; // Center the data into a temporary matrix. arma::mat centeredData; math::Center(data, centeredData); if (scaleData) { // Scaling the data is when we reduce the variance of each dimension to 1. // We do this by dividing each dimension by its standard deviation. arma::vec stdDev = arma::stddev(centeredData, 0, 1 /* for each dimension */); // If there are any zeroes, make them very small. for (size_t i = 0; i < stdDev.n_elem; ++i) if (stdDev[i] == 0) stdDev[i] = 1e-50; centeredData /= arma::repmat(stdDev, 1, centeredData.n_cols); } // Do singular value decomposition. Use the economical singular value // decomposition if the columns are much larger than the rows. if (data.n_rows < data.n_cols) { // Do economical singular value decomposition and compute only the left // singular vectors. 
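// (As a sketch of the standard SVD/PCA identity at work here: if the
// centered data X (d x n) factors as X = U * S * V', then the covariance
// X * X' / (n - 1) equals U * (S^2 / (n - 1)) * U'.  So the principal axes
// are the left singular vectors U, and the eigenvalues are the squared
// singular values divided by (n - 1), which is exactly what the
// "eigVal %= eigVal / (data.n_cols - 1)" line below computes.)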
arma::svd_econ(coeff, eigVal, v, centeredData, 'l'); } else { arma::svd(coeff, eigVal, v, centeredData); } // Now we must square the singular values to get the eigenvalues. // In addition we must divide by the number of points, because the covariance // matrix is X * X' / (N - 1). eigVal %= eigVal / (data.n_cols - 1); // Project the samples to the principals. transformedData = arma::trans(coeff) * centeredData; //Timer::Stop("pca"); } /** * Apply Principal Component Analysis to the provided data set. * * @param data - Data matrix * @param transformedData - Data with PCA applied * @param eigVal - contains eigen values in a column vector */ void PCA::Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigVal) const { arma::mat coeffs; Apply(data, transformedData, eigVal, coeffs); } /** * Use PCA for dimensionality reduction on the given dataset. This will save * the newDimension largest principal components of the data and remove the * rest. The parameter returned is the amount of variance of the data that is * retained; this is a value between 0 and 1. For instance, a value of 0.9 * indicates that 90% of the variance present in the data was retained. * * @param data Data matrix. * @param newDimension New dimension of the data. * @return Amount of the variance of the data retained (between 0 and 1). */ double PCA::Apply(arma::mat& data, const size_t newDimension) const { // Parameter validation. if (newDimension == 0) Rcpp::Rcout << "PCA::Apply(): newDimension (" << newDimension << ") cannot " << "be zero!" << endl; if (newDimension > data.n_rows) Rcpp::Rcout << "PCA::Apply(): newDimension (" << newDimension << ") cannot " << "be greater than the existing dimensionality of the data (" << data.n_rows << ")!" << endl; arma::mat coeffs; arma::vec eigVal; Apply(data, data, eigVal, coeffs); if (newDimension < coeffs.n_rows) // Drop unnecessary rows. data.shed_rows(newDimension, data.n_rows - 1); // Calculate the total amount of variance retained. return (sum(eigVal.subvec(0, newDimension - 1)) / sum(eigVal)); } /** * Use PCA for dimensionality reduction on the given dataset. This will save * as many dimensions as necessary to retain at least the given amount of * variance (specified by parameter varRetained). The amount should be * between 0 and 1; if the amount is 0, then only 1 dimension will be * retained. If the amount is 1, then all dimensions will be retained. * * The method returns the actual amount of variance retained, which will * always be greater than or equal to the varRetained parameter. */ double PCA::Apply(arma::mat& data, const double varRetained) const { // Parameter validation. if (varRetained < 0) Rcpp::Rcout << "PCA::Apply(): varRetained (" << varRetained << ") must be " << "greater than or equal to 0." << endl; if (varRetained > 1) Rcpp::Rcout << "PCA::Apply(): varRetained (" << varRetained << ") should be " << "less than or equal to 1." << endl; arma::mat coeffs; arma::vec eigVal; Apply(data, data, eigVal, coeffs); // Calculate the dimension we should keep. size_t newDimension = 0; double varSum = 0.0; eigVal /= arma::sum(eigVal); // Normalize eigenvalues. while ((varSum < varRetained) && (newDimension < eigVal.n_elem)) { varSum += eigVal[newDimension]; ++newDimension; } // varSum is the actual variance we will retain. if (newDimension < eigVal.n_elem) data.shed_rows(newDimension, data.n_rows - 1); return varSum; } // return a string of this object. 
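/* A minimal usage sketch for this class (matrix contents assumed purely for
 * illustration; data is column-major, one observation per column, as
 * everywhere in MLPACK):
 *
 *   arma::mat data(10, 1000, arma::fill::randu); // 10 dimensions, 1000 points.
 *   mlpack::pca::PCA p(false);                   // No per-dimension scaling.
 *   const double retained = p.Apply(data, (size_t) 3);
 *   // 'data' is now 3 x 1000; 'retained' is the fraction of variance kept.
 */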
std::string PCA::ToString() const { std::ostringstream convert; convert << "Principal Component Analysis [" << this << "]" << std::endl; if (scaleData) convert << " Scaling Data: TRUE" << std::endl; return convert.str(); } RcppMLPACK/src/mlpack/methods/mvu/0000755000176200001440000000000013647514343016370 5ustar liggesusersRcppMLPACK/src/mlpack/methods/mvu/mvu.cpp0000644000176200001440000000741513647514343017712 0ustar liggesusers/** * @file mvu.cpp * @author Ryan Curtin * * Implementation of the MVU class and its auxiliary objective function class. * * Note: this implementation of MVU does not work. See #189. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "mvu.hpp" //#include #include #include using namespace mlpack; using namespace mlpack::mvu; using namespace mlpack::optimization; MVU::MVU(const arma::mat& data) : data(data) { // Nothing to do. } void MVU::Unfold(const size_t newDim, const size_t numNeighbors, arma::mat& outputData) { // First we have to choose the output point. We'll take a linear projection // of the data for now (this is probably not a good final solution). // outputData = trans(data.rows(0, newDim - 1)); // Following Nick's idea. outputData.randu(data.n_cols, newDim); // The number of constraints is the number of nearest neighbors plus one. LRSDP mvuSolver(numNeighbors * data.n_cols + 1, outputData); // Set up the objective. Because we are maximizing the trace of (R R^T), // we'll instead state it as min(-I_n * (R R^T)), meaning C() is -I_n. mvuSolver.C().eye(data.n_cols, data.n_cols); mvuSolver.C() *= -1; // Now set up each of the constraints. // The first constraint is trace(ones * R * R^T) = 0. mvuSolver.B()[0] = 0; mvuSolver.A()[0].ones(data.n_cols, data.n_cols); // All of our other constraints will be sparse except the first. So set that // vector of modes accordingly. mvuSolver.AModes().ones(); mvuSolver.AModes()[0] = 0; // Now all of the other constraints. We first have to run AllkNN to get the // list of nearest neighbors. arma::Mat neighbors; arma::mat distances; neighbor::AllkNN allknn(data); allknn.Search(numNeighbors, neighbors, distances); // Add each of the other constraints. They are sparse constraints: // Tr(A_ij K) = d_ij; // A_ij = zeros except for 1 at (i, i), (j, j); -1 at (i, j), (j, i). for (size_t i = 0; i < neighbors.n_cols; ++i) { for (size_t j = 0; j < numNeighbors; ++j) { // This is the index of the constraint. const size_t index = (i * numNeighbors) + j + 1; arma::mat& aRef = mvuSolver.A()[index]; aRef.set_size(3, 4); // A_ij(i, i) = 1. aRef(0, 0) = i; aRef(1, 0) = i; aRef(2, 0) = 1; // A_ij(i, j) = -1. aRef(0, 1) = i; aRef(1, 1) = neighbors(j, i); aRef(2, 1) = -1; // A_ij(j, i) = -1. aRef(0, 2) = neighbors(j, i); aRef(1, 2) = i; aRef(2, 2) = -1; // A_ij(j, j) = 1. aRef(0, 3) = neighbors(j, i); aRef(1, 3) = neighbors(j, i); aRef(2, 3) = 1; // The constraint b_ij is the distance between these two points. 
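// (Written out, with K = R * R^T: K(i, i) + K(j, j) - K(i, j) - K(j, i) is
// ||R_i - R_j||^2, so Tr(A_ij K) = b_ij pins each unfolded point at the
// distance from its neighbor that AllkNN reported for the original data.)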
mvuSolver.B()[index] = distances(j, i); } } // Now on with the solving. double objective = mvuSolver.Optimize(outputData); Rcpp::Rcout << "Final objective is " << objective << "." << std::endl; // Revert to original data format. outputData = trans(outputData); } RcppMLPACK/src/mlpack/methods/mvu/mvu.hpp0000644000176200001440000000326413647512751017716 0ustar liggesusers/** * @file mvu.hpp * @author Ryan Curtin * * An implementation of Maximum Variance Unfolding. This file defines an MVU * class as well as a class representing the objective function (a semidefinite * program) which MVU seeks to minimize. Minimization is performed by the * Augmented Lagrangian optimizer (which in turn uses the L-BFGS optimizer). * * Note: this implementation of MVU does not work. See #189. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_MVU_MVU_HPP #define __MLPACK_METHODS_MVU_MVU_HPP #include namespace mlpack { namespace mvu { /** * The MVU class is meant to provide a good abstraction for users. The dataset * needs to be provided, as well as several parameters. * * - dataset * - new dimensionality */ class MVU { public: MVU(const arma::mat& dataIn); void Unfold(const size_t newDim, const size_t numNeighbors, arma::mat& outputCoordinates); private: const arma::mat& data; }; }; // namespace mvu }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/kmeans/0000755000176200001440000000000013647512751017040 5ustar liggesusersRcppMLPACK/src/mlpack/methods/kmeans/random_partition.hpp0000644000176200001440000000440513647512751023125 0ustar liggesusers/** * @file random_partition.hpp * @author Ryan Curtin * * Very simple partitioner which partitions the data randomly into the number of * desired clusters. Used as the default InitialPartitionPolicy for KMeans. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KMEANS_RANDOM_PARTITION_HPP #define __MLPACK_METHODS_KMEANS_RANDOM_PARTITION_HPP #include namespace mlpack { namespace kmeans { /** * A very simple partitioner which partitions the data randomly into the number * of desired clusters. It has no parameters, and so an instance of the class * is not even necessary. */ class RandomPartition { public: //! Empty constructor, required by the InitialPartitionPolicy policy. 
RandomPartition() { } /** * Partition the given dataset into the given number of clusters. Assignments * are random, and the number of points in each cluster should be equal (or * approximately equal). * * @tparam MatType Type of data (arma::mat or arma::sp_mat). * @param data Dataset to partition. * @param clusters Number of clusters to split dataset into. * @param assignments Vector to store cluster assignments into. Values will * be between 0 and (clusters - 1). */ template inline static void Cluster(const MatType& data, const size_t clusters, arma::Col& assignments) { // Implementation is so simple we'll put it here in the header file. assignments = arma::shuffle(arma::linspace >(0, (clusters - 1), data.n_cols)); } }; }; }; #endif RcppMLPACK/src/mlpack/methods/kmeans/kmeans_impl.hpp0000644000176200001440000003233313647512751022054 0ustar liggesusers/** * @file kmeans_impl.hpp * @author Parikshit Ram (pram@cc.gatech.edu) * @author Ryan Curtin * * Implementation for the K-means method for getting an initial point. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "kmeans.hpp" #include #include #include #include namespace mlpack { namespace kmeans { /** * Construct the K-Means object. */ template KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy>:: KMeans(const size_t maxIterations, const double overclusteringFactor, const MetricType metric, const InitialPartitionPolicy partitioner, const EmptyClusterPolicy emptyClusterAction) : maxIterations(maxIterations), metric(metric), partitioner(partitioner), emptyClusterAction(emptyClusterAction) { // Validate overclustering factor. if (overclusteringFactor < 1.0) { Rcpp::Rcout << "KMeans::KMeans(): overclustering factor must be >= 1.0 (" << overclusteringFactor << " given). Setting factor to 1.0.\n"; this->overclusteringFactor = 1.0; } else { this->overclusteringFactor = overclusteringFactor; } } /** * Perform k-means clustering on the data, returning a list of cluster * assignments. This just forward to the other function, which returns the * centroids too. If this is properly inlined, there shouldn't be any * performance penalty whatsoever. */ template template inline void KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy>:: Cluster(const MatType& data, const size_t clusters, arma::Col& assignments, const bool initialGuess) const { MatType centroids(data.n_rows, clusters); Cluster(data, clusters, assignments, centroids, initialGuess); } /** * Perform k-means clustering on the data, returning a list of cluster * assignments and the centroids of each cluster. */ template template void KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy>:: Cluster(const MatType& data, const size_t clusters, arma::Col& assignments, MatType& centroids, const bool initialAssignmentGuess, const bool initialCentroidGuess) const { // Make sure we have more points than clusters. 
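// (If that is not the case, at least one cluster must end up empty no matter
// what the empty-cluster policy later does, so we warn here instead of
// failing mysteriously later.)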
if (clusters > data.n_cols) Rcpp::Rcout << "KMeans::Cluster(): more clusters requested than points given." << std::endl; // Make sure our overclustering factor is valid. size_t actualClusters = size_t(overclusteringFactor * clusters); if (actualClusters > data.n_cols && overclusteringFactor != 1.0) { Rcpp::Rcout << "KMeans::Cluster(): overclustering factor is too large. No " << "overclustering will be done." << std::endl; actualClusters = clusters; } // Now, the initial assignments. First determine if they are necessary. if (initialAssignmentGuess) { if (assignments.n_elem != data.n_cols) Rcpp::Rcerr << "KMeans::Cluster(): initial cluster assignments (length " << assignments.n_elem << ") not the same size as the dataset (size " << data.n_cols << ")!" << std::endl; } else if (initialCentroidGuess) { if (centroids.n_cols != clusters) Rcpp::Rcerr << "KMeans::Cluster(): wrong number of initial cluster " << "centroids (" << centroids.n_cols << ", should be " << clusters << ")!" << std::endl; if (centroids.n_rows != data.n_rows) Rcpp::Rcerr << "KMeans::Cluster(): initial cluster centroids have wrong " << " dimensionality (" << centroids.n_rows << ", should be " << data.n_rows << ")!" << std::endl; // If there were no problems, construct the initial assignments from the // given centroids. assignments.set_size(data.n_cols); for (size_t i = 0; i < data.n_cols; ++i) { // Find the closest centroid to this point. double minDistance = std::numeric_limits::infinity(); size_t closestCluster = clusters; // Invalid value. for (size_t j = 0; j < clusters; j++) { double distance = metric.Evaluate(data.col(i), centroids.col(j)); if (distance < minDistance) { minDistance = distance; closestCluster = j; } } // Assign the point to the closest cluster that we found. assignments[i] = closestCluster; } } else { // Use the partitioner to come up with the partition assignments. partitioner.Cluster(data, actualClusters, assignments); } // Counts of points in each cluster. arma::Col counts(actualClusters); counts.zeros(); // Resize to correct size. centroids.set_size(data.n_rows, actualClusters); // Set counts correctly. for (size_t i = 0; i < assignments.n_elem; i++) counts[assignments[i]]++; size_t changedAssignments = 0; size_t iteration = 0; do { // Update step. // Calculate centroids based on given assignments. centroids.zeros(); for (size_t i = 0; i < data.n_cols; i++) centroids.col(assignments[i]) += data.col(i); for (size_t i = 0; i < actualClusters; i++) centroids.col(i) /= counts[i]; // Assignment step. // Find the closest centroid to each point. We will keep track of how many // assignments change. When no assignments change, we are done. changedAssignments = 0; for (size_t i = 0; i < data.n_cols; i++) { // Find the closest centroid to this point. double minDistance = std::numeric_limits::infinity(); size_t closestCluster = actualClusters; // Invalid value. for (size_t j = 0; j < actualClusters; j++) { double distance = metric.Evaluate(data.col(i), centroids.col(j)); if (distance < minDistance) { minDistance = distance; closestCluster = j; } } // Reassign this point to the closest cluster. if (assignments[i] != closestCluster) { // Update counts. counts[assignments[i]]--; counts[closestCluster]++; // Update assignment. assignments[i] = closestCluster; changedAssignments++; } } // If we are not allowing empty clusters, then check that all of our // clusters have points. 
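// (EmptyCluster() returns how many assignments it changed, so repairing an
// empty cluster keeps changedAssignments nonzero and forces the do/while
// loop to run for at least one more iteration before it can terminate.)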
for (size_t i = 0; i < actualClusters; i++) if (counts[i] == 0) changedAssignments += emptyClusterAction.EmptyCluster(data, i, centroids, counts, assignments); iteration++; } while (changedAssignments > 0 && iteration != maxIterations); if (iteration != maxIterations) { Rcpp::Rcout << "KMeans::Cluster(): converged after " << iteration << " iterations." << std::endl; } else { Rcpp::Rcout << "KMeans::Cluster(): terminated after limit of " << iteration << " iterations." << std::endl; // Recalculate final clusters. centroids.zeros(); for (size_t i = 0; i < data.n_cols; i++) centroids.col(assignments[i]) += data.col(i); for (size_t i = 0; i < actualClusters; i++) centroids.col(i) /= counts[i]; } // If we have overclustered, we need to merge the nearest clusters. if (actualClusters != clusters) { // Generate a list of all the clusters' distances from each other. This // list will become mangled and unused as the number of clusters decreases. size_t numDistances = ((actualClusters - 1) * actualClusters) / 2; size_t clustersLeft = actualClusters; arma::vec distances(numDistances); arma::Col firstCluster(numDistances); arma::Col secondCluster(numDistances); // Keep the mappings of clusters that we are changing. arma::Col mappings = arma::linspace >(0, actualClusters - 1, actualClusters); size_t i = 0; for (size_t first = 0; first < actualClusters; first++) { for (size_t second = first + 1; second < actualClusters; second++) { distances(i) = metric.Evaluate(centroids.col(first), centroids.col(second)); firstCluster(i) = first; secondCluster(i) = second; i++; } } while (clustersLeft != clusters) { arma::uword minIndex; distances.min(minIndex); // Now we merge the clusters which that distance belongs to. size_t first = firstCluster(minIndex); size_t second = secondCluster(minIndex); for (size_t j = 0; j < assignments.n_elem; j++) if (assignments(j) == second) assignments(j) = first; // Now merge the centroids. centroids.col(first) *= counts[first]; centroids.col(first) += (counts[second] * centroids.col(second)); centroids.col(first) /= (counts[first] + counts[second]); // Update the counts. counts[first] += counts[second]; counts[second] = 0; // Now update all the relevant distances. // First the distances where either cluster is the second cluster. for (size_t cluster = 0; cluster < second; cluster++) { // The offset is sum^n i - sum^(n - m) i, where n is actualClusters and // m is the cluster we are trying to offset to. size_t offset = (size_t) (((actualClusters - 1) * cluster) + (cluster - pow(cluster, 2.0)) / 2) - 1; // See if we need to update the distance from this cluster to the first // cluster. if (cluster < first) { // Make sure it isn't already DBL_MAX. if (distances(offset + (first - cluster)) != DBL_MAX) distances(offset + (first - cluster)) = metric.Evaluate( centroids.col(first), centroids.col(cluster)); } distances(offset + (second - cluster)) = DBL_MAX; } // Now the distances where the first cluster is the first cluster. size_t offset = (size_t) (((actualClusters - 1) * first) + (first - pow(first, 2.0)) / 2) - 1; for (size_t cluster = first + 1; cluster < actualClusters; cluster++) { // Make sure it isn't already DBL_MAX. if (distances(offset + (cluster - first)) != DBL_MAX) { distances(offset + (cluster - first)) = metric.Evaluate( centroids.col(first), centroids.col(cluster)); } } // Max the distance between the first and second clusters. distances(offset + (second - first)) = DBL_MAX; // Now max the distances for the second cluster (which no longer has // anything in it). 
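// (Index arithmetic sketch, with illustrative numbers assumed: 'distances'
// stores the strict upper triangle row by row, so the pair (c1, c2) with
// c1 < c2 lives at slot offset(c1) + (c2 - c1), where
//   offset(c) = (actualClusters - 1) * c + (c - c^2) / 2 - 1.
// For actualClusters = 4, pairs (0,1), (0,2), (0,3) occupy slots 0, 1, 2 and
// pair (1,2) lands at offset(1) + 1 = 3.  For c = 0 the "- 1" wraps the
// unsigned offset around, but the later addition wraps back, so the slot
// indices still come out right.)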
offset = (size_t) (((actualClusters - 1) * second) + (second - pow(second, 2.0)) / 2) - 1; for (size_t cluster = second + 1; cluster < actualClusters; cluster++) distances(offset + (cluster - second)) = DBL_MAX; clustersLeft--; // Update the cluster mappings. mappings(second) = first; // Also update any mappings that were pointed at the previous cluster. for (size_t cluster = 0; cluster < actualClusters; cluster++) if (mappings(cluster) == second) mappings(cluster) = first; } // Now remap the mappings down to the smallest possible numbers. // Could this process be sped up? arma::Col remappings(actualClusters); remappings.fill(actualClusters); size_t remap = 0; // Counter variable. for (size_t cluster = 0; cluster < actualClusters; cluster++) { // If the mapping of the current cluster has not been assigned a value // yet, we will assign it a cluster number. if (remappings(mappings(cluster)) == actualClusters) { remappings(mappings(cluster)) = remap; remap++; } } // Fix the assignments using the mappings we created. for (size_t j = 0; j < assignments.n_elem; j++) assignments(j) = remappings(mappings(assignments(j))); } } template std::string KMeans::ToString() const { std::ostringstream convert; convert << "KMeans [" << this << "]" << std::endl; convert << " Overclustering Factor: " << overclusteringFactor <. */ #ifndef __MLPACK_METHODS_KMEANS_REFINED_START_IMPL_HPP #define __MLPACK_METHODS_KMEANS_REFINED_START_IMPL_HPP // In case it hasn't been included yet. #include "refined_start.hpp" namespace mlpack { namespace kmeans { //! Partition the given dataset according to Bradley and Fayyad's algorithm. template void RefinedStart::Cluster(const MatType& data, const size_t clusters, arma::Col& assignments) const { math::RandomSeed(std::time(NULL)); // This will hold the sampled datasets. const size_t numPoints = size_t(percentage * data.n_cols); MatType sampledData(data.n_rows, numPoints); // vector is packed so each bool is 1 bit. std::vector pointsUsed(data.n_cols, false); arma::mat sampledCentroids(data.n_rows, samplings * clusters); // We will use these objects repeatedly for clustering. arma::Col sampledAssignments; arma::mat centroids; KMeans<> kmeans; for (size_t i = 0; i < samplings; ++i) { // First, assemble the sampled dataset. size_t curSample = 0; while (curSample < numPoints) { // Pick a random point in [0, numPoints). size_t sample = (size_t) math::RandInt(data.n_cols); if (!pointsUsed[sample]) { // This point isn't used yet. So we'll put it in our sample. pointsUsed[sample] = true; sampledData.col(curSample) = data.col(sample); ++curSample; } } // Now, using the sampled dataset, run k-means. In the case of an empty // cluster, we re-initialize that cluster as the point furthest away from // the cluster with maximum variance. This is not *exactly* what the paper // implements, but it is quite similar, and we'll call it "good enough". kmeans.Cluster(sampledData, clusters, sampledAssignments, centroids); // Store the sampled centroids. sampledCentroids.cols(i * clusters, (i + 1) * clusters - 1) = centroids; pointsUsed.assign(data.n_cols, false); } // Now, we run k-means on the sampled centroids to get our final clusters. kmeans.Cluster(sampledCentroids, clusters, sampledAssignments, centroids); // Turn the final centroids into assignments. assignments.set_size(data.n_cols); for (size_t i = 0; i < data.n_cols; ++i) { // Find the closest centroid to this point. 
double minDistance = std::numeric_limits::infinity(); size_t closestCluster = clusters; for (size_t j = 0; j < clusters; ++j) { const double distance = kmeans.Metric().Evaluate(data.col(i), centroids.col(j)); if (distance < minDistance) { minDistance = distance; closestCluster = j; } } // Assign the point to its closest cluster. assignments[i] = closestCluster; } } }; // namespace kmeans }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/kmeans/max_variance_new_cluster.hpp0000644000176200001440000000463013647512751024623 0ustar liggesusers/** * @file max_variance_new_cluster.hpp * @author Ryan Curtin * * An implementation of the EmptyClusterPolicy policy class for K-Means. When * an empty cluster is detected, the point furthest from the centroid of the * cluster with maximum variance is taken to be a new cluster. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KMEANS_MAX_VARIANCE_NEW_CLUSTER_HPP #define __MLPACK_METHODS_KMEANS_MAX_VARIANCE_NEW_CLUSTER_HPP #include namespace mlpack { namespace kmeans { /** * When an empty cluster is detected, this class takes the point furthest from * the centroid of the cluster with maximum variance as a new cluster. */ class MaxVarianceNewCluster { public: //! Default constructor required by EmptyClusterPolicy. MaxVarianceNewCluster() { } /** * Take the point furthest from the centroid of the cluster with maximum * variance to be a new cluster. * * @tparam MatType Type of data (arma::mat or arma::sp_mat). * @param data Dataset on which clustering is being performed. * @param emptyCluster Index of cluster which is empty. * @param centroids Centroids of each cluster (one per column). * @param clusterCounts Number of points in each cluster. * @param assignments Cluster assignments of each point. * * @return Number of points changed. */ template static size_t EmptyCluster(const MatType& data, const size_t emptyCluster, const MatType& centroids, arma::Col& clusterCounts, arma::Col& assignments); }; }; // namespace kmeans }; // namespace mlpack // Include implementation. #include "max_variance_new_cluster_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/kmeans/kmeans.hpp0000644000176200001440000002107213647512751021031 0ustar liggesusers/** * @file kmeans.hpp * @author Parikshit Ram (pram@cc.gatech.edu) * * K-Means clustering. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). 
* * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KMEANS_KMEANS_HPP #define __MLPACK_METHODS_KMEANS_KMEANS_HPP #include #include #include "random_partition.hpp" #include "max_variance_new_cluster.hpp" #include namespace mlpack { namespace kmeans /** K-Means clustering. */ { /** * This class implements K-Means clustering. This implementation supports * overclustering, which means that more clusters than are requested will be * found; then, those clusters will be merged together to produce the desired * number of clusters. * * Two template parameters can (optionally) be supplied: the policy for how to * find the initial partition of the data, and the actions to be taken when an * empty cluster is encountered, as well as the distance metric to be used. * * A simple example of how to run K-Means clustering is shown below. * * @code * extern arma::mat data; // Dataset we want to run K-Means on. * arma::Col assignments; // Cluster assignments. * * KMeans<> k; // Default options. * k.Cluster(data, 3, assignments); // 3 clusters. * * // Cluster using the Manhattan distance, 100 iterations maximum, and an * // overclustering factor of 4.0. * KMeans k(100, 4.0); * k.Cluster(data, 6, assignments); // 6 clusters. * @endcode * * @tparam MetricType The distance metric to use for this KMeans; see * metric::LMetric for an example. * @tparam InitialPartitionPolicy Initial partitioning policy; must implement a * default constructor and 'void Cluster(const arma::mat&, const size_t, * arma::Col&)'. * @tparam EmptyClusterPolicy Policy for what to do on an empty cluster; must * implement a default constructor and 'void EmptyCluster(const arma::mat&, * arma::Col class KMeans { public: /** * Create a K-Means object and (optionally) set the parameters which K-Means * will be run with. This implementation allows a few strategies to improve * the performance of K-Means, including "overclustering" and disallowing * empty clusters. * * The overclustering factor controls how many clusters are * actually found; for instance, with an overclustering factor of 4, if * K-Means is run to find 3 clusters, it will actually find 12, then merge the * nearest clusters until only 3 are left. * * @param maxIterations Maximum number of iterations allowed before giving up * (0 is valid, but the algorithm may never terminate). * @param overclusteringFactor Factor controlling how many extra clusters are * found and then merged to get the desired number of clusters. * @param metric Optional MetricType object; for when the metric has state * it needs to store. * @param partitioner Optional InitialPartitionPolicy object; for when a * specially initialized partitioning policy is required. * @param emptyClusterAction Optional EmptyClusterPolicy object; for when a * specially initialized empty cluster policy is required. */ KMeans(const size_t maxIterations = 1000, const double overclusteringFactor = 1.0, const MetricType metric = MetricType(), const InitialPartitionPolicy partitioner = InitialPartitionPolicy(), const EmptyClusterPolicy emptyClusterAction = EmptyClusterPolicy()); /** * Perform k-means clustering on the data, returning a list of cluster * assignments. Optionally, the vector of assignments can be set to an * initial guess of the cluster assignments; to do this, set initialGuess to * true. * * @tparam MatType Type of matrix (arma::mat or arma::sp_mat). * @param data Dataset to cluster. * @param clusters Number of clusters to compute. 
* @param assignments Vector to store cluster assignments in. * @param initialGuess If true, then it is assumed that assignments has a list * of initial cluster assignments. */ template void Cluster(const MatType& data, const size_t clusters, arma::Col& assignments, const bool initialGuess = false) const; /** * Perform k-means clustering on the data, returning a list of cluster * assignments and also the centroids of each cluster. Optionally, the vector * of assignments can be set to an initial guess of the cluster assignments; * to do this, set initialAssignmentGuess to true. Another way to set initial * cluster guesses is to fill the centroids matrix with the centroid guesses, * and then set initialCentroidGuess to true. initialAssignmentGuess * supersedes initialCentroidGuess, so if both are set to true, the * assignments vector is used. * * Note that if the overclustering factor is greater than 1, the centroids * matrix will be resized in the method. Regardless of the overclustering * factor, the centroid guess matrix (if initialCentroidGuess is set to true) * should have the same number of rows as the data matrix, and number of * columns equal to 'clusters'. * * @tparam MatType Type of matrix (arma::mat or arma::sp_mat). * @param data Dataset to cluster. * @param clusters Number of clusters to compute. * @param assignments Vector to store cluster assignments in. * @param centroids Matrix in which centroids are stored. * @param initialAssignmentGuess If true, then it is assumed that assignments * has a list of initial cluster assignments. * @param initialCentroidGuess If true, then it is assumed that centroids * contains the initial centroids of each cluster. */ template void Cluster(const MatType& data, const size_t clusters, arma::Col& assignments, MatType& centroids, const bool initialAssignmentGuess = false, const bool initialCentroidGuess = false) const; //! Return the overclustering factor. double OverclusteringFactor() const { return overclusteringFactor; } //! Set the overclustering factor. Must be greater than 1. double& OverclusteringFactor() { return overclusteringFactor; } //! Get the maximum number of iterations. size_t MaxIterations() const { return maxIterations; } //! Set the maximum number of iterations. size_t& MaxIterations() { return maxIterations; } //! Get the distance metric. const MetricType& Metric() const { return metric; } //! Modify the distance metric. MetricType& Metric() { return metric; } //! Get the initial partitioning policy. const InitialPartitionPolicy& Partitioner() const { return partitioner; } //! Modify the initial partitioning policy. InitialPartitionPolicy& Partitioner() { return partitioner; } //! Get the empty cluster policy. const EmptyClusterPolicy& EmptyClusterAction() const { return emptyClusterAction; } //! Modify the empty cluster policy. EmptyClusterPolicy& EmptyClusterAction() { return emptyClusterAction; } // Returns a string representation of this object. std::string ToString() const; private: //! Factor controlling how many clusters are actually found. double overclusteringFactor; //! Maximum number of iterations before giving up. size_t maxIterations; //! Instantiated distance metric. MetricType metric; //! Instantiated initial partitioning policy. InitialPartitionPolicy partitioner; //! Instantiated empty cluster policy. EmptyClusterPolicy emptyClusterAction; }; }; // namespace kmeans }; // namespace mlpack // Include implementation. 
#include "kmeans_impl.hpp" #endif // __MLPACK_METHODS_MOG_KMEANS_HPP RcppMLPACK/src/mlpack/methods/kmeans/max_variance_new_cluster_impl.hpp0000644000176200001440000000661013647512751025644 0ustar liggesusers/** * @file max_variance_new_cluster_impl.hpp * @author Ryan Curtin * * Implementation of MaxVarianceNewCluster class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KMEANS_MAX_VARIANCE_NEW_CLUSTER_IMPL_HPP #define __MLPACK_METHODS_KMEANS_MAX_VARIANCE_NEW_CLUSTER_IMPL_HPP // Just in case it has not been included. #include "max_variance_new_cluster.hpp" namespace mlpack { namespace kmeans { /** * Take action about an empty cluster. */ template size_t MaxVarianceNewCluster::EmptyCluster(const MatType& data, const size_t emptyCluster, const MatType& centroids, arma::Col& clusterCounts, arma::Col& assignments) { // First, we need to find the cluster with maximum variance (by which I mean // the sum of the covariance matrices). arma::vec variances; variances.zeros(clusterCounts.n_elem); // Start with 0. // Add the variance of each point's distance away from the cluster. I think // this is the sensible thing to do. for (size_t i = 0; i < data.n_cols; ++i) { variances[assignments[i]] += metric::SquaredEuclideanDistance::Evaluate( data.col(i), centroids.col(assignments[i])); } // Divide by the number of points in the cluster to produce the variance. // Although a -nan will occur here for the empty cluster(s), this doesn't // matter because variances.max() won't pick it up. If the number of points // in the cluster is 1, we ensure that cluster is not selected by forcing the // variance to 0. for (size_t i = 0; i < clusterCounts.n_elem; ++i) variances[i] /= (clusterCounts[i] == 1) ? DBL_MAX : clusterCounts[i]; // Now find the cluster with maximum variance. arma::uword maxVarCluster; variances.max(maxVarCluster); // Now, inside this cluster, find the point which is furthest away. size_t furthestPoint = data.n_cols; double maxDistance = -DBL_MAX; for (size_t i = 0; i < data.n_cols; ++i) { if (assignments[i] == maxVarCluster) { double distance = arma::as_scalar( arma::var(data.col(i) - centroids.col(maxVarCluster))); if (distance > maxDistance) { maxDistance = distance; furthestPoint = i; } } } // Take that point and add it to the empty cluster. clusterCounts[maxVarCluster]--; clusterCounts[emptyCluster]++; assignments[furthestPoint] = emptyCluster; // Output some debugging information. Rcpp::Rcout << "Point " << furthestPoint << " assigned to empty cluster " << emptyCluster << ".\n"; return 1; // We only changed one point. } }; // namespace kmeans }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/kmeans/refined_start.hpp0000644000176200001440000000652313647512751022410 0ustar liggesusers/** * @file refined_start.hpp * @author Ryan Curtin * * An implementation of Bradley and Fayyad's "Refining Initial Points for * K-Means clustering". 
This class is meant to provide better initial points * for the k-means algorithm. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KMEANS_REFINED_START_HPP #define __MLPACK_METHODS_KMEANS_REFINED_START_HPP #include namespace mlpack { namespace kmeans { /** * A refined approach for choosing initial points for k-means clustering. This * approach runs k-means several times on random subsets of the data, and then * clusters those solutions to select refined initial cluster assignments. It * is an implementation of the following paper: * * @inproceedings{bradley1998refining, * title={Refining initial points for k-means clustering}, * author={Bradley, Paul S and Fayyad, Usama M}, * booktitle={Proceedings of the Fifteenth International Conference on Machine * Learning (ICML 1998)}, * volume={66}, * year={1998} * } */ class RefinedStart { public: /** * Create the RefinedStart object, optionally specifying parameters for the * number of samplings to perform and the percentage of the dataset to use in * each sampling. */ RefinedStart(const size_t samplings = 100, const double percentage = 0.02) : samplings(samplings), percentage(percentage) { } /** * Partition the given dataset into the given number of clusters according to * the random sampling scheme outlined in Bradley and Fayyad's paper. * * @tparam MatType Type of data (arma::mat or arma::sp_mat). * @param data Dataset to partition. * @param clusters Number of clusters to split dataset into. * @param assignments Vector to store cluster assignments into. Values will * be between 0 and (clusters - 1). */ template void Cluster(const MatType& data, const size_t clusters, arma::Col& assignments) const; //! Get the number of samplings that will be performed. size_t Samplings() const { return samplings; } //! Modify the number of samplings that will be performed. size_t& Samplings() { return samplings; } //! Get the percentage of the data used by each subsampling. double Percentage() const { return percentage; } //! Modify the percentage of the data used by each subsampling. double& Percentage() { return percentage; } private: //! The number of samplings to perform. size_t samplings; //! The percentage of the data to use for each subsampling. double percentage; }; }; // namespace kmeans }; // namespace mlpack // Include implementation. #include "refined_start_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/kmeans/allow_empty_clusters.hpp0000644000176200001440000000434313647512751024035 0ustar liggesusers/** * @file allow_empty_clusters.hpp * @author Ryan Curtin * * This very simple policy is used when K-Means is allowed to return empty * clusters. * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KMEANS_ALLOW_EMPTY_CLUSTERS_HPP #define __MLPACK_METHODS_KMEANS_ALLOW_EMPTY_CLUSTERS_HPP #include namespace mlpack { namespace kmeans { /** * Policy which allows K-Means to create empty clusters without any error being * reported. */ class AllowEmptyClusters { public: //! Default constructor required by EmptyClusterPolicy policy. AllowEmptyClusters() { } /** * This function does nothing. It is called by K-Means when K-Means detects * an empty cluster. * * @tparam MatType Type of data (arma::mat or arma::spmat). * @param data Dataset on which clustering is being performed. * @param emptyCluster Index of cluster which is empty. * @param centroids Centroids of each cluster (one per column). * @param clusterCounts Number of points in each cluster. * @param assignments Cluster assignments of each point. * * @return Number of points changed (0). */ template static size_t EmptyCluster(const MatType& /* data */, const size_t /* emptyCluster */, const MatType& /* centroids */, arma::Col& /* clusterCounts */, arma::Col& /* assignments */) { // Empty clusters are okay! Do nothing. return 0; } }; }; // namespace kmeans }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/perceptron/0000755000176200001440000000000013647512751017743 5ustar liggesusersRcppMLPACK/src/mlpack/methods/perceptron/perceptron_impl.hpp0000644000176200001440000001324013647512751023656 0ustar liggesusers/** * @file perceptron_impl.hpp * @author Udit Saxena * * Implementation of Perceptron class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_PERCEPTRON_PERCEPTRON_IMPL_HPP #define __MLPACK_METHODS_PERCEPTRON_PERCEPTRON_IMPL_HPP #include "perceptron.hpp" namespace mlpack { namespace perceptron { /** * Constructor - constructs the perceptron. Or rather, builds the weightVectors * matrix, which is later used in Classification. * It adds a bias input vector of 1 to the input data to take care of the bias * weights. * * @param data Input, training data. * @param labels Labels of dataset. * @param iterations Maximum number of iterations for the perceptron learning * algorithm. 
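 *
 * A minimal usage sketch (an editor's illustration, not taken from the
 * original sources; `X` is a d x n matrix with one point per column and `y`
 * holds integer labels in {0, ..., c - 1}):
 *
 * @code
 * arma::mat X;              // Training data, one point per column.
 * arma::Row<size_t> y;      // One label per point.
 * // Train with the default policies for at most 1000 iterations.
 * mlpack::perceptron::Perceptron<> p(X, y, 1000);
 * arma::Row<size_t> predictions(X.n_cols);
 * p.Classify(X, predictions);
 * @endcode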
*/ template< typename LearnPolicy, typename WeightInitializationPolicy, typename MatType > Perceptron::Perceptron( const MatType& data, const arma::Row& labels, int iterations) { WeightInitializationPolicy WIP; WIP.Initialize(weightVectors, arma::max(labels) + 1, data.n_rows + 1); // Start training. classLabels = labels; trainData = data; // Insert a row of ones at the top of the training data set. MatType zOnes(1, data.n_cols); zOnes.fill(1); trainData.insert_rows(0, zOnes); iter = iterations; arma::rowvec D(data.n_cols); D.fill(1.0);// giving equal weight to all the points. Train(D); } /** * Classification function. After training, use the weightVectors matrix to * classify test, and put the predicted classes in predictedLabels. * * @param test testing data or data to classify. * @param predictedLabels vector to store the predicted classes after * classifying test */ template void Perceptron::Classify( const MatType& test, arma::Row& predictedLabels) { arma::mat tempLabelMat; arma::uword maxIndexRow, maxIndexCol; for (size_t i = 0; i < test.n_cols; i++) { tempLabelMat = weightVectors.submat(0, 1, weightVectors.n_rows - 1, weightVectors.n_cols - 1) * test.col(i) + weightVectors.col(0); tempLabelMat.max(maxIndexRow, maxIndexCol); predictedLabels(0, i) = maxIndexRow; } // predictedLabels.print("These are the labels predicted by the perceptron"); } /** * Alternate constructor which copies parameters from an already initiated * perceptron. * * @param other The other initiated Perceptron object from which we copy the * values from. * @param data The data on which to train this Perceptron object on. * @param D Weight vector to use while training. For boosting purposes. * @param labels The labels of data. */ template Perceptron::Perceptron( const Perceptron<>& other, MatType& data, const arma::rowvec& D, const arma::Row& labels) { classLabels = labels; trainData = data; iter = other.iter; // Insert a row of ones at the top of the training data set. MatType zOnes(1, data.n_cols); zOnes.fill(1); trainData.insert_rows(0, zOnes); WeightInitializationPolicy WIP; WIP.Initialize(weightVectors, arma::max(labels) + 1, data.n_rows + 1); Train(D); } /** * Training Function. It trains on trainData using the cost matrix D * * @param D Cost matrix. Stores the cost of mispredicting instances */ template< typename LearnPolicy, typename WeightInitializationPolicy, typename MatType > void Perceptron::Train( const arma::rowvec& D) { size_t j, i = 0; bool converged = false; size_t tempLabel; arma::uword maxIndexRow, maxIndexCol; arma::mat tempLabelMat; LearnPolicy LP; while ((i < iter) && (!converged)) { // This outer loop is for each iteration, and we use the 'converged' // variable for noting whether or not convergence has been reached. i++; converged = true; // Now this inner loop is for going through the dataset in each iteration. for (j = 0; j < trainData.n_cols; j++) { // Multiply for each variable and check whether the current weight vector // correctly classifies this. tempLabelMat = weightVectors * trainData.col(j); tempLabelMat.max(maxIndexRow, maxIndexCol); // Check whether prediction is correct. if (maxIndexRow != classLabels(0, j)) { // Due to incorrect prediction, convergence set to false. converged = false; tempLabel = classLabels(0, j); // Send maxIndexRow for knowing which weight to update, send j to know // the value of the vector to update it with. Send tempLabel to know // the correct class. 
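// With the default SimpleWeightUpdate policy (see
// learning_policies/simple_weight_update.hpp below), this call amounts to
//   w_predicted -= D(j) * x_j   and   w_correct += D(j) * x_j,
// i.e. the classical Rosenblatt update scaled by the instance weight D(j).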
LP.UpdateWeights(trainData, weightVectors, j, tempLabel, maxIndexRow, D); } } } } }; // namespace perceptron }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/perceptron/learning_policies/0000755000176200001440000000000013647512751023431 5ustar liggesusersRcppMLPACK/src/mlpack/methods/perceptron/learning_policies/simple_weight_update.hpp0000644000176200001440000000512013647512751030342 0ustar liggesusers/** * @file simple_weight_update.hpp * @author Udit Saxena * * Simple weight update rule for the perceptron. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef _MLPACK_METHODS_PERCEPTRON_LEARNING_POLICIES_SIMPLE_WEIGHT_UPDATE_HPP #define _MLPACK_METHODS_PERCEPTRON_LEARNING_POLICIES_SIMPLE_WEIGHT_UPDATE_HPP #include /** * This class is used to update the weightVectors matrix according to the simple * update rule as discussed by Rosenblatt: * * if a vector x has been incorrectly classified by a weight w, * then w = w - x * and w'= w'+ x * * where w' is the weight vector which correctly classifies x. */ namespace mlpack { namespace perceptron { class SimpleWeightUpdate { public: /** * This function is called to update the weightVectors matrix. * It decreases the weights of the incorrectly classified class while * increasing the weight of the correct class it should have been classified to. * * @param trainData The training dataset. * @param weightVectors Matrix of weight vectors. * @param rowIndex Index of the row which has been incorrectly predicted. * @param labelIndex Index of the vector in trainData. * @param vectorIndex Index of the class which should have been predicted. * @param D Cost of mispredicting the labelIndex instance. */ void UpdateWeights(const arma::mat& trainData, arma::mat& weightVectors, const size_t labelIndex, const size_t vectorIndex, const size_t rowIndex, const arma::rowvec& D) { weightVectors.row(rowIndex) = weightVectors.row(rowIndex) - D(labelIndex) * trainData.col(labelIndex).t(); weightVectors.row(vectorIndex) = weightVectors.row(vectorIndex) + D(labelIndex) * trainData.col(labelIndex).t(); } }; }; // namespace perceptron }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/perceptron/initialization_methods/0000755000176200001440000000000013647512751024515 5ustar liggesusersRcppMLPACK/src/mlpack/methods/perceptron/initialization_methods/random_init.hpp0000644000176200001440000000271713647512751027540 0ustar liggesusers/** * @file random_init.hpp * @author Udit Saxena * * Random initialization for perceptron weights. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef _MLPACK_METHOS_PERCEPTRON_INITIALIZATION_METHODS_RANDOM_INIT_HPP #define _MLPACK_METHOS_PERCEPTRON_INITIALIZATION_METHODS_RANDOM_INIT_HPP #include namespace mlpack { namespace perceptron { /** * This class is used to initialize weights for the weightVectors matrix in a * random manner. */ class RandomInitialization { public: RandomInitialization() { } inline static void Initialize(arma::mat& W, const size_t row, const size_t col) { W = arma::randu(row, col); } }; // class RandomInitialization }; // namespace perceptron }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/perceptron/initialization_methods/zero_init.hpp0000644000176200001440000000274413647512751027237 0ustar liggesusers/** * @file zero_init.hpp * @author Udit Saxena * * Implementation of ZeroInitialization policy for perceptrons. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef _MLPACK_METHOS_PERCEPTRON_INITIALIZATION_METHODS_ZERO_INIT_HPP #define _MLPACK_METHOS_PERCEPTRON_INITIALIZATION_METHODS_ZERO_INIT_HPP #include namespace mlpack { namespace perceptron { /** * This class is used to initialize the matrix weightVectors to zero. */ class ZeroInitialization { public: ZeroInitialization() { } inline static void Initialize(arma::mat& W, const size_t row, const size_t col) { arma::mat tempWeights(row, col); tempWeights.fill(0.0); W = tempWeights; } }; // class ZeroInitialization }; // namespace perceptron }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/perceptron/perceptron.hpp0000644000176200001440000000716113647512751022642 0ustar liggesusers/** * @file perceptron.hpp * @author Udit Saxena * * Definition of Perceptron class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
*/ #ifndef __MLPACK_METHODS_PERCEPTRON_PERCEPTRON_HPP #define __MLPACK_METHODS_PERCEPTRON_PERCEPTRON_HPP #include #include "initialization_methods/zero_init.hpp" #include "initialization_methods/random_init.hpp" #include "learning_policies/simple_weight_update.hpp" namespace mlpack { namespace perceptron { /** * This class implements a simple perceptron (i.e., a single layer neural * network). It converges if the supplied training dataset is linearly * separable. * * @tparam LearnPolicy Options of SimpleWeightUpdate and GradientDescent. * @tparam WeightInitializationPolicy Option of ZeroInitialization and * RandomInitialization. */ template class Perceptron { public: /** * Constructor - constructs the perceptron by building the weightVectors * matrix, which is later used in Classification. It adds a bias input vector * of 1 to the input data to take care of the bias weights. * * @param data Input, training data. * @param labels Labels of dataset. * @param iterations Maximum number of iterations for the perceptron learning * algorithm. */ Perceptron(const MatType& data, const arma::Row& labels, int iterations); /** * Classification function. After training, use the weightVectors matrix to * classify test, and put the predicted classes in predictedLabels. * * @param test Testing data or data to classify. * @param predictedLabels Vector to store the predicted classes after * classifying test. */ void Classify(const MatType& test, arma::Row& predictedLabels); /** * Alternate constructor which copies parameters from an already initiated * perceptron. * * @param other The other initiated Perceptron object from which we copy the * values from. * @param data The data on which to train this Perceptron object on. * @param D Weight vector to use while training. For boosting purposes. * @param labels The labels of data. */ Perceptron(const Perceptron<>& other, MatType& data, const arma::rowvec& D, const arma::Row& labels); private: //! To store the number of iterations size_t iter; //! Stores the class labels for the input data. arma::Row classLabels; //! Stores the weight vectors for each of the input class labels. arma::mat weightVectors; //! Stores the training data to be used later on in UpdateWeights. arma::mat trainData; /** * Training Function. It trains on trainData using the cost matrix D * * @param D Cost matrix. Stores the cost of mispredicting instances */ void Train(const arma::rowvec& D); }; } // namespace perceptron } // namespace mlpack #include "perceptron_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/decision_stump/0000755000176200001440000000000013647512751020607 5ustar liggesusersRcppMLPACK/src/mlpack/methods/decision_stump/decision_stump_impl.hpp0000644000176200001440000002772613647512751025404 0ustar liggesusers/** * @file decision_stump_impl.hpp * @author Udit Saxena * * Implementation of DecisionStump class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
*/ #ifndef __MLPACK_METHODS_DECISION_STUMP_DECISION_STUMP_IMPL_HPP #define __MLPACK_METHODS_DECISION_STUMP_DECISION_STUMP_IMPL_HPP // In case it hasn't been included yet. #include "decision_stump.hpp" #include #include namespace mlpack { namespace decision_stump { /** * Constructor. Train on the provided data. Generate a decision stump from data. * * @param data Input, training data. * @param labels Labels of data. * @param classes Number of distinct classes in labels. * @param inpBucketSize Minimum size of bucket when splitting. */ template DecisionStump::DecisionStump(const MatType& data, const arma::Row& labels, const size_t classes, size_t inpBucketSize) { numClass = classes; bucketSize = inpBucketSize; // If classLabels are not all identical, proceed with training. size_t bestAtt = 0; double entropy; const double rootEntropy = CalculateEntropy( labels.subvec(0, labels.n_elem - 1)); double gain, bestGain = 0.0; for (size_t i = 0; i < data.n_rows; i++) { // Go through each attribute of the data. if (IsDistinct(data.row(i))) { // For each attribute with non-identical values, treat it as a potential // splitting attribute and calculate entropy if split on it. entropy = SetupSplitAttribute(data.row(i), labels); // Log::Debug << "Entropy for attribute " << i << " is " << entropy << ".\n"; gain = rootEntropy - entropy; // Find the attribute with the best entropy so that the gain is // maximized. // if (entropy < bestEntropy) // Instead of the above rule, we are maximizing gain, which was // what is returned from SetupSplitAttribute. if (gain < bestGain) { bestAtt = i; bestGain = gain; } } } splitAttribute = bestAtt; // Once the splitting column/attribute has been decided, train on it. TrainOnAtt(data.row(splitAttribute), labels); } /** * Classification function. After training, classify test, and put the predicted * classes in predictedLabels. * * @param test Testing data or data to classify. * @param predictedLabels Vector to store the predicted classes after * classifying test */ template void DecisionStump::Classify(const MatType& test, arma::Row& predictedLabels) { for (size_t i = 0; i < test.n_cols; i++) { // Determine which bin the test point falls into. // Assume first that it falls into the first bin, then proceed through the // bins until it is known which bin it falls into. size_t bin = 0; const double val = test(splitAttribute, i); while (bin < split.n_elem - 1) { if (val < split(bin + 1)) break; ++bin; } predictedLabels(i) = binLabels(bin); } } /** * * * * * */ template DecisionStump::DecisionStump(const DecisionStump<>& ds) { numClass = ds.numClass; splitAttribute = ds.splitAttribute; bucketSize = ds.bucketSize; split = ds.split; binLabels = ds.binLabels; } /** * * * * * * template DecisionStump::ModifyData(MatType& data, const arma::Row& D) */ /** * Sets up attribute as if it were splitting on it and finds entropy when * splitting on attribute. * * @param attribute A row from the training data, which might be a candidate for * the splitting attribute. */ template double DecisionStump::SetupSplitAttribute( const arma::rowvec& attribute, const arma::Row& labels) { size_t i, count, begin, end; double entropy = 0.0; // Sort the attribute in order to calculate splitting ranges. arma::rowvec sortedAtt = arma::sort(attribute); // Store the indices of the sorted attribute to build a vector of sorted // labels. This sort is stable. 
arma::uvec sortedIndexAtt = arma::stable_sort_index(attribute.t()); arma::Row sortedLabels(attribute.n_elem); sortedLabels.fill(0); for (i = 0; i < attribute.n_elem; i++) sortedLabels(i) = labels(sortedIndexAtt(i)); i = 0; count = 0; // This splits the sorted into buckets of size greater than or equal to // inpBucketSize. while (i < sortedLabels.n_elem) { count++; if (i == sortedLabels.n_elem - 1) { // If we're at the end, then don't worry about the bucket size; just take // this as the last bin. begin = i - count + 1; end = i; // Use ratioEl to calculate the ratio of elements in this split. const double ratioEl = ((double) (end - begin + 1) / sortedLabels.n_elem); entropy += ratioEl * CalculateEntropy( sortedLabels.subvec(begin, end)); i++; } else if (sortedLabels(i) != sortedLabels(i + 1)) { // If we're not at the last element of sortedLabels, then check whether // count is less than the current bucket size. if (count < bucketSize) { // If it is, then take the minimum bucket size anyways. // This is where the inpBucketSize comes into use. // This makes sure there isn't a bucket for every change in labels. begin = i - count + 1; end = begin + bucketSize - 1; if (end > sortedLabels.n_elem - 1) end = sortedLabels.n_elem - 1; } else { // If it is not, then take the bucket size as the value of count. begin = i - count + 1; end = i; } const double ratioEl = ((double) (end - begin + 1) / sortedLabels.n_elem); entropy += ratioEl * CalculateEntropy( sortedLabels.subvec(begin, end)); i = end + 1; count = 0; } else i++; } return entropy; } /** * After having decided the attribute on which to split, train on that * attribute. * * @param attribute Attribute is the attribute decided by the constructor on * which we now train the decision stump. */ template template void DecisionStump::TrainOnAtt(const arma::rowvec& attribute, const arma::Row& labels) { size_t i, count, begin, end; arma::rowvec sortedSplitAtt = arma::sort(attribute); arma::uvec sortedSplitIndexAtt = arma::stable_sort_index(attribute.t()); arma::Row sortedLabels(attribute.n_elem); sortedLabels.fill(0); arma::vec tempSplit; arma::Row tempLabel; for (i = 0; i < attribute.n_elem; i++) sortedLabels(i) = labels(sortedSplitIndexAtt(i)); arma::rowvec subCols; rType mostFreq; i = 0; count = 0; while (i < sortedLabels.n_elem) { count++; if (i == sortedLabels.n_elem - 1) { begin = i - count + 1; end = i; arma::rowvec zSubCols((sortedLabels.cols(begin, end)).n_elem); zSubCols.fill(0.0); subCols = sortedLabels.cols(begin, end) + zSubCols; mostFreq = CountMostFreq(subCols); split.resize(split.n_elem + 1); split(split.n_elem - 1) = sortedSplitAtt(begin); binLabels.resize(binLabels.n_elem + 1); binLabels(binLabels.n_elem - 1) = mostFreq; i++; } else if (sortedLabels(i) != sortedLabels(i + 1)) { if (count < bucketSize) { // Test for different values of bucketSize, especially extreme cases. begin = i - count + 1; end = begin + bucketSize - 1; if (end > sortedLabels.n_elem - 1) end = sortedLabels.n_elem - 1; } else { begin = i - count + 1; end = i; } arma::rowvec zSubCols((sortedLabels.cols(begin, end)).n_elem); zSubCols.fill(0.0); subCols = sortedLabels.cols(begin, end) + zSubCols; // Find the most frequent element in subCols so as to assign a label to // the bucket of subCols. 
mostFreq = CountMostFreq(subCols); split.resize(split.n_elem + 1); split(split.n_elem - 1) = sortedSplitAtt(begin); binLabels.resize(binLabels.n_elem + 1); binLabels(binLabels.n_elem - 1) = mostFreq; i = end + 1; count = 0; } else i++; } // Now trim the split matrix so that buckets one after the after which point // to the same classLabel are merged as one big bucket. MergeRanges(); } /** * After the "split" matrix has been set up, merge ranges with identical class * labels. */ template void DecisionStump::MergeRanges() { for (size_t i = 1; i < split.n_rows; i++) { if (binLabels(i) == binLabels(i - 1)) { // Remove this row, as it has the same label as the previous bucket. binLabels.shed_row(i); split.shed_row(i); // Go back to previous row. i--; } } } template template rType DecisionStump::CountMostFreq(const arma::Row& subCols) { // Sort subCols for easier processing. arma::Row sortCounts = arma::sort(subCols); rType element = sortCounts[0]; size_t count = 0, localCount = 0; if (sortCounts.n_elem == 1) return sortCounts[0]; // An O(n) loop which counts the most frequent element in sortCounts. for (size_t i = 0; i < sortCounts.n_elem; ++i) { if (i == sortCounts.n_elem - 1) { if (sortCounts(i - 1) == sortCounts(i)) { // element = sortCounts(i - 1); localCount++; } else if (localCount > count) count = localCount; } else if (sortCounts(i) != sortCounts(i + 1)) { localCount = 0; count++; } else { localCount++; if (localCount > count) { count = localCount; if (localCount == 1) element = sortCounts(i); } } } return element; } /** * Returns 1 if all the values of featureRow are not same. * * @param featureRow The attribute which is checked for identical values. */ template template int DecisionStump::IsDistinct(const arma::Row& featureRow) { rType val = featureRow(0); for (size_t i = 1; i < featureRow.n_elem; ++i) if (val != featureRow(i)) return 1; return 0; } /** * Calculate entropy of attribute. * * @param attribute The attribute for which we calculate the entropy. * @param labels Corresponding labels of the attribute. */ template template double DecisionStump::CalculateEntropy(arma::subview_row labels) { double entropy = 0.0; size_t j; arma::Row numElem(numClass); numElem.fill(0); // Populate numElem; they are used as helpers to calculate // entropy. for (j = 0; j < labels.n_elem; j++) numElem(labels(j))++; // The equation for entropy uses log2(), but log2() is from C99 and thus // Visual Studio will not have it. Therefore, we will use std::log(), and // then apply the change-of-base formula at the end of the calculation. for (j = 0; j < numClass; j++) { const double p1 = ((double) numElem(j) / labels.n_elem); entropy += (p1 == 0) ? 0 : p1 * std::log(p1); } return entropy / std::log(2.0); } }; // namespace decision_stump }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/decision_stump/decision_stump.hpp0000644000176200001440000001254613647512751024355 0ustar liggesusers/** * @file decision_stump.hpp * @author Udit Saxena * * Definition of decision stumps. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. 
See the GNU Lesser General Public License for more details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_DECISION_STUMP_DECISION_STUMP_HPP
#define __MLPACK_METHODS_DECISION_STUMP_DECISION_STUMP_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace decision_stump {

/**
 * This class implements a decision stump. It constructs a single level
 * decision tree, i.e., a decision stump. It uses entropy to decide splitting
 * ranges.
 *
 * The stump is parameterized by a splitting attribute (the dimension on which
 * points are split), a vector of bin split values, and a vector of labels for
 * each bin. Bin i is specified by the range [split[i], split[i + 1]). The
 * last bin has range up to \infty (split[i + 1] does not exist in that case).
 * Points that are below the first bin will take the label of the first bin.
 *
 * @tparam MatType Type of matrix that is being used (sparse or dense).
 */
template<typename MatType = arma::mat>
class DecisionStump
{
 public:
  /**
   * Constructor. Train on the provided data. Generate a decision stump from
   * data.
   *
   * @param data Input, training data.
   * @param labels Labels of training data.
   * @param classes Number of distinct classes in labels.
   * @param inpBucketSize Minimum size of bucket when splitting.
   */
  DecisionStump(const MatType& data,
                const arma::Row<size_t>& labels,
                const size_t classes,
                size_t inpBucketSize);

  /**
   * Classification function. After training, classify test, and put the
   * predicted classes in predictedLabels.
   *
   * @param test Testing data or data to classify.
   * @param predictedLabels Vector to store the predicted classes after
   *     classifying test data.
   */
  void Classify(const MatType& test, arma::Row<size_t>& predictedLabels);

  /**
   * Alternate constructor which copies the parameters of an already-built
   * decision stump.
   *
   * @param ds The other decision stump to copy parameters from.
   */
  DecisionStump(const DecisionStump<>& ds);

  /**
   *
   *
   *
   *
   *
   * ModifyData(MatType& data, const arma::Row& D);
   */

  //! Access the splitting attribute.
  int SplitAttribute() const { return splitAttribute; }
  //! Modify the splitting attribute (be careful!).
  int& SplitAttribute() { return splitAttribute; }

  //! Access the splitting values.
  const arma::vec& Split() const { return split; }
  //! Modify the splitting values (be careful!).
  arma::vec& Split() { return split; }

  //! Access the labels for each split bin.
  const arma::Col<size_t> BinLabels() const { return binLabels; }
  //! Modify the labels for each split bin (be careful!).
  arma::Col<size_t>& BinLabels() { return binLabels; }

 private:
  //! Stores the number of classes.
  size_t numClass;

  //! Stores the value of the attribute on which to split.
  int splitAttribute;

  //! Size of bucket while determining splitting criterion.
  size_t bucketSize;

  //! Stores the splitting values after training.
  arma::vec split;

  //! Stores the labels for each splitting bin.
  arma::Col<size_t> binLabels;

  /**
   * Sets up attribute as if it were splitting on it and finds entropy when
   * splitting on attribute.
   *
   * @param attribute A row from the training data, which might be a
   *     candidate for the splitting attribute.
   */
  double SetupSplitAttribute(const arma::rowvec& attribute,
                             const arma::Row<size_t>& labels);

  /**
   * After having decided the attribute on which to split, train on that
   * attribute.
   *
   * @param attribute attribute is the attribute decided by the constructor
   *     on which we now train the decision stump.
   */
  template<typename rType>
  void TrainOnAtt(const arma::rowvec& attribute,
                  const arma::Row<size_t>& labels);

  /**
   * After the "split" matrix has been set up, merge ranges with identical
   * class labels.
   */
  void MergeRanges();

  /**
   * Count the most frequently occurring element in subCols.
* * @param subCols The vector in which to find the most frequently * occurring element. */ template rType CountMostFreq(const arma::Row& subCols); /** * Returns 1 if all the values of featureRow are not same. * * @param featureRow The attribute which is checked for identical values. */ template int IsDistinct(const arma::Row& featureRow); /** * Calculate the entropy of the given attribute. * * @param attribute The attribute of which we calculate the entropy. * @param labels Corresponding labels of the attribute. */ template double CalculateEntropy(arma::subview_row labels); }; }; // namespace decision_stump }; // namespace mlpack #include "decision_stump_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/kernel_pca/0000755000176200001440000000000013647512751017665 5ustar liggesusersRcppMLPACK/src/mlpack/methods/kernel_pca/kernel_rules/0000755000176200001440000000000013647512751022357 5ustar liggesusersRcppMLPACK/src/mlpack/methods/kernel_pca/kernel_rules/naive_method.hpp0000644000176200001440000000736413647512751025544 0ustar liggesusers/** * @file naive_method.hpp * @author Ajinkya Kale * * Use the naive method to construct the kernel matrix. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KERNEL_PCA_NAIVE_METHOD_HPP #define __MLPACK_METHODS_KERNEL_PCA_NAIVE_METHOD_HPP #include namespace mlpack { namespace kpca { template class NaiveKernelRule { public: public: /** * Construct the kernel matrix approximation using the nystroem method. * * @param data Input data points. * @param transformedData Matrix to output results into. * @param eigval KPCA eigenvalues will be written to this vector. * @param eigvec KPCA eigenvectors will be written to this matrix. * @param rank Rank to be used for matrix approximation. * @param kernel Kernel to be used for computation. */ static void ApplyKernelMatrix(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval, arma::mat& eigvec, const size_t /* unused */, KernelType kernel = KernelType()) { // Construct the kernel matrix. arma::mat kernelMatrix; // Resize the kernel matrix to the right size. kernelMatrix.set_size(data.n_cols, data.n_cols); // Note that we only need to calculate the upper triangular part of the // kernel matrix, since it is symmetric. This helps minimize the number of // kernel evaluations. for (size_t i = 0; i < data.n_cols; ++i) { for (size_t j = i; j < data.n_cols; ++j) { // Evaluate the kernel on these two points. kernelMatrix(i, j) = kernel.Evaluate(data.unsafe_col(i), data.unsafe_col(j)); } } // Copy to the lower triangular part of the matrix. for (size_t i = 1; i < data.n_cols; ++i) for (size_t j = 0; j < i; ++j) kernelMatrix(i, j) = kernelMatrix(j, i); // For PCA the data has to be centered, even if the data is centered. But it // is not guaranteed that the data, when mapped to the kernel space, is also // centered. Since we actually never work in the feature space we cannot // center the data. 
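// In matrix form (editor's note), the double-centering performed below is
//   K_c = K - 1_n K - K 1_n + 1_n K 1_n,
// where 1_n denotes the n x n matrix with every entry equal to 1 / n.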
So, we perform a "psuedo-centering" using the kernel // matrix. arma::rowvec rowMean = arma::sum(kernelMatrix, 0) / kernelMatrix.n_cols; kernelMatrix.each_col() -= arma::sum(kernelMatrix, 1) / kernelMatrix.n_cols; kernelMatrix.each_row() -= rowMean; kernelMatrix += arma::sum(rowMean) / kernelMatrix.n_cols; // Eigendecompose the centered kernel matrix. arma::eig_sym(eigval, eigvec, kernelMatrix); // Swap the eigenvalues since they are ordered backwards (we need largest to // smallest). for (size_t i = 0; i < floor(eigval.n_elem / 2.0); ++i) eigval.swap_rows(i, (eigval.n_elem - 1) - i); // Flip the coefficients to produce the same effect. eigvec = arma::fliplr(eigvec); transformedData = eigvec.t() * kernelMatrix; } }; }; // namespace kpca }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/kernel_pca/kernel_rules/nystroem_method.hpp0000644000176200001440000000620313647512751026311 0ustar liggesusers/** * @file nystroem_method.hpp * @author Marcus Edel * * Use the Nystroem method for approximating a kernel matrix. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KERNEL_PCA_NYSTROEM_METHOD_HPP #define __MLPACK_METHODS_KERNEL_PCA_NYSTROEM_METHOD_HPP #include #include #include namespace mlpack { namespace kpca { template< typename KernelType, typename PointSelectionPolicy = kernel::KMeansSelection<> > class NystroemKernelRule { public: /** * Construct the kernel matrix approximation using the nystroem method. * * @param data Input data points. * @param transformedData Matrix to output results into. * @param eigval KPCA eigenvalues will be written to this vector. * @param eigvec KPCA eigenvectors will be written to this matrix. * @param rank Rank to be used for matrix approximation. * @param kernel Kernel to be used for computation. */ static void ApplyKernelMatrix(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval, arma::mat& eigvec, const size_t rank, KernelType kernel = KernelType()) { arma::mat G, v; kernel::NystroemMethod nm(data, kernel, rank); nm.Apply(G); transformedData = G.t() * G; // For PCA the data has to be centered, even if the data is centered. But // it is not guaranteed that the data, when mapped to the kernel space, is // also centered. Since we actually never work in the feature space we // cannot center the data. So, we perform a "psuedo-centering" using the // kernel matrix. arma::rowvec rowMean = arma::sum(transformedData, 0) / transformedData.n_cols; transformedData.each_col() -= arma::sum(transformedData, 1) / transformedData.n_cols; transformedData.each_row() -= rowMean; transformedData += arma::sum(rowMean) / transformedData.n_cols; // Eigendecompose the centered kernel matrix. 
arma::svd(eigvec, eigval, v, transformedData); eigval %= eigval / (data.n_cols - 1); transformedData = eigvec.t() * G.t(); } }; }; // namespace kpca }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/kernel_pca/kernel_pca.hpp0000644000176200001440000001240313647512751022501 0ustar liggesusers/** * @file kernel_pca.hpp * @author Ajinkya Kale * @author Marcus Edel * * Defines the KernelPCA class to perform Kernel Principal Components Analysis * on the specified data set. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KERNEL_PCA_KERNEL_PCA_HPP #define __MLPACK_METHODS_KERNEL_PCA_KERNEL_PCA_HPP #include #include namespace mlpack { namespace kpca { /** * This class performs kernel principal components analysis (Kernel PCA), for a * given kernel. This is a standard machine learning technique and is * well-documented on the Internet and in standard texts. It is often used as a * dimensionality reduction technique, and can also be useful in mapping * linearly inseparable classes of points to different spaces where they are * linearly separable. * * The performance of the method is highly dependent on the kernel choice. * There are numerous available kernels in the mlpack::kernel namespace (see * files in mlpack/core/kernels/) and it is easy to write your own; see other * implementations for examples. */ template < typename KernelType, typename KernelRule = NaiveKernelRule > class KernelPCA { public: /** * Construct the KernelPCA object, optionally passing a kernel. Optionally, * the transformed data can be centered about the origin; to do this, pass * 'true' for centerTransformedData. This will take slightly longer (but not * much). * * @param kernel Kernel to be used for computation. * @param centerTransformedData Center transformed data. */ KernelPCA(const KernelType kernel = KernelType(), const bool centerTransformedData = false); /** * Apply Kernel Principal Components Analysis to the provided data set. * * @param data Data matrix. * @param transformedData Matrix to output results into. * @param eigval KPCA eigenvalues will be written to this vector. * @param eigvec KPCA eigenvectors will be written to this matrix. * @param newDimension New dimension for the dataset. */ void Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval, arma::mat& eigvec, const size_t newDimension); /** * Apply Kernel Principal Components Analysis to the provided data set. * * @param data Data matrix. * @param transformedData Matrix to output results into. * @param eigval KPCA eigenvalues will be written to this vector. * @param eigvec KPCA eigenvectors will be written to this matrix. */ void Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval, arma::mat& eigvec); /** * Apply Kernel Principal Component Analysis to the provided data set. * * @param data Data matrix. * @param transformedData Matrix to output results into. 
* @param eigval KPCA eigenvalues will be written to this vector. */ void Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval); /** * Apply dimensionality reduction using Kernel Principal Component Analysis * to the provided data set. The data matrix will be modified in-place. Note * that the dimension can be larger than the existing dimension because KPCA * works on the kernel matrix, not the covariance matrix. This means the new * dimension can be as large as the number of points (columns) in the dataset. * Note that if you specify newDimension to be larger than the current * dimension of the data (the number of rows), then it's not really * "dimensionality reduction"... * * @param data Data matrix. * @param newDimension New dimension for the dataset. */ void Apply(arma::mat& data, const size_t newDimension); //! Get the kernel. const KernelType& Kernel() const { return kernel; } //! Modify the kernel. KernelType& Kernel() { return kernel; } //! Return whether or not the transformed data is centered. bool CenterTransformedData() const { return centerTransformedData; } //! Return whether or not the transformed data is centered. bool& CenterTransformedData() { return centerTransformedData; } // Returns a string representation of this object. std::string ToString() const; private: //! The instantiated kernel. KernelType kernel; //! If true, the data will be scaled (by standard deviation) when Apply() is //! run. bool centerTransformedData; }; // class KernelPCA }; // namespace kpca }; // namespace mlpack // Include implementation. #include "kernel_pca_impl.hpp" #endif // __MLPACK_METHODS_KERNEL_PCA_KERNEL_PCA_HPP RcppMLPACK/src/mlpack/methods/kernel_pca/kernel_pca_impl.hpp0000644000176200001440000000766513647512751023540 0ustar liggesusers/** * @file kernel_pca_impl.hpp * @author Ajinkya Kale * @author Marcus Edel * * Implementation of Kernel PCA class to perform Kernel Principal Components * Analysis on the specified data set. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KERNEL_PCA_KERNEL_PCA_IMPL_HPP #define __MLPACK_METHODS_KERNEL_PCA_KERNEL_PCA_IMPL_HPP // In case it hasn't already been included. #include "kernel_pca.hpp" namespace mlpack { namespace kpca { template KernelPCA::KernelPCA(const KernelType kernel, const bool centerTransformedData) : kernel(kernel), centerTransformedData(centerTransformedData) { } //! Apply Kernel Principal Component Analysis to the provided data set. template void KernelPCA::Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval, arma::mat& eigvec, const size_t newDimension) { KernelRule::ApplyKernelMatrix(data, transformedData, eigval, eigvec, newDimension, kernel); // Center the transformed data, if the user asked for it. 
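// (Editor's note: this subtracts the per-dimension mean, so that each row
  // of transformedData has zero mean across the n points.)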
if (centerTransformedData) { arma::colvec transformedDataMean = arma::mean(transformedData, 1); transformedData = transformedData - (transformedDataMean * arma::ones(transformedData.n_cols)); } } //! Apply Kernel Principal Component Analysis to the provided data set. template void KernelPCA::Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval, arma::mat& eigvec) { Apply(data, transformedData, eigval, eigvec, data.n_cols); } //! Apply Kernel Principal Component Analysis to the provided data set. template void KernelPCA::Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigVal) { arma::mat coeffs; Apply(data, transformedData, eigVal, coeffs, data.n_cols); } //! Use KPCA for dimensionality reduction. template void KernelPCA::Apply(arma::mat& data, const size_t newDimension) { arma::mat coeffs; arma::vec eigVal; Apply(data, data, eigVal, coeffs, newDimension); if (newDimension < coeffs.n_rows && newDimension > 0) data.shed_rows(newDimension, data.n_rows - 1); } //! Returns a string representation of the object. template std::string KernelPCA::ToString() const { std::ostringstream convert; convert << "KernelPCA [" << this << "]" << std::endl; convert << " Center Transformed: " << centerTransformedData <. */ #ifndef __MLPACK_METHODS_SPARSE_CODING_RANDOM_INITIALIZER_HPP #define __MLPACK_METHODS_SPARSE_CODING_RANDOM_INITIALIZER_HPP #include namespace mlpack { namespace sparse_coding { /** * A DictionaryInitializer for use with the SparseCoding class. This provides a * random, normally distributed dictionary, such that each atom has a norm of 1. */ class RandomInitializer { public: /** * Initialize the dictionary randomly from a normal distribution, such that * each atom has a norm of 1. This is simple enough to be included with the * definition. * * @param data Dataset to use for initialization. * @param atoms Number of atoms (columns) in the dictionary. * @param dictionary Dictionary to initialize. */ static void Initialize(const arma::mat& data, const size_t atoms, arma::mat& dictionary) { // Create random dictionary. dictionary.randn(data.n_rows, atoms); // Normalize each atom. for (size_t j = 0; j < atoms; ++j) dictionary.col(j) /= norm(dictionary.col(j), 2); } }; }; // namespace sparse_coding }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/sparse_coding/sparse_coding_impl.hpp0000644000176200001440000002675613647512751024774 0ustar liggesusers/** * @file sparse_coding_impl.hpp * @author Nishant Mehta * * Implementation of Sparse Coding with Dictionary Learning using l1 (LASSO) or * l1+l2 (Elastic Net) regularization. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_SPARSE_CODING_SPARSE_CODING_IMPL_HPP #define __MLPACK_METHODS_SPARSE_CODING_SPARSE_CODING_IMPL_HPP // In case it hasn't already been included. 
#include "sparse_coding.hpp" namespace mlpack { namespace sparse_coding { template SparseCoding::SparseCoding(const arma::mat& data, const size_t atoms, const double lambda1, const double lambda2) : atoms(atoms), data(data), codes(atoms, data.n_cols), lambda1(lambda1), lambda2(lambda2) { // Initialize the dictionary. DictionaryInitializer::Initialize(data, atoms, dictionary); } template void SparseCoding::Encode(const size_t maxIterations, const double objTolerance, const double newtonTolerance) { //Timer::Start("sparse_coding"); double lastObjVal = DBL_MAX; // Take the initial coding step, which has to happen before entering the main // optimization loop. Rcpp::Rcout << "Initial Coding Step." << std::endl; OptimizeCode(); arma::uvec adjacencies = find(codes); Rcpp::Rcout << " Sparsity level: " << 100.0 * ((double) (adjacencies.n_elem)) / ((double) (atoms * data.n_cols)) << "%." << std::endl; Rcpp::Rcout << " Objective value: " << Objective() << "." << std::endl; for (size_t t = 1; t != maxIterations; ++t) { // Print current iteration, and maximum number of iterations (if it isn't // 0). Rcpp::Rcout << "Iteration " << t; if (maxIterations != 0) Rcpp::Rcout << " of " << maxIterations; Rcpp::Rcout << "." << std::endl; // First step: optimize the dictionary. Rcpp::Rcout << "Performing dictionary step... " << std::endl; OptimizeDictionary(adjacencies, newtonTolerance); Rcpp::Rcout << " Objective value: " << Objective() << "." << std::endl; // Second step: perform the coding. Rcpp::Rcout << "Performing coding step..." << std::endl; OptimizeCode(); // Get the indices of all the nonzero elements in the codes. adjacencies = find(codes); Rcpp::Rcout << " Sparsity level: " << 100.0 * ((double) (adjacencies.n_elem)) / ((double) (atoms * data.n_cols)) << "%." << std::endl; // Find the new objective value and improvement so we can check for // convergence. double curObjVal = Objective(); double improvement = lastObjVal - curObjVal; Rcpp::Rcout << " Objective value: " << curObjVal << " (improvement " << std::scientific << improvement << ")." << std::endl; // Have we converged? if (improvement < objTolerance) { Rcpp::Rcout << "Converged within tolerance " << objTolerance << ".\n"; break; } lastObjVal = curObjVal; } //Timer::Stop("sparse_coding"); } template void SparseCoding::OptimizeCode() { // When using the Cholesky version of LARS, this is correct even if // lambda2 > 0. arma::mat matGram = trans(dictionary) * dictionary; for (size_t i = 0; i < data.n_cols; ++i) { // Report progress. if ((i % 100) == 0) Rcpp::Rcout << "Optimization at point " << i << "." << std::endl; bool useCholesky = true; regression::LARS lars(useCholesky, matGram, lambda1, lambda2); // Create an alias of the code (using the same memory), and then LARS will // place the result directly into that; then we will not need to have an // extra copy. arma::vec code = codes.unsafe_col(i); lars.Regress(dictionary, data.unsafe_col(i), code, false); } } // Dictionary step for optimization. template double SparseCoding::OptimizeDictionary( const arma::uvec& adjacencies, const double newtonTolerance) { // Count the number of atomic neighbors for each point x^i. arma::uvec neighborCounts = arma::zeros(data.n_cols, 1); if (adjacencies.n_elem > 0) { // This gets the column index. Intentional integer division. 
size_t curPointInd = (size_t) (adjacencies(0) / atoms); size_t nextColIndex = (curPointInd + 1) * atoms; for (size_t l = 1; l < adjacencies.n_elem; ++l) { // If l no longer refers to an element in this column, advance the column // number accordingly. if (adjacencies(l) >= nextColIndex) { curPointInd = (size_t) (adjacencies(l) / atoms); nextColIndex = (curPointInd + 1) * atoms; } ++neighborCounts(curPointInd); } } // Handle the case of inactive atoms (atoms not used in the given coding). std::vector inactiveAtoms; for (size_t j = 0; j < atoms; ++j) { if (accu(codes.row(j) != 0) == 0) inactiveAtoms.push_back(j); } const size_t nInactiveAtoms = inactiveAtoms.size(); const size_t nActiveAtoms = atoms - nInactiveAtoms; // Efficient construction of Z restricted to active atoms. arma::mat matActiveZ; if (nInactiveAtoms > 0) { math::RemoveRows(codes, inactiveAtoms, matActiveZ); } if (nInactiveAtoms > 0) { Rcpp::Rcout << "There are " << nInactiveAtoms << " inactive atoms. They will be re-initialized randomly.\n"; } Rcpp::Rcout << "Solving Dual via Newton's Method.\n"; // Solve using Newton's method in the dual - note that the final dot // multiplication with inv(A) seems to be unavoidable. Although more // expensive, the code written this way (we use solve()) should be more // numerically stable than just using inv(A) for everything. arma::vec dualVars = arma::zeros(nActiveAtoms); //vec dualVars = 1e-14 * ones(nActiveAtoms); // Method used by feature sign code - fails miserably here. Perhaps the // MATLAB optimizer fmincon does something clever? //vec dualVars = 10.0 * randu(nActiveAtoms, 1); //vec dualVars = diagvec(solve(dictionary, data * trans(codes)) // - codes * trans(codes)); //for (size_t i = 0; i < dualVars.n_elem; i++) // if (dualVars(i) < 0) // dualVars(i) = 0; bool converged = false; // If we have any inactive atoms, we must construct these differently. arma::mat codesXT; arma::mat codesZT; if (inactiveAtoms.empty()) { codesXT = codes * trans(data); codesZT = codes * trans(codes); } else { codesXT = matActiveZ * trans(data); codesZT = matActiveZ * trans(matActiveZ); } double normGradient; double improvement; for (size_t t = 1; !converged; ++t) { arma::mat A = codesZT + diagmat(dualVars); arma::mat matAInvZXT = solve(A, codesXT); arma::vec gradient = -arma::sum(arma::square(matAInvZXT), 1); gradient += 1; arma::mat hessian = -(-2 * (matAInvZXT * trans(matAInvZXT)) % inv(A)); arma::vec searchDirection = -solve(hessian, gradient); //printf("%e\n", norm(searchDirection, 2)); // Armijo line search. const double c = 1e-4; double alpha = 1.0; const double rho = 0.9; double sufficientDecrease = c * dot(gradient, searchDirection); while (true) { // Calculate objective. double sumDualVars = sum(dualVars); double fOld = -(-trace(trans(codesXT) * matAInvZXT) - sumDualVars); double fNew = -(-trace(trans(codesXT) * solve(codesZT + diagmat(dualVars + alpha * searchDirection), codesXT)) - (sumDualVars + alpha * sum(searchDirection))); if (fNew <= fOld + alpha * sufficientDecrease) { searchDirection = alpha * searchDirection; improvement = fOld - fNew; break; } alpha *= rho; } // Take step and print useful information. dualVars += searchDirection; normGradient = norm(gradient, 2); Rcpp::Rcout << "Newton Method iteration " << t << ":" << std::endl; Rcpp::Rcout << " Gradient norm: " << std::scientific << normGradient << "." 
<< std::endl; Rcpp::Rcout << " Improvement: " << std::scientific << improvement << ".\n"; if (improvement < newtonTolerance) converged = true; } if (inactiveAtoms.empty()) { // Directly update dictionary. dictionary = trans(solve(codesZT + diagmat(dualVars), codesXT)); } else { arma::mat activeDictionary = trans(solve(codesZT + diagmat(dualVars), codesXT)); // Update all atoms. size_t currentInactiveIndex = 0; for (size_t i = 0; i < atoms; ++i) { if (inactiveAtoms[currentInactiveIndex] == i) { // This atom is inactive. Reinitialize it randomly. dictionary.col(i) = (data.col(math::RandInt(data.n_cols)) + data.col(math::RandInt(data.n_cols)) + data.col(math::RandInt(data.n_cols))); dictionary.col(i) /= norm(dictionary.col(i), 2); // Increment inactive index counter. ++currentInactiveIndex; } else { // Update estimate. dictionary.col(i) = activeDictionary.col(i - currentInactiveIndex); } } } //printf("final reconstruction error: %e\n", norm(data - dictionary * codes, "fro")); return normGradient; } // Project each atom of the dictionary back into the unit ball (if necessary). template void SparseCoding::ProjectDictionary() { for (size_t j = 0; j < atoms; j++) { double atomNorm = norm(dictionary.col(j), 2); if (atomNorm > 1) { Rcpp::Rcout << "Norm of atom " << j << " exceeds 1 (" << std::scientific << atomNorm << "). Shrinking...\n"; dictionary.col(j) /= atomNorm; } } } // Compute the objective function. template double SparseCoding::Objective() const { double l11NormZ = sum(sum(abs(codes))); double froNormResidual = norm(data - (dictionary * codes), "fro"); if (lambda2 > 0) { double froNormZ = norm(codes, "fro"); return 0.5 * (std::pow(froNormResidual, 2.0) + (lambda2 * std::pow(froNormZ, 2.0))) + (lambda1 * l11NormZ); } else // It can be simpler. { return 0.5 * std::pow(froNormResidual, 2.0) + lambda1 * l11NormZ; } } template std::string SparseCoding::ToString() const { std::ostringstream convert; convert << "Sparse Coding [" << this << "]" << std::endl; convert << " Data: " << data.n_rows << "x" ; convert << data.n_cols << std::endl; convert << " Atoms: " << atoms << std::endl; convert << " Lambda 1: " << lambda1 << std::endl; convert << " Lambda 2: " << lambda2 << std::endl; return convert.str(); } }; // namespace sparse_coding }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/sparse_coding/sparse_coding.hpp0000644000176200001440000001662513647512751023745 0ustar liggesusers/** * @file sparse_coding.hpp * @author Nishant Mehta * * Definition of the SparseCoding class, which performs L1 (LASSO) or * L1+L2 (Elastic Net)-regularized sparse coding with dictionary learning * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_SPARSE_CODING_SPARSE_CODING_HPP #define __MLPACK_METHODS_SPARSE_CODING_SPARSE_CODING_HPP #include #include // Include our three simple dictionary initializers. 
#include "nothing_initializer.hpp" #include "data_dependent_random_initializer.hpp" #include "random_initializer.hpp" namespace mlpack { namespace sparse_coding { /** * An implementation of Sparse Coding with Dictionary Learning that achieves * sparsity via an l1-norm regularizer on the codes (LASSO) or an (l1+l2)-norm * regularizer on the codes (the Elastic Net). * * Let d be the number of dimensions in the original space, m the number of * training points, and k the number of atoms in the dictionary (the dimension * of the learned feature space). The training data X is a d-by-m matrix where * each column is a point and each row is a dimension. The dictionary D is a * d-by-k matrix, and the sparse codes matrix Z is a k-by-m matrix. * This program seeks to minimize the objective: * * \f[ * \min_{D,Z} 0.5 ||X - D Z||_{F}^2\ + \lambda_1 \sum_{i=1}^m ||Z_i||_1 * + 0.5 \lambda_2 \sum_{i=1}^m ||Z_i||_2^2 * \f] * * subject to \f$ ||D_j||_2 <= 1 \f$ for \f$ 1 <= j <= k \f$ * where typically \f$ lambda_1 > 0 \f$ and \f$ lambda_2 = 0 \f$. * * This problem is solved by an algorithm that alternates between a dictionary * learning step and a sparse coding step. The dictionary learning step updates * the dictionary D using a Newton method based on the Lagrange dual (see the * paper below for details). The sparse coding step involves solving a large * number of sparse linear regression problems; this can be done efficiently * using LARS, an algorithm that can solve the LASSO or the Elastic Net (papers * below). * * Here are those papers: * * @code * @incollection{lee2007efficient, * title = {Efficient sparse coding algorithms}, * author = {Honglak Lee and Alexis Battle and Rajat Raina and Andrew Y. Ng}, * booktitle = {Advances in Neural Information Processing Systems 19}, * editor = {B. Sch\"{o}lkopf and J. Platt and T. Hoffman}, * publisher = {MIT Press}, * address = {Cambridge, MA}, * pages = {801--808}, * year = {2007} * } * @endcode * * @code * @article{efron2004least, * title={Least angle regression}, * author={Efron, B. and Hastie, T. and Johnstone, I. and Tibshirani, R.}, * journal={The Annals of statistics}, * volume={32}, * number={2}, * pages={407--499}, * year={2004}, * publisher={Institute of Mathematical Statistics} * } * @endcode * * @code * @article{zou2005regularization, * title={Regularization and variable selection via the elastic net}, * author={Zou, H. and Hastie, T.}, * journal={Journal of the Royal Statistical Society Series B}, * volume={67}, * number={2}, * pages={301--320}, * year={2005}, * publisher={Royal Statistical Society} * } * @endcode * * Before the method is run, the dictionary is initialized using the * DictionaryInitializationPolicy class. Possible choices include the * RandomInitializer, which provides an entirely random dictionary, the * DataDependentRandomInitializer, which provides a random dictionary based * loosely on characteristics of the dataset, and the NothingInitializer, which * does not initialize the dictionary -- instead, the user should set the * dictionary using the Dictionary() mutator method. * * @tparam DictionaryInitializationPolicy The class to use to initialize the * dictionary; must have 'void Initialize(const arma::mat& data, arma::mat& * dictionary)' function. */ template class SparseCoding { public: /** * Set the parameters to SparseCoding. lambda2 defaults to 0. 
* * @param data Data matrix * @param atoms Number of atoms in dictionary * @param lambda1 Regularization parameter for l1-norm penalty * @param lambda2 Regularization parameter for l2-norm penalty */ SparseCoding(const arma::mat& data, const size_t atoms, const double lambda1, const double lambda2 = 0); /** * Run Sparse Coding with Dictionary Learning. * * @param maxIterations Maximum number of iterations to run algorithm. If 0, * the algorithm will run until convergence (or forever). * @param objTolerance Tolerance for objective function. When an iteration of * the algorithm produces an improvement smaller than this, the algorithm * will terminate. * @param newtonTolerance Tolerance for the Newton's method dictionary * optimization step. */ void Encode(const size_t maxIterations = 0, const double objTolerance = 0.01, const double newtonTolerance = 1e-6); /** * Sparse code each point via LARS. */ void OptimizeCode(); /** * Learn dictionary via Newton method based on Lagrange dual. * * @param adjacencies Indices of entries (unrolled column by column) of * the coding matrix Z that are non-zero (the adjacency matrix for the * bipartite graph of points and atoms). * @param newtonTolerance Tolerance of the Newton's method optimizer. * @return the norm of the gradient of the Lagrange dual with respect to * the dual variables */ double OptimizeDictionary(const arma::uvec& adjacencies, const double newtonTolerance = 1e-6); /** * Project each atom of the dictionary back onto the unit ball, if necessary. */ void ProjectDictionary(); /** * Compute the objective function. */ double Objective() const; //! Access the data. const arma::mat& Data() const { return data; } //! Access the dictionary. const arma::mat& Dictionary() const { return dictionary; } //! Modify the dictionary. arma::mat& Dictionary() { return dictionary; } //! Access the sparse codes. const arma::mat& Codes() const { return codes; } //! Modify the sparse codes. arma::mat& Codes() { return codes; } // Returns a string representation of this object. std::string ToString() const; private: //! Number of atoms. size_t atoms; //! Data matrix (columns are points). const arma::mat& data; //! Dictionary (columns are atoms). arma::mat dictionary; //! Sparse codes (columns are points). arma::mat codes; //! l1 regularization term. double lambda1; //! l2 regularization term. double lambda2; }; }; // namespace sparse_coding }; // namespace mlpack // Include implementation. #include "sparse_coding_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/sparse_coding/data_dependent_random_initializer.hpp0000644000176200001440000000452513647512751030023 0ustar liggesusers/** * @file data_dependent_random_initializer.hpp * @author Nishant Mehta * * A sensible heuristic for initializing dictionaries for sparse coding. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
*/ #ifndef __MLPACK_METHODS_SPARSE_CODING_DATA_DEPENDENT_RANDOM_INITIALIZER_HPP #define __MLPACK_METHODS_SPARSE_CODING_DATA_DEPENDENT_RANDOM_INITIALIZER_HPP #include namespace mlpack { namespace sparse_coding { /** * A data-dependent random dictionary initializer for SparseCoding. This * creates random dictionary atoms by adding three random observations from the * data together, and then normalizing the atom. */ class DataDependentRandomInitializer { public: /** * Initialize the dictionary by adding together three random observations from * the data, and then normalizing the atom. This implementation is simple * enough to be included with the definition. * * @param data Dataset to initialize the dictionary with. * @param atoms Number of atoms in dictionary. * @param dictionary Dictionary to initialize. */ static void Initialize(const arma::mat& data, const size_t atoms, arma::mat& dictionary) { // Set the size of the dictionary. dictionary.set_size(data.n_rows, atoms); // Create each atom. for (size_t i = 0; i < atoms; ++i) { // Add three atoms together. dictionary.col(i) = (data.col(math::RandInt(data.n_cols)) + data.col(math::RandInt(data.n_cols)) + data.col(math::RandInt(data.n_cols))); // Now normalize the atom. dictionary.col(i) /= norm(dictionary.col(i), 2); } } }; }; // namespace sparse_coding }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/sparse_coding/nothing_initializer.hpp0000644000176200001440000000344713647512751025174 0ustar liggesusers/** * @file nothing_initializer.hpp * @author Ryan Curtin * * An initializer for SparseCoding which does precisely nothing. It is useful * for when you have an already defined dictionary and you plan on setting it * with SparseCoding::Dictionary(). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_SPARSE_CODING_NOTHING_INITIALIZER_HPP #define __MLPACK_METHODS_SPARSE_CODING_NOTHING_INITIALIZER_HPP #include namespace mlpack { namespace sparse_coding { /** * A DictionaryInitializer for SparseCoding which does not initialize anything; * it is useful for when the dictionary is already known and will be set with * SparseCoding::Dictionary(). */ class NothingInitializer { public: /** * This function does not initialize the dictionary. This will cause problems * for SparseCoding if the dictionary is not set manually before running the * method. */ static void Initialize(const arma::mat& /* data */, const size_t /* atoms */, arma::mat& /* dictionary */) { // Do nothing! } }; }; // namespace sparse_coding }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/cf/0000755000176200001440000000000013647512751016152 5ustar liggesusersRcppMLPACK/src/mlpack/methods/cf/cf_impl.hpp0000644000176200001440000002104113647512751020272 0ustar liggesusers/** * @file cf.cpp * @author Mudit Raj Gupta * * Collaborative Filtering. * * Implementation of CF class to perform Collaborative Filtering on the * specified data set. 
* * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ namespace mlpack { namespace cf { /** * Construct the CF object. */ template CF::CF(arma::mat& data, const size_t numUsersForSimilarity, const size_t rank) : data(data), numUsersForSimilarity(numUsersForSimilarity), rank(rank), factorizer() { // Validate neighbourhood size. if (numUsersForSimilarity < 1) { Rcpp::Rcout << "CF::CF(): neighbourhood size should be > 0(" << numUsersForSimilarity << " given). Setting value to 5.\n"; //Setting Default Value of 5 this->numUsersForSimilarity = 5; } CleanData(); } template void CF::GetRecommendations(const size_t numRecs, arma::Mat& recommendations) { // Generate list of users. Maybe it would be more efficient to pass an empty // users list, and then have the other overload of GetRecommendations() assume // that if users is empty, then recommendations should be generated for all // users? arma::Col users = arma::linspace >(0, cleanedData.n_cols - 1, cleanedData.n_cols); // Call the main overload for recommendations. GetRecommendations(numRecs, recommendations, users); } template void CF::GetRecommendations(const size_t numRecs, arma::Mat& recommendations, arma::Col& users) { // Base function for calculating recommendations. // Check if the user wanted us to choose a rank for them. if (rank == 0) { // This is a simple heuristic that picks a rank based on the density of the // dataset between 5 and 105. const double density = (cleanedData.n_nonzero * 100.0) / cleanedData.n_elem; const size_t rankEstimate = size_t(density) + 5; // Set to heuristic value. Rcpp::Rcout << "No rank given for decomposition; using rank of " << rankEstimate << " calculated by density-based heuristic." << std::endl; rank = rankEstimate; } // Operations independent of the query: // Decompose the sparse data matrix to user and data matrices. factorizer.Apply(cleanedData, rank, w, h); // Generate new table by multiplying approximate values. rating = w * h; // Now, we will use the decomposed w and h matrices to estimate what the user // would have rated items as, and then pick the best items. // Temporarily store feature vector of queried users. arma::mat query(rating.n_rows, users.n_elem); // Select feature vectors of queried users. for (size_t i = 0; i < users.n_elem; i++) query.col(i) = rating.col(users(i)); // Temporary storage for neighborhood of the queried users. arma::Mat neighborhood; // Calculate the neighborhood of the queried users. // This should be a templatized option. neighbor::AllkNN a(rating, query); arma::mat resultingDistances; // Temporary storage. a.Search(numUsersForSimilarity, neighborhood, resultingDistances); // Temporary storage for storing the average rating for each user in their // neighborhood. arma::mat averages = arma::zeros(rating.n_rows, query.n_cols); // Iterate over each query user. for (size_t i = 0; i < neighborhood.n_cols; ++i) { // Iterate over each neighbor of the query user. 
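      // (Each neighbor contributes its entire column of predicted ratings;
      // the accumulated sum is divided by the neighborhood size just below,
      // giving a mean predicted rating per item for this query user.)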
for (size_t j = 0; j < neighborhood.n_rows; ++j) averages.col(i) += rating.col(neighborhood(j, i)); // Normalize average. averages.col(i) /= neighborhood.n_rows; } // Generate recommendations for each query user by finding the maximum numRecs // elements in the averages matrix. recommendations.set_size(numRecs, users.n_elem); recommendations.fill(cleanedData.n_rows); // Invalid item number. arma::mat values(numRecs, users.n_elem); values.fill(-DBL_MAX); // The smallest possible value. for (size_t i = 0; i < users.n_elem; i++) { // Look through the averages column corresponding to the current user. for (size_t j = 0; j < averages.n_rows; ++j) { // Ensure that the user hasn't already rated the item. if (cleanedData(j, users(i)) != 0.0) continue; // The user already rated the item. // Is the estimated value better than the worst candidate? const double value = averages(j, i); if (value > values(values.n_rows - 1, i)) { // It should be inserted. Which position? size_t insertPosition = values.n_rows - 1; while (insertPosition > 0) { if (value <= values(insertPosition - 1, i)) break; // The current value is the right one. insertPosition--; } // Now insert it into the list. InsertNeighbor(i, insertPosition, j, value, recommendations, values); } } // If we were not able to come up with enough recommendations, issue a // warning. if (recommendations(values.n_rows - 1, i) == cleanedData.n_rows + 1) Rcpp::Rcout << "Could not provide " << values.n_rows << " recommendations " << "for user " << users(i) << " (not enough un-rated items)!" << std::endl; } } template void CF::CleanData() { // Generate list of locations for batch insert constructor for sparse // matrices. arma::umat locations(2, data.n_cols); arma::vec values(data.n_cols); for (size_t i = 0; i < data.n_cols; ++i) { // We have to transpose it because items are rows, and users are columns. locations(1, i) = ((arma::uword) data(0, i)); locations(0, i) = ((arma::uword) data(1, i)); values(i) = data(2, i); } // Find maximum user and item IDs. const size_t maxItemID = (size_t) max(locations.row(0)) + 1; const size_t maxUserID = (size_t) max(locations.row(1)) + 1; // Fill sparse matrix. cleanedData = arma::sp_mat(locations, values, maxItemID, maxUserID); } /** * Helper function to insert a point into the recommendation matrices. * * @param queryIndex Index of point whose recommendations we are inserting into. * @param pos Position in list to insert into. * @param neighbor Index of item being inserted as a recommendation. * @param value Value of recommendation. */ template void CF::InsertNeighbor(const size_t queryIndex, const size_t pos, const size_t neighbor, const double value, arma::Mat& recommendations, arma::mat& values) const { // We only memmove() if there is actually a need to shift something. if (pos < (recommendations.n_rows - 1)) { const int len = (values.n_rows - 1) - pos; memmove(values.colptr(queryIndex) + (pos + 1), values.colptr(queryIndex) + pos, sizeof(double) * len); memmove(recommendations.colptr(queryIndex) + (pos + 1), recommendations.colptr(queryIndex) + pos, sizeof(size_t) * len); } // Now put the new information in the right index. values(pos, queryIndex) = value; recommendations(pos, queryIndex) = neighbor; } // Return string of object. 
template std::string CF::ToString() const { std::ostringstream convert; convert << "Collaborative Filtering [" << this << "]" << std::endl; //convert << " Number of Recommendations: " << numRecs << std::endl; //convert << " Number of Users for Similarity: " << numUsersForSimilarity; //convert << std::endl; //convert << " Data: " << data.n_rows << "x" << data.n_cols << std::endl; return convert.str(); } }; // namespace mlpack }; // namespace cf RcppMLPACK/src/mlpack/methods/cf/cf.hpp0000644000176200001440000001511613647512751017257 0ustar liggesusers/** * @file cf.hpp * @author Mudit Raj Gupta * * Collaborative filtering. * * Defines the CF class to perform collaborative filtering on the specified data * set using alternating least squares (ALS). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_CF_CF_HPP #define __MLPACK_METHODS_CF_CF_HPP #include #include #include #include #include #include #include #include namespace mlpack { namespace cf /** Collaborative filtering. */{ /** * This class implements Collaborative Filtering (CF). This implementation * presently supports Alternating Least Squares (ALS) for collaborative * filtering. * * A simple example of how to run Collaborative Filtering is shown below. * * @code * extern arma::mat data; // (user, item, rating) table * extern arma::Col users; // users seeking recommendations * arma::Mat recommendations; // Recommendations * * CF<> cf(data); // Default options. * * // Generate 10 recommendations for all users. * cf.GetRecommendations(10, recommendations); * * // Generate 10 recommendations for specified users. * cf.GetRecommendations(10, recommendations, users); * * @endcode * * The data matrix is a (user, item, rating) table. Each column in the matrix * should have three rows. The first represents the user; the second represents * the item; and the third represents the rating. The user and item, while they * are in a matrix that holds doubles, should hold integer (or size_t) values. * The user and item indices are assumed to start at 0. * * @tparam FactorizerType The type of matrix factorization to use to decompose * the rating matrix (a W and H matrix). This must implement the method * Apply(arma::sp_mat& data, size_t rank, arma::mat& W, arma::mat& H). */ template< typename FactorizerType = amf::AMF > class CF { public: /** * Initialize the CF object. Store a reference to the data that we * will be using. There are parameters that can be set; default values * are provided for each of them. If the rank is left unset (or is set to 0), * a simple density-based heuristic will be used to choose a rank. * * @param data Initial (user, item, rating) matrix. * @param numUsersForSimilarity Size of the neighborhood. * @param rank Rank parameter for matrix factorization. */ CF(arma::mat& data, const size_t numUsersForSimilarity = 5, const size_t rank = 0); //! Sets number of users for calculating similarity. 
void NumUsersForSimilarity(const size_t num) { if (num < 1) { Rcpp::Rcout << "CF::NumUsersForSimilarity(): invalid value (< 1) " "ignored." << std::endl; return; } this->numUsersForSimilarity = num; } //! Gets number of users for calculating similarity. size_t NumUsersForSimilarity() const { return numUsersForSimilarity; } //! Sets rank parameter for matrix factorization. void Rank(const size_t rankValue) { this->rank = rankValue; } //! Gets rank parameter for matrix factorization. size_t Rank() const { return rank; } //! Sets factorizer for NMF void Factorizer(const FactorizerType& f) { this->factorizer = f; } //! Get the User Matrix. const arma::mat& W() const { return w; } //! Get the Item Matrix. const arma::mat& H() const { return h; } //! Get the Rating Matrix. const arma::mat& Rating() const { return rating; } //! Get the data matrix. const arma::mat& Data() const { return data; } //! Get the cleaned data matrix. const arma::sp_mat& CleanedData() const { return cleanedData; } /** * Generates the given number of recommendations for all users. * * @param numRecs Number of Recommendations * @param recommendations Matrix to save recommendations into. */ void GetRecommendations(const size_t numRecs, arma::Mat& recommendations); /** * Generates the given number of recommendations for the specified users. * * @param numRecs Number of Recommendations * @param recommendations Matrix to save recommendations * @param users Users for which recommendations are to be generated */ void GetRecommendations(const size_t numRecs, arma::Mat& recommendations, arma::Col& users); /** * Returns a string representation of this object. */ std::string ToString() const; private: //! Initial data matrix. arma::mat data; //! Number of users for similarity. size_t numUsersForSimilarity; //! Rank used for matrix factorization. size_t rank; //! Instantiated factorizer object. FactorizerType factorizer; //! User matrix. arma::mat w; //! Item matrix. arma::mat h; //! Rating matrix. arma::mat rating; //! Cleaned data matrix. arma::sp_mat cleanedData; //! Converts the User, Item, Value Matrix to User-Item Table void CleanData(); /** * Helper function to insert a point into the recommendation matrices. * * @param queryIndex Index of point whose recommendations we are inserting * into. * @param pos Position in list to insert into. * @param neighbor Index of item being inserted as a recommendation. * @param value Value of recommendation. */ void InsertNeighbor(const size_t queryIndex, const size_t pos, const size_t neighbor, const double value, arma::Mat& recommendations, arma::mat& values) const; }; // class CF }; // namespace cf }; // namespace mlpack //Include implementation #include "cf_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/quic_svd/0000755000176200001440000000000013647512751017377 5ustar liggesusersRcppMLPACK/src/mlpack/methods/quic_svd/quic_svd_impl.hpp0000644000176200001440000000561613647512751022756 0ustar liggesusers/** * @file quic_svd_impl.hpp * @author Siddharth Agrawal * * An implementation of QUIC-SVD. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. 
See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_QUIC_SVD_QUIC_SVD_IMPL_HPP #define __MLPACK_METHODS_QUIC_SVD_QUIC_SVD_IMPL_HPP // In case it hasn't been included yet. #include "quic_svd.hpp" using namespace mlpack::tree; namespace mlpack { namespace svd { QUIC_SVD::QUIC_SVD(const arma::mat& dataset, arma::mat& u, arma::mat& v, arma::mat& sigma, const double epsilon, const double delta) : dataset(dataset), epsilon(epsilon), delta(delta) { // Since columns are sample in the implementation, the matrix is transposed if // necessary for maximum speedup. CosineTree* ctree; if (dataset.n_cols > dataset.n_rows) ctree = new CosineTree(dataset, epsilon, delta); else ctree = new CosineTree(dataset.t(), epsilon, delta); // Get subspace basis by creating the cosine tree. ctree->GetFinalBasis(basis); // Use the ExtractSVD algorithm mentioned in the paper to extract the SVD of // the original dataset in the obtained subspace. ExtractSVD(u, v, sigma); } void QUIC_SVD::ExtractSVD(arma::mat& u, arma::mat& v, arma::mat& sigma) { // Calculate A * V_hat, necessary for further calculations. arma::mat projectedMat; if (dataset.n_cols > dataset.n_rows) projectedMat = dataset.t() * basis; else projectedMat = dataset * basis; // Calculate the squared projected matrix. arma::mat projectedMatSquared = projectedMat.t() * projectedMat; // Calculate the SVD of the above matrix. arma::mat uBar, vBar; arma::vec sigmaBar; arma::svd(uBar, sigmaBar, vBar, projectedMatSquared); // Calculate the approximate SVD of the original matrix, using the SVD of the // squared projected matrix. v = basis * vBar; sigma = arma::sqrt(diagmat(sigmaBar)); u = projectedMat * vBar * sigma.i(); // Since columns are sampled, the unitary matrices have to be exchanged, if // the transposed matrix is not passed. if (dataset.n_cols > dataset.n_rows) { arma::mat tempMat = u; u = v; v = tempMat; } } }; // namespace svd }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/quic_svd/quic_svd.hpp0000644000176200001440000000523413647512751021731 0ustar liggesusers/** * @file quic_svd.hpp * @author Siddharth Agrawal * * An implementation of QUIC-SVD. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_QUIC_SVD_QUIC_SVD_HPP #define __MLPACK_METHODS_QUIC_SVD_QUIC_SVD_HPP #include #include namespace mlpack { namespace svd { class QUIC_SVD { public: /** * Constructor which implements the QUIC-SVD algorithm. The function calls the * CosineTree constructor to create a subspace basis, where the original * matrix's projection has minimum reconstruction error. The constructor then * uses the ExtractSVD() function to calculate the SVD of the original dataset * in that subspace. * * @param dataset Matrix for which SVD is calculated. * @param u First unitary matrix. 
* @param v Second unitary matrix. * @param sigma Diagonal matrix of singular values. * @param epsilon Error tolerance fraction for calculated subspace. * @param delta Cumulative probability for Monte Carlo error lower bound. */ QUIC_SVD(const arma::mat& dataset, arma::mat& u, arma::mat& v, arma::mat& sigma, const double epsilon = 0.03, const double delta = 0.1); /** * This function uses the vector subspace created using a cosine tree to * calculate an approximate SVD of the original matrix. * * @param u First unitary matrix. * @param v Second unitary matrix. * @param sigma Diagonal matrix of singular values. */ void ExtractSVD(arma::mat& u, arma::mat& v, arma::mat& sigma); private: //! Matrix for which cosine tree is constructed. const arma::mat& dataset; //! Error tolerance fraction for calculated subspace. double epsilon; //! Cumulative probability for Monte Carlo error lower bound. double delta; //! Subspace basis of the input dataset. arma::mat basis; }; }; // namespace svd }; // namespace mlpack // Include implementation. #include "quic_svd_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/local_coordinate_coding/0000755000176200001440000000000013647512751022406 5ustar liggesusersRcppMLPACK/src/mlpack/methods/local_coordinate_coding/lcc_impl.hpp0000644000176200001440000002557713647512751024721 0ustar liggesusers/** * @file lcc_impl.hpp * @author Nishant Mehta * * Implementation of Local Coordinate Coding * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LOCAL_COORDINATE_CODING_LCC_IMPL_HPP #define __MLPACK_METHODS_LOCAL_COORDINATE_CODING_LCC_IMPL_HPP // In case it hasn't been included yet. #include "lcc.hpp" namespace mlpack { namespace lcc { template LocalCoordinateCoding::LocalCoordinateCoding( const arma::mat& data, const size_t atoms, const double lambda) : atoms(atoms), data(data), codes(atoms, data.n_cols), lambda(lambda) { // Initialize the dictionary. DictionaryInitializer::Initialize(data, atoms, dictionary); } template void LocalCoordinateCoding::Encode( const size_t maxIterations, const double objTolerance) { //Timer::Start("local_coordinate_coding"); double lastObjVal = DBL_MAX; // Take the initial coding step, which has to happen before entering the main // loop. Rcpp::Rcout << "Initial Coding Step." << std::endl; OptimizeCode(); arma::uvec adjacencies = find(codes); Rcpp::Rcout << " Sparsity level: " << 100.0 * ((double)(adjacencies.n_elem)) / ((double)(atoms * data.n_cols)) << "%.\n"; Rcpp::Rcout << " Objective value: " << Objective(adjacencies) << "." << std::endl; for (size_t t = 1; t != maxIterations; t++) { Rcpp::Rcout << "Iteration " << t << " of " << maxIterations << "." << std::endl; // First step: optimize the dictionary. Rcpp::Rcout << "Performing dictionary step..." << std::endl; OptimizeDictionary(adjacencies); double dsObjVal = Objective(adjacencies); Rcpp::Rcout << " Objective value: " << Objective(adjacencies) << "." 
<< std::endl; // Second step: perform the coding. Rcpp::Rcout << "Performing coding step..." << std::endl; OptimizeCode(); adjacencies = find(codes); Rcpp::Rcout << " Sparsity level: " << 100.0 * ((double) (adjacencies.n_elem)) / ((double)(atoms * data.n_cols)) << "%.\n"; // Terminate if the objective increased in the coding step. double curObjVal = Objective(adjacencies); if (curObjVal > dsObjVal) { Rcpp::Rcout << "Objective increased in coding step! Terminating." << std::endl; break; } // Find the new objective value and improvement so we can check for // convergence. double improvement = lastObjVal - curObjVal; Rcpp::Rcout << "Objective value: " << curObjVal << " (improvement " << std::scientific << improvement << ")." << std::endl; if (improvement < objTolerance) { Rcpp::Rcout << "Converged within tolerance " << objTolerance << ".\n"; break; } lastObjVal = curObjVal; } //Timer::Stop("local_coordinate_coding"); } template void LocalCoordinateCoding::OptimizeCode() { arma::mat invSqDists = 1.0 / (repmat(trans(sum(square(dictionary))), 1, data.n_cols) + repmat(sum(square(data)), atoms, 1) - 2 * trans(dictionary) * data); arma::mat dictGram = trans(dictionary) * dictionary; arma::mat dictGramTD(dictGram.n_rows, dictGram.n_cols); for (size_t i = 0; i < data.n_cols; i++) { // report progress if ((i % 100) == 0) { Rcpp::Rcout << "Optimization at point " << i << "." << std::endl; } arma::vec invW = invSqDists.unsafe_col(i); arma::mat dictPrime = dictionary * diagmat(invW); arma::mat dictGramTD = diagmat(invW) * dictGram * diagmat(invW); bool useCholesky = false; regression::LARS lars(useCholesky, dictGramTD, 0.5 * lambda); // Run LARS for this point, by making an alias of the point and passing // that. arma::vec beta = codes.unsafe_col(i); lars.Regress(dictPrime, data.unsafe_col(i), beta, false); beta %= invW; // Remember, beta is an alias of codes.col(i). } } template void LocalCoordinateCoding::OptimizeDictionary( arma::uvec adjacencies) { // Count number of atomic neighbors for each point x^i. arma::uvec neighborCounts = arma::zeros(data.n_cols, 1); if (adjacencies.n_elem > 0) { // This gets the column index. Intentional integer division. size_t curPointInd = (size_t) (adjacencies(0) / atoms); ++neighborCounts(curPointInd); size_t nextColIndex = (curPointInd + 1) * atoms; for (size_t l = 1; l < adjacencies.n_elem; l++) { // If l no longer refers to an element in this column, advance the column // number accordingly. if (adjacencies(l) >= nextColIndex) { curPointInd = (size_t) (adjacencies(l) / atoms); nextColIndex = (curPointInd + 1) * atoms; } ++neighborCounts(curPointInd); } } // Build dataPrime := [X x^1 ... x^1 ... x^n ... x^n] // where each x^i is repeated for the number of neighbors x^i has. arma::mat dataPrime = arma::zeros(data.n_rows, data.n_cols + adjacencies.n_elem); dataPrime(arma::span::all, arma::span(0, data.n_cols - 1)) = data; size_t curCol = data.n_cols; for (size_t i = 0; i < data.n_cols; i++) { if (neighborCounts(i) > 0) { dataPrime(arma::span::all, arma::span(curCol, curCol + neighborCounts(i) - 1)) = repmat(data.col(i), 1, neighborCounts(i)); } curCol += neighborCounts(i); } // Handle the case of inactive atoms (atoms not used in the given coding). std::vector inactiveAtoms; for (size_t j = 0; j < atoms; ++j) if (accu(codes.row(j) != 0) == 0) inactiveAtoms.push_back(j); const size_t nInactiveAtoms = inactiveAtoms.size(); const size_t nActiveAtoms = atoms - nInactiveAtoms; // Efficient construction of codes restricted to active atoms. 
arma::mat codesPrime = arma::zeros(nActiveAtoms, data.n_cols + adjacencies.n_elem); arma::vec wSquared = arma::ones(data.n_cols + adjacencies.n_elem, 1); if (nInactiveAtoms > 0) { Rcpp::Rcout << "There are " << nInactiveAtoms << " inactive atoms. They will be re-initialized randomly.\n"; // Create matrix holding only active codes. arma::mat activeCodes; math::RemoveRows(codes, inactiveAtoms, activeCodes); // Create reverse atom lookup for active atoms. arma::uvec atomReverseLookup(atoms); size_t inactiveOffset = 0; for (size_t i = 0; i < atoms; ++i) { if (inactiveAtoms[inactiveOffset] == i) ++inactiveOffset; else atomReverseLookup(i - inactiveOffset) = i; } codesPrime(arma::span::all, arma::span(0, data.n_cols - 1)) = activeCodes; // Fill the rest of codesPrime. for (size_t l = 0; l < adjacencies.n_elem; ++l) { // Recover the location in the codes matrix that this adjacency refers to. size_t atomInd = adjacencies(l) % atoms; size_t pointInd = (size_t) (adjacencies(l) / atoms); // Fill matrix. codesPrime(atomReverseLookup(atomInd), data.n_cols + l) = 1.0; wSquared(data.n_cols + l) = codes(atomInd, pointInd); } } else { // All atoms are active. codesPrime(arma::span::all, arma::span(0, data.n_cols - 1)) = codes; for (size_t l = 0; l < adjacencies.n_elem; ++l) { // Recover the location in the codes matrix that this adjacency refers to. size_t atomInd = adjacencies(l) % atoms; size_t pointInd = (size_t) (adjacencies(l) / atoms); // Fill matrix. codesPrime(atomInd, data.n_cols + l) = 1.0; wSquared(data.n_cols + l) = codes(atomInd, pointInd); } } wSquared.subvec(data.n_cols, wSquared.n_elem - 1) = lambda * abs(wSquared.subvec(data.n_cols, wSquared.n_elem - 1)); // Solve system. if (nInactiveAtoms == 0) { // No inactive atoms. We can solve directly. arma::mat A = codesPrime * diagmat(wSquared) * trans(codesPrime); arma::mat B = codesPrime * diagmat(wSquared) * trans(dataPrime); dictionary = trans(solve(A, B)); /* dictionary = trans(solve(codesPrime * diagmat(wSquared) * trans(codesPrime), codesPrime * diagmat(wSquared) * trans(dataPrime))); */ } else { // Inactive atoms must be reinitialized randomly, so we cannot solve // directly for the entire dictionary estimate. arma::mat dictionaryActive = trans(solve(codesPrime * diagmat(wSquared) * trans(codesPrime), codesPrime * diagmat(wSquared) * trans(dataPrime))); // Update all atoms. size_t currentInactiveIndex = 0; for (size_t i = 0; i < atoms; ++i) { if (inactiveAtoms[currentInactiveIndex] == i) { // This atom is inactive. Reinitialize it randomly. dictionary.col(i) = (data.col(math::RandInt(data.n_cols)) + data.col(math::RandInt(data.n_cols)) + data.col(math::RandInt(data.n_cols))); // Now normalize the atom. dictionary.col(i) /= norm(dictionary.col(i), 2); // Increment inactive atom counter. ++currentInactiveIndex; } else { // Update estimate. dictionary.col(i) = dictionaryActive.col(i - currentInactiveIndex); } } } } template double LocalCoordinateCoding::Objective( arma::uvec adjacencies) const { double weightedL1NormZ = 0; for (size_t l = 0; l < adjacencies.n_elem; l++) { // Map adjacency back to its location in the codes matrix. 
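    // (Worked example: with atoms = 3, unrolled index 7 refers to atom
    // 7 % 3 = 1 of point 7 / 3 = 2, using integer division.)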
const size_t atomInd = adjacencies(l) % atoms; const size_t pointInd = (size_t) (adjacencies(l) / atoms); weightedL1NormZ += fabs(codes(atomInd, pointInd)) * arma::as_scalar( arma::sum(arma::square(dictionary.col(atomInd) - data.col(pointInd)))); } double froNormResidual = norm(data - dictionary * codes, "fro"); return std::pow(froNormResidual, 2.0) + lambda * weightedL1NormZ; } template std::string LocalCoordinateCoding::ToString() const { std::ostringstream convert; convert << "Local Coordinate Coding [" << this << "]" << std::endl; convert << " Number of Atoms: " << atoms << std::endl; convert << " Data: " << data.n_rows << "x" << data.n_cols << std::endl; convert << " Lambda: " << lambda << std::endl; return convert.str(); } }; // namespace lcc }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/local_coordinate_coding/lcc.hpp0000644000176200001440000001322613647512751023664 0ustar liggesusers/** * @file lcc.hpp * @author Nishant Mehta * * Definition of the LocalCoordinateCoding class, which performs the Local * Coordinate Coding algorithm. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LOCAL_COORDINATE_CODING_LCC_HPP #define __MLPACK_METHODS_LOCAL_COORDINATE_CODING_LCC_HPP #include #include // Include three simple dictionary initializers from sparse coding. #include "../sparse_coding/nothing_initializer.hpp" #include "../sparse_coding/data_dependent_random_initializer.hpp" #include "../sparse_coding/random_initializer.hpp" namespace mlpack { namespace lcc { /** * An implementation of Local Coordinate Coding (LCC) that codes data which * approximately lives on a manifold using a variation of l1-norm regularized * sparse coding; in LCC, the penalty on the absolute value of each point's * coefficient for each atom is weighted by the squared distance of that point * to that atom. * * Let d be the number of dimensions in the original space, m the number of * training points, and k the number of atoms in the dictionary (the dimension * of the learned feature space). The training data X is a d-by-m matrix where * each column is a point and each row is a dimension. The dictionary D is a * d-by-k matrix, and the sparse codes matrix Z is a k-by-m matrix. * This program seeks to minimize the objective: * min_{D,Z} ||X - D Z||_{Fro}^2 * + lambda sum_{i=1}^m sum_{j=1}^k dist(X_i,D_j)^2 Z_i^j * where lambda > 0. * * This problem is solved by an algorithm that alternates between a dictionary * learning step and a sparse coding step. The dictionary learning step updates * the dictionary D by solving a linear system (note that the objective is a * positive definite quadratic program). The sparse coding step involves * solving a large number of weighted l1-norm regularized linear regression * problems problems; this can be done efficiently using LARS, an algorithm * that can solve the LASSO (paper below). * * The papers are listed below. 
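 *
 * Before those references, a minimal usage sketch, under assumptions:
 * `dataset` is an arma::mat with one point per column, and the atom count
 * and lambda value below are illustrative only.
 *
 * @code
 * LocalCoordinateCoding<> lcc(dataset, 50, 0.1); // 50 atoms, lambda = 0.1.
 * lcc.Encode(100);                               // At most 100 iterations.
 * const arma::mat& codes = lcc.Codes();          // Learned codes.
 * @endcode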
* * @code * @incollection{NIPS2009_0719, * title = {Nonlinear Learning using Local Coordinate Coding}, * author = {Kai Yu and Tong Zhang and Yihong Gong}, * booktitle = {Advances in Neural Information Processing Systems 22}, * editor = {Y. Bengio and D. Schuurmans and J. Lafferty and C. K. I. Williams * and A. Culotta}, * pages = {2223--2231}, * year = {2009} * } * @endcode * * @code * @article{efron2004least, * title={Least angle regression}, * author={Efron, B. and Hastie, T. and Johnstone, I. and Tibshirani, R.}, * journal={The Annals of statistics}, * volume={32}, * number={2}, * pages={407--499}, * year={2004}, * publisher={Institute of Mathematical Statistics} * } * @endcode */ template class LocalCoordinateCoding { public: /** * Set the parameters to LocalCoordinateCoding. * * @param data Data matrix. * @param atoms Number of atoms in dictionary. * @param lambda Regularization parameter for weighted l1-norm penalty. */ LocalCoordinateCoding(const arma::mat& data, const size_t atoms, const double lambda); /** * Run local coordinate coding. * * @param nIterations Maximum number of iterations to run algorithm. * @param objTolerance Tolerance of objective function. When the objective * function changes by a value lower than this tolerance, the optimization * terminates. */ void Encode(const size_t maxIterations = 0, const double objTolerance = 0.01); /** * Code each point via distance-weighted LARS. */ void OptimizeCode(); /** * Learn dictionary by solving linear system. * * @param adjacencies Indices of entries (unrolled column by column) of * the coding matrix Z that are non-zero (the adjacency matrix for the * bipartite graph of points and atoms) */ void OptimizeDictionary(arma::uvec adjacencies); /** * Compute objective function given the list of adjacencies. */ double Objective(arma::uvec adjacencies) const; //! Access the data. const arma::mat& Data() const { return data; } //! Accessor for dictionary. const arma::mat& Dictionary() const { return dictionary; } //! Mutator for dictionary. arma::mat& Dictionary() { return dictionary; } //! Accessor the codes. const arma::mat& Codes() const { return codes; } //! Modify the codes. arma::mat& Codes() { return codes; } // Returns a string representation of this object. std::string ToString() const; private: //! Number of atoms in dictionary. size_t atoms; //! Data matrix (columns are points). const arma::mat& data; //! Dictionary (columns are atoms). arma::mat dictionary; //! Codes (columns are points). arma::mat codes; //! l1 regularization term. double lambda; }; }; // namespace lcc }; // namespace mlpack // Include implementation. #include "lcc_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/range_search/0000755000176200001440000000000013647512751020203 5ustar liggesusersRcppMLPACK/src/mlpack/methods/range_search/range_search_rules_impl.hpp0000644000176200001440000002224713647512751025577 0ustar liggesusers/** * @file range_search_rules_impl.hpp * @author Ryan Curtin * * Implementation of rules for range search with generic trees. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. 
See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_RULES_IMPL_HPP #define __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_RULES_IMPL_HPP // In case it hasn't been included yet. #include "range_search_rules.hpp" namespace mlpack { namespace range { template RangeSearchRules::RangeSearchRules( const arma::mat& referenceSet, const arma::mat& querySet, const math::Range& range, std::vector >& neighbors, std::vector >& distances, MetricType& metric) : referenceSet(referenceSet), querySet(querySet), range(range), neighbors(neighbors), distances(distances), metric(metric), lastQueryIndex(querySet.n_cols), lastReferenceIndex(referenceSet.n_cols) { // Nothing to do. } //! The base case. Evaluate the distance between the two points and add to the //! results if necessary. template inline force_inline double RangeSearchRules::BaseCase( const size_t queryIndex, const size_t referenceIndex) { // If the datasets are the same, don't return the point as in its own range. if ((&referenceSet == &querySet) && (queryIndex == referenceIndex)) return 0.0; // If we have just performed this base case, don't do it again. if ((lastQueryIndex == queryIndex) && (lastReferenceIndex == referenceIndex)) return 0.0; // No value to return... this shouldn't do anything bad. const double distance = metric.Evaluate(querySet.unsafe_col(queryIndex), referenceSet.unsafe_col(referenceIndex)); // Update last indices, so we don't accidentally perform a base case twice. lastQueryIndex = queryIndex; lastReferenceIndex = referenceIndex; if (range.Contains(distance)) { neighbors[queryIndex].push_back(referenceIndex); distances[queryIndex].push_back(distance); } return distance; } //! Single-tree scoring function. template double RangeSearchRules::Score(const size_t queryIndex, TreeType& referenceNode) { // We must get the minimum and maximum distances and store them in this // object. math::Range distances; if (tree::TreeTraits::FirstPointIsCentroid) { // In this situation, we calculate the base case. So we should check to be // sure we haven't already done that. double baseCase; if (tree::TreeTraits::HasSelfChildren && (referenceNode.Parent() != NULL) && (referenceNode.Point(0) == referenceNode.Parent()->Point(0))) { // If the tree has self-children and this is a self-child, the base case // was already calculated. baseCase = referenceNode.Parent()->Stat().LastDistance(); lastQueryIndex = queryIndex; lastReferenceIndex = referenceNode.Point(0); } else { // We must calculate the base case by hand. baseCase = BaseCase(queryIndex, referenceNode.Point(0)); } // This may be possibly loose for non-ball bound trees. distances.Lo() = baseCase - referenceNode.FurthestDescendantDistance(); distances.Hi() = baseCase + referenceNode.FurthestDescendantDistance(); // Update last distance calculation. referenceNode.Stat().LastDistance() = baseCase; } else { distances = referenceNode.RangeDistance(querySet.unsafe_col(queryIndex)); } // If the ranges do not overlap, prune this node. if (!distances.Contains(range)) return DBL_MAX; // In this case, all of the points in the reference node will be part of the // results. if ((distances.Lo() >= range.Lo()) && (distances.Hi() <= range.Hi())) { AddResult(queryIndex, referenceNode); return DBL_MAX; // We don't need to go any deeper. } // Otherwise the score doesn't matter. Recursion order is irrelevant in // range search. 
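  // (A score of 0.0 simply means "do not prune this node"; unlike
  // nearest-neighbor search, range search gains nothing from visiting
  // children in any particular order, so no priority is encoded here.)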
return 0.0; } //! Single-tree rescoring function. template double RangeSearchRules::Rescore( const size_t /* queryIndex */, TreeType& /* referenceNode */, const double oldScore) const { // If it wasn't pruned before, it isn't pruned now. return oldScore; } //! Dual-tree scoring function. template double RangeSearchRules::Score(TreeType& queryNode, TreeType& referenceNode) { math::Range distances; if (tree::TreeTraits::FirstPointIsCentroid) { // It is possible that the base case has already been calculated. double baseCase = 0.0; if ((traversalInfo.LastQueryNode() != NULL) && (traversalInfo.LastReferenceNode() != NULL) && (traversalInfo.LastQueryNode()->Point(0) == queryNode.Point(0)) && (traversalInfo.LastReferenceNode()->Point(0) == referenceNode.Point(0))) { baseCase = traversalInfo.LastBaseCase(); // Make sure that if BaseCase() is called, we don't duplicate results. lastQueryIndex = queryNode.Point(0); lastReferenceIndex = referenceNode.Point(0); } else { // We must calculate the base case. baseCase = BaseCase(queryNode.Point(0), referenceNode.Point(0)); } distances.Lo() = baseCase - queryNode.FurthestDescendantDistance() - referenceNode.FurthestDescendantDistance(); distances.Hi() = baseCase + queryNode.FurthestDescendantDistance() + referenceNode.FurthestDescendantDistance(); // Update the last distances performed for the query and reference node. traversalInfo.LastBaseCase() = baseCase; } else { // Just perform the calculation. distances = referenceNode.RangeDistance(&queryNode); } // If the ranges do not overlap, prune this node. if (!distances.Contains(range)) return DBL_MAX; // In this case, all of the points in the reference node will be part of all // the results for each point in the query node. if ((distances.Lo() >= range.Lo()) && (distances.Hi() <= range.Hi())) { for (size_t i = 0; i < queryNode.NumDescendants(); ++i) AddResult(queryNode.Descendant(i), referenceNode); return DBL_MAX; // We don't need to go any deeper. } // Otherwise the score doesn't matter. Recursion order is irrelevant in range // search. traversalInfo.LastQueryNode() = &queryNode; traversalInfo.LastReferenceNode() = &referenceNode; return 0.0; } //! Dual-tree rescoring function. template double RangeSearchRules::Rescore( TreeType& /* queryNode */, TreeType& /* referenceNode */, const double oldScore) const { // If it wasn't pruned before, it isn't pruned now. return oldScore; } //! Add all the points in the given node to the results for the given query //! point. template void RangeSearchRules::AddResult(const size_t queryIndex, TreeType& referenceNode) { // Some types of trees calculate the base case evaluation before Score() is // called, so if the base case has already been calculated, then we must avoid // adding that point to the results again. size_t baseCaseMod = 0; if (tree::TreeTraits::FirstPointIsCentroid && (queryIndex == lastQueryIndex) && (referenceNode.Point(0) == lastReferenceIndex)) { baseCaseMod = 1; } // Resize distances and neighbors vectors appropriately. We have to use // reserve() and not resize(), because we don't know if we will encounter the // case where the datasets and points are the same (and we skip in that case). 
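  // (When baseCaseMod is 1, descendant 0 is skipped below: for trees whose
  // first point is the node's centroid, that point was already added by the
  // earlier BaseCase() call and must not be appended twice.)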
const size_t oldSize = neighbors[queryIndex].size(); neighbors[queryIndex].reserve(oldSize + referenceNode.NumDescendants() - baseCaseMod); distances[queryIndex].reserve(oldSize + referenceNode.NumDescendants() - baseCaseMod); for (size_t i = baseCaseMod; i < referenceNode.NumDescendants(); ++i) { if ((&referenceSet == &querySet) && (queryIndex == referenceNode.Descendant(i))) continue; const double distance = metric.Evaluate(querySet.unsafe_col(queryIndex), referenceNode.Dataset().unsafe_col(referenceNode.Descendant(i))); neighbors[queryIndex].push_back(referenceNode.Descendant(i)); distances[queryIndex].push_back(distance); } } }; // namespace range }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/range_search/range_search_impl.hpp0000644000176200001440000002636713647512751024374 0ustar liggesusers/** * @file range_search_impl.hpp * @author Ryan Curtin * * Implementation of the RangeSearch class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_IMPL_HPP #define __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_IMPL_HPP // Just in case it hasn't been included. #include "range_search.hpp" // The rules for traversal. #include "range_search_rules.hpp" namespace mlpack { namespace range { template TreeType* BuildTree( typename TreeType::Mat& dataset, std::vector& oldFromNew, typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == true, TreeType* >::type = 0) { return new TreeType(dataset, oldFromNew); } //! Call the tree constructor that does not do mapping. template TreeType* BuildTree( const typename TreeType::Mat& dataset, const std::vector& /* oldFromNew */, const typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == false, TreeType* >::type = 0) { return new TreeType(dataset); } template RangeSearch::RangeSearch( const typename TreeType::Mat& referenceSetIn, const typename TreeType::Mat& querySetIn, const bool naive, const bool singleMode, const MetricType metric) : referenceSet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), querySet(tree::TreeTraits::RearrangesDataset ? queryCopy : querySetIn), treeOwner(!naive), // If in naive mode, we are not building any trees. hasQuerySet(true), naive(naive), singleMode(!naive && singleMode), // Naive overrides single mode. metric(metric), numPrunes(0) { // Build the trees. //Timer::Start("range_search/tree_building"); // Copy the datasets, if they will be modified during tree building. if (tree::TreeTraits::RearrangesDataset) { referenceCopy = referenceSetIn; queryCopy = querySetIn; } // If in naive mode, then we do not need to build trees. if (!naive) { // The const_cast is safe; if RearrangesDataset == false, then it'll be // casted back to const anyway, and if not, referenceSet points to // referenceCopy, which isn't const. 
referenceTree = BuildTree( const_cast(referenceSet), oldFromNewReferences); if (!singleMode) queryTree = BuildTree( const_cast(querySet), oldFromNewQueries); } //Timer::Stop("range_search/tree_building"); } template RangeSearch::RangeSearch( const typename TreeType::Mat& referenceSetIn, const bool naive, const bool singleMode, const MetricType metric) : referenceSet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), querySet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), queryTree(NULL), treeOwner(!naive), // If in naive mode, we are not building any trees. hasQuerySet(false), naive(naive), singleMode(!naive && singleMode), // Naive overrides single mode. metric(metric), numPrunes(0) { // Build the trees. //Timer::Start("range_search/tree_building"); // Copy the dataset, if it will be modified during tree building. if (tree::TreeTraits::RearrangesDataset) referenceCopy = referenceSetIn; // If in naive mode, then we do not need to build trees. if (!naive) { // The const_cast is safe; if RearrangesDataset == false, then it'll be // casted back to const anyway, and if not, referenceSet points to // referenceCopy, which isn't const. referenceTree = BuildTree( const_cast(referenceSet), oldFromNewReferences); if (!singleMode) queryTree = new TreeType(*referenceTree); } //Timer::Stop("range_search/tree_building"); } template RangeSearch::RangeSearch( TreeType* referenceTree, TreeType* queryTree, const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool singleMode, const MetricType metric) : referenceSet(referenceSet), querySet(querySet), referenceTree(referenceTree), queryTree(queryTree), treeOwner(false), hasQuerySet(true), naive(false), singleMode(singleMode), metric(metric), numPrunes(0) { // Nothing else to initialize. } template RangeSearch::RangeSearch( TreeType* referenceTree, const typename TreeType::Mat& referenceSet, const bool singleMode, const MetricType metric) : referenceSet(referenceSet), querySet(referenceSet), referenceTree(referenceTree), queryTree(NULL), treeOwner(false), hasQuerySet(false), naive(false), singleMode(singleMode), metric(metric), numPrunes(0) { // If doing dual-tree range search, we must clone the reference tree. if (!singleMode) queryTree = new TreeType(*referenceTree); } template RangeSearch::~RangeSearch() { if (treeOwner) { if (referenceTree) delete referenceTree; if (queryTree) delete queryTree; } // If doing dual-tree search with one dataset, we cloned the reference tree. if (!treeOwner && !hasQuerySet && !(singleMode || naive)) delete queryTree; } template void RangeSearch::Search( const math::Range& range, std::vector >& neighbors, std::vector >& distances) { //Timer::Start("range_search/computing_neighbors"); // Set size of prunes to 0. numPrunes = 0; // If we have built the trees ourselves, then we will have to map all the // indices back to their original indices when this computation is finished. // To avoid extra copies, we will store the unmapped neighbors and distances // in a separate object. std::vector >* neighborPtr = &neighbors; std::vector >* distancePtr = &distances; // Mapping is only necessary if the tree rearranges points. if (tree::TreeTraits::RearrangesDataset) { if (treeOwner && !(singleMode && hasQuerySet)) distancePtr = new std::vector >; // Query indices need to be mapped. if (treeOwner) neighborPtr = new std::vector >; // All indices need mapping. } // Resize each vector. neighborPtr->clear(); // Just in case there was anything in it. 
neighborPtr->resize(querySet.n_cols); distancePtr->clear(); distancePtr->resize(querySet.n_cols); // Create the helper object for the traversal. typedef RangeSearchRules RuleType; RuleType rules(referenceSet, querySet, range, *neighborPtr, *distancePtr, metric); if (naive) { // The naive brute-force solution. for (size_t i = 0; i < querySet.n_cols; ++i) for (size_t j = 0; j < referenceSet.n_cols; ++j) rules.BaseCase(i, j); } else if (singleMode) { // Create the traverser. typename TreeType::template SingleTreeTraverser traverser(rules); // Now have it traverse for each point. for (size_t i = 0; i < querySet.n_cols; ++i) traverser.Traverse(i, *referenceTree); numPrunes = traverser.NumPrunes(); } else // Dual-tree recursion. { // Create the traverser. typename TreeType::template DualTreeTraverser traverser(rules); traverser.Traverse(*queryTree, *referenceTree); numPrunes = traverser.NumPrunes(); } //Timer::Stop("range_search/computing_neighbors"); // Output number of prunes. Rcpp::Rcout << "Number of pruned nodes during computation: " << numPrunes << "." << std::endl; // Map points back to original indices, if necessary. if (!treeOwner || !tree::TreeTraits::RearrangesDataset) { // No mapping needed. We are done. return; } else if (treeOwner && hasQuerySet && !singleMode) // Map both sets. { neighbors.clear(); neighbors.resize(querySet.n_cols); distances.clear(); distances.resize(querySet.n_cols); for (size_t i = 0; i < distances.size(); i++) { // Map distances (copy a column). size_t queryMapping = oldFromNewQueries[i]; distances[queryMapping] = (*distancePtr)[i]; // Copy each neighbor individually, because we need to map it. neighbors[queryMapping].resize(distances[queryMapping].size()); for (size_t j = 0; j < distances[queryMapping].size(); j++) { neighbors[queryMapping][j] = oldFromNewReferences[(*neighborPtr)[i][j]]; } } // Finished with temporary objects. delete neighborPtr; delete distancePtr; } else if (treeOwner && !hasQuerySet) { neighbors.clear(); neighbors.resize(querySet.n_cols); distances.clear(); distances.resize(querySet.n_cols); for (size_t i = 0; i < distances.size(); i++) { // Map distances (copy a column). size_t refMapping = oldFromNewReferences[i]; distances[refMapping] = (*distancePtr)[i]; // Copy each neighbor individually, because we need to map it. neighbors[refMapping].resize(distances[refMapping].size()); for (size_t j = 0; j < distances[refMapping].size(); j++) { neighbors[refMapping][j] = oldFromNewReferences[(*neighborPtr)[i][j]]; } } // Finished with temporary objects. delete neighborPtr; delete distancePtr; } else if (treeOwner && hasQuerySet && singleMode) // Map only references. { neighbors.clear(); neighbors.resize(querySet.n_cols); // Map indices of neighbors. for (size_t i = 0; i < neighbors.size(); i++) { neighbors[i].resize((*neighborPtr)[i].size()); for (size_t j = 0; j < neighbors[i].size(); j++) { neighbors[i][j] = oldFromNewReferences[(*neighborPtr)[i][j]]; } } // Finished with temporary object. 
delete neighborPtr; } } template std::string RangeSearch::ToString() const { std::ostringstream convert; convert << "Range Search [" << this << "]" << std::endl; if (treeOwner) convert << " Tree Owner: TRUE" << std::endl; if (naive) convert << " Naive: TRUE" << std::endl; convert << " Metric: " << std::endl << mlpack::util::Indent(metric.ToString(),2); return convert.str(); } }; // namespace range }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/range_search/range_search.hpp0000644000176200001440000002422013647512751023335 0ustar liggesusers/** * @file range_search.hpp * @author Ryan Curtin * * Defines the RangeSearch class, which performs a generalized range search on * points. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_HPP #define __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_HPP #include #include #include #include "range_search_stat.hpp" namespace mlpack { namespace range /** Range-search routines. */ { /** * The RangeSearch class is a template class for performing range searches. It * is implemented in the style of a generalized tree-independent dual-tree * algorithm; for more details on the actual algorithm, see the RangeSearchRules * class. */ template, RangeSearchStat> > class RangeSearch { public: /** * Initialize the RangeSearch object with a different reference set and a * query set. Optionally, perform the computation in naive mode or * single-tree mode, and set the leaf size used for tree-building. * Additionally, an instantiated metric can be given, for cases where the * distance metric holds data. * * This method will copy the matrices to internal copies, which are rearranged * during tree-building. You can avoid this extra copy by pre-constructing * the trees and passing them using a different constructor. * * @param referenceSet Reference dataset. * @param querySet Query dataset. * @param naive Whether the computation should be done in O(n^2) naive mode. * @param singleMode Whether single-tree computation should be used (as * opposed to dual-tree computation). * @param leafSize The leaf size to be used during tree construction. * @param metric Instantiated distance metric. */ RangeSearch(const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool naive = false, const bool singleMode = false, const MetricType metric = MetricType()); /** * Initialize the RangeSearch object with only a reference set, which will * also be used as a query set. Optionally, perform the computation in naive * mode or single-tree mode, and set the leaf size used for tree-building. * Additionally an instantiated metric can be given, for cases where the * distance metric holds data. * * This method will copy the reference matrix to an internal copy, which is * rearranged during tree-building. 
You can avoid this extra copy by * pre-constructing the reference tree and passing it using a different * constructor. * * @param referenceSet Reference dataset. * @param naive Whether the computation should be done in O(n^2) naive mode. * @param singleMode Whether single-tree computation should be used (as * opposed to dual-tree computation). * @param leafSize The leaf size to be used during tree construction. * @param metric Instantiated distance metric. */ RangeSearch(const typename TreeType::Mat& referenceSet, const bool naive = false, const bool singleMode = false, const MetricType metric = MetricType()); /** * Initialize the RangeSearch object with the given datasets and * pre-constructed trees. It is assumed that the points in referenceSet and * querySet correspond to the points in referenceTree and queryTree, * respectively. Optionally, choose to use single-tree mode. Naive * mode is not available as an option for this constructor; instead, to run * naive computation, construct a tree with all the points in one leaf (i.e. * leafSize = number of points). Additionally, an instantiated distance * metric can be given, for cases where the distance metric holds data. * * There is no copying of the data matrices in this constructor (because * tree-building is not necessary), so this is the constructor to use when * copies absolutely must be avoided. * * @note * Because tree-building (at least with BinarySpaceTree) modifies the ordering * of a matrix, be sure you pass the modified matrix to this object! In * addition, mapping the points of the matrix back to their original indices * is not done when this constructor is used. * @endnote * * @param referenceTree Pre-built tree for reference points. * @param queryTree Pre-built tree for query points. * @param referenceSet Set of reference points corresponding to referenceTree. * @param querySet Set of query points corresponding to queryTree. * @param singleMode Whether single-tree computation should be used (as * opposed to dual-tree computation). * @param metric Instantiated distance metric. */ RangeSearch(TreeType* referenceTree, TreeType* queryTree, const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool singleMode = false, const MetricType metric = MetricType()); /** * Initialize the RangeSearch object with the given reference dataset and * pre-constructed tree. It is assumed that the points in referenceSet * correspond to the points in referenceTree. Optionally, choose to use * single-tree mode. Naive mode is not available as an option for this * constructor; instead, to run naive computation, construct a tree with all * the points in one leaf (i.e. leafSize = number of points). Additionally, * an instantiated distance metric can be given, for the case where the * distance metric holds data. * * There is no copying of the data matrices in this constructor (because * tree-building is not necessary), so this is the constructor to use when * copies absolutely must be avoided. * * @note * Because tree-building (at least with BinarySpaceTree) modifies the ordering * of a matrix, be sure you pass the modified matrix to this object! In * addition, mapping the points of the matrix back to their original indices * is not done when this constructor is used. * @endnote * * @param referenceTree Pre-built tree for reference points. * @param referenceSet Set of reference points corresponding to referenceTree. 
* @param singleMode Whether single-tree computation should be used (as * opposed to dual-tree computation). * @param metric Instantiated distance metric. */ RangeSearch(TreeType* referenceTree, const typename TreeType::Mat& referenceSet, const bool singleMode = false, const MetricType metric = MetricType()); /** * Destroy the RangeSearch object. If trees were created, they will be * deleted. */ ~RangeSearch(); /** * Search for all points in the given range, returning the results in the * neighbors and distances objects. Each entry in the external vector * corresponds to a query point. Each of these entries holds a vector which * contains the indices and distances of the reference points falling into the * given range. * * That is: * * - neighbors.size() and distances.size() both equal the number of query * points. * * - neighbors[i] contains the indices of all the points in the reference set * which have distances inside the given range to query point i. * * - distances[i] contains all of the distances corresponding to the indices * contained in neighbors[i]. * * - neighbors[i] and distances[i] are not sorted in any particular order. * * @param range Range of distances in which to search. * @param neighbors Object which will hold the list of neighbors for each * point which fell into the given range, for each query point. * @param distances Object which will hold the list of distances for each * point which fell into the given range, for each query point. */ void Search(const math::Range& range, std::vector >& neighbors, std::vector >& distances); // Returns a string representation of this object. std::string ToString() const; private: //! Copy of reference matrix; used when a tree is built internally. typename TreeType::Mat referenceCopy; //! Copy of query matrix; used when a tree is built internally. typename TreeType::Mat queryCopy; //! Reference set (data should be accessed using this). const typename TreeType::Mat& referenceSet; //! Query set (data should be accessed using this). const typename TreeType::Mat& querySet; //! Reference tree. TreeType* referenceTree; //! Query tree (may be NULL). TreeType* queryTree; //! Mappings to old reference indices (used when this object builds trees). std::vector oldFromNewReferences; //! Mappings to old query indices (used when this object builds trees). std::vector oldFromNewQueries; //! If true, this object is responsible for deleting the trees. bool treeOwner; //! If true, a query set was passed; if false, the query set is the reference //! set. bool hasQuerySet; //! If true, O(n^2) naive computation is used. bool naive; //! If true, single-tree computation is used. bool singleMode; //! Instantiated distance metric. MetricType metric; //! The number of pruned nodes during computation. size_t numPrunes; }; }; // namespace range }; // namespace mlpack // Include implementation. #include "range_search_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/range_search/range_search_rules.hpp0000644000176200001440000001302613647512751024551 0ustar liggesusers/** * @file range_search_rules.hpp * @author Ryan Curtin * * Rules for range search, so that it can be done with arbitrary tree types. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_RULES_HPP #define __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_RULES_HPP #include "../neighbor_search/ns_traversal_info.hpp" namespace mlpack { namespace range { template class RangeSearchRules { public: /** * Construct the RangeSearchRules object. This is usually done from within * the RangeSearch class at search time. * * @param referenceSet Set of reference data. * @param querySet Set of query data. * @param range Range to search for. * @param neighbors Vector to store resulting neighbors in. * @param distances Vector to store resulting distances in. * @param metric Instantiated metric. */ RangeSearchRules(const arma::mat& referenceSet, const arma::mat& querySet, const math::Range& range, std::vector >& neighbors, std::vector >& distances, MetricType& metric); /** * Compute the base case between the given query point and reference point. * * @param queryIndex Index of query point. * @param referenceIndex Index of reference point. */ double BaseCase(const size_t queryIndex, const size_t referenceIndex); /** * Get the score for recursion order. A low score indicates priority for * recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). * * @param queryIndex Index of query point. * @param referenceNode Candidate node to be recursed into. */ double Score(const size_t queryIndex, TreeType& referenceNode); /** * Re-evaluate the score for recursion order. A low score indicates priority * for recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). This is used when the score has already * been calculated, but another recursion may have modified the bounds for * pruning. So the old score is checked against the new pruning bound. * * @param queryIndex Index of query point. * @param referenceNode Candidate node to be recursed into. * @param oldScore Old score produced by Score() (or Rescore()). */ double Rescore(const size_t queryIndex, TreeType& referenceNode, const double oldScore) const; /** * Get the score for recursion order. A low score indicates priority for * recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). * * @param queryNode Candidate query node to recurse into. * @param referenceNode Candidate reference node to recurse into. */ double Score(TreeType& queryNode, TreeType& referenceNode); /** * Re-evaluate the score for recursion order. A low score indicates priority * for recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). This is used when the score has already * been calculated, but another recursion may have modified the bounds for * pruning. So the old score is checked against the new pruning bound. * * @param queryNode Candidate query node to recurse into. * @param referenceNode Candidate reference node to recurse into. * @param oldScore Old score produced by Score() (or Rescore()). 
*/ double Rescore(TreeType& queryNode, TreeType& referenceNode, const double oldScore) const; typedef neighbor::NeighborSearchTraversalInfo TraversalInfoType; const TraversalInfoType& TraversalInfo() const { return traversalInfo; } TraversalInfoType& TraversalInfo() { return traversalInfo; } private: //! The reference set. const arma::mat& referenceSet; //! The query set. const arma::mat& querySet; //! The range of distances for which we are searching. const math::Range& range; //! The vector the resultant neighbor indices should be stored in. std::vector >& neighbors; //! The vector the resultant neighbor distances should be stored in. std::vector >& distances; //! The instantiated metric. MetricType& metric; //! The last query index. size_t lastQueryIndex; //! The last reference index. size_t lastReferenceIndex; //! Add all the points in the given node to the results for the given query //! point. If the base case has already been calculated, we make sure to not //! add that to the results twice. void AddResult(const size_t queryIndex, TreeType& referenceNode); TraversalInfoType traversalInfo; }; }; // namespace range }; // namespace mlpack // Include implementation. #include "range_search_rules_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/range_search/range_search_stat.hpp0000644000176200001440000000437513647512751024401 0ustar liggesusers/** * @file range_search_stat.hpp * @author Ryan Curtin * * Statistic class for RangeSearch, which just holds the last visited node and * the corresponding base case result. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_STAT_HPP #define __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_STAT_HPP #include namespace mlpack { namespace range { /** * Statistic class for RangeSearch, to be set to the StatisticType of the tree * type that range search is being performed with. This class just holds the * last visited node and the corresponding base case result. */ class RangeSearchStat { public: /** * Initialize the statistic. */ RangeSearchStat() : lastDistanceNode(NULL), lastDistance(0.0) { } /** * Initialize the statistic given a tree node that this statistic belongs to. * In this case, we ignore the node. */ template RangeSearchStat(TreeType& /* node */) : lastDistanceNode(NULL), lastDistance(0.0) { } //! Get the last distance evaluation node. void* LastDistanceNode() const { return lastDistanceNode; } //! Modify the last distance evaluation node. void*& LastDistanceNode() { return lastDistanceNode; } //! Get the last distance evaluation. double LastDistance() const { return lastDistance; } //! Modify the last distance evaluation. double& LastDistance() { return lastDistance; } private: //! The last distance evaluation node. void* lastDistanceNode; //! The last distance evaluation. 
double lastDistance; }; }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/lars/0000755000176200001440000000000013647514343016522 5ustar liggesusersRcppMLPACK/src/mlpack/methods/lars/lars.hpp0000644000176200001440000002045313647512751020201 0ustar liggesusers/** * @file lars.hpp * @author Nishant Mehta (niche) * * Definition of the LARS class, which performs Least Angle Regression and the * LASSO. * * Only minor modifications of LARS are necessary to handle the constrained * version of the problem: * * \f[ * \min_{\beta} 0.5 || X \beta - y ||_2^2 + 0.5 \lambda_2 || \beta ||_2^2 * \f] * subject to \f$ ||\beta||_1 <= \tau \f$ * * Although this option currently is not implemented, it will be implemented * very soon. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LARS_LARS_HPP #define __MLPACK_METHODS_LARS_LARS_HPP #include namespace mlpack { namespace regression { // beta is the estimator // yHat is the prediction from the current estimator /** * An implementation of LARS, a stage-wise homotopy-based algorithm for * l1-regularized linear regression (LASSO) and l1+l2 regularized linear * regression (Elastic Net). * * Let \f$ X \f$ be a matrix where each row is a point and each column is a * dimension and let \f$ y \f$ be a vector of responses. * * The Elastic Net problem is to solve * * \f[ \min_{\beta} 0.5 || X \beta - y ||_2^2 + \lambda_1 || \beta ||_1 + * 0.5 \lambda_2 || \beta ||_2^2 \f] * * where \f$ \beta \f$ is the vector of regression coefficients. * * If \f$ \lambda_1 > 0 \f$ and \f$ \lambda_2 = 0 \f$, the problem is the LASSO. * If \f$ \lambda_1 > 0 \f$ and \f$ \lambda_2 > 0 \f$, the problem is the * elastic net. * If \f$ \lambda_1 = 0 \f$ and \f$ \lambda_2 > 0 \f$, the problem is ridge * regression. * If \f$ \lambda_1 = 0 \f$ and \f$ \lambda_2 = 0 \f$, the problem is * unregularized linear regression. * * Note: This algorithm is not recommended for use (in terms of efficiency) * when \f$ \lambda_1 \f$ = 0. * * For more details, see the following papers: * * @code * @article{efron2004least, * title={Least angle regression}, * author={Efron, B. and Hastie, T. and Johnstone, I. and Tibshirani, R.}, * journal={The Annals of statistics}, * volume={32}, * number={2}, * pages={407--499}, * year={2004}, * publisher={Institute of Mathematical Statistics} * } * @endcode * * @code * @article{zou2005regularization, * title={Regularization and variable selection via the elastic net}, * author={Zou, H. and Hastie, T.}, * journal={Journal of the Royal Statistical Society Series B}, * volume={67}, * number={2}, * pages={301--320}, * year={2005}, * publisher={Royal Statistical Society} * } * @endcode */ class LARS { public: /** * Set the parameters to LARS. Both lambda1 and lambda2 default to 0. * * @param useCholesky Whether or not to use Cholesky decomposition when * solving linear system (as opposed to using the full Gram matrix). 
* @param lambda1 Regularization parameter for l1-norm penalty. * @param lambda2 Regularization parameter for l2-norm penalty. * @param tolerance Run until the maximum correlation of elements in (X^T y) * is less than this. */ LARS(const bool useCholesky, const double lambda1 = 0.0, const double lambda2 = 0.0, const double tolerance = 1e-16); /** * Set the parameters to LARS, and pass in a precalculated Gram matrix. Both * lambda1 and lambda2 default to 0. * * @param useCholesky Whether or not to use Cholesky decomposition when * solving linear system (as opposed to using the full Gram matrix). * @param gramMatrix Gram matrix. * @param lambda1 Regularization parameter for l1-norm penalty. * @param lambda2 Regularization parameter for l2-norm penalty. * @param tolerance Run until the maximum correlation of elements in (X^T y) * is less than this. */ LARS(const bool useCholesky, const arma::mat& gramMatrix, const double lambda1 = 0.0, const double lambda2 = 0.0, const double tolerance = 1e-16); /** * Run LARS. The input matrix (like all MLPACK matrices) should be * column-major -- each column is an observation and each row is a dimension. * However, because LARS is more efficient on a row-major matrix, this method * will (internally) transpose the matrix. If this transposition is not * necessary (i.e., you want to pass in a row-major matrix), pass 'false' for * the transposeData parameter. * * @param data Column-major input data (or row-major input data if rowMajor = * true). * @param responses A vector of targets. * @param beta Vector to store the solution (the coefficients) in. * @param rowMajor Set to false if the data is row-major. */ void Regress(const arma::mat& data, const arma::vec& responses, arma::vec& beta, const bool transposeData = true); //! Access the set of active dimensions. const std::vector& ActiveSet() const { return activeSet; } //! Access the set of coefficients after each iteration; the solution is the //! last element. const std::vector& BetaPath() const { return betaPath; } //! Access the set of values for lambda1 after each iteration; the solution is //! the last element. const std::vector& LambdaPath() const { return lambdaPath; } //! Access the upper triangular cholesky factor const arma::mat& MatUtriCholFactor() const { return matUtriCholFactor; } // Returns a string representation of this object. std::string ToString() const; private: //! Gram matrix. arma::mat matGramInternal; //! Reference to the Gram matrix we will use. const arma::mat& matGram; //! Upper triangular cholesky factor; initially 0x0 matrix. arma::mat matUtriCholFactor; //! Whether or not to use Cholesky decomposition when solving linear system. bool useCholesky; //! True if this is the LASSO problem. bool lasso; //! Regularization parameter for l1 penalty. double lambda1; //! True if this is the elastic net problem. bool elasticNet; //! Regularization parameter for l2 penalty. double lambda2; //! Tolerance for main loop. double tolerance; //! Solution path. std::vector betaPath; //! Value of lambda_1 for each solution in solution path. std::vector lambdaPath; //! Active set of dimensions. std::vector activeSet; //! Active set membership indicator (for each dimension). std::vector isActive; // Set of variables that are ignored (if any). //! Set of ignored variables (for dimensions in span{active set dimensions}). std::vector ignoreSet; //! Membership indicator for set of ignored variables. std::vector isIgnored; /** * Remove activeVarInd'th element from active set. 
* * @param activeVarInd Index of element to remove from active set. */ void Deactivate(const size_t activeVarInd); /** * Add dimension varInd to active set. * * @param varInd Dimension to add to active set. */ void Activate(const size_t varInd); /** * Add dimension varInd to ignores set (never removed). * * @param varInd Dimension to add to ignores set. */ void Ignore(const size_t varInd); // compute "equiangular" direction in output space void ComputeYHatDirection(const arma::mat& matX, const arma::vec& betaDirection, arma::vec& yHatDirection); // interpolate to compute last solution vector void InterpolateBeta(); void CholeskyInsert(const arma::vec& newX, const arma::mat& X); void CholeskyInsert(double sqNormNewX, const arma::vec& newGramCol); void GivensRotate(const arma::vec::fixed<2>& x, arma::vec::fixed<2>& rotatedX, arma::mat& G); void CholeskyDelete(const size_t colToKill); }; }; // namespace regression }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/lars/lars.cpp0000644000176200001440000003374013647514343020176 0ustar liggesusers/** * @file lars.cpp * @author Nishant Mehta (niche) * * Implementation of LARS and LASSO. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "lars.hpp" using namespace mlpack; using namespace mlpack::regression; LARS::LARS(const bool useCholesky, const double lambda1, const double lambda2, const double tolerance) : matGram(matGramInternal), useCholesky(useCholesky), lasso((lambda1 != 0)), lambda1(lambda1), elasticNet((lambda1 != 0) && (lambda2 != 0)), lambda2(lambda2), tolerance(tolerance) { /* Nothing left to do. */ } LARS::LARS(const bool useCholesky, const arma::mat& gramMatrix, const double lambda1, const double lambda2, const double tolerance) : matGram(gramMatrix), useCholesky(useCholesky), lasso((lambda1 != 0)), lambda1(lambda1), elasticNet((lambda1 != 0) && (lambda2 != 0)), lambda2(lambda2), tolerance(tolerance) { /* Nothing left to do */ } void LARS::Regress(const arma::mat& matX, const arma::vec& y, arma::vec& beta, const bool transposeData) { //Timer::Start("lars_regression"); // This matrix may end up holding the transpose -- if necessary. arma::mat dataTrans; // dataRef is row-major. const arma::mat& dataRef = (transposeData ? dataTrans : matX); if (transposeData) dataTrans = trans(matX); // Compute X' * y. arma::vec vecXTy = trans(dataRef) * y; // Set up active set variables. In the beginning, the active set has size 0 // (all dimensions are inactive). isActive.resize(dataRef.n_cols, false); // Set up ignores set variables. Initialized empty. isIgnored.resize(dataRef.n_cols, false); // Initialize yHat and beta. beta = arma::zeros(dataRef.n_cols); arma::vec yHat = arma::zeros(dataRef.n_rows); arma::vec yHatDirection = arma::vec(dataRef.n_rows); bool lassocond = false; // Compute the initial maximum correlation among all dimensions. 
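// At beta = 0 the residual is just y, so the correlations are the entries of
// X^T y; the largest absolute correlation is also the value of lambda1 at
// which the first variable enters the model (the top of the LARS/LASSO path).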
arma::vec corr = vecXTy; double maxCorr = 0; size_t changeInd = 0; for (size_t i = 0; i < vecXTy.n_elem; ++i) { if (fabs(corr(i)) > maxCorr) { maxCorr = fabs(corr(i)); changeInd = i; } } betaPath.push_back(beta); lambdaPath.push_back(maxCorr); // If the maximum correlation is too small, there is no reason to continue. if (maxCorr < lambda1) { lambdaPath[0] = lambda1; //Timer::Stop("lars_regression"); return; } // Compute the Gram matrix. If this is the elastic net problem, we will add // lambda2 * I_n to the matrix. if (matGram.n_elem == 0) { // In this case, matGram should reference matGramInternal. matGramInternal = trans(dataRef) * dataRef; if (elasticNet && !useCholesky) matGramInternal += lambda2 * arma::eye(dataRef.n_cols, dataRef.n_cols); } // Main loop. while (((activeSet.size() + ignoreSet.size()) < dataRef.n_cols) && (maxCorr > tolerance)) { // Compute the maximum correlation among inactive dimensions. maxCorr = 0; for (size_t i = 0; i < dataRef.n_cols; i++) { if ((!isActive[i]) && (!isIgnored[i]) && (fabs(corr(i)) > maxCorr)) { maxCorr = fabs(corr(i)); changeInd = i; } } if (!lassocond) { if (useCholesky) { // vec newGramCol = vec(activeSet.size()); // for (size_t i = 0; i < activeSet.size(); i++) // { // newGramCol[i] = dot(matX.col(activeSet[i]), matX.col(changeInd)); // } // This is equivalent to the above 5 lines. arma::vec newGramCol = matGram.elem(changeInd * dataRef.n_cols + arma::conv_to::from(activeSet)); CholeskyInsert(matGram(changeInd, changeInd), newGramCol); } // Add variable to active set. Activate(changeInd); } // Compute signs of correlations. arma::vec s = arma::vec(activeSet.size()); for (size_t i = 0; i < activeSet.size(); i++) s(i) = corr(activeSet[i]) / fabs(corr(activeSet[i])); // Compute the "equiangular" direction in parameter space (betaDirection). // We use quotes because in the case of non-unit norm variables, this need // not be equiangular. arma::vec unnormalizedBetaDirection; double normalization; arma::vec betaDirection; if (useCholesky) { // Check for singularity. const double lastUtriElement = matUtriCholFactor( matUtriCholFactor.n_cols - 1, matUtriCholFactor.n_rows - 1); if (std::abs(lastUtriElement) > tolerance) { // Ok, no singularity. /** * Note that: * R^T R % S^T % S = (R % S)^T (R % S) * Now, for 1 the ones vector: * inv( (R % S)^T (R % S) ) 1 * = inv(R % S) inv((R % S)^T) 1 * = inv(R % S) Solve((R % S)^T, 1) * = inv(R % S) Solve(R^T, s) * = Solve(R % S, Solve(R^T, s) * = s % Solve(R, Solve(R^T, s)) */ unnormalizedBetaDirection = solve(trimatu(matUtriCholFactor), solve(trimatl(trans(matUtriCholFactor)), s)); normalization = 1.0 / sqrt(dot(s, unnormalizedBetaDirection)); betaDirection = normalization * unnormalizedBetaDirection; } else { // Singularity, so remove variable from active set, add to ignores set, // and look for new variable to add. Rcpp::Rcout << "Encountered singularity when adding variable " << changeInd << " to active set; permanently removing." << std::endl; Deactivate(activeSet.size() - 1); Ignore(changeInd); CholeskyDelete(matUtriCholFactor.n_rows - 1); continue; } } else { arma::mat matGramActive = arma::mat(activeSet.size(), activeSet.size()); for (size_t i = 0; i < activeSet.size(); i++) for (size_t j = 0; j < activeSet.size(); j++) matGramActive(i, j) = matGram(activeSet[i], activeSet[j]); // Check for singularity. 
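// With S = diag(s) holding the correlation signs, the "equiangular"
// direction solves (S * G_A * S) w = 1 for the active-set Gram matrix G_A;
// the element-wise products below build S * G_A * S without forming S
// explicitly.  If solve() reports failure, G_A is singular and the offending
// variable is moved to the ignore set.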
arma::mat matS = s * arma::ones(1, activeSet.size()); const bool solvedOk = solve(unnormalizedBetaDirection, matGramActive % trans(matS) % matS, arma::ones(activeSet.size(), 1)); if (solvedOk) { // Ok, no singularity. normalization = 1.0 / sqrt(sum(unnormalizedBetaDirection)); betaDirection = normalization * unnormalizedBetaDirection % s; } else { // Singularity, so remove variable from active set, add to ignores set, // and look for new variable to add. Deactivate(activeSet.size() - 1); Ignore(changeInd); Rcpp::Rcout << "Encountered singularity when adding variable " << changeInd << " to active set; permanently removing." << std::endl; continue; } } // compute "equiangular" direction in output space ComputeYHatDirection(dataRef, betaDirection, yHatDirection); double gamma = maxCorr / normalization; // If not all variables are active. if ((activeSet.size() + ignoreSet.size()) < dataRef.n_cols) { // Compute correlations with direction. for (size_t ind = 0; ind < dataRef.n_cols; ind++) { if (isActive[ind] || isIgnored[ind]) continue; double dirCorr = dot(dataRef.col(ind), yHatDirection); double val1 = (maxCorr - corr(ind)) / (normalization - dirCorr); double val2 = (maxCorr + corr(ind)) / (normalization + dirCorr); if ((val1 > 0) && (val1 < gamma)) gamma = val1; if ((val2 > 0) && (val2 < gamma)) gamma = val2; } } // Bound gamma according to LASSO. if (lasso) { lassocond = false; double lassoboundOnGamma = DBL_MAX; size_t activeIndToKickOut = -1; for (size_t i = 0; i < activeSet.size(); i++) { double val = -beta(activeSet[i]) / betaDirection(i); if ((val > 0) && (val < lassoboundOnGamma)) { lassoboundOnGamma = val; activeIndToKickOut = i; } } if (lassoboundOnGamma < gamma) { gamma = lassoboundOnGamma; lassocond = true; changeInd = activeIndToKickOut; } } // Update the prediction. yHat += gamma * yHatDirection; // Update the estimator. for (size_t i = 0; i < activeSet.size(); i++) { beta(activeSet[i]) += gamma * betaDirection(i); } // Sanity check to make sure the kicked out dimension is actually zero. if (lassocond) { if (beta(activeSet[changeInd]) != 0) beta(activeSet[changeInd]) = 0; } betaPath.push_back(beta); if (lassocond) { // Index is in position changeInd in activeSet. if (useCholesky) CholeskyDelete(changeInd); Deactivate(changeInd); } corr = vecXTy - trans(dataRef) * yHat; if (elasticNet) corr -= lambda2 * beta; double curLambda = 0; for (size_t i = 0; i < activeSet.size(); i++) curLambda += fabs(corr(activeSet[i])); curLambda /= ((double) activeSet.size()); lambdaPath.push_back(curLambda); // Time to stop for LASSO? if (lasso) { if (curLambda <= lambda1) { InterpolateBeta(); break; } } } // Unfortunate copy... beta = betaPath.back(); //Timer::Stop("lars_regression"); } // Private functions. 
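// The helpers below maintain the state that Regress() updates on every
// iteration: the active and ignored variable sets (Activate(), Deactivate(),
// Ignore()), the equiangular direction mapped into output space
// (ComputeYHatDirection()), the final interpolation onto lambda1
// (InterpolateBeta()), and the incremental upper-triangular Cholesky factor
// of the active-set Gram matrix (CholeskyInsert(), GivensRotate(),
// CholeskyDelete()).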
void LARS::Deactivate(const size_t activeVarInd) { isActive[activeSet[activeVarInd]] = false; activeSet.erase(activeSet.begin() + activeVarInd); } void LARS::Activate(const size_t varInd) { isActive[varInd] = true; activeSet.push_back(varInd); } void LARS::Ignore(const size_t varInd) { isIgnored[varInd] = true; ignoreSet.push_back(varInd); } void LARS::ComputeYHatDirection(const arma::mat& matX, const arma::vec& betaDirection, arma::vec& yHatDirection) { yHatDirection.fill(0); for (size_t i = 0; i < activeSet.size(); i++) yHatDirection += betaDirection(i) * matX.col(activeSet[i]); } void LARS::InterpolateBeta() { int pathLength = betaPath.size(); // interpolate beta and stop double ultimateLambda = lambdaPath[pathLength - 1]; double penultimateLambda = lambdaPath[pathLength - 2]; double interp = (penultimateLambda - lambda1) / (penultimateLambda - ultimateLambda); betaPath[pathLength - 1] = (1 - interp) * (betaPath[pathLength - 2]) + interp * betaPath[pathLength - 1]; lambdaPath[pathLength - 1] = lambda1; } void LARS::CholeskyInsert(const arma::vec& newX, const arma::mat& X) { if (matUtriCholFactor.n_rows == 0) { matUtriCholFactor = arma::mat(1, 1); if (elasticNet) matUtriCholFactor(0, 0) = sqrt(dot(newX, newX) + lambda2); else matUtriCholFactor(0, 0) = norm(newX, 2); } else { arma::vec newGramCol = trans(X) * newX; CholeskyInsert(dot(newX, newX), newGramCol); } } void LARS::CholeskyInsert(double sqNormNewX, const arma::vec& newGramCol) { int n = matUtriCholFactor.n_rows; if (n == 0) { matUtriCholFactor = arma::mat(1, 1); if (elasticNet) matUtriCholFactor(0, 0) = sqrt(sqNormNewX + lambda2); else matUtriCholFactor(0, 0) = sqrt(sqNormNewX); } else { arma::mat matNewR = arma::mat(n + 1, n + 1); if (elasticNet) sqNormNewX += lambda2; arma::vec matUtriCholFactork = solve(trimatl(trans(matUtriCholFactor)), newGramCol); matNewR(arma::span(0, n - 1), arma::span(0, n - 1)) = matUtriCholFactor; matNewR(arma::span(0, n - 1), n) = matUtriCholFactork; matNewR(n, arma::span(0, n - 1)).fill(0.0); matNewR(n, n) = sqrt(sqNormNewX - dot(matUtriCholFactork, matUtriCholFactork)); matUtriCholFactor = matNewR; } } void LARS::GivensRotate(const arma::vec::fixed<2>& x, arma::vec::fixed<2>& rotatedX, arma::mat& matG) { if (x(1) == 0) { matG = arma::eye(2, 2); rotatedX = x; } else { double r = norm(x, 2); matG = arma::mat(2, 2); double scaledX1 = x(0) / r; double scaledX2 = x(1) / r; matG(0, 0) = scaledX1; matG(1, 0) = -scaledX2; matG(0, 1) = scaledX2; matG(1, 1) = scaledX1; rotatedX = arma::vec(2); rotatedX(0) = r; rotatedX(1) = 0; } } void LARS::CholeskyDelete(const size_t colToKill) { size_t n = matUtriCholFactor.n_rows; if (colToKill == (n - 1)) { matUtriCholFactor = matUtriCholFactor(arma::span(0, n - 2), arma::span(0, n - 2)); } else { matUtriCholFactor.shed_col(colToKill); // remove column colToKill n--; for (size_t k = colToKill; k < n; k++) { arma::mat matG; arma::vec::fixed<2> rotatedVec; GivensRotate(matUtriCholFactor(arma::span(k, k + 1), k), rotatedVec, matG); matUtriCholFactor(arma::span(k, k + 1), k) = rotatedVec; if (k < n - 1) { matUtriCholFactor(arma::span(k, k + 1), arma::span(k + 1, n - 1)) = matG * matUtriCholFactor(arma::span(k, k + 1), arma::span(k + 1, n - 1)); } } matUtriCholFactor.shed_row(n); } } std::string LARS::ToString() const { std::ostringstream convert; convert << "LARS [" << this << "]" << std::endl; convert << " Gram Matrix: " << matGram.n_rows << "x" << matGram.n_cols; convert << std::endl; convert << " Tolerance: " << tolerance << std::endl; return convert.str(); } 
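/*
 * Illustrative usage sketch (an editorial addition, not part of the upstream
 * MLPACK sources; the data here is hypothetical).  It exercises only the
 * public API declared in lars.hpp:
 *
 * @code
 * arma::mat X(10, 100, arma::fill::randu); // 10 dimensions, 100 points.
 * arma::vec y(100, arma::fill::randu);     // Responses.
 * arma::vec beta;                          // Will hold the solution.
 *
 * // LASSO with lambda1 = 0.1, solved with the Cholesky-based updates.
 * mlpack::regression::LARS lars(true, 0.1);
 * lars.Regress(X, y, beta); // X is column-major and transposed internally.
 * @endcode
 */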
RcppMLPACK/src/mlpack/methods/linear_regression/0000755000176200001440000000000013647514343021273 5ustar liggesusersRcppMLPACK/src/mlpack/methods/linear_regression/linear_regression.hpp0000644000176200001440000000736113647512751025526 0ustar liggesusers/** * @file linear_regression.hpp * @author James Cline * * Simple least-squares linear regression. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LINEAR_REGRESSION_LINEAR_REGRESSION_HPP #define __MLPACK_METHODS_LINEAR_REGRESSION_LINEAR_REGRESSION_HPP #include namespace mlpack { namespace regression /** Regression methods. */ { /** * A simple linear regression algorithm using ordinary least squares. * Optionally, this class can perform ridge regression, if the lambda parameter * is set to a number greater than zero. */ class LinearRegression { public: /** * Creates the model. * * @param predictors X, matrix of data points to create B with. * @param responses y, the measured data for each point in X */ LinearRegression(const arma::mat& predictors, const arma::vec& responses, const double lambda = 0); /** * Initialize the model from a file. * * @param filename the name of the file to load the model from. */ //LinearRegression(const std::string& filename); /** * Copy constructor. * * @param linearRegression the other instance to copy parameters from. */ LinearRegression(const LinearRegression& linearRegression); /** * Empty constructor. */ LinearRegression() { } /** * Calculate y_i for each data point in points. * * @param points the data points to calculate with. * @param predictions y, will contain calculated values on completion. */ void Predict(const arma::mat& points, arma::vec& predictions) const; /** * Calculate the L2 squared error on the given predictors and responses using * this linear regression model. This calculation returns * * \f[ * (1 / n) * \| y - X B \|^2_2 * \f] * * where \f$ y \f$ is the responses vector, \f$ X \f$ is the matrix of * predictors, and \f$ B \f$ is the parameters of the trained linear * regression model. * * As this number decreases to 0, the linear regression fit is better. * * @param predictors Matrix of predictors (X). * @param responses Vector of responses (y). */ double ComputeError(const arma::mat& points, const arma::vec& responses) const; //! Return the parameters (the b vector). const arma::vec& Parameters() const { return parameters; } //! Modify the parameters (the b vector). arma::vec& Parameters() { return parameters; } //! Return the Tikhonov regularization parameter for ridge regression. double Lambda() const { return lambda; } //! Modify the Tikhonov regularization parameter for ridge regression. double& Lambda() { return lambda; } // Returns a string representation of this object. std::string ToString() const; private: /** * The calculated B. * Initialized and filled by constructor to hold the least squares solution. 
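 * parameters(0) holds the intercept term; entries 1 through n_elem - 1 hold
 * the per-dimension coefficients (see Predict() and ComputeError()).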
*/ arma::vec parameters; /** * The Tikhonov regularization parameter for ridge regression (0 for linear * regression). */ double lambda; }; }; // namespace linear_regression }; // namespace mlpack #endif // __MLPACK_METHODS_LINEAR_REGRESSCLIN_HPP RcppMLPACK/src/mlpack/methods/linear_regression/linear_regression.cpp0000644000176200001440000001161013647514343025510 0ustar liggesusers/** * @file linear_regression.cpp * @author James Cline * * Implementation of simple linear regression. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "linear_regression.hpp" using namespace mlpack; using namespace mlpack::regression; LinearRegression::LinearRegression(const arma::mat& predictors, const arma::colvec& responses, const double lambda) : lambda(lambda) { /* * We want to calculate the a_i coefficients of: * \sum_{i=0}^n (a_i * x_i^i) * In order to get the intercept value, we will add a row of ones. */ // We store the number of rows and columns of the predictors. // Reminder: Armadillo stores the data transposed from how we think of it, // that is, columns are actually rows (see: column major order). const size_t nCols = predictors.n_cols; // Here we add the row of ones to the predictors. arma::mat p; if (lambda == 0.0) { p.set_size(predictors.n_rows + 1, nCols); p.submat(1, 0, p.n_rows - 1, nCols - 1) = predictors; p.row(0).fill(1); } else { // Add the identity matrix to the predictors (this is equivalent to ridge // regression). See http://math.stackexchange.com/questions/299481/ for // more information. p.set_size(predictors.n_rows + 1, nCols + predictors.n_rows + 1); p.submat(1, 0, p.n_rows - 1, nCols - 1) = predictors; p.row(0).subvec(0, nCols - 1).fill(1); p.submat(0, nCols, p.n_rows - 1, nCols + predictors.n_rows) = lambda * arma::eye(predictors.n_rows + 1, predictors.n_rows + 1); } // We compute the QR decomposition of the predictors. // We transpose the predictors because they are in column major order. arma::mat Q, R; arma::qr(Q, R, arma::trans(p)); // We compute the parameters, B, like so: // R * B = Q^T * responses // B = Q^T * responses * R^-1 // If lambda > 0, then we must add a bunch of empty responses. if (lambda == 0.0) { arma::solve(parameters, R, arma::trans(Q) * responses); } else { // Copy responses into larger vector. arma::vec r(nCols + predictors.n_rows + 1); r.subvec(0, nCols - 1) = responses; r.subvec(nCols, nCols + predictors.n_rows).fill(0); arma::solve(parameters, R, arma::trans(Q) * r); } } /*LinearRegression::LinearRegression(const std::string& filename) : lambda(0.0) { data::Load(filename, parameters, true); }*/ LinearRegression::LinearRegression(const LinearRegression& linearRegression) : parameters(linearRegression.parameters), lambda(linearRegression.lambda) { /* Nothing to do. */ } void LinearRegression::Predict(const arma::mat& points, arma::vec& predictions) const { // We want to be sure we have the correct number of dimensions in the dataset. 
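// Each column x of 'points' is scored as
//   yhat = parameters(0) + dot(parameters.subvec(1, n_elem - 1), x),
// which is evaluated for all columns at once via the matrix product below.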
//Log::Assert(points.n_rows == parameters.n_rows - 1); // Get the predictions, but this ignores the intercept value (parameters[0]). predictions = arma::trans(arma::trans( parameters.subvec(1, parameters.n_elem - 1)) * points); // Now add the intercept. predictions += parameters(0); } //! Compute the L2 squared error on the given predictors and responses. double LinearRegression::ComputeError(const arma::mat& predictors, const arma::vec& responses) const { // Get the number of columns and rows of the dataset. const size_t nCols = predictors.n_cols; const size_t nRows = predictors.n_rows; // Ensure that we have the correct number of dimensions in the dataset. if (nRows != parameters.n_rows - 1) { Rcpp::Rcout << "The test data must have the same number of columns as the " "training file." << std::endl; } // Calculate the differences between actual responses and predicted responses. // We must also add the intercept (parameters(0)) to the predictions. arma::vec temp = responses - arma::trans( (arma::trans(parameters.subvec(1, parameters.n_elem - 1)) * predictors) + parameters(0)); const double cost = arma::dot(temp, temp) / nCols; return cost; } std::string LinearRegression::ToString() const { std::ostringstream convert; convert << "Linear Regression [" << this << "]" << std::endl; convert << " Lambda: " << lambda << std::endl; return convert.str(); } RcppMLPACK/src/mlpack/methods/emst/0000755000176200001440000000000013647512751016532 5ustar liggesusersRcppMLPACK/src/mlpack/methods/emst/dtb_rules.hpp0000644000176200001440000001460613647512751021235 0ustar liggesusers/** * @file dtb.hpp * @author Bill March (march@gatech.edu) * * Tree traverser rules for the DualTreeBoruvka algorithm. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_EMST_DTB_RULES_HPP #define __MLPACK_METHODS_EMST_DTB_RULES_HPP #include #include "../neighbor_search/ns_traversal_info.hpp" namespace mlpack { namespace emst { template class DTBRules { public: DTBRules(const arma::mat& dataSet, UnionFind& connections, arma::vec& neighborsDistances, arma::Col& neighborsInComponent, arma::Col& neighborsOutComponent, MetricType& metric); double BaseCase(const size_t queryIndex, const size_t referenceIndex); /** * Get the score for recursion order. A low score indicates priority for * recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). * * @param queryIndex Index of query point. * @param referenceNode Candidate node to be recursed into. */ double Score(const size_t queryIndex, TreeType& referenceNode); /** * Get the score for recursion order, passing the base case result (in the * situation where it may be needed to calculate the recursion order). A low * score indicates priority for recursion, while DBL_MAX indicates that the * node should not be recursed into at all (it should be pruned). * * @param queryIndex Index of query point. 
   * @param referenceNode Candidate node to be recursed into.
   * @param baseCaseResult Result of BaseCase(queryIndex, referenceNode).
   */
  double Score(const size_t queryIndex,
               TreeType& referenceNode,
               const double baseCaseResult);

  /**
   * Re-evaluate the score for recursion order.  A low score indicates priority
   * for recursion, while DBL_MAX indicates that the node should not be
   * recursed into at all (it should be pruned).  This is used when the score
   * has already been calculated, but another recursion may have modified the
   * bounds for pruning.  So the old score is checked against the new pruning
   * bound.
   *
   * @param queryIndex Index of query point.
   * @param referenceNode Candidate node to be recursed into.
   * @param oldScore Old score produced by Score() (or Rescore()).
   */
  double Rescore(const size_t queryIndex,
                 TreeType& referenceNode,
                 const double oldScore);

  /**
   * Get the score for recursion order.  A low score indicates priority for
   * recursion, while DBL_MAX indicates that the node should not be recursed
   * into at all (it should be pruned).
   *
   * @param queryNode Candidate query node to recurse into.
   * @param referenceNode Candidate reference node to recurse into.
   */
  double Score(TreeType& queryNode, TreeType& referenceNode);

  /**
   * Get the score for recursion order, passing the base case result (in the
   * situation where it may be needed to calculate the recursion order).  A low
   * score indicates priority for recursion, while DBL_MAX indicates that the
   * node should not be recursed into at all (it should be pruned).
   *
   * @param queryNode Candidate query node to recurse into.
   * @param referenceNode Candidate reference node to recurse into.
   * @param baseCaseResult Result of BaseCase(queryIndex, referenceNode).
   */
  double Score(TreeType& queryNode,
               TreeType& referenceNode,
               const double baseCaseResult);

  /**
   * Re-evaluate the score for recursion order.  A low score indicates priority
   * for recursion, while DBL_MAX indicates that the node should not be
   * recursed into at all (it should be pruned).  This is used when the score
   * has already been calculated, but another recursion may have modified the
   * bounds for pruning.  So the old score is checked against the new pruning
   * bound.
   *
   * @param queryNode Candidate query node to recurse into.
   * @param referenceNode Candidate reference node to recurse into.
   * @param oldScore Old score produced by Score() (or Rescore()).
   */
  double Rescore(TreeType& queryNode,
                 TreeType& referenceNode,
                 const double oldScore) const;

  typedef neighbor::NeighborSearchTraversalInfo<TreeType> TraversalInfoType;

  const TraversalInfoType& TraversalInfo() const { return traversalInfo; }
  TraversalInfoType& TraversalInfo() { return traversalInfo; }

  //! Get the number of base cases performed.
  size_t BaseCases() const { return baseCases; }
  //! Modify the number of base cases performed.
  size_t& BaseCases() { return baseCases; }

  //! Get the number of node combinations that have been scored.
  size_t Scores() const { return scores; }
  //! Modify the number of node combinations that have been scored.
  size_t& Scores() { return scores; }

 private:
  //! The data points.
  const arma::mat& dataSet;

  //! Stores the tree structure so far.
  UnionFind& connections;

  //! The distance to the candidate nearest neighbor for each component.
  arma::vec& neighborsDistances;

  //! The index of the point in the component that is an endpoint of the
  //! candidate edge.
  arma::Col<size_t>& neighborsInComponent;

  //! The index of the point outside of the component that is an endpoint
  //! of the candidate edge.
  arma::Col<size_t>& neighborsOutComponent;

  //!
The instantiated metric. MetricType& metric; /** * Update the bound for the given query node. */ inline double CalculateBound(TreeType& queryNode) const; TraversalInfoType traversalInfo; //! The number of base cases calculated. size_t baseCases; //! The number of node combinations that have been scored. size_t scores; }; // class DTBRules } // emst namespace } // mlpack namespace #include "dtb_rules_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/emst/edge_pair.hpp0000644000176200001440000000462413647512751021170 0ustar liggesusers/** * @file edge_pair.hpp * * @author Bill March (march@gatech.edu) * * This file contains utilities necessary for all of the minimum spanning tree * algorithms. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_EMST_EDGE_PAIR_HPP #define __MLPACK_METHODS_EMST_EDGE_PAIR_HPP #include #include "union_find.hpp" namespace mlpack { namespace emst { /** * An edge pair is simply two indices and a distance. It is used as the * basic element of an edge list when computing a minimum spanning tree. */ class EdgePair { private: //! Lesser index. size_t lesser; //! Greater index. size_t greater; //! Distance between two indices. double distance; public: /** * Initialize an EdgePair with two indices and a distance. The indices are * called lesser and greater, implying that they be sorted before calling * Init. However, this is not necessary for functionality; it is just a way * to keep the edge list organized in other code. */ EdgePair(const size_t lesser, const size_t greater, const double dist) : lesser(lesser), greater(greater), distance(dist) { //Log::Assert(lesser != greater, //"EdgePair::EdgePair(): indices cannot be equal."); } //! Get the lesser index. size_t Lesser() const { return lesser; } //! Modify the lesser index. size_t& Lesser() { return lesser; } //! Get the greater index. size_t Greater() const { return greater; } //! Modify the greater index. size_t& Greater() { return greater; } //! Get the distance. double Distance() const { return distance; } //! Modify the distance. double& Distance() { return distance; } }; // class EdgePair }; // namespace emst }; // namespace mlpack #endif // __MLPACK_METHODS_EMST_EDGE_PAIR_HPP RcppMLPACK/src/mlpack/methods/emst/dtb_rules_impl.hpp0000644000176200001440000002203013647512751022244 0ustar liggesusers/** * @file dtb_impl.hpp * @author Bill March (march@gatech.edu) * * Tree traverser rules for the DualTreeBoruvka algorithm. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. 
See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_EMST_DTB_RULES_IMPL_HPP #define __MLPACK_METHODS_EMST_DTB_RULES_IMPL_HPP namespace mlpack { namespace emst { template DTBRules:: DTBRules(const arma::mat& dataSet, UnionFind& connections, arma::vec& neighborsDistances, arma::Col& neighborsInComponent, arma::Col& neighborsOutComponent, MetricType& metric) : dataSet(dataSet), connections(connections), neighborsDistances(neighborsDistances), neighborsInComponent(neighborsInComponent), neighborsOutComponent(neighborsOutComponent), metric(metric), baseCases(0), scores(0) { // Nothing else to do. } template inline force_inline double DTBRules::BaseCase(const size_t queryIndex, const size_t referenceIndex) { // Check if the points are in the same component at this iteration. // If not, return the distance between them. Also, store a better result as // the current neighbor, if necessary. double newUpperBound = -1.0; // Find the index of the component the query is in. size_t queryComponentIndex = connections.Find(queryIndex); size_t referenceComponentIndex = connections.Find(referenceIndex); if (queryComponentIndex != referenceComponentIndex) { ++baseCases; double distance = metric.Evaluate(dataSet.col(queryIndex), dataSet.col(referenceIndex)); if (distance < neighborsDistances[queryComponentIndex]) { //Log::Assert(queryIndex != referenceIndex); neighborsDistances[queryComponentIndex] = distance; neighborsInComponent[queryComponentIndex] = queryIndex; neighborsOutComponent[queryComponentIndex] = referenceIndex; } } if (newUpperBound < neighborsDistances[queryComponentIndex]) newUpperBound = neighborsDistances[queryComponentIndex]; //Log::Assert(newUpperBound >= 0.0); return newUpperBound; } template double DTBRules::Score(const size_t queryIndex, TreeType& referenceNode) { size_t queryComponentIndex = connections.Find(queryIndex); // If the query belongs to the same component as all of the references, // then prune. The cast is to stop a warning about comparing unsigned to // signed values. if (queryComponentIndex == (size_t) referenceNode.Stat().ComponentMembership()) return DBL_MAX; const arma::vec queryPoint = dataSet.unsafe_col(queryIndex); const double distance = referenceNode.MinDistance(queryPoint); // If all the points in the reference node are farther than the candidate // nearest neighbor for the query's component, we prune. return neighborsDistances[queryComponentIndex] < distance ? DBL_MAX : distance; } template double DTBRules::Score(const size_t queryIndex, TreeType& referenceNode, const double baseCaseResult) { // I don't really understand the last argument here // It just gets passed in the distance call, otherwise this function // is the same as the one above. size_t queryComponentIndex = connections.Find(queryIndex); // If the query belongs to the same component as all of the references, // then prune. if (queryComponentIndex == referenceNode.Stat().ComponentMembership()) return DBL_MAX; const arma::vec queryPoint = dataSet.unsafe_col(queryIndex); const double distance = referenceNode.MinDistance(queryPoint, baseCaseResult); // If all the points in the reference node are farther than the candidate // nearest neighbor for the query's component, we prune. return (neighborsDistances[queryComponentIndex] < distance) ? 
DBL_MAX : distance; } template double DTBRules::Rescore(const size_t queryIndex, TreeType& referenceNode, const double oldScore) { // We don't need to check component membership again, because it can't // change inside a single iteration. return (oldScore > neighborsDistances[connections.Find(queryIndex)]) ? DBL_MAX : oldScore; } template double DTBRules::Score(TreeType& queryNode, TreeType& referenceNode) { // If all the queries belong to the same component as all the references // then we prune. if ((queryNode.Stat().ComponentMembership() >= 0) && (queryNode.Stat().ComponentMembership() == referenceNode.Stat().ComponentMembership())) return DBL_MAX; ++scores; const double distance = queryNode.MinDistance(&referenceNode); const double bound = CalculateBound(queryNode); // If all the points in the reference node are farther than the candidate // nearest neighbor for all queries in the node, we prune. return (bound < distance) ? DBL_MAX : distance; } template double DTBRules::Score(TreeType& queryNode, TreeType& referenceNode, const double baseCaseResult) { // If all the queries belong to the same component as all the references // then we prune. if ((queryNode.Stat().ComponentMembership() >= 0) && (queryNode.Stat().ComponentMembership() == referenceNode.Stat().ComponentMembership())) return DBL_MAX; ++scores; const double distance = queryNode.MinDistance(referenceNode, baseCaseResult); const double bound = CalculateBound(queryNode); // If all the points in the reference node are farther than the candidate // nearest neighbor for all queries in the node, we prune. return (bound < distance) ? DBL_MAX : distance; } template double DTBRules::Rescore(TreeType& queryNode, TreeType& /* referenceNode */, const double oldScore) const { const double bound = CalculateBound(queryNode); return (oldScore > bound) ? DBL_MAX : oldScore; } // Calculate the bound for a given query node in its current state and update // it. template inline double DTBRules::CalculateBound( TreeType& queryNode) const { double worstPointBound = -DBL_MAX; double bestPointBound = DBL_MAX; double worstChildBound = -DBL_MAX; double bestChildBound = DBL_MAX; // Now, find the best and worst point bounds. for (size_t i = 0; i < queryNode.NumPoints(); ++i) { const size_t pointComponent = connections.Find(queryNode.Point(i)); const double bound = neighborsDistances[pointComponent]; if (bound > worstPointBound) worstPointBound = bound; if (bound < bestPointBound) bestPointBound = bound; } // Find the best and worst child bounds. for (size_t i = 0; i < queryNode.NumChildren(); ++i) { const double maxBound = queryNode.Child(i).Stat().MaxNeighborDistance(); if (maxBound > worstChildBound) worstChildBound = maxBound; const double minBound = queryNode.Child(i).Stat().MinNeighborDistance(); if (minBound < bestChildBound) bestChildBound = minBound; } // Now calculate the actual bounds. const double worstBound = std::max(worstPointBound, worstChildBound); const double bestBound = std::min(bestPointBound, bestChildBound); // We must check that bestBound != DBL_MAX; otherwise, we risk overflow. const double bestAdjustedBound = (bestBound == DBL_MAX) ? DBL_MAX : bestBound + 2 * queryNode.FurthestDescendantDistance(); // Update the relevant quantities in the node. 
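  // Explanatory comments on the update below (a sketch of the reasoning, as
  // this editor understands the bound).  "worstBound" alone is already a
  // valid pruning bound: no descendant's component needs a neighbor farther
  // away than the worst candidate distance found so far.  "bestAdjustedBound"
  // can tighten it via the triangle inequality: if some descendant p has a
  // candidate within bestBound, then for any other descendant q,
  //
  //   d(q, candidate) <= d(q, p) + d(p, candidate)
  //                   <= 2 * FurthestDescendantDistance() + bestBound,
  //
  // since p and q both lie within FurthestDescendantDistance() of the node.
  // For example, with worstPointBound = 3.0, bestBound = 1.0, and furthest
  // descendant distance 0.5, the stored bound is min(3.0, 1.0 + 2 * 0.5) = 2.0.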
queryNode.Stat().MaxNeighborDistance() = worstBound; queryNode.Stat().MinNeighborDistance() = bestBound; queryNode.Stat().Bound() = std::min(worstBound, bestAdjustedBound); return queryNode.Stat().Bound(); } }; // namespace emst }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/emst/union_find.hpp0000644000176200001440000000564113647512751021401 0ustar liggesusers/** * @file union_find.hpp * @author Bill March (march@gatech.edu) * * Implements a union-find data structure. This structure tracks the components * of a graph. Each point in the graph is initially in its own component. * Calling unionfind.Union(x, y) unites the components indexed by x and y. * unionfind.Find(x) returns the index of the component containing point x. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_EMST_UNION_FIND_HPP #define __MLPACK_METHODS_EMST_UNION_FIND_HPP #include namespace mlpack { namespace emst { /** * A Union-Find data structure. See Cormen, Rivest, & Stein for details. The * structure tracks the components of a graph. Each point in the graph is * initially in its own component. Calling Union(x, y) unites the components * indexed by x and y. Find(x) returns the index of the component containing * point x. */ class UnionFind { private: arma::Col parent; arma::ivec rank; public: //! Construct the object with the given size. UnionFind(const size_t size) : parent(size), rank(size) { for (size_t i = 0; i < size; ++i) { parent[i] = i; rank[i] = 0; } } //! Destroy the object (nothing to do). ~UnionFind() { } /** * Returns the component containing an element. * * @param x the component to be found * @return The index of the component containing x */ size_t Find(const size_t x) { if (parent[x] == x) { return x; } else { // This ensures that the tree has a small depth parent[x] = Find(parent[x]); return parent[x]; } } /** * Union the components containing x and y. * * @param x one component * @param y the other component */ void Union(const size_t x, const size_t y) { const size_t xRoot = Find(x); const size_t yRoot = Find(y); if (xRoot == yRoot) { return; } else if (rank[xRoot] == rank[yRoot]) { parent[yRoot] = parent[xRoot]; rank[xRoot] = rank[xRoot] + 1; } else if (rank[xRoot] > rank[yRoot]) { parent[yRoot] = xRoot; } else { parent[xRoot] = yRoot; } } }; // class UnionFind }; // namespace emst }; // namespace mlpack #endif // __MLPACK_METHODS_EMST_UNION_FIND_HPP RcppMLPACK/src/mlpack/methods/emst/dtb_impl.hpp0000644000176200001440000002261413647512751021042 0ustar liggesusers/** * @file dtb_impl.hpp * @author Bill March (march@gatech.edu) * * Implementation of DTB. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_EMST_DTB_IMPL_HPP #define __MLPACK_METHODS_EMST_DTB_IMPL_HPP #include "dtb_rules.hpp" namespace mlpack { namespace emst { //! Call the tree constructor that does mapping. template TreeType* BuildTree( typename TreeType::Mat& dataset, std::vector& oldFromNew, typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == true, TreeType* >::type = 0) { return new TreeType(dataset, oldFromNew); } //! Call the tree constructor that does not do mapping. template TreeType* BuildTree( const typename TreeType::Mat& dataset, const std::vector& /* oldFromNew */, const typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == false, TreeType* >::type = 0) { return new TreeType(dataset); } /** * Takes in a reference to the data set. Copies the data, builds the tree, * and initializes all of the member variables. */ template DualTreeBoruvka::DualTreeBoruvka( const typename TreeType::Mat& dataset, const bool naive, const MetricType metric) : data((tree::TreeTraits::RearrangesDataset && !naive) ? dataCopy : dataset), ownTree(!naive), naive(naive), connections(dataset.n_cols), totalDist(0.0), metric(metric) { //Timer::Start("emst/tree_building"); if (!naive) { // Copy the dataset, if it will be modified during tree construction. if (tree::TreeTraits::RearrangesDataset) dataCopy = dataset; tree = BuildTree(const_cast(data), oldFromNew); } //Timer::Stop("emst/tree_building"); edges.reserve(data.n_cols - 1); // Set size. neighborsInComponent.set_size(data.n_cols); neighborsOutComponent.set_size(data.n_cols); neighborsDistances.set_size(data.n_cols); neighborsDistances.fill(DBL_MAX); } // Constructor template DualTreeBoruvka::DualTreeBoruvka( TreeType* tree, const typename TreeType::Mat& dataset, const MetricType metric) : data(dataset), tree(tree), ownTree(false), naive(false), connections(data.n_cols), totalDist(0.0), metric(metric) { edges.reserve(data.n_cols - 1); // Fill with EdgePairs. neighborsInComponent.set_size(data.n_cols); neighborsOutComponent.set_size(data.n_cols); neighborsDistances.set_size(data.n_cols); neighborsDistances.fill(DBL_MAX); } template DualTreeBoruvka::~DualTreeBoruvka() { if (ownTree) delete tree; } /** * Iteratively find the nearest neighbor of each component until the MST is * complete. */ template void DualTreeBoruvka::ComputeMST(arma::mat& results) { //Timer::Start("emst/mst_computation"); totalDist = 0; // Reset distance. typedef DTBRules RuleType; RuleType rules(data, connections, neighborsDistances, neighborsInComponent, neighborsOutComponent, metric); while (edges.size() < (data.n_cols - 1)) { if (naive) { // Full O(N^2) traversal. for (size_t i = 0; i < data.n_cols; ++i) for (size_t j = 0; j < data.n_cols; ++j) rules.BaseCase(i, j); } else { typename TreeType::template DualTreeTraverser traverser(rules); traverser.Traverse(*tree, *tree); } AddAllEdges(); Cleanup(); Rcpp::Rcout << edges.size() << " edges found so far." << std::endl; if (!naive) { Rcpp::Rcout << rules.BaseCases() << " cumulative base cases." << std::endl; Rcpp::Rcout << rules.Scores() << " cumulative node combinations scored." 
<< std::endl; } } //Timer::Stop("emst/mst_computation"); EmitResults(results); Rcpp::Rcout << "Total spanning tree length: " << totalDist << std::endl; } /** * Adds a single edge to the edge list */ template void DualTreeBoruvka::AddEdge(const size_t e1, const size_t e2, const double distance) { //Log::Assert((distance >= 0.0), // "DualTreeBoruvka::AddEdge(): distance cannot be negative."); if (e1 < e2) edges.push_back(EdgePair(e1, e2, distance)); else edges.push_back(EdgePair(e2, e1, distance)); } // AddEdge /** * Adds all the edges found in one iteration to the list of neighbors. */ template void DualTreeBoruvka::AddAllEdges() { for (size_t i = 0; i < data.n_cols; i++) { size_t component = connections.Find(i); size_t inEdge = neighborsInComponent[component]; size_t outEdge = neighborsOutComponent[component]; if (connections.Find(inEdge) != connections.Find(outEdge)) { //totalDist = totalDist + dist; // changed to make this agree with the cover tree code totalDist += neighborsDistances[component]; AddEdge(inEdge, outEdge, neighborsDistances[component]); connections.Union(inEdge, outEdge); } } } // AddAllEdges /** * Unpermute the edge list (if necessary) and output it to results. */ template void DualTreeBoruvka::EmitResults(arma::mat& results) { // Sort the edges. std::sort(edges.begin(), edges.end(), SortFun); //Log::Assert(edges.size() == data.n_cols - 1); results.set_size(3, edges.size()); // Need to unpermute the point labels. if (!naive && ownTree && tree::TreeTraits::RearrangesDataset) { for (size_t i = 0; i < (data.n_cols - 1); i++) { // Make sure the edge list stores the smaller index first to // make checking correctness easier size_t ind1 = oldFromNew[edges[i].Lesser()]; size_t ind2 = oldFromNew[edges[i].Greater()]; if (ind1 < ind2) { edges[i].Lesser() = ind1; edges[i].Greater() = ind2; } else { edges[i].Lesser() = ind2; edges[i].Greater() = ind1; } results(0, i) = edges[i].Lesser(); results(1, i) = edges[i].Greater(); results(2, i) = edges[i].Distance(); } } else { for (size_t i = 0; i < edges.size(); i++) { results(0, i) = edges[i].Lesser(); results(1, i) = edges[i].Greater(); results(2, i) = edges[i].Distance(); } } } // EmitResults /** * This function resets the values in the nodes of the tree nearest neighbor * distance and checks for fully connected nodes. */ template void DualTreeBoruvka::CleanupHelper(TreeType* tree) { // Reset the statistic information. tree->Stat().MaxNeighborDistance() = DBL_MAX; tree->Stat().MinNeighborDistance() = DBL_MAX; tree->Stat().Bound() = DBL_MAX; // Recurse into all children. for (size_t i = 0; i < tree->NumChildren(); ++i) CleanupHelper(&tree->Child(i)); // Get the component of the first child or point. Then we will check to see // if all other components of children and points are the same. const int component = (tree->NumChildren() != 0) ? tree->Child(0).Stat().ComponentMembership() : connections.Find(tree->Point(0)); // Check components of children. for (size_t i = 0; i < tree->NumChildren(); ++i) if (tree->Child(i).Stat().ComponentMembership() != component) return; // Check components of points. for (size_t i = 0; i < tree->NumPoints(); ++i) if (connections.Find(tree->Point(i)) != size_t(component)) return; // If we made it this far, all components are the same. tree->Stat().ComponentMembership() = component; } /** * The values stored in the tree must be reset on each iteration. 
*/ template void DualTreeBoruvka::Cleanup() { for (size_t i = 0; i < data.n_cols; i++) neighborsDistances[i] = DBL_MAX; if (!naive) CleanupHelper(tree); } // convert the object to a string template std::string DualTreeBoruvka::ToString() const { std::ostringstream convert; convert << "DualTreeBoruvka [" << this << "]" << std::endl; convert << " Data: " << data.n_rows << "x" << data.n_cols <. */ #ifndef __MLPACK_METHODS_EMST_DTB_HPP #define __MLPACK_METHODS_EMST_DTB_HPP #include "dtb_stat.hpp" #include "edge_pair.hpp" #include #include #include namespace mlpack { namespace emst /** Euclidean Minimum Spanning Trees. */ { /** * Performs the MST calculation using the Dual-Tree Boruvka algorithm, using any * type of tree. * * For more information on the algorithm, see the following citation: * * @code * @inproceedings{ * author = {March, W.B., Ram, P., and Gray, A.G.}, * title = {{Fast Euclidean Minimum Spanning Tree: Algorithm, Analysis, * Applications.}}, * booktitle = {Proceedings of the 16th ACM SIGKDD International Conference * on Knowledge Discovery and Data Mining} * series = {KDD 2010}, * year = {2010} * } * @endcode * * General usage of this class might be like this: * * @code * extern arma::mat data; // We want to find the MST of this dataset. * DualTreeBoruvka<> dtb(data); // Create the tree with default options. * * // Find the MST. * arma::mat mstResults; * dtb.ComputeMST(mstResults); * @endcode * * More advanced usage of the class can use different types of trees, pass in an * already-built tree, or compute the MST using the O(n^2) naive algorithm. * * @tparam MetricType The metric to use. IMPORTANT: this hasn't really been * tested with anything other than the L2 metric, so user beware. Note that the * tree type needs to compute bounds using the same metric as the type * specified here. * @tparam TreeType Type of tree to use. Should use DTBStat as a statistic. */ template< typename MetricType = metric::EuclideanDistance, typename TreeType = tree::BinarySpaceTree, DTBStat> > class DualTreeBoruvka { private: //! Copy of the data (if necessary). typename TreeType::Mat dataCopy; //! Reference to the data (this is what should be used for accessing data). const typename TreeType::Mat& data; //! Pointer to the root of the tree. TreeType* tree; //! Indicates whether or not we "own" the tree. bool ownTree; //! Indicates whether or not O(n^2) naive mode will be used. bool naive; //! Edges. std::vector edges; // We must use vector with non-numerical types. //! Connections. UnionFind connections; //! Permutations of points during tree building. std::vector oldFromNew; //! List of edge nodes. arma::Col neighborsInComponent; //! List of edge nodes. arma::Col neighborsOutComponent; //! List of edge distances. arma::vec neighborsDistances; //! Total distance of the tree. double totalDist; //! The instantiated metric. MetricType metric; //! For sorting the edge list after the computation. struct SortEdgesHelper { bool operator()(const EdgePair& pairA, const EdgePair& pairB) { return (pairA.Distance() < pairB.Distance()); } } SortFun; public: /** * Create the tree from the given dataset. This copies the dataset to an * internal copy, because tree-building modifies the dataset. * * @param data Dataset to build a tree for. * @param naive Whether the computation should be done in O(n^2) naive mode. * @param leafSize The leaf size to be used during tree construction. 
*/ DualTreeBoruvka(const typename TreeType::Mat& dataset, const bool naive = false, const MetricType metric = MetricType()); /** * Create the DualTreeBoruvka object with an already initialized tree. This * will not copy the dataset, and can save a little processing power. Naive * mode is not available as an option for this constructor; instead, to run * naive computation, construct a tree with all the points in one leaf (i.e. * leafSize = number of points). * * @note * Because tree-building (at least with BinarySpaceTree) modifies the ordering * of a matrix, be sure you pass the modified matrix to this object! In * addition, mapping the points of the matrix back to their original indices * is not done when this constructor is used. * @endnote * * @param tree Pre-built tree. * @param dataset Dataset corresponding to the pre-built tree. */ DualTreeBoruvka(TreeType* tree, const typename TreeType::Mat& dataset, const MetricType metric = MetricType()); /** * Delete the tree, if it was created inside the object. */ ~DualTreeBoruvka(); /** * Iteratively find the nearest neighbor of each component until the MST is * complete. The results will be a 3xN matrix (with N equal to the number of * edges in the minimum spanning tree). The first row will contain the lesser * index of the edge; the second row will contain the greater index of the * edge; and the third row will contain the distance between the two edges. * * @param results Matrix which results will be stored in. */ void ComputeMST(arma::mat& results); /** * Returns a string representation of this object. */ std::string ToString() const; private: /** * Adds a single edge to the edge list */ void AddEdge(const size_t e1, const size_t e2, const double distance); /** * Adds all the edges found in one iteration to the list of neighbors. */ void AddAllEdges(); /** * Unpermute the edge list and output it to results. */ void EmitResults(arma::mat& results); /** * This function resets the values in the nodes of the tree nearest neighbor * distance, and checks for fully connected nodes. */ void CleanupHelper(TreeType* tree); /** * The values stored in the tree must be reset on each iteration. */ void Cleanup(); }; // class DualTreeBoruvka }; // namespace emst }; // namespace mlpack #include "dtb_impl.hpp" #endif // __MLPACK_METHODS_EMST_DTB_HPP RcppMLPACK/src/mlpack/methods/emst/dtb_stat.hpp0000644000176200001440000000675113647512751021060 0ustar liggesusers/** * @file dtb.hpp * @author Bill March (march@gatech.edu) * * DTBStat is the StatisticType used by trees when performing EMST. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_EMST_DTB_STAT_HPP #define __MLPACK_METHODS_EMST_DTB_STAT_HPP #include namespace mlpack { namespace emst { /** * A statistic for use with MLPACK trees, which stores the upper bound on * distance to nearest neighbors and the component which this node belongs to. */ class DTBStat { private: //! 
Upper bound on the distance to the nearest neighbor of any point in this //! node. double maxNeighborDistance; //! Lower bound on the distance to the nearest neighbor of any point in this //! node. double minNeighborDistance; //! Total bound for pruning. double bound; //! The index of the component that all points in this node belong to. This //! is the same index returned by UnionFind for all points in this node. If //! points in this node are in different components, this value will be //! negative. int componentMembership; public: /** * A generic initializer. Sets the maximum neighbor distance to its default, * and the component membership to -1 (no component). */ DTBStat() : maxNeighborDistance(DBL_MAX), minNeighborDistance(DBL_MAX), bound(DBL_MAX), componentMembership(-1) { } /** * This is called when a node is finished initializing. We set the maximum * neighbor distance to its default, and if possible, we set the component * membership of the node (if it has only one point and no children). * * @param node Node that has been finished. */ template DTBStat(const TreeType& node) : maxNeighborDistance(DBL_MAX), minNeighborDistance(DBL_MAX), bound(DBL_MAX), componentMembership( ((node.NumPoints() == 1) && (node.NumChildren() == 0)) ? node.Point(0) : -1) { } //! Get the maximum neighbor distance. double MaxNeighborDistance() const { return maxNeighborDistance; } //! Modify the maximum neighbor distance. double& MaxNeighborDistance() { return maxNeighborDistance; } //! Get the minimum neighbor distance. double MinNeighborDistance() const { return minNeighborDistance; } //! Modify the minimum neighbor distance. double& MinNeighborDistance() { return minNeighborDistance; } //! Get the total bound for pruning. double Bound() const { return bound; } //! Modify the total bound for pruning. double& Bound() { return bound; } //! Get the component membership of this node. int ComponentMembership() const { return componentMembership; } //! Modify the component membership of this node. int& ComponentMembership() { return componentMembership; } }; // class DTBStat }; // namespace emst }; // namespace mlpack #endif // __MLPACK_METHODS_EMST_DTB_STAT_HPP RcppMLPACK/src/mlpack/methods/lsh/0000755000176200001440000000000013647512751016350 5ustar liggesusersRcppMLPACK/src/mlpack/methods/lsh/lsh_search.hpp0000644000176200001440000002366013647512751021203 0ustar liggesusers/** * @file lsh_search.hpp * @author Parikshit Ram * * Defines the LSHSearch class, which performs an approximate * nearest neighbor search for a queries in a query set * over a given dataset using Locality-sensitive hashing * with 2-stable distributions. * * The details of this method can be found in the following paper: * * @inproceedings{datar2004locality, * title={Locality-sensitive hashing scheme based on p-stable distributions}, * author={Datar, M. and Immorlica, N. and Indyk, P. and Mirrokni, V.S.}, * booktitle= * {Proceedings of the 12th Annual Symposium on Computational Geometry}, * pages={253--262}, * year={2004}, * organization={ACM} * } * * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. 
See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_LSH_SEARCH_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_LSH_SEARCH_HPP #include #include #include #include #include namespace mlpack { namespace neighbor { /** * The LSHSearch class -- This class builds a hash on the reference set * and uses this hash to compute the distance-approximate nearest-neighbors * of the given queries. * * @tparam SortPolicy The sort policy for distances; see NearestNeighborSort. */ template class LSHSearch { public: /** * This function initializes the LSH class. It builds the hash on the * reference set with 2-stable distributions. See the individual functions * performing the hashing for details on how the hashing is done. * * @param referenceSet Set of reference points. * @param querySet Set of query points. * @param numProj Number of projections in each hash table (anything between * 10-50 might be a decent choice). * @param numTables Total number of hash tables (anything between 10-20 * should suffice). * @param hashWidth The width of hash for every table. If 0 (the default) is * provided, then the hash width is automatically obtained by computing * the average pairwise distance of 25 pairs. This should be a reasonable * upper bound on the nearest-neighbor distance in general. * @param secondHashSize The size of the second hash table. This should be a * large prime number. * @param bucketSize The size of the bucket in the second hash table. This is * the maximum number of points that can be hashed into single bucket. * Default values are already provided here. */ LSHSearch(const arma::mat& referenceSet, const arma::mat& querySet, const size_t numProj, const size_t numTables, const double hashWidth = 0.0, const size_t secondHashSize = 99901, const size_t bucketSize = 500); /** * This function initializes the LSH class. It builds the hash on the * reference set with 2-stable distributions. See the individual functions * performing the hashing for details on how the hashing is done. * * @param referenceSet Set of reference points and the set of queries. * @param numProj Number of projections in each hash table (anything between * 10-50 might be a decent choice). * @param numTables Total number of hash tables (anything between 10-20 * should suffice). * @param hashWidth The width of hash for every table. If 0 (the default) is * provided, then the hash width is automatically obtained by computing * the average pairwise distance of 25 pairs. This should be a reasonable * upper bound on the nearest-neighbor distance in general. * @param secondHashSize The size of the second hash table. This should be a * large prime number. * @param bucketSize The size of the bucket in the second hash table. This is * the maximum number of points that can be hashed into single bucket. * Default values are already provided here. */ LSHSearch(const arma::mat& referenceSet, const size_t numProj, const size_t numTables, const double hashWidth = 0.0, const size_t secondHashSize = 99901, const size_t bucketSize = 500); /** * Compute the nearest neighbors and store the output in the given matrices. * The matrices will be set to the size of n columns by k rows, where n is * the number of points in the query dataset and k is the number of neighbors * being searched for. * * @param k Number of neighbors to search for. 
* @param resultingNeighbors Matrix storing lists of neighbors for each query * point. * @param distances Matrix storing distances of neighbors for each query * point. * @param numTablesToSearch This parameter allows the user to have control * over the number of hash tables to be searched. This allows * the user to pick the number of tables it can afford for the time * available without having to build hashing for every table size. * By default, this is set to zero in which case all tables are * considered. */ void Search(const size_t k, arma::Mat& resultingNeighbors, arma::mat& distances, const size_t numTablesToSearch = 0); // Returns a string representation of this object. std::string ToString() const; private: /** * This function builds a hash table with two levels of hashing as presented * in the paper. This function first hashes the points with 'numProj' random * projections to a single hash table creating (key, point ID) pairs where the * key is a 'numProj'-dimensional integer vector. * * Then each key in this hash table is hashed into a second hash table using a * standard hash. * * This function does not have any parameters and relies on parameters which * are private members of this class, intialized during the class * intialization. */ void BuildHash(); /** * This function takes a query and hashes it into each of the hash tables to * get keys for the query and then the key is hashed to a bucket of the second * hash table and all the points (if any) in those buckets are collected as * the potential neighbor candidates. * * @param queryIndex The index of the query currently being processed. * @param referenceIndices The list of neighbor candidates obtained from * hashing the query into all the hash tables and eventually into * multiple buckets of the second hash table. */ void ReturnIndicesFromTable(const size_t queryIndex, arma::uvec& referenceIndices, size_t numTablesToSearch); /** * This is a helper function that computes the distance of the query to the * neighbor candidates and appropriately stores the best 'k' candidates * * @param queryIndex The index of the query in question * @param referenceIndex The index of the neighbor candidate in question */ double BaseCase(const size_t queryIndex, const size_t referenceIndex); /** * This is a helper function that efficiently inserts better neighbor * candidates into an existing set of neighbor candidates. This function is * only called by the 'BaseCase' function. * * @param queryIndex This is the index of the query being processed currently * @param pos The position of the neighbor candidate in the current list of * neighbor candidates. * @param neighbor The neighbor candidate that is being inserted into the list * of the best 'k' candidates for the query in question. * @param distance The distance of the query to the neighbor candidate. */ void InsertNeighbor(const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance); //! Reference dataset. const arma::mat& referenceSet; //! Query dataset (may not be given). const arma::mat& querySet; //! The number of projections const size_t numProj; //! The number of hash tables const size_t numTables; //! The std::vector containing the projection matrix of each table std::vector projections; // should be [numProj x dims] x numTables //! The list of the offset 'b' for each of the projection for each table arma::mat offsets; // should be numProj x numTables //! The hash width double hashWidth; //! 
The big prime representing the size of the second hash const size_t secondHashSize; //! The weights of the second hash arma::vec secondHashWeights; //! The bucket size of the second hash const size_t bucketSize; //! Instantiation of the metric. metric::SquaredEuclideanDistance metric; //! The final hash table; should be (< secondHashSize) x bucketSize. arma::Mat secondHashTable; //! The number of elements present in each hash bucket; should be //! secondHashSize. arma::Col bucketContentSize; //! For a particular hash value, points to the row in secondHashTable //! corresponding to this value. Should be secondHashSize. arma::Col bucketRowInHashTable; //! The pointer to the nearest neighbor distances. arma::mat* distancePtr; //! The pointer to the nearest neighbor indices. arma::Mat* neighborPtr; }; // class LSHSearch }; // namespace neighbor }; // namespace mlpack // Include implementation. #include "lsh_search_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/lsh/lsh_search_impl.hpp0000644000176200001440000003567513647512751022235 0ustar liggesusers/** * @file lsh_search_impl.hpp * @author Parikshit Ram * * Implementation of the LSHSearch class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_LSH_SEARCH_IMPL_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_LSH_SEARCH_IMPL_HPP #include namespace mlpack { namespace neighbor { // Construct the object. template LSHSearch:: LSHSearch(const arma::mat& referenceSet, const arma::mat& querySet, const size_t numProj, const size_t numTables, const double hashWidthIn, const size_t secondHashSize, const size_t bucketSize) : referenceSet(referenceSet), querySet(querySet), numProj(numProj), numTables(numTables), hashWidth(hashWidthIn), secondHashSize(secondHashSize), bucketSize(bucketSize) { if (hashWidth == 0.0) // The user has not provided any value. { // Compute a heuristic hash width from the data. for (size_t i = 0; i < 25; i++) { size_t p1 = (size_t) math::RandInt(referenceSet.n_cols); size_t p2 = (size_t) math::RandInt(referenceSet.n_cols); hashWidth += std::sqrt(metric.Evaluate(referenceSet.unsafe_col(p1), referenceSet.unsafe_col(p2))); } hashWidth /= 25; } Rcpp::Rcout << "Hash width chosen as: " << hashWidth << std::endl; BuildHash(); } template LSHSearch:: LSHSearch(const arma::mat& referenceSet, const size_t numProj, const size_t numTables, const double hashWidthIn, const size_t secondHashSize, const size_t bucketSize) : referenceSet(referenceSet), querySet(referenceSet), numProj(numProj), numTables(numTables), hashWidth(hashWidthIn), secondHashSize(secondHashSize), bucketSize(bucketSize) { if (hashWidth == 0.0) // The user has not provided any value. { // Compute a heuristic hash width from the data. 
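    // Explanatory comments (not authoritative): the loop below estimates a
    // reasonable hash width as the mean distance between 25 randomly sampled
    // pairs of reference points,
    //
    //   hashWidth ~= (1 / 25) * sum_i d(x_{p1(i)}, x_{p2(i)}),
    //
    // so the width is on the scale of typical pairwise distances.  A width
    // far smaller than that scatters true neighbors across buckets, while one
    // far larger lumps most of the dataset into a single bucket.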
for (size_t i = 0; i < 25; i++) { size_t p1 = (size_t) math::RandInt(referenceSet.n_cols); size_t p2 = (size_t) math::RandInt(referenceSet.n_cols); hashWidth += std::sqrt(metric.Evaluate(referenceSet.unsafe_col(p1), referenceSet.unsafe_col(p2))); } hashWidth /= 25; } Rcpp::Rcout << "Hash width chosen as: " << hashWidth << std::endl; BuildHash(); } template void LSHSearch:: InsertNeighbor(const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance) { // We only memmove() if there is actually a need to shift something. if (pos < (distancePtr->n_rows - 1)) { int len = (distancePtr->n_rows - 1) - pos; memmove(distancePtr->colptr(queryIndex) + (pos + 1), distancePtr->colptr(queryIndex) + pos, sizeof(double) * len); memmove(neighborPtr->colptr(queryIndex) + (pos + 1), neighborPtr->colptr(queryIndex) + pos, sizeof(size_t) * len); } // Now put the new information in the right index. (*distancePtr)(pos, queryIndex) = distance; (*neighborPtr)(pos, queryIndex) = neighbor; } template inline force_inline double LSHSearch:: BaseCase(const size_t queryIndex, const size_t referenceIndex) { // If the datasets are the same, then this search is only using one dataset // and we should not return identical points. if ((&querySet == &referenceSet) && (queryIndex == referenceIndex)) return 0.0; double distance = metric.Evaluate(querySet.unsafe_col(queryIndex), referenceSet.unsafe_col(referenceIndex)); // If this distance is better than any of the current candidates, the // SortDistance() function will give us the position to insert it into. arma::vec queryDist = distancePtr->unsafe_col(queryIndex); arma::Col queryIndices = neighborPtr->unsafe_col(queryIndex); size_t insertPosition = SortPolicy::SortDistance(queryDist, queryIndices, distance); // SortDistance() returns (size_t() - 1) if we shouldn't add it. if (insertPosition != (size_t() - 1)) InsertNeighbor(queryIndex, insertPosition, referenceIndex, distance); return distance; } template void LSHSearch:: ReturnIndicesFromTable(const size_t queryIndex, arma::uvec& referenceIndices, size_t numTablesToSearch) { // Decide on the number of tables to look into. if (numTablesToSearch == 0) // If no user input is given, search all. numTablesToSearch = numTables; // Sanity check to make sure that the existing number of tables is not // exceeded. if (numTablesToSearch > numTables) numTablesToSearch = numTables; // Hash the query in each of the 'numTablesToSearch' hash tables using the // 'numProj' projections for each table. This gives us 'numTablesToSearch' // keys for the query where each key is a 'numProj' dimensional integer // vector. // Compute the projection of the query in each table. arma::mat allProjInTables(numProj, numTablesToSearch); for (size_t i = 0; i < numTablesToSearch; i++) { allProjInTables.unsafe_col(i) = projections[i].t() * querySet.unsafe_col(queryIndex); } allProjInTables += offsets.cols(0, numTablesToSearch - 1); allProjInTables /= hashWidth; // Compute the hash value of each key of the query into a bucket of the // 'secondHashTable' using the 'secondHashWeights'. arma::rowvec hashVec = secondHashWeights.t() * arma::floor(allProjInTables); for (size_t i = 0; i < hashVec.n_elem; i++) hashVec[i] = (double) ((size_t) hashVec[i] % secondHashSize); //Log::Assert(hashVec.n_elem == numTablesToSearch); // For all the buckets that the query is hashed into, sequentially // collect the indices in those buckets. 
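  // Explanatory comments: the candidate set built below is the union, over
  // the searched tables, of all reference points whose second-level bucket
  // matches the query's.  'refPointsConsidered' acts as a histogram over
  // reference indices: each collision increments a counter, and arma::find()
  // then extracts every index that collided in at least one table, so a point
  // colliding in several tables is still returned only once.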
arma::Col refPointsConsidered; refPointsConsidered.zeros(referenceSet.n_cols); for (size_t i = 0; i < hashVec.n_elem; i++) // For all tables. { size_t hashInd = (size_t) hashVec[i]; if (bucketContentSize[hashInd] > 0) { // Pick the indices in the bucket corresponding to 'hashInd'. size_t tableRow = bucketRowInHashTable[hashInd]; assert(tableRow < secondHashSize); assert(tableRow < secondHashTable.n_rows); for (size_t j = 0; j < bucketContentSize[hashInd]; j++) refPointsConsidered[secondHashTable(tableRow, j)]++; } } referenceIndices = arma::find(refPointsConsidered > 0); } template void LSHSearch:: Search(const size_t k, arma::Mat& resultingNeighbors, arma::mat& distances, const size_t numTablesToSearch) { neighborPtr = &resultingNeighbors; distancePtr = &distances; // Set the size of the neighbor and distance matrices. neighborPtr->set_size(k, querySet.n_cols); distancePtr->set_size(k, querySet.n_cols); distancePtr->fill(SortPolicy::WorstDistance()); neighborPtr->fill(referenceSet.n_cols); size_t avgIndicesReturned = 0; //Timer::Start("computing_neighbors"); // Go through every query point sequentially. for (size_t i = 0; i < querySet.n_cols; i++) { // Hash every query into every hash table and eventually into the // 'secondHashTable' to obtain the neighbor candidates. arma::uvec refIndices; ReturnIndicesFromTable(i, refIndices, numTablesToSearch); // An informative book-keeping for the number of neighbor candidates // returned on average. avgIndicesReturned += refIndices.n_elem; // Sequentially go through all the candidates and save the best 'k' // candidates. for (size_t j = 0; j < refIndices.n_elem; j++) BaseCase(i, (size_t) refIndices[j]); } //Timer::Stop("computing_neighbors"); avgIndicesReturned /= querySet.n_cols; Rcpp::Rcout << avgIndicesReturned << " distinct indices returned on average." << std::endl; } template void LSHSearch:: BuildHash() { // The first level hash for a single table outputs a 'numProj'-dimensional // integer key for each point in the set -- (key, pointID) // The key creation details are presented below // // The second level hash is performed by hashing the key to // an integer in the range [0, 'secondHashSize'). // // This is done by creating a weight vector 'secondHashWeights' of // length 'numProj' with each entry an integer randomly chosen // between [0, 'secondHashSize'). // // Then the bucket for any key and its corresponding point is // given by % 'secondHashSize' // and the corresponding point ID is put into that bucket. // Step I: Prepare the second level hash. // Obtain the weights for the second hash. secondHashWeights = arma::floor(arma::randu(numProj) * (double) secondHashSize); // The 'secondHashTable' is initially an empty matrix of size // ('secondHashSize' x 'bucketSize'). But by only filling the buckets // as points land in them allows us to shrink the size of the // 'secondHashTable' at the end of the hashing. // Fill the second hash table n = referenceSet.n_cols. This is because no // point has index 'n' so the presence of this in the bucket denotes that // there are no more points in this bucket. secondHashTable.set_size(secondHashSize, bucketSize); secondHashTable.fill(referenceSet.n_cols); // Keep track of the size of each bucket in the hash. At the end of hashing // most buckets will be empty. bucketContentSize.zeros(secondHashSize); // Instead of putting the points in the row corresponding to the bucket, we // chose the next empty row and keep track of the row in which the bucket // lies. 
This allows us to stack together and slice out the empty buckets at // the end of the hashing. bucketRowInHashTable.set_size(secondHashSize); bucketRowInHashTable.fill(secondHashSize); // Keep track of number of non-empty rows in the 'secondHashTable'. size_t numRowsInTable = 0; // Step II: The offsets for all projections in all tables. // Since the 'offsets' are in [0, hashWidth], we obtain the 'offsets' // as randu(numProj, numTables) * hashWidth. offsets.randu(numProj, numTables); offsets *= hashWidth; // Step III: Create each hash table in the first level hash one by one and // putting them directly into the 'secondHashTable' for memory efficiency. for (size_t i = 0; i < numTables; i++) { // Step IV: Obtain the 'numProj' projections for each table. // For L2 metric, 2-stable distributions are used, and // the normal Z ~ N(0, 1) is a 2-stable distribution. arma::mat projMat; projMat.randn(referenceSet.n_rows, numProj); // Save the projection matrix for querying. projections.push_back(projMat); // Step V: create the 'numProj'-dimensional key for each point in each // table. // The following code performs the task of hashing each point to a // 'numProj'-dimensional integer key. Hence you get a ('numProj' x // 'referenceSet.n_cols') key matrix. // // For a single table, let the 'numProj' projections be denoted by 'proj_i' // and the corresponding offset be 'offset_i'. Then the key of a single // point is obtained as: // key = { floor( ( + offset_i) / 'hashWidth' ) forall i } arma::mat offsetMat = arma::repmat(offsets.unsafe_col(i), 1, referenceSet.n_cols); arma::mat hashMat = projMat.t() * referenceSet; hashMat += offsetMat; hashMat /= hashWidth; // Step VI: Putting the points in the 'secondHashTable' by hashing the key. // Now we hash every key, point ID to its corresponding bucket. arma::rowvec secondHashVec = secondHashWeights.t() * arma::floor(hashMat); // This gives us the bucket for the corresponding point ID. for (size_t j = 0; j < secondHashVec.n_elem; j++) secondHashVec[j] = (double)((size_t) secondHashVec[j] % secondHashSize); //Log::Assert(secondHashVec.n_elem == referenceSet.n_cols); // Insert the point in the corresponding row to its bucket in the // 'secondHashTable'. for (size_t j = 0; j < secondHashVec.n_elem; j++) { // This is the bucket number. size_t hashInd = (size_t) secondHashVec[j]; // The point ID is 'j'. // If this is currently an empty bucket, start a new row keep track of // which row corresponds to the bucket. if (bucketContentSize[hashInd] == 0) { // Start a new row for hash. bucketRowInHashTable[hashInd] = numRowsInTable; secondHashTable(numRowsInTable, 0) = j; numRowsInTable++; } else { // If bucket is already present in the 'secondHashTable', find the // corresponding row and insert the point ID in this row unless the // bucket is full, in which case, do nothing. if (bucketContentSize[hashInd] < bucketSize) secondHashTable(bucketRowInHashTable[hashInd], bucketContentSize[hashInd]) = j; } // Increment the count of the points in this bucket. if (bucketContentSize[hashInd] < bucketSize) bucketContentSize[hashInd]++; } // Loop over all points in the reference set. } // Loop over tables. // Step VII: Condensing the 'secondHashTable'. 
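  // Explanatory comments: the table was allocated pessimistically as
  // (secondHashSize x bucketSize), but only 'numRowsInTable' rows were ever
  // started and no bucket holds more than the largest observed count.
  // Finding that maximum and resizing to (numRowsInTable x maxBucketSize)
  // slices away the unused space, which is typically the overwhelming
  // majority of the allocation.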
size_t maxBucketSize = 0; for (size_t i = 0; i < bucketContentSize.n_elem; i++) if (bucketContentSize[i] > maxBucketSize) maxBucketSize = bucketContentSize[i]; Rcpp::Rcout << "Final hash table size: (" << numRowsInTable << " x " << maxBucketSize << ")" << std::endl; secondHashTable.resize(numRowsInTable, maxBucketSize); } template std::string LSHSearch::ToString() const { std::ostringstream convert; convert << "LSHSearch [" << this << "]" << std::endl; convert << " Reference Set: " << referenceSet.n_rows << "x" ; convert << referenceSet.n_cols << std::endl; if (&referenceSet != &querySet) convert << " QuerySet: " << querySet.n_rows << "x" << querySet.n_cols << std::endl; convert << " Number of Projections: " << numProj << std::endl; convert << " Number of Tables: " << numTables << std::endl; convert << " Hash Width: " << hashWidth << std::endl; convert << " Metric: " << std::endl; convert << mlpack::util::Indent(metric.ToString(),2); return convert.str(); } }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/neighbor_search/0000755000176200001440000000000013647514343020703 5ustar liggesusersRcppMLPACK/src/mlpack/methods/neighbor_search/neighbor_search_rules.hpp0000644000176200001440000001476613647512751025767 0ustar liggesusers/** * @file neighbor_search_rules.hpp * @author Ryan Curtin * * Defines the pruning rules and base case rules necessary to perform a * tree-based search (with an arbitrary tree) for the NeighborSearch class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_RULES_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_RULES_HPP #include "ns_traversal_info.hpp" namespace mlpack { namespace neighbor { template class NeighborSearchRules { public: NeighborSearchRules(const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, arma::Mat& neighbors, arma::mat& distances, MetricType& metric); /** * Get the distance from the query point to the reference point. * This will update the "neighbor" matrix with the new point if appropriate * and will track the number of base cases (number of points evaluated). * * @param queryIndex Index of query point. * @param referenceIndex Index of reference point. */ double BaseCase(const size_t queryIndex, const size_t referenceIndex); /** * Get the score for recursion order. A low score indicates priority for * recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). * * @param queryIndex Index of query point. * @param referenceNode Candidate node to be recursed into. */ double Score(const size_t queryIndex, TreeType& referenceNode); /** * Re-evaluate the score for recursion order. A low score indicates priority * for recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). 
This is used when the score has already
 * been calculated, but another recursion may have modified the bounds for
 * pruning.  So the old score is checked against the new pruning bound.
 *
 * @param queryIndex Index of query point.
 * @param referenceNode Candidate node to be recursed into.
 * @param oldScore Old score produced by Score() (or Rescore()).
 */
double Rescore(const size_t queryIndex,
               TreeType& referenceNode,
               const double oldScore) const;

/**
 * Get the score for recursion order.  A low score indicates priority for
 * recursion, while DBL_MAX indicates that the node should not be recursed
 * into at all (it should be pruned).
 *
 * @param queryNode Candidate query node to recurse into.
 * @param referenceNode Candidate reference node to recurse into.
 */
double Score(TreeType& queryNode, TreeType& referenceNode);

/**
 * Re-evaluate the score for recursion order.  A low score indicates priority
 * for recursion, while DBL_MAX indicates that the node should not be recursed
 * into at all (it should be pruned).  This is used when the score has already
 * been calculated, but another recursion may have modified the bounds for
 * pruning.  So the old score is checked against the new pruning bound.
 *
 * @param queryNode Candidate query node to recurse into.
 * @param referenceNode Candidate reference node to recurse into.
 * @param oldScore Old score produced by Score() (or Rescore()).
 */
double Rescore(TreeType& queryNode,
               TreeType& referenceNode,
               const double oldScore) const;

//! Get the number of base cases that have been performed.
size_t BaseCases() const { return baseCases; }
//! Modify the number of base cases that have been performed.
size_t& BaseCases() { return baseCases; }

//! Get the number of scores that have been performed.
size_t Scores() const { return scores; }
//! Modify the number of scores that have been performed.
size_t& Scores() { return scores; }

//! Convenience typedef.
typedef NeighborSearchTraversalInfo<TreeType> TraversalInfoType;

//! Get the traversal info.
const TraversalInfoType& TraversalInfo() const { return traversalInfo; }
//! Modify the traversal info.
TraversalInfoType& TraversalInfo() { return traversalInfo; }

private:
//! The reference set.
const typename TreeType::Mat& referenceSet;
//! The query set.
const typename TreeType::Mat& querySet;

//! The matrix the resultant neighbor indices should be stored in.
arma::Mat<size_t>& neighbors;
//! The matrix the resultant neighbor distances should be stored in.
arma::mat& distances;

//! The instantiated metric.
MetricType& metric;

//! The last query point BaseCase() was called with.
size_t lastQueryIndex;
//! The last reference point BaseCase() was called with.
size_t lastReferenceIndex;
//! The last base case result.
double lastBaseCase;

//! The number of base cases that have been performed.
size_t baseCases;
//! The number of scores that have been performed.
size_t scores;

//! Traversal info for the parent combination; this is updated by the
//! traversal before each call to Score().
TraversalInfoType traversalInfo;

/**
 * Recalculate the bound for a given query node.
 */
double CalculateBound(TreeType& queryNode) const;

/**
 * Insert a point into the neighbors and distances matrices; this is a helper
 * function.
 *
 * @param queryIndex Index of point whose neighbors we are inserting into.
 * @param pos Position in list to insert into.
 * @param neighbor Index of reference point which is being inserted.
 * @param distance Distance from query point to reference point.
*/ void InsertNeighbor(const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance); }; }; // namespace neighbor }; // namespace mlpack // Include implementation. #include "neighbor_search_rules_impl.hpp" #endif // __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_RULES_HPP RcppMLPACK/src/mlpack/methods/neighbor_search/neighbor_search_impl.hpp0000644000176200001440000003115513647512751025565 0ustar liggesusers/** * @file neighbor_search_impl.hpp * @author Ryan Curtin * * Implementation of Neighbor-Search class to perform all-nearest-neighbors on * two specified data sets. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_IMPL_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_IMPL_HPP #include #include "neighbor_search_rules.hpp" namespace mlpack { namespace neighbor { //! Call the tree constructor that does mapping. template TreeType* BuildTree( typename TreeType::Mat& dataset, std::vector& oldFromNew, typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == true, TreeType* >::type = 0) { return new TreeType(dataset, oldFromNew); } //! Call the tree constructor that does not do mapping. template TreeType* BuildTree( const typename TreeType::Mat& dataset, const std::vector& /* oldFromNew */, const typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == false, TreeType* >::type = 0) { return new TreeType(dataset); } // Construct the object. template NeighborSearch:: NeighborSearch(const typename TreeType::Mat& referenceSetIn, const typename TreeType::Mat& querySetIn, const bool naive, const bool singleMode, const MetricType metric) : referenceSet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), querySet(tree::TreeTraits::RearrangesDataset ? queryCopy : querySetIn), referenceTree(NULL), queryTree(NULL), treeOwner(!naive), // False if a tree was passed. If naive, then no trees. hasQuerySet(true), naive(naive), singleMode(!naive && singleMode), // No single mode if naive. metric(metric) { // C++11 will allow us to call out to other constructors so we can avoid this // copy/paste problem. // We'll time tree building, but only if we are building trees. //Timer::Start("tree_building"); // Copy the datasets, if they will be modified during tree building. if (tree::TreeTraits::RearrangesDataset) { referenceCopy = referenceSetIn; queryCopy = querySetIn; } // If not in naive mode, then we need to build trees. if (!naive) { // The const_cast is safe; if RearrangesDataset == false, then it'll be // casted back to const anyway, and if not, referenceSet points to // referenceCopy, which isn't const. referenceTree = BuildTree( const_cast(referenceSet), oldFromNewReferences); if (!singleMode) queryTree = BuildTree( const_cast(querySet), oldFromNewQueries); } // Stop the timer we started above (if we need to). //Timer::Stop("tree_building"); } // Construct the object. 
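// Explanatory comments: the overload below is the monochromatic case, where
// no separate query set is given and the queries are the reference points
// themselves.  'querySet' is bound to the same matrix as 'referenceSet', and
// in dual-tree mode the query tree is built as a copy of the reference tree,
// since the two traversal roles are played by distinct tree objects.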
template NeighborSearch:: NeighborSearch(const typename TreeType::Mat& referenceSetIn, const bool naive, const bool singleMode, const MetricType metric) : referenceSet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), querySet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), referenceTree(NULL), queryTree(NULL), treeOwner(!naive), // If naive, then we are not building any trees. hasQuerySet(false), naive(naive), singleMode(!naive && singleMode), // No single mode if naive. metric(metric) { // We'll time tree building, but only if we are building trees. //Timer::Start("tree_building"); // Copy the dataset, if it will be modified during tree building. if (tree::TreeTraits::RearrangesDataset) referenceCopy = referenceSetIn; // If not in naive mode, then we may need to construct trees. if (!naive) { // The const_cast is safe; if RearrangesDataset == false, then it'll be // casted back to const anyway, and if not, referenceSet points to // referenceCopy, which isn't const. referenceTree = BuildTree( const_cast(referenceSet), oldFromNewReferences); if (!singleMode) queryTree = new TreeType(*referenceTree); } // Stop the timer we started above. //Timer::Stop("tree_building"); } // Construct the object. template NeighborSearch::NeighborSearch( TreeType* referenceTree, TreeType* queryTree, const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool singleMode, const MetricType metric) : referenceSet(referenceSet), querySet(querySet), referenceTree(referenceTree), queryTree(queryTree), treeOwner(false), hasQuerySet(true), naive(false), singleMode(singleMode), metric(metric) { // Nothing else to initialize. } // Construct the object. template NeighborSearch::NeighborSearch( TreeType* referenceTree, const typename TreeType::Mat& referenceSet, const bool singleMode, const MetricType metric) : referenceSet(referenceSet), querySet(referenceSet), referenceTree(referenceTree), queryTree(NULL), treeOwner(false), hasQuerySet(false), // In this case we will own a tree, if singleMode. naive(false), singleMode(singleMode), metric(metric) { //Timer::Start("tree_building"); // The query tree cannot be the same as the reference tree. if (referenceTree && !singleMode) queryTree = new TreeType(*referenceTree); //Timer::Stop("tree_building"); } /** * The tree is the only member we may be responsible for deleting. The others * will take care of themselves. */ template NeighborSearch::~NeighborSearch() { if (treeOwner) { if (referenceTree) delete referenceTree; if (queryTree) delete queryTree; } else if (!treeOwner && !hasQuerySet && !(singleMode || naive)) { // We replicated the reference tree to create a query tree. delete queryTree; } } /** * Computes the best neighbors and stores them in resultingNeighbors and * distances. */ template void NeighborSearch::Search( const size_t k, arma::Mat& resultingNeighbors, arma::mat& distances) { //Timer::Start("computing_neighbors"); // If we have built the trees ourselves, then we will have to map all the // indices back to their original indices when this computation is finished. // To avoid an extra copy, we will store the neighbors and distances in a // separate matrix. arma::Mat* neighborPtr = &resultingNeighbors; arma::mat* distancePtr = &distances; // Mapping is only necessary if the tree rearranges points. if (tree::TreeTraits::RearrangesDataset) { if (treeOwner && !(singleMode && hasQuerySet)) distancePtr = new arma::mat; // Query indices need to be mapped. 
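// When this object owns the trees, tree building permuted the datasets, so the
// traversal's results are written to these temporary matrices in permuted
// order; they are mapped back to the caller's original indices at the end of
// this function.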
if (treeOwner) neighborPtr = new arma::Mat; // All indices need mapping. } // Set the size of the neighbor and distance matrices. neighborPtr->set_size(k, querySet.n_cols); neighborPtr->fill(size_t() - 1); distancePtr->set_size(k, querySet.n_cols); distancePtr->fill(SortPolicy::WorstDistance()); // Create the helper object for the tree traversal. typedef NeighborSearchRules RuleType; RuleType rules(referenceSet, querySet, *neighborPtr, *distancePtr, metric); if (naive) { // The naive brute-force traversal. for (size_t i = 0; i < querySet.n_cols; ++i) for (size_t j = 0; j < referenceSet.n_cols; ++j) rules.BaseCase(i, j); } else if (singleMode) { // The search doesn't work if the root node is also a leaf node. // if this is the case, it is suggested that you use the naive method. assert(!(referenceTree->IsLeaf())); // Create the traverser. typename TreeType::template SingleTreeTraverser traverser(rules); // Now have it traverse for each point. for (size_t i = 0; i < querySet.n_cols; ++i) traverser.Traverse(i, *referenceTree); Rcpp::Rcout << rules.Scores() << " node combinations were scored.\n"; Rcpp::Rcout << rules.BaseCases() << " base cases were calculated.\n"; } else // Dual-tree recursion. { // Create the traverser. typename TreeType::template DualTreeTraverser traverser(rules); traverser.Traverse(*queryTree, *referenceTree); Rcpp::Rcout << rules.Scores() << " node combinations were scored.\n"; Rcpp::Rcout << rules.BaseCases() << " base cases were calculated.\n"; } //Timer::Stop("computing_neighbors"); // Now, do we need to do mapping of indices? if (!treeOwner || !tree::TreeTraits::RearrangesDataset) { // No mapping needed. We are done. return; } else if (treeOwner && hasQuerySet && !singleMode) // Map both sets. { // Set size of output matrices correctly. resultingNeighbors.set_size(k, querySet.n_cols); distances.set_size(k, querySet.n_cols); for (size_t i = 0; i < distances.n_cols; i++) { // Map distances (copy a column). distances.col(oldFromNewQueries[i]) = distancePtr->col(i); // Map indices of neighbors. for (size_t j = 0; j < distances.n_rows; j++) { resultingNeighbors(j, oldFromNewQueries[i]) = oldFromNewReferences[(*neighborPtr)(j, i)]; } } // Finished with temporary matrices. delete neighborPtr; delete distancePtr; } else if (treeOwner && !hasQuerySet) { resultingNeighbors.set_size(k, querySet.n_cols); distances.set_size(k, querySet.n_cols); for (size_t i = 0; i < distances.n_cols; i++) { // Map distances (copy a column). distances.col(oldFromNewReferences[i]) = distancePtr->col(i); // Map indices of neighbors. for (size_t j = 0; j < distances.n_rows; j++) { resultingNeighbors(j, oldFromNewReferences[i]) = oldFromNewReferences[(*neighborPtr)(j, i)]; } } // Finished with temporary matrices. delete neighborPtr; delete distancePtr; } else if (treeOwner && hasQuerySet && singleMode) // Map only references. { // Set size of neighbor indices matrix correctly. resultingNeighbors.set_size(k, querySet.n_cols); // Map indices of neighbors. for (size_t i = 0; i < resultingNeighbors.n_cols; i++) { for (size_t j = 0; j < resultingNeighbors.n_rows; j++) { resultingNeighbors(j, i) = oldFromNewReferences[(*neighborPtr)(j, i)]; } } // Finished with temporary matrix. delete neighborPtr; } } // Search //Return a String of the Object. 
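
// End-to-end sketch of the Search() call above (illustrative; AllkNN is the
// convenience typedef from typedef.hpp):
//
//   arma::mat data(3, 1000, arma::fill::randu);
//   AllkNN knn(data);
//   arma::Mat<size_t> neighbors;
//   arma::mat distances;
//   knn.Search(5, neighbors, distances);
//
// Afterwards, column i of 'neighbors' holds the indices of the 5 best
// reference points for query point i, and column i of 'distances' holds the
// corresponding distances, sorted best-first.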
template std::string NeighborSearch::ToString() const { std::ostringstream convert; convert << "NearestNeighborSearch [" << this << "]" << std::endl; convert << " Reference Set: " << referenceSet.n_rows << "x"; convert << referenceSet.n_cols << std::endl; if (&referenceSet != &querySet) convert << " QuerySet: " << querySet.n_rows << "x" << querySet.n_cols << std::endl; convert << " Reference Tree: " << referenceTree << std::endl; if (referenceTree != queryTree) convert << " QueryTree: " << queryTree << std::endl; convert << " Tree Owner: " << treeOwner << std::endl; convert << " Naive: " << naive << std::endl; convert << " Metric: " << std::endl; convert << mlpack::util::Indent(metric.ToString(), 2); return convert.str(); } }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/neighbor_search/sort_policies/0000755000176200001440000000000013647514343023561 5ustar liggesusersRcppMLPACK/src/mlpack/methods/neighbor_search/sort_policies/nearest_neighbor_sort_impl.hpp0000644000176200001440000000517113647512751031705 0ustar liggesusers/** * @file nearest_neighbor_sort_impl.hpp * @author Ryan Curtin * * Implementation of templated methods for the NearestNeighborSort SortPolicy * class for the NeighborSearch class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_NEIGHBOR_NEAREST_NEIGHBOR_SORT_IMPL_HPP #define __MLPACK_NEIGHBOR_NEAREST_NEIGHBOR_SORT_IMPL_HPP namespace mlpack { namespace neighbor { template inline double NearestNeighborSort::BestNodeToNodeDistance( const TreeType* queryNode, const TreeType* referenceNode) { // This is not implemented yet for the general case because the trees do not // accept arbitrary distance metrics. return queryNode->MinDistance(referenceNode); } template inline double NearestNeighborSort::BestNodeToNodeDistance( const TreeType* queryNode, const TreeType* referenceNode, const double centerToCenterDistance) { return queryNode->MinDistance(referenceNode, centerToCenterDistance); } template inline double NearestNeighborSort::BestNodeToNodeDistance( const TreeType* queryNode, const TreeType* /* referenceNode */, const TreeType* referenceChildNode, const double centerToCenterDistance) { return queryNode->MinDistance(referenceChildNode, centerToCenterDistance) - referenceChildNode->ParentDistance(); } template inline double NearestNeighborSort::BestPointToNodeDistance( const VecType& point, const TreeType* referenceNode) { // This is not implemented yet for the general case because the trees do not // accept arbitrary distance metrics.
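// MinDistance() gives a lower bound on the distance between the point and
// anything contained in the node's bound, which is exactly the 'best possible'
// distance under the nearest-neighbor policy.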
return referenceNode->MinDistance(point); } template inline double NearestNeighborSort::BestPointToNodeDistance( const VecType& point, const TreeType* referenceNode, const double pointToCenterDistance) { return referenceNode->MinDistance(point, pointToCenterDistance); } }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/neighbor_search/sort_policies/furthest_neighbor_sort.cpp0000644000176200001440000000311513647514343031055 0ustar liggesusers/*** * @file furthest_neighbor_sort.cpp * @author Ryan Curtin * * Implementation of the simple FurthestNeighborSort policy class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "furthest_neighbor_sort.hpp" using namespace mlpack::neighbor; size_t FurthestNeighborSort::SortDistance(const arma::vec& list, const arma::Col& indices, double newDistance) { // The first element in the list is the furthest neighbor. We only want to // insert if the new distance is greater than the last element in the list. if (newDistance < list[list.n_elem - 1]) return (size_t() - 1); // Do not insert. // Search from the beginning. This may not be the best way. for (size_t i = 0; i < list.n_elem; i++) if (newDistance >= list[i] || indices[i] == (size_t() - 1)) return i; // Control should never reach here. return (size_t() - 1); } RcppMLPACK/src/mlpack/methods/neighbor_search/sort_policies/nearest_neighbor_sort.hpp0000644000176200001440000001456213647512751030660 0ustar liggesusers/** * @file nearest_neighbor_sort.hpp * @author Ryan Curtin * * Implementation of the SortPolicy class for NeighborSearch; in this case, the * nearest neighbors are those that are most important. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_NEAREST_NEIGHBOR_SORT_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_NEAREST_NEIGHBOR_SORT_HPP #include namespace mlpack { namespace neighbor { /** * This class implements the necessary methods for the SortPolicy template * parameter of the NeighborSearch class. The sorting policy here is that the * minimum distance is the best (so, when used with NeighborSearch, the output * is nearest neighbors). * * This class is also meant to serve as a guide to implement a custom * SortPolicy. All of the methods implemented here must be implemented by any * other SortPolicy classes.
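 *
 * Concretely, a custom policy must provide the same static members that appear
 * below: SortDistance(), IsBetter(), the BestNodeToNodeDistance() and
 * BestPointToNodeDistance() bounds, WorstDistance(), BestDistance(),
 * CombineBest(), and CombineWorst().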
*/ class NearestNeighborSort { public: /** * Return the index in the vector where the new distance should be inserted, * or (size_t() - 1) if it should not be inserted (i.e. if it is not any * better than any of the existing points in the list). The list should be * sorted such that the best point is the first in the list. The actual * insertion is not performed. * * @param list Vector of existing distance points, sorted such that the best * point is first in the list. * @param new_distance Distance to try to insert * * @return size_t containing the position to insert into, or (size_t() - 1) * if the new distance should not be inserted. */ static size_t SortDistance(const arma::vec& list, const arma::Col& indices, double newDistance); /** * Return whether or not value is "better" than ref. In this case, that means * that the value is less than the reference. * * @param value Value to compare * @param ref Value to compare with * * @return bool indicating whether or not (value < ref). */ static inline bool IsBetter(const double value, const double ref) { return (value < ref); } /** * Return the best possible distance between two nodes. In our case, this is * the minimum distance between the two tree nodes using the given distance * function. */ template static double BestNodeToNodeDistance(const TreeType* queryNode, const TreeType* referenceNode); /** * Return the best possible distance between two nodes, given that the * distance between the centers of the two nodes has already been calculated. * This is used in conjunction with trees that have self-children (like cover * trees). */ template static double BestNodeToNodeDistance(const TreeType* queryNode, const TreeType* referenceNode, const double centerToCenterDistance); /** * Return the best possible distance between the query node and the reference * child node given the base case distance between the query node and the * reference node. TreeType::ParentDistance() must be implemented to use * this. * * @param queryNode Query node. * @param referenceNode Reference node. * @param referenceChildNode Child of reference node which is being inspected. * @param centerToCenterDistance Distance between centers of query node and * reference node. */ template static double BestNodeToNodeDistance(const TreeType* queryNode, const TreeType* referenceNode, const TreeType* referenceChildNode, const double centerToCenterDistance); /** * Return the best possible distance between a node and a point. In our case, * this is the minimum distance between the tree node and the point using the * given distance function. */ template static double BestPointToNodeDistance(const VecType& queryPoint, const TreeType* referenceNode); /** * Return the best possible distance between a point and a node, given that * the distance between the point and the center of the node has already been * calculated. This is used in conjunction with trees that have * self-children (like cover trees). */ template static double BestPointToNodeDistance(const VecType& queryPoint, const TreeType* referenceNode, const double pointToCenterDistance); /** * Return what should represent the worst possible distance with this * particular sort policy. In our case, this should be the maximum possible * distance, DBL_MAX. * * @return DBL_MAX */ static inline double WorstDistance() { return DBL_MAX; } /** * Return what should represent the best possible distance with this * particular sort policy. In our case, this should be the minimum possible * distance, 0.0. 
* * @return 0.0 */ static inline double BestDistance() { return 0.0; } /** * Return the best combination of the two distances. */ static inline double CombineBest(const double a, const double b) { return std::max(a - b, 0.0); } /** * Return the worst combination of the two distances. */ static inline double CombineWorst(const double a, const double b) { if (a == DBL_MAX || b == DBL_MAX) return DBL_MAX; return a + b; } }; }; // namespace neighbor }; // namespace mlpack // Include implementation of templated functions. #include "nearest_neighbor_sort_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/neighbor_search/sort_policies/nearest_neighbor_sort.cpp0000644000176200001440000000310413647514343030650 0ustar liggesusers/** * @file nearest_neighbor_sort.cpp * @author Ryan Curtin * * Implementation of the simple NearestNeighborSort policy class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "nearest_neighbor_sort.hpp" using namespace mlpack::neighbor; size_t NearestNeighborSort::SortDistance(const arma::vec& list, const arma::Col& indices, double newDistance) { // The first element in the list is the nearest neighbor. We only want to // insert if the new distance is less than the last element in the list. if (newDistance > list[list.n_elem - 1]) return (size_t() - 1); // Do not insert. // Search from the beginning. This may not be the best way. for (size_t i = 0; i < list.n_elem; i++) if (newDistance <= list[i] || indices[i] == (size_t() - 1)) return i; // Control should never reach here. return (size_t() - 1); } RcppMLPACK/src/mlpack/methods/neighbor_search/sort_policies/furthest_neighbor_sort_impl.hpp0000644000176200001440000000522613647512751032111 0ustar liggesusers/*** * @file furthest_neighbor_sort_impl.hpp * @author Ryan Curtin * * Implementation of templated methods for the FurthestNeighborSort SortPolicy * class for the NeighborSearch class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
*/ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_FURTHEST_NEIGHBOR_SORT_IMPL_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_FURTHEST_NEIGHBOR_SORT_IMPL_HPP namespace mlpack { namespace neighbor { template inline double FurthestNeighborSort::BestNodeToNodeDistance( const TreeType* queryNode, const TreeType* referenceNode) { // This is not implemented yet for the general case because the trees do not // accept arbitrary distance metrics. return queryNode->MaxDistance(referenceNode); } template inline double FurthestNeighborSort::BestNodeToNodeDistance( const TreeType* queryNode, const TreeType* referenceNode, const double centerToCenterDistance) { return queryNode->MaxDistance(referenceNode, centerToCenterDistance); } template inline double FurthestNeighborSort::BestNodeToNodeDistance( const TreeType* queryNode, const TreeType* referenceNode, const TreeType* referenceChildNode, const double centerToCenterDistance) { return queryNode->MaxDistance(referenceNode, centerToCenterDistance) + referenceChildNode->ParentDistance(); } template inline double FurthestNeighborSort::BestPointToNodeDistance( const VecType& point, const TreeType* referenceNode) { // This is not implemented yet for the general case because the trees do not // accept arbitrary distance metrics. return referenceNode->MaxDistance(point); } template inline double FurthestNeighborSort::BestPointToNodeDistance( const VecType& point, const TreeType* referenceNode, const double pointToCenterDistance) { return referenceNode->MaxDistance(point, pointToCenterDistance); } }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/neighbor_search/sort_policies/furthest_neighbor_sort.hpp0000644000176200001440000001427713647512751031072 0ustar liggesusers/** * @file furthest_neighbor_sort.hpp * @author Ryan Curtin * * Implementation of the SortPolicy class for NeighborSearch; in this case, the * furthest neighbors are those that are most important. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_FURTHEST_NEIGHBOR_SORT_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_FURTHEST_NEIGHBOR_SORT_HPP #include namespace mlpack { namespace neighbor { /** * This class implements the necessary methods for the SortPolicy template * parameter of the NeighborSearch class. The sorting policy here is that the * maximum distance is the best (so, when used with NeighborSearch, the output * is furthest neighbors). */ class FurthestNeighborSort { public: /** * Return the index in the vector where the new distance should be inserted, * or size_t() - 1 if it should not be inserted (i.e. if it is not any better * than any of the existing points in the list). The list should be sorted * such that the best point is the first in the list. The actual insertion is * not performed. * * @param list Vector of existing distance points, sorted such that the best * point is the first in the list.
* @param new_distance Distance to try to insert. * * @return size_t containing the position to insert into, or (size_t() - 1) * if the new distance should not be inserted. */ static size_t SortDistance(const arma::vec& list, const arma::Col& indices, double newDistance); /** * Return whether or not value is "better" than ref. In this case, that means * that the value is greater than the reference. * * @param value Value to compare * @param ref Value to compare with * * @return bool indicating whether or not (value > ref). */ static inline bool IsBetter(const double value, const double ref) { return (value > ref); } /** * Return the best possible distance between two nodes. In our case, this is * the maximum distance between the two tree nodes using the given distance * function. */ template static double BestNodeToNodeDistance(const TreeType* queryNode, const TreeType* referenceNode); /** * Return the best possible distance between two nodes, given that the * distance between the centers of the two nodes has already been calculated. * This is used in conjunction with trees that have self-children (like cover * trees). */ template static double BestNodeToNodeDistance(const TreeType* queryNode, const TreeType* referenceNode, const double centerToCenterDistance); /** * Return the best possible distance between the query node and the reference * child node given the base case distance between the query node and the * reference node. TreeType::ParentDistance() must be implemented to use * this. * * @param queryNode Query node. * @param referenceNode Reference node. * @param referenceChildNode Child of reference node which is being inspected. * @param centerToCenterDistance Distance between centers of query node and * reference node. */ template static double BestNodeToNodeDistance(const TreeType* queryNode, const TreeType* referenceNode, const TreeType* referenceChildNode, const double centerToCenterDistance); /** * Return the best possible distance between a node and a point. In our case, * this is the maximum distance between the tree node and the point using the * given distance function. */ template static double BestPointToNodeDistance(const VecType& queryPoint, const TreeType* referenceNode); /** * Return the best possible distance between a point and a node, given that * the distance between the point and the center of the node has already been * calculated. This is used in conjunction with trees that have * self-children (like cover trees). */ template static double BestPointToNodeDistance(const VecType& queryPoint, const TreeType* referenceNode, const double pointToCenterDistance); /** * Return what should represent the worst possible distance with this * particular sort policy. In our case, this should be the minimum possible * distance, 0. * * @return 0 */ static inline double WorstDistance() { return 0; } /** * Return what should represent the best possible distance with this * particular sort policy. In our case, this should be the maximum possible * distance, DBL_MAX. * * @return DBL_MAX */ static inline double BestDistance() { return DBL_MAX; } /** * Return the best combination of the two distances. */ static inline double CombineBest(const double a, const double b) { if (a == DBL_MAX || b == DBL_MAX) return DBL_MAX; return a + b; } /** * Return the worst combination of the two distances. 
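 * For the furthest-neighbor policy this is the difference of the two
 * distances, clamped below at zero; this mirrors the nearest-neighbor policy,
 * where CombineBest() takes that form and CombineWorst() is the (capped) sum.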
*/ static inline double CombineWorst(const double a, const double b) { return std::max(a - b, 0.0); } }; }; // namespace neighbor }; // namespace mlpack // Include implementation of templated functions. #include "furthest_neighbor_sort_impl.hpp" #endif RcppMLPACK/src/mlpack/methods/neighbor_search/typedef.hpp0000644000176200001440000000345513647512751023064 0ustar liggesusers/** * @file typedef.hpp * @author Ryan Curtin * * Simple typedefs describing template instantiations of the NeighborSearch * class which are commonly used. This is meant to be included by * neighbor_search.h but is a separate file for simplicity. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_NEIGHBOR_SEARCH_TYPEDEF_H #define __MLPACK_NEIGHBOR_SEARCH_TYPEDEF_H // In case someone included this directly. #include "neighbor_search.hpp" #include #include "sort_policies/nearest_neighbor_sort.hpp" #include "sort_policies/furthest_neighbor_sort.hpp" namespace mlpack { namespace neighbor { /** * The AllkNN class is the all-k-nearest-neighbors method. It returns L2 * distances (Euclidean distances) for each of the k nearest neighbors. */ typedef NeighborSearch AllkNN; /** * The AllkFN class is the all-k-furthest-neighbors method. It returns L2 * distances (Euclidean distances) for each of the k furthest neighbors. */ typedef NeighborSearch AllkFN; }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/neighbor_search/neighbor_search_stat.hpp0000644000176200001440000000706513647512751025602 0ustar liggesusers/** * @file neighbor_search.hpp * @author Ryan Curtin * * Defines the NeighborSearch class, which performs an abstract * nearest-neighbor-like query on two datasets. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_STAT_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_STAT_HPP #include namespace mlpack { namespace neighbor { /** * Extra data for each node in the tree. For neighbor searches, each node only * needs to store a bound on neighbor distances. */ template class NeighborSearchStat { private: //! The first bound on the node's neighbor distances (B_1). This represents //! the worst candidate distance of any descendants of this node. double firstBound; //! The second bound on the node's neighbor distances (B_2). 
This represents //! a bound on the worst distance of any descendants of this node assembled //! using the best descendant candidate distance modified by the furthest //! descendant distance. double secondBound; //! The better of the two bounds. double bound; //! The last distance evaluation node. void* lastDistanceNode; //! The last distance evaluation. double lastDistance; public: /** * Initialize the statistic with the worst possible distance according to * our sorting policy. */ NeighborSearchStat() : firstBound(SortPolicy::WorstDistance()), secondBound(SortPolicy::WorstDistance()), bound(SortPolicy::WorstDistance()), lastDistanceNode(NULL), lastDistance(0.0) { } /** * Initialization for a fully initialized node. In this case, we don't need * to worry about the node. */ template NeighborSearchStat(TreeType& /* node */) : firstBound(SortPolicy::WorstDistance()), secondBound(SortPolicy::WorstDistance()), bound(SortPolicy::WorstDistance()), lastDistanceNode(NULL), lastDistance(0.0) { } //! Get the first bound. double FirstBound() const { return firstBound; } //! Modify the first bound. double& FirstBound() { return firstBound; } //! Get the second bound. double SecondBound() const { return secondBound; } //! Modify the second bound. double& SecondBound() { return secondBound; } //! Get the overall bound (the better of the two bounds). double Bound() const { return bound; } //! Modify the overall bound (it should be the better of the two bounds). double& Bound() { return bound; } //! Get the last distance evaluation node. void* LastDistanceNode() const { return lastDistanceNode; } //! Modify the last distance evaluation node. void*& LastDistanceNode() { return lastDistanceNode; } //! Get the last distance calculation. double LastDistance() const { return lastDistance; } //! Modify the last distance calculation. double& LastDistance() { return lastDistance; } }; }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/neighbor_search/neighbor_search_rules_impl.hpp0000644000176200001440000004272213647512751027001 0ustar liggesusers/** * @file nearest_neighbor_rules_impl.hpp * @author Ryan Curtin * * Implementation of NearestNeighborRules. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_NEAREST_NEIGHBOR_RULES_IMPL_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_NEAREST_NEIGHBOR_RULES_IMPL_HPP // In case it hasn't been included yet. 
#include "neighbor_search_rules.hpp" namespace mlpack { namespace neighbor { template NeighborSearchRules::NeighborSearchRules( const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, arma::Mat& neighbors, arma::mat& distances, MetricType& metric) : referenceSet(referenceSet), querySet(querySet), neighbors(neighbors), distances(distances), metric(metric), lastQueryIndex(querySet.n_cols), lastReferenceIndex(referenceSet.n_cols), baseCases(0), scores(0) { // We must set the traversal info last query and reference node pointers to // something that is both invalid (i.e. not a tree node) and not NULL. We'll // use the this pointer. traversalInfo.LastQueryNode() = (TreeType*) this; traversalInfo.LastReferenceNode() = (TreeType*) this; } template inline force_inline // Absolutely MUST be inline so optimizations can happen. double NeighborSearchRules:: BaseCase(const size_t queryIndex, const size_t referenceIndex) { // If the datasets are the same, then this search is only using one dataset // and we should not return identical points. if ((&querySet == &referenceSet) && (queryIndex == referenceIndex)) return 0.0; // If we have already performed this base case, then do not perform it again. if ((lastQueryIndex == queryIndex) && (lastReferenceIndex == referenceIndex)) return lastBaseCase; double distance = metric.Evaluate(querySet.col(queryIndex), referenceSet.col(referenceIndex)); ++baseCases; // If this distance is better than any of the current candidates, the // SortDistance() function will give us the position to insert it into. arma::vec queryDist = distances.unsafe_col(queryIndex); arma::Col queryIndices = neighbors.unsafe_col(queryIndex); const size_t insertPosition = SortPolicy::SortDistance(queryDist, queryIndices, distance); // SortDistance() returns (size_t() - 1) if we shouldn't add it. if (insertPosition != (size_t() - 1)) InsertNeighbor(queryIndex, insertPosition, referenceIndex, distance); // Cache this information for the next time BaseCase() is called. lastQueryIndex = queryIndex; lastReferenceIndex = referenceIndex; lastBaseCase = distance; return distance; } template inline double NeighborSearchRules::Score( const size_t queryIndex, TreeType& referenceNode) { ++scores; // Count number of Score() calls. double distance; if (tree::TreeTraits::FirstPointIsCentroid) { // The first point in the tree is the centroid. So we can then calculate // the base case between that and the query point. double baseCase = -1.0; if (tree::TreeTraits::HasSelfChildren) { // If the parent node is the same, then we have already calculated the // base case. if ((referenceNode.Parent() != NULL) && (referenceNode.Point(0) == referenceNode.Parent()->Point(0))) baseCase = referenceNode.Parent()->Stat().LastDistance(); else baseCase = BaseCase(queryIndex, referenceNode.Point(0)); // Save this evaluation. referenceNode.Stat().LastDistance() = baseCase; } distance = SortPolicy::CombineBest(baseCase, referenceNode.FurthestDescendantDistance()); } else { distance = SortPolicy::BestPointToNodeDistance(querySet.col(queryIndex), &referenceNode); } // Compare against the best k'th distance for this query point so far. const double bestDistance = distances(distances.n_rows - 1, queryIndex); return (SortPolicy::IsBetter(distance, bestDistance)) ? distance : DBL_MAX; } template inline double NeighborSearchRules::Rescore( const size_t queryIndex, TreeType& /* referenceNode */, const double oldScore) const { // If we are already pruning, still prune. 
if (oldScore == DBL_MAX) return oldScore; // Just check the score again against the distances. const double bestDistance = distances(distances.n_rows - 1, queryIndex); return (SortPolicy::IsBetter(oldScore, bestDistance)) ? oldScore : DBL_MAX; } template inline double NeighborSearchRules::Score( TreeType& queryNode, TreeType& referenceNode) { ++scores; // Count number of Score() calls. // Update our bound. const double bestDistance = CalculateBound(queryNode); // Use the traversal info to see if a parent-child or parent-parent prune is // possible. This is a looser bound than we could make, but it might be // sufficient. const double queryParentDist = queryNode.ParentDistance(); const double queryDescDist = queryNode.FurthestDescendantDistance(); const double refParentDist = referenceNode.ParentDistance(); const double refDescDist = referenceNode.FurthestDescendantDistance(); const double score = traversalInfo.LastScore(); double adjustedScore; // We want to set adjustedScore to be the distance between the centroids of // the last query node and the last reference node. We will do this by // adjusting the last score. In some cases, we can just use the last base // case. if (tree::TreeTraits::FirstPointIsCentroid) { adjustedScore = traversalInfo.LastBaseCase(); } else if (score == 0.0) // Nothing we can do here. { adjustedScore = 0.0; } else { // The last score is equal to the distance between the centroids minus the // radii of the query and reference bounds along the axis of the line // between the two centroids. In the best case, these radii are the // furthest descendant distances, but that is not always true. It would // take too long to calculate the exact radii, so we are forced to use // MinimumBoundDistance() as a lower-bound approximation. const double lastQueryDescDist = traversalInfo.LastQueryNode()->MinimumBoundDistance(); const double lastRefDescDist = traversalInfo.LastReferenceNode()->MinimumBoundDistance(); adjustedScore = SortPolicy::CombineWorst(score, lastQueryDescDist); adjustedScore = SortPolicy::CombineWorst(adjustedScore, lastRefDescDist); } // Assemble an adjusted score. For nearest neighbor search, this adjusted // score is a lower bound on MinDistance(queryNode, referenceNode) that is // assembled without actually calculating MinDistance(). For furthest // neighbor search, it is an upper bound on // MaxDistance(queryNode, referenceNode). If the traversalInfo isn't usable // then the node should not be pruned by this. if (traversalInfo.LastQueryNode() == queryNode.Parent()) { const double queryAdjust = queryParentDist + queryDescDist; adjustedScore = SortPolicy::CombineBest(adjustedScore, queryAdjust); } else if (traversalInfo.LastQueryNode() == &queryNode) { adjustedScore = SortPolicy::CombineBest(adjustedScore, queryDescDist); } else { // The last query node wasn't this query node or its parent. So we force // the adjustedScore to be such that this combination can't be pruned here, // because we don't really know anything about it. // It would be possible to modify this section to try and make a prune based // on the query descendant distance and the distance between the query node // and last traversal query node, but this case doesn't actually happen for // kd-trees or cover trees.
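// Using BestDistance() makes the adjusted score maximally optimistic, so the
// prune test below cannot fire for this combination.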
adjustedScore = SortPolicy::BestDistance(); } if (traversalInfo.LastReferenceNode() == referenceNode.Parent()) { const double refAdjust = refParentDist + refDescDist; adjustedScore = SortPolicy::CombineBest(adjustedScore, refAdjust); } else if (traversalInfo.LastReferenceNode() == &referenceNode) { adjustedScore = SortPolicy::CombineBest(adjustedScore, refDescDist); } else { // The last reference node wasn't this reference node or its parent. So we // force the adjustedScore to be such that this combination can't be pruned // here, because we don't really know anything about it. // It would be possible to modify this section to try and make a prune based // on the reference descendant distance and the distance between the // reference node and last traversal reference node, but this case doesn't // actually happen for kd-trees or cover trees. adjustedScore = SortPolicy::BestDistance(); } // Can we prune? if (SortPolicy::IsBetter(bestDistance, adjustedScore)) { if (!(tree::TreeTraits::FirstPointIsCentroid && score == 0.0)) { // There isn't any need to set the traversal information because no // descendant combinations will be visited, and those are the only // combinations that would depend on the traversal information. return DBL_MAX; } } double distance; if (tree::TreeTraits::FirstPointIsCentroid) { // The first point in the node is the centroid, so we can calculate the // distance between the two points using BaseCase() and then find the // bounds. This is potentially loose for non-ball bounds. double baseCase = -1.0; if (tree::TreeTraits::HasSelfChildren && (traversalInfo.LastQueryNode()->Point(0) == queryNode.Point(0)) && (traversalInfo.LastReferenceNode()->Point(0) == referenceNode.Point(0))) { // We already calculated it. baseCase = traversalInfo.LastBaseCase(); } else { baseCase = BaseCase(queryNode.Point(0), referenceNode.Point(0)); } distance = SortPolicy::CombineBest(baseCase, queryNode.FurthestDescendantDistance() + referenceNode.FurthestDescendantDistance()); lastQueryIndex = queryNode.Point(0); lastReferenceIndex = referenceNode.Point(0); lastBaseCase = baseCase; traversalInfo.LastBaseCase() = baseCase; } else { distance = SortPolicy::BestNodeToNodeDistance(&queryNode, &referenceNode); } if (SortPolicy::IsBetter(distance, bestDistance)) { // Set traversal information. traversalInfo.LastQueryNode() = &queryNode; traversalInfo.LastReferenceNode() = &referenceNode; traversalInfo.LastScore() = distance; return distance; } else { // There isn't any need to set the traversal information because no // descendant combinations will be visited, and those are the only // combinations that would depend on the traversal information. return DBL_MAX; } } template inline double NeighborSearchRules::Rescore( TreeType& queryNode, TreeType& /* referenceNode */, const double oldScore) const { if (oldScore == DBL_MAX) return oldScore; // Update our bound. const double bestDistance = CalculateBound(queryNode); return (SortPolicy::IsBetter(oldScore, bestDistance)) ? oldScore : DBL_MAX; } // Calculate the bound for a given query node in its current state and update // it. template inline double NeighborSearchRules:: CalculateBound(TreeType& queryNode) const { // This is an adapted form of the B(N_q) function in the paper // ``Tree-Independent Dual-Tree Algorithms'' by Curtin et. al.; the goal is to // place a bound on the worst possible distance a point combination could have // to improve any of the current neighbor estimates. 
If the best possible // distance between two nodes is greater than this bound, then the node // combination can be pruned (see Score()). // There are a couple ways we can assemble a bound. For simplicity, this is // described for nearest neighbor search (SortPolicy = NearestNeighborSort), // but the code that is written is adapted for whichever SortPolicy. // First, we can consider the current worst neighbor candidate distance of any // descendant point. This is assembled with 'worstDistance' by looping // through the points held by the query node, and then by taking the cached // worst distance from any child nodes (Stat().FirstBound()). This // corresponds roughly to B_1(N_q) in the paper. // The other way of bounding is to use the triangle inequality. To do this, // we find the current best kth-neighbor candidate distance of any descendant // query point, and use the triangle inequality to place a bound on the // distance that candidate would have to any other descendant query point. // This corresponds roughly to B_2(N_q) in the paper, and is the bounding // style for cover trees. // Then, to assemble the final bound, since both bounds are valid, we simply // take the better of the two. double worstDistance = SortPolicy::BestDistance(); double bestDistance = SortPolicy::WorstDistance(); // Loop over points held in the node. for (size_t i = 0; i < queryNode.NumPoints(); ++i) { const double distance = distances(distances.n_rows - 1, queryNode.Point(i)); if (SortPolicy::IsBetter(worstDistance, distance)) worstDistance = distance; if (SortPolicy::IsBetter(distance, bestDistance)) bestDistance = distance; } // Add triangle inequality adjustment to best distance. It is possible this // could be tighter for some certain types of trees. bestDistance = SortPolicy::CombineWorst(bestDistance, queryNode.FurthestPointDistance() + queryNode.FurthestDescendantDistance()); // Loop over children of the node, and use their cached information to // assemble bounds. for (size_t i = 0; i < queryNode.NumChildren(); ++i) { const double firstBound = queryNode.Child(i).Stat().FirstBound(); const double adjustment = std::max(0.0, queryNode.FurthestDescendantDistance() - queryNode.Child(i).FurthestDescendantDistance()); const double adjustedSecondBound = SortPolicy::CombineWorst( queryNode.Child(i).Stat().SecondBound(), 2 * adjustment); if (SortPolicy::IsBetter(worstDistance, firstBound)) worstDistance = firstBound; if (SortPolicy::IsBetter(adjustedSecondBound, bestDistance)) bestDistance = adjustedSecondBound; } // Now consider the parent bounds. if (queryNode.Parent() != NULL) { // The parent's worst distance bound implies that the bound for this node // must be at least as good. Thus, if the parent worst distance bound is // better, then take it. if (SortPolicy::IsBetter(queryNode.Parent()->Stat().FirstBound(), worstDistance)) worstDistance = queryNode.Parent()->Stat().FirstBound(); // The parent's best distance bound implies that the bound for this node // must be at least as good. Thus, if the parent best distance bound is // better, then take it. if (SortPolicy::IsBetter(queryNode.Parent()->Stat().SecondBound(), bestDistance)) bestDistance = queryNode.Parent()->Stat().SecondBound(); } // Cache bounds for later. queryNode.Stat().FirstBound() = worstDistance; queryNode.Stat().SecondBound() = bestDistance; if (SortPolicy::IsBetter(worstDistance, bestDistance)) return worstDistance; else return bestDistance; } /** * Helper function to insert a point into the neighbors and distances matrices. 
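 * The new candidate is written at position pos; every worse candidate is
 * shifted down one row with memmove(), and the previous worst candidate drops
 * off the end of the k-element list.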
* * @param queryIndex Index of point whose neighbors we are inserting into. * @param pos Position in list to insert into. * @param neighbor Index of reference point which is being inserted. * @param distance Distance from query point to reference point. */ template void NeighborSearchRules::InsertNeighbor( const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance) { // We only memmove() if there is actually a need to shift something. if (pos < (distances.n_rows - 1)) { int len = (distances.n_rows - 1) - pos; memmove(distances.colptr(queryIndex) + (pos + 1), distances.colptr(queryIndex) + pos, sizeof(double) * len); memmove(neighbors.colptr(queryIndex) + (pos + 1), neighbors.colptr(queryIndex) + pos, sizeof(size_t) * len); } // Now put the new information in the right index. distances(pos, queryIndex) = distance; neighbors(pos, queryIndex) = neighbor; } }; // namespace neighbor }; // namespace mlpack #endif // __MLPACK_METHODS_NEIGHBOR_SEARCH_NEAREST_NEIGHBOR_RULES_IMPL_HPP RcppMLPACK/src/mlpack/methods/neighbor_search/unmap.hpp0000644000176200001440000000604413647512751022541 0ustar liggesusers/** * @file unmap.hpp * @author Ryan Curtin * * Convenience methods to unmap results. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_UNMAP_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_UNMAP_HPP #include namespace mlpack { namespace neighbor { /** * Assuming that the datasets have been mapped using the referenceMap and the * queryMap (such as during kd-tree construction), unmap the columns of the * distances and neighbors matrices into neighborsOut and distancesOut, and also * unmap the entries in each row of neighbors. This is useful for the dual-tree * case. * * @param neighbors Matrix of neighbors resulting from neighbor search. * @param distances Matrix of distances resulting from neighbor search. * @param referenceMap Mapping of reference set to old points. * @param queryMap Mapping of query set to old points. * @param neighborsOut Matrix to store unmapped neighbors into. * @param distancesOut Matrix to store unmapped distances into. * @param squareRoot If true, take the square root of the distances. */ void Unmap(const arma::Mat& neighbors, const arma::mat& distances, const std::vector& referenceMap, const std::vector& queryMap, arma::Mat& neighborsOut, arma::mat& distancesOut, const bool squareRoot = false); /** * Assuming that the datasets have been mapped using referenceMap (such as * during kd-tree construction), unmap the columns of the distances and * neighbors matrices into neighborsOut and distancesOut, and also unmap the * entries in each row of neighbors. This is useful for the single-tree case. * * @param neighbors Matrix of neighbors resulting from neighbor search. * @param distances Matrix of distances resulting from neighbor search. * @param referenceMap Mapping of reference set to old points. 
* @param neighborsOut Matrix to store unmapped neighbors into. * @param distancesOut Matrix to store unmapped distances into. * @param squareRoot If true, take the square root of the distances. */ void Unmap(const arma::Mat& neighbors, const arma::mat& distances, const std::vector& referenceMap, arma::Mat& neighborsOut, arma::mat& distancesOut, const bool squareRoot = false); }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/src/mlpack/methods/neighbor_search/unmap.cpp0000644000176200001440000000501013647514343022523 0ustar liggesusers/** * @file unmap.cpp * @author Ryan Curtin * * Auxiliary function to unmap neighbor search results. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "unmap.hpp" namespace mlpack { namespace neighbor { // Useful in the dual-tree setting. void Unmap(const arma::Mat& neighbors, const arma::mat& distances, const std::vector& referenceMap, const std::vector& queryMap, arma::Mat& neighborsOut, arma::mat& distancesOut, const bool squareRoot) { // Set matrices to correct size. neighborsOut.set_size(neighbors.n_rows, neighbors.n_cols); distancesOut.set_size(distances.n_rows, distances.n_cols); // Unmap distances. for (size_t i = 0; i < distances.n_cols; ++i) { // Map columns to the correct place. The ternary operator does not work // here... if (squareRoot) distancesOut.col(queryMap[i]) = sqrt(distances.col(i)); else distancesOut.col(queryMap[i]) = distances.col(i); // Map indices of neighbors. for (size_t j = 0; j < distances.n_rows; ++j) neighborsOut(j, queryMap[i]) = referenceMap[neighbors(j, i)]; } } // Useful in the single-tree setting. void Unmap(const arma::Mat& neighbors, const arma::mat& distances, const std::vector& referenceMap, arma::Mat& neighborsOut, arma::mat& distancesOut, const bool squareRoot) { // Set matrices to correct size. neighborsOut.set_size(neighbors.n_rows, neighbors.n_cols); // Take square root of distances, if necessary. if (squareRoot) distancesOut = sqrt(distances); else distancesOut = distances; // Map neighbors back to original locations. for (size_t j = 0; j < neighbors.n_elem; ++j) neighborsOut[j] = referenceMap[neighbors[j]]; } }; // namespace neighbor }; // namespace mlpack RcppMLPACK/src/mlpack/methods/neighbor_search/neighbor_search.hpp0000644000176200001440000002563713647512751024554 0ustar liggesusers/** * @file neighbor_search.hpp * @author Ryan Curtin * * Defines the NeighborSearch class, which performs an abstract * nearest-neighbor-like query on two datasets. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_HPP #include #include #include #include #include #include "neighbor_search_stat.hpp" #include "sort_policies/nearest_neighbor_sort.hpp" namespace mlpack { namespace neighbor /** Neighbor-search routines. These include * all-nearest-neighbors and all-furthest-neighbors * searches. */ { /** * The NeighborSearch class is a template class for performing distance-based * neighbor searches. It takes a query dataset and a reference dataset (or just * a reference dataset) and, for each point in the query dataset, finds the k * neighbors in the reference dataset which have the 'best' distance according * to a given sorting policy. A constructor is given which takes only a * reference dataset, and if that constructor is used, the given reference * dataset is also used as the query dataset. * * The template parameters SortPolicy and Metric define the sort function used * and the metric (distance function) used. More information on those classes * can be found in the NearestNeighborSort class and the kernel::ExampleKernel * class. * * @tparam SortPolicy The sort policy for distances; see NearestNeighborSort. * @tparam MetricType The metric to use for computation. * @tparam TreeType The tree type to use. */ template, NeighborSearchStat > > class NeighborSearch { public: /** * Initialize the NeighborSearch object, passing both a query and reference * dataset. Optionally, perform the computation in naive mode or single-tree * mode, and set the leaf size used for tree-building. An initialized * distance metric can be given, for cases where the metric has internal data * (i.e. the distance::MahalanobisDistance class). * * This method will copy the matrices to internal copies, which are rearranged * during tree-building. You can avoid this extra copy by pre-constructing * the trees and passing them using a different constructor. * * @param referenceSet Set of reference points. * @param querySet Set of query points. * @param naive If true, O(n^2) naive search will be used (as opposed to * dual-tree search). This overrides singleMode (if it is set to true). * @param singleMode If true, single-tree search will be used (as opposed to * dual-tree search). * @param leafSize Leaf size for tree construction (ignored if tree is given). * @param metric An optional instance of the MetricType class. */ NeighborSearch(const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool naive = false, const bool singleMode = false, const MetricType metric = MetricType()); /** * Initialize the NeighborSearch object, passing only one dataset, which is * used as both the query and the reference dataset. Optionally, perform the * computation in naive mode or single-tree mode, and set the leaf size used * for tree-building. An initialized distance metric can be given, for cases * where the metric has internal data (i.e. the distance::MahalanobisDistance * class).
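 *
 * For example (a minimal sketch using the AllkNN typedef from typedef.hpp;
 * the data values are illustrative):
 *
 * @code
 * arma::mat data(3, 1000, arma::fill::randu);
 * AllkNN knn(data); // the single dataset is both query and reference set
 * @endcode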
   *
   * If naive mode is being used and a pre-built tree is given, it may not work:
   * naive mode operates by building a one-node tree (the root node holds all
   * the points). If that condition is not satisfied with the pre-built tree,
   * then naive mode will not work.
   *
   * @param referenceSet Set of reference points.
   * @param naive If true, O(n^2) naive search will be used (as opposed to
   *     dual-tree search). This overrides singleMode (if it is set to true).
   * @param singleMode If true, single-tree search will be used (as opposed to
   *     dual-tree search).
   * @param leafSize Leaf size for tree construction (ignored if tree is given).
   * @param metric An optional instance of the MetricType class.
   */
  NeighborSearch(const typename TreeType::Mat& referenceSet,
                 const bool naive = false,
                 const bool singleMode = false,
                 const MetricType metric = MetricType());

  /**
   * Initialize the NeighborSearch object with the given datasets and
   * pre-constructed trees. It is assumed that the points in referenceSet and
   * querySet correspond to the points in referenceTree and queryTree,
   * respectively. Optionally, choose to use single-tree mode. Naive mode is
   * not available as an option for this constructor; instead, to run naive
   * computation, construct a tree with all of the points in one leaf (i.e.
   * leafSize = number of points). Additionally, an instantiated distance
   * metric can be given, for cases where the distance metric holds data.
   *
   * There is no copying of the data matrices in this constructor (because
   * tree-building is not necessary), so this is the constructor to use when
   * copies absolutely must be avoided.
   *
   * @note
   * Because tree-building (at least with BinarySpaceTree) modifies the ordering
   * of a matrix, be sure you pass the modified matrix to this object! In
   * addition, mapping the points of the matrix back to their original indices
   * is not done when this constructor is used.
   * @endnote
   *
   * @param referenceTree Pre-built tree for reference points.
   * @param queryTree Pre-built tree for query points.
   * @param referenceSet Set of reference points corresponding to referenceTree.
   * @param querySet Set of query points corresponding to queryTree.
   * @param singleMode Whether single-tree computation should be used (as
   *     opposed to dual-tree computation).
   * @param metric Instantiated distance metric.
   */
  NeighborSearch(TreeType* referenceTree,
                 TreeType* queryTree,
                 const typename TreeType::Mat& referenceSet,
                 const typename TreeType::Mat& querySet,
                 const bool singleMode = false,
                 const MetricType metric = MetricType());

  /**
   * Initialize the NeighborSearch object with the given reference dataset and
   * pre-constructed tree. It is assumed that the points in referenceSet
   * correspond to the points in referenceTree. Optionally, choose to use
   * single-tree mode. Naive mode is not available as an option for this
   * constructor; instead, to run naive computation, construct a tree with all
   * the points in one leaf (i.e. leafSize = number of points). Additionally,
   * an instantiated distance metric can be given, for the case where the
   * distance metric holds data.
   *
   * There is no copying of the data matrices in this constructor (because
   * tree-building is not necessary), so this is the constructor to use when
   * copies absolutely must be avoided.
   *
   * @note
   * Because tree-building (at least with BinarySpaceTree) modifies the ordering
   * of a matrix, be sure you pass the modified matrix to this object! In
   * addition, mapping the points of the matrix back to their original indices
   * is not done when this constructor is used.
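   *
   * A sketch of that workflow (illustrative only; it assumes the default
   * TreeType of this class and the BinarySpaceTree constructor that records
   * old-from-new index mappings):
   *
   * @code
   * typedef tree::BinarySpaceTree<bound::HRectBound<2>,
   *     NeighborSearchStat<NearestNeighborSort> > TreeT;
   * arma::mat data = arma::randu<arma::mat>(5, 100);
   * std::vector<size_t> oldFromNew;
   * TreeT tree(data, oldFromNew, 20); // Reorders 'data'; leaf size 20.
   * NeighborSearch<> search(&tree, data); // Pass the *modified* matrix.
   * @endcode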
   * @endnote
   *
   * @param referenceTree Pre-built tree for reference points.
   * @param referenceSet Set of reference points corresponding to referenceTree.
   * @param singleMode Whether single-tree computation should be used (as
   *     opposed to dual-tree computation).
   * @param metric Instantiated distance metric.
   */
  NeighborSearch(TreeType* referenceTree,
                 const typename TreeType::Mat& referenceSet,
                 const bool singleMode = false,
                 const MetricType metric = MetricType());

  /**
   * Delete the NeighborSearch object. The tree is the only member we are
   * responsible for deleting. The others will take care of themselves.
   */
  ~NeighborSearch();

  /**
   * Compute the nearest neighbors and store the output in the given matrices.
   * The matrices will be set to the size of n columns by k rows, where n is the
   * number of points in the query dataset and k is the number of neighbors
   * being searched for.
   *
   * @param k Number of neighbors to search for.
   * @param resultingNeighbors Matrix storing lists of neighbors for each query
   *     point.
   * @param distances Matrix storing distances of neighbors for each query
   *     point.
   */
  void Search(const size_t k,
              arma::Mat<size_t>& resultingNeighbors,
              arma::mat& distances);

  // Returns a string representation of this object.
  std::string ToString() const;

 private:
  //! Copy of reference dataset (if we need it, because tree building modifies
  //! it).
  typename TreeType::Mat referenceCopy;
  //! Copy of query dataset (if we need it, because tree building modifies it).
  typename TreeType::Mat queryCopy;

  //! Reference dataset.
  const typename TreeType::Mat& referenceSet;
  //! Query dataset (may not be given).
  const typename TreeType::Mat& querySet;

  //! Pointer to the root of the reference tree.
  TreeType* referenceTree;
  //! Pointer to the root of the query tree (might not exist).
  TreeType* queryTree;

  //! If true, this object created the trees and is responsible for them.
  bool treeOwner;
  //! Indicates if a separate query set was passed.
  bool hasQuerySet;

  //! Indicates if O(n^2) naive search is being used.
  bool naive;
  //! Indicates if single-tree search is being used (opposed to dual-tree).
  bool singleMode;

  //! Instantiation of metric.
  MetricType metric;

  //! Permutations of reference points during tree building.
  std::vector<size_t> oldFromNewReferences;
  //! Permutations of query points during tree building.
  std::vector<size_t> oldFromNewQueries;
}; // class NeighborSearch

}; // namespace neighbor
}; // namespace mlpack

// Include implementation.
#include "neighbor_search_impl.hpp"

// Include convenience typedefs.
#include "typedef.hpp"

#endif
RcppMLPACK/src/mlpack/methods/neighbor_search/ns_traversal_info.hpp0000644000176200001440000000554013647512751025137 0ustar liggesusers/**
 * @file ns_traversal_info.hpp
 * @author Ryan Curtin
 *
 * This class holds traversal information for dual-tree traversals that are
 * using the NeighborSearchRules RuleType.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_TRAVERSAL_INFO_HPP
#define __MLPACK_METHODS_NEIGHBOR_SEARCH_TRAVERSAL_INFO_HPP

namespace mlpack {
namespace neighbor {

/**
 * Traversal information for NeighborSearch. This information is used to make
 * parent-child prunes or parent-parent prunes in Score() without needing to
 * evaluate the distance between two nodes.
 *
 * The information held by this class is the last node combination visited
 * before the current node combination was recursed into and the distance
 * between the node centroids.
 */
template<typename TreeType>
class NeighborSearchTraversalInfo
{
 public:
  /**
   * Create the TraversalInfo object and initialize the pointers to NULL.
   */
  NeighborSearchTraversalInfo() :
      lastQueryNode(NULL),
      lastReferenceNode(NULL),
      lastScore(0.0),
      lastBaseCase(0.0) { /* Nothing to do. */ }

  //! Get the last query node.
  TreeType* LastQueryNode() const { return lastQueryNode; }
  //! Modify the last query node.
  TreeType*& LastQueryNode() { return lastQueryNode; }

  //! Get the last reference node.
  TreeType* LastReferenceNode() const { return lastReferenceNode; }
  //! Modify the last reference node.
  TreeType*& LastReferenceNode() { return lastReferenceNode; }

  //! Get the score associated with the last query and reference nodes.
  double LastScore() const { return lastScore; }
  //! Modify the score associated with the last query and reference nodes.
  double& LastScore() { return lastScore; }

  //! Get the base case associated with the last node combination.
  double LastBaseCase() const { return lastBaseCase; }
  //! Modify the base case associated with the last node combination.
  double& LastBaseCase() { return lastBaseCase; }

 private:
  //! The last query node.
  TreeType* lastQueryNode;
  //! The last reference node.
  TreeType* lastReferenceNode;
  //! The last distance.
  double lastScore;
  //! The last base case.
  double lastBaseCase;
};

}; // namespace neighbor
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/prereqs.hpp0000644000176200001440000000376513647512751016320 0ustar liggesusers/**
 * @file prereqs.hpp
 *
 * The core includes that mlpack expects; standard C++ includes and Armadillo.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_PREREQS_HPP
#define __MLPACK_PREREQS_HPP

// First, check if Armadillo was included before, warning if so.
#ifdef ARMA_INCLUDES
#pragma message "Armadillo was included before mlpack; this can sometimes cause\
 problems. It should only be necessary to include <mlpack/core.hpp> and not\
 <armadillo>."
#endif

// Next, standard includes.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <limits.h>
#include <float.h>
#include <stdint.h>
#include <iostream>

// Defining _USE_MATH_DEFINES should set M_PI.
#define _USE_MATH_DEFINES
#include <math.h>

// For tgamma().
#include <boost/math/special_functions/gamma.hpp>

// But if it's not defined, we'll do it.
#ifndef M_PI
  #define M_PI 3.141592653589793238462643383279
#endif

// Give ourselves a nice way to force functions to be inline if we need.
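// (An illustrative sketch, not code from this file: a function declared as
//
//   force_inline double Square(const double x) { return x * x; }
//
// picks up the compiler-specific always-inline attribute defined below, and
// falls back to a plain function in debug builds or on unknown compilers.)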
#define force_inline
#if defined(__GNUG__) && !defined(DEBUG)
  #undef force_inline
  #define force_inline __attribute__((always_inline))
#elif defined(_MSC_VER) && !defined(DEBUG)
  #undef force_inline
  #define force_inline __forceinline
#endif

// Now include Armadillo through the special mlpack extensions.
#include <mlpack/core/arma_extend/arma_extend.hpp>

#endif
RcppMLPACK/src/mlpack/core/0000755000176200001440000000000013647521523015044 5ustar liggesusersRcppMLPACK/src/mlpack/core/math/0000755000176200001440000000000013647514343015777 5ustar liggesusersRcppMLPACK/src/mlpack/core/math/round.hpp0000644000176200001440000000233413647512751017642 0ustar liggesusers/**
 * @file round.hpp
 * @author Ryan Curtin
 *
 * Implementation of round() for use on Visual Studio, where C99 isn't
 * implemented.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_MATH_ROUND_HPP
#define __MLPACK_CORE_MATH_ROUND_HPP

// _MSC_VER should only be defined for Visual Studio, which doesn't implement
// C99.
#ifdef _MSC_VER

// This function ends up going into the global namespace, so it can be used in
// place of C99's round().

//! Round a number to the nearest integer.
inline double round(double a)
{
  return floor(a + 0.5);
}

#endif

#endif
RcppMLPACK/src/mlpack/core/math/range.hpp0000644000176200001440000001070613647512751017611 0ustar liggesusers/**
 * @file range.hpp
 *
 * Definition of the Range class, which represents a simple range with a lower
 * and upper bound.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_MATH_RANGE_HPP
#define __MLPACK_CORE_MATH_RANGE_HPP

namespace mlpack {
namespace math {

/**
 * Simple real-valued range. It contains an upper and lower bound.
 */
class Range
{
 private:
  double lo; /// The lower bound.
  double hi; /// The upper bound.

 public:
  /** Initialize to an empty set (where lo > hi). */
  inline Range();

  /***
   * Initialize a range to enclose only the given point (lo = point, hi =
   * point).
   *
   * @param point Point that this range will enclose.
   */
  inline Range(const double point);

  /**
   * Initializes to specified range.
   *
   * @param lo Lower bound of the range.
   * @param hi Upper bound of the range.
   */
  inline Range(const double lo, const double hi);

  //! Get the lower bound.
  inline double Lo() const { return lo; }
  //! Modify the lower bound.
  inline double& Lo() { return lo; }

  //! Get the upper bound.
inline double Hi() const { return hi; } //! Modify the upper bound. inline double& Hi() { return hi; } /** * Gets the span of the range (hi - lo). */ inline double Width() const; /** * Gets the midpoint of this range. */ inline double Mid() const; /** * Expands this range to include another range. * * @param rhs Range to include. */ inline Range& operator|=(const Range& rhs); /** * Expands this range to include another range. * * @param rhs Range to include. */ inline Range operator|(const Range& rhs) const; /** * Shrinks this range to be the overlap with another range; this makes an * empty set if there is no overlap. * * @param rhs Other range. */ inline Range& operator&=(const Range& rhs); /** * Shrinks this range to be the overlap with another range; this makes an * empty set if there is no overlap. * * @param rhs Other range. */ inline Range operator&(const Range& rhs) const; /** * Scale the bounds by the given double. * * @param d Scaling factor. */ inline Range& operator*=(const double d); /** * Scale the bounds by the given double. * * @param d Scaling factor. */ inline Range operator*(const double d) const; /** * Scale the bounds by the given double. * * @param d Scaling factor. */ friend inline Range operator*(const double d, const Range& r); // Symmetric. /** * Compare with another range for strict equality. * * @param rhs Other range. */ inline bool operator==(const Range& rhs) const; /** * Compare with another range for strict equality. * * @param rhs Other range. */ inline bool operator!=(const Range& rhs) const; /** * Compare with another range. For Range objects x and y, x < y means that x * is strictly less than y and does not overlap at all. * * @param rhs Other range. */ inline bool operator<(const Range& rhs) const; /** * Compare with another range. For Range objects x and y, x < y means that x * is strictly less than y and does not overlap at all. * * @param rhs Other range. */ inline bool operator>(const Range& rhs) const; /** * Determines if a point is contained within the range. * * @param d Point to check. */ inline bool Contains(const double d) const; /** * Determines if another range overlaps with this one. * * @param r Other range. * * @return true if ranges overlap at all. */ inline bool Contains(const Range& r) const; /** * Returns a string representation of an object. */ inline std::string ToString() const; }; }; // namespace math }; // namespace mlpack // Include inlined implementation. #include "range_impl.hpp" #endif // __MLPACK_CORE_MATH_RANGE_HPP RcppMLPACK/src/mlpack/core/math/random.cpp0000644000176200001440000000311113647514343017757 0ustar liggesusers/** * @file random.cpp * * Declarations of global Boost random number generators. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include #include namespace mlpack { namespace math { #if BOOST_VERSION >= 104700 // Global random object. boost::random::mt19937 randGen; // Global uniform distribution. 
boost::random::uniform_01<> randUniformDist;
// Global normal distribution.
boost::random::normal_distribution<> randNormalDist;
#else
// Global random object.
boost::mt19937 randGen;

#if BOOST_VERSION >= 103900
// Global uniform distribution.
boost::uniform_01<> randUniformDist;
#else
// Pre-1.39 Boost.Random did not give default template parameter values.
boost::uniform_01<boost::mt19937, double> randUniformDist(randGen);
#endif

// Global normal distribution.
boost::normal_distribution<> randNormalDist;
#endif

}; // namespace math
}; // namespace mlpack
RcppMLPACK/src/mlpack/core/math/lin_alg.cpp0000644000176200001440000001441413647514343020114 0ustar liggesusers/**
 * @file lin_alg.cpp
 * @author Nishant Mehta
 *
 * Linear algebra utilities.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#include "lin_alg.hpp"

#include <mlpack/core.hpp>

using namespace mlpack;
using namespace math;

/**
 * Auxiliary function to raise vector elements to a specific power. The sign
 * is ignored in the power operation and then re-added. Useful for
 * eigenvalues.
 */
void mlpack::math::VectorPower(arma::vec& vec, const double power)
{
  for (size_t i = 0; i < vec.n_elem; i++)
  {
    if (std::abs(vec(i)) > 1e-12)
      vec(i) = (vec(i) > 0) ? std::pow(vec(i), (double) power) :
          -std::pow(-vec(i), (double) power);
    else
      vec(i) = 0;
  }
}

/**
 * Creates a centered matrix, where centering is done by subtracting
 * the sum over the columns (a column vector) from each column of the matrix.
 *
 * @param x Input matrix
 * @param xCentered Matrix to write centered output into
 */
void mlpack::math::Center(const arma::mat& x, arma::mat& xCentered)
{
  // Get the mean of the elements in each row.
  arma::vec rowMean = arma::sum(x, 1) / x.n_cols;

  xCentered = x - arma::repmat(rowMean, 1, x.n_cols);
}

/**
 * Whitens a matrix using the singular value decomposition of the covariance
 * matrix. Whitening means the covariance matrix of the result is the identity
 * matrix.
 */
void mlpack::math::WhitenUsingSVD(const arma::mat& x,
                                  arma::mat& xWhitened,
                                  arma::mat& whiteningMatrix)
{
  arma::mat covX, u, v, invSMatrix, temp1;
  arma::vec sVector;

  covX = ccov(x);

  svd(u, sVector, v, covX);

  size_t d = sVector.n_elem;
  invSMatrix.zeros(d, d);
  invSMatrix.diag() = 1 / sqrt(sVector);

  whiteningMatrix = v * invSMatrix * trans(u);

  xWhitened = whiteningMatrix * x;
}

/**
 * Whitens a matrix using the eigendecomposition of the covariance matrix.
 * Whitening means the covariance matrix of the result is the identity matrix.
 */
void mlpack::math::WhitenUsingEig(const arma::mat& x,
                                  arma::mat& xWhitened,
                                  arma::mat& whiteningMatrix)
{
  arma::mat diag, eigenvectors;
  arma::vec eigenvalues;

  // Get eigenvectors of covariance of input matrix.
  eig_sym(eigenvalues, eigenvectors, ccov(x));

  // Generate diagonal matrix using 1 / sqrt(eigenvalues) for each value.
  VectorPower(eigenvalues, -0.5);
  diag.zeros(eigenvalues.n_elem, eigenvalues.n_elem);
  diag.diag() = eigenvalues;

  // Our whitening matrix is diag(1 / sqrt(eigenvalues)) * trans(eigenvectors).
  whiteningMatrix = diag * trans(eigenvectors);

  // Now apply the whitening matrix.
  xWhitened = whiteningMatrix * x;
}

/**
 * Overwrites a dimension-N vector to a random vector on the unit sphere in R^N.
 */
void mlpack::math::RandVector(arma::vec& v)
{
  v.zeros();

  for (size_t i = 0; i + 1 < v.n_elem; i += 2)
  {
    double a = Random();
    double b = Random();
    double first_term = sqrt(-2 * log(a));
    double second_term = 2 * M_PI * b;
    v[i]     = first_term * cos(second_term);
    v[i + 1] = first_term * sin(second_term);
  }

  if ((v.n_elem % 2) == 1)
  {
    v[v.n_elem - 1] = sqrt(-2 * log(math::Random())) * cos(2 * M_PI *
        math::Random());
  }

  v /= sqrt(dot(v, v));
}

/**
 * Orthogonalize x and return the result in W, using eigendecomposition.
 * We will be using the formula \f$ W = x (x^T x)^{-0.5} \f$.
 */
void mlpack::math::Orthogonalize(const arma::mat& x, arma::mat& W)
{
  // For a matrix A, A^N = V * D^N * V', where VDV' is the
  // eigendecomposition of the matrix A.
  arma::mat eigenvalues, eigenvectors;
  arma::vec egval;
  eig_sym(egval, eigenvectors, ccov(x));
  VectorPower(egval, -0.5);

  eigenvalues.zeros(egval.n_elem, egval.n_elem);
  eigenvalues.diag() = egval;

  arma::mat at = (eigenvectors * eigenvalues * trans(eigenvectors));

  W = at * x;
}

/**
 * Orthogonalize x in-place. This could be sped up by a custom
 * implementation.
 */
void mlpack::math::Orthogonalize(arma::mat& x)
{
  Orthogonalize(x, x);
}

/**
 * Remove a certain set of rows in a matrix while copying to a second matrix.
 *
 * @param input Input matrix to copy.
 * @param rowsToRemove Vector containing indices of rows to be removed.
 * @param output Matrix to copy non-removed rows into.
 */
void mlpack::math::RemoveRows(const arma::mat& input,
                              const std::vector<size_t>& rowsToRemove,
                              arma::mat& output)
{
  const size_t nRemove = rowsToRemove.size();
  const size_t nKeep = input.n_rows - nRemove;

  if (nRemove == 0)
  {
    output = input; // Copy everything.
  }
  else
  {
    output.set_size(nKeep, input.n_cols);

    size_t curRow = 0;
    size_t removeInd = 0;
    // First, check 0 to first row to remove.
    if (rowsToRemove[0] > 0)
    {
      // Note that this implies that n_rows > 1.
      output.rows(0, rowsToRemove[0] - 1) = input.rows(0, rowsToRemove[0] - 1);
      curRow += rowsToRemove[0];
    }

    // Now, check i'th row to remove to (i + 1)'th row to remove, until i is the
    // penultimate row.
    while (removeInd < nRemove - 1)
    {
      const size_t height = rowsToRemove[removeInd + 1] -
          rowsToRemove[removeInd] - 1;

      if (height > 0)
      {
        output.rows(curRow, curRow + height - 1) =
            input.rows(rowsToRemove[removeInd] + 1,
                       rowsToRemove[removeInd + 1] - 1);
        curRow += height;
      }

      removeInd++;
    }

    // Now that i is the last row to remove, check last row to remove to last
    // row.
    if (rowsToRemove[removeInd] < input.n_rows - 1)
    {
      output.rows(curRow, nKeep - 1) = input.rows(rowsToRemove[removeInd] + 1,
          input.n_rows - 1);
    }
  }
}
RcppMLPACK/src/mlpack/core/math/clamp.hpp0000644000176200001440000000416013647512751017606 0ustar liggesusers/**
 * @file clamp.hpp
 *
 * Miscellaneous math clamping routines.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_MATH_CLAMP_HPP
#define __MLPACK_CORE_MATH_CLAMP_HPP

#include <stdlib.h>
#include <math.h>
#include <float.h>

namespace mlpack {
namespace math /** Miscellaneous math routines. */ {

/**
 * Forces a number to be non-negative, turning negative numbers into zero.
 * Avoids branching costs (this is a measurable improvement).
 *
 * @param d Double to clamp.
 * @return 0 if d < 0, d otherwise.
 */
inline double ClampNonNegative(const double d)
{
  return (d + fabs(d)) / 2;
}

/**
 * Forces a number to be non-positive, turning positive numbers into zero.
 * Avoids branching costs (this is a measurable improvement).
 *
 * @param d Double to clamp.
 * @return 0 if d > 0, d otherwise.
 */
inline double ClampNonPositive(const double d)
{
  return (d - fabs(d)) / 2;
}

/**
 * Clamp a number between a particular range.
 *
 * @param value The number to clamp.
 * @param rangeMin The lower bound of the range.
 * @param rangeMax The upper bound of the range.
 * @return max(rangeMin, min(rangeMax, d)).
 */
inline double ClampRange(double value,
                         const double rangeMin,
                         const double rangeMax)
{
  value -= rangeMax;
  value = ClampNonPositive(value) + rangeMax;
  value -= rangeMin;
  value = ClampNonNegative(value) + rangeMin;
  return value;
}

}; // namespace math
}; // namespace mlpack

#endif // __MLPACK_CORE_MATH_CLAMP_HPP
RcppMLPACK/src/mlpack/core/math/lin_alg.hpp0000644000176200001440000000577013647512751020117 0ustar liggesusers/**
 * @file lin_alg.hpp
 * @author Nishant Mehta
 *
 * Linear algebra utilities.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_MATH_LIN_ALG_HPP
#define __MLPACK_CORE_MATH_LIN_ALG_HPP

#include <mlpack/core.hpp>

/**
 * Linear algebra utility functions, generally performed on matrices or vectors.
 */
namespace mlpack {
namespace math {

/**
 * Auxiliary function to raise vector elements to a specific power. The sign
 * is ignored in the power operation and then re-added. Useful for
 * eigenvalues.
 */
void VectorPower(arma::vec& vec, const double power);

/**
 * Creates a centered matrix, where centering is done by subtracting
 * the sum over the columns (a column vector) from each column of the matrix.
 *
 * @param x Input matrix
 * @param xCentered Matrix to write centered output into
 */
void Center(const arma::mat& x, arma::mat& xCentered);

/**
 * Whitens a matrix using the singular value decomposition of the covariance
 * matrix. Whitening means the covariance matrix of the result is the identity
 * matrix.
 */
void WhitenUsingSVD(const arma::mat& x,
                    arma::mat& xWhitened,
                    arma::mat& whiteningMatrix);

/**
 * Whitens a matrix using the eigendecomposition of the covariance matrix.
 * Whitening means the covariance matrix of the result is the identity matrix.
 */
void WhitenUsingEig(const arma::mat& x,
                    arma::mat& xWhitened,
                    arma::mat& whiteningMatrix);

/**
 * Overwrites a dimension-N vector to a random vector on the unit sphere in R^N.
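 *
 * (The implementation in lin_alg.cpp draws pairs of uniform variates, applies
 * the Box-Muller transform to get normal variates, and then normalizes the
 * vector to unit length.)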
 */
void RandVector(arma::vec& v);

/**
 * Orthogonalize x and return the result in W, using eigendecomposition.
 * We will be using the formula \f$ W = x (x^T x)^{-0.5} \f$.
 */
void Orthogonalize(const arma::mat& x, arma::mat& W);

/**
 * Orthogonalize x in-place. This could be sped up by a custom
 * implementation.
 */
void Orthogonalize(arma::mat& x);

/**
 * Remove a certain set of rows in a matrix while copying to a second matrix.
 *
 * @param input Input matrix to copy.
 * @param rowsToRemove Vector containing indices of rows to be removed.
 * @param output Matrix to copy non-removed rows into.
 */
void RemoveRows(const arma::mat& input,
                const std::vector<size_t>& rowsToRemove,
                arma::mat& output);

}; // namespace math
}; // namespace mlpack

#endif // __MLPACK_CORE_MATH_LIN_ALG_HPP
RcppMLPACK/src/mlpack/core/math/random.hpp0000644000176200001440000001043213647512751017771 0ustar liggesusers/**
 * @file random.hpp
 *
 * Miscellaneous math random-related routines.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_MATH_RANDOM_HPP
#define __MLPACK_CORE_MATH_RANDOM_HPP

#include <stdlib.h>
#include <boost/random.hpp>

namespace mlpack {
namespace math /** Miscellaneous math routines. */ {

// Annoying Boost versioning issues.
#include <boost/version.hpp>

#if BOOST_VERSION >= 104700
// Global random object.
extern boost::random::mt19937 randGen;
// Global uniform distribution.
extern boost::random::uniform_01<> randUniformDist;
// Global normal distribution.
extern boost::random::normal_distribution<> randNormalDist;
#else
// Global random object.
extern boost::mt19937 randGen;

#if BOOST_VERSION >= 103900
// Global uniform distribution.
extern boost::uniform_01<> randUniformDist;
#else
// Pre-1.39 Boost.Random did not give default template parameter values.
extern boost::uniform_01<boost::mt19937, double> randUniformDist;
#endif

// Global normal distribution.
extern boost::normal_distribution<> randNormalDist;
#endif

/**
 * Set the random seed used by the random functions (Random() and RandInt()).
 * The seed is cast to a 32-bit integer before being given to the random
 * number generator, but a size_t is taken as a parameter for API consistency.
 *
 * @param seed Seed for the random number generator.
 */
inline void RandomSeed(const size_t seed)
{
  randGen.seed((uint32_t) seed);
  srand((unsigned int) seed);
}

/**
 * Generates a uniform random number between 0 and 1.
 */
inline double Random()
{
#if BOOST_VERSION >= 103900
  return randUniformDist(randGen);
#else
  // Before Boost 1.39, we did not give the random object when we wanted a
  // random number; that gets given at construction time.
  return randUniformDist();
#endif
}

/**
 * Generates a uniform random number in the specified range.
 */
inline double Random(const double lo, const double hi)
{
#if BOOST_VERSION >= 103900
  return lo + (hi - lo) * randUniformDist(randGen);
#else
  // Before Boost 1.39, we did not give the random object when we wanted a
  // random number; that gets given at construction time.
  return lo + (hi - lo) * randUniformDist();
#endif
}

/**
 * Generates a uniform random integer.
 */
inline int RandInt(const int hiExclusive)
{
#if BOOST_VERSION >= 103900
  return (int) std::floor((double) hiExclusive * randUniformDist(randGen));
#else
  // Before Boost 1.39, we did not give the random object when we wanted a
  // random number; that gets given at construction time.
  return (int) std::floor((double) hiExclusive * randUniformDist());
#endif
}

/**
 * Generates a uniform random integer.
 */
inline int RandInt(const int lo, const int hiExclusive)
{
#if BOOST_VERSION >= 103900
  return lo + (int) std::floor((double) (hiExclusive - lo) *
      randUniformDist(randGen));
#else
  // Before Boost 1.39, we did not give the random object when we wanted a
  // random number; that gets given at construction time.
  return lo + (int) std::floor((double) (hiExclusive - lo) *
      randUniformDist());
#endif
}

/**
 * Generates a normally distributed random number with mean 0 and variance 1.
 */
inline double RandNormal()
{
  return randNormalDist(randGen);
}

/**
 * Generates a normally distributed random number with specified mean and
 * variance.
 *
 * @param mean Mean of distribution.
 * @param variance Variance of distribution.
 */
inline double RandNormal(const double mean, const double variance)
{
  return variance * randNormalDist(randGen) + mean;
}

}; // namespace math
}; // namespace mlpack

#endif // __MLPACK_CORE_MATH_RANDOM_HPP
RcppMLPACK/src/mlpack/core/math/range_impl.hpp0000644000176200001440000001024013647512751020623 0ustar liggesusers/**
 * @file range_impl.hpp
 *
 * Implementation of the (inlined) Range class.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_MATH_RANGE_IMPL_HPP
#define __MLPACK_CORE_MATH_RANGE_IMPL_HPP

#include "range.hpp"

#include <float.h>
#include <sstream>

namespace mlpack {
namespace math {

/**
 * Initialize the range to the empty set (lo > hi).
 */
inline Range::Range() :
    lo(DBL_MAX), hi(-DBL_MAX) { /* nothing else to do */ }

/**
 * Initialize a range to enclose only the given point.
 */
inline Range::Range(const double point) :
    lo(point), hi(point) { /* nothing else to do */ }

/**
 * Initializes the range to the specified values.
 */
inline Range::Range(const double lo, const double hi) :
    lo(lo), hi(hi) { /* nothing else to do */ }

/**
 * Gets the span of the range, hi - lo. Returns 0 if the range is negative.
 */
inline double Range::Width() const
{
  if (lo < hi)
    return (hi - lo);
  else
    return 0.0;
}

/**
 * Gets the midpoint of this range.
 */
inline double Range::Mid() const
{
  return (hi + lo) / 2;
}

/**
 * Expands range to include the other range.
 */
inline Range& Range::operator|=(const Range& rhs)
{
  if (rhs.lo < lo)
    lo = rhs.lo;
  if (rhs.hi > hi)
    hi = rhs.hi;

  return *this;
}

inline Range Range::operator|(const Range& rhs) const
{
  return Range((rhs.lo < lo) ? rhs.lo : lo,
               (rhs.hi > hi) ? rhs.hi : hi);
}

/**
 * Shrinks range to be the overlap with another range, becoming an empty
 * set if there is no overlap.
 */
inline Range& Range::operator&=(const Range& rhs)
{
  if (rhs.lo > lo)
    lo = rhs.lo;
  if (rhs.hi < hi)
    hi = rhs.hi;

  return *this;
}

inline Range Range::operator&(const Range& rhs) const
{
  return Range((rhs.lo > lo) ? rhs.lo : lo,
               (rhs.hi < hi) ? rhs.hi : hi);
}

/**
 * Scale the bounds by the given double.
 */
inline Range& Range::operator*=(const double d)
{
  lo *= d;
  hi *= d;

  // Now if we've negated, we need to flip things around so the bound is valid.
  if (lo > hi)
  {
    double tmp = hi;
    hi = lo;
    lo = tmp;
  }

  return *this;
}

inline Range Range::operator*(const double d) const
{
  double nlo = lo * d;
  double nhi = hi * d;

  if (nlo <= nhi)
    return Range(nlo, nhi);
  else
    return Range(nhi, nlo);
}

// Symmetric case.
inline Range operator*(const double d, const Range& r)
{
  double nlo = r.lo * d;
  double nhi = r.hi * d;

  if (nlo <= nhi)
    return Range(nlo, nhi);
  else
    return Range(nhi, nlo);
}

/**
 * Compare with another range for strict equality.
 */
inline bool Range::operator==(const Range& rhs) const
{
  return (lo == rhs.lo) && (hi == rhs.hi);
}

inline bool Range::operator!=(const Range& rhs) const
{
  return (lo != rhs.lo) || (hi != rhs.hi);
}

/**
 * Compare with another range. For Range objects x and y, x < y means that x is
 * strictly less than y and does not overlap at all.
 */
inline bool Range::operator<(const Range& rhs) const
{
  return hi < rhs.lo;
}

inline bool Range::operator>(const Range& rhs) const
{
  return lo > rhs.hi;
}

/**
 * Determines if a point is contained within the range.
 */
inline bool Range::Contains(const double d) const
{
  return d >= lo && d <= hi;
}

/**
 * Determines if this range overlaps with another range.
 */
inline bool Range::Contains(const Range& r) const
{
  return lo <= r.hi && hi >= r.lo;
}

/**
 * Returns a string representation of an object.
 */
inline std::string Range::ToString() const
{
  std::ostringstream convert;
  convert << "[" << lo << ", " << hi << "]";
  return convert.str();
}

}; // namespace math
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/dists/0000755000176200001440000000000013647514343016174 5ustar liggesusersRcppMLPACK/src/mlpack/core/dists/discrete_distribution.hpp0000644000176200001440000001310713647512751023311 0ustar liggesusers/**
 * @file discrete_distribution.hpp
 * @author Ryan Curtin
 *
 * Implementation of the discrete distribution, where each discrete observation
 * has a given probability.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_HMM_DISTRIBUTIONS_DISCRETE_DISTRIBUTION_HPP
#define __MLPACK_METHODS_HMM_DISTRIBUTIONS_DISCRETE_DISTRIBUTION_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace distribution /** Probability distributions. */ {

/**
 * A discrete distribution where the only observations are discrete
 * observations. This is useful (for example) with discrete Hidden Markov
 * Models, where observations are non-negative integers representing specific
 * emissions.
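 *
 * A quick usage sketch (the values are assumed for illustration only):
 *
 * @code
 * DiscreteDistribution d(4); // Four observations, each with probability 0.25.
 * arma::vec obs("2");        // A single one-dimensional observation.
 * double p = d.Probability(obs); // p == 0.25.
 * @endcode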
 *
 * No bounds checking is performed for observations, so if an invalid
 * observation is passed (i.e. observation > numObservations), a crash will
 * probably occur.
 *
 * This distribution only supports one-dimensional observations, so when passing
 * an arma::vec as an observation, it should only have one dimension
 * (vec.n_rows == 1). Any additional dimensions will simply be ignored.
 *
 * @note
 * This class, like every other class in MLPACK, uses arma::vec to represent
 * observations. While a discrete distribution only has positive integers
 * (size_t) as observations, these can be converted to doubles (which is what
 * arma::vec holds). This distribution internally converts those doubles back
 * into size_t before comparisons.
 * @endnote
 */
class DiscreteDistribution
{
 public:
  /**
   * Default constructor, which creates a distribution that has no observations.
   */
  DiscreteDistribution() { /* nothing to do */ }

  /**
   * Define the discrete distribution as having numObservations possible
   * observations. The probability in each state will be set to (1 /
   * numObservations).
   *
   * @param numObservations Number of possible observations this distribution
   *    can have.
   */
  DiscreteDistribution(const size_t numObservations) :
      probabilities(arma::ones<arma::vec>(numObservations) / numObservations)
  { /* nothing to do */ }

  /**
   * Define the discrete distribution as having the given probabilities for each
   * observation.
   *
   * @param probabilities Probabilities of each possible observation.
   */
  DiscreteDistribution(const arma::vec& probabilities)
  {
    // We must be sure that our distribution is normalized.
    double sum = accu(probabilities);
    if (sum > 0)
      this->probabilities = probabilities / sum;
    else
    {
      this->probabilities.set_size(probabilities.n_elem);
      this->probabilities.fill(1.0 / probabilities.n_elem);
    }
  }

  /**
   * Get the dimensionality of the distribution.
   */
  size_t Dimensionality() const { return 1; }

  /**
   * Return the probability of the given observation. If the observation is
   * greater than the number of possible observations, then a crash will
   * probably occur -- bounds checking is not performed.
   *
   * @param observation Observation to return the probability of.
   * @return Probability of the given observation.
   */
  double Probability(const arma::vec& observation) const
  {
    // Adding 0.5 helps ensure that we cast the floating point to a size_t
    // correctly.
    const size_t obs = size_t(observation[0] + 0.5);

    // Ensure that the observation is within the bounds.
    if (obs >= probabilities.n_elem)
    {
      Rcpp::Rcout << "DiscreteDistribution::Probability(): received observation "
          << obs << "; observation must be in [0, " << probabilities.n_elem
          << "] for this distribution." << std::endl;
    }

    return probabilities(obs);
  }

  /**
   * Return a randomly generated observation (one-dimensional vector; one
   * observation) according to the probability distribution defined by this
   * object.
   *
   * @return Random observation.
   */
  arma::vec Random() const;

  /**
   * Estimate the probability distribution directly from the given observations.
   * If any of the observations is greater than numObservations, a crash is
   * likely to occur.
   *
   * @param observations List of observations.
   */
  void Estimate(const arma::mat& observations);

  /**
   * Estimate the probability distribution from the given observations, taking
   * into account the probability of each observation actually being from this
   * distribution.
   *
   * @param observations List of observations.
   * @param probabilities List of probabilities that each observation is
   *    actually from this distribution.
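   *
   * For example (a sketch with made-up data):
   *
   * @code
   * arma::mat obs("0 1 1 2");       // Four one-dimensional observations.
   * arma::vec prob("0.25 1 1 0.5"); // Weight of each observation.
   * DiscreteDistribution d(3);
   * d.Estimate(obs, prob); // Weighted, normalized relative frequencies.
   * @endcode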
   */
  void Estimate(const arma::mat& observations, const arma::vec& probabilities);

  //! Return the vector of probabilities.
  const arma::vec& Probabilities() const { return probabilities; }
  //! Modify the vector of probabilities.
  arma::vec& Probabilities() { return probabilities; }

  /*
   * Returns a string representation of this object.
   */
  std::string ToString() const;

 private:
  arma::vec probabilities;
};

}; // namespace distribution
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/dists/discrete_distribution.cpp0000644000176200001440000001014613647514343023303 0ustar liggesusers/**
 * @file discrete_distribution.cpp
 * @author Ryan Curtin
 *
 * Implementation of DiscreteDistribution probability distribution.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#include "discrete_distribution.hpp"

using namespace mlpack;
using namespace mlpack::distribution;

/**
 * Return a randomly generated observation according to the probability
 * distribution defined by this object.
 */
arma::vec DiscreteDistribution::Random() const
{
  // Generate a random number.
  double randObs = math::Random();
  arma::vec result(1);

  double sumProb = 0;
  for (size_t obs = 0; obs < probabilities.n_elem; obs++)
  {
    if ((sumProb += probabilities[obs]) >= randObs)
    {
      result[0] = obs;
      return result;
    }
  }

  // This shouldn't happen.
  result[0] = probabilities.n_elem - 1;
  return result;
}

/**
 * Estimate the probability distribution directly from the given observations.
 */
void DiscreteDistribution::Estimate(const arma::mat& observations)
{
  // Clear old probabilities.
  probabilities.zeros();

  // Add the probability of each observation. The addition of 0.5 to the
  // observation is to turn the default flooring operation of the size_t cast
  // into a rounding operation.
  for (size_t i = 0; i < observations.n_cols; i++)
  {
    const size_t obs = size_t(observations(0, i) + 0.5);

    // Ensure that the observation is within the bounds.
    if (obs >= probabilities.n_elem)
    {
      Rcpp::Rcout << "DiscreteDistribution::Estimate(): observation " << i
          << " (" << obs << ") is invalid; observation must be in [0, "
          << probabilities.n_elem << "] for this distribution." << std::endl;
    }

    probabilities(obs)++;
  }

  // Now normalize the distribution.
  double sum = accu(probabilities);
  if (sum > 0)
    probabilities /= sum;
  else
    probabilities.fill(1.0 / probabilities.n_elem); // Force normalization.
}

/**
 * Estimate the probability distribution from the given observations when also
 * given probabilities that each observation is from this distribution.
 */
void DiscreteDistribution::Estimate(const arma::mat& observations,
                                    const arma::vec& probObs)
{
  // Clear old probabilities.
  probabilities.zeros();

  // Add the probability of each observation. The addition of 0.5 to the
  // observation is to turn the default flooring operation of the size_t cast
  // into a rounding operation.
  for (size_t i = 0; i < observations.n_cols; i++)
  {
    const size_t obs = size_t(observations(0, i) + 0.5);

    // Ensure that the observation is within the bounds.
    if (obs >= probabilities.n_elem)
    {
      Rcpp::Rcout << "DiscreteDistribution::Estimate(): observation " << i
          << " (" << obs << ") is invalid; observation must be in [0, "
          << probabilities.n_elem << "] for this distribution." << std::endl;
    }

    probabilities(obs) += probObs[i];
  }

  // Now normalize the distribution.
  double sum = accu(probabilities);
  if (sum > 0)
    probabilities /= sum;
  else
    probabilities.fill(1.0 / probabilities.n_elem); // Force normalization.
}

/*
 * Returns a string representation of this object.
 */
std::string DiscreteDistribution::ToString() const
{
  std::ostringstream convert;
  convert << "DiscreteDistribution [" << this << "]" << std::endl;

  // Secondary object so we can indent the probabilities.
  std::ostringstream prob;
  prob << "Probabilities:" << std::endl;
  prob << probabilities;
  convert << util::Indent(prob.str());
  return convert.str();
}
RcppMLPACK/src/mlpack/core/dists/gaussian_distribution.cpp0000644000176200001440000001125113647514343023311 0ustar liggesusers/**
 * @file gaussian_distribution.cpp
 * @author Ryan Curtin
 *
 * Implementation of Gaussian distribution class.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#include "gaussian_distribution.hpp"

using namespace mlpack;
using namespace mlpack::distribution;

arma::vec GaussianDistribution::Random() const
{
  // Should we store chol(covariance) for easier calculation later?
  return trans(chol(covariance)) * arma::randn<arma::vec>(mean.n_elem) + mean;
}

/**
 * Estimate the Gaussian distribution directly from the given observations.
 *
 * @param observations List of observations.
 */
void GaussianDistribution::Estimate(const arma::mat& observations)
{
  if (observations.n_cols > 0)
  {
    mean.zeros(observations.n_rows);
    covariance.zeros(observations.n_rows, observations.n_rows);
  }
  else // This will end up just being empty.
  {
    mean.zeros(0);
    covariance.zeros(0);
    return;
  }

  // Calculate the mean.
  for (size_t i = 0; i < observations.n_cols; i++)
    mean += observations.col(i);

  // Normalize the mean.
  mean /= observations.n_cols;

  // Now calculate the covariance.
  for (size_t i = 0; i < observations.n_cols; i++)
  {
    arma::vec obsNoMean = observations.col(i) - mean;
    covariance += obsNoMean * trans(obsNoMean);
  }

  // Finish estimating the covariance by normalizing, with the (1 / (n - 1)) so
  // that it is the unbiased estimator.
  covariance /= (observations.n_cols - 1);

  // Ensure that the covariance is positive definite.
  if (det(covariance) <= 1e-50)
  {
    Rcpp::Rcout << "GaussianDistribution::Estimate(): Covariance matrix is not "
        << "positive definite. Adding perturbation." << std::endl;

    double perturbation = 1e-30;
    while (det(covariance) <= 1e-50)
    {
      covariance.diag() += perturbation;
      perturbation *= 10; // Slow, but we don't want to add too much.
    }
  }
}

/**
 * Estimate the Gaussian distribution from the given observations, taking into
 * account the probability of each observation actually being from this
 * distribution.
 */
void GaussianDistribution::Estimate(const arma::mat& observations,
                                    const arma::vec& probabilities)
{
  if (observations.n_cols > 0)
  {
    mean.zeros(observations.n_rows);
    covariance.zeros(observations.n_rows, observations.n_rows);
  }
  else // This will end up just being empty.
  {
    mean.zeros(0);
    covariance.zeros(0);
    return;
  }

  double sumProb = 0;

  // First calculate the mean, and save the sum of all the probabilities for
  // later normalization.
  for (size_t i = 0; i < observations.n_cols; i++)
  {
    mean += probabilities[i] * observations.col(i);
    sumProb += probabilities[i];
  }

  if (sumProb == 0)
  {
    // Nothing in this Gaussian! At least set the covariance so that it's
    // invertible.
    covariance.diag() += 1e-50;
    return;
  }

  // Normalize.
  mean /= sumProb;

  // Now find the covariance.
  for (size_t i = 0; i < observations.n_cols; i++)
  {
    arma::vec obsNoMean = observations.col(i) - mean;
    covariance += probabilities[i] * (obsNoMean * trans(obsNoMean));
  }

  // This is probably biased, but I don't know how to unbias it.
  covariance /= sumProb;

  // Ensure that the covariance is positive definite.
  if (det(covariance) <= 1e-50)
  {
    Rcpp::Rcout << "GaussianDistribution::Estimate(): Covariance matrix is not "
        << "positive definite. Adding perturbation." << std::endl;

    double perturbation = 1e-30;
    while (det(covariance) <= 1e-50)
    {
      covariance.diag() += perturbation;
      perturbation *= 10; // Slow, but we don't want to add too much.
    }
  }
}

/**
 * Returns a string representation of this object.
 */
std::string GaussianDistribution::ToString() const
{
  std::ostringstream convert;
  convert << "GaussianDistribution [" << this << "]" << std::endl;

  // Secondary ostringstream so things can be indented right.
  std::ostringstream data;
  data << "Mean: " << std::endl << mean;
  data << "Covariance: " << std::endl << covariance;
  convert << util::Indent(data.str());
  return convert.str();
}
RcppMLPACK/src/mlpack/core/dists/gaussian_distribution.hpp0000644000176200001440000000667313647512751023327 0ustar liggesusers/**
 * @file gaussian_distribution.hpp
 * @author Ryan Curtin
 *
 * Implementation of the Gaussian distribution.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_HMM_DISTRIBUTIONS_GAUSSIAN_DISTRIBUTION_HPP
#define __MLPACK_METHODS_HMM_DISTRIBUTIONS_GAUSSIAN_DISTRIBUTION_HPP

#include <mlpack/core.hpp>
// Should be somewhere else, maybe in core.
#include <mlpack/methods/gmm/phi.hpp>

namespace mlpack {
namespace distribution {

/**
 * A single multivariate Gaussian distribution.
 */
class GaussianDistribution
{
 private:
  //! Mean of the distribution.
  arma::vec mean;
  //! Covariance of the distribution.
  arma::mat covariance;

 public:
  /**
   * Default constructor, which creates a Gaussian with zero dimension.
   */
  GaussianDistribution() { /* nothing to do */ }

  /**
   * Create a Gaussian distribution with zero mean and identity covariance with
   * the given dimensionality.
   */
  GaussianDistribution(const size_t dimension) :
      mean(arma::zeros<arma::vec>(dimension)),
      covariance(arma::eye<arma::mat>(dimension, dimension))
  { /* Nothing to do. */ }

  /**
   * Create a Gaussian distribution with the given mean and covariance.
   */
  GaussianDistribution(const arma::vec& mean, const arma::mat& covariance) :
      mean(mean), covariance(covariance) { /* Nothing to do. */ }

  //! Return the dimensionality of this distribution.
  size_t Dimensionality() const { return mean.n_elem; }

  /**
   * Return the probability of the given observation.
   */
  double Probability(const arma::vec& observation) const
  {
    return mlpack::gmm::phi(observation, mean, covariance);
  }

  /**
   * Return a randomly generated observation according to the probability
   * distribution defined by this object.
   *
   * @return Random observation from this Gaussian distribution.
   */
  arma::vec Random() const;

  /**
   * Estimate the Gaussian distribution directly from the given observations.
   *
   * @param observations List of observations.
   */
  void Estimate(const arma::mat& observations);

  /**
   * Estimate the Gaussian distribution from the given observations, taking into
   * account the probability of each observation actually being from this
   * distribution.
   */
  void Estimate(const arma::mat& observations, const arma::vec& probabilities);

  //! Return the mean.
  const arma::vec& Mean() const { return mean; }
  //! Return a modifiable reference to the mean.
  arma::vec& Mean() { return mean; }

  //! Return the covariance matrix.
  const arma::mat& Covariance() const { return covariance; }
  //! Return a modifiable reference to the covariance.
  arma::mat& Covariance() { return covariance; }

  /**
   * Returns a string representation of this object.
   */
  std::string ToString() const;
};

}; // namespace distribution
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/dists/laplace_distribution.hpp0000644000176200001440000001141413647512751023107 0ustar liggesusers/*
 * @file laplace.hpp
 * @author Zhihao Lou
 *
 * Laplace (double exponential) distribution used in SA.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_OPTIMIZER_SA_LAPLACE_DISTRIBUTION_HPP
#define __MLPACK_CORE_OPTIMIZER_SA_LAPLACE_DISTRIBUTION_HPP

namespace mlpack {
namespace distribution {

/**
 * The multivariate Laplace distribution centered at 0 has pdf
 *
 * \f[
 * f(x|\theta) = \frac{1}{2 \theta}\exp\left(-\frac{\|x - \mu\|}{\theta}\right)
 * \f]
 *
 * given scale parameter \f$\theta\f$ and mean \f$\mu\f$. This implementation
 * assumes a diagonal covariance, but a rewrite to support arbitrary covariances
 * is possible.
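 *
 * A short usage sketch (parameter values assumed for illustration):
 *
 * @code
 * LaplaceDistribution l(3, 0.5); // 3 dimensions, zero mean, scale 0.5.
 * arma::vec x = arma::zeros<arma::vec>(3);
 * double p = l.Probability(x); // (0.5 / 0.5) * exp(0) == 1.0 at the mean.
 * @endcode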
 *
 * See the following paper for more information on the non-diagonal-covariance
 * Laplace distribution and estimation techniques:
 *
 * @code
 * @article{eltoft2006multivariate,
 *   title={{On the Multivariate Laplace Distribution}},
 *   author={Eltoft, Torbj{\o}rn and Kim, Taesu and Lee, Te-Won},
 *   journal={IEEE Signal Processing Letters},
 *   volume={13},
 *   number={5},
 *   pages={300--304},
 *   year={2006}
 * }
 * @endcode
 *
 * Note that because of the diagonal covariance restriction, much of the algebra
 * in the paper above becomes simplified, and the PDF takes roughly the same
 * form as the univariate case.
 */
class LaplaceDistribution
{
 public:
  /**
   * Default constructor, which creates a Laplace distribution with zero
   * dimension and zero scale parameter.
   */
  LaplaceDistribution() : scale(0) { }

  /**
   * Construct the Laplace distribution with the given scale and dimensionality.
   * The mean is initialized to zero.
   *
   * @param dimensionality Dimensionality of distribution.
   * @param scale Scale of distribution.
   */
  LaplaceDistribution(const size_t dimensionality, const double scale) :
      mean(arma::zeros<arma::vec>(dimensionality)), scale(scale) { }

  /**
   * Construct the Laplace distribution with the given mean and scale parameter.
   *
   * @param mean Mean of distribution.
   * @param scale Scale of distribution.
   */
  LaplaceDistribution(const arma::vec& mean, const double scale) :
      mean(mean), scale(scale) { }

  //! Return the dimensionality of this distribution.
  size_t Dimensionality() const { return mean.n_elem; }

  /**
   * Return the probability of the given observation.
   */
  double Probability(const arma::vec& observation) const;

  /**
   * Return a randomly generated observation according to the probability
   * distribution defined by this object. This is inlined for speed.
   *
   * @return Random observation from this Laplace distribution.
   */
  arma::vec Random() const
  {
    arma::vec result(mean.n_elem);
    result.randu();

    // Convert from uniform distribution to Laplace distribution (inverse CDF).
    // arma::sign() does not exist in Armadillo < 3.920 so we have to do this
    // elementwise.
    for (size_t i = 0; i < result.n_elem; ++i)
    {
      if (result[i] < 0.5)
        result[i] = mean[i] + scale * std::log(1 + 2.0 * (result[i] - 0.5));
      else
        result[i] = mean[i] - scale * std::log(1 - 2.0 * (result[i] - 0.5));
    }

    return result;
  }

  /**
   * Estimate the Laplace distribution directly from the given observations.
   *
   * @param observations List of observations.
   */
  void Estimate(const arma::mat& observations);

  /**
   * Estimate the Laplace distribution from the given observations, taking into
   * account the probability of each observation actually being from this
   * distribution.
   */
  void Estimate(const arma::mat& observations, const arma::vec& probabilities);

  //! Return the mean.
  const arma::vec& Mean() const { return mean; }
  //! Modify the mean.
  arma::vec& Mean() { return mean; }

  //! Return the scale parameter.
  double Scale() const { return scale; }
  //! Modify the scale parameter.
  double& Scale() { return scale; }

  //! Return a string representation of the object.
  std::string ToString() const;

 private:
  //! Mean of the distribution.
  arma::vec mean;
  //! Scale parameter of the distribution.
  double scale;
};

}; // namespace distribution
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/dists/laplace_distribution.cpp0000644000176200001440000000764513647514343023104 0ustar liggesusers/*
 * @file laplace_distribution.cpp
 * @author Zhihao Lou
 *
 * Implementation of Laplace distribution.
 *
 * This file is part of MLPACK 1.0.10.
  /**
   * Estimate the Laplace distribution directly from the given observations.
   *
   * @param observations List of observations.
   */
  void Estimate(const arma::mat& observations);

  /**
   * Estimate the Laplace distribution from the given observations, taking into
   * account the probability of each observation actually being from this
   * distribution.
   */
  void Estimate(const arma::mat& observations,
                const arma::vec& probabilities);

  //! Return the mean.
  const arma::vec& Mean() const { return mean; }
  //! Modify the mean.
  arma::vec& Mean() { return mean; }

  //! Return the scale parameter.
  double Scale() const { return scale; }
  //! Modify the scale parameter.
  double& Scale() { return scale; }

  //! Return a string representation of the object.
  std::string ToString() const;

 private:
  //! Mean of the distribution.
  arma::vec mean;
  //! Scale parameter of the distribution.
  double scale;
};

}; // namespace distribution
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/dists/laplace_distribution.cpp0000644000176200001440000000764513647514343023104 0ustar liggesusers/*
 * @file laplace_distribution.cpp
 * @author Zhihao Lou
 *
 * Implementation of Laplace distribution.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#include <mlpack/core.hpp>

#include "laplace_distribution.hpp"

using namespace mlpack;
using namespace mlpack::distribution;

/**
 * Return the probability of the given observation.
 */
double LaplaceDistribution::Probability(const arma::vec& observation) const
{
  // Evaluate the PDF of the Laplace distribution to determine the probability;
  // note the negative exponent.
  return (0.5 / scale) * std::exp(-arma::norm(observation - mean, 2) / scale);
}

/**
 * Estimate the Laplace distribution directly from the given observations.
 *
 * @param observations List of observations.
 */
void LaplaceDistribution::Estimate(const arma::mat& observations)
{
  // The maximum likelihood estimate of the mean is the median of the data for
  // the univariate case.  See the short note "The Double Exponential
  // Distribution: Using Calculus to Find a Maximum Likelihood Estimator" by
  // R.M. Norton.
  //
  // But for the multivariate case, the derivation is slightly different.  The
  // log-likelihood function is now
  //   L(\theta) = -n ln 2 - \sum_{i = 1}^{n} (x_i - \theta)^T (x_i - \theta).
  // Differentiating with respect to the vector \theta gives
  //   L'(\theta) = \sum_{i = 1}^{n} 2 (x_i - \theta)
  // which means that for an individual component \theta_k,
  //   d / d\theta_k L(\theta) = \sum_{i = 1}^{n} 2 (x_ik - \theta_k)
  // which is zero when
  //   \theta_k = (1 / n) \sum_{i = 1}^{n} x_ik
  // so L'(\theta) = 0 when \theta is the mean of the observations.  I am not
  // 100% certain my calculus and linear algebra is right, but I think it is...
  mean = arma::mean(observations, 1);

  // The maximum likelihood estimate of the scale parameter is the mean
  // deviation from the mean.
  scale = 0.0;
  for (size_t i = 0; i < observations.n_cols; ++i)
    scale += arma::norm(observations.col(i) - mean, 2);
  scale /= observations.n_cols;
}

/**
 * Estimate the Laplace distribution directly from the given observations,
 * taking into account the probability of each observation actually being from
 * this distribution.
 */
void LaplaceDistribution::Estimate(const arma::mat& observations,
                                   const arma::vec& probabilities)
{
  // I am not completely sure that this change results in a valid maximum
  // likelihood estimator given probabilities of points.
  mean.zeros(observations.n_rows);
  for (size_t i = 0; i < observations.n_cols; ++i)
    mean += observations.col(i) * probabilities(i);
  mean /= arma::accu(probabilities);

  // This is the same formula as in the previous function, but here we are
  // multiplying by the probability that the point is actually from this
  // distribution.
  scale = 0.0;
  for (size_t i = 0; i < observations.n_cols; ++i)
    scale += probabilities(i) * arma::norm(observations.col(i) - mean, 2);
  scale /= arma::accu(probabilities);
}
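// An illustrative driver for the two estimators above (a sketch; the data is
// hypothetical):
//
//   arma::mat observations = arma::randn<arma::mat>(3, 100);
//   LaplaceDistribution dist;
//   dist.Estimate(observations);          // Plain ML fit of mean and scale.
//
//   arma::vec weights = arma::ones<arma::vec>(100);
//   dist.Estimate(observations, weights); // Probability-weighted fit.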
//! Returns a string representation of this object.
std::string LaplaceDistribution::ToString() const
{
  std::ostringstream convert;
  convert << "LaplaceDistribution [" << this << "]" << std::endl;

  std::ostringstream data;
  data << "Mean: " << std::endl << mean.t();
  data << "Scale: " << scale << "." << std::endl;

  convert << util::Indent(data.str());
  return convert.str();
}
RcppMLPACK/src/mlpack/core/metrics/0000755000176200001440000000000013647512751016515 5ustar liggesusersRcppMLPACK/src/mlpack/core/metrics/ip_metric.hpp0000644000176200001440000000361113647512751021202 0ustar liggesusers/**
 * @file ip_metric.hpp
 * @author Ryan Curtin
 *
 * Inner product induced metric.  If given a kernel function, this gives the
 * complementary metric.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_FASTMKS_IP_METRIC_HPP
#define __MLPACK_METHODS_FASTMKS_IP_METRIC_HPP

namespace mlpack {
namespace metric {

template<typename KernelType>
class IPMetric
{
 public:
  //! Create the IPMetric without an instantiated kernel.
  IPMetric();

  //! Create the IPMetric with an instantiated kernel.
  IPMetric(KernelType& kernel);

  //! Destroy the IPMetric object.
  ~IPMetric();

  /**
   * Evaluate the metric.
   */
  template<typename Vec1Type, typename Vec2Type>
  double Evaluate(const Vec1Type& a, const Vec2Type& b);

  //! Get the kernel.
  const KernelType& Kernel() const { return kernel; }
  //! Modify the kernel.
  KernelType& Kernel() { return kernel; }

  /**
   * Returns a string representation of this object.
   */
  std::string ToString() const;

 private:
  //! The locally stored kernel, if it is necessary.
  KernelType* localKernel;
  //! The reference to the kernel that is being used.
  KernelType& kernel;
};

}; // namespace metric
}; // namespace mlpack

// Include implementation.
#include "ip_metric_impl.hpp"

#endif
RcppMLPACK/src/mlpack/core/metrics/ip_metric_impl.hpp0000644000176200001440000000535113647512751022226 0ustar liggesusers/**
 * @file ip_metric_impl.hpp
 * @author Ryan Curtin
 *
 * Implementation of the IPMetric.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_FASTMKS_IP_METRIC_IMPL_HPP
#define __MLPACK_METHODS_FASTMKS_IP_METRIC_IMPL_HPP

// In case it hasn't been included yet.
#include "ip_metric.hpp"

#include <mlpack/core/metrics/lmetric.hpp>
#include <mlpack/core/kernels/linear_kernel.hpp>

namespace mlpack {
namespace metric {

// Constructor with no instantiated kernel.
template<typename KernelType>
IPMetric<KernelType>::IPMetric() :
    localKernel(new KernelType()),
    kernel(*localKernel)
{
  // Nothing to do.
}

// Constructor with instantiated kernel.
template<typename KernelType>
IPMetric<KernelType>::IPMetric(KernelType& kernel) :
    localKernel(NULL),
    kernel(kernel)
{
  // Nothing to do.
}

// Destructor for the IPMetric.
template<typename KernelType>
IPMetric<KernelType>::~IPMetric()
{
  if (localKernel != NULL)
    delete localKernel;
}

template<typename KernelType>
template<typename Vec1Type, typename Vec2Type>
inline double IPMetric<KernelType>::Evaluate(const Vec1Type& a,
                                             const Vec2Type& b)
{
  // This is the metric induced by the kernel function.
  // Maybe we can do better by caching some of this?
  return sqrt(kernel.Evaluate(a, a) + kernel.Evaluate(b, b) -
      2 * kernel.Evaluate(a, b));
}

// Convert object to string.
template<typename KernelType>
std::string IPMetric<KernelType>::ToString() const
{
  std::ostringstream convert;
  convert << "IPMetric [" << this << "]" << std::endl;
  convert << "  Kernel: " << std::endl;
  convert << util::Indent(kernel.ToString(), 2);
  return convert.str();
}

// A specialization for the linear kernel, which actually just turns out to be
// the Euclidean distance.
template<>
template<typename Vec1Type, typename Vec2Type>
inline double IPMetric<kernel::LinearKernel>::Evaluate(const Vec1Type& a,
                                                       const Vec2Type& b)
{
  return metric::LMetric<2, true>::Evaluate(a, b);
}

}; // namespace metric
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/metrics/lmetric.hpp0000644000176200001440000000663513647512751020673 0ustar liggesusers/**
 * @file lmetric.hpp
 * @author Ryan Curtin
 *
 * Generalized L-metric, allowing both squared distances to be returned as well
 * as non-squared distances.  The squared distances are faster to compute.
 *
 * This also gives several convenience typedefs for commonly used L-metrics.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_METRICS_LMETRIC_HPP
#define __MLPACK_CORE_METRICS_LMETRIC_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace metric {

/**
 * The L_p metric for arbitrary integer p, with an option to take the root.
 *
 * This class implements the standard L_p metric for two arbitrary vectors @f$ x
 * @f$ and @f$ y @f$ of dimensionality @f$ n @f$:
 *
 * @f[
 * d(x, y) = \left( \sum_{i = 1}^{n} | x_i - y_i |^p \right)^{\frac{1}{p}}.
 * @f]
 *
 * The value of p is given as a template parameter.
 *
 * In addition, the function @f$ d(x, y) @f$ can be simplified, neglecting the
 * p-root calculation.  This is done by specifying the TakeRoot template
 * parameter to be false.  Then,
 *
 * @f[
 * d(x, y) = \sum_{i = 1}^{n} | x_i - y_i |^p
 * @f]
 *
 * It is faster to compute that distance, so TakeRoot is by default off.
 * However, when TakeRoot is false, the distance given is not actually a true
 * metric -- it does not satisfy the triangle inequality.  Some MLPACK methods
 * do not require the triangle inequality to operate correctly (such as the
 * BinarySpaceTree), but setting TakeRoot = false in some cases will cause
 * incorrect results.
 *
 * A few convenience typedefs are given:
 *
 *  - ManhattanDistance
 *  - EuclideanDistance
 *  - SquaredEuclideanDistance
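 *
 * A short illustration of these typedefs (exact values for these inputs;
 * sketch only):
 *
 * @code
 * arma::vec a("1 0");
 * arma::vec b("0 1");
 * ManhattanDistance::Evaluate(a, b);        // 2.0
 * SquaredEuclideanDistance::Evaluate(a, b); // 2.0
 * EuclideanDistance::Evaluate(a, b);        // sqrt(2)
 * @endcode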
 *
 * @tparam Power Power of metric; i.e. Power = 1 gives the L1-norm (Manhattan
 *    distance).
 * @tparam TakeRoot If true, the Power'th root of the result is taken before it
 *    is returned.  Setting this to false causes the metric to not satisfy the
 *    Triangle Inequality (be careful!).
 */
template<int Power, bool TakeRoot = false>
class LMetric
{
 public:
  /***
   * Default constructor does nothing, but is required to satisfy the Kernel
   * policy.
   */
  LMetric() { }

  /**
   * Computes the distance between two points.
   */
  template<typename VecType1, typename VecType2>
  static double Evaluate(const VecType1& a, const VecType2& b);

  std::string ToString() const;
};

// Convenience typedefs.

/***
 * The Manhattan (L1) distance.
 */
typedef LMetric<1, false> ManhattanDistance;

/***
 * The squared Euclidean (L2) distance.
 */
typedef LMetric<2, false> SquaredEuclideanDistance;

/***
 * The Euclidean (L2) distance.
 */
typedef LMetric<2, true> EuclideanDistance;

/***
 * The L-infinity distance.
 */
typedef LMetric<INT_MAX, false> ChebyshevDistance;

}; // namespace metric
}; // namespace mlpack

// Include implementation.
#include "lmetric_impl.hpp"

#endif
RcppMLPACK/src/mlpack/core/metrics/mahalanobis_distance.hpp0000644000176200001440000000757113647512751023360 0ustar liggesusers/***
 * @file mahalanobis_distance.hpp
 * @author Ryan Curtin
 *
 * The Mahalanobis distance.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_METRICS_MAHALANOBIS_DISTANCE_HPP
#define __MLPACK_CORE_METRICS_MAHALANOBIS_DISTANCE_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace metric {

/**
 * The Mahalanobis distance, which is essentially a stretched Euclidean
 * distance.  Given a square covariance matrix @f$ Q @f$ of size @f$ d @f$ x
 * @f$ d @f$, where @f$ d @f$ is the dimensionality of the points it will be
 * evaluating, and given two vectors @f$ x @f$ and @f$ y @f$ also of
 * dimensionality @f$ d @f$,
 *
 * @f[
 * d(x, y) = \sqrt{(x - y)^T Q (x - y)}
 * @f]
 *
 * where Q is the covariance matrix.
 *
 * Because each evaluation multiplies (x_1 - x_2) by the covariance matrix, it
 * may be much quicker to use an LMetric and simply stretch the actual dataset
 * itself before performing any evaluations.  However, this class is provided
 * for convenience.
 *
 * Similar to the LMetric class, this offers a template parameter TakeRoot
 * which, when set to false, will instead evaluate the distance
 *
 * @f[
 * d(x, y) = (x - y)^T Q (x - y)
 * @f]
 *
 * which is faster to evaluate.
 *
 * @tparam TakeRoot If true, takes the root of the output.  It is slightly
 *    faster to leave this at the default of false.
 */
template<bool TakeRoot = false>
class MahalanobisDistance
{
 public:
  /**
   * Initialize the Mahalanobis distance with the empty matrix as covariance.
   * Don't call Evaluate() until you set the covariance with Covariance()!
   */
  MahalanobisDistance() { }

  /**
   * Initialize the Mahalanobis distance with the identity matrix of the given
   * dimensionality.
   *
   * @param dimensionality Dimensionality of the covariance matrix.
   */
  MahalanobisDistance(const size_t dimensionality) :
      covariance(arma::eye<arma::mat>(dimensionality, dimensionality)) { }

  /**
   * Initialize the Mahalanobis distance with the given covariance matrix.  The
   * given covariance matrix will be copied (this is not optimal).
   *
   * @param covariance The covariance matrix to use for this distance.
   */
  MahalanobisDistance(const arma::mat& covariance) : covariance(covariance) { }

  /**
   * Evaluate the distance between the two given points using this Mahalanobis
   * distance.  If the covariance matrix has not been set (i.e. if you used the
   * empty constructor and did not later modify the covariance matrix), calling
   * this method will probably result in a crash.
   *
   * @param a First vector.
   * @param b Second vector.
   */
  // Return String of Object
  std::string ToString() const;

  template<typename VecType1, typename VecType2>
  double Evaluate(const VecType1& a, const VecType2& b);

  /**
   * Access the covariance matrix.
   *
   * @return Constant reference to the covariance matrix.
   */
  const arma::mat& Covariance() const { return covariance; }

  /**
   * Modify the covariance matrix.
   *
   * @return Reference to the covariance matrix.
   */
  arma::mat& Covariance() { return covariance; }

 private:
  //! The covariance matrix associated with this distance.
  arma::mat covariance;
};

}; // namespace metric
}; // namespace mlpack

#include "mahalanobis_distance_impl.hpp"

#endif
RcppMLPACK/src/mlpack/core/metrics/lmetric_impl.hpp0000644000176200001440000000657613647512751021714 0ustar liggesusers/**
 * @file lmetric_impl.hpp
 * @author Ryan Curtin
 *
 * Implementation of template specializations of LMetric class.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_METRICS_LMETRIC_IMPL_HPP
#define __MLPACK_CORE_METRICS_LMETRIC_IMPL_HPP

// In case it hasn't been included.
#include "lmetric.hpp"

namespace mlpack {
namespace metric {

// Unspecialized implementation.  This should almost never be used...
template<int Power, bool TakeRoot>
template<typename VecType1, typename VecType2>
double LMetric<Power, TakeRoot>::Evaluate(const VecType1& a, const VecType2& b)
{
  double sum = 0;
  for (size_t i = 0; i < a.n_elem; i++)
    sum += pow(fabs(a[i] - b[i]), Power);

  if (!TakeRoot) // The compiler should optimize this correctly at compile-time.
    return sum;

  return pow(sum, (1.0 / Power));
}

// String conversion.
template<int Power, bool TakeRoot>
std::string LMetric<Power, TakeRoot>::ToString() const
{
  std::ostringstream convert;
  convert << "LMetric [" << this << "]" << std::endl;
  convert << "  Power: " << Power << std::endl;
  convert << "  TakeRoot: " << (TakeRoot ? "true" : "false") << std::endl;
  return convert.str();
}

// L1-metric specializations; the root doesn't matter.
template<>
template<typename VecType1, typename VecType2>
double LMetric<1, true>::Evaluate(const VecType1& a, const VecType2& b)
{
  return accu(abs(a - b));
}

template<>
template<typename VecType1, typename VecType2>
double LMetric<1, false>::Evaluate(const VecType1& a, const VecType2& b)
{
  return accu(abs(a - b));
}

// L2-metric specializations.
template<>
template<typename VecType1, typename VecType2>
double LMetric<2, true>::Evaluate(const VecType1& a, const VecType2& b)
{
  return sqrt(accu(square(a - b)));
}

template<>
template<typename VecType1, typename VecType2>
double LMetric<2, false>::Evaluate(const VecType1& a, const VecType2& b)
{
  return accu(square(a - b));
}

// L3-metric specialization (not very likely to be used, but just in case).
template<>
template<typename VecType1, typename VecType2>
double LMetric<3, true>::Evaluate(const VecType1& a, const VecType2& b)
{
  return pow(accu(pow(abs(a - b), 3.0)), 1.0 / 3.0);
}

template<>
template<typename VecType1, typename VecType2>
double LMetric<3, false>::Evaluate(const VecType1& a, const VecType2& b)
{
  return accu(pow(abs(a - b), 3.0));
}

// L-infinity (Chebyshev distance) specialization.
template<>
template<typename VecType1, typename VecType2>
double LMetric<INT_MAX, false>::Evaluate(const VecType1& a, const VecType2& b)
{
  return arma::as_scalar(max(abs(a - b)));
}

}; // namespace metric
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/metrics/mahalanobis_distance_impl.hpp0000644000176200001440000000515113647512751024401 0ustar liggesusers/***
 * @file mahalanobis_distance_impl.hpp
 * @author Ryan Curtin
 *
 * Implementation of the Mahalanobis distance.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_METRICS_MAHALANOBIS_DISTANCE_IMPL_HPP
#define __MLPACK_CORE_METRICS_MAHALANOBIS_DISTANCE_IMPL_HPP

#include "mahalanobis_distance.hpp"

namespace mlpack {
namespace metric {

/**
 * Specialization for non-rooted case.
 */
template<>
template<typename VecType1, typename VecType2>
double MahalanobisDistance<false>::Evaluate(const VecType1& a,
                                            const VecType2& b)
{
  arma::vec m = (a - b);
  arma::mat out = trans(m) * covariance * m; // 1x1
  return out[0];
}

/**
 * Specialization for rooted case.  This requires one extra evaluation of
 * sqrt().
 */
template<>
template<typename VecType1, typename VecType2>
double MahalanobisDistance<true>::Evaluate(const VecType1& a,
                                           const VecType2& b)
{
  // Check if covariance matrix has been initialized.
  if (covariance.n_rows == 0)
    covariance = arma::eye<arma::mat>(a.n_elem, a.n_elem);

  arma::vec m = (a - b);
  arma::mat out = trans(m) * covariance * m; // 1x1
  return sqrt(out[0]);
}
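// An illustrative sketch of the two specializations above (hypothetical
// values):
//
//   MahalanobisDistance<true> d(2);  // Identity covariance to start.
//   arma::vec a("1 0");
//   arma::vec b("0 0");
//   d.Evaluate(a, b);                                // 1.0 under identity Q.
//   d.Covariance() = 4.0 * arma::eye<arma::mat>(2, 2);
//   d.Evaluate(a, b);                                // sqrt(4) = 2.0.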
// Convert object into string.
template<bool TakeRoot>
std::string MahalanobisDistance<TakeRoot>::ToString() const
{
  std::ostringstream convert;
  std::ostringstream convertb;
  convert << "MahalanobisDistance [" << this << "]" << std::endl;
  if (TakeRoot)
    convert << "  TakeRoot: TRUE" << std::endl;
  if (covariance.size() < 65)
  {
    convert << "  Covariance: " << std::endl;
    convertb << covariance << std::endl;
    convert << mlpack::util::Indent(convertb.str(), 2);
  }
  else
  {
    convert << "  Covariance matrix: " << covariance.n_rows << "x";
    convert << covariance.n_cols << std::endl << "  Range: [";
    convert << covariance.min() << "," << covariance.max() << "]" << std::endl;
  }
  return convert.str();
}

}; // namespace metric
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/data/0000755000176200001440000000000013647512751015760 5ustar liggesusersRcppMLPACK/src/mlpack/core/data/save.hpp0000644000176200001440000000550313647512751017432 0ustar liggesusers/**
 * @file save.hpp
 * @author Ryan Curtin
 *
 * Save an Armadillo matrix to file.  This is necessary because Armadillo does
 * not transpose matrices upon saving, and it allows us to give better error
 * output.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_DATA_SAVE_HPP
#define __MLPACK_CORE_DATA_SAVE_HPP

//#include <mlpack/core/util/log.hpp>
#include <mlpack/core/arma_extend/arma_extend.hpp> // Includes Armadillo.
#include <string>

namespace mlpack {
namespace data /** Functions to load and save matrices. */ {

/**
 * Saves a matrix to file, guessing the filetype from the extension.  This
 * will transpose the matrix at save time.  If the filetype cannot be
 * determined, an error will be given.
 *
 * The supported types of files are the same as found in Armadillo:
 *
 *  - CSV (csv_ascii), denoted by .csv, or optionally .txt
 *  - ASCII (raw_ascii), denoted by .txt
 *  - Armadillo ASCII (arma_ascii), also denoted by .txt
 *  - PGM (pgm_binary), denoted by .pgm
 *  - PPM (ppm_binary), denoted by .ppm
 *  - Raw binary (raw_binary), denoted by .bin
 *  - Armadillo binary (arma_binary), denoted by .bin
 *  - HDF5 (hdf5_binary), denoted by .hdf5, .hdf, .h5, or .he5
 *
 * If the file extension is not one of those types, an error will be given.  If
 * the 'fatal' parameter is set to true, an error will cause the program to
 * exit.  If the 'transpose' parameter is set to true, the matrix will be
 * transposed before saving.  Generally, because MLPACK stores matrices in a
 * column-major format and most datasets are stored on disk as row-major, this
 * parameter should be left at its default value of 'true'.
 *
 * @param filename Name of file to save to.
 * @param matrix Matrix to save into file.
 * @param fatal If an error should be reported as fatal (default false).
 * @param transpose If true, transpose the matrix before saving.
 * @return Boolean value indicating success or failure of save.
 */
template<typename eT>
bool Save(const std::string& filename,
          const arma::Mat<eT>& matrix,
          bool fatal = false,
          bool transpose = true);

}; // namespace data
}; // namespace mlpack
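// An illustrative call of Save() (a sketch; the filename is hypothetical):
//
//   arma::mat dataset = arma::randu<arma::mat>(3, 100);
//   // Writes a 100 x 3 CSV file: the matrix is transposed back to row-major
//   // on disk, one observation per row.
//   mlpack::data::Save("dataset.csv", dataset);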
#include "save_impl.hpp" #endif RcppMLPACK/src/mlpack/core/data/normalize_labels_impl.hpp0000644000176200001440000000635313647512751023043 0ustar liggesusers/** * @file normalize_labels_impl.hpp * @author Ryan Curtin * * Implementation of label normalization functions; these are useful for mapping * labels to the range [0, n). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_DATA_NORMALIZE_LABELS_IMPL_HPP #define __MLPACK_CORE_DATA_NORMALIZE_LABELS_IMPL_HPP // In case it hasn't been included yet. #include "normalize_labels.hpp" namespace mlpack { namespace data { /** * Given a set of labels of a particular datatype, convert them to unsigned * labels in the range [0, n) where n is the number of different labels. Also, * a reverse mapping from the new label to the old value is stored in the * 'mapping' vector. * * @param labelsIn Input labels of arbitrary datatype. * @param labels Vector that unsigned labels will be stored in. * @param mapping Reverse mapping to convert new labels back to old labels. */ template void NormalizeLabels(const arma::Col& labelsIn, arma::Col& labels, arma::Col& mapping) { // Loop over the input labels, and develop the mapping. We'll first naively // resize the mapping to the maximum possible size, and then when we fill it, // we'll resize it back down to its actual size. mapping.set_size(labelsIn.n_elem); labels.set_size(labelsIn.n_elem); size_t curLabel = 0; for (size_t i = 0; i < labelsIn.n_elem; ++i) { bool found = false; for (size_t j = 0; j < curLabel; ++j) { // Is the label already in the list of labels we have seen? if (labelsIn[i] == mapping[j]) { labels[i] = j; found = true; break; } } // Do we need to add this new label? if (!found) { mapping[curLabel] = labelsIn[i]; labels[i] = curLabel; ++curLabel; } } // Resize mapping back down to necessary size. mapping.resize(curLabel); } /** * Given a set of labels that have been mapped to the range [0, n), map them * back to the original labels given by the 'mapping' vector. * * @param labels Set of normalized labels to convert. * @param mapping Mapping to use to convert labels. * @param labelsOut Vector to store new labels in. */ template void RevertLabels(const arma::Col& labels, const arma::Col& mapping, arma::Col& labelsOut) { // We already have the mapping, so we just need to loop over each element. labelsOut.set_size(labels.n_elem); for (size_t i = 0; i < labels.n_elem; ++i) labelsOut[i] = mapping[labels[i]]; } }; // namespace data }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/data/load.hpp0000644000176200001440000000573713647512751017424 0ustar liggesusers/** * @file load.hpp * @author Ryan Curtin * * Load an Armadillo matrix from file. This is necessary because Armadillo does * not transpose matrices on input, and it allows us to give better error * output. * * This file is part of MLPACK 1.0.10. 
}; // namespace data
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/data/load.hpp0000644000176200001440000000573713647512751017414 0ustar liggesusers/**
 * @file load.hpp
 * @author Ryan Curtin
 *
 * Load an Armadillo matrix from file.  This is necessary because Armadillo does
 * not transpose matrices on input, and it allows us to give better error
 * output.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_DATA_LOAD_HPP
#define __MLPACK_CORE_DATA_LOAD_HPP

//#include <mlpack/core/util/log.hpp>
#include <mlpack/core/arma_extend/arma_extend.hpp> // Includes Armadillo.
#include <string>

namespace mlpack {
namespace data /** Functions to load and save matrices. */ {

/**
 * Loads a matrix from file, guessing the filetype from the extension.  This
 * will transpose the matrix at load time.  If the filetype cannot be
 * determined, an error will be given.
 *
 * The supported types of files are the same as found in Armadillo:
 *
 *  - CSV (csv_ascii), denoted by .csv, or optionally .txt
 *  - ASCII (raw_ascii), denoted by .txt
 *  - Armadillo ASCII (arma_ascii), also denoted by .txt
 *  - PGM (pgm_binary), denoted by .pgm
 *  - PPM (ppm_binary), denoted by .ppm
 *  - Raw binary (raw_binary), denoted by .bin
 *  - Armadillo binary (arma_binary), denoted by .bin
 *  - HDF5, denoted by .hdf, .hdf5, .h5, or .he5
 *
 * If the file extension is not one of those types, an error will be given.
 * This is preferable to Armadillo's default behavior of loading an unknown
 * filetype as raw_binary, which can have very confusing effects.
 *
 * If the parameter 'fatal' is set to true, the program will exit with an error
 * if the matrix does not load successfully.  The parameter 'transpose' controls
 * whether or not the matrix is transposed after loading.  In most cases,
 * because data is generally stored in a row-major format and MLPACK requires
 * column-major matrices, this should be left at its default value of 'true'.
 *
 * @param filename Name of file to load.
 * @param matrix Matrix to load contents of file into.
 * @param fatal If an error should be reported as fatal (default false).
 * @param transpose If true, transpose the matrix after loading.
 * @return Boolean value indicating success or failure of load.
 */
template<typename eT>
bool Load(const std::string& filename,
          arma::Mat<eT>& matrix,
          bool fatal = false,
          bool transpose = true);

}; // namespace data
}; // namespace mlpack

// Include implementation.
#include "load_impl.hpp"

#endif
RcppMLPACK/src/mlpack/core/data/load_impl.hpp0000644000176200001440000001557313647512751020440 0ustar liggesusers/**
 * @file load_impl.hpp
 * @author Ryan Curtin
 *
 * Implementation of templatized load() function defined in load.hpp.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_DATA_LOAD_IMPL_HPP
#define __MLPACK_CORE_DATA_LOAD_IMPL_HPP

// In case it hasn't already been included.
#include "load.hpp"

#include <algorithm>
#include <fstream>

namespace mlpack {
namespace data {

template<typename eT>
bool Load(const std::string& filename,
          arma::Mat<eT>& matrix,
          bool fatal,
          bool transpose)
{
  Timer::Start("loading_data");

  // First we will try to discriminate by file extension.
  size_t ext = filename.rfind('.');
  if (ext == std::string::npos)
  {
    Timer::Stop("loading_data");
    if (fatal)
      Log::Fatal << "Cannot determine type of file '" << filename << "'; "
          << "no extension is present." << std::endl;
    else
      Log::Warn << "Cannot determine type of file '" << filename << "'; "
          << "no extension is present.  Load failed." << std::endl;

    return false;
  }

  // Get the extension and force it to lowercase.
  std::string extension = filename.substr(ext + 1);
  std::transform(extension.begin(), extension.end(), extension.begin(),
      ::tolower);

  // Catch nonexistent files by opening the stream ourselves.
  std::fstream stream;
  stream.open(filename.c_str(), std::fstream::in);

  if (!stream.is_open())
  {
    Timer::Stop("loading_data");
    if (fatal)
      Log::Fatal << "Cannot open file '" << filename << "'. " << std::endl;
    else
      Log::Warn << "Cannot open file '" << filename << "'; load failed."
          << std::endl;

    return false;
  }

  bool unknownType = false;
  arma::file_type loadType;
  std::string stringType;

  if (extension == "csv")
  {
    loadType = arma::csv_ascii;
    stringType = "CSV data";
  }
  else if (extension == "txt")
  {
    // This could be raw ASCII or Armadillo ASCII (ASCII with size header).
    // We'll let Armadillo do its guessing (although we have to check if it is
    // arma_ascii ourselves) and see what we come up with.

    // This is taken from load_auto_detect() in diskio_meat.hpp.
    const std::string ARMA_MAT_TXT = "ARMA_MAT_TXT";
    char* rawHeader = new char[ARMA_MAT_TXT.length() + 1];
    std::streampos pos = stream.tellg();

    stream.read(rawHeader, std::streamsize(ARMA_MAT_TXT.length()));
    rawHeader[ARMA_MAT_TXT.length()] = '\0';
    stream.clear();
    stream.seekg(pos); // Reset stream position after peeking.

    if (std::string(rawHeader) == ARMA_MAT_TXT)
    {
      loadType = arma::arma_ascii;
      stringType = "Armadillo ASCII formatted data";
    }
    else // It's not arma_ascii.  Now we let Armadillo guess.
    {
      loadType = arma::diskio::guess_file_type(stream);

      if (loadType == arma::raw_ascii) // Raw ASCII (space-separated).
        stringType = "raw ASCII formatted data";
      else if (loadType == arma::csv_ascii) // CSV can be .txt too.
        stringType = "CSV data";
      else // Unknown .txt... we will throw an error.
        unknownType = true;
    }

    delete[] rawHeader;
  }
  else if (extension == "bin")
  {
    // This could be raw binary or Armadillo binary (binary with header).  We
    // will check to see if it is Armadillo binary.
    const std::string ARMA_MAT_BIN = "ARMA_MAT_BIN";
    char* rawHeader = new char[ARMA_MAT_BIN.length() + 1];
    std::streampos pos = stream.tellg();

    stream.read(rawHeader, std::streamsize(ARMA_MAT_BIN.length()));
    rawHeader[ARMA_MAT_BIN.length()] = '\0';
    stream.clear();
    stream.seekg(pos); // Reset stream position after peeking.

    if (std::string(rawHeader) == ARMA_MAT_BIN)
    {
      stringType = "Armadillo binary formatted data";
      loadType = arma::arma_binary;
    }
    else // We can only assume it's raw binary.
{ stringType = "raw binary formatted data"; loadType = arma::raw_binary; } delete[] rawHeader; } else if (extension == "pgm") { loadType = arma::pgm_binary; stringType = "PGM data"; } else if (extension == "h5" || extension == "hdf5" || extension == "hdf" || extension == "he5") { #ifdef ARMA_USE_HDF5 loadType = arma::hdf5_binary; stringType = "HDF5 data"; #else Timer::Stop("loading_data"); if (fatal) Log::Fatal << "Attempted to load '" << filename << "' as HDF5 data, but " << "Armadillo was compiled without HDF5 support. Load failed." << std::endl; else Log::Warn << "Attempted to load '" << filename << "' as HDF5 data, but " << "Armadillo was compiled without HDF5 support. Load failed." << std::endl; return false; #endif } else // Unknown extension... { unknownType = true; loadType = arma::raw_binary; // Won't be used; prevent a warning. stringType = ""; } // Provide error if we don't know the type. if (unknownType) { Timer::Stop("loading_data"); if (fatal) Log::Fatal << "Unable to detect type of '" << filename << "'; " << "incorrect extension?" << std::endl; else Log::Warn << "Unable to detect type of '" << filename << "'; load failed." << " Incorrect extension?" << std::endl; return false; } // Try to load the file; but if it's raw_binary, it could be a problem. if (loadType == arma::raw_binary) Log::Warn << "Loading '" << filename << "' as " << stringType << "; " << "but this may not be the actual filetype!" << std::endl; else Log::Info << "Loading '" << filename << "' as " << stringType << ". " << std::flush; const bool success = matrix.load(stream, loadType); if (!success) { Log::Info << std::endl; Timer::Stop("loading_data"); if (fatal) Log::Fatal << "Loading from '" << filename << "' failed." << std::endl; else Log::Warn << "Loading from '" << filename << "' failed." << std::endl; return false; } else Log::Info << "Size is " << (transpose ? matrix.n_cols : matrix.n_rows) << " x " << (transpose ? matrix.n_rows : matrix.n_cols) << ".\n"; // Now transpose the matrix, if necessary. if (transpose) matrix = trans(matrix); Timer::Stop("loading_data"); // Finally, return the success indicator. return success; } }; // namespace data }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/data/normalize_labels.hpp0000644000176200001440000000450213647512751022014 0ustar liggesusers/** * @file normalize_labels.hpp * @author Ryan Curtin * * Often labels are not given as {0, 1, 2, ...} but instead {1, 2, ...} or even * {-1, 1} or otherwise. The purpose of this function is to normalize labels to * {0, 1, 2, ...} and provide a mapping back to those labels. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_DATA_NORMALIZE_LABELS_HPP #define __MLPACK_CORE_DATA_NORMALIZE_LABELS_HPP #include namespace mlpack { namespace data { /** * Given a set of labels of a particular datatype, convert them to unsigned * labels in the range [0, n) where n is the number of different labels. 
}; // namespace data
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/data/normalize_labels.hpp0000644000176200001440000000450213647512751022014 0ustar liggesusers/**
 * @file normalize_labels.hpp
 * @author Ryan Curtin
 *
 * Often labels are not given as {0, 1, 2, ...} but instead {1, 2, ...} or even
 * {-1, 1} or otherwise.  The purpose of this function is to normalize labels to
 * {0, 1, 2, ...} and provide a mapping back to those labels.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_DATA_NORMALIZE_LABELS_HPP
#define __MLPACK_CORE_DATA_NORMALIZE_LABELS_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace data {

/**
 * Given a set of labels of a particular datatype, convert them to unsigned
 * labels in the range [0, n) where n is the number of different labels.  Also,
 * a reverse mapping from the new label to the old value is stored in the
 * 'mapping' vector.
 *
 * @param labelsIn Input labels of arbitrary datatype.
 * @param labels Vector that unsigned labels will be stored in.
 * @param mapping Reverse mapping to convert new labels back to old labels.
 */
template<typename eT>
void NormalizeLabels(const arma::Col<eT>& labelsIn,
                     arma::Col<size_t>& labels,
                     arma::Col<eT>& mapping);

/**
 * Given a set of labels that have been mapped to the range [0, n), map them
 * back to the original labels given by the 'mapping' vector.
 *
 * @param labels Set of normalized labels to convert.
 * @param mapping Mapping to use to convert labels.
 * @param labelsOut Vector to store new labels in.
 */
template<typename eT>
void RevertLabels(const arma::Col<size_t>& labels,
                  const arma::Col<eT>& mapping,
                  arma::Col<eT>& labelsOut);

}; // namespace data
}; // namespace mlpack

// Include implementation.
#include "normalize_labels_impl.hpp"

#endif
RcppMLPACK/src/mlpack/core/data/save_impl.hpp0000644000176200001440000001133113647512751020447 0ustar liggesusers/**
 * @file save_impl.hpp
 * @author Ryan Curtin
 *
 * Implementation of save functionality.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_DATA_SAVE_IMPL_HPP
#define __MLPACK_CORE_DATA_SAVE_IMPL_HPP

// In case it hasn't already been included.
#include "save.hpp"

namespace mlpack {
namespace data {

template<typename eT>
bool Save(const std::string& filename,
          const arma::Mat<eT>& matrix,
          bool fatal,
          bool transpose)
{
  Timer::Start("saving_data");

  // First we will try to discriminate by file extension.
  size_t ext = filename.rfind('.');
  if (ext == std::string::npos)
  {
    Timer::Stop("saving_data");
    if (fatal)
      Log::Fatal << "No extension given with filename '" << filename << "'; "
          << "type unknown.  Save failed." << std::endl;
    else
      Log::Warn << "No extension given with filename '" << filename << "'; "
          << "type unknown.  Save failed." << std::endl;

    return false;
  }

  // Get the actual extension.
  std::string extension = filename.substr(ext + 1);

  // Catch errors opening the file.
  std::fstream stream;
  stream.open(filename.c_str(), std::fstream::out);

  if (!stream.is_open())
  {
    Timer::Stop("saving_data");
    if (fatal)
      Log::Fatal << "Cannot open file '" << filename << "' for writing.  "
          << "Save failed." << std::endl;
    else
      Log::Warn << "Cannot open file '" << filename << "' for writing; save "
          << "failed."
          << std::endl;

    return false;
  }

  bool unknownType = false;
  arma::file_type saveType;
  std::string stringType;

  if (extension == "csv")
  {
    saveType = arma::csv_ascii;
    stringType = "CSV data";
  }
  else if (extension == "txt")
  {
    saveType = arma::raw_ascii;
    stringType = "raw ASCII formatted data";
  }
  else if (extension == "bin")
  {
    saveType = arma::arma_binary;
    stringType = "Armadillo binary formatted data";
  }
  else if (extension == "pgm")
  {
    saveType = arma::pgm_binary;
    stringType = "PGM data";
  }
  else if (extension == "h5" || extension == "hdf5" || extension == "hdf" ||
           extension == "he5")
  {
#ifdef ARMA_USE_HDF5
    saveType = arma::hdf5_binary;
    stringType = "HDF5 data";
#else
    Timer::Stop("saving_data");
    if (fatal)
      Log::Fatal << "Attempted to save HDF5 data to '" << filename << "', but "
          << "Armadillo was compiled without HDF5 support.  Save failed."
          << std::endl;
    else
      Log::Warn << "Attempted to save HDF5 data to '" << filename << "', but "
          << "Armadillo was compiled without HDF5 support.  Save failed."
          << std::endl;

    return false;
#endif
  }
  else
  {
    unknownType = true;
    saveType = arma::raw_binary; // Won't be used; prevent a warning.
    stringType = "";
  }

  // Provide error if we don't know the type.
  if (unknownType)
  {
    Timer::Stop("saving_data");
    if (fatal)
      Log::Fatal << "Unable to determine format to save to from filename '"
          << filename << "'.  Save failed." << std::endl;
    else
      Log::Warn << "Unable to determine format to save to from filename '"
          << filename << "'.  Save failed." << std::endl;

    return false;
  }

  // Try to save the file.
  Log::Info << "Saving " << stringType << " to '" << filename << "'."
      << std::endl;

  // Transpose the matrix.
  if (transpose)
  {
    arma::Mat<eT> tmp = trans(matrix);

    if (!tmp.quiet_save(stream, saveType))
    {
      Timer::Stop("saving_data");
      if (fatal)
        Log::Fatal << "Save to '" << filename << "' failed." << std::endl;
      else
        Log::Warn << "Save to '" << filename << "' failed." << std::endl;

      return false;
    }
  }
  else
  {
    if (!matrix.quiet_save(stream, saveType))
    {
      Timer::Stop("saving_data");
      if (fatal)
        Log::Fatal << "Save to '" << filename << "' failed." << std::endl;
      else
        Log::Warn << "Save to '" << filename << "' failed." << std::endl;

      return false;
    }
  }

  Timer::Stop("saving_data");

  // Finally return success.
  return true;
}

}; // namespace data
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/kernels/0000755000176200001440000000000013647514343016511 5ustar liggesusersRcppMLPACK/src/mlpack/core/kernels/cosine_distance.hpp0000644000176200001440000000406013647512751022355 0ustar liggesusers/**
 * @file cosine_distance.hpp
 * @author Ryan Curtin
 *
 * This implements the cosine distance (or cosine similarity) between two
 * vectors, which is a measure of the angle between the two vectors.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_COSINE_DISTANCE_HPP
#define __MLPACK_CORE_KERNELS_COSINE_DISTANCE_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace kernel {

/**
 * The cosine distance (or cosine similarity).  It is defined by
 *
 * @f[
 * d(a, b) = \frac{a^T b}{|| a || || b ||}
 * @f]
 *
 * and this class assumes the standard L2 inner product.
 */
class CosineDistance
{
 public:
  /**
   * Computes the cosine distance between two points.
   *
   * @param a First vector.
   * @param b Second vector.
   * @return d(a, b).
   */
  template<typename VecType>
  static double Evaluate(const VecType& a, const VecType& b);

  /**
   * Returns a string representation of this object.
   */
  std::string ToString() const
  {
    std::ostringstream convert;
    convert << "CosineDistance [" << this << "]" << std::endl;
    return convert.str();
  }
};

//! Kernel traits for the cosine distance.
template<>
class KernelTraits<CosineDistance>
{
 public:
  //! The cosine kernel is normalized: K(x, x) = 1 for all x.
  static const bool IsNormalized = true;
};

}; // namespace kernel
}; // namespace mlpack

// Include implementation.
#include "cosine_distance_impl.hpp"

#endif
RcppMLPACK/src/mlpack/core/kernels/kernel_traits.hpp0000644000176200001440000000311013647512751022064 0ustar liggesusers/**
 * @file kernel_traits.hpp
 * @author Ryan Curtin
 *
 * This provides the KernelTraits class, a template class to get information
 * about various kernels.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_KERNEL_TRAITS_HPP
#define __MLPACK_CORE_KERNELS_KERNEL_TRAITS_HPP

namespace mlpack {
namespace kernel {

/**
 * This is a template class that can provide information about various kernels.
 * By default, this class will provide the weakest possible assumptions on
 * kernels, and each kernel should override values as necessary.  If a kernel
 * doesn't need to override a value, then there's no need to write a
 * KernelTraits specialization for that class.
 */
template<typename KernelType>
class KernelTraits
{
 public:
  /**
   * If true, then the kernel is normalized: K(x, x) = K(y, y) = 1 for all x.
   */
  static const bool IsNormalized = false;
};

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/kernels/hyperbolic_tangent_kernel.hpp0000644000176200001440000000534213647512751024447 0ustar liggesusers/**
 * @file hyperbolic_tangent_kernel.hpp
 * @author Ajinkya Kale
 *
 * Implementation of the hyperbolic tangent kernel.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_HYPERBOLIC_TANGENT_KERNEL_HPP
#define __MLPACK_CORE_KERNELS_HYPERBOLIC_TANGENT_KERNEL_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace kernel {

/**
 * Hyperbolic tangent kernel.  For any two vectors @f$ x @f$, @f$ y @f$ and a
 * given scale @f$ s @f$ and offset @f$ t @f$
 *
 * @f[
 * K(x, y) = \tanh(s \langle x, y \rangle + t)
 * @f]
 */
class HyperbolicTangentKernel
{
 public:
  /**
   * This constructor sets the default scale to 1.0 and offset to 0.0.
   */
  HyperbolicTangentKernel() : scale(1.0), offset(0.0)
  { }

  /**
   * Construct the hyperbolic tangent kernel with custom scale factor and
   * offset.
   *
   * @param scale Scaling factor for \f$ \langle x, y \rangle \f$.
   * @param offset Kernel offset.
   */
  HyperbolicTangentKernel(double scale, double offset) :
      scale(scale), offset(offset)
  { }

  /**
   * Evaluate the hyperbolic tangent kernel.  This evaluation uses Armadillo's
   * dot() function.
   *
   * @tparam VecType Type of vector (should be arma::vec or arma::spvec).
   * @param a First vector.
   * @param b Second vector.
   * @return K(a, b).
   */
  template<typename VecType>
  double Evaluate(const VecType& a, const VecType& b)
  {
    return tanh(scale * arma::dot(a, b) + offset);
  }

  //! Get scale factor.
  double Scale() const { return scale; }
  //! Modify scale factor.
  double& Scale() { return scale; }

  //! Get offset for the kernel.
  double Offset() const { return offset; }
  //! Modify offset for the kernel.
  double& Offset() { return offset; }

  //! Convert object to string.
  std::string ToString() const
  {
    std::ostringstream convert;
    convert << "HyperbolicTangentKernel [" << this << "]" << std::endl;
    convert << "  Scale: " << scale << std::endl;
    convert << "  Offset: " << offset << std::endl;
    return convert.str();
  }

 private:
  double scale;
  double offset;
};
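// A quick numeric illustration of the kernel above (exact for these inputs;
// sketch only):
//
//   HyperbolicTangentKernel htk(0.5, 1.0);
//   arma::vec a("1 2");
//   arma::vec b("3 4");
//   htk.Evaluate(a, b); // tanh(0.5 * 11 + 1.0) = tanh(6.5), roughly 0.99999.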
}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/kernels/pspectrum_string_kernel_impl.hpp0000644000176200001440000000620413647512751025216 0ustar liggesusers/**
 * @file pspectrum_string_kernel_impl.hpp
 * @author Ryan Curtin
 *
 * Implementation of the p-spectrum string kernel, created for use with FastMKS.
 * Instead of passing a data matrix to FastMKS which stores the kernels, pass a
 * one-dimensional data matrix (data vector) to FastMKS which stores indices of
 * strings; then, the actual strings are given to the PSpectrumStringKernel at
 * construction time, and the kernel knows to map the indices to actual strings.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_PSPECTRUM_STRING_KERNEL_IMPL_HPP
#define __MLPACK_CORE_KERNELS_PSPECTRUM_STRING_KERNEL_IMPL_HPP

// In case it has not been included yet.
#include "pspectrum_string_kernel.hpp"

namespace mlpack {
namespace kernel {

/**
 * Evaluate the kernel for the string indices given.  As mentioned in the class
 * documentation, a and b should be 2-element vectors, where the first element
 * contains the index of the dataset and the second element contains the index
 * of the string.  Therefore, if [2 3] is passed for a, the string used will be
 * datasets[2][3] (datasets is of type
 * std::vector<std::vector<std::string> >&).
 *
 * @param a Index of string and dataset for first string.
 * @param b Index of string and dataset for second string.
 */
template<typename VecType>
double PSpectrumStringKernel::Evaluate(const VecType& a,
                                       const VecType& b) const
{
  // Get the map of substrings for the two strings we are interested in.
  const std::map<std::string, int>& aMap = counts[a[0]][a[1]];
  const std::map<std::string, int>& bMap = counts[b[0]][b[1]];

  double eval = 0;

  // Loop through the two maps (which, when iterated through, are sorted
  // alphabetically).
  std::map<std::string, int>::const_iterator aIt = aMap.begin();
  std::map<std::string, int>::const_iterator bIt = bMap.begin();

  while ((aIt != aMap.end()) && (bIt != bMap.end()))
  {
    // Compare alphabetically (this is how std::map is ordered).
    int result = (*aIt).first.compare((*bIt).first);
    if (result == 0) // The same substring.
    {
      eval += ((*aIt).second * (*bIt).second);

      // Now increment both.
      ++aIt;
      ++bIt;
    }
    else if (result > 0)
    {
      // aIt is "ahead" of bIt (alphabetically); so increment bIt to "catch up".
      ++bIt;
    }
    else
    {
      // bIt is "ahead" of aIt (alphabetically); so increment aIt to "catch up".
      ++aIt;
    }
  }

  return eval;
}
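// A worked sketch of the evaluation above (hypothetical data): with p = 2,
// "ababa" has substring counts {ab: 2, ba: 2} and "aba" has {ab: 1, ba: 1},
// so the kernel value is 2 * 1 + 2 * 1 = 4.
//
//   std::vector<std::vector<std::string> > data(1);
//   data[0].push_back("ababa");
//   data[0].push_back("aba");
//   PSpectrumStringKernel k(data, 2);
//   arma::vec a("0 0");
//   arma::vec b("0 1");
//   k.Evaluate(a, b); // Returns 4.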
}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/kernels/example_kernel.hpp0000644000176200001440000001376713647512751022224 0ustar liggesusers/**
 * @file example_kernel.hpp
 * @author Ryan Curtin
 *
 * This is an example kernel.  If you are making your own kernel, follow the
 * outline specified in this file.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_EXAMPLE_KERNEL_HPP
#define __MLPACK_CORE_KERNELS_EXAMPLE_KERNEL_HPP

#include <mlpack/core.hpp>

namespace mlpack {

/**
 * @brief Kernel functions.
 *
 * This namespace contains kernel functions, which evaluate some kernel function
 * @f$ K(x, y) @f$ for some arbitrary vectors @f$ x @f$ and @f$ y @f$ of the
 * same dimension.  The single restriction on the function @f$ K(x, y) @f$ is
 * that it must satisfy Mercer's condition:
 *
 * @f[
 * \int \int K(x, y) g(x) g(y) dx dy \ge 0
 * @f]
 *
 * for all square integrable functions @f$ g(x) @f$.
 *
 * The kernels in this namespace all implement the same methods as the
 * ExampleKernel class.  Any additional custom kernels should implement all the
 * methods that class implements; in addition, any method using a kernel should
 * rely on any arbitrary kernel function class having a default constructor and
 * a function
 *
 * @code
 * double Evaluate(arma::vec&, arma::vec&);
 * @endcode
 */
namespace kernel {

/**
 * An example kernel function.  This is not a useful kernel, but it implements
 * the two functions necessary to satisfy the Kernel policy (so that a class
 * can be used whenever an MLPACK method calls for a `typename Kernel` template
 * parameter).
 *
 * All that is necessary is a constructor and an `Evaluate()` function.  More
 * methods could be added; for instance, one useful idea is a constructor which
 * takes parameters for a kernel (for instance, the width of the Gaussian for a
 * Gaussian kernel).  However, MLPACK methods cannot count on these various
 * constructors existing, which is why most methods allow passing an
 * already-instantiated kernel object (and by default the method will construct
 * the kernel with the default constructor).  So, for instance,
 *
 * @code
 * GaussianKernel k(5.0);
 * KDE<GaussianKernel> kde(dataset, k);
 * @endcode
 *
 * will set up KDE using a Gaussian kernel with a width of 5.0, but
 *
 * @code
 * KDE<GaussianKernel> kde(dataset);
 * @endcode
 *
 * will create the kernel with the default constructor.  It is important (but
 * not strictly mandatory) that your default constructor still gives a working
 * kernel.
 *
 * @note
 * Not all kernels require state.  For instance, the regular dot product needs
 * no parameters.  In that case, no local variables are necessary and
 * `Evaluate()` can (and should) be declared static.  However, for greater
 * generalization, MLPACK methods expect all kernels to require state and hence
 * must store instantiated kernel functions; this is why a default constructor
 * is necessary.
 * @endnote
 */
class ExampleKernel
{
 public:
  /**
   * The default constructor, which takes no parameters.  Because our simple
   * example kernel has no internal parameters that need to be stored, the
   * constructor does not need to do anything.  For a more complex example, see
   * the GaussianKernel, which stores an internal parameter.
   */
  ExampleKernel() { }

  /**
   * Evaluates the kernel function for two given vectors.  In this case, because
   * our simple example kernel has no internal parameters, we can declare the
   * function static.  For a more complex example which cannot be declared
   * static, see the GaussianKernel, which stores an internal parameter.
   *
   * @tparam VecType Type of vector (arma::vec, arma::spvec should be expected).
   * @param a First vector.
   * @param b Second vector.
   * @return K(a, b).
   */
  template<typename VecType>
  static double Evaluate(const VecType& a, const VecType& b) { return 0; }

  /**
   * Returns a string for the kernel object; in this case, with only the memory
   * address for the kernel.  If your kernel has any members, your ToString()
   * method should include those as necessary as well.
   **/
  std::string ToString() const
  {
    std::ostringstream convert;
    convert << "ExampleKernel [" << this << "]" << std::endl;
    return convert.str();
  }

  /**
   * Obtains the convolution integral [integral K(||x-a||)K(||b-x||)dx]
   * for the two vectors.  In this case, because
   * our simple example kernel has no internal parameters, we can declare the
   * function static.  For a more complex example which cannot be declared
   * static, see the GaussianKernel, which stores an internal parameter.
   *
   * @tparam VecType Type of vector (arma::vec, arma::spvec should be expected).
   * @param a First vector.
   * @param b Second vector.
   * @return the convolution integral value.
   */
  template<typename VecType>
  static double ConvolutionIntegral(const VecType& a, const VecType& b)
  { return 0; }

  /**
   * Obtains the normalizing volume for the kernel with dimension $dimension$.
   * In this case, because our simple example kernel has no internal parameters,
   * we can declare the function static.  For a more complex example which
   * For a more complex example which cannot be declared static, see the
   * GaussianKernel, which stores an internal parameter.
   *
   * @param dimension the dimension of the space.
   * @return the normalization constant.
   */
  static double Normalizer() { return 0; }
  // Modified to remove unused variable "dimension"
  //static double Normalizer(size_t dimension=1) { return 0; }
};

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/kernels/pspectrum_string_kernel.cpp0000644000176200001440000000616013647514343024170 0ustar liggesusers/**
 * @file pspectrum_string_kernel.cpp
 * @author Ryan Curtin
 *
 * Implementation of the p-spectrum string kernel, created for use with FastMKS.
 * Instead of passing a data matrix to FastMKS which stores the kernels, pass a
 * one-dimensional data matrix (data vector) to FastMKS which stores indices of
 * strings; then, the actual strings are given to the PSpectrumStringKernel at
 * construction time, and the kernel knows to map the indices to actual strings.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#include "pspectrum_string_kernel.hpp"

using namespace std;
using namespace mlpack;
using namespace mlpack::kernel;

/**
 * Initialize the PSpectrumStringKernel with the given string datasets. For
 * more information on this, see the general class documentation.
 *
 * @param datasets Sets of string data.
 * @param p The length of substrings to search.
 */
mlpack::kernel::PSpectrumStringKernel::PSpectrumStringKernel(
    const std::vector<std::vector<std::string> >& datasets,
    const size_t p) : datasets(datasets), p(p)
{
  // We have to assemble the counts of substrings. This is not a particularly
  // fast operation, unfortunately, but it only needs to be done once.
  Rcpp::Rcout << "Assembling counts of substrings of length " << p << "."
      << std::endl;

  // Resize for number of datasets.
  counts.resize(datasets.size());

  for (size_t dataset = 0; dataset < datasets.size(); ++dataset)
  {
    const std::vector<std::string>& set = datasets[dataset];

    // Resize for number of strings in dataset.
    counts[dataset].resize(set.size());

    // Inspect each string in the dataset.
    for (size_t index = 0; index < set.size(); ++index)
    {
      // Convenience references.
      const std::string& str = set[index];
      std::map<std::string, int>& mapping = counts[dataset][index];

      size_t start = 0;
      while ((start + p) <= str.length())
      {
        string sub = str.substr(start, p);

        // Only consider substrings that are entirely alphanumeric, and convert
        // all characters to lowercase.
        bool invalid = false;
        for (size_t j = 0; j < p; ++j)
        {
          if (!isalnum(sub[j]))
          {
            invalid = true;
            break; // Only consider substrings with alphanumerics.
          }

          sub[j] = tolower(sub[j]);
        }

        // Increment position in string.
        ++start;

        if (!invalid)
        {
          // Add to the map.
          ++mapping[sub];
        }
      }
    }
  }

  Rcpp::Rcout << "Substring extraction complete."
      << std::endl;
}
RcppMLPACK/src/mlpack/core/kernels/triangular_kernel.hpp0000644000176200001440000000556213647512751022733 0ustar liggesusers/**
 * @file triangular_kernel.hpp
 * @author Ryan Curtin
 *
 * Definition and implementation of the trivially simple triangular kernel.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_TRIANGULAR_KERNEL_HPP
#define __MLPACK_CORE_KERNELS_TRIANGULAR_KERNEL_HPP

#include <mlpack/core.hpp>
#include <mlpack/core/metrics/lmetric.hpp>

namespace mlpack {
namespace kernel {

/**
 * The trivially simple triangular kernel, defined by
 *
 * @f[
 * K(x, y) = \max \{ 0, 1 - \frac{|| x - y ||_2}{b} \}
 * @f]
 *
 * where \f$ b \f$ is the bandwidth of the kernel.
 */
class TriangularKernel
{
 public:
  /**
   * Initialize the triangular kernel with the given bandwidth (default 1.0).
   *
   * @param bandwidth Bandwidth of the triangular kernel.
   */
  TriangularKernel(const double bandwidth = 1.0) : bandwidth(bandwidth) { }

  /**
   * Evaluate the triangular kernel for the two given vectors.
   *
   * @param a First vector.
   * @param b Second vector.
   */
  template<typename Vec1Type, typename Vec2Type>
  double Evaluate(const Vec1Type& a, const Vec2Type& b) const
  {
    return std::max(0.0, (1 -
        metric::EuclideanDistance::Evaluate(a, b) / bandwidth));
  }

  /**
   * Evaluate the triangular kernel given that the distance between the two
   * points is known.
   *
   * @param distance The distance between the two points.
   */
  double Evaluate(const double distance) const
  {
    return std::max(0.0, (1 - distance / bandwidth));
  }

  //! Get the bandwidth of the kernel.
  double Bandwidth() const { return bandwidth; }
  //! Modify the bandwidth of the kernel.
  double& Bandwidth() { return bandwidth; }

  //! Return a string representation of the kernel.
  std::string ToString() const
  {
    std::ostringstream convert;
    convert << "TriangularKernel [" << this << "]" << std::endl;
    convert << "  Bandwidth: " << bandwidth << std::endl;
    return convert.str();
  }

 private:
  //! The bandwidth of the kernel.
  double bandwidth;
};

//! Kernel traits for the triangular kernel.
template<>
class KernelTraits<TriangularKernel>
{
 public:
  //! The triangular kernel is normalized: K(x, x) = 1 for all x.
  static const bool IsNormalized = true;
};

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/kernels/epanechnikov_kernel.cpp0000644000176200001440000000361313647514343023232 0ustar liggesusers/**
 * @file epanechnikov_kernel.cpp
 * @author Neil Slagle
 *
 * Implementation of non-template Epanechnikov kernels.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#include "epanechnikov_kernel.hpp"

#include <boost/math/special_functions/gamma.hpp>

using namespace mlpack;
using namespace mlpack::kernel;

/**
 * Compute the normalizer of this Epanechnikov kernel for the given dimension.
 *
 * @param dimension Dimension to calculate the normalizer for.
 */
double EpanechnikovKernel::Normalizer(const size_t dimension)
{
  return 2.0 * pow(bandwidth, (double) dimension) *
      std::pow(M_PI, dimension / 2.0) /
      (boost::math::tgamma(dimension / 2.0 + 1.0) * (dimension + 2.0));
}

/**
 * Evaluate the kernel not for two points but for a numerical value.
 */
double EpanechnikovKernel::Evaluate(const double distance) const
{
  return std::max(0.0, 1 - std::pow(distance, 2.0) * inverseBandwidthSquared);
}

// Return string of object.
std::string EpanechnikovKernel::ToString() const
{
  std::ostringstream convert;
  convert << "EpanechnikovKernel [" << this << "]" << std::endl;
  convert << "  Bandwidth: " << bandwidth << std::endl;
  convert << "  Inverse squared bandwidth: ";
  convert << inverseBandwidthSquared << std::endl;
  return convert.str();
}
RcppMLPACK/src/mlpack/core/kernels/polynomial_kernel.hpp0000644000176200001440000000563713647512751022755 0ustar liggesusers/**
 * @file polynomial_kernel.hpp
 * @author Ajinkya Kale
 *
 * Implementation of the polynomial kernel.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_POLYNOMIAL_KERNEL_HPP
#define __MLPACK_CORE_KERNELS_POLYNOMIAL_KERNEL_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace kernel {

/**
 * The simple polynomial kernel. For any two vectors @f$ x @f$, @f$ y @f$,
 * @f$ degree @f$ and @f$ offset @f$,
 *
 * @f[
 * K(x, y) = (x^T * y + offset) ^ {degree}.
 * @f]
 */
class PolynomialKernel
{
 public:
  /**
   * Construct the Polynomial Kernel with the given offset and degree. If the
   * arguments are omitted, the default degree is 2 and the default offset is
   * 0.
   *
   * @param offset Offset of the dot product of the arguments.
   * @param degree Degree of the polynomial.
   */
  PolynomialKernel(const double degree = 2.0, const double offset = 0.0) :
      degree(degree),
      offset(offset)
  { }

  /**
   * Simple evaluation of the dot product. This evaluation uses Armadillo's
   * dot() function.
   *
   * @tparam VecType Type of vector (should be arma::vec or arma::spvec).
   * @param a First vector.
   * @param b Second vector.
   * @return K(a, b).
   */
  template<typename VecType>
  double Evaluate(const VecType& a, const VecType& b) const
  {
    return pow((arma::dot(a, b) + offset), degree);
  }

  //! Get the degree of the polynomial.
  const double& Degree() const { return degree; }
  //! Modify the degree of the polynomial.
  double& Degree() { return degree; }

  //! Get the offset of the dot product of the arguments.
  const double& Offset() const { return offset; }
  //! Modify the offset of the dot product of the arguments.
  double& Offset() { return offset; }

  //! Return a string representation of the kernel.
  std::string ToString() const
  {
    std::ostringstream convert;
    convert << "PolynomialKernel [" << this << "]" << std::endl;
    convert << "  Degree: " << degree << std::endl;
    convert << "  Offset: " << offset << std::endl;
    return convert.str();
  }

 private:
  //! The degree of the polynomial.
  double degree;
  //! The offset of the dot product of the arguments.
  double offset;
};

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/kernels/spherical_kernel.hpp0000644000176200001440000000720613647512751022542 0ustar liggesusers/**
 * @file spherical_kernel.hpp
 * @author Neil Slagle
 *
 * Definition and implementation of the spherical kernel.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_SPHERICAL_KERNEL_H
#define __MLPACK_CORE_KERNELS_SPHERICAL_KERNEL_H

#include <boost/math/special_functions/gamma.hpp>
#include <mlpack/core.hpp>

namespace mlpack {
namespace kernel {

class SphericalKernel
{
 public:
  SphericalKernel() :
      bandwidth(1.0),
      bandwidthSquared(1.0) {}
  SphericalKernel(double b) :
      bandwidth(b),
      bandwidthSquared(b * b) {}

  template<typename VecType>
  double Evaluate(const VecType& a, const VecType& b)
  {
    return
        (metric::SquaredEuclideanDistance::Evaluate(a, b) <= bandwidthSquared)
        ? 1.0 : 0.0;
  }

  /**
   * Obtains the convolution integral [integral K(||x-a||)K(||b-x||)dx]
   * for the two vectors.
   *
   * @tparam VecType Type of vector (arma::vec, arma::spvec should be
   *     expected).
   * @param a First vector.
   * @param b Second vector.
   * @return the convolution integral value.
   */
  template<typename VecType>
  double ConvolutionIntegral(const VecType& a, const VecType& b)
  {
    double distance = sqrt(metric::SquaredEuclideanDistance::Evaluate(a, b));
    if (distance >= 2.0 * bandwidth)
    {
      return 0.0;
    }
    double volumeSquared = pow(Normalizer(a.n_rows), 2.0);

    switch (a.n_rows)
    {
      case 1:
        return 1.0 / volumeSquared * (2.0 * bandwidth - distance);
        break;
      case 2:
        return 1.0 / volumeSquared *
            (2.0 * bandwidth * bandwidth * acos(distance / (2.0 * bandwidth)) -
            distance / 4.0 * sqrt(4.0 * bandwidth * bandwidth -
            distance * distance));
        break;
      default:
        Rcpp::Rcout << "The spherical kernel does not support convolution "
            "integrals above dimension two, yet..." << std::endl;
        return -1.0;
        break;
    }
  }
  double Normalizer(size_t dimension)
  {
    return pow(bandwidth, (double) dimension) * pow(M_PI, dimension / 2.0) /
        boost::math::tgamma(dimension / 2.0 + 1.0);
  }
  double Evaluate(double t)
  {
    return (t <= bandwidth) ? 1.0 : 0.0;
  }
  //! Return a string representation of the kernel.
  std::string ToString() const
  {
    std::ostringstream convert;
    convert << "SphericalKernel [" << this << "]" << std::endl;
    convert << "  Bandwidth: " << bandwidth << std::endl;
    return convert.str();
  }

 private:
  double bandwidth;
  double bandwidthSquared;
};

//! Kernel traits for the spherical kernel.
template<>
class KernelTraits<SphericalKernel>
{
 public:
  //! The spherical kernel is normalized: K(x, x) = 1 for all x.
  static const bool IsNormalized = true;
};

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/kernels/laplacian_kernel.hpp0000644000176200001440000000662613647512751022515 0ustar liggesusers/**
 * @file laplacian_kernel.hpp
 * @author Ajinkya Kale
 *
 * Implementation of the Laplacian kernel (LaplacianKernel).
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_LAPLACIAN_KERNEL_HPP
#define __MLPACK_CORE_KERNELS_LAPLACIAN_KERNEL_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace kernel {

/**
 * The standard Laplacian kernel. Given two vectors @f$ x @f$, @f$ y @f$, and a
 * bandwidth @f$ \mu @f$ (set in the constructor),
 *
 * @f[
 * K(x, y) = \exp(-\frac{|| x - y ||}{\mu}).
 * @f]
 *
 * The implementation is all in the header file because it is so simple.
 */
class LaplacianKernel
{
 public:
  /**
   * Default constructor; sets bandwidth to 1.0.
   */
  LaplacianKernel() : bandwidth(1.0) { }

  /**
   * Construct the Laplacian kernel with a custom bandwidth.
   *
   * @param bandwidth The bandwidth of the kernel (@f$\mu@f$).
   */
  LaplacianKernel(double bandwidth) : bandwidth(bandwidth) { }

  /**
   * Evaluation of the Laplacian kernel. This could be generalized to use any
   * distance metric, not the Euclidean distance, but for now, the Euclidean
   * distance is used.
   *
   * @tparam VecType Type of vector (likely arma::vec or arma::spvec).
   * @param a First vector.
   * @param b Second vector.
   * @return K(a, b) using the bandwidth (@f$\mu@f$) specified in the
   *     constructor.
   */
  template<typename VecType>
  double Evaluate(const VecType& a, const VecType& b) const
  {
    return exp(-metric::EuclideanDistance::Evaluate(a, b) / bandwidth);
  }

  /**
   * Evaluation of the Laplacian kernel given the distance between two points.
   *
   * @param t The distance between the two points the kernel should be
   *     evaluated on.
   * @return K(t) using the bandwidth (@f$\mu@f$) specified in the
   *     constructor.
   */
  double Evaluate(const double t) const
  {
    return exp(-t / bandwidth);
  }

  //! Get the bandwidth.
  double Bandwidth() const { return bandwidth; }
  //! Modify the bandwidth.
  double& Bandwidth() { return bandwidth; }

  //! Return a string representation of the kernel.
  std::string ToString() const
  {
    std::ostringstream convert;
    convert << "LaplacianKernel [" << this << "]" << std::endl;
    convert << "  Bandwidth: " << bandwidth << std::endl;
    return convert.str();
  }

 private:
  //! Kernel bandwidth.
  double bandwidth;
};

//! Kernel traits of the Laplacian kernel.
template<>
class KernelTraits<LaplacianKernel>
{
 public:
  //! The Laplacian kernel is normalized: K(x, x) = 1 for all x.
  static const bool IsNormalized = true;
};

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/kernels/linear_kernel.hpp0000644000176200001440000000402713647512751022040 0ustar liggesusers/**
 * @file linear_kernel.hpp
 * @author Wei Guan
 * @author James Cline
 * @author Ryan Curtin
 *
 * Implementation of the linear kernel (just the standard dot product).
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_LINEAR_KERNEL_HPP
#define __MLPACK_CORE_KERNELS_LINEAR_KERNEL_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace kernel {

/**
 * The simple linear kernel (dot product). For any two vectors @f$ x @f$ and
 * @f$ y @f$,
 *
 * @f[
 * K(x, y) = x^T y
 * @f]
 *
 * This kernel has no parameters and therefore the evaluation can be static.
 */
class LinearKernel
{
 public:
  /**
   * This constructor does nothing; the linear kernel has no parameters to
   * store.
   */
  LinearKernel() { }

  /**
   * Simple evaluation of the dot product. This evaluation uses Armadillo's
   * dot() function.
   *
   * @tparam VecType Type of vector (should be arma::vec or arma::spvec).
   * @param a First vector.
   * @param b Second vector.
   * @return K(a, b).
   */
  template<typename VecType>
  static double Evaluate(const VecType& a, const VecType& b)
  {
    return arma::dot(a, b);
  }

  //! Return a string representation of the kernel.
  std::string ToString() const
  {
    std::ostringstream convert;
    convert << "LinearKernel [" << this << "]" << std::endl;
    return convert.str();
  }
};

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/kernels/cosine_distance_impl.hpp0000644000176200001440000000314213647512751023376 0ustar liggesusers/**
 * @file cosine_distance_impl.hpp
 * @author Ryan Curtin
 *
 * This implements the cosine distance.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_COSINE_DISTANCE_IMPL_HPP
#define __MLPACK_CORE_KERNELS_COSINE_DISTANCE_IMPL_HPP

#include "cosine_distance.hpp"

namespace mlpack {
namespace kernel {

template<typename VecType>
double CosineDistance::Evaluate(const VecType& a, const VecType& b)
{
  // Since we are using the L2 inner product, this is easy.
  // But we have to
  // make sure we aren't dividing by zero; if we are, the cosine similarity is
  // taken to be 0. This value is reasonable because the cosine distance is
  // just a normalized dot product: take away the normalization, and if ||a||
  // or ||b|| is equal to 0, then a^T b is zero too.
  const double denominator = norm(a, 2) * norm(b, 2);
  if (denominator == 0.0)
    return 0;
  else
    return dot(a, b) / denominator;
}

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/kernels/pspectrum_string_kernel.hpp0000644000176200001440000001336613647512751024204 0ustar liggesusers/**
 * @file pspectrum_string_kernel.hpp
 * @author Ryan Curtin
 *
 * Implementation of the p-spectrum string kernel, created for use with FastMKS.
 * Instead of passing a data matrix to FastMKS which stores the kernels, pass a
 * one-dimensional data matrix (data vector) to FastMKS which stores indices of
 * strings; then, the actual strings are given to the PSpectrumStringKernel at
 * construction time, and the kernel knows to map the indices to actual strings.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_PSPECTRUM_STRING_KERNEL_HPP
#define __MLPACK_CORE_KERNELS_PSPECTRUM_STRING_KERNEL_HPP

#include <map>
#include <string>
#include <vector>

#include <mlpack/core.hpp>

namespace mlpack {
namespace kernel {

/**
 * The p-spectrum string kernel. Given a length p, the p-spectrum kernel finds
 * the contiguous subsequence match count between two strings. The kernel will
 * take every possible substring of length p of one string and count how many
 * times it appears in the other string.
 *
 * The string kernel, when created, must be passed a reference to a series of
 * string datasets (std::vector<std::vector<std::string> >&). This is because
 * MLPACK only supports datasets which are Armadillo matrices -- and a dataset
 * of variable-length strings cannot be easily cast into an Armadillo matrix.
 *
 * Therefore, once the PSpectrumStringKernel is created with a reference to the
 * string datasets, a "fake" Armadillo data matrix must be created, which simply
 * holds indices to the strings they represent. This "fake" matrix has two rows
 * and n columns (where n is the number of strings in the dataset). The first
 * row holds the index of the dataset (remember, the kernel can have multiple
 * datasets), and the second row holds the index of the string. A fake matrix
 * containing only strings from dataset 0 might look like this:
 *
 * [[0 0 0 0 0 0 0 0 0]
 *  [0 1 2 3 4 5 6 7 8]]
 *
 * This fake matrix is then given to the machine learning method, which will
 * eventually call PSpectrumStringKernel::Evaluate(a, b), where a and b are two
 * columns of the fake matrix. The string kernel will then map these fake
 * columns back to the strings they represent, and then correctly evaluate the
 * kernel.
 *
 * Unfortunately, not every machine learning method will work with this kernel.
 * Only machine learning methods which do not ever operate on the explicit
 * representation of points can use this kernel. So, for instance, one cannot
 * build a kd-tree on strings, because the BinarySpaceTree<> class will split
 * the data according to the fake data matrix -- resulting in a meaningless
 * tree. This kernel was originally written for the FastMKS method; so, at the
 * very least, it will work with that.
 */
class PSpectrumStringKernel
{
 public:
  /**
   * Initialize the PSpectrumStringKernel with the given string datasets. For
   * more information on this, see the general class documentation.
   *
   * @param datasets Sets of string data.
   * @param p The length of substrings to search.
   */
  PSpectrumStringKernel(const std::vector<std::vector<std::string> >& datasets,
                        const size_t p);

  /**
   * Evaluate the kernel for the string indices given. As mentioned in the
   * class documentation, a and b should be 2-element vectors, where the first
   * element contains the index of the dataset and the second element contains
   * the index of the string. Therefore, if [2 3] is passed for a, the string
   * used will be datasets[2][3] (datasets is of type
   * std::vector<std::vector<std::string> >&).
   *
   * @param a Index of string and dataset for first string.
   * @param b Index of string and dataset for second string.
   */
  template<typename VecType>
  double Evaluate(const VecType& a, const VecType& b) const;

  //! Access the lists of substrings.
  const std::vector<std::vector<std::map<std::string, int> > >& Counts() const
  { return counts; }
  //! Modify the lists of substrings.
  std::vector<std::vector<std::map<std::string, int> > >& Counts()
  { return counts; }

  //! Access the value of p.
  size_t P() const { return p; }
  //! Modify the value of p.
  size_t& P() { return p; }

  /*
   * Returns a string representation of this object.
   */
  std::string ToString() const
  {
    std::ostringstream convert;
    convert << "PSpectrumStringKernel [" << this << "]" << std::endl;
    convert << "  p used: " << p << std::endl;
    convert << "  Dataset:" << datasets.size() << std::endl;
    std::ostringstream convertb;
    for (size_t ind = 0; ind < datasets.size(); ind++)
      convertb << datasets[ind].size();
    convert << mlpack::util::Indent(convertb.str(), 2);
    return convert.str();
  }

 private:
  //! The datasets.
  const std::vector<std::vector<std::string> >& datasets;

  //! Mappings of the datasets to counts of substrings. Such a huge structure
  //! is not wonderful...
  std::vector<std::vector<std::map<std::string, int> > > counts;

  //! The value of p to use in calculation.
  size_t p;
};

}; // namespace kernel
}; // namespace mlpack

// Include implementation of templated Evaluate().
#include "pspectrum_string_kernel_impl.hpp"

#endif
RcppMLPACK/src/mlpack/core/kernels/epanechnikov_kernel_impl.hpp0000644000176200001440000000574313647512751024263 0ustar liggesusers/**
 * @file epanechnikov_kernel_impl.hpp
 * @author Neil Slagle
 *
 * Implementation of template-based Epanechnikov kernel functions.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_EPANECHNIKOV_KERNEL_IMPL_HPP
#define __MLPACK_CORE_KERNELS_EPANECHNIKOV_KERNEL_IMPL_HPP

// In case it hasn't already been included.
#include "epanechnikov_kernel.hpp"
#include <mlpack/core/metrics/lmetric.hpp>

namespace mlpack {
namespace kernel {

template<typename Vec1Type, typename Vec2Type>
inline double EpanechnikovKernel::Evaluate(const Vec1Type& a,
                                           const Vec2Type& b) const
{
  return std::max(0.0, 1.0 - metric::SquaredEuclideanDistance::Evaluate(a, b)
      * inverseBandwidthSquared);
}

/**
 * Obtains the convolution integral [integral of K(||x-a||) K(||b-x||) dx]
 * for the two vectors.
 *
 * @tparam VecType Type of vector (arma::vec, arma::spvec should be expected).
 * @param a First vector.
 * @param b Second vector.
 * @return the convolution integral value.
 */
template<typename VecType>
double EpanechnikovKernel::ConvolutionIntegral(const VecType& a,
                                               const VecType& b)
{
  double distance = sqrt(metric::SquaredEuclideanDistance::Evaluate(a, b));
  if (distance >= 2.0 * bandwidth)
    return 0.0;

  double volumeSquared = std::pow(Normalizer(a.n_rows), 2.0);

  switch (a.n_rows)
  {
    case 1:
      return 1.0 / volumeSquared *
          (16.0 / 15.0 * bandwidth - 4.0 * distance * distance /
          (3.0 * bandwidth) + 2.0 * distance * distance * distance /
          (3.0 * bandwidth * bandwidth) -
          std::pow(distance, 5.0) / (30.0 * std::pow(bandwidth, 4.0)));
      break;
    case 2:
      return 1.0 / volumeSquared *
          ((2.0 / 3.0 * bandwidth * bandwidth - distance * distance) *
          asin(sqrt(1.0 - std::pow(distance / (2.0 * bandwidth), 2.0))) +
          sqrt(4.0 * bandwidth * bandwidth - distance * distance) *
          (distance / 6.0 + 2.0 / 9.0 * distance *
          std::pow(distance / bandwidth, 2.0) -
          distance / 72.0 * std::pow(distance / bandwidth, 4.0)));
      break;
    default:
      Rcpp::Rcout << "EpanechnikovKernel::ConvolutionIntegral(): dimension "
          << a.n_rows << " not supported.";
      return -1.0; // Error value for unsupported dimensions.
      break;
  }
}

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/kernels/epanechnikov_kernel.hpp0000644000176200001440000000610213647512751023234 0ustar liggesusers/**
 * @file epanechnikov_kernel.hpp
 * @author Neil Slagle
 *
 * Definition of the Epanechnikov kernel.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_EPANECHNIKOV_KERNEL_HPP
#define __MLPACK_CORE_KERNELS_EPANECHNIKOV_KERNEL_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace kernel {

/**
 * The Epanechnikov kernel, defined as
 *
 * @f[
 * K(x, y) = \max \{0, 1 - || x - y ||^2_2 / b^2 \}
 * @f]
 *
 * where @f$ b @f$ is the bandwidth of the kernel (defaults to 1.0).
 */
class EpanechnikovKernel
{
 public:
  /**
   * Instantiate the Epanechnikov kernel with the given bandwidth (default
   * 1.0).
   *
   * @param bandwidth Bandwidth of the kernel.
   */
  EpanechnikovKernel(const double bandwidth = 1.0) :
      bandwidth(bandwidth),
      inverseBandwidthSquared(1.0 / (bandwidth * bandwidth))
  { }

  /**
   * Evaluate the Epanechnikov kernel on the given two inputs.
   *
   * @param a One input vector.
   * @param b The other input vector.
   */
  template<typename Vec1Type, typename Vec2Type>
  double Evaluate(const Vec1Type& a, const Vec2Type& b) const;

  /**
   * Evaluate the Epanechnikov kernel given that the distance between the two
   * input points is known.
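   *
   * A brief usage sketch (an added illustration, not from the original MLPACK
   * sources; the value shown assumes the default bandwidth of 1.0):
   *
   * @code
   * EpanechnikovKernel ek; // Default bandwidth: 1.0.
   * const double k = ek.Evaluate(0.5); // 1 - 0.5^2 = 0.75.
   * @endcode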
   */
  double Evaluate(const double distance) const;

  /**
   * Obtains the convolution integral [integral of K(||x-a||) K(||b-x||) dx]
   * for the two vectors.
   *
   * @tparam VecType Type of vector (arma::vec, arma::spvec should be
   *     expected).
   * @param a First vector.
   * @param b Second vector.
   * @return the convolution integral value.
   */
  template<typename VecType>
  double ConvolutionIntegral(const VecType& a, const VecType& b);

  /**
   * Compute the normalizer of this Epanechnikov kernel for the given
   * dimension.
   *
   * @param dimension Dimension to calculate the normalizer for.
   */
  double Normalizer(const size_t dimension);

  // Return a string representation of the object.
  std::string ToString() const;

 private:
  //! Bandwidth of the kernel.
  double bandwidth;
  //! Cached value of the inverse bandwidth squared (to speed up computation).
  double inverseBandwidthSquared;
};

//! Kernel traits for the Epanechnikov kernel.
template<>
class KernelTraits<EpanechnikovKernel>
{
 public:
  //! The Epanechnikov kernel is normalized: K(x, x) = 1 for all x.
  static const bool IsNormalized = true;
};

}; // namespace kernel
}; // namespace mlpack

// Include implementation.
#include "epanechnikov_kernel_impl.hpp"

#endif
RcppMLPACK/src/mlpack/core/kernels/gaussian_kernel.hpp0000644000176200001440000001102113647512751022370 0ustar liggesusers/**
 * @file gaussian_kernel.hpp
 * @author Wei Guan
 * @author James Cline
 * @author Ryan Curtin
 *
 * Implementation of the Gaussian kernel (GaussianKernel).
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_GAUSSIAN_KERNEL_HPP
#define __MLPACK_CORE_KERNELS_GAUSSIAN_KERNEL_HPP

#include <mlpack/core.hpp>
#include <mlpack/core/metrics/lmetric.hpp>

namespace mlpack {
namespace kernel {

/**
 * The standard Gaussian kernel. Given two vectors @f$ x @f$, @f$ y @f$, and a
 * bandwidth @f$ \mu @f$ (set in the constructor),
 *
 * @f[
 * K(x, y) = \exp(-\frac{|| x - y ||^2}{2 \mu^2}).
 * @f]
 *
 * The implementation is all in the header file because it is so simple.
 */
class GaussianKernel
{
 public:
  /**
   * Default constructor; sets bandwidth to 1.0.
   */
  GaussianKernel() : bandwidth(1.0), gamma(-0.5) { }

  /**
   * Construct the Gaussian kernel with a custom bandwidth.
   *
   * @param bandwidth The bandwidth of the kernel (@f$\mu@f$).
   */
  GaussianKernel(const double bandwidth) :
      bandwidth(bandwidth),
      gamma(-0.5 * pow(bandwidth, -2.0)) { }

  /**
   * Evaluation of the Gaussian kernel. This could be generalized to use any
   * distance metric, not the Euclidean distance, but for now, the Euclidean
   * distance is used.
   *
   * @tparam VecType Type of vector (likely arma::vec or arma::spvec).
   * @param a First vector.
   * @param b Second vector.
   * @return K(a, b) using the bandwidth (@f$\mu@f$) specified in the
   *     constructor.
   */
  template<typename VecType>
  double Evaluate(const VecType& a, const VecType& b) const
  {
    // The precalculation of gamma saves us a little computation time.
    return exp(gamma * metric::SquaredEuclideanDistance::Evaluate(a, b));
  }

  /**
   * Evaluation of the Gaussian kernel given the distance between two points.
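   *
   * A brief usage sketch (an added illustration, not from the original MLPACK
   * sources; the value shown assumes the bandwidth given below):
   *
   * @code
   * GaussianKernel gk(2.0);            // Bandwidth mu = 2.0.
   * const double k = gk.Evaluate(2.0); // exp(-2^2 / (2 * 2^2)) = exp(-0.5).
   * @endcode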
   *
   * @param t The distance between the two points the kernel is evaluated on.
   * @return K(t) using the bandwidth (@f$\mu@f$) specified in the
   *     constructor.
   */
  double Evaluate(const double t) const
  {
    // The precalculation of gamma saves us a little computation time.
    return exp(gamma * std::pow(t, 2.0));
  }

  /**
   * Obtain the normalization constant of the Gaussian kernel.
   *
   * @param dimension Dimension to calculate the normalization constant for.
   * @return the normalization constant
   */
  double Normalizer(const size_t dimension)
  {
    return pow(sqrt(2.0 * M_PI) * bandwidth, (double) dimension);
  }

  /**
   * Obtain a convolution integral of the Gaussian kernel.
   *
   * @param a First vector.
   * @param b Second vector.
   * @return the convolution integral
   */
  template<typename VecType>
  double ConvolutionIntegral(const VecType& a, const VecType& b)
  {
    return Evaluate(sqrt(metric::SquaredEuclideanDistance::Evaluate(a, b) /
        2.0)) / (Normalizer(a.n_rows) * pow(2.0, (double) a.n_rows / 2.0));
  }

  //! Get the bandwidth.
  double Bandwidth() const { return bandwidth; }

  //! Modify the bandwidth. This takes an argument because we must update the
  //! precalculated constant (gamma).
  void Bandwidth(const double bandwidth)
  {
    this->bandwidth = bandwidth;
    this->gamma = -0.5 * pow(bandwidth, -2.0);
  }

  //! Get the precalculated constant.
  double Gamma() const { return gamma; }

  //! Convert object to string.
  std::string ToString() const
  {
    std::ostringstream convert;
    convert << "GaussianKernel [" << this << "]" << std::endl;
    convert << "  Bandwidth: " << bandwidth << std::endl;
    return convert.str();
  }

 private:
  //! Kernel bandwidth.
  double bandwidth;

  //! Precalculated constant depending on the bandwidth;
  //! @f$ \gamma = -\frac{1}{2 \mu^2} @f$.
  double gamma;
};

//! Kernel traits for the Gaussian kernel.
template<>
class KernelTraits<GaussianKernel>
{
 public:
  //! The Gaussian kernel is normalized: K(x, x) = 1 for all x.
  static const bool IsNormalized = true;
};

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/optimizers/0000755000176200001440000000000013647512751017254 5ustar liggesusersRcppMLPACK/src/mlpack/core/optimizers/aug_lagrangian/0000755000176200001440000000000013647514343022212 5ustar liggesusersRcppMLPACK/src/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_test_functions.hpp0000644000176200001440000001161513647512751031016 0ustar liggesusers/**
 * @file aug_lagrangian_test_functions.hpp
 * @author Ryan Curtin
 *
 * Define test functions for the augmented Lagrangian method.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_TEST_FUNCTIONS_HPP
#define __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_TEST_FUNCTIONS_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace optimization {

/**
 * This function is taken from "Practical Mathematical Optimization" (Snyman),
 * section 5.3.8 ("Application of the Augmented Lagrangian Method"). It has
 * only one constraint.
 *
 * The minimum that satisfies the constraint is x = [1, 4], with an objective
 * value of 70.
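 *
 * A hedged usage sketch (an added illustration, not from the original MLPACK
 * sources), assuming the AugLagrangian optimizer declared in
 * aug_lagrangian.hpp:
 *
 * @code
 * AugLagrangianTestFunction f;
 * AugLagrangian<AugLagrangianTestFunction> aug(f);
 * arma::mat coordinates = f.GetInitialPoint();
 * const bool converged = aug.Optimize(coordinates, 0); // 0: no max iterations.
 * @endcode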
 */
class AugLagrangianTestFunction
{
 public:
  AugLagrangianTestFunction();
  AugLagrangianTestFunction(const arma::mat& initialPoint);

  double Evaluate(const arma::mat& coordinates);
  void Gradient(const arma::mat& coordinates, arma::mat& gradient);

  size_t NumConstraints() const { return 1; }

  double EvaluateConstraint(const size_t index, const arma::mat& coordinates);
  void GradientConstraint(const size_t index,
                          const arma::mat& coordinates,
                          arma::mat& gradient);

  const arma::mat& GetInitialPoint() const { return initialPoint; }

  // Convert the object into a string.
  std::string ToString() const;

 private:
  arma::mat initialPoint;
};

/**
 * This function is taken from M. Gockenbach's lectures on general nonlinear
 * programs, found at:
 * http://www.math.mtu.edu/~msgocken/ma5630spring2003/lectures/nlp/nlp.pdf
 *
 * The program we are using is example 2.5 from this document.
 * I have arbitrarily decided that this will be called the Gockenbach function.
 *
 * The minimum that satisfies the two constraints is given as
 * x = [0.12288, -1.1078, 0.015100], with an objective value of about 29.634.
 */
class GockenbachFunction
{
 public:
  GockenbachFunction();
  GockenbachFunction(const arma::mat& initialPoint);

  double Evaluate(const arma::mat& coordinates);
  void Gradient(const arma::mat& coordinates, arma::mat& gradient);

  size_t NumConstraints() const { return 2; }

  double EvaluateConstraint(const size_t index, const arma::mat& coordinates);
  void GradientConstraint(const size_t index,
                          const arma::mat& coordinates,
                          arma::mat& gradient);

  const arma::mat& GetInitialPoint() const { return initialPoint; }

 private:
  arma::mat initialPoint;
};

/**
 * This function is the Lovasz-Theta semidefinite program, as implemented in the
 * following paper:
 *
 * S. Burer, R. Monteiro
 * "A nonlinear programming algorithm for solving semidefinite programs via
 * low-rank factorization."
 * Journal of Mathematical Programming, 2004
 *
 * Given a simple, undirected graph G = (V, E), the Lovasz-Theta SDP is defined
 * by:
 *
 * min_X{Tr(-(e e^T)^T X) : Tr(X) = 1, X_ij = 0 for all (i, j) in E, X >= 0}
 *
 * where e is the vector of all ones and X has dimension |V| x |V|.
 *
 * In the Monteiro-Burer formulation, we take X = R * R^T, where R is the
 * coordinates given to the Evaluate(), Gradient(), EvaluateConstraint(), and
 * GradientConstraint() functions.
 */
class LovaszThetaSDP
{
 public:
  LovaszThetaSDP();

  /**
   * Initialize the Lovasz-Theta SDP with the given set of edges. The edge
   * matrix should have two rows, where dimension 0 is the first vertex of the
   * edge and dimension 1 is the second vertex (or vice versa, as it doesn't
   * make a difference); each column represents one edge.
   *
   * @param edges Matrix of edges.
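   *
   * For example (an added illustration, not from the original MLPACK sources),
   * a triangle graph on vertices {0, 1, 2} could be passed as a matrix with
   * one two-element column per edge:
   *
   * @code
   * arma::mat edges(2, 3);
   * edges(0, 0) = 0; edges(1, 0) = 1; // Edge (0, 1).
   * edges(0, 1) = 1; edges(1, 1) = 2; // Edge (1, 2).
   * edges(0, 2) = 0; edges(1, 2) = 2; // Edge (0, 2).
   * LovaszThetaSDP sdp(edges);
   * @endcode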
   */
  LovaszThetaSDP(const arma::mat& edges);

  double Evaluate(const arma::mat& coordinates);
  void Gradient(const arma::mat& coordinates, arma::mat& gradient);

  size_t NumConstraints() const;

  double EvaluateConstraint(const size_t index, const arma::mat& coordinates);
  void GradientConstraint(const size_t index,
                          const arma::mat& coordinates,
                          arma::mat& gradient);

  const arma::mat& GetInitialPoint();

  const arma::mat& Edges() const { return edges; }
  arma::mat& Edges() { return edges; }

 private:
  arma::mat edges;
  size_t vertices;

  arma::mat initialPoint;
};

}; // namespace optimization
}; // namespace mlpack

#endif // __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_TEST_FUNCTIONS_HPP
RcppMLPACK/src/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian.hpp0000644000176200001440000001376213647512751025674 0ustar liggesusers/**
 * @file aug_lagrangian.hpp
 * @author Ryan Curtin
 *
 * Definition of AugLagrangian class, which implements the Augmented Lagrangian
 * optimization method (also called the 'method of multipliers'). This class
 * uses the L-BFGS optimizer.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_HPP
#define __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_HPP

#include <mlpack/core.hpp>
#include <mlpack/core/optimizers/lbfgs/lbfgs.hpp>

#include "aug_lagrangian_function.hpp"

namespace mlpack {
namespace optimization {

/**
 * The AugLagrangian class implements the Augmented Lagrangian method of
 * optimization. In this scheme, a penalty term is added to the Lagrangian.
 * This method is also called the "method of multipliers".
 *
 * The template class LagrangianFunction must implement the following five
 * methods:
 *
 * - double Evaluate(const arma::mat& coordinates);
 * - void Gradient(const arma::mat& coordinates, arma::mat& gradient);
 * - size_t NumConstraints();
 * - double EvaluateConstraint(size_t index, const arma::mat& coordinates);
 * - void GradientConstraint(size_t index, const arma::mat& coordinates,
 *       arma::mat& gradient);
 *
 * The number of constraints must be greater than or equal to 0, and
 * EvaluateConstraint() should evaluate the constraint at the given index for
 * the given coordinates. Evaluate() should provide the objective function
 * value for the given coordinates.
 *
 * @tparam LagrangianFunction Function which can be optimized by this class.
 */
template<typename LagrangianFunction>
class AugLagrangian
{
 public:
  //! Shorthand for the type of the L-BFGS optimizer we'll be using.
  typedef L_BFGS<AugLagrangianFunction<LagrangianFunction> > L_BFGSType;

  /**
   * Initialize the Augmented Lagrangian with the default L-BFGS optimizer. We
   * limit the number of L-BFGS iterations to 1000, rather than the unlimited
   * default L-BFGS.
   *
   * @param function The function to be optimized.
   */
  AugLagrangian(LagrangianFunction& function);

  /**
   * Initialize the Augmented Lagrangian with a custom L-BFGS optimizer.
   *
   * @param augfunc The function to be optimized. This must be a pre-created
   *    utility AugLagrangianFunction.
   * @param lbfgs The custom L-BFGS optimizer to be used.
   *     This should have already been initialized with the given
   *     AugLagrangianFunction.
   */
  AugLagrangian(AugLagrangianFunction<LagrangianFunction>& augfunc,
                L_BFGSType& lbfgs);

  /**
   * Optimize the function. The value '1' is used for the initial value of each
   * Lagrange multiplier. To set the Lagrange multipliers yourself, use the
   * other overload of Optimize().
   *
   * @param coordinates Output matrix to store the optimized coordinates in.
   * @param maxIterations Maximum number of iterations of the Augmented
   *     Lagrangian algorithm. 0 indicates no maximum.
   */
  bool Optimize(arma::mat& coordinates, const size_t maxIterations = 1000);

  /**
   * Optimize the function, giving initial estimates for the Lagrange
   * multipliers. The vector of Lagrange multipliers will be modified to
   * contain the Lagrange multipliers of the final solution (if one is found).
   *
   * @param coordinates Output matrix to store the optimized coordinates in.
   * @param initLambda Vector of initial Lagrange multipliers. Should have
   *     length equal to the number of constraints.
   * @param initSigma Initial penalty parameter.
   * @param maxIterations Maximum number of iterations of the Augmented
   *     Lagrangian algorithm. 0 indicates no maximum.
   */
  bool Optimize(arma::mat& coordinates,
                const arma::vec& initLambda,
                const double initSigma,
                const size_t maxIterations = 1000);

  //! Get the LagrangianFunction.
  const LagrangianFunction& Function() const { return function; }
  //! Modify the LagrangianFunction.
  LagrangianFunction& Function() { return function; }

  //! Get the L-BFGS object used for the actual optimization.
  const L_BFGSType& LBFGS() const { return lbfgs; }
  //! Modify the L-BFGS object used for the actual optimization.
  L_BFGSType& LBFGS() { return lbfgs; }

  //! Get the Lagrange multipliers.
  const arma::vec& Lambda() const { return augfunc.Lambda(); }
  //! Modify the Lagrange multipliers (i.e. set them before optimization).
  arma::vec& Lambda() { return augfunc.Lambda(); }

  //! Get the penalty parameter.
  double Sigma() const { return augfunc.Sigma(); }
  //! Modify the penalty parameter.
  double& Sigma() { return augfunc.Sigma(); }

  // Convert the object into a string.
  std::string ToString() const;

 private:
  //! Function to be optimized.
  LagrangianFunction& function;

  //! Internally used AugLagrangianFunction which holds the function we are
  //! optimizing. This isn't publicly accessible, but we provide ways to get
  //! to the Lagrange multipliers and the penalty parameter sigma.
  AugLagrangianFunction<LagrangianFunction> augfunc;

  //! If the user did not pass an L_BFGS object, we'll use our own internal
  //! one.
  L_BFGSType lbfgsInternal;

  //! The L-BFGS optimizer that we will use.
  L_BFGSType& lbfgs;
};

}; // namespace optimization
}; // namespace mlpack

#include "aug_lagrangian_impl.hpp"

#endif // __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_HPP
RcppMLPACK/src/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_function.hpp0000644000176200001440000001074713647512751027575 0ustar liggesusers/**
 * @file aug_lagrangian_function.hpp
 * @author Ryan Curtin
 *
 * Contains a utility class for AugLagrangian.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_FUNCTION_HPP
#define __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_FUNCTION_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace optimization {

/**
 * This is a utility class used by AugLagrangian, meant to wrap a
 * LagrangianFunction into a function usable by a simple optimizer like L-BFGS.
 * Given a LagrangianFunction which follows the format outlined in the
 * documentation for AugLagrangian, this class provides Evaluate(), Gradient(),
 * and GetInitialPoint() functions which allow this class to be used with a
 * simple optimizer like L-BFGS.
 *
 * This class can be specialized for your particular implementation -- commonly,
 * a faster method for computing the overall objective and gradient of the
 * augmented Lagrangian function can be implemented than the naive, default
 * implementation given. Use class template specialization and re-implement all
 * of the methods (unfortunately, C++ specialization rules mean you have to
 * re-implement everything).
 *
 * @tparam LagrangianFunction Lagrangian function to be used.
 */
template<typename LagrangianFunction>
class AugLagrangianFunction
{
 public:
  /**
   * Initialize the AugLagrangianFunction, but don't set the Lagrange
   * multipliers or penalty parameters yet. Make sure you set the Lagrange
   * multipliers before you use this...
   *
   * @param function Lagrangian function.
   */
  AugLagrangianFunction(LagrangianFunction& function);

  /**
   * Initialize the AugLagrangianFunction with the given LagrangianFunction,
   * Lagrange multipliers, and initial penalty parameter.
   *
   * @param function Lagrangian function.
   * @param lambda Initial Lagrange multipliers.
   * @param sigma Initial penalty parameter.
   */
  AugLagrangianFunction(LagrangianFunction& function,
                        const arma::vec& lambda,
                        const double sigma);

  /**
   * Evaluate the objective function of the Augmented Lagrangian function, which
   * is the standard Lagrangian function evaluation plus a penalty term, which
   * penalizes unsatisfied constraints.
   *
   * @param coordinates Coordinates to evaluate function at.
   * @return Objective function.
   */
  double Evaluate(const arma::mat& coordinates) const;

  /**
   * Evaluate the gradient of the Augmented Lagrangian function.
   *
   * @param coordinates Coordinates to evaluate gradient at.
   * @param gradient Matrix to store gradient into.
   */
  void Gradient(const arma::mat& coordinates, arma::mat& gradient) const;

  /**
   * Get the initial point of the optimization (supplied by the
   * LagrangianFunction).
   *
   * @return Initial point.
   */
  const arma::mat& GetInitialPoint() const;

  //! Get the Lagrange multipliers.
  const arma::vec& Lambda() const { return lambda; }
  //! Modify the Lagrange multipliers.
  arma::vec& Lambda() { return lambda; }

  //! Get sigma (the penalty parameter).
  double Sigma() const { return sigma; }
  //! Modify sigma (the penalty parameter).
  double& Sigma() { return sigma; }

  //! Get the Lagrangian function.
  const LagrangianFunction& Function() const { return function; }
  //! Modify the Lagrangian function.
  LagrangianFunction& Function() { return function; }

  // Convert the object into a string.
  std::string ToString() const;

 private:
  //! Instantiation of the function to be optimized.
  LagrangianFunction& function;
  //!
The Lagrange multipliers. arma::vec lambda; //! The penalty parameter. double sigma; }; }; // namespace optimization }; // namespace mlpack // Include basic implementation. #include "aug_lagrangian_function_impl.hpp" #endif // __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_FUNCTION_HPP RcppMLPACK/src/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_test_functions.cpp0000644000176200001440000003001713647514343031005 0ustar liggesusers/** * @file aug_lagrangian_test_functions.cpp * @author Ryan Curtin * * Implementation of AugLagrangianTestFunction class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "aug_lagrangian_test_functions.hpp" using namespace mlpack; using namespace mlpack::optimization; // // AugLagrangianTestFunction // AugLagrangianTestFunction::AugLagrangianTestFunction() { // Set the initial point to be (0, 0). initialPoint.zeros(2, 1); } AugLagrangianTestFunction::AugLagrangianTestFunction( const arma::mat& initialPoint) : initialPoint(initialPoint) { // Nothing to do. } double AugLagrangianTestFunction::Evaluate(const arma::mat& coordinates) { // f(x) = 6 x_1^2 + 4 x_1 x_2 + 3 x_2^2 return ((6 * std::pow(coordinates[0], 2)) + (4 * (coordinates[0] * coordinates[1])) + (3 * std::pow(coordinates[1], 2))); } void AugLagrangianTestFunction::Gradient(const arma::mat& coordinates, arma::mat& gradient) { // f'_x1(x) = 12 x_1 + 4 x_2 // f'_x2(x) = 4 x_1 + 6 x_2 gradient.set_size(2, 1); gradient[0] = 12 * coordinates[0] + 4 * coordinates[1]; gradient[1] = 4 * coordinates[0] + 6 * coordinates[1]; } double AugLagrangianTestFunction::EvaluateConstraint(const size_t index, const arma::mat& coordinates) { // We return 0 if the index is wrong (not 0). if (index != 0) return 0; // c(x) = x_1 + x_2 - 5 return (coordinates[0] + coordinates[1] - 5); } void AugLagrangianTestFunction::GradientConstraint(const size_t index, const arma::mat& /* coordinates */, arma::mat& gradient) { // If the user passed an invalid index (not 0), we will return a zero // gradient. gradient.zeros(2, 1); if (index == 0) { // c'_x1(x) = 1 // c'_x2(x) = 1 gradient.ones(2, 1); // Use a shortcut instead of assigning individually. } } // // GockenbachFunction // GockenbachFunction::GockenbachFunction() { // Set the initial point to (0, 0, 1). initialPoint.zeros(3, 1); initialPoint[2] = 1; } GockenbachFunction::GockenbachFunction(const arma::mat& initialPoint) : initialPoint(initialPoint) { // Nothing to do. 
} double GockenbachFunction::Evaluate(const arma::mat& coordinates) { // f(x) = (x_1 - 1)^2 + 2 (x_2 + 2)^2 + 3(x_3 + 3)^2 return ((std::pow(coordinates[0] - 1, 2)) + (2 * std::pow(coordinates[1] + 2, 2)) + (3 * std::pow(coordinates[2] + 3, 2))); } void GockenbachFunction::Gradient(const arma::mat& coordinates, arma::mat& gradient) { // f'_x1(x) = 2 (x_1 - 1) // f'_x2(x) = 4 (x_2 + 2) // f'_x3(x) = 6 (x_3 + 3) gradient.set_size(3, 1); gradient[0] = 2 * (coordinates[0] - 1); gradient[1] = 4 * (coordinates[1] + 2); gradient[2] = 6 * (coordinates[2] + 3); } double GockenbachFunction::EvaluateConstraint(const size_t index, const arma::mat& coordinates) { double constraint = 0; switch (index) { case 0: // g(x) = (x_3 - x_2 - x_1 - 1) = 0 constraint = (coordinates[2] - coordinates[1] - coordinates[0] - 1); break; case 1: // h(x) = (x_3 - x_1^2) >= 0 // To deal with the inequality, the constraint will simply evaluate to 0 // when h(x) >= 0. constraint = std::min(0.0, (coordinates[2] - std::pow(coordinates[0], 2))); break; } // 0 will be returned for an invalid index (but this is okay). return constraint; } void GockenbachFunction::GradientConstraint(const size_t index, const arma::mat& coordinates, arma::mat& gradient) { gradient.zeros(3, 1); switch (index) { case 0: // g'_x1(x) = -1 // g'_x2(x) = -1 // g'_x3(x) = 1 gradient[0] = -1; gradient[1] = -1; gradient[2] = 1; break; case 1: // h'_x1(x) = -2 x_1 // h'_x2(x) = 0 // h'_x3(x) = 1 gradient[0] = -2 * coordinates[0]; gradient[2] = 1; break; } } // // LovaszThetaSDP // LovaszThetaSDP::LovaszThetaSDP() : edges(0), vertices(0), initialPoint(0, 0) { } LovaszThetaSDP::LovaszThetaSDP(const arma::mat& edges) : edges(edges), initialPoint(0, 0) { // Calculate V by finding the maximum index in the edges matrix. vertices = max(max(edges)) + 1; // Rcpp::Rcout << vertices << " vertices in graph." << std::endl; } double LovaszThetaSDP::Evaluate(const arma::mat& coordinates) { // The objective is equal to -Tr(ones * X) = -Tr(ones * (R^T * R)). // This can be simplified into the negative sum of (R^T * R). // Rcpp::Rcout << "Evaluting objective function with coordinates:" << std::endl; // std::cout << coordinates << std::endl; // Rcpp::Rcout << "trans(coord) * coord:" << std::endl; // std::cout << (trans(coordinates) * coordinates) << std::endl; arma::mat x = trans(coordinates) * coordinates; double obj = -accu(x); // double obj = 0; // for (size_t i = 0; i < coordinates.n_cols; i++) // obj -= dot(coordinates.col(i), coordinates.col(i)); // Rcpp::Rcout << "Objective function is " << obj << "." << std::endl; return obj; } void LovaszThetaSDP::Gradient(const arma::mat& coordinates, arma::mat& gradient) { // The gradient is equal to (2 S' R^T)^T, with R being coordinates. // S' = C - sum_{i = 1}^{m} [ y_i - sigma (Tr(A_i * (R^T R)) - b_i)] * A_i // We will calculate it in a not very smart way, but it should work. // Rcpp::Rcout << "Using stupid specialization for gradient calculation!" // << std::endl; // Initialize S' piece by piece. It is of size n x n. const size_t n = coordinates.n_cols; arma::mat s(n, n); s.ones(); s *= -1; // C = -ones(). for (size_t i = 0; i < NumConstraints(); ++i) { // Calculate [ y_i - sigma (Tr(A_i * (R^T R)) - b_i) ] * A_i. // Result will be a matrix; inner result is a scalar. if (i == 0) { // A_0 = I_n. Hooray! That's easy! b_0 = 1. 
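// (Clarifying note, added: the bracketed update term for this constraint is
// [ y_0 - sigma * (Tr(R^T R) - 1) ] * I_n; the values y_0 = -n and sigma = 0.5
// appear to be hard-coded in the expression below.)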
double inner = -1 * double(n) - 0.5 * (trace(trans(coordinates) * coordinates) - 1); arma::mat zz = (inner * arma::eye(n, n)); // Rcpp::Rcout << "Constraint " << i << " matrix to add is " << std::endl; // Rcpp::Rcout << zz << std::endl; s -= zz; } else { // Get edge so we can construct constraint A_i matrix. b_i = 0. arma::vec edge = edges.col(i - 1); arma::mat a; a.zeros(n, n); // Only two nonzero entries. a(edge[0], edge[1]) = 1; a(edge[1], edge[0]) = 1; double inner = (-1) - 0.5 * (trace(a * (trans(coordinates) * coordinates))); arma::mat zz = (inner * a); // Rcpp::Rcout << "Constraint " << i << " matrix to add is " << std::endl; // Rcpp::Rcout << zz << std::endl; s -= zz; } } // Rcpp::Rcout << "Calculated S is: " << std::endl << s << std::endl; gradient = trans(2 * s * trans(coordinates)); // Rcpp::Rcout << "Calculated gradient is: " << std::endl << gradient << std::endl; // Rcpp::Rcout << "Evaluating gradient. " << std::endl; // The gradient of -Tr(ones * X) is equal to -2 * ones * R // arma::mat ones; // ones.ones(coordinates.n_rows, coordinates.n_rows); // gradient = -2 * ones * coordinates; // Rcpp::Rcout << "Done with gradient." << std::endl; // std::cout << gradient; } size_t LovaszThetaSDP::NumConstraints() const { // Each edge is a constraint, and we have the constraint Tr(X) = 1. return edges.n_cols + 1; } double LovaszThetaSDP::EvaluateConstraint(const size_t index, const arma::mat& coordinates) { if (index == 0) // This is the constraint Tr(X) = 1. { double sum = -1; // Tr(X) - 1 = 0, so we prefix the subtraction. for (size_t i = 0; i < coordinates.n_cols; i++) sum += std::abs(dot(coordinates.col(i), coordinates.col(i))); // Rcpp::Rcout << "Constraint " << index << " evaluates to " << sum << std::endl; return sum; } size_t i = edges(0, index - 1); size_t j = edges(1, index - 1); // Rcpp::Rcout << "Constraint " << index << " evaluates to " << // dot(coordinates.col(i), coordinates.col(j)) << "." << std::endl; // The constraint itself is X_ij, or (R^T R)_ij. return std::abs(dot(coordinates.col(i), coordinates.col(j))); } void LovaszThetaSDP::GradientConstraint(const size_t index, const arma::mat& coordinates, arma::mat& gradient) { // Rcpp::Rcout << "Gradient of constraint " << index << " is " << std::endl; if (index == 0) // This is the constraint Tr(X) = 1. { gradient = 2 * coordinates; // d/dR (Tr(R R^T)) = 2 R. // std::cout << gradient; return; } // Rcpp::Rcout << "Evaluating gradient of constraint " << index << " with "; size_t i = edges(0, index - 1); size_t j = edges(1, index - 1); // Rcpp::Rcout << "i = " << i << " and j = " << j << "." << std::endl; // Since the constraint is (R^T R)_ij, the gradient for (x, y) will be (I // derived this for one of the MVU constraints): // 0 , y != i, y != j // 2 R_xj, y = i, y != j // 2 R_xi, y != i, y = j // 4 R_xy, y = i, y = j // This results in the gradient matrix having two nonzero rows; for row // i, the elements are R_nj, where n is the row; for column j, the elements // are R_ni. gradient.zeros(coordinates.n_rows, coordinates.n_cols); gradient.col(i) = coordinates.col(j); gradient.col(j) += coordinates.col(i); // In case j = i (shouldn't happen). // std::cout << gradient; } const arma::mat& LovaszThetaSDP::GetInitialPoint() { if (initialPoint.n_rows != 0 && initialPoint.n_cols != 0) return initialPoint; // It has already been calculated. // Rcpp::Rcout << "Calculating initial point." << std::endl; // First, we must calculate the correct value of r. 
The matrix we return, R,
// will be r x V, because X = R^T R is of dimension V x V.
// The rule for calculating r (from Monteiro and Burer, eq. 5) is
//   r = max{ r >= 0 : r (r + 1) / 2 <= m }
// where m is equal to the number of constraints plus one.
//
// Solved, this is
//   0.5 r^2 + 0.5 r - m = 0
// which by the quadratic formula (a = 0.5, b = 0.5, c = -m) becomes
//   r = (-0.5 [+/-] sqrt(0.25 + 2 m)) / 1
// and because r must be nonnegative, the positive root is
//   r = sqrt(0.25 + 2 m) - 0.5.
// The code below uses 0.5 + sqrt(0.25 + 2 m), a slightly larger value (which
// still satisfies r (r + 1) / 2 >= m); it is then rounded up and capped at
// the number of vertices.
float m = NumConstraints();
float r = 0.5 + sqrt(0.25 + 2 * m);
if (ceil(r) > vertices)
  r = vertices; // An upper bound on the dimension.

Rcpp::Rcout << "Dimension will be " << ceil(r) << " x " << vertices << "."
    << std::endl;
initialPoint.set_size(ceil(r), vertices);

// Now we set the entries of the initial matrix according to the formula given
// in Section 4 of Monteiro and Burer.
for (size_t i = 0; i < r; i++)
{
  for (size_t j = 0; j < (size_t) vertices; j++)
  {
    if (i == j)
      initialPoint(i, j) = sqrt(1.0 / r) + sqrt(1.0 / (vertices * m));
    else
      initialPoint(i, j) = sqrt(1.0 / (vertices * m));
  }
}

Rcpp::Rcout << "Initial matrix " << std::endl << initialPoint << std::endl;
Rcpp::Rcout << "X " << std::endl << trans(initialPoint) * initialPoint
    << std::endl;
Rcpp::Rcout << "accu " << accu(trans(initialPoint) * initialPoint)
    << std::endl;

return initialPoint;
}

std::string AugLagrangianTestFunction::ToString() const
{
  // The actual output is unimportant -- this object is only for testing anyway.
  std::ostringstream convert;
  convert << "AugLagrangianTestFunction [" << this << "]" << std::endl;
  return convert.str();
}
RcppMLPACK/src/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_function_impl.hpp0000644000176200001440000001004413647512751030610 0ustar liggesusers/**
 * @file aug_lagrangian_function_impl.hpp
 * @author Ryan Curtin
 *
 * Simple, naive implementation of AugLagrangianFunction. Better
 * specializations can probably be given in many cases, but this is the most
 * general case.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_FUNCTION_IMPL_HPP
#define __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_FUNCTION_IMPL_HPP

// In case it hasn't been included.
#include "aug_lagrangian_function.hpp"

namespace mlpack {
namespace optimization {

// Initialize the AugLagrangianFunction.
template<typename LagrangianFunction>
AugLagrangianFunction<LagrangianFunction>::AugLagrangianFunction(
    LagrangianFunction& function) :
    function(function),
    lambda(function.NumConstraints()),
    sigma(10)
{
  // Initialize lambda vector to all zeroes.
  lambda.zeros();
}

// Initialize the AugLagrangianFunction.
template<typename LagrangianFunction>
AugLagrangianFunction<LagrangianFunction>::AugLagrangianFunction(
    LagrangianFunction& function,
    const arma::vec& lambda,
    const double sigma) :
    // Initializers are listed in member declaration order, to avoid a
    // -Wreorder warning.
    function(function),
    lambda(lambda),
    sigma(sigma)
{
  // Nothing else to do.
}
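// ---------------------------------------------------------------------------
// Editor's note: a minimal usage sketch (not part of MLPACK) showing what
// Evaluate(), defined below, computes for AugLagrangianTestFunction. The
// guard macro and function name are hypothetical, and the sketch assumes
// aug_lagrangian_test_functions.hpp is also available.
#ifdef AUG_LAGRANGIAN_FUNCTION_EXAMPLE
// For f(x) = 6 x1^2 + 4 x1 x2 + 3 x2^2 with the single constraint
// c(x) = x1 + x2 - 5, the augmented Lagrangian at x = (1, 1), using the
// default lambda = (0) and sigma = 10 set by the first constructor, is
//   f(x) - lambda * c(x) + (sigma / 2) c(x)^2 = 13 - 0 + 5 * 9 = 58.
inline double ExampleAugLagrangianValue()
{
  mlpack::optimization::AugLagrangianTestFunction f;
  mlpack::optimization::AugLagrangianFunction<
      mlpack::optimization::AugLagrangianTestFunction> augf(f);

  arma::mat x(2, 1);
  x[0] = 1;
  x[1] = 1;
  return augf.Evaluate(x); // Returns 58.
}
#endif
// ---------------------------------------------------------------------------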
// Evaluate the AugLagrangianFunction at the given coordinates.
template<typename LagrangianFunction>
double AugLagrangianFunction<LagrangianFunction>::Evaluate(
    const arma::mat& coordinates) const
{
  // The augmented Lagrangian is evaluated as
  //    f(x) + {-lambda_i * c_i(x) + (sigma / 2) c_i(x)^2} for all constraints

  // First get the function's objective value.
  double objective = function.Evaluate(coordinates);

  // Now loop for each constraint.
  for (size_t i = 0; i < function.NumConstraints(); ++i)
  {
    double constraint = function.EvaluateConstraint(i, coordinates);

    objective += (-lambda[i] * constraint) +
        sigma * std::pow(constraint, 2) / 2;
  }

  return objective;
}

// Evaluate the gradient of the AugLagrangianFunction at the given coordinates.
template<typename LagrangianFunction>
void AugLagrangianFunction<LagrangianFunction>::Gradient(
    const arma::mat& coordinates,
    arma::mat& gradient) const
{
  // The augmented Lagrangian's gradient is evaluated as
  //    f'(x) + {(-lambda_i + sigma * c_i(x)) * c'_i(x)} for all constraints
  gradient.zeros();
  function.Gradient(coordinates, gradient);

  arma::mat constraintGradient; // Temporary for constraint gradients.
  for (size_t i = 0; i < function.NumConstraints(); i++)
  {
    function.GradientConstraint(i, coordinates, constraintGradient);

    // Now calculate scaling factor and add to existing gradient.
    arma::mat tmpGradient;
    tmpGradient = (-lambda[i] + sigma *
        function.EvaluateConstraint(i, coordinates)) * constraintGradient;
    gradient += tmpGradient;
  }
}

// Get the initial point.
template<typename LagrangianFunction>
const arma::mat& AugLagrangianFunction<LagrangianFunction>::GetInitialPoint()
    const
{
  return function.GetInitialPoint();
}

// Convert the object to a string.
template<typename LagrangianFunction>
std::string AugLagrangianFunction<LagrangianFunction>::ToString() const
{
  std::ostringstream convert;
  convert << "AugLagrangianFunction [" << this << "]" << std::endl;
  convert << "  Lagrange multipliers:" << std::endl;
  convert << lambda;
  convert << "  Penalty parameter: " << sigma << std::endl;
  return convert.str();
}

}; // namespace optimization
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_impl.hpp0000644000176200001440000001435313647512751026712 0ustar liggesusers/**
 * @file aug_lagrangian_impl.hpp
 * @author Ryan Curtin
 *
 * Implementation of AugLagrangian class (Augmented Lagrangian optimization
 * method).
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_IMPL_HPP
#define __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_IMPL_HPP

#include <mlpack/core.hpp>

#include "aug_lagrangian_function.hpp"

namespace mlpack {
namespace optimization {

template<typename LagrangianFunction>
AugLagrangian<LagrangianFunction>::AugLagrangian(LagrangianFunction& function) :
    function(function),
    augfunc(function),
    lbfgsInternal(augfunc),
    lbfgs(lbfgsInternal)
{
  lbfgs.MaxIterations() = 1000;
}

template<typename LagrangianFunction>
AugLagrangian<LagrangianFunction>::AugLagrangian(
    AugLagrangianFunction<LagrangianFunction>& augfunc,
    L_BFGSType& lbfgs) :
    function(augfunc.Function()),
    augfunc(augfunc),
    lbfgs(lbfgs)
{
  // Nothing to do. lbfgsInternal isn't used in this case.
}

// This overload just sets the lambda and sigma and calls the other overload.
template bool AugLagrangian::Optimize(arma::mat& coordinates, const arma::vec& initLambda, const double initSigma, const size_t maxIterations) { augfunc.Lambda() = initLambda; augfunc.Sigma() = initSigma; return Optimize(coordinates, maxIterations); } // Convert the object to a string. template std::string AugLagrangian::ToString() const { std::ostringstream convert; convert << "AugLagrangian [" << this << "]" << std::endl; convert << " Function:" << std::endl; convert << mlpack::util::Indent(function.ToString(), 2); convert << " L-BFGS optimizer:" << std::endl; convert << mlpack::util::Indent(lbfgs.ToString(), 2); return convert.str(); } template bool AugLagrangian::Optimize(arma::mat& coordinates, const size_t maxIterations) { // Ensure that we update lambda immediately. double penaltyThreshold = DBL_MAX; // Track the last objective to compare for convergence. double lastObjective = function.Evaluate(coordinates); // Then, calculate the current penalty. double penalty = 0; for (size_t i = 0; i < function.NumConstraints(); i++) penalty += std::pow(function.EvaluateConstraint(i, coordinates), 2); Rcpp::Rcout << "Penalty is " << penalty << " (threshold " << penaltyThreshold << ")." << std::endl; // The odd comparison allows user to pass maxIterations = 0 (i.e. no limit on // number of iterations). size_t it; for (it = 0; it != (maxIterations - 1); it++) { Rcpp::Rcout << "AugLagrangian on iteration " << it << ", starting with objective " << lastObjective << "." << std::endl; // Rcpp::Rcout << coordinates << std::endl; // Rcpp::Rcout << trans(coordinates) * coordinates << std::endl; if (!lbfgs.Optimize(coordinates)) Rcpp::Rcout << "L-BFGS reported an error during optimization." << std::endl; // Check if we are done with the entire optimization (the threshold we are // comparing with is arbitrary). if (std::abs(lastObjective - function.Evaluate(coordinates)) < 1e-10 && augfunc.Sigma() > 500000) return true; lastObjective = function.Evaluate(coordinates); // Assuming that the optimization has converged to a new set of coordinates, // we now update either lambda or sigma. We update sigma if the penalty // term is too high, and we update lambda otherwise. // First, calculate the current penalty. double penalty = 0; for (size_t i = 0; i < function.NumConstraints(); i++) { penalty += std::pow(function.EvaluateConstraint(i, coordinates), 2); // Rcpp::Rcout << "Constraint " << i << " is " << // function.EvaluateConstraint(i, coordinates) << std::endl; } Rcpp::Rcout << "Penalty is " << penalty << " (threshold " << penaltyThreshold << ")." << std::endl; for (size_t i = 0; i < function.NumConstraints(); ++i) { // arma::mat tmpgrad; // function.GradientConstraint(i, coordinates, tmpgrad); // Rcpp::Rcout << "Gradient of constraint " << i << " is " << std::endl; // Rcpp::Rcout << tmpgrad << std::endl; } if (penalty < penaltyThreshold) // We update lambda. { // We use the update: lambda_{k + 1} = lambda_k - sigma * c(coordinates), // but we have to write a loop to do this for each constraint. for (size_t i = 0; i < function.NumConstraints(); i++) augfunc.Lambda()[i] -= augfunc.Sigma() * function.EvaluateConstraint(i, coordinates); // We also update the penalty threshold to be a factor of the current // penalty. TODO: this factor should be a parameter (from CLI). The // value of 0.25 is taken from Burer and Monteiro (2002). penaltyThreshold = 0.25 * penalty; Rcpp::Rcout << "Lagrange multiplier estimates updated." << std::endl; } else { // We multiply sigma by a constant value. 
TODO: this factor should be a
      // parameter (from CLI). The value of 10 is taken from Burer and
      // Monteiro (2002).
      augfunc.Sigma() *= 10;
      Rcpp::Rcout << "Updated sigma to " << augfunc.Sigma() << "."
          << std::endl;
    }
  }

  return false;
}

}; // namespace optimization
}; // namespace mlpack

#endif // __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_IMPL_HPP
RcppMLPACK/src/mlpack/core/optimizers/lbfgs/0000755000176200001440000000000013647514343020350 5ustar liggesusersRcppMLPACK/src/mlpack/core/optimizers/lbfgs/test_functions.cpp0000644000176200001440000001552613647514343024134 0ustar liggesusers/**
 * @file test_functions.cpp
 * @author Ryan Curtin
 *
 * Implementations of the test functions defined in test_functions.hpp.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#include "test_functions.hpp"

using namespace mlpack::optimization::test;

//
// RosenbrockFunction implementation
//

RosenbrockFunction::RosenbrockFunction()
{
  initialPoint.set_size(2, 1);
  initialPoint[0] = -1.2;
  initialPoint[1] = 1;
}

/**
 * Calculate the objective function.
 */
double RosenbrockFunction::Evaluate(const arma::mat& coordinates)
{
  double x1 = coordinates[0];
  double x2 = coordinates[1];

  double objective = /* f1(x) */ 100 * std::pow(x2 - std::pow(x1, 2), 2) +
                     /* f2(x) */ std::pow(1 - x1, 2);

  return objective;
}

/**
 * Calculate the gradient.
 */
void RosenbrockFunction::Gradient(const arma::mat& coordinates,
                                  arma::mat& gradient)
{
  // f'_{x1}(x) = -2 (1 - x1) + 400 (x1^3 - (x2 x1))
  // f'_{x2}(x) = 200 (x2 - x1^2)
  double x1 = coordinates[0];
  double x2 = coordinates[1];

  gradient.set_size(2, 1);
  gradient[0] = -2 * (1 - x1) + 400 * (std::pow(x1, 3) - x2 * x1);
  gradient[1] = 200 * (x2 - std::pow(x1, 2));
}

const arma::mat& RosenbrockFunction::GetInitialPoint() const
{
  return initialPoint;
}

//
// WoodFunction implementation
//

WoodFunction::WoodFunction()
{
  initialPoint.set_size(4, 1);
  initialPoint[0] = -3;
  initialPoint[1] = -1;
  initialPoint[2] = -3;
  initialPoint[3] = -1;
}

/**
 * Calculate the objective function.
 */
double WoodFunction::Evaluate(const arma::mat& coordinates)
{
  // For convenience; we assume these temporaries will be optimized out.
  double x1 = coordinates[0];
  double x2 = coordinates[1];
  double x3 = coordinates[2];
  double x4 = coordinates[3];

  // In the f6 term, 1.0 / 10.0 is written in floating point: 1 / 10 is
  // integer division and would silently evaluate to 0, dropping the term.
  double objective = /* f1(x) */ 100 * std::pow(x2 - std::pow(x1, 2), 2) +
                     /* f2(x) */ std::pow(1 - x1, 2) +
                     /* f3(x) */ 90 * std::pow(x4 - std::pow(x3, 2), 2) +
                     /* f4(x) */ std::pow(1 - x3, 2) +
                     /* f5(x) */ 10 * std::pow(x2 + x4 - 2, 2) +
                     /* f6(x) */ (1.0 / 10.0) * std::pow(x2 - x4, 2);

  return objective;
}
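// ---------------------------------------------------------------------------
// Editor's note: a minimal sketch (not part of MLPACK) of how these test
// functions are meant to be used with the L-BFGS optimizer. The guard macro
// and function name are hypothetical.
#ifdef LBFGS_TEST_FUNCTION_EXAMPLE
#include <mlpack/core/optimizers/lbfgs/lbfgs.hpp>

// Minimize the Rosenbrock function from its standard starting point
// (-1.2, 1); L-BFGS should converge to f(x) = 0 at x = (1, 1).
inline double ExampleRosenbrockOptimization()
{
  mlpack::optimization::test::RosenbrockFunction f;
  mlpack::optimization::L_BFGS<
      mlpack::optimization::test::RosenbrockFunction> lbfgs(f);

  arma::mat coordinates = f.GetInitialPoint();
  return lbfgs.Optimize(coordinates); // Final objective; near 0 on success.
}
#endif
// ---------------------------------------------------------------------------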
/**
 * Calculate the gradient.
 */
void WoodFunction::Gradient(const arma::mat& coordinates,
                            arma::mat& gradient)
{
  // For convenience; we assume these temporaries will be optimized out.
  double x1 = coordinates[0];
  double x2 = coordinates[1];
  double x3 = coordinates[2];
  double x4 = coordinates[3];

  // f'_{x1}(x) = 400 (x1^3 - x2 x1) - 2 (1 - x1)
  // f'_{x2}(x) = 200 (x2 - x1^2) + 20 (x2 + x4 - 2) + (1 / 5) (x2 - x4)
  // f'_{x3}(x) = 360 (x3^3 - x4 x3) - 2 (1 - x3)
  // f'_{x4}(x) = 180 (x4 - x3^2) + 20 (x2 + x4 - 2) - (1 / 5) (x2 - x4)
  // As in Evaluate(), 1.0 / 5.0 is written in floating point to avoid
  // integer division (1 / 5 would evaluate to 0).
  gradient.set_size(4, 1);
  gradient[0] = 400 * (std::pow(x1, 3) - x2 * x1) - 2 * (1 - x1);
  gradient[1] = 200 * (x2 - std::pow(x1, 2)) + 20 * (x2 + x4 - 2) +
      (1.0 / 5.0) * (x2 - x4);
  gradient[2] = 360 * (std::pow(x3, 3) - x4 * x3) - 2 * (1 - x3);
  gradient[3] = 180 * (x4 - std::pow(x3, 2)) + 20 * (x2 + x4 - 2) -
      (1.0 / 5.0) * (x2 - x4);
}

const arma::mat& WoodFunction::GetInitialPoint() const
{
  return initialPoint;
}

//
// GeneralizedRosenbrockFunction implementation
//

GeneralizedRosenbrockFunction::GeneralizedRosenbrockFunction(int n) : n(n)
{
  initialPoint.set_size(n, 1);
  for (int i = 0; i < n; i++) // Set to [-1.2 1 -1.2 1 ...].
  {
    // Even (zero-based) entries are -1.2 and odd entries are 1, matching the
    // comment above and the documentation in test_functions.hpp.
    if (i % 2 == 0)
      initialPoint[i] = -1.2;
    else
      initialPoint[i] = 1;
  }
}

/**
 * Calculate the objective function.
 */
double GeneralizedRosenbrockFunction::Evaluate(const arma::mat& coordinates)
    const
{
  double fval = 0;
  for (int i = 0; i < (n - 1); i++)
  {
    fval += 100 * std::pow(std::pow(coordinates[i], 2) -
        coordinates[i + 1], 2) + std::pow(1 - coordinates[i], 2);
  }

  return fval;
}

/**
 * Calculate the gradient.
 */
void GeneralizedRosenbrockFunction::Gradient(const arma::mat& coordinates,
                                             arma::mat& gradient) const
{
  gradient.set_size(n);
  for (int i = 0; i < (n - 1); i++)
  {
    gradient[i] = 400 * (std::pow(coordinates[i], 3) -
        coordinates[i] * coordinates[i + 1]) + 2 * (coordinates[i] - 1);

    if (i > 0)
      gradient[i] += 200 * (coordinates[i] - std::pow(coordinates[i - 1], 2));
  }

  gradient[n - 1] = 200 * (coordinates[n - 1] -
      std::pow(coordinates[n - 2], 2));
}

//! Calculate the objective function of one of the individual functions.
double GeneralizedRosenbrockFunction::Evaluate(const arma::mat& coordinates,
                                               const size_t i) const
{
  return 100 * std::pow((std::pow(coordinates[i], 2) - coordinates[i + 1]), 2)
      + std::pow(1 - coordinates[i], 2);
}

//! Calculate the gradient of one of the individual functions.
void GeneralizedRosenbrockFunction::Gradient(const arma::mat& coordinates,
                                             const size_t i,
                                             arma::mat& gradient) const
{
  gradient.zeros(n);
  gradient[i] = 400 * (std::pow(coordinates[i], 3) -
      coordinates[i] * coordinates[i + 1]) + 2 * (coordinates[i] - 1);
  gradient[i + 1] = 200 * (coordinates[i + 1] - std::pow(coordinates[i], 2));
}

const arma::mat& GeneralizedRosenbrockFunction::GetInitialPoint() const
{
  return initialPoint;
}

//
// RosenbrockWoodFunction implementation
//

RosenbrockWoodFunction::RosenbrockWoodFunction() : rf(4), wf()
{
  initialPoint.set_size(4, 2);
  initialPoint.col(0) = rf.GetInitialPoint();
  initialPoint.col(1) = wf.GetInitialPoint();
}

/**
 * Calculate the objective function.
 */
double RosenbrockWoodFunction::Evaluate(const arma::mat& coordinates)
{
  double objective = rf.Evaluate(coordinates.col(0)) +
      wf.Evaluate(coordinates.col(1));

  return objective;
}

/**
 * Calculate the gradient.
*/ void RosenbrockWoodFunction::Gradient(const arma::mat& coordinates, arma::mat& gradient) { gradient.set_size(4, 2); arma::vec grf(4); arma::vec gwf(4); rf.Gradient(coordinates.col(0), grf); wf.Gradient(coordinates.col(1), gwf); gradient.col(0) = grf; gradient.col(1) = gwf; } const arma::mat& RosenbrockWoodFunction::GetInitialPoint() const { return initialPoint; } RcppMLPACK/src/mlpack/core/optimizers/lbfgs/test_functions.hpp0000644000176200001440000001245713647512751024142 0ustar liggesusers/** * @file test_functions.hpp * @author Ryan Curtin * * A collection of functions to test optimizers (in this case, L-BFGS). These * come from the following paper: * * "Testing Unconstrained Optimization Software" * Jorge J. Moré, Burton S. Garbow, and Kenneth E. Hillstrom. 1981. * ACM Trans. Math. Softw. 7, 1 (March 1981), 17-41. * http://portal.acm.org/citation.cfm?id=355934.355936 * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_OPTIMIZERS_LBFGS_TEST_FUNCTIONS_HPP #define __MLPACK_CORE_OPTIMIZERS_LBFGS_TEST_FUNCTIONS_HPP #include // To fulfill the template policy class 'FunctionType', we must implement // the following: // // FunctionType(); // constructor // void Gradient(const arma::mat& coordinates, arma::mat& gradient); // double Evaluate(const arma::mat& coordinates); // const arma::mat& GetInitialPoint(); // // Note that we are using an arma::mat instead of the more intuitive and // expected arma::vec. This is because L-BFGS will also optimize matrices. // However, remember that an arma::vec is simply an (n x 1) arma::mat. You can // use either internally but the L-BFGS method requires arma::mat& to be passed // (C++ does not allow implicit reference casting to subclasses). namespace mlpack { namespace optimization { namespace test { /** * The Rosenbrock function, defined by * f(x) = f1(x) + f2(x) * f1(x) = 100 (x2 - x1^2)^2 * f2(x) = (1 - x1)^2 * x_0 = [-1.2, 1] * * This should optimize to f(x) = 0, at x = [1, 1]. * * "An automatic method for finding the greatest or least value of a function." * H.H. Rosenbrock. 1960. Comput. J. 3., 175-184. */ class RosenbrockFunction { public: RosenbrockFunction(); // initialize initial point double Evaluate(const arma::mat& coordinates); void Gradient(const arma::mat& coordinates, arma::mat& gradient); const arma::mat& GetInitialPoint() const; private: arma::mat initialPoint; }; /** * The Wood function, defined by * f(x) = f1(x) + f2(x) + f3(x) + f4(x) + f5(x) + f6(x) * f1(x) = 100 (x2 - x1^2)^2 * f2(x) = (1 - x1)^2 * f3(x) = 90 (x4 - x3^2)^2 * f4(x) = (1 - x3)^2 * f5(x) = 10 (x2 + x4 - 2)^2 * f6(x) = (1 / 10) (x2 - x4)^2 * x_0 = [-3, -1, -3, -1] * * This should optimize to f(x) = 0, at x = [1, 1, 1, 1]. * * "A comparative study of nonlinear programming codes." * A.R. Colville. 1968. Rep. 320-2949, IBM N.Y. Scientific Center. 
*/
class WoodFunction
{
 public:
  WoodFunction(); // initialize initial point

  double Evaluate(const arma::mat& coordinates);
  void Gradient(const arma::mat& coordinates, arma::mat& gradient);

  const arma::mat& GetInitialPoint() const;

 private:
  arma::mat initialPoint;
};

/**
 * The Generalized Rosenbrock function in n dimensions, defined by
 *  f(x) = sum_i^{n - 1} (f(i)(x))
 *  f_i(x) = 100 * (x_i^2 - x_{i + 1})^2 + (1 - x_i)^2
 *  x_0 = [-1.2, 1, -1.2, 1, ...]
 *
 * This should optimize to f(x) = 0, at x = [1, 1, 1, 1, ...].
 *
 * This function can also be used for stochastic gradient descent (SGD) as a
 * decomposable function (DecomposableFunctionType), so there are other
 * overloads of Evaluate() and Gradient() implemented, as well as
 * NumFunctions().
 *
 * "An analysis of the behavior of a class of genetic adaptive systems."
 *   K.A. De Jong.  Ph.D. thesis, University of Michigan, 1975.
 */
class GeneralizedRosenbrockFunction
{
 public:
  /**
   * Set the dimensionality of the extended Rosenbrock function.
   *
   * @param n Number of dimensions for the function.
   */
  GeneralizedRosenbrockFunction(int n);

  double Evaluate(const arma::mat& coordinates) const;
  void Gradient(const arma::mat& coordinates, arma::mat& gradient) const;

  size_t NumFunctions() const { return n - 1; }
  double Evaluate(const arma::mat& coordinates, const size_t i) const;
  void Gradient(const arma::mat& coordinates,
                const size_t i,
                arma::mat& gradient) const;

  const arma::mat& GetInitialPoint() const;

 private:
  arma::mat initialPoint;
  int n; // Dimensionality
};

/**
 * The Generalized Rosenbrock function in four dimensions combined with the
 * Wood function in four dimensions.  In this function we are actually
 * optimizing a 4x2 matrix of coordinates, not a vector (each function gets
 * one column).
 */
class RosenbrockWoodFunction
{
 public:
  RosenbrockWoodFunction(); // initialize initial point

  double Evaluate(const arma::mat& coordinates);
  void Gradient(const arma::mat& coordinates, arma::mat& gradient);

  const arma::mat& GetInitialPoint() const;

 private:
  arma::mat initialPoint;
  GeneralizedRosenbrockFunction rf;
  WoodFunction wf;
};

}; // namespace test
}; // namespace optimization
}; // namespace mlpack

#endif // __MLPACK_CORE_OPTIMIZERS_LBFGS_TEST_FUNCTIONS_HPP
RcppMLPACK/src/mlpack/core/optimizers/lbfgs/lbfgs.hpp0000644000176200001440000002377213647512751022166 0ustar liggesusers/**
 * @file lbfgs.hpp
 * @author Dongryeol Lee
 * @author Ryan Curtin
 *
 * The generic L-BFGS optimizer.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_LBFGS_LBFGS_HPP
#define __MLPACK_CORE_OPTIMIZERS_LBFGS_LBFGS_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace optimization {

/**
 * The generic L-BFGS optimizer, which uses a back-tracking line search
 * algorithm to minimize a function.  The parameters for the algorithm (number
 * of memory points, maximum step size, and so forth) are all configurable via
 * either the constructor or standalone modifier functions.
A function which * can be optimized by this class must implement the following methods: * * - a default constructor * - double Evaluate(const arma::mat& coordinates); * - void Gradient(const arma::mat& coordinates, arma::mat& gradient); * - arma::mat& GetInitialPoint(); */ template class L_BFGS { public: /** * Initialize the L-BFGS object. Store a reference to the function we will be * optimizing and set the size of the memory for the algorithm. There are * many parameters that can be set for the optimization, but default values * are given for each of them. * * @param function Instance of function to be optimized. * @param numBasis Number of memory points to be stored (default 5). * @param maxIterations Maximum number of iterations for the optimization * (default 0 -- may run indefinitely). * @param armijoConstant Controls the accuracy of the line search routine for * determining the Armijo condition. * @param wolfe Parameter for detecting the Wolfe condition. * @param minGradientNorm Minimum gradient norm required to continue the * optimization. * @param maxLineSearchTrials The maximum number of trials for the line search * (before giving up). * @param minStep The minimum step of the line search. * @param maxStep The maximum step of the line search. */ L_BFGS(FunctionType& function, const size_t numBasis = 5, /* entirely arbitrary */ const size_t maxIterations = 0, /* run forever */ const double armijoConstant = 1e-4, const double wolfe = 0.9, const double minGradientNorm = 1e-10, const size_t maxLineSearchTrials = 50, const double minStep = 1e-20, const double maxStep = 1e20); /** * Return the point where the lowest function value has been found. * * @return arma::vec representing the point and a double with the function * value at that point. */ const std::pair& MinPointIterate() const; /** * Use L-BFGS to optimize the given function, starting at the given iterate * point and finding the minimum. The maximum number of iterations is set in * the constructor (or with MaxIterations()). Alternately, another overload * is provided which takes a maximum number of iterations as a parameter. The * given starting point will be modified to store the finishing point of the * algorithm, and the final objective value is returned. * * @param iterate Starting point (will be modified). * @return Objective value of the final point. */ double Optimize(arma::mat& iterate); /** * Use L-BFGS to optimize (minimize) the given function, starting at the given * iterate point, and performing no more than the given maximum number of * iterations (the class variable maxIterations is ignored for this run, but * not modified). The given starting point will be modified to store the * finishing point of the algorithm, and the final objective value is * returned. * * @param iterate Starting point (will be modified). * @param maxIterations Maximum number of iterations (0 specifies no limit). * @return Objective value of the final point. */ double Optimize(arma::mat& iterate, const size_t maxIterations); //! Return the function that is being optimized. const FunctionType& Function() const { return function; } //! Modify the function that is being optimized. FunctionType& Function() { return function; } //! Get the memory size. size_t NumBasis() const { return numBasis; } //! Modify the memory size. size_t& NumBasis() { return numBasis; } //! Get the maximum number of iterations. size_t MaxIterations() const { return maxIterations; } //! Modify the maximum number of iterations. 
size_t& MaxIterations() { return maxIterations; }

//! Get the Armijo condition constant.
double ArmijoConstant() const { return armijoConstant; }
//! Modify the Armijo condition constant.
double& ArmijoConstant() { return armijoConstant; }

//! Get the Wolfe parameter.
double Wolfe() const { return wolfe; }
//! Modify the Wolfe parameter.
double& Wolfe() { return wolfe; }

//! Get the minimum gradient norm.
double MinGradientNorm() const { return minGradientNorm; }
//! Modify the minimum gradient norm.
double& MinGradientNorm() { return minGradientNorm; }

//! Get the maximum number of line search trials.
size_t MaxLineSearchTrials() const { return maxLineSearchTrials; }
//! Modify the maximum number of line search trials.
size_t& MaxLineSearchTrials() { return maxLineSearchTrials; }

//! Return the minimum line search step size.
double MinStep() const { return minStep; }
//! Modify the minimum line search step size.
double& MinStep() { return minStep; }

//! Return the maximum line search step size.
double MaxStep() const { return maxStep; }
//! Modify the maximum line search step size.
double& MaxStep() { return maxStep; }

//! Convert the object to a string.
std::string ToString() const;

private:
//! Internal reference to the function we are optimizing.
FunctionType& function;

//! Position of the new iterate.
arma::mat newIterateTmp;
//! Stores all the s matrices in memory.
arma::cube s;
//! Stores all the y matrices in memory.
arma::cube y;

//! Size of memory for this L-BFGS optimizer.
size_t numBasis;
//! Maximum number of iterations.
size_t maxIterations;
//! Parameter for determining the Armijo condition.
double armijoConstant;
//! Parameter for detecting the Wolfe condition.
double wolfe;
//! Minimum gradient norm required to continue the optimization.
double minGradientNorm;
//! Maximum number of trials for the line search.
size_t maxLineSearchTrials;
//! Minimum step of the line search.
double minStep;
//! Maximum step of the line search.
double maxStep;

//! Best point found so far.
std::pair<arma::mat, double> minPointIterate;

/**
 * Evaluate the function at the given iterate point and store the result if it
 * is a new minimum.
 *
 * @return The value of the function.
 */
double Evaluate(const arma::mat& iterate);

/**
 * Calculate the scaling factor, gamma, which is used to scale the Hessian
 * approximation matrix.  See method M3 in Section 4 of Liu and Nocedal
 * (1989).
 *
 * @return The calculated scaling factor.
 */
double ChooseScalingFactor(const size_t iterationNum,
                           const arma::mat& gradient);

/**
 * Check whether the norm of the gradient is smaller than the minimum
 * gradient norm (minGradientNorm; default 1e-10, configurable via
 * MinGradientNorm()).
 *
 * @return (norm < minGradientNorm).
 */
bool GradientNormTooSmall(const arma::mat& gradient);

/**
 * Perform a back-tracking line search along the search direction to
 * calculate a step size satisfying the Wolfe conditions.  The parameter
 * iterate will be modified if the method is successful; the step size itself
 * is chosen internally, starting from 1.0.
 *
 * @param functionValue Value of the function at the initial point
 * @param iterate The initial point to begin the line search from
 * @param gradient The gradient at the initial point
 * @param searchDirection A vector specifying the search direction
 *
 * @return false if no step size is suitable, true otherwise.
 */
bool LineSearch(double& functionValue,
                arma::mat& iterate,
                arma::mat& gradient,
                const arma::mat& searchDirection);

/**
 * Find the L-BFGS search direction.
* * @param gradient The gradient at the current point * @param iteration_num The iteration number * @param scaling_factor Scaling factor to use (see ChooseScalingFactor_()) * @param search_direction Vector to store search direction in */ void SearchDirection(const arma::mat& gradient, const size_t iterationNum, const double scalingFactor, arma::mat& searchDirection); /** * Update the y and s matrices, which store the differences * between the iterate and old iterate and the differences between the * gradient and the old gradient, respectively. * * @param iterationNum Iteration number * @param iterate Current point * @param oldIterate Point at last iteration * @param gradient Gradient at current point (iterate) * @param oldGradient Gradient at last iteration point (oldIterate) */ void UpdateBasisSet(const size_t iterationNum, const arma::mat& iterate, const arma::mat& oldIterate, const arma::mat& gradient, const arma::mat& oldGradient); }; }; // namespace optimization }; // namespace mlpack #include "lbfgs_impl.hpp" #endif // __MLPACK_CORE_OPTIMIZERS_LBFGS_LBFGS_HPP RcppMLPACK/src/mlpack/core/optimizers/lbfgs/lbfgs_impl.hpp0000644000176200001440000003632613647512751023212 0ustar liggesusers/** * @file lbfgs_impl.hpp * @author Dongryeol Lee (dongryel@cc.gatech.edu) * @author Ryan Curtin * * The implementation of the L_BFGS optimizer. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_OPTIMIZERS_LBFGS_LBFGS_IMPL_HPP #define __MLPACK_CORE_OPTIMIZERS_LBFGS_LBFGS_IMPL_HPP namespace mlpack { namespace optimization { /** * Initialize the L_BFGS object. Copy the function we will be optimizing and * set the size of the memory for the algorithm. * * @param function Instance of function to be optimized * @param numBasis Number of memory points to be stored * @param armijoConstant Controls the accuracy of the line search routine for * determining the Armijo condition. * @param wolfe Parameter for detecting the Wolfe condition. * @param minGradientNorm Minimum gradient norm required to continue the * optimization. * @param maxLineSearchTrials The maximum number of trials for the line search * (before giving up). * @param minStep The minimum step of the line search. * @param maxStep The maximum step of the line search. */ template L_BFGS::L_BFGS(FunctionType& function, const size_t numBasis, const size_t maxIterations, const double armijoConstant, const double wolfe, const double minGradientNorm, const size_t maxLineSearchTrials, const double minStep, const double maxStep) : function(function), numBasis(numBasis), maxIterations(maxIterations), armijoConstant(armijoConstant), wolfe(wolfe), minGradientNorm(minGradientNorm), maxLineSearchTrials(maxLineSearchTrials), minStep(minStep), maxStep(maxStep) { // Get the dimensions of the coordinates of the function; GetInitialPoint() // might return an arma::vec, but that's okay because then n_cols will simply // be 1. 
const size_t rows = function.GetInitialPoint().n_rows; const size_t cols = function.GetInitialPoint().n_cols; newIterateTmp.set_size(rows, cols); s.set_size(rows, cols, numBasis); y.set_size(rows, cols, numBasis); // Allocate the pair holding the min iterate information. minPointIterate.first.zeros(rows, cols); minPointIterate.second = std::numeric_limits::max(); } /** * Evaluate the function at the given iterate point and store the result if * it is a new minimum. * * @return The value of the function */ template double L_BFGS::Evaluate(const arma::mat& iterate) { // Evaluate the function and keep track of the minimum function // value encountered during the optimization. double functionValue = function.Evaluate(iterate); if (functionValue < minPointIterate.second) { minPointIterate.first = iterate; minPointIterate.second = functionValue; } return functionValue; } /** * Calculate the scaling factor gamma which is used to scale the Hessian * approximation matrix. See method M3 in Section 4 of Liu and Nocedal (1989). * * @return The calculated scaling factor */ template double L_BFGS::ChooseScalingFactor(const size_t iterationNum, const arma::mat& gradient) { double scalingFactor = 1.0; if (iterationNum > 0) { int previousPos = (iterationNum - 1) % numBasis; // Get s and y matrices once instead of multiple times. arma::mat& sMat = s.slice(previousPos); arma::mat& yMat = y.slice(previousPos); scalingFactor = dot(sMat, yMat) / dot(yMat, yMat); } else { scalingFactor = 1.0 / sqrt(dot(gradient, gradient)); } return scalingFactor; } /** * Check to make sure that the norm of the gradient is not smaller than 1e-10. * Currently that value is not configurable. * * @return (norm < minGradientNorm) */ template bool L_BFGS::GradientNormTooSmall(const arma::mat& gradient) { double norm = arma::norm(gradient, 2); return (norm < minGradientNorm); } /** * Perform a back-tracking line search along the search direction to calculate a * step size satisfying the Wolfe conditions. * * @param functionValue Value of the function at the initial point * @param iterate The initial point to begin the line search from * @param gradient The gradient at the initial point * @param searchDirection A vector specifying the search direction * @param stepSize Variable the calculated step size will be stored in * * @return false if no step size is suitable, true otherwise. */ template bool L_BFGS::LineSearch(double& functionValue, arma::mat& iterate, arma::mat& gradient, const arma::mat& searchDirection) { // Default first step size of 1.0. double stepSize = 1.0; // The initial linear term approximation in the direction of the // search direction. double initialSearchDirectionDotGradient = arma::dot(gradient, searchDirection); // If it is not a descent direction, just report failure. if (initialSearchDirectionDotGradient > 0.0) { Rcpp::Rcerr << "L-BFGS line search direction is not a descent direction " << "(terminating)!" << std::endl; return false; } // Save the initial function value. double initialFunctionValue = functionValue; // Unit linear approximation to the decrease in function value. double linearApproxFunctionValueDecrease = armijoConstant * initialSearchDirectionDotGradient; // The number of iteration in the search. size_t numIterations = 0; // Armijo step size scaling factor for increase and decrease. const double inc = 2.1; const double dec = 0.5; double width = 0; while (true) { // Perform a step and evaluate the gradient and the function values at that // point. 
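// (Editor's note, added for clarity: the loop below is a bracketing
// backtracking search.  If the Armijo sufficient-decrease test fails, the
// step is shrunk (width = dec); if the directional derivative is still too
// negative, the step is widened (width = inc); if it has instead become too
// positive, the step is shrunk again.  The loop exits successfully only when
// both strong Wolfe conditions hold, and fails when the step size or the
// trial count leaves its allowed range.)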
newIterateTmp = iterate; newIterateTmp += stepSize * searchDirection; functionValue = Evaluate(newIterateTmp); function.Gradient(newIterateTmp, gradient); numIterations++; if (functionValue > initialFunctionValue + stepSize * linearApproxFunctionValueDecrease) { width = dec; } else { // Check Wolfe's condition. double searchDirectionDotGradient = arma::dot(gradient, searchDirection); if (searchDirectionDotGradient < wolfe * initialSearchDirectionDotGradient) { width = inc; } else { if (searchDirectionDotGradient > -wolfe * initialSearchDirectionDotGradient) { width = dec; } else { break; } } } // Terminate when the step size gets too small or too big or it // exceeds the max number of iterations. if ((stepSize < minStep) || (stepSize > maxStep) || (numIterations >= maxLineSearchTrials)) { return false; } // Scale the step size. stepSize *= width; } // Move to the new iterate. iterate = newIterateTmp; return true; } /** * Find the L_BFGS search direction. * * @param gradient The gradient at the current point * @param iterationNum The iteration number * @param scalingFactor Scaling factor to use (see ChooseScalingFactor_()) * @param searchDirection Vector to store search direction in */ template void L_BFGS::SearchDirection(const arma::mat& gradient, const size_t iterationNum, const double scalingFactor, arma::mat& searchDirection) { // Start from this point. searchDirection = gradient; // See "A Recursive Formula to Compute H * g" in "Updating quasi-Newton // matrices with limited storage" (Nocedal, 1980). // Temporary variables. arma::vec rho(numBasis); arma::vec alpha(numBasis); size_t limit = (numBasis > iterationNum) ? 0 : (iterationNum - numBasis); for (size_t i = iterationNum; i != limit; i--) { int translatedPosition = (i + (numBasis - 1)) % numBasis; rho[iterationNum - i] = 1.0 / arma::dot(y.slice(translatedPosition), s.slice(translatedPosition)); alpha[iterationNum - i] = rho[iterationNum - i] * arma::dot(s.slice(translatedPosition), searchDirection); searchDirection -= alpha[iterationNum - i] * y.slice(translatedPosition); } searchDirection *= scalingFactor; for (size_t i = limit; i < iterationNum; i++) { int translatedPosition = i % numBasis; double beta = rho[iterationNum - i - 1] * arma::dot(y.slice(translatedPosition), searchDirection); searchDirection += (alpha[iterationNum - i - 1] - beta) * s.slice(translatedPosition); } // Negate the search direction so that it is a descent direction. searchDirection *= -1; } /** * Update the y and s matrices, which store the differences between * the iterate and old iterate and the differences between the gradient and the * old gradient, respectively. * * @param iterationNum Iteration number * @param iterate Current point * @param oldIterate Point at last iteration * @param gradient Gradient at current point (iterate) * @param oldGradient Gradient at last iteration point (oldIterate) */ template void L_BFGS::UpdateBasisSet(const size_t iterationNum, const arma::mat& iterate, const arma::mat& oldIterate, const arma::mat& gradient, const arma::mat& oldGradient) { // Overwrite a certain position instead of pushing everything in the vector // back one position. int overwritePos = iterationNum % numBasis; s.slice(overwritePos) = iterate - oldIterate; y.slice(overwritePos) = gradient - oldGradient; } /** * Return the point where the lowest function value has been found. * * @return arma::vec representing the point and a double with the function * value at that point. 
*/ template inline const std::pair& L_BFGS::MinPointIterate() const { return minPointIterate; } template inline double L_BFGS::Optimize(arma::mat& iterate) { return Optimize(iterate, maxIterations); } /** * Use L_BFGS to optimize the given function, starting at the given iterate * point and performing no more than the specified number of maximum iterations. * The given starting point will be modified to store the finishing point of the * algorithm. * * @param numIterations Maximum number of iterations to perform * @param iterate Starting point (will be modified) */ template double L_BFGS::Optimize(arma::mat& iterate, const size_t maxIterations) { // Ensure that the cubes holding past iterations' information are the right // size. Also set the current best point value to the maximum. const size_t rows = function.GetInitialPoint().n_rows; const size_t cols = function.GetInitialPoint().n_cols; s.set_size(rows, cols, numBasis); y.set_size(rows, cols, numBasis); minPointIterate.second = std::numeric_limits::max(); // The old iterate to be saved. arma::mat oldIterate; oldIterate.zeros(iterate.n_rows, iterate.n_cols); // Whether to optimize until convergence. bool optimizeUntilConvergence = (maxIterations == 0); // The initial function value. double functionValue = Evaluate(iterate); // The gradient: the current and the old. arma::mat gradient; arma::mat oldGradient; gradient.zeros(iterate.n_rows, iterate.n_cols); oldGradient.zeros(iterate.n_rows, iterate.n_cols); // The search direction. arma::mat searchDirection; searchDirection.zeros(iterate.n_rows, iterate.n_cols); // The initial gradient value. function.Gradient(iterate, gradient); // The main optimization loop. for (size_t itNum = 0; optimizeUntilConvergence || (itNum != maxIterations); ++itNum) { Rcpp::Rcout << "L-BFGS iteration " << itNum << "; objective " << function.Evaluate(iterate) << "." << std::endl; // Break when the norm of the gradient becomes too small. if (GradientNormTooSmall(gradient)) { Rcpp::Rcout << "L-BFGS gradient norm too small (terminating successfully)." << std::endl; break; } // Choose the scaling factor. double scalingFactor = ChooseScalingFactor(itNum, gradient); // Build an approximation to the Hessian and choose the search // direction for the current iteration. SearchDirection(gradient, itNum, scalingFactor, searchDirection); // Save the old iterate and the gradient before stepping. oldIterate = iterate; oldGradient = gradient; // Do a line search and take a step. if (!LineSearch(functionValue, iterate, gradient, searchDirection)) { Rcpp::Rcout << "Line search failed. Stopping optimization." << std::endl; break; // The line search failed; nothing else to try. } // It is possible that the difference between the two coordinates is zero. // In this case we terminate successfully. if (accu(iterate != oldIterate) == 0) { Rcpp::Rcout << "L-BFGS step size of 0 (terminating successfully)." << std::endl; break; } // Overwrite an old basis set. UpdateBasisSet(itNum, iterate, oldIterate, gradient, oldGradient); } // End of the optimization loop. return function.Evaluate(iterate); } // Convert the object to a string. 
template<typename FunctionType>
std::string L_BFGS<FunctionType>::ToString() const
{
  std::ostringstream convert;
  convert << "L_BFGS [" << this << "]" << std::endl;
  convert << "  Function:" << std::endl;
  convert << util::Indent(function.ToString(), 2);
  convert << "  Memory size: " << numBasis << std::endl;
  convert << "  Cube size: " << s.n_rows << "x" << s.n_cols << "x"
      << s.n_slices << std::endl;
  convert << "  Maximum iterations: " << maxIterations << std::endl;
  convert << "  Armijo condition constant: " << armijoConstant << std::endl;
  convert << "  Wolfe parameter: " << wolfe << std::endl;
  convert << "  Minimum gradient norm: " << minGradientNorm << std::endl;
  convert << "  Minimum step for line search: " << minStep << std::endl;
  convert << "  Maximum step for line search: " << maxStep << std::endl;
  return convert.str();
}

}; // namespace optimization
}; // namespace mlpack

#endif // __MLPACK_CORE_OPTIMIZERS_LBFGS_LBFGS_IMPL_HPP
RcppMLPACK/src/mlpack/core/optimizers/sa/0000755000176200001440000000000013647512751017657 5ustar liggesusersRcppMLPACK/src/mlpack/core/optimizers/sa/sa.hpp0000644000176200001440000002163513647512751021002 0ustar liggesusers/**
 * @file sa.hpp
 * @author Zhihao Lou
 *
 * Simulated Annealing (SA).
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_SA_SA_HPP
#define __MLPACK_CORE_OPTIMIZERS_SA_SA_HPP

#include <mlpack/core.hpp>

#include "exponential_schedule.hpp"

namespace mlpack {
namespace optimization {

/**
 * Simulated Annealing is a stochastic optimization algorithm which is able to
 * deliver near-optimal results quickly without knowing the gradient of the
 * function being optimized.  It has a unique hill-climbing capability that
 * makes it less vulnerable to local minima.  This implementation uses an
 * exponential cooling schedule and feedback move control by default, but the
 * cooling schedule can be changed via a template parameter.
 *
 * The algorithm keeps the temperature at the initial temperature for
 * initMoves steps to get rid of the dependency on the initial condition.
 * After that, it cools every step until the system is considered frozen or
 * maxIterations is reached.
 *
 * At each step, SA only perturbs one parameter at a time.  When SA has
 * perturbed all parameters in a problem, a sweep has been completed.  Every
 * moveCtrlSweep sweeps, the algorithm does feedback move control to change
 * the average move size depending on the responsiveness of each parameter.
 * The parameter gain controls the proportion of the feedback control.
 *
 * The system is considered "frozen" when its score fails to change more than
 * tolerance for maxToleranceSweep consecutive sweeps.
* * For SA to work, the FunctionType parameter must implement the following * two methods: * * double Evaluate(const arma::mat& coordinates); * arma::mat& GetInitialPoint(); * * and the CoolingScheduleType parameter must implement the following method: * * double NextTemperature(const double currentTemperature, * const double currentValue); * * which returns the next temperature given current temperature and the value * of the function being optimized. * * @tparam FunctionType objective function type to be minimized. * @tparam CoolingScheduleType type for cooling schedule */ template< typename FunctionType, typename CoolingScheduleType = ExponentialSchedule > class SA { public: /** * Construct the SA optimizer with the given function and parameters. * * @param function Function to be minimized. * @param coolingSchedule Instantiated cooling schedule. * @param maxIterations Maximum number of iterations allowed (0 indicates no limit). * @param initT Initial temperature. * @param initMoves Number of initial iterations without changing temperature. * @param moveCtrlSweep Sweeps per feedback move control. * @param tolerance Tolerance to consider system frozen. * @param maxToleranceSweep Maximum sweeps below tolerance to consider system * frozen. * @param maxMoveCoef Maximum move size. * @param initMoveCoef Initial move size. * @param gain Proportional control in feedback move control. */ SA(FunctionType& function, CoolingScheduleType& coolingSchedule, const size_t maxIterations = 1000000, const double initT = 10000., const size_t initMoves = 1000, const size_t moveCtrlSweep = 100, const double tolerance = 1e-5, const size_t maxToleranceSweep = 3, const double maxMoveCoef = 20, const double initMoveCoef = 0.3, const double gain = 0.3); /** * Optimize the given function using simulated annealing. The given starting * point will be modified to store the finishing point of the algorithm, and * the final objective value is returned. * * @param iterate Starting point (will be modified). * @return Objective value of the final point. */ double Optimize(arma::mat& iterate); //! Get the instantiated function to be optimized. const FunctionType& Function() const { return function; } //! Modify the instantiated function. FunctionType& Function() { return function; } //! Get the temperature. double Temperature() const { return temperature; } //! Modify the temperature. double& Temperature() { return temperature; } //! Get the initial moves. size_t InitMoves() const { return initMoves; } //! Modify the initial moves. size_t& InitMoves() { return initMoves; } //! Get sweeps per move control. size_t MoveCtrlSweep() const { return moveCtrlSweep; } //! Modify sweeps per move control. size_t& MoveCtrlSweep() { return moveCtrlSweep; } //! Get the tolerance. double Tolerance() const { return tolerance; } //! Modify the tolerance. double& Tolerance() { return tolerance; } //! Get the maxToleranceSweep. size_t MaxToleranceSweep() const { return maxToleranceSweep; } //! Modify the maxToleranceSweep. size_t& MaxToleranceSweep() { return maxToleranceSweep; } //! Get the gain. double Gain() const { return gain; } //! Modify the gain. double& Gain() { return gain; } //! Get the maximum number of iterations. size_t MaxIterations() const { return maxIterations; } //! Modify the maximum number of iterations. size_t& MaxIterations() { return maxIterations; } //! Get the maximum move size of each parameter. arma::mat MaxMove() const { return maxMove; } //! Modify the maximum move size of each parameter. 
arma::mat& MaxMove() { return maxMove; } //! Get move size of each parameter. arma::mat MoveSize() const { return moveSize; } //! Modify move size of each parameter. arma::mat& MoveSize() { return moveSize; } //! Return a string representation of this object. std::string ToString() const; private: //! The function to be optimized. FunctionType& function; //! The cooling schedule being used. CoolingScheduleType& coolingSchedule; //! The maximum number of iterations. size_t maxIterations; //! The current temperature. double temperature; //! The number of initial moves before reducing the temperature. size_t initMoves; //! The number of sweeps before a MoveControl() call. size_t moveCtrlSweep; //! Tolerance for convergence. double tolerance; //! Number of sweeps in tolerance before system is considered frozen. size_t maxToleranceSweep; //! Proportional control in feedback move control. double gain; //! Maximum move size of each parameter. arma::mat maxMove; //! Move size of each parameter. arma::mat moveSize; /** * GenerateMove proposes a move on element iterate(idx), and determines if * that move is acceptable or not according to the Metropolis criterion. * After that it increments idx so the next call will make a move on next * parameters. When all elements of the state have been moved (a sweep), it * resets idx and increments sweepCounter. When sweepCounter reaches * moveCtrlSweep, it performs MoveControl() and resets sweepCounter. * * @param iterate Current optimization position. * @param accept Matrix representing which parameters have had accepted moves. * @param energy Current energy of the system. * @param idx Current parameter to modify. * @param sweepCounter Current counter representing how many sweeps have been * completed. */ void GenerateMove(arma::mat& iterate, arma::mat& accept, double& energy, size_t& idx, size_t& sweepCounter); /** * MoveControl() uses a proportional feedback control to determine the size * parameter to pass to the move generation distribution. The target of such * move control is to make the acceptance ratio, accept/nMoves, be as close to * 0.44 as possible. Generally speaking, the larger the move size is, the * larger the function value change of the move will be, and less likely such * move will be accepted by the Metropolis criterion. Thus, the move size is * controlled by * * log(moveSize) = log(moveSize) + gain * (accept/nMoves - target) * * For more theory and the mysterious 0.44 value, see Jimmy K.-C. Lam and * Jean-Marc Delosme. `An efficient simulated annealing schedule: derivation'. * Technical Report 8816, Yale University, 1988. * * @param nMoves Number of moves since last call. * @param accept Matrix representing which parameters have had accepted moves. */ void MoveControl(const size_t nMoves, arma::mat& accept); }; }; // namespace optimization }; // namespace mlpack #include "sa_impl.hpp" #endif RcppMLPACK/src/mlpack/core/optimizers/sa/exponential_schedule.hpp0000644000176200001440000000456313647512751024602 0ustar liggesusers/** * @file exponential_schedule.hpp * @author Zhihao Lou * * Exponential (geometric) cooling schedule used in SA. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
RcppMLPACK/src/mlpack/core/optimizers/sa/exponential_schedule.hpp0000644000176200001440000000456313647512751024602 0ustar  liggesusers/**
 * @file exponential_schedule.hpp
 * @author Zhihao Lou
 *
 * Exponential (geometric) cooling schedule used in SA.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_SA_EXPONENTIAL_SCHEDULE_HPP
#define __MLPACK_CORE_OPTIMIZERS_SA_EXPONENTIAL_SCHEDULE_HPP

namespace mlpack {
namespace optimization {

/**
 * The exponential cooling schedule cools the temperature T at every step
 * according to the equation
 *
 * \f[
 * T_{n+1} = (1 - \lambda) T_n
 * \f]
 *
 * where \f$ 0 < \lambda < 1 \f$ is the cooling speed. The smaller
 * \f$ \lambda \f$ is, the slower the cooling, and the better the final result
 * will be. Some literature uses \f$ \alpha = (1 - \lambda) \f$ instead; in
 * practice \f$ \alpha \f$ is very close to 1 and is awkward to specify
 * (e.g. alpha = 0.999999 vs. lambda = 1e-6).
 */
class ExponentialSchedule
{
 public:
  /*
   * Construct the ExponentialSchedule with the given parameter.
   *
   * @param lambda Cooling speed.
   */
  ExponentialSchedule(const double lambda = 0.001) : lambda(lambda) { }

  /**
   * Returns the next temperature given the current status. The current
   * system's energy is not used in this calculation.
   *
   * @param currentTemperature Current temperature of system.
   * @param currentEnergy Current energy of system (not used).
   */
  double NextTemperature(
      const double currentTemperature,
      const double /* currentEnergy */)
  {
    return (1 - lambda) * currentTemperature;
  }

  //! Get the cooling speed, lambda.
  double Lambda() const { return lambda; }
  //! Modify the cooling speed, lambda.
  double& Lambda() { return lambda; }

 private:
  //! The cooling speed.
  double lambda;
};

}; // namespace optimization
}; // namespace mlpack

#endif
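Since this schedule gives T_n = (1 - lambda)^n * T_0, the number of cooling steps needed to reach a target temperature can be worked out directly: n = log(Tf / T0) / log(1 - lambda). A small self-contained sketch, using the SA and ExponentialSchedule defaults from this package plus an assumed target temperature:

#include <cmath>
#include <cstdio>

int main()
{
  const double lambda = 0.001; // ExponentialSchedule default cooling speed
  const double T0 = 10000.0;   // SA default initial temperature
  const double Tf = 1e-5;      // assumed target temperature

  // T_n = (1 - lambda)^n * T0  =>  n = log(Tf / T0) / log(1 - lambda)
  const double steps = std::log(Tf / T0) / std::log(1.0 - lambda);
  std::printf("approximately %.0f cooling steps\n", steps); // ~20,700
  return 0;
}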
RcppMLPACK/src/mlpack/core/optimizers/sa/sa_impl.hpp0000644000176200001440000001747313647512751022020 0ustar  liggesusers/**
 * @file sa_impl.hpp
 * @author Zhihao Lou
 *
 * The implementation of the SA optimizer.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_SA_SA_IMPL_HPP
#define __MLPACK_CORE_OPTIMIZERS_SA_SA_IMPL_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace optimization {

template<
    typename FunctionType,
    typename CoolingScheduleType
>
SA<FunctionType, CoolingScheduleType>::SA(
    FunctionType& function,
    CoolingScheduleType& coolingSchedule,
    const size_t maxIterations,
    const double initT,
    const size_t initMoves,
    const size_t moveCtrlSweep,
    const double tolerance,
    const size_t maxToleranceSweep,
    const double maxMoveCoef,
    const double initMoveCoef,
    const double gain) :
    function(function),
    coolingSchedule(coolingSchedule),
    maxIterations(maxIterations),
    temperature(initT),
    initMoves(initMoves),
    moveCtrlSweep(moveCtrlSweep),
    tolerance(tolerance),
    maxToleranceSweep(maxToleranceSweep),
    gain(gain)
{
  const size_t rows = function.GetInitialPoint().n_rows;
  const size_t cols = function.GetInitialPoint().n_cols;

  maxMove.set_size(rows, cols);
  maxMove.fill(maxMoveCoef);
  moveSize.set_size(rows, cols);
  moveSize.fill(initMoveCoef);
}

//! Optimize the function (minimize).
template<
    typename FunctionType,
    typename CoolingScheduleType
>
double SA<FunctionType, CoolingScheduleType>::Optimize(arma::mat& iterate)
{
  const size_t rows = function.GetInitialPoint().n_rows;
  const size_t cols = function.GetInitialPoint().n_cols;

  size_t frozenCount = 0;
  double energy = function.Evaluate(iterate);
  double oldEnergy = energy;
  math::RandomSeed(std::time(NULL));

  size_t idx = 0;
  size_t sweepCounter = 0;

  arma::mat accept(rows, cols);
  accept.zeros();

  // Initial moves to get rid of dependency of initial states.
  for (size_t i = 0; i < initMoves; ++i)
    GenerateMove(iterate, accept, energy, idx, sweepCounter);

  // Iterating and cooling.
  for (size_t i = 0; i != maxIterations; ++i)
  {
    oldEnergy = energy;
    GenerateMove(iterate, accept, energy, idx, sweepCounter);
    temperature = coolingSchedule.NextTemperature(temperature, energy);

    // Determine if the optimization has entered (or continues to be in) a
    // frozen state.
    if (std::abs(energy - oldEnergy) < tolerance)
      ++frozenCount;
    else
      frozenCount = 0;

    // Terminate, if possible.
    if (frozenCount >= maxToleranceSweep * moveCtrlSweep * iterate.n_elem)
    {
      Rcpp::Rcout << "SA: minimized within tolerance " << tolerance << " for "
          << maxToleranceSweep << " sweeps after " << i << " iterations; "
          << "terminating optimization." << std::endl;
      return energy;
    }
  }

  Rcpp::Rcout << "SA: maximum iterations (" << maxIterations << ") reached; "
      << "terminating optimization." << std::endl;
  return energy;
}
/**
 * GenerateMove proposes a move on element iterate(idx), and determines if
 * that move is acceptable or not according to the Metropolis criterion.
 * After that it increments idx so the next call will make a move on the next
 * parameter. When all elements of the state have been moved (a sweep), it
 * resets idx and increments sweepCounter. When sweepCounter reaches
 * moveCtrlSweep, it performs MoveControl() and resets sweepCounter.
 */
template<
    typename FunctionType,
    typename CoolingScheduleType
>
void SA<FunctionType, CoolingScheduleType>::GenerateMove(
    arma::mat& iterate,
    arma::mat& accept,
    double& energy,
    size_t& idx,
    size_t& sweepCounter)
{
  const double prevEnergy = energy;
  const double prevValue = iterate(idx);

  // It is possible to use a non-Laplace distribution here, but it is difficult
  // because the acceptance ratio should be as close to 0.44 as possible, and
  // MoveControl() is derived for the Laplace distribution.

  // Sample from a Laplace distribution with scale parameter moveSize(idx).
  const double unif = 2.0 * math::Random() - 1.0;
  const double move = (unif < 0) ? (moveSize(idx) * std::log(1 + unif)) :
      (-moveSize(idx) * std::log(1 - unif));

  iterate(idx) += move;
  energy = function.Evaluate(iterate);

  // According to the Metropolis criterion, accept the move with probability
  // min{1, exp(-(E_new - E_old) / T)}.
  const double xi = math::Random();
  const double delta = energy - prevEnergy;
  const double criterion = std::exp(-delta / temperature);
  if (delta <= 0. || criterion > xi)
  {
    accept(idx) += 1.;
  }
  else // Reject the move; restore previous state.
  {
    iterate(idx) = prevValue;
    energy = prevEnergy;
  }

  ++idx;
  if (idx == iterate.n_elem) // Finished with a sweep.
  {
    idx = 0;
    ++sweepCounter;
  }

  if (sweepCounter == moveCtrlSweep) // Do MoveControl().
  {
    MoveControl(moveCtrlSweep, accept);
    sweepCounter = 0;
  }
}

/**
 * MoveControl() uses proportional feedback control to determine the size
 * parameter to pass to the move generation distribution. The target of such
 * move control is to make the acceptance ratio, accept/nMoves, as close to
 * 0.44 as possible. Generally speaking, the larger the move size is, the
 * larger the function value change of the move will be, and the less likely
 * such a move will be accepted by the Metropolis criterion. Thus, the move
 * size is controlled by
 *
 * log(moveSize) = log(moveSize) + gain * (accept/nMoves - target)
 *
 * For more theory and the mysterious 0.44 value, see Jimmy K.-C. Lam and
 * Jean-Marc Delosme. `An efficient simulated annealing schedule: derivation'.
 * Technical Report 8816, Yale University, 1988.
 */
template<
    typename FunctionType,
    typename CoolingScheduleType
>
void SA<FunctionType, CoolingScheduleType>::MoveControl(const size_t nMoves,
                                                        arma::mat& accept)
{
  arma::mat target;
  target.copy_size(accept);
  target.fill(0.44);
  moveSize = arma::log(moveSize);
  moveSize += gain * (accept / (double) nMoves - target);
  moveSize = arma::exp(moveSize);

  // To avoid the use of element-wise arma::min(), which is only available in
  // Armadillo after v3.930, we use a for loop here instead.
  for (size_t i = 0; i < accept.n_elem; ++i)
    moveSize(i) = (moveSize(i) > maxMove(i)) ? maxMove(i) : moveSize(i);

  accept.zeros();
}

template<
    typename FunctionType,
    typename CoolingScheduleType
>
std::string SA<FunctionType, CoolingScheduleType>::ToString() const
{
  std::ostringstream convert;
  convert << "SA [" << this << "]" << std::endl;
  convert << "  Function:" << std::endl;
  convert << util::Indent(function.ToString(), 2);
  convert << "  Cooling Schedule:" << std::endl;
  convert << util::Indent(coolingSchedule.ToString(), 2);
  convert << "  Temperature: " << temperature << std::endl;
  convert << "  Initial moves: " << initMoves << std::endl;
  convert << "  Sweeps per move control: " << moveCtrlSweep << std::endl;
  convert << "  Tolerance: " << tolerance << std::endl;
  convert << "  Maximum sweeps below tolerance: " << maxToleranceSweep
      << std::endl;
  convert << "  Move control gain: " << gain << std::endl;
  convert << "  Maximum iterations: " << maxIterations << std::endl;
  return convert.str();
}

}; // namespace optimization
}; // namespace mlpack

#endif
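Two pieces of GenerateMove() above are worth isolating: the inverse-CDF draw from a Laplace distribution, and the Metropolis acceptance test. A self-contained sketch of both follows; std::rand() stands in for mlpack's math::Random(), and the function names are illustrative, not library API:

#include <cmath>
#include <cstdlib>

// Draw one Laplace(0, scale) variate the same way GenerateMove() does:
// u is uniform on (-1, 1), and inverting the Laplace CDF on each half
// gives move = -scale * sgn(u) * log(1 - |u|).
double laplaceMove(const double scale)
{
  const double u = 2.0 * (std::rand() / (double) RAND_MAX) - 1.0;
  return (u < 0) ? (scale * std::log(1 + u)) : (-scale * std::log(1 - u));
}

// Metropolis acceptance: always take downhill moves (delta <= 0); take
// uphill moves with probability exp(-delta / T), which shrinks as the
// system cools.
bool metropolisAccept(const double delta,
                      const double temperature,
                      const double xi /* uniform on (0, 1) */)
{
  return (delta <= 0.0) || (std::exp(-delta / temperature) > xi);
}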
RcppMLPACK/src/mlpack/core/optimizers/sgd/0000755000176200001440000000000013647514343020030 5ustar  liggesusersRcppMLPACK/src/mlpack/core/optimizers/sgd/test_function.hpp0000644000176200001440000000360513647512751023432 0ustar  liggesusers/**
 * @file test_function.hpp
 * @author Ryan Curtin
 *
 * Very simple test function for SGD.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_SGD_TEST_FUNCTION_HPP
#define __MLPACK_CORE_OPTIMIZERS_SGD_TEST_FUNCTION_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace optimization {
namespace test {

//! Very, very simple test function which is the composite of three other
//! functions. It turns out that although this function is very simple,
//! optimizing it fully can take a very long time. It seems to take in excess
//! of 10 million iterations with a step size of 0.0005.
class SGDTestFunction
{
 public:
  //! Nothing to do for the constructor.
  SGDTestFunction() { }

  //! Return 3 (the number of functions).
  size_t NumFunctions() const { return 3; }

  //! Get the starting point.
  arma::mat GetInitialPoint() const { return arma::mat("6; -45.6; 6.2"); }

  //! Evaluate a function.
  double Evaluate(const arma::mat& coordinates, const size_t i) const;

  //! Evaluate the gradient of a function.
  void Gradient(const arma::mat& coordinates,
                const size_t i,
                arma::mat& gradient) const;
};

}; // namespace test
}; // namespace optimization
}; // namespace mlpack

#endif
RcppMLPACK/src/mlpack/core/optimizers/sgd/sgd.hpp0000644000176200001440000001374213647512751021326 0ustar  liggesusers/**
 * @file sgd.hpp
 * @author Ryan Curtin
 *
 * Stochastic Gradient Descent (SGD).
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_SGD_SGD_HPP
#define __MLPACK_CORE_OPTIMIZERS_SGD_SGD_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace optimization {

/**
 * Stochastic Gradient Descent is a technique for minimizing a function which
 * can be expressed as a sum of other functions. That is, suppose we have
 *
 * \f[
 * f(A) = \sum_{i = 0}^{n} f_i(A)
 * \f]
 *
 * and our task is to minimize \f$ f(A) \f$ with respect to \f$ A \f$.
 * Stochastic gradient descent iterates over each function \f$ f_i(A) \f$,
 * producing the following update scheme:
 *
 * \f[
 * A_{j + 1} = A_j - \alpha \nabla f_i(A)
 * \f]
 *
 * where \f$ \alpha \f$ is a parameter which specifies the step size. \f$ i \f$
 * is chosen according to \f$ j \f$ (the iteration number). The SGD class
 * supports either scanning through each of the \f$ n \f$ functions
 * \f$ f_i(A) \f$ linearly, or in a random sequence. The algorithm continues
 * until \f$ j \f$ reaches the maximum number of iterations -- or when a full
 * sequence of updates through each of the \f$ n \f$ functions \f$ f_i(A) \f$
 * produces an improvement within a certain tolerance \f$ \epsilon \f$. That
 * is,
 *
 * \f[
 * | f(A_{j + n}) - f(A_j) | < \epsilon.
 * \f]
 *
 * The parameter \f$\epsilon\f$ is specified by the tolerance parameter to the
 * constructor, and the maximum number of iterations by the maxIterations
 * parameter; \f$ n \f$ is given by the function's NumFunctions() method.
 *
 * This class is useful for data-dependent functions whose objective function
 * can be expressed as a sum of objective functions operating on an individual
 * point. Then, SGD considers the gradient of the objective function operating
 * on an individual point in its update of \f$ A \f$.
 *
 * For SGD to work, a DecomposableFunctionType template parameter is required.
 * This class must implement the following methods:
 *
 *   size_t NumFunctions();
 *   double Evaluate(const arma::mat& coordinates, const size_t i);
 *   void Gradient(const arma::mat& coordinates,
 *                 const size_t i,
 *                 arma::mat& gradient);
 *
 * NumFunctions() should return the number of functions (\f$n\f$), and in the
 * other two functions, the parameter i refers to which individual function (or
 * gradient) is being evaluated. So, for the case of a data-dependent function,
 * such as NCA (see mlpack::nca::NCA), NumFunctions() should return the number
 * of points in the dataset, and Evaluate(coordinates, 0) will evaluate the
 * objective function on the first point in the dataset (presumably, the
 * dataset is held internally in the DecomposableFunctionType).
 *
 * @tparam DecomposableFunctionType Decomposable objective function type to be
 *     minimized.
 */
template<typename DecomposableFunctionType>
class SGD
{
 public:
  /**
   * Construct the SGD optimizer with the given function and parameters.
   *
   * @param function Function to be optimized (minimized).
   * @param stepSize Step size for each iteration.
   * @param maxIterations Maximum number of iterations allowed (0 means no
   *     limit).
   * @param tolerance Maximum absolute tolerance to terminate algorithm.
   * @param shuffle If true, the function order is shuffled; otherwise, each
   *     function is visited in linear order.
   */
  SGD(DecomposableFunctionType& function,
      const double stepSize = 0.01,
      const size_t maxIterations = 100000,
      const double tolerance = 1e-5,
      const bool shuffle = true);

  /**
   * Optimize the given function using stochastic gradient descent. The given
   * starting point will be modified to store the finishing point of the
   * algorithm, and the final objective value is returned.
   *
   * @param iterate Starting point (will be modified).
   * @return Objective value of the final point.
   */
  double Optimize(arma::mat& iterate);

  //! Get the instantiated function to be optimized.
  const DecomposableFunctionType& Function() const { return function; }
  //! Modify the instantiated function.
  DecomposableFunctionType& Function() { return function; }

  //! Get the step size.
  double StepSize() const { return stepSize; }
  //! Modify the step size.
  double& StepSize() { return stepSize; }

  //! Get the maximum number of iterations (0 indicates no limit).
  size_t MaxIterations() const { return maxIterations; }
  //! Modify the maximum number of iterations (0 indicates no limit).
  size_t& MaxIterations() { return maxIterations; }

  //! Get the tolerance for termination.
  double Tolerance() const { return tolerance; }
  //! Modify the tolerance for termination.
  double& Tolerance() { return tolerance; }

  //! Get whether or not the individual functions are shuffled.
  bool Shuffle() const { return shuffle; }
  //! Modify whether or not the individual functions are shuffled.
  bool& Shuffle() { return shuffle; }

  //! Convert the object to a string.
  std::string ToString() const;

 private:
  //! The instantiated function.
  DecomposableFunctionType& function;

  //! The step size for each example.
  double stepSize;

  //! The maximum number of allowed iterations.
  size_t maxIterations;

  //! The tolerance for termination.
  double tolerance;

  //! Controls whether or not the individual functions are shuffled when
  //! iterating.
  bool shuffle;
};

}; // namespace optimization
}; // namespace mlpack

// Include implementation.
#include "sgd_impl.hpp"

#endif
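A decomposable function satisfying the interface documented above might look like the following sketch: pointwise squared error of a linear model, with one f_i per data column. The LeastSquaresFunction class is an illustrative assumption, not part of the library:

#include <mlpack/core.hpp>

// f_i(A) = (y_i - A^T x_i)^2, one term per data point; points are stored
// one per column, following the mlpack convention.
class LeastSquaresFunction
{
 public:
  LeastSquaresFunction(const arma::mat& data, const arma::vec& responses) :
      data(data), responses(responses) { }

  // Number of individual objective terms.
  size_t NumFunctions() const { return data.n_cols; }

  // Objective of the i'th term at the given coordinates.
  double Evaluate(const arma::mat& coordinates, const size_t i) const
  {
    const double residual = responses(i) - arma::dot(coordinates, data.col(i));
    return residual * residual;
  }

  // Gradient of the i'th term: -2 * residual * x_i.
  void Gradient(const arma::mat& coordinates,
                const size_t i,
                arma::mat& gradient) const
  {
    const double residual = responses(i) - arma::dot(coordinates, data.col(i));
    gradient = -2.0 * residual * data.col(i);
  }

 private:
  const arma::mat& data;
  const arma::vec& responses;
};

An instance of this class could then be handed to SGD<LeastSquaresFunction> exactly as the constructor above describes.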
RcppMLPACK/src/mlpack/core/optimizers/sgd/test_function.cpp0000644000176200001440000000353313647514343023424 0ustar  liggesusers/**
 * @file test_function.cpp
 * @author Ryan Curtin
 *
 * Implementation of very simple test function for stochastic gradient descent
 * (SGD).
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#include "test_function.hpp"

using namespace mlpack;
using namespace mlpack::optimization;
using namespace mlpack::optimization::test;

double SGDTestFunction::Evaluate(const arma::mat& coordinates, const size_t i)
    const
{
  switch (i)
  {
    case 0:
      return -std::exp(-std::abs(coordinates[0]));

    case 1:
      return std::pow(coordinates[1], 2);

    case 2:
      return std::pow(coordinates[2], 4) + 3 * std::pow(coordinates[2], 2);

    default:
      return 0;
  }
}

void SGDTestFunction::Gradient(const arma::mat& coordinates,
                               const size_t i,
                               arma::mat& gradient) const
{
  gradient.zeros(3);
  switch (i)
  {
    case 0:
      if (coordinates[0] >= 0)
        gradient[0] = std::exp(-coordinates[0]);
      else
        gradient[0] = -std::exp(coordinates[0]);
      break;

    case 1:
      gradient[1] = 2 * coordinates[1];
      break;

    case 2:
      gradient[2] = 4 * std::pow(coordinates[2], 3) + 6 * coordinates[2];
      break;
  }
}
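Putting the pieces together, SGDTestFunction can be optimized with the SGD class from sgd.hpp. A minimal sketch follows; the include paths assume this package's mlpack source layout, and since the optimizers here log through Rcpp::Rcout, it is meant to run inside an Rcpp-based package rather than as a standalone binary:

#include <mlpack/core.hpp>
#include <mlpack/core/optimizers/sgd/sgd.hpp>
#include <mlpack/core/optimizers/sgd/test_function.hpp>

using namespace mlpack::optimization;
using namespace mlpack::optimization::test;

void runSgdExample()
{
  SGDTestFunction f;

  // Small step size and a large iteration budget; per the comment in
  // test_function.hpp, full convergence may need on the order of 10 million
  // iterations at step size 0.0005.
  SGD<SGDTestFunction> s(f, 0.0005, 5000000, 1e-9, true);

  arma::mat coordinates = f.GetInitialPoint();
  const double result = s.Optimize(coordinates);
  // coordinates should approach the minimizer (0, 0, 0), where result
  // approaches the minimum objective value of -1.
}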
#include "sgd.hpp" namespace mlpack { namespace optimization { template SGD::SGD(DecomposableFunctionType& function, const double stepSize, const size_t maxIterations, const double tolerance, const bool shuffle) : function(function), stepSize(stepSize), maxIterations(maxIterations), tolerance(tolerance), shuffle(shuffle) { /* Nothing to do. */ } //! Optimize the function (minimize). template double SGD::Optimize(arma::mat& iterate) { // Find the number of functions to use. const size_t numFunctions = function.NumFunctions(); // This is used only if shuffle is true. arma::vec visitationOrder; if (shuffle) visitationOrder = arma::shuffle(arma::linspace(0, (numFunctions - 1), numFunctions)); // To keep track of where we are and how things are going. size_t currentFunction = 0; double overallObjective = 0; double lastObjective = DBL_MAX; // Calculate the first objective function. for (size_t i = 0; i < numFunctions; ++i) overallObjective += function.Evaluate(iterate, i); // Now iterate! arma::mat gradient(iterate.n_rows, iterate.n_cols); for (size_t i = 1; i != maxIterations; ++i, ++currentFunction) { // Is this iteration the start of a sequence? if ((currentFunction % numFunctions) == 0) { // Output current objective function. Rcpp::Rcout << "SGD: iteration " << i << ", objective " << overallObjective << "." << std::endl; if (overallObjective != overallObjective) { Rcpp::Rcout << "SGD: converged to " << overallObjective << "; terminating" << " with failure. Try a smaller step size?" << std::endl; return overallObjective; } if (std::abs(lastObjective - overallObjective) < tolerance) { Rcpp::Rcout << "SGD: minimized within tolerance " << tolerance << "; " << "terminating optimization." << std::endl; return overallObjective; } // Reset the counter variables. lastObjective = overallObjective; overallObjective = 0; currentFunction = 0; if (shuffle) // Determine order of visitation. visitationOrder = arma::shuffle(visitationOrder); } // Evaluate the gradient for this iteration. function.Gradient(iterate, currentFunction, gradient); // And update the iterate. iterate -= stepSize * gradient; // Now add that to the overall objective function. overallObjective += function.Evaluate(iterate, currentFunction); } Rcpp::Rcout << "SGD: maximum iterations (" << maxIterations << ") reached; " << "terminating optimization." << std::endl; return overallObjective; } // Convert the object to a string. template std::string SGD::ToString() const { std::ostringstream convert; convert << "SGD [" << this << "]" << std::endl; convert << " Function:" << std::endl; convert << util::Indent(function.ToString(), 2); convert << " Step size: " << stepSize << std::endl; convert << " Maximum iterations: " << maxIterations << std::endl; convert << " Tolerance: " << tolerance << std::endl; convert << " Shuffle points: " << (shuffle ? "true" : "false") << std::endl; return convert.str(); } }; // namespace optimization }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/optimizers/lrsdp/0000755000176200001440000000000013647514343020377 5ustar liggesusersRcppMLPACK/src/mlpack/core/optimizers/lrsdp/lrsdp_function.cpp0000644000176200001440000001246513647514343024144 0ustar liggesusers/** * @file lrsdp_function.cpp * @author Ryan Curtin * @author Abhishek Laddha * * Implementation of the LRSDPFunction class, and also template specializations * for faster execution with the AugLagrangian optimizer. * * This file is part of MLPACK 1.0.10. 
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#include "lrsdp_function.hpp"

using namespace mlpack;
using namespace mlpack::optimization;

LRSDPFunction::LRSDPFunction(const size_t numConstraints,
                             const arma::mat& initialPoint):
    a(numConstraints),
    b(numConstraints),
    initialPoint(initialPoint),
    aModes(numConstraints)
{ }

double LRSDPFunction::Evaluate(const arma::mat& coordinates) const
{
  return -accu(coordinates * trans(coordinates));
}

void LRSDPFunction::Gradient(const arma::mat& /* coordinates */,
                             arma::mat& /* gradient */) const
{
  Rcpp::Rcout << "LRSDP::Gradient() not implemented for arbitrary optimizers!"
      << std::endl;
}

double LRSDPFunction::EvaluateConstraint(const size_t index,
                                         const arma::mat& coordinates) const
{
  arma::mat rrt = coordinates * trans(coordinates);
  if (aModes[index] == 0)
    return trace(a[index] * rrt) - b[index];
  else
  {
    double value = -b[index];
    for (size_t i = 0; i < a[index].n_cols; ++i)
      value += a[index](2, i) * rrt(a[index](0, i), a[index](1, i));

    return value;
  }
}

void LRSDPFunction::GradientConstraint(const size_t /* index */,
                                       const arma::mat& /* coordinates */,
                                       arma::mat& /* gradient */) const
{
  Rcpp::Rcout << "LRSDP::GradientConstraint() not implemented for arbitrary "
      << "optimizers!" << std::endl;
}

// Return a string representation of the object.
std::string LRSDPFunction::ToString() const
{
  std::stringstream convert;
  convert << "LRSDPFunction [" << this << "]" << std::endl;
  convert << "  Number of constraints: " << a.size() << std::endl;
  convert << "  Constraint matrix (A_i) size: " << initialPoint.n_rows << "x"
      << initialPoint.n_cols << std::endl;
  convert << "  A_i modes: " << aModes.t();
  convert << "  Constraint b_i values: " << b.t();
  convert << "  Objective matrix (C) size: " << c.n_rows << "x" << c.n_cols
      << std::endl;
  return convert.str();
}

namespace mlpack {
namespace optimization {

// Template specializations for function and gradient evaluation.
template<>
double AugLagrangianFunction<LRSDPFunction>::Evaluate(
    const arma::mat& coordinates) const
{
  // We can calculate the entire objective in a smart way.
  // L(R, y, s) = Tr(C * (R R^T)) -
  //     sum_{i = 1}^{m} (y_i (Tr(A_i * (R R^T)) - b_i)) +
  //     (sigma / 2) * sum_{i = 1}^{m} (Tr(A_i * (R R^T)) - b_i)^2

  // Let's start with the objective: Tr(C * (R R^T)).
  // Simple, possibly slow solution.
  arma::mat rrt = coordinates * trans(coordinates);
  double objective = trace(function.C() * rrt);

  // Now each constraint.
  for (size_t i = 0; i < function.B().n_elem; ++i)
  {
    // Take the trace subtracted by the b_i.
    double constraint = -function.B()[i];

    if (function.AModes()[i] == 0)
    {
      constraint += trace(function.A()[i] * rrt);
    }
    else
    {
      for (size_t j = 0; j < function.A()[i].n_cols; ++j)
      {
        constraint += function.A()[i](2, j) *
            rrt(function.A()[i](0, j), function.A()[i](1, j));
      }
    }

    objective -= (lambda[i] * constraint);
    objective += (sigma / 2) * std::pow(constraint, 2.0);
  }

  return objective;
}

template<>
void AugLagrangianFunction<LRSDPFunction>::Gradient(
    const arma::mat& coordinates,
    arma::mat& gradient) const
{
  // We can calculate the gradient in a smart way.
  // L'(R, y, s) = 2 * S' * R
  //   with
  // S' = C - sum_{i = 1}^{m} y'_i A_i
  // y'_i = y_i - sigma * (Trace(A_i * (R R^T)) - b_i)
  arma::mat rrt = coordinates * trans(coordinates);
  arma::mat s = function.C();

  for (size_t i = 0; i < function.B().n_elem; ++i)
  {
    double constraint = -function.B()[i];

    if (function.AModes()[i] == 0)
    {
      constraint += trace(function.A()[i] * rrt);
    }
    else
    {
      for (size_t j = 0; j < function.A()[i].n_cols; ++j)
      {
        constraint += function.A()[i](2, j) *
            rrt(function.A()[i](0, j), function.A()[i](1, j));
      }
    }

    double y = lambda[i] - sigma * constraint;

    if (function.AModes()[i] == 0)
    {
      s -= (y * function.A()[i]);
    }
    else
    {
      // We only need to subtract the entries which could be modified.
      for (size_t j = 0; j < function.A()[i].n_cols; ++j)
      {
        s(function.A()[i](0, j), function.A()[i](1, j)) -= y;
      }
    }
  }

  gradient = 2 * s * coordinates;
}

}; // namespace optimization
}; // namespace mlpack
RcppMLPACK/src/mlpack/core/optimizers/lrsdp/lrsdp.cpp0000644000176200001440000000301413647514343022225 0ustar  liggesusers/**
 * @file lrsdp.cpp
 * @author Ryan Curtin
 *
 * An implementation of Monteiro and Burer's formulation of low-rank
 * semidefinite programs (LR-SDP).
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#include "lrsdp.hpp"

using namespace mlpack;
using namespace mlpack::optimization;
using namespace std;

LRSDP::LRSDP(const size_t numConstraints,
             const arma::mat& initialPoint) :
    function(numConstraints, initialPoint),
    augLag(function)
{ }

double LRSDP::Optimize(arma::mat& coordinates)
{
  augLag.Sigma() = 20;
  augLag.Optimize(coordinates, 1000);

  return augLag.Function().Evaluate(coordinates);
}

// Convert the object to a string.
std::string LRSDP::ToString() const
{
  std::ostringstream convert;
  convert << "LRSDP [" << this << "]" << std::endl;
  convert << "  Optimizer: " << util::Indent(augLag.ToString(), 1)
      << std::endl;
  return convert.str();
}
RcppMLPACK/src/mlpack/core/optimizers/lrsdp/lrsdp.hpp0000644000176200001440000001035013647512751022234 0ustar  liggesusers/**
 * @file lrsdp.hpp
 * @author Ryan Curtin
 *
 * An implementation of Monteiro and Burer's formulation of low-rank
 * semidefinite programs (LR-SDP).
 *
 * This file is part of MLPACK 1.0.10.
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_LRSDP_LRSDP_HPP
#define __MLPACK_CORE_OPTIMIZERS_LRSDP_LRSDP_HPP

#include <mlpack/core.hpp>
#include <mlpack/core/optimizers/aug_lagrangian/aug_lagrangian.hpp>

#include "lrsdp_function.hpp"

namespace mlpack {
namespace optimization {

/**
 * LRSDP is the implementation of Monteiro and Burer's formulation of low-rank
 * semidefinite programs (LR-SDP). This solver uses the augmented Lagrangian
 * optimizer to solve low-rank semidefinite programs.
 */
class LRSDP
{
 public:
  /**
   * Create an LRSDP to be optimized. The solution will end up being a matrix
   * of size (rank) x (rows). To construct each constraint and the objective
   * function, use the functions A(), B(), and C() to set them correctly.
   *
   * @param numConstraints Number of constraints in the problem.
   * @param initialPoint Initial point of the optimization; its dimensions
   *     determine the number of rows and the reduced rank (rank <= rows) of
   *     the solution.
   */
  LRSDP(const size_t numConstraints,
        const arma::mat& initialPoint);

  /**
   * Create an LRSDP to be optimized, passing in an already-created
   * AugLagrangian object. The given initial point should be set to the size
   * (rows) x (rank), where (rank) is the reduced rank of the problem.
   *
   * @param numConstraints Number of constraints in the problem.
   * @param initialPoint Initial point of the optimization.
   * @param augLagrangian Pre-initialized AugLagrangian object.
   */
  LRSDP(const size_t numConstraints,
        const arma::mat& initialPoint,
        AugLagrangian<LRSDPFunction>& augLagrangian);

  /**
   * Optimize the LRSDP and return the final objective value. The given
   * coordinates will be modified to contain the final solution.
   *
   * @param coordinates Starting coordinates for the optimization.
   */
  double Optimize(arma::mat& coordinates);

  //! Return the objective function matrix (C).
  const arma::mat& C() const { return function.C(); }
  //! Modify the objective function matrix (C).
  arma::mat& C() { return function.C(); }

  //! Return the vector of A matrices (which correspond to the constraints).
  const std::vector<arma::mat>& A() const { return function.A(); }
  //! Modify the vector of A matrices (which correspond to the constraints).
  std::vector<arma::mat>& A() { return function.A(); }

  //! Return the vector of modes for the A matrices.
  const arma::uvec& AModes() const { return function.AModes(); }
  //! Modify the vector of modes for the A matrices.
  arma::uvec& AModes() { return function.AModes(); }

  //! Return the vector of B values.
  const arma::vec& B() const { return function.B(); }
  //! Modify the vector of B values.
  arma::vec& B() { return function.B(); }

  //! Return the function to be optimized.
  const LRSDPFunction& Function() const { return function; }
  //! Modify the function to be optimized.
  LRSDPFunction& Function() { return function; }

  //! Return the augmented Lagrangian object.
  const AugLagrangian<LRSDPFunction>& AugLag() const { return augLag; }
  //! Modify the augmented Lagrangian object.
  AugLagrangian<LRSDPFunction>& AugLag() { return augLag; }

  //! Return a string representation of the object.
  std::string ToString() const;

 private:
  //! Function to optimize, which the AugLagrangian object holds.
  LRSDPFunction function;

  //! The AugLagrangian object which will be used for optimization.
  AugLagrangian<LRSDPFunction> augLag;
};

}; // namespace optimization
}; // namespace mlpack

#endif
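A rough sketch of driving this class: size the initial factor, fill in C(), A(), B(), and AModes(), then call Optimize(). The tiny problem shown here (minimize Tr(X) subject to Tr(X) = 1 with X = R R^T) is an illustrative assumption, and the initial-point shape follows the factorization used by LRSDPFunction::Evaluate() (coordinates of size rows x rank):

#include <mlpack/core.hpp>
#include <mlpack/core/optimizers/lrsdp/lrsdp.hpp>

using namespace mlpack::optimization;

void runLrsdpExample()
{
  const size_t n = 5;    // dimension of the SDP variable X
  const size_t rank = 2; // reduced rank of the factor R

  // Random starting factor R, so X = R R^T is n x n.
  arma::mat initial(n, rank, arma::fill::randu);
  LRSDP sdp(1, initial); // one constraint

  sdp.C() = arma::eye<arma::mat>(n, n);    // objective: Tr(X)
  sdp.A()[0] = arma::eye<arma::mat>(n, n); // constraint matrix A_0
  sdp.B()[0] = 1.0;                        // Tr(A_0 X) = 1
  sdp.AModes()[0] = 0;                     // 0 = dense constraint matrix

  arma::mat coordinates = initial;
  const double objective = sdp.Optimize(coordinates);
  // coordinates now holds the optimized factor R.
}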
RcppMLPACK/src/mlpack/core/optimizers/lrsdp/lrsdp_function.hpp0000644000176200001440000000724213647512751024147 0ustar  liggesusers/**
 * @file lrsdp_function.hpp
 * @author Ryan Curtin
 * @author Abhishek Laddha
 *
 * A class that represents the objective function which LRSDP optimizes.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_LRSDP_LRSDP_FUNCTION_HPP
#define __MLPACK_CORE_OPTIMIZERS_LRSDP_LRSDP_FUNCTION_HPP

#include <mlpack/core.hpp>
#include <mlpack/core/optimizers/aug_lagrangian/aug_lagrangian.hpp>

namespace mlpack {
namespace optimization {

/**
 * The objective function that LRSDP is trying to optimize.
 */
class LRSDPFunction
{
 public:
  /**
   * Construct the LRSDPFunction with the given initial point and number of
   * constraints. Set the A, B, and C matrices for each constraint using the
   * A(), B(), and C() functions.
   */
  LRSDPFunction(const size_t numConstraints, const arma::mat& initialPoint);

  /**
   * Evaluate the objective function of the LRSDP (no constraints) at the given
   * coordinates.
   */
  double Evaluate(const arma::mat& coordinates) const;

  /**
   * Evaluate the gradient of the LRSDP (no constraints) at the given
   * coordinates.
   */
  void Gradient(const arma::mat& coordinates, arma::mat& gradient) const;

  /**
   * Evaluate a particular constraint of the LRSDP at the given coordinates.
   */
  double EvaluateConstraint(const size_t index,
                            const arma::mat& coordinates) const;

  /**
   * Evaluate the gradient of a particular constraint of the LRSDP at the given
   * coordinates.
   */
  void GradientConstraint(const size_t index,
                          const arma::mat& coordinates,
                          arma::mat& gradient) const;

  //! Get the number of constraints in the LRSDP.
  size_t NumConstraints() const { return b.n_elem; }

  //! Get the initial point of the LRSDP.
  const arma::mat& GetInitialPoint() const { return initialPoint; }

  //! Return the objective function matrix (C).
  const arma::mat& C() const { return c; }
  //! Modify the objective function matrix (C).
  arma::mat& C() { return c; }

  //! Return the vector of A matrices (which correspond to the constraints).
  const std::vector<arma::mat>& A() const { return a; }
  //! Modify the vector of A matrices (which correspond to the constraints).
  std::vector<arma::mat>& A() { return a; }

  //! Return the vector of modes for the A matrices.
  const arma::uvec& AModes() const { return aModes; }
  //! Modify the vector of modes for the A matrices.
  arma::uvec& AModes() { return aModes; }

  //! Return the vector of B values.
  const arma::vec& B() const { return b; }
  //! Modify the vector of B values.
  arma::vec& B() { return b; }

  //! Return string representation of object.
  std::string ToString() const;

 private:
  //! Objective function matrix c.
  arma::mat c;

  //! A_i for each constraint.
  std::vector<arma::mat> a;
  //! b_i for each constraint.
  arma::vec b;

  //! Initial point.
arma::mat initialPoint; //! 1 if entries in matrix, 0 for normal. arma::uvec aModes; }; }; }; #endif // __MLPACK_CORE_OPTIMIZERS_LRSDP_LRSDP_FUNCTION_HPP RcppMLPACK/src/mlpack/core/arma_extend/0000755000176200001440000000000013647512751017336 5ustar liggesusersRcppMLPACK/src/mlpack/core/arma_extend/restrictors.hpp0000644000176200001440000000157113647512751022436 0ustar liggesusers// Modifications to allow u64/s64 in Armadillo when ARMA_64BIT_WORD is not // defined. Only required on Armadillo < 3.6.2. #if (ARMA_VERSION_MAJOR < 3) || \ ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR < 6)) || \ ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR == 6) && \ (ARMA_VERSION_PATCH < 2)) #ifndef ARMA_64BIT_WORD template<> struct arma_scalar_only { typedef u64 result; }; template<> struct arma_scalar_only { typedef s64 result; }; template<> struct arma_integral_only { typedef u64 result; }; template<> struct arma_integral_only { typedef s64 result; }; template<> struct arma_unsigned_integral_only { typedef u64 result; }; template<> struct arma_signed_integral_only { typedef s64 result; }; template<> struct arma_signed_only { typedef s64 result; }; #endif #endif RcppMLPACK/src/mlpack/core/arma_extend/glue_ccov_proto.hpp0000644000176200001440000000233113647512751023237 0ustar liggesusers// Copyright (C) 2010 NICTA and the authors listed below // http://nicta.com.au // // Authors: // - Conrad Sanderson (conradsand at ieee dot org) // - Dimitrios Bouzas (dimitris dot mpouzas at gmail dot com) // - Ryan Curtin (ryan at igglybob dot com) // // This file is part of the Armadillo C++ library. // It is provided without any warranty of fitness // for any purpose. You can redistribute this file // and/or modify it under the terms of the GNU // Lesser General Public License (LGPL) as published // by the Free Software Foundation, either version 3 // of the License or (at your option) any later version. // (see http://www.opensource.org/licenses for more info) //! \addtogroup glue_ccov //! @{ class glue_ccov { public: template inline static void direct_ccov(Mat& out, const Mat& A, const Mat& B, const uword norm_type); template inline static void direct_ccov(Mat< std::complex >& out, const Mat< std::complex >& A, const Mat< std::complex >& B, const uword norm_type); template inline static void apply(Mat& out, const Glue& X); }; //! @} RcppMLPACK/src/mlpack/core/arma_extend/SpMat_extra_bones.hpp0000644000176200001440000000107113647512751023463 0ustar liggesusers/** * @file SpMat_extra_bones.hpp * @author Ryan Curtin * * Add a batch constructor for SpMat, if the version is older than 3.810.0. */ #if ARMA_VERSION_MAJOR == 3 && ARMA_VERSION_MINOR < 810 template inline SpMat( const Base& locations, const Base& values, const bool sort_locations = true); template inline SpMat( const Base& locations, const Base& values, const uword n_rows, const uword n_cols, const bool sort_locations = true); #endif RcppMLPACK/src/mlpack/core/arma_extend/glue_ccov_meat.hpp0000644000176200001440000000710713647512751023030 0ustar liggesusers// Copyright (C) 2010 NICTA and the authors listed below // http://nicta.com.au // // Authors: // - Conrad Sanderson (conradsand at ieee dot org) // - Dimitrios Bouzas (dimitris dot mpouzas at gmail dot com) // - Ryan Curtin (ryan at igglybob dot com) // // This file is part of the Armadillo C++ library. // It is provided without any warranty of fitness // for any purpose. 
You can redistribute this file // and/or modify it under the terms of the GNU // Lesser General Public License (LGPL) as published // by the Free Software Foundation, either version 3 // of the License or (at your option) any later version. // (see http://www.opensource.org/licenses for more info) //! \addtogroup glue_cov //! @{ template inline void glue_ccov::direct_ccov(Mat& out, const Mat& A, const Mat& B, const uword norm_type) { arma_extra_debug_sigprint(); if(A.is_vec() && B.is_vec()) { arma_debug_check( (A.n_elem != B.n_elem), "ccov(): the number of elements in A and B must match" ); const eT* A_ptr = A.memptr(); const eT* B_ptr = B.memptr(); eT A_acc = eT(0); eT B_acc = eT(0); eT out_acc = eT(0); const uword N = A.n_elem; for(uword i=0; i 1) ? eT(N-1) : eT(1) ) : eT(N); out.set_size(1,1); out[0] = out_acc/norm_val; } else { arma_debug_assert_same_size(A, B, "ccov()"); const uword N = A.n_cols; const eT norm_val = (norm_type == 0) ? ( (N > 1) ? eT(N-1) : eT(1) ) : eT(N); out = A * trans(B); out -= (sum(A) * trans(sum(B))) / eT(N); out /= norm_val; } } template inline void glue_ccov::direct_ccov(Mat< std::complex >& out, const Mat< std::complex >& A, const Mat< std::complex >& B, const uword norm_type) { arma_extra_debug_sigprint(); typedef typename std::complex eT; if(A.is_vec() && B.is_vec()) { arma_debug_check( (A.n_elem != B.n_elem), "cov(): the number of elements in A and B must match" ); const eT* A_ptr = A.memptr(); const eT* B_ptr = B.memptr(); eT A_acc = eT(0); eT B_acc = eT(0); eT out_acc = eT(0); const uword N = A.n_elem; for(uword i=0; i 1) ? eT(N-1) : eT(1) ) : eT(N); out.set_size(1,1); out[0] = out_acc/norm_val; } else { arma_debug_assert_same_size(A, B, "ccov()"); const uword N = A.n_cols; const eT norm_val = (norm_type == 0) ? ( (N > 1) ? eT(N-1) : eT(1) ) : eT(N); out = A * trans(conj(B)); out -= (sum(A) * trans(conj(sum(B)))) / eT(N); out /= norm_val; } } template inline void glue_ccov::apply(Mat& out, const Glue& X) { arma_extra_debug_sigprint(); typedef typename T1::elem_type eT; const unwrap_check A_tmp(X.A, out); const unwrap_check B_tmp(X.B, out); const Mat& A = A_tmp.M; const Mat& B = B_tmp.M; const uword norm_type = X.aux_uword; if(&A != &B) { glue_ccov::direct_ccov(out, A, B, norm_type); } else { op_ccov::direct_ccov(out, A, norm_type); } } //! @} RcppMLPACK/src/mlpack/core/arma_extend/SpMat_extra_meat.hpp0000644000176200001440000002120313647512751023302 0ustar liggesusers/** * @file SpMat_extra_meat.hpp * @author Ryan Curtin * * Take the Armadillo batch sparse matrix constructor function from newer * Armadillo versions and port it to versions earlier than 3.810.0. */ #if ARMA_VERSION_MAJOR == 3 && ARMA_VERSION_MINOR < 810 //! Insert a large number of values at once. //! locations.row[0] should be row indices, locations.row[1] should be column indices, //! and values should be the corresponding values. //! If sort_locations is false, then it is assumed that the locations and values //! are already sorted in column-major ordering. 
template template inline SpMat::SpMat(const Base& locations_expr, const Base& vals_expr, const bool sort_locations) : n_rows(0) , n_cols(0) , n_elem(0) , n_nonzero(0) , vec_state(0) , values(NULL) , row_indices(NULL) , col_ptrs(NULL) { arma_extra_debug_sigprint_this(this); const unwrap locs_tmp( locations_expr.get_ref() ); const Mat& locs = locs_tmp.M; const unwrap vals_tmp( vals_expr.get_ref() ); const Mat& vals = vals_tmp.M; arma_debug_check( (vals.is_vec() == false), "SpMat::SpMat(): given 'values' object is not a vector" ); arma_debug_check((locs.n_cols != vals.n_elem), "SpMat::SpMat(): number of locations is different than number of values"); // If there are no elements in the list, max() will fail. if (locs.n_cols == 0) { init(0, 0); return; } arma_debug_check((locs.n_rows != 2), "SpMat::SpMat(): locations matrix must have two rows"); // Automatically determine size (and check if it's sorted). uvec bounds = arma::max(locs, 1); init(bounds[0] + 1, bounds[1] + 1); // Resize to correct number of elements. mem_resize(vals.n_elem); // Reset column pointers to zero. arrayops::inplace_set(access::rwp(col_ptrs), uword(0), n_cols + 1); bool actually_sorted = true; if(sort_locations == true) { // sort_index() uses std::sort() which may use quicksort... so we better // make sure it's not already sorted before taking an O(N^2) sort penalty. for (uword i = 1; i < locs.n_cols; ++i) { if ((locs.at(1, i) < locs.at(1, i - 1)) || (locs.at(1, i) == locs.at(1, i - 1) && locs.at(0, i) <= locs.at(0, i - 1))) { actually_sorted = false; break; } } if(actually_sorted == false) { // This may not be the fastest possible implementation but it maximizes code reuse. Col abslocs(locs.n_cols); for (uword i = 0; i < locs.n_cols; ++i) { abslocs[i] = locs.at(1, i) * n_rows + locs.at(0, i); } // Now we will sort with sort_index(). uvec sorted_indices = sort_index(abslocs); // Ascending sort. // Now we add the elements in this sorted order. for (uword i = 0; i < sorted_indices.n_elem; ++i) { arma_debug_check((locs.at(0, sorted_indices[i]) >= n_rows), "SpMat::SpMat(): invalid row index"); arma_debug_check((locs.at(1, sorted_indices[i]) >= n_cols), "SpMat::SpMat(): invalid column index"); access::rw(values[i]) = vals[sorted_indices[i]]; access::rw(row_indices[i]) = locs.at(0, sorted_indices[i]); access::rw(col_ptrs[locs.at(1, sorted_indices[i]) + 1])++; } } } if( (sort_locations == false) || (actually_sorted == true) ) { // Now set the values and row indices correctly. // Increment the column pointers in each column (so they are column "counts"). for (uword i = 0; i < vals.n_elem; ++i) { arma_debug_check((locs.at(0, i) >= n_rows), "SpMat::SpMat(): invalid row index"); arma_debug_check((locs.at(1, i) >= n_cols), "SpMat::SpMat(): invalid column index"); // Check ordering in debug mode. if(i > 0) { arma_debug_check ( ( (locs.at(1, i) < locs.at(1, i - 1)) || (locs.at(1, i) == locs.at(1, i - 1) && locs.at(0, i) < locs.at(0, i - 1)) ), "SpMat::SpMat(): out of order points; either pass sort_locations = true, or sort points in column-major ordering" ); arma_debug_check((locs.at(1, i) == locs.at(1, i - 1) && locs.at(0, i) == locs.at(0, i - 1)), "SpMat::SpMat(): two identical point locations in list"); } access::rw(values[i]) = vals[i]; access::rw(row_indices[i]) = locs.at(0, i); access::rw(col_ptrs[locs.at(1, i) + 1])++; } } // Now fix the column pointers. for (uword i = 0; i <= n_cols; ++i) { access::rw(col_ptrs[i + 1]) += col_ptrs[i]; } } //! Insert a large number of values at once. //! 
locations.row[0] should be row indices, locations.row[1] should be column indices, //! and values should be the corresponding values. //! If sort_locations is false, then it is assumed that the locations and values //! are already sorted in column-major ordering. //! In this constructor the size is explicitly given. template template inline SpMat::SpMat(const Base& locations_expr, const Base& vals_expr, const uword in_n_rows, const uword in_n_cols, const bool sort_locations) : n_rows(0) , n_cols(0) , n_elem(0) , n_nonzero(0) , vec_state(0) , values(NULL) , row_indices(NULL) , col_ptrs(NULL) { arma_extra_debug_sigprint_this(this); init(in_n_rows, in_n_cols); const unwrap locs_tmp( locations_expr.get_ref() ); const Mat& locs = locs_tmp.M; const unwrap vals_tmp( vals_expr.get_ref() ); const Mat& vals = vals_tmp.M; arma_debug_check( (vals.is_vec() == false), "SpMat::SpMat(): given 'values' object is not a vector" ); arma_debug_check((locs.n_rows != 2), "SpMat::SpMat(): locations matrix must have two rows"); arma_debug_check((locs.n_cols != vals.n_elem), "SpMat::SpMat(): number of locations is different than number of values"); // Resize to correct number of elements. mem_resize(vals.n_elem); // Reset column pointers to zero. arrayops::inplace_set(access::rwp(col_ptrs), uword(0), n_cols + 1); bool actually_sorted = true; if(sort_locations == true) { // sort_index() uses std::sort() which may use quicksort... so we better // make sure it's not already sorted before taking an O(N^2) sort penalty. for (uword i = 1; i < locs.n_cols; ++i) { if ((locs.at(1, i) < locs.at(1, i - 1)) || (locs.at(1, i) == locs.at(1, i - 1) && locs.at(0, i) <= locs.at(0, i - 1))) { actually_sorted = false; break; } } if(actually_sorted == false) { // This may not be the fastest possible implementation but it maximizes code reuse. Col abslocs(locs.n_cols); for (uword i = 0; i < locs.n_cols; ++i) { abslocs[i] = locs.at(1, i) * n_rows + locs.at(0, i); } // Now we will sort with sort_index(). uvec sorted_indices = sort_index(abslocs); // Ascending sort. // Now we add the elements in this sorted order. for (uword i = 0; i < sorted_indices.n_elem; ++i) { arma_debug_check((locs.at(0, sorted_indices[i]) >= n_rows), "SpMat::SpMat(): invalid row index"); arma_debug_check((locs.at(1, sorted_indices[i]) >= n_cols), "SpMat::SpMat(): invalid column index"); access::rw(values[i]) = vals[sorted_indices[i]]; access::rw(row_indices[i]) = locs.at(0, sorted_indices[i]); access::rw(col_ptrs[locs.at(1, sorted_indices[i]) + 1])++; } } } if( (sort_locations == false) || (actually_sorted == true) ) { // Now set the values and row indices correctly. // Increment the column pointers in each column (so they are column "counts"). for (uword i = 0; i < vals.n_elem; ++i) { arma_debug_check((locs.at(0, i) >= n_rows), "SpMat::SpMat(): invalid row index"); arma_debug_check((locs.at(1, i) >= n_cols), "SpMat::SpMat(): invalid column index"); // Check ordering in debug mode. if(i > 0) { arma_debug_check ( ( (locs.at(1, i) < locs.at(1, i - 1)) || (locs.at(1, i) == locs.at(1, i - 1) && locs.at(0, i) < locs.at(0, i - 1)) ), "SpMat::SpMat(): out of order points; either pass sort_locations = true or sort points in column-major ordering" ); arma_debug_check((locs.at(1, i) == locs.at(1, i - 1) && locs.at(0, i) == locs.at(0, i - 1)), "SpMat::SpMat(): two identical point locations in list"); } access::rw(values[i]) = vals[i]; access::rw(row_indices[i]) = locs.at(0, i); access::rw(col_ptrs[locs.at(1, i) + 1])++; } } // Now fix the column pointers. 
for (uword i = 0; i <= n_cols; ++i) { access::rw(col_ptrs[i + 1]) += col_ptrs[i]; } } #endif RcppMLPACK/src/mlpack/core/arma_extend/promote_type.hpp0000644000176200001440000000702413647512751022600 0ustar liggesusers// Extra promote_type definitions until 64-bit index support is added to // Armadillo. These aren't necessary on Armadillo > 3.6.1. #if (ARMA_VERSION_MAJOR < 3) || \ ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR < 6)) || \ ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR == 6) && \ (ARMA_VERSION_PATCH < 2)) #ifndef ARMA_64BIT_WORD template struct is_promotable, s64> : public is_promotable_ok { typedef std::complex result; }; template struct is_promotable, u64> : public is_promotable_ok { typedef std::complex result; }; template<> struct is_promotable : public is_promotable_ok { typedef double result; }; template<> struct is_promotable : public is_promotable_ok { typedef double result; }; template<> struct is_promotable : public is_promotable_ok { typedef float result; }; template<> struct is_promotable : public is_promotable_ok { typedef float result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; // float ? template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef u64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef u64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef u64 result; }; template struct is_promotable > : public is_promotable_ok { typedef std::complex result; }; template struct is_promotable > : public is_promotable_ok { typedef std::complex result; }; template<> struct is_promotable : public is_promotable_ok { typedef double result; }; template<> struct is_promotable : public is_promotable_ok { typedef double result; }; template<> struct is_promotable : public is_promotable_ok { typedef float result; }; template<> struct is_promotable : public is_promotable_ok { typedef float result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; // float ? template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; // float ? template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef u64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef u64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef u64 result; }; #endif #endif RcppMLPACK/src/mlpack/core/arma_extend/hdf5_misc.hpp0000644000176200001440000000107613647512751021714 0ustar liggesusers// To hack in u64/s64 support to Armadillo when it is not compiled with // ARMA_64BIT_WORD. 
namespace hdf5_misc { #if defined(ARMA_USE_HDF5) #if !(defined(ARMA_64BIT_WORD) || defined(ARMA_USE_U64S64)) #if defined(ULLONG_MAX) template<> inline hid_t get_hdf5_type< long long >() { return H5Tcopy(H5T_NATIVE_LLONG); } template<> inline hid_t get_hdf5_type< unsigned long long >() { return H5Tcopy(H5T_NATIVE_ULLONG); } #endif #endif #endif } // namespace hdf5_misc RcppMLPACK/src/mlpack/core/arma_extend/typedef.hpp0000644000176200001440000000202513647512751021506 0ustar liggesusers// Extensions to typedef u64 and s64 until that support is added into // Armadillo. We only need to typedef s64 on Armadillo > 1.2.0. This is not // necessary for Armadillo > 3.6.1. #if (ARMA_VERSION_MAJOR < 3) || \ ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR < 6)) || \ ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR == 6) && \ (ARMA_VERSION_PATCH < 2)) #ifndef ARMA_64BIT_WORD // We must typedef both u64 and s64. #if ULONG_MAX >= 0xffffffffffffffff typedef unsigned long u64; typedef long s64; #elif ULLONG_MAX >= 0xffffffffffffffff typedef unsigned long long u64; typedef long long s64; #else #error "don't know how to typedef 'u64' on this system" #endif namespace junk { struct arma_64_elem_size_test { arma_static_check( (sizeof(u64) != 8), ERROR___TYPE_U64_HAS_UNSUPPORTED_SIZE ); arma_static_check( (sizeof(s64) != 8), ERROR___TYPE_S64_HAS_UNSUPPORTED_SIZE ); }; } #endif #endif RcppMLPACK/src/mlpack/core/arma_extend/fn_ccov.hpp0000644000176200001440000000256513647512751021474 0ustar liggesusers// Copyright (C) 2010 NICTA and the authors listed below // http://nicta.com.au // // Authors: // - Conrad Sanderson (conradsand at ieee dot org) // - Dimitrios Bouzas (dimitris dot mpouzas at gmail dot com) // - Ryan Curtin (ryan at igglybob dot com) // // This file is part of the Armadillo C++ library. // It is provided without any warranty of fitness // for any purpose. You can redistribute this file // and/or modify it under the terms of the GNU // Lesser General Public License (LGPL) as published // by the Free Software Foundation, either version 3 // of the License or (at your option) any later version. // (see http://www.opensource.org/licenses for more info) //! \addtogroup fn_ccov //! @{ template inline const Op ccov(const Base& X, const uword norm_type = 0) { arma_extra_debug_sigprint(); arma_debug_check( (norm_type > 1), "ccov(): norm_type must be 0 or 1"); return Op(X.get_ref(), norm_type, 0); } template inline const Glue cov(const Base& A, const Base& B, const uword norm_type = 0) { arma_extra_debug_sigprint(); arma_debug_check( (norm_type > 1), "ccov(): norm_type must be 0 or 1"); return Glue(A.get_ref(), B.get_ref(), norm_type); } //! @} RcppMLPACK/src/mlpack/core/arma_extend/fn_inplace_reshape.hpp0000644000176200001440000000251413647512751023656 0ustar liggesusers// Copyright (C) 2010 NICTA and the authors listed below // http://nicta.com.au // // Authors: // - Ryan Curtin (ryan at igglybob dot com) // // This file is part of the Armadillo C++ library. // It is provided without any warranty of fitness // for any purpose. You can redistribute this file // and/or modify it under the terms of the GNU // Lesser General Public License (LGPL) as published // by the Free Software Foundation, either version 3 // of the License or (at your option) any later version. // (see http://www.opensource.org/licenses for more info) //! \addtogroup fn_inplace_reshape //! @{ /** * This does not handle column vectors or row vectors entirely correctly. 
You * should be able to do multiplication or other basic operations with the * resulting matrix, but it may have other problems. So if you are using this * on vectors (arma::Col<> or arma::Row<>), be careful, and be warned that * bizarre behavior may occur. */ template inline Mat& inplace_reshape(Mat& X, const uword new_n_rows, const uword new_n_cols) { arma_extra_debug_sigprint(); arma_debug_check((new_n_rows * new_n_cols) != X.n_elem, "inplace_reshape(): cannot add or remove elements"); access::rw(X.n_rows) = new_n_rows; access::rw(X.n_cols) = new_n_cols; return X; } //! @} RcppMLPACK/src/mlpack/core/arma_extend/op_ccov_proto.hpp0000644000176200001440000000212613647512751022723 0ustar liggesusers// Copyright (C) 2010 NICTA and the authors listed below // http://nicta.com.au // // Authors: // - Conrad Sanderson (conradsand at ieee dot org) // - Dimitrios Bouzas (dimitris dot mpouzas at gmail dot com) // // This file is part of the Armadillo C++ library. // It is provided without any warranty of fitness // for any purpose. You can redistribute this file // and/or modify it under the terms of the GNU // Lesser General Public License (LGPL) as published // by the Free Software Foundation, either version 3 // of the License or (at your option) any later version. // (see http://www.opensource.org/licenses for more info) //! \addtogroup op_cov //! @{ class op_ccov { public: template inline static void direct_ccov(Mat& out, const Mat& X, const uword norm_type); template inline static void direct_ccov(Mat< std::complex >& out, const Mat< std::complex >& X, const uword norm_type); template inline static void apply(Mat& out, const Op& in); }; //! @} RcppMLPACK/src/mlpack/core/arma_extend/arma_extend.hpp0000644000176200001440000000216313647512751022340 0ustar liggesusers/*** * @file arma_extend.hpp * @author Ryan Curtin * * Include Armadillo extensions which currently are not part of the main * Armadillo codebase. * * This will allow the use of the ccov() function (which performs the same * function as cov(trans(X)) but without the cost of computing trans(X)). This * also gives sparse matrix support, if it is necessary. */ #ifndef __MLPACK_CORE_ARMA_EXTEND_ARMA_EXTEND_HPP #define __MLPACK_CORE_ARMA_EXTEND_ARMA_EXTEND_HPP // Add batch constructor for sparse matrix (if version <= 3.810.0). #define ARMA_EXTRA_SPMAT_PROTO mlpack/core/arma_extend/SpMat_extra_bones.hpp #define ARMA_EXTRA_SPMAT_MEAT mlpack/core/arma_extend/SpMat_extra_meat.hpp #include #define NDEBUG 1 namespace arma { // u64/s64 #include "typedef.hpp" #include "traits.hpp" #include "promote_type.hpp" #include "restrictors.hpp" #include "hdf5_misc.hpp" // ccov() #include "op_ccov_proto.hpp" #include "op_ccov_meat.hpp" #include "glue_ccov_proto.hpp" #include "glue_ccov_meat.hpp" #include "fn_ccov.hpp" // inplace_reshape() #include "fn_inplace_reshape.hpp" }; #endif RcppMLPACK/src/mlpack/core/arma_extend/traits.hpp0000644000176200001440000000236013647512751021356 0ustar liggesusers// Extra traits to support u64 and s64 (or, specifically, unsigned long and // long) until that is applied to the Armadillo sources. // This isn't necessary if Armadillo was compiled with 64-bit support, or if // ARMA_USE_U64S64 is enabled, or if Armadillo >= 3.6.2 is used (by default // Armadillo 3.6.2 allows long types). 
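// For example, once these traits are visible, an arma::Col<s64> (where s64
// is the typedef from typedef.hpp above) passes Armadillo's
// is_supported_elem_type check on such older builds; this is an
// illustrative note, not part of the upstream sources.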
#if (ARMA_VERSION_MAJOR < 3) || \ ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR < 6)) || \ ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR == 6) && \ (ARMA_VERSION_PATCH < 2)) #if !(defined(ARMA_64BIT_WORD) || defined(ARMA_USE_U64S64)) template struct is_u64 { static const bool value = false; }; template<> struct is_u64 { static const bool value = true; }; template struct is_s64 { static const bool value = false; }; template<> struct is_s64 { static const bool value = true; }; template<> struct is_supported_elem_type { static const bool value = true; }; template<> struct is_supported_elem_type { static const bool value = true; }; template<> struct is_signed { static const bool value = false; }; #endif #endif RcppMLPACK/src/mlpack/core/arma_extend/op_ccov_meat.hpp0000644000176200001440000000453513647512751022514 0ustar liggesusers// Copyright (C) 2010 NICTA and the authors listed below // http://nicta.com.au // // Authors: // - Conrad Sanderson (conradsand at ieee dot org) // - Dimitrios Bouzas (dimitris dot mpouzas at gmail dot com) // // This file is part of the Armadillo C++ library. // It is provided without any warranty of fitness // for any purpose. You can redistribute this file // and/or modify it under the terms of the GNU // Lesser General Public License (LGPL) as published // by the Free Software Foundation, either version 3 // of the License or (at your option) any later version. // (see http://www.opensource.org/licenses for more info) //! \addtogroup op_cov //! @{ template inline void op_ccov::direct_ccov(Mat& out, const Mat& A, const uword norm_type) { arma_extra_debug_sigprint(); if(A.is_vec()) { if(A.n_rows == 1) { out = var(trans(A), norm_type); } else { out = var(A, norm_type); } } else { const uword N = A.n_cols; const eT norm_val = (norm_type == 0) ? ( (N > 1) ? eT(N-1) : eT(1) ) : eT(N); const Col acc = sum(A, 1); out = A * trans(A); out -= (acc * trans(acc)) / eT(N); out /= norm_val; } } template inline void op_ccov::direct_ccov(Mat< std::complex >& out, const Mat< std::complex >& A, const uword norm_type) { arma_extra_debug_sigprint(); typedef typename std::complex eT; if(A.is_vec()) { if(A.n_rows == 1) { const Mat tmp_mat = var(trans(A), norm_type); out.set_size(1,1); out[0] = tmp_mat[0]; } else { const Mat tmp_mat = var(A, norm_type); out.set_size(1,1); out[0] = tmp_mat[0]; } } else { const uword N = A.n_cols; const eT norm_val = (norm_type == 0) ? ( (N > 1) ? eT(N-1) : eT(1) ) : eT(N); const Col acc = sum(A, 1); out = A * trans(conj(A)); out -= (acc * trans(conj(acc))) / eT(N); out /= norm_val; } } template inline void op_ccov::apply(Mat& out, const Op& in) { arma_extra_debug_sigprint(); typedef typename T1::elem_type eT; const unwrap_check tmp(in.m, out); const Mat& A = tmp.M; const uword norm_type = in.aux_uword_a; op_ccov::direct_ccov(out, A, norm_type); } //! @} RcppMLPACK/src/mlpack/core/util/0000755000176200001440000000000013647514343016023 5ustar liggesusersRcppMLPACK/src/mlpack/core/util/timers.hpp0000644000176200001440000001124713647512751020045 0ustar liggesusers/** * @file timers.hpp * @author Matthew Amidon * @author Marcus Edel * * Timers for MLPACK. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_UTILITIES_TIMERS_HPP #define __MLPACK_CORE_UTILITIES_TIMERS_HPP #include #include #if defined(__unix__) || defined(__unix) #include // clock_gettime() #include // timeval, gettimeofday() #include // flags like _POSIX_VERSION #elif defined(__MACH__) && defined(__APPLE__) #include // mach_timebase_info, // mach_absolute_time() // TEMPORARY #include // clock_gettime() #include // timeval, gettimeofday() #include // flags like _POSIX_VERSION #elif defined(_WIN32) #include //GetSystemTimeAsFileTime(), // QueryPerformanceFrequency(), // QueryPerformanceCounter() #include //timeval on windows // uint64_t isn't defined on every windows. #if !defined(HAVE_UINT64_T) #if SIZEOF_UNSIGNED_LONG == 8 typedef unsigned long uint64_t; #else typedef unsigned long long uint64_t; #endif // SIZEOF_UNSIGNED_LONG #endif // HAVE_UINT64_T //gettimeofday has no equivalent will need to write extra code for that. #if defined(_MSC_VER) || defined(_MSC_EXTENSIONS) #define DELTA_EPOCH_IN_MICROSECS 11644473600000000Ui64 #else #define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL #endif // _MSC_VER, _MSC_EXTENSIONS #else #error "unknown OS" #endif namespace mlpack { /** * The timer class provides a way for MLPACK methods to be timed. The three * methods contained in this class allow a named timer to be started and * stopped, and its value to be obtained. */ class Timer { public: /** * Start the given timer. If a timer is started, then stopped, then * re-started, then re-stopped, the final value of the timer is the length of * both runs -- that is, MLPACK timers are additive for each time they are * run, and do not reset. * * @note Undefined behavior will occur if a timer is started twice. * * @param name Name of timer to be started. */ static void Start(const std::string& name); /** * Stop the given timer. * * @note Undefined behavior will occur if a timer is started twice. * * @param name Name of timer to be stopped. */ static void Stop(const std::string& name); /** * Get the value of the given timer. * * @param name Name of timer to return value of. */ static timeval Get(const std::string& name); }; class Timers { public: //! Nothing to do for the constructor. Timers() { } /** * Returns a copy of all the timers used via this interface. */ std::map& GetAllTimers(); /** * Returns a copy of the timer specified. * * @param timerName The name of the timer in question. */ timeval GetTimer(const std::string& timerName); /** * Prints the specified timer. If it took longer than a minute to complete * the timer will be displayed in days, hours, and minutes as well. * * @param timerName The name of the timer in question. */ void PrintTimer(const std::string& timerName); /**  * Initializes a timer, available like a normal value specified on  * the command line.  Timers are of type timeval. If a timer is started, then * stopped, then re-started, then stopped, the final timer value will be the * length of both runs of the timer.  *  * @param timerName The name of the timer in question.  
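   *
   * A minimal usage sketch (assuming a Timers instance named timers; the
   * calls below are the methods declared in this class):
   * @code
   * timers.StartTimer("tree_building");
   * // ... the work to be timed ...
   * timers.StopTimer("tree_building");
   * timers.PrintTimer("tree_building");
   * @endcode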
   */
  void StartTimer(const std::string& timerName);

  /**
   * Halts the timer, and replaces its value with
   * the delta time from its start.
   *
   * @param timerName The name of the timer in question.
   */
  void StopTimer(const std::string& timerName);

 private:
  std::map<std::string, timeval> timers;

  void FileTimeToTimeVal(timeval* tv);
  void GetTime(timeval* tv);
};

}; // namespace mlpack

#endif // __MLPACK_CORE_UTILITIES_TIMERS_HPP
RcppMLPACK/src/mlpack/core/util/string_util.cpp0000644000176200001440000000361513647514343021077 0ustar liggesusers/**
 * @file string_util.cpp
 * @author Trironk Kiatkungwanglai
 * @author Ryan Birmingham
 *
 * Defines methods useful for formatting output.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#include "string_util.hpp"

using namespace mlpack;
using namespace mlpack::util;
using namespace std;

//! A utility function that replaces all newlines with a number of spaces
//! depending on the indentation level.
string mlpack::util::Indent(string input, const size_t howManyTabs)
{
  // For each declared...
  string standardTab = "  ";
  string bigTab = "";
  for (size_t ind = 0; ind < howManyTabs; ind++)
  {
    // Increase amount tabbed on later lines.
    bigTab += standardTab;
    // Add indentation to first line.
    input.insert(0, 1, ' ');
    input.insert(0, 1, ' ');
  }

  // Create the character sequence to replace all newline characters.
  std::string tabbedNewline("\n" + bigTab);

  // Replace all newline characters with the precomputed character sequence.
  size_t startPos = 0;
  while ((startPos = input.find("\n", startPos)) != string::npos)
  {
    // Don't replace the last newline.
    if (startPos == input.length() - 1)
      break;

    input.replace(startPos, 1, tabbedNewline);
    startPos += tabbedNewline.length();
  }

  return input;
}
RcppMLPACK/src/mlpack/core/util/prefixedoutstream.cpp0000644000176200001440000000633113647514343022304 0ustar liggesusers/**
 * @file prefixedoutstream.cpp
 * @author Ryan Curtin
 * @author Matthew Amidon
 *
 * Implementation of PrefixedOutStream methods.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#include <string>
#include <iostream>
#include <streambuf>
#include <string.h>
#include <stdlib.h>

#include "prefixedoutstream.hpp"

using namespace mlpack::util;

/**
 * These are all necessary because gcc's template mechanism does not seem smart
 * enough to figure out what I want to pass into operator<< without these.
 * That may not be the actual case, but it works when these are here.
 */
PrefixedOutStream& PrefixedOutStream::operator<<(bool val)
{ BaseLogic(val); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(short val)
{ BaseLogic(val); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(unsigned short val)
{ BaseLogic(val); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(int val)
{ BaseLogic(val); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(unsigned int val)
{ BaseLogic(val); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(long val)
{ BaseLogic(val); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(unsigned long val)
{ BaseLogic(val); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(float val)
{ BaseLogic(val); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(double val)
{ BaseLogic(val); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(long double val)
{ BaseLogic(val); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(void* val)
{ BaseLogic(val); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(const char* str)
{ BaseLogic(str); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(std::string& str)
{ BaseLogic(str); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(std::streambuf* sb)
{ BaseLogic(sb); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(
    std::ostream& (*pf)(std::ostream&))
{ BaseLogic(pf); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(std::ios& (*pf)(std::ios&))
{ BaseLogic(pf); return *this; }

PrefixedOutStream& PrefixedOutStream::operator<<(
    std::ios_base& (*pf) (std::ios_base&))
{ BaseLogic(pf); return *this; }
RcppMLPACK/src/mlpack/core/util/arma_traits.hpp0000644000176200001440000000600313647512751021042 0ustar liggesusers/**
 * @file arma_traits.hpp
 * @author Ryan Curtin
 *
 * Some traits used for template metaprogramming (SFINAE) with Armadillo types.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_UTIL_ARMA_TRAITS_HPP
#define __MLPACK_CORE_UTIL_ARMA_TRAITS_HPP

// Structs have public members by default (that's why they are chosen over
// classes).

/**
 * If value == true, then VecType is some sort of Armadillo vector or subview.
 * You might use this struct like this:
 *
 * @code
 * // Only accepts VecTypes that are actually Armadillo vector types.
 * template<typename VecType>
 * void Function(const VecType& argumentA,
 *               typename boost::enable_if<IsVector<VecType> >* = 0);
 * @endcode
 *
 * The use of the enable_if object allows the compiler to instantiate
 * Function() only if VecType is one of the Armadillo vector types.  It has a
 * default argument because it isn't meant to be used in either the function
 * call or the function body.
*/ template struct IsVector { const static bool value = false; }; // Commenting out the first template per case, because //Visual Studio doesn't like this instantiaion pattern (error C2910). //template<> template struct IsVector > { const static bool value = true; }; //template<> template struct IsVector > { const static bool value = true; }; //template<> template struct IsVector > { const static bool value = true; }; //template<> template struct IsVector > { const static bool value = true; }; //template<> template struct IsVector > { const static bool value = true; }; //template<> template struct IsVector > { const static bool value = true; }; #if ((ARMA_VERSION_MAJOR >= 10) || \ ((ARMA_VERSION_MAJOR == 9) && (ARMA_VERSION_MINOR >= 869))) // Armadillo 9.869+ has SpSubview_col and SpSubview_row template struct IsVector > { const static bool value = true; }; template struct IsVector > { const static bool value = true; }; #else // fallback for older Armadillo versions template struct IsVector > { const static bool value = true; }; #endif #endif RcppMLPACK/src/mlpack/core/util/option.cpp0000644000176200001440000000326213647514343020042 0ustar liggesusers/** * @file option.cpp * @author Ryan Curtin * * Implementation of the ProgramDoc class. The class registers itself with CLI * when constructed. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "cli.hpp" #include "option.hpp" #include using namespace mlpack; using namespace mlpack::util; using namespace std; /** * Construct a ProgramDoc object. When constructed, it will register itself * with CLI. A fatal error will be thrown if more than one is constructed. * * @param programName Short string representing the name of the program. * @param documentation Long string containing documentation on how to use the * program and what it is. No newline characters are necessary; this is * taken care of by CLI later. * @param defaultModule Name of the default module. */ ProgramDoc::ProgramDoc(const std::string& programName, const std::string& documentation) : programName(programName), documentation(documentation) { // Register this with CLI. CLI::RegisterProgramDoc(this); } RcppMLPACK/src/mlpack/core/util/prefixedoutstream_impl.hpp0000644000176200001440000001062513647512751023334 0ustar liggesusers/** * @file prefixedoutstream.hpp * @author Ryan Curtin * @author Matthew Amidon * * Implementation of templated PrefixedOutStream member functions. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. 
See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_UTIL_PREFIXEDOUTSTREAM_IMPL_HPP #define __MLPACK_CORE_UTIL_PREFIXEDOUTSTREAM_IMPL_HPP // Just in case it hasn't been included. #include "prefixedoutstream.hpp" #include namespace mlpack { namespace util { template PrefixedOutStream& PrefixedOutStream::operator<<(const T& s) { CallBaseLogic(s); return *this; } //! This handles forwarding all primitive types transparently template void PrefixedOutStream::CallBaseLogic(const T& s, typename boost::disable_if< boost::is_class >::type*) { BaseLogic(s); } // Forward all objects that do not implement a ToString() method transparently template void PrefixedOutStream::CallBaseLogic(const T& s, typename boost::enable_if< boost::is_class >::type*, typename boost::disable_if< HasToString >::type*) { BaseLogic(s); } // Call ToString() on all objects that implement ToString() before forwarding template void PrefixedOutStream::CallBaseLogic(const T& s, typename boost::enable_if< boost::is_class >::type*, typename boost::enable_if< HasToString >::type*) { std::string result = s.ToString(); BaseLogic(result); } template void PrefixedOutStream::BaseLogic(const T& val) { // We will use this to track whether or not we need to terminate at the end of // this call (only for streams which terminate after a newline). bool newlined = false; std::string line; // If we need to, output the prefix. PrefixIfNeeded(); std::ostringstream convert; convert << val; if(convert.fail()) { PrefixIfNeeded(); if (!ignoreInput) { destination << "Failed lexical_cast(T) for output; output" " not shown." << std::endl; newlined = true; } } else { line = convert.str(); // If the length of the casted thing was 0, it may have been a stream // manipulator, so send it directly to the stream and don't ask questions. if (line.length() == 0) { // The prefix cannot be necessary at this point. if (!ignoreInput) // Only if the user wants it. destination << val; return; } // Now, we need to check for newlines in this line. If we find one, output // up until the newline, then output the newline and the prefix and continue // looking. size_t nl; size_t pos = 0; while ((nl = line.find('\n', pos)) != std::string::npos) { PrefixIfNeeded(); // Only output if the user wants it. if (!ignoreInput) { destination << line.substr(pos, nl - pos); destination << std::endl; newlined = true; } carriageReturned = true; // Regardless of whether or not we display it. pos = nl + 1; } if (pos != line.length()) // We need to display the rest. { PrefixIfNeeded(); if (!ignoreInput) destination << line.substr(pos); } } // If we displayed a newline and we need to terminate afterwards, do that. if (fatal && newlined) exit(1); } // This is an inline function (that is why it is here and not in .cc). void PrefixedOutStream::PrefixIfNeeded() { // If we need to, output a prefix. if (carriageReturned) { if (!ignoreInput) // But only if we are allowed to. destination << prefix; carriageReturned = false; // Denote that the prefix has been displayed. } } }; // namespace util }; // namespace mlpack #endif // MLPACK_CLI_PREFIXEDOUTSTREAM_IMPL_H RcppMLPACK/src/mlpack/core/util/prefixedoutstream.hpp0000644000176200001440000001423713647512751022316 0ustar liggesusers/** * @file prefixedoutstream.hpp * @author Ryan Curtin * @author Matthew Amidon * * Declaration of the PrefixedOutStream class. * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_UTIL_PREFIXEDOUTSTREAM_HPP #define __MLPACK_CORE_UTIL_PREFIXEDOUTSTREAM_HPP #include #include #include #include #include #include #include #include #include namespace mlpack { namespace util { /** * Allows us to output to an ostream with a prefix at the beginning of each * line, in the same way we would output to cout or cerr. The prefix is * specified in the constructor (as well as the destination ostream). A newline * must be passed to the stream, and then the prefix will be prepended to the * next line. For example, * * @code * PrefixedOutStream outstr(std::cout, "[TEST] "); * outstr << "Hello world I like " << 7.5; * outstr << "...Continue" << std::endl; * outstr << "After the CR\n" << std::endl; * @endcode * * would give, on std::cout, * * @code * [TEST] Hello world I like 7.5...Continue * [TEST] After the CR * [TEST] * @endcode * * These objects are used for the mlpack::Log levels (DEBUG, INFO, WARN, and * FATAL). */ class PrefixedOutStream { public: /** * Set up the PrefixedOutStream. * * @param destination ostream which receives output from this object. * @param prefix The prefix to prepend to each line. */ PrefixedOutStream(std::ostream& destination, const char* prefix, bool ignoreInput = false, bool fatal = false) : destination(destination), ignoreInput(ignoreInput), prefix(prefix), // We want the first call to operator<< to prefix the prefix so we set // carriageReturned to true. carriageReturned(true), fatal(fatal) { /* nothing to do */ } //! Write a bool to the stream. PrefixedOutStream& operator<<(bool val); //! Write a short to the stream. PrefixedOutStream& operator<<(short val); //! Write an unsigned short to the stream. PrefixedOutStream& operator<<(unsigned short val); //! Write an int to the stream. PrefixedOutStream& operator<<(int val); //! Write an unsigned int to the stream. PrefixedOutStream& operator<<(unsigned int val); //! Write a long to the stream. PrefixedOutStream& operator<<(long val); //! Write an unsigned long to the stream. PrefixedOutStream& operator<<(unsigned long val); //! Write a float to the stream. PrefixedOutStream& operator<<(float val); //! Write a double to the stream. PrefixedOutStream& operator<<(double val); //! Write a long double to the stream. PrefixedOutStream& operator<<(long double val); //! Write a void pointer to the stream. PrefixedOutStream& operator<<(void* val); //! Write a character array to the stream. PrefixedOutStream& operator<<(const char* str); //! Write a string to the stream. PrefixedOutStream& operator<<(std::string& str); //! Write a streambuf to the stream. PrefixedOutStream& operator<<(std::streambuf* sb); //! Write an ostream manipulator function to the stream. PrefixedOutStream& operator<<(std::ostream& (*pf)(std::ostream&)); //! Write an ios manipulator function to the stream. PrefixedOutStream& operator<<(std::ios& (*pf)(std::ios&)); //! 
Write an ios_base manipulator function to the stream. PrefixedOutStream& operator<<(std::ios_base& (*pf)(std::ios_base&)); //! Write anything else to the stream. template PrefixedOutStream& operator<<(const T& s); //! The output stream that all data is to be sent too; example: std::cout. std::ostream& destination; //! Discards input, prints nothing if true. bool ignoreInput; private: HAS_MEM_FUNC(ToString, HasToString) //! This handles forwarding all primitive types transparently template void CallBaseLogic(const T& s, typename boost::disable_if< boost::is_class >::type* = 0); //! Forward all objects that do not implement a ToString() method template void CallBaseLogic(const T& s, typename boost::enable_if< boost::is_class >::type* = 0, typename boost::disable_if< HasToString >::type* = 0); //! Call ToString() on all objects that implement ToString() before forwarding template void CallBaseLogic(const T& s, typename boost::enable_if< boost::is_class >::type* = 0, typename boost::enable_if< HasToString >::type* = 0); /** * @brief Conducts the base logic required in all the operator << overloads. * Mostly just a good idea to reduce copy-pasta. * * @tparam T The type of the data to output. * @param val The The data to be output. */ template void BaseLogic(const T& val); /** * Output the prefix, but only if we need to and if we are allowed to. */ inline void PrefixIfNeeded(); //! Contains the prefix we must prepend to each line. std::string prefix; //! If true, the previous call to operator<< encountered a CR, and a prefix //! will be necessary. bool carriageReturned; //! If true, the application will terminate with an error code when a CR is //! encountered. bool fatal; }; }; // namespace util }; // namespace mlpack // Template definitions. #include "prefixedoutstream_impl.hpp" #endif RcppMLPACK/src/mlpack/core/util/option.hpp0000644000176200001440000000774713647512751020064 0ustar liggesusers/** * @file option.hpp * @author Matthew Amidon * * Definition of the Option class, which is used to define parameters which are * used by CLI. The ProgramDoc class also resides here. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_UTIL_OPTION_HPP #define __MLPACK_CORE_UTIL_OPTION_HPP #include #include "cli.hpp" namespace mlpack { namespace util { /** * A static object whose constructor registers a parameter with the CLI class. * This should not be used outside of CLI itself, and you should use the * PARAM_FLAG(), PARAM_DOUBLE(), PARAM_INT(), PARAM_STRING(), or other similar * macros to declare these objects instead of declaring them directly. * * @see core/io/cli.hpp, mlpack::CLI */ template class Option { public: /** * Construct an Option object. When constructed, it will register * itself with CLI. * * @param ignoreTemplate Whether or not the template type matters for this * option. Essentially differs options with no value (flags) from those * that do, and thus require a type. 
* @param defaultValue Default value this parameter will be initialized to. * @param identifier The name of the option (no dashes in front; for --help, * we would pass "help"). * @param description A short string describing the option. * @param parent Full pathname of the parent module that "owns" this option. * The default is the root node (an empty string). * @param required Whether or not the option is required at runtime. */ Option(bool ignoreTemplate, N defaultValue, const std::string& identifier, const std::string& description, const std::string& parent = std::string(""), bool required = false); /** * Constructs an Option object. When constructed, it will register a flag * with CLI. * * @param identifier The name of the option (no dashes in front); for --help * we would pass "help". * @param description A short string describing the option. * @param parent Full pathname of the parent module that "owns" this option. * The default is the root node (an empty string). */ Option(const std::string& identifier, const std::string& description, const std::string& parent = std::string("")); }; /** * A static object whose constructor registers program documentation with the * CLI class. This should not be used outside of CLI itself, and you should use * the PROGRAM_INFO() macro to declare these objects. Only one ProgramDoc * object should ever exist. * * @see core/io/cli.hpp, mlpack::CLI */ class ProgramDoc { public: /** * Construct a ProgramDoc object. When constructed, it will register itself * with CLI. * * @param programName Short string representing the name of the program. * @param documentation Long string containing documentation on how to use the * program and what it is. No newline characters are necessary; this is * taken care of by CLI later. */ ProgramDoc(const std::string& programName, const std::string& documentation); //! The name of the program. std::string programName; //! Documentation for what the program does. std::string documentation; }; }; // namespace util }; // namespace mlpack // For implementations of templated functions #include "option_impl.hpp" #endif RcppMLPACK/src/mlpack/core/util/version.hpp0000644000176200001440000000270513647512751020226 0ustar liggesusers/** * @file version.hpp * @author Ryan Curtin * * The current version of mlpack, available as macros and as a string. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_UTIL_VERSION_HPP #define __MLPACK_CORE_UTIL_VERSION_HPP #include // The version of mlpack. If this is svn trunk, this will be a version with // higher number than the most recent release. #define __MLPACK_VERSION_MAJOR 1 #define __MLPACK_VERSION_MINOR 0 #define __MLPACK_VERSION_PATCH 9 // The name of the version (for use by --version). namespace mlpack { namespace util { /** * This will return either "mlpack x.y.z" or "mlpack trunk-rXXXXX" depending on * whether or not this is a stable version of mlpack or an svn revision. 
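 *
 * A usage sketch (the printed string is illustrative; the exact value depends
 * on the build):
 * @code
 * std::cout << mlpack::util::GetVersion() << std::endl;  // e.g. "mlpack 1.0.9"
 * @endcode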
*/ std::string GetVersion(); }; // namespace util }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/util/sfinae_utility.hpp0000644000176200001440000000523513647512751021572 0ustar liggesusers/** * @file sfinae_utility.hpp * @author Trironk Kiatkungwanglai * * This file contains macro utilities for the SFINAE Paradigm. These utilities * determine if classes passed in as template parameters contain members at * compile time, which is useful for changing functionality depending on what * operations an object is capable of performing. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_SFINAE_UTILITY #define __MLPACK_CORE_SFINAE_UTILITY #include #include /* * Constructs a template supporting the SFINAE pattern. * * This macro generates a template struct that is useful for enabling/disabling * a method if the template class passed in contains a member function matching * a given signature with a specified name. * * The generated struct should be used in conjunction with boost::disable_if and * boost::enable_if. Here is an example usage: * * For general references, see: * http://stackoverflow.com/a/264088/391618 * * For an MLPACK specific use case, see /mlpack/core/util/prefixedoutstream.hpp * and /mlpack/core/util/prefixedoutstream_impl.hpp * * @param NAME the name of the struct to construct. For example: HasToString * @param FUNC the name of the function to check for. For example: ToString */ #define HAS_MEM_FUNC(FUNC, NAME) \ template \ struct NAME { \ typedef char yes[1]; \ typedef char no [2]; \ template struct type_check; \ template static yes &chk(type_check *); \ template static no &chk(...); \ static bool const value = sizeof(chk(0)) == sizeof(yes); \ }; #endif RcppMLPACK/src/mlpack/core/util/version.cpp0000644000176200001440000000225313647514343020216 0ustar liggesusers/** * @file version.cpp * @author Ryan Curtin * * The implementation of GetVersion(). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "version.hpp" #include // If we are not an svn revision, just use the macros to assemble the version // name. std::string mlpack::util::GetVersion() { #ifndef __MLPACK_SUBVERSION std::stringstream o; o << "mlpack " << __MLPACK_VERSION_MAJOR << "." << __MLPACK_VERSION_MINOR << "." 
<< __MLPACK_VERSION_PATCH; return o.str(); #else #include "svnversion.hpp" #endif } RcppMLPACK/src/mlpack/core/util/nulloutstream.hpp0000644000176200001440000000546513647512751021465 0ustar liggesusers/** * @file nulloutstream.hpp * @author Ryan Curtin * @author Matthew Amidon * * Definition of the NullOutStream class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_UTIL_NULLOUTSTREAM_HPP #define __MLPACK_CORE_UTIL_NULLOUTSTREAM_HPP #include #include #include namespace mlpack { namespace util { /** * Used for Log::Debug when not compiled with debugging symbols. This class * does nothing and should be optimized out entirely by the compiler. */ class NullOutStream { public: /** * Does nothing. */ NullOutStream() { } /** * Does nothing. */ NullOutStream(const NullOutStream& /* other */) { } //! Does nothing. NullOutStream& operator<<(bool) { return *this; } //! Does nothing. NullOutStream& operator<<(short) { return *this; } //! Does nothing. NullOutStream& operator<<(unsigned short) { return *this; } //! Does nothing. NullOutStream& operator<<(int) { return *this; } //! Does nothing. NullOutStream& operator<<(unsigned int) { return *this; } //! Does nothing. NullOutStream& operator<<(long) { return *this; } //! Does nothing. NullOutStream& operator<<(unsigned long) { return *this; } //! Does nothing. NullOutStream& operator<<(float) { return *this; } //! Does nothing. NullOutStream& operator<<(double) { return *this; } //! Does nothing. NullOutStream& operator<<(long double) { return *this; } //! Does nothing. NullOutStream& operator<<(void*) { return *this; } //! Does nothing. NullOutStream& operator<<(const char*) { return *this; } //! Does nothing. NullOutStream& operator<<(std::string&) { return *this; } //! Does nothing. NullOutStream& operator<<(std::streambuf*) { return *this; } //! Does nothing. NullOutStream& operator<<(std::ostream& (*) (std::ostream&)) { return *this; } //! Does nothing. NullOutStream& operator<<(std::ios& (*) (std::ios&)) { return *this; } //! Does nothing. NullOutStream& operator<<(std::ios_base& (*) (std::ios_base&)) { return *this; } //! Does nothing. template NullOutStream& operator<<(const T&) { return *this; } }; } // namespace util } // namespace mlpack #endif RcppMLPACK/src/mlpack/core/util/option_impl.hpp0000644000176200001440000000350313647512751021067 0ustar liggesusers/** * @file option_impl.hpp * @author Matthew Amidon * * Implementation of template functions for the Option class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_UTIL_OPTION_IMPL_HPP #define __MLPACK_CORE_UTIL_OPTION_IMPL_HPP // Just in case it has not been included. #include "option.hpp" namespace mlpack { namespace util { /** * Registers a parameter with CLI. */ template Option::Option(bool ignoreTemplate, N defaultValue, const std::string& identifier, const std::string& description, const std::string& alias, bool required) { if (ignoreTemplate) { CLI::Add(identifier, description, alias, required); } else { CLI::Add(identifier, description, alias, required); CLI::GetParam(identifier) = defaultValue; } } /** * Registers a flag parameter with CLI. */ template Option::Option(const std::string& identifier, const std::string& description, const std::string& alias) { CLI::AddFlag(identifier, description, alias); } }; // namespace util }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/util/timers.cpp0000644000176200001440000001454613647514343020044 0ustar liggesusers/** * @file timers.cpp * @author Matthew Amidon * @author Marcus Edel * * Implementation of timers. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "timers.hpp" #include "cli.hpp" #include "log.hpp" #include #include using namespace mlpack; // On Windows machines, we need to define timersub. #ifdef _WIN32 inline void timersub(const timeval* tvp, const timeval* uvp, timeval* vvp) { vvp->tv_sec = tvp->tv_sec - uvp->tv_sec; vvp->tv_usec = tvp->tv_usec - uvp->tv_usec; if (vvp->tv_usec < 0) { --vvp->tv_sec; vvp->tv_usec += 1000000; } } #endif /** * Start the given timer. */ void Timer::Start(const std::string& name) { CLI::GetSingleton().timer.StartTimer(name); } /** * Stop the given timer. */ void Timer::Stop(const std::string& name) { CLI::GetSingleton().timer.StopTimer(name); } /** * Get the given timer. */ timeval Timer::Get(const std::string& name) { return CLI::GetSingleton().timer.GetTimer(name); } std::map& Timers::GetAllTimers() { return timers; } timeval Timers::GetTimer(const std::string& timerName) { std::string name(timerName); return timers[name]; } void Timers::PrintTimer(const std::string& timerName) { timeval& t = timers[timerName]; Log::Info << t.tv_sec << "." << std::setw(6) << std::setfill('0') << t.tv_usec << "s"; // Also output convenient day/hr/min/sec. int days = t.tv_sec / 86400; // Integer division rounds down. int hours = (t.tv_sec % 86400) / 3600; int minutes = (t.tv_sec % 3600) / 60; int seconds = (t.tv_sec % 60); // No output if it didn't even take a minute. if (!(days == 0 && hours == 0 && minutes == 0)) { bool output = false; // Denotes if we have output anything yet. 
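// Assemble a human-readable "(D days, H hrs, M mins, S.s secs)" suffix,
// emitting only the units that are nonzero.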
Log::Info << " ("; // Only output units if they have nonzero values (yes, a bit tedious). if (days > 0) { Log::Info << days << " days"; output = true; } if (hours > 0) { if (output) Log::Info << ", "; Log::Info << hours << " hrs"; output = true; } if (minutes > 0) { if (output) Log::Info << ", "; Log::Info << minutes << " mins"; output = true; } if (seconds > 0) { if (output) Log::Info << ", "; Log::Info << seconds << "." << std::setw(1) << (t.tv_usec / 100000) << "secs"; output = true; } Log::Info << ")"; } Log::Info << std::endl; } void Timers::GetTime(timeval* tv) { #if defined(__MACH__) && defined(__APPLE__) static mach_timebase_info_data_t info; // If this is the first time we've run, get the timebase. // We can use denom == 0 to indicate that sTimebaseInfo is // uninitialised. if (info.denom == 0) { (void) mach_timebase_info(&info); } // Hope that the multiplication doesn't overflow. uint64_t nsecs = mach_absolute_time() * info.numer / info.denom; tv->tv_sec = nsecs / 1e9; tv->tv_usec = (nsecs / 1e3) - (tv->tv_sec * 1e6); #elif defined(_POSIX_VERSION) #if defined(_POSIX_TIMERS) && (_POSIX_TIMERS > 0) // Get the right clock_id. #if defined(CLOCK_MONOTONIC_PRECISE) static const clockid_t id = CLOCK_MONOTONIC_PRECISE; #elif defined(CLOCK_MONOTONIC_RAW) static const clockid_t id = CLOCK_MONOTONIC_RAW; #elif defined(CLOCK_MONOTONIC) static const clockid_t id = CLOCK_MONOTONIC; #elif defined(CLOCK_REALTIME) static const clockid_t id = CLOCK_REALTIME; #else static const clockid_t id = ((clockid_t) - 1); #endif // CLOCK struct timespec ts; // Returns the current value tp for the specified clock_id. if (clock_gettime(id, &ts) != -1 && id != ((clockid_t) - 1)) { tv->tv_sec = ts.tv_sec; tv->tv_usec = ts.tv_nsec / 1e3; } // Fallback for the clock_gettime function. gettimeofday(tv, NULL); #endif // _POSIX_TIMERS #elif defined(_WIN32) static double frequency = 0.0; static LARGE_INTEGER offset; // If this is the first time we've run, get the frequency. // We use frequency == 0.0 to indicate that // QueryPerformanceFrequency is uninitialised. if (frequency == 0.0) { LARGE_INTEGER pF; if (!QueryPerformanceFrequency(&pF)) { // Fallback for the QueryPerformanceCounter function. FileTimeToTimeVal(tv); } else { QueryPerformanceCounter(&offset); frequency = (double)pF.QuadPart / 1000000.0; } } if (frequency != 0.0) { LARGE_INTEGER pC; // Get the current performance-counter value. QueryPerformanceCounter(&pC); pC.QuadPart -= offset.QuadPart; double microseconds = (double)pC.QuadPart / frequency; pC.QuadPart = microseconds; tv->tv_sec = (long)pC.QuadPart / 1000000; tv->tv_usec = (long)(pC.QuadPart % 1000000); } #endif } void Timers::StartTimer(const std::string& timerName) { timeval tmp; tmp.tv_sec = 0; tmp.tv_usec = 0; GetTime(&tmp); // Check to see if the timer already exists. If it does, we'll subtract the // old value. if (timers.count(timerName) == 1) { timeval tmpDelta; timersub(&tmp, &timers[timerName], &tmpDelta); tmp = tmpDelta; } timers[timerName] = tmp; } #ifdef _WIN32 void Timers::FileTimeToTimeVal(timeval* tv) { FILETIME ftime; uint64_t ptime = 0; // Acquire the file time. GetSystemTimeAsFileTime(&ftime); // Now convert FILETIME to timeval. 
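// (FILETIME counts 100-nanosecond intervals since January 1, 1601 UTC: the
// two 32-bit halves are packed into one 64-bit value, divided by 10 to get
// microseconds, and then rebased onto the Unix epoch by subtracting
// DELTA_EPOCH_IN_MICROSECS.)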
ptime |= ftime.dwHighDateTime; ptime = ptime << 32; ptime |= ftime.dwLowDateTime; ptime /= 10; ptime -= DELTA_EPOCH_IN_MICROSECS; tv->tv_sec = (long) (ptime / 1000000UL); tv->tv_usec = (long) (ptime % 1000000UL); } #endif // _WIN32 void Timers::StopTimer(const std::string& timerName) { timeval delta, b, a = timers[timerName]; GetTime(&b); // Calculate the delta time. timersub(&b, &a, &delta); timers[timerName] = delta; } RcppMLPACK/src/mlpack/core/util/string_util.hpp0000644000176200001440000000232713647512751021104 0ustar liggesusers/** * @file string_util.hpp * @author Trironk Kiatkungwanglai * @author Ryan Birmingham * * Declares methods that are useful for writing formatting output. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_STRING_UTIL_HPP #define __MLPACK_CORE_STRING_UTIL_HPP #include namespace mlpack { namespace util { //! A utility function that replaces all all newlines with a number of spaces //! depending on the indentation level. std::string Indent(std::string input, const size_t howManyTabs = 1); }; // namespace util }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/tree/0000755000176200001440000000000013647514343016005 5ustar liggesusersRcppMLPACK/src/mlpack/core/tree/cover_tree.hpp0000644000176200001440000000226613647512751020662 0ustar liggesusers/** * @file cover_tree.hpp * @author Ryan Curtin * * Includes all the necessary files to use the CoverTree class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_COVER_TREE_HPP #define __MLPACK_CORE_TREE_COVER_TREE_HPP #include "bounds.hpp" #include "cover_tree/cover_tree.hpp" #include "cover_tree/single_tree_traverser.hpp" #include "cover_tree/single_tree_traverser_impl.hpp" #include "cover_tree/dual_tree_traverser.hpp" #include "cover_tree/dual_tree_traverser_impl.hpp" #include "cover_tree/traits.hpp" #endif RcppMLPACK/src/mlpack/core/tree/mrkd_statistic_impl.hpp0000644000176200001440000000641113647512751022566 0ustar liggesusers/** * @file mrkd_statistic_impl.hpp * @author James Cline * * Definition of the statistic for multi-resolution kd-trees. * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_MRKD_STATISTIC_IMPL_HPP #define __MLPACK_CORE_TREE_MRKD_STATISTIC_IMPL_HPP // In case it hasn't already been included. #include "mrkd_statistic.hpp" namespace mlpack { namespace tree { template MRKDStatistic::MRKDStatistic(const TreeType& /* node */) : dataset(NULL), begin(0), count(0), leftStat(NULL), rightStat(NULL), parentStat(NULL) { } /** * This constructor is called when a leaf is created. * * @param dataset Matrix that the tree is being built on. * @param begin Starting index corresponding to this leaf. * @param count Number of points held in this leaf. * template MRKDStatistic::MRKDStatistic(const TreeType& node) : dataset(&dataset), begin(begin), count(count), leftStat(NULL), rightStat(NULL), parentStat(NULL) { centerOfMass = dataset.col(begin); for (size_t i = begin + 1; i < begin + count; ++i) centerOfMass += dataset.col(i); sumOfSquaredNorms = 0.0; for (size_t i = begin; i < begin + count; ++i) sumOfSquaredNorms += arma::norm(dataset.col(i), 2); } ** * This constructor is called when a non-leaf node is created. * This lets you build fast bottom-up statistics when building trees. * * @param dataset Matrix that the tree is being built on. * @param begin Starting index corresponding to this leaf. * @param count Number of points held in this leaf. * @param leftStat MRKDStatistic object of the left child node. * @param rightStat MRKDStatistic object of the right child node. * template MRKDStatistic::MRKDStatistic(const MatType& dataset, const size_t begin, const size_t count, MRKDStatistic& leftStat, MRKDStatistic& rightStat) : dataset(&dataset), begin(begin), count(count), leftStat(&leftStat), rightStat(&rightStat), parentStat(NULL) { sumOfSquaredNorms = leftStat.sumOfSquaredNorms + rightStat.sumOfSquaredNorms; * centerOfMass = ((leftStat.centerOfMass * leftStat.count) + (rightStat.centerOfMass * rightStat.count)) / (leftStat.count + rightStat.count); * centerOfMass = leftStat.centerOfMass + rightStat.centerOfMass; isWhitelistValid = false; leftStat.parentStat = this; rightStat.parentStat = this; } */ }; // namespace tree }; // namespace mlpack #endif // __MLPACK_CORE_TREE_MRKD_STATISTIC_IMPL_HPP RcppMLPACK/src/mlpack/core/tree/binary_space_tree/0000755000176200001440000000000013647521523021461 5ustar liggesusersRcppMLPACK/src/mlpack/core/tree/binary_space_tree/mean_split.hpp0000644000176200001440000001202013647512751024323 0ustar liggesusers/** * @file mean_split.hpp * @author Yash Vadalia * @author Ryan Curtin * * Definition of MeanSplit, a class that splits a binary space partitioning tree * node into two parts using the mean of the values in a certain dimension. * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_MEAN_SPLIT_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_MEAN_SPLIT_HPP #include namespace mlpack { namespace tree /** Trees and tree-building procedures. */ { /** * A binary space partitioning tree node is split into its left and right child. * The split is done in the dimension that has the maximum width. The points are * divided into two parts based on the mean in this dimension. */ template class MeanSplit { public: /** * Split the node according to the mean value in the dimension with maximum * width. * * @param bound The bound used for this node. * @param data The dataset used by the binary space tree. * @param begin Index of the starting point in the dataset that belongs to * this node. * @param count Number of points in this node. * @param splitDimension This will be filled with the dimension the node is to * be split on. * @param splitCol The index at which the dataset is divided into two parts * after the rearrangement. */ static bool SplitNode(const BoundType& bound, MatType& data, const size_t begin, const size_t count, size_t& splitDimension, size_t& splitCol); /** * Split the node according to the mean value in the dimension with maximum * width and return a list of changed indices. * * @param bound The bound used for this node. * @param data The dataset used by the binary space tree. * @param begin Index of the starting point in the dataset that belongs to * this node. * @param count Number of points in this node. * @param splitDimension This will be filled with the dimension the node is * to be split on. * @param splitCol The index at which the dataset is divided into two parts * after the rearrangement. * @param oldFromNew Vector which will be filled with the old positions for * each new point. */ static bool SplitNode(const BoundType& bound, MatType& data, const size_t begin, const size_t count, size_t& splitDimension, size_t& splitCol, std::vector& oldFromNew); private: /** * Reorder the dataset into two parts such that they lie on either side of * splitCol. * * @param data The dataset used by the binary space tree. * @param begin Index of the starting point in the dataset that belongs to * this node. * @param count Number of points in this node. * @param splitDimension The dimension to split the node on. * @param splitVal The split in dimension splitDimension is based on this * value. */ static size_t PerformSplit(MatType& data, const size_t begin, const size_t count, const size_t splitDimension, const double splitVal); /** * Reorder the dataset into two parts such that they lie on either side of * splitCol. Also returns a list of changed indices. * * @param data The dataset used by the binary space tree. * @param begin Index of the starting point in the dataset that belongs to * this node. * @param count Number of points in this node. * @param splitDimension The dimension to split the node on. 
* @param splitVal The split in dimension splitDimension is based on this * value. * @param oldFromNew Vector which will be filled with the old positions for * each new point. */ static size_t PerformSplit(MatType& data, const size_t begin, const size_t count, const size_t splitDimension, const double splitVal, std::vector& oldFromNew); }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "mean_split_impl.hpp" #endif RcppMLPACK/src/mlpack/core/tree/binary_space_tree/dual_tree_traverser.hpp0000644000176200001440000000674413647512751026251 0ustar liggesusers/** * @file dual_tree_traverser.hpp * @author Ryan Curtin * * Defines the DualTreeTraverser for the BinarySpaceTree tree type. This is a * nested class of BinarySpaceTree which traverses two trees in a depth-first * manner with a given set of rules which indicate the branches which can be * pruned and the order in which to recurse. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_DUAL_TREE_TRAVERSER_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_DUAL_TREE_TRAVERSER_HPP #include #include "binary_space_tree.hpp" namespace mlpack { namespace tree { template template class BinarySpaceTree:: DualTreeTraverser { public: /** * Instantiate the dual-tree traverser with the given rule set. */ DualTreeTraverser(RuleType& rule); /** * Traverse the two trees. This does not reset the number of prunes. * * @param queryNode The query node to be traversed. * @param referenceNode The reference node to be traversed. * @param score The score of the current node combination. */ void Traverse(BinarySpaceTree& queryNode, BinarySpaceTree& referenceNode); //! Get the number of prunes. size_t NumPrunes() const { return numPrunes; } //! Modify the number of prunes. size_t& NumPrunes() { return numPrunes; } //! Get the number of visited combinations. size_t NumVisited() const { return numVisited; } //! Modify the number of visited combinations. size_t& NumVisited() { return numVisited; } //! Get the number of times a node combination was scored. size_t NumScores() const { return numScores; } //! Modify the number of times a node combination was scored. size_t& NumScores() { return numScores; } //! Get the number of times a base case was calculated. size_t NumBaseCases() const { return numBaseCases; } //! Modify the number of times a base case was calculated. size_t& NumBaseCases() { return numBaseCases; } private: //! Reference to the rules with which the trees will be traversed. RuleType& rule; //! The number of prunes. size_t numPrunes; //! The number of node combinations that have been visited during traversal. size_t numVisited; //! The number of times a node combination was scored. size_t numScores; //! The number of times a base case was calculated. size_t numBaseCases; //! Traversal information, held in the class so that it isn't continually //! being reallocated. 
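//! (The exact contents are defined by RuleType; mlpack's pruning rules typically cache the last query/reference node pair and the score computed for them so that Rescore() can reuse that work rather than recomputing bounds.)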
typename RuleType::TraversalInfoType traversalInfo; }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "dual_tree_traverser_impl.hpp" #endif // __MLPACK_CORE_TREE_BINARY_SPACE_TREE_DUAL_TREE_TRAVERSER_HPP RcppMLPACK/src/mlpack/core/tree/binary_space_tree/mean_split_impl.hpp0000644000176200001440000001626013647512751025356 0ustar liggesusers/** * @file mean_split_impl.hpp * @author Yash Vadalia * @author Ryan Curtin * * Implementation of class(MeanSplit) to split a binary space partition tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_MEAN_SPLIT_IMPL_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_MEAN_SPLIT_IMPL_HPP #include "mean_split.hpp" namespace mlpack { namespace tree { template bool MeanSplit::SplitNode(const BoundType& bound, MatType& data, const size_t begin, const size_t count, size_t& splitDimension, size_t& splitCol) { splitDimension = data.n_rows; // Indicate invalid. double maxWidth = -1; // Find the split dimension. for (size_t d = 0; d < data.n_rows; d++) { double width = bound[d].Width(); if (width > maxWidth) { maxWidth = width; splitDimension = d; } } if (maxWidth == 0) // All these points are the same. We can't split. return false; // Split in the middle of that dimension. double splitVal = bound[splitDimension].Mid(); // Perform the actual splitting. This will order the dataset such that points // with value in dimension splitDimension less than or equal to splitVal are // on the left of splitCol, and points with value in dimension splitDimension // greater than splitVal are on the right side of splitCol. splitCol = PerformSplit(data, begin, count, splitDimension, splitVal); return true; } template bool MeanSplit::SplitNode(const BoundType& bound, MatType& data, const size_t begin, const size_t count, size_t& splitDimension, size_t& splitCol, std::vector& oldFromNew) { splitDimension = data.n_rows; // Indicate invalid. double maxWidth = -1; // Find the split dimension. for (size_t d = 0; d < data.n_rows; d++) { double width = bound[d].Width(); if (width > maxWidth) { maxWidth = width; splitDimension = d; } } if (maxWidth == 0) // All these points are the same. We can't split. return false; // Split in the middle of that dimension. double splitVal = bound[splitDimension].Mid(); // Perform the actual splitting. This will order the dataset such that points // with value in dimension splitDimension less than or equal to splitVal are // on the left of splitCol, and points with value in dimension splitDimension // greater than splitVal are on the right side of splitCol. splitCol = PerformSplit(data, begin, count, splitDimension, splitVal, oldFromNew); return true; } template size_t MeanSplit:: PerformSplit(MatType& data, const size_t begin, const size_t count, const size_t splitDimension, const double splitVal) { // This method modifies the input dataset. 
We loop both from the left and // right sides of the points contained in this node. The points less than // splitVal should be on the left side of the matrix, and the points greater // than splitVal should be on the right side of the matrix. size_t left = begin; size_t right = begin + count - 1; // First half-iteration of the loop is out here because the termination // condition is in the middle. while ((data(splitDimension, left) < splitVal) && (left <= right)) left++; while ((data(splitDimension, right) >= splitVal) && (left <= right)) right--; while (left <= right) { // Swap columns. data.swap_cols(left, right); // See how many points on the left are correct. When they are correct, // increase the left counter accordingly. When we encounter one that isn't // correct, stop. We will switch it later. while ((data(splitDimension, left) < splitVal) && (left <= right)) left++; // Now see how many points on the right are correct. When they are correct, // decrease the right counter accordingly. When we encounter one that isn't // correct, stop. We will switch it with the wrong point we found in the // previous loop. while ((data(splitDimension, right) >= splitVal) && (left <= right)) right--; } return left; } template size_t MeanSplit:: PerformSplit(MatType& data, const size_t begin, const size_t count, const size_t splitDimension, const double splitVal, std::vector& oldFromNew) { // This method modifies the input dataset. We loop both from the left and // right sides of the points contained in this node. The points less than // splitVal should be on the left side of the matrix, and the points greater // than splitVal should be on the right side of the matrix. size_t left = begin; size_t right = begin + count - 1; // First half-iteration of the loop is out here because the termination // condition is in the middle. while ((data(splitDimension, left) < splitVal) && (left <= right)) left++; while ((data(splitDimension, right) >= splitVal) && (left <= right)) right--; while (left <= right) { // Swap columns. data.swap_cols(left, right); // Update the indices for what we changed. size_t t = oldFromNew[left]; oldFromNew[left] = oldFromNew[right]; oldFromNew[right] = t; // See how many points on the left are correct. When they are correct, // increase the left counter accordingly. When we encounter one that isn't // correct, stop. We will switch it later. while ((data(splitDimension, left) < splitVal) && (left <= right)) left++; // Now see how many points on the right are correct. When they are correct, // decrease the right counter accordingly. When we encounter one that isn't // correct, stop. We will switch it with the wrong point we found in the // previous loop. while ((data(splitDimension, right) >= splitVal) && (left <= right)) right--; } return left; } }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/tree/binary_space_tree/binary_space_tree.hpp0000644000176200001440000004371613647512751025666 0ustar liggesusers/** * @file binary_space_tree.hpp * * Definition of generalized binary space partitioning tree (BinarySpaceTree). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_BINARY_SPACE_TREE_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_BINARY_SPACE_TREE_HPP #include #include "mean_split.hpp" #include "../statistic.hpp" namespace mlpack { namespace tree /** Trees and tree-building procedures. */ { /** * A binary space partitioning tree, such as a KD-tree or a ball tree. Once the * bound and type of dataset is defined, the tree will construct itself. Call * the constructor with the dataset to build the tree on, and the entire tree * will be built. * * This particular tree does not allow growth, so you cannot add or delete nodes * from it. If you need to add or delete a node, the better procedure is to * rebuild the tree entirely. * * This tree does take one runtime parameter in the constructor, which is the * max leaf size to be used. * * @tparam BoundType The bound used for each node. The valid types of bounds * and the necessary skeleton interface for this class can be found in * bounds/. * @tparam StatisticType Extra data contained in the node. See statistic.hpp * for the necessary skeleton interface. * @tparam MatType The dataset class. * @tparam SplitType The class that partitions the dataset/points at a * particular node into two parts. Its definition decides the way this split * is done. */ template > class BinarySpaceTree { private: //! The left child node. BinarySpaceTree* left; //! The right child node. BinarySpaceTree* right; //! The parent node (NULL if this is the root of the tree). BinarySpaceTree* parent; //! The index of the first point in the dataset contained in this node (and //! its children). size_t begin; //! The number of points of the dataset contained in this node (and its //! children). size_t count; //! The max leaf size. size_t maxLeafSize; //! The bound object for this node. BoundType bound; //! Any extra data contained in the node. StatisticType stat; //! The dimension this node split on if it is a parent. size_t splitDimension; //! The distance from the centroid of this node to the centroid of the parent. double parentDistance; //! The worst possible distance to the furthest descendant, cached to speed //! things up. double furthestDescendantDistance; //! The minimum distance from the center to any edge of the bound. double minimumBoundDistance; //! The dataset. MatType& dataset; public: //! So other classes can use TreeType::Mat. typedef MatType Mat; //! A single-tree traverser for binary space trees; see //! single_tree_traverser.hpp for implementation. template class SingleTreeTraverser; //! A dual-tree traverser for binary space trees; see dual_tree_traverser.hpp. template class DualTreeTraverser; /** * Construct this as the root node of a binary space tree using the given * dataset. This will modify the ordering of the points in the dataset! * * @param data Dataset to create tree from. This will be modified! * @param maxLeafSize Size of each leaf in the tree. */ BinarySpaceTree(MatType& data, const size_t maxLeafSize = 20); /** * Construct this as the root node of a binary space tree using the given * dataset. This will modify the ordering of points in the dataset! 
A * mapping of the old point indices to the new point indices is filled. * * @param data Dataset to create tree from. This will be modified! * @param oldFromNew Vector which will be filled with the old positions for * each new point. * @param maxLeafSize Size of each leaf in the tree. */ BinarySpaceTree(MatType& data, std::vector& oldFromNew, const size_t maxLeafSize = 20); /** * Construct this as the root node of a binary space tree using the given * dataset. This will modify the ordering of points in the dataset! A * mapping of the old point indices to the new point indices is filled, as * well as a mapping of the new point indices to the old point indices. * * @param data Dataset to create tree from. This will be modified! * @param oldFromNew Vector which will be filled with the old positions for * each new point. * @param newFromOld Vector which will be filled with the new positions for * each old point. * @param maxLeafSize Size of each leaf in the tree. */ BinarySpaceTree(MatType& data, std::vector& oldFromNew, std::vector& newFromOld, const size_t maxLeafSize = 20); /** * Construct this node on a subset of the given matrix, starting at column * begin and using count points. The ordering of that subset of points * will be modified! This is used for recursive tree-building by the other * constructors which don't specify point indices. * * @param data Dataset to create tree from. This will be modified! * @param begin Index of point to start tree construction with. * @param count Number of points to use to construct tree. * @param maxLeafSize Size of each leaf in the tree. */ BinarySpaceTree(MatType& data, const size_t begin, const size_t count, BinarySpaceTree* parent = NULL, const size_t maxLeafSize = 20); /** * Construct this node on a subset of the given matrix, starting at column * begin_in and using count_in points. The ordering of that subset of points * will be modified! This is used for recursive tree-building by the other * constructors which don't specify point indices. * * A mapping of the old point indices to the new point indices is filled, but * it is expected that the vector is already allocated with size greater than * or equal to (begin_in + count_in), and if that is not true, invalid memory * reads (and writes) will occur. * * @param data Dataset to create tree from. This will be modified! * @param begin Index of point to start tree construction with. * @param count Number of points to use to construct tree. * @param oldFromNew Vector which will be filled with the old positions for * each new point. * @param maxLeafSize Size of each leaf in the tree. */ BinarySpaceTree(MatType& data, const size_t begin, const size_t count, std::vector& oldFromNew, BinarySpaceTree* parent = NULL, const size_t maxLeafSize = 20); /** * Construct this node on a subset of the given matrix, starting at column * begin_in and using count_in points. The ordering of that subset of points * will be modified! This is used for recursive tree-building by the other * constructors which don't specify point indices. * * A mapping of the old point indices to the new point indices is filled, as * well as a mapping of the new point indices to the old point indices. It is * expected that the vector is already allocated with size greater than or * equal to (begin_in + count_in), and if that is not true, invalid memory * reads (and writes) will occur. * * @param data Dataset to create tree from. This will be modified! * @param begin Index of point to start tree construction with. 
* @param count Number of points to use to construct tree. * @param oldFromNew Vector which will be filled with the old positions for * each new point. * @param newFromOld Vector which will be filled with the new positions for * each old point. * @param maxLeafSize Size of each leaf in the tree. */ BinarySpaceTree(MatType& data, const size_t begin, const size_t count, std::vector& oldFromNew, std::vector& newFromOld, BinarySpaceTree* parent = NULL, const size_t maxLeafSize = 20); /** * Create a binary space tree by copying the other tree. Be careful! This * can take a long time and use a lot of memory. * * @param other Tree to be replicated. */ BinarySpaceTree(const BinarySpaceTree& other); /** * Deletes this node, deallocating the memory for the children and calling * their destructors in turn. This will invalidate any pointers or references * to any nodes which are children of this one. */ ~BinarySpaceTree(); /** * Find a node in this tree by its begin and count (const). * * Every node is uniquely identified by these two numbers. * This is useful for communicating position over the network, * when pointers would be invalid. * * @param begin The begin() of the node to find. * @param count The count() of the node to find. * @return The found node, or NULL if not found. */ const BinarySpaceTree* FindByBeginCount(size_t begin, size_t count) const; /** * Find a node in this tree by its begin and count. * * Every node is uniquely identified by these two numbers. * This is useful for communicating position over the network, * when pointers would be invalid. * * @param begin The begin() of the node to find. * @param count The count() of the node to find. * @return The found node, or NULL if not found. */ BinarySpaceTree* FindByBeginCount(size_t begin, size_t count); //! Return the bound object for this node. const BoundType& Bound() const { return bound; } //! Return the bound object for this node. BoundType& Bound() { return bound; } //! Return the statistic object for this node. const StatisticType& Stat() const { return stat; } //! Return the statistic object for this node. StatisticType& Stat() { return stat; } //! Return whether or not this node is a leaf (true if it has no children). bool IsLeaf() const; //! Return the max leaf size. size_t MaxLeafSize() const { return maxLeafSize; } //! Modify the max leaf size. size_t& MaxLeafSize() { return maxLeafSize; } //! Fills the tree to the specified level. size_t ExtendTree(const size_t level); //! Gets the left child of this node. BinarySpaceTree* Left() const { return left; } //! Modify the left child of this node. BinarySpaceTree*& Left() { return left; } //! Gets the right child of this node. BinarySpaceTree* Right() const { return right; } //! Modify the right child of this node. BinarySpaceTree*& Right() { return right; } //! Gets the parent of this node. BinarySpaceTree* Parent() const { return parent; } //! Modify the parent of this node. BinarySpaceTree*& Parent() { return parent; } //! Get the split dimension for this node. size_t SplitDimension() const { return splitDimension; } //! Modify the split dimension for this node. size_t& SplitDimension() { return splitDimension; } //! Get the dataset which the tree is built on. const MatType& Dataset() const { return dataset; } //! Modify the dataset which the tree is built on. Be careful! MatType& Dataset() { return dataset; } //! Get the metric which the tree uses. typename BoundType::MetricType Metric() const { return bound.Metric(); } //! 
Get the centroid of the node and store it in the given vector. void Centroid(arma::vec& centroid) { bound.Centroid(centroid); } //! Return the number of children in this node. size_t NumChildren() const; /** * Return the furthest distance to a point held in this node. If this is not * a leaf node, then the distance is 0 because the node holds no points. */ double FurthestPointDistance() const; /** * Return the furthest possible descendant distance. This returns the maximum * distance from the centroid to the edge of the bound and not the empirical * quantity which is the actual furthest descendant distance. So the actual * furthest descendant distance may be less than what this method returns (but * it will never be greater than this). */ double FurthestDescendantDistance() const; //! Return the minimum distance from the center of the node to any bound edge. double MinimumBoundDistance() const; //! Return the distance from the center of this node to the center of the //! parent node. double ParentDistance() const { return parentDistance; } //! Modify the distance from the center of this node to the center of the //! parent node. double& ParentDistance() { return parentDistance; } /** * Return the specified child (0 will be left, 1 will be right). If the index * is greater than 1, this will return the right child. * * @param child Index of child to return. */ BinarySpaceTree& Child(const size_t child) const; //! Return the number of points in this node (0 if not a leaf). size_t NumPoints() const; /** * Return the number of descendants of this node. For a non-leaf in a binary * space tree, this is the number of points at the descendant leaves. For a * leaf, this is the number of points in the leaf. */ size_t NumDescendants() const; /** * Return the index (with reference to the dataset) of a particular descendant * of this node. The index should be greater than zero but less than the * number of descendants. * * @param index Index of the descendant. */ size_t Descendant(const size_t index) const; /** * Return the index (with reference to the dataset) of a particular point in * this node. This will happily return invalid indices if the given index is * greater than the number of points in this node (obtained with NumPoints()) * -- be careful. * * @param index Index of point for which a dataset index is wanted. */ size_t Point(const size_t index) const; //! Return the minimum distance to another node. double MinDistance(const BinarySpaceTree* other) const { return bound.MinDistance(other->Bound()); } //! Return the maximum distance to another node. double MaxDistance(const BinarySpaceTree* other) const { return bound.MaxDistance(other->Bound()); } //! Return the minimum and maximum distance to another node. math::Range RangeDistance(const BinarySpaceTree* other) const { return bound.RangeDistance(other->Bound()); } //! Return the minimum distance to another point. template double MinDistance(const VecType& point, typename boost::enable_if >::type* = 0) const { return bound.MinDistance(point); } //! Return the maximum distance to another point. template double MaxDistance(const VecType& point, typename boost::enable_if >::type* = 0) const { return bound.MaxDistance(point); } //! Return the minimum and maximum distance to another point. template math::Range RangeDistance(const VecType& point, typename boost::enable_if >::type* = 0) const { return bound.RangeDistance(point); } /** * Returns the dimension this parent's children are split on. 
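* * A minimal usage sketch (hedged: the HRectBound instantiation and the data sizes below are illustrative assumptions, not taken from this header): * @code * arma::mat data(3, 100, arma::fill::randu); // 100 random 3-d points. * mlpack::tree::BinarySpaceTree<mlpack::bound::HRectBound<2> > t(data); * const size_t d = t.GetSplitDimension(); // Dimension the root split on. * @endcode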
*/ size_t GetSplitDimension() const; /** * Obtains the number of nodes in the tree, starting with this. */ size_t TreeSize() const; /** * Obtains the number of levels below this node in the tree, starting with * this. */ size_t TreeDepth() const; //! Return the index of the beginning point of this subset. size_t Begin() const { return begin; } //! Modify the index of the beginning point of this subset. size_t& Begin() { return begin; } /** * Gets the index one beyond the last index in the subset. */ size_t End() const; //! Return the number of points in this subset. size_t Count() const { return count; } //! Modify the number of points in this subset. size_t& Count() { return count; } //! Returns false: this tree type does not have self children. static bool HasSelfChildren() { return false; } private: /** * Private copy constructor, available only to fill (pad) the tree to a * specified level. */ BinarySpaceTree(const size_t begin, const size_t count, BoundType bound, StatisticType stat, const int maxLeafSize = 20) : left(NULL), right(NULL), begin(begin), count(count), bound(bound), stat(stat), maxLeafSize(maxLeafSize) { } BinarySpaceTree* CopyMe() { return new BinarySpaceTree(begin, count, bound, stat, maxLeafSize); } /** * Splits the current node, assigning its left and right children recursively. * * @param data Dataset which we are using. */ void SplitNode(MatType& data); /** * Splits the current node, assigning its left and right children recursively. * Also returns a list of the changed indices. * * @param data Dataset which we are using. * @param oldFromNew Vector holding permuted indices. */ void SplitNode(MatType& data, std::vector& oldFromNew); public: /** * Returns a string representation of this object. */ std::string ToString() const; }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "binary_space_tree_impl.hpp" #endif RcppMLPACK/src/mlpack/core/tree/binary_space_tree/single_tree_traverser.hpp0000644000176200001440000000452413647512751026577 0ustar liggesusers/** * @file single_tree_traverser.hpp * @author Ryan Curtin * * A nested class of BinarySpaceTree which traverses the entire tree with a * given set of rules which indicate the branches which can be pruned and the * order in which to recurse. This traverser is a depth-first traverser. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_SINGLE_TREE_TRAVERSER_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_SINGLE_TREE_TRAVERSER_HPP #include #include "binary_space_tree.hpp" namespace mlpack { namespace tree { template template class BinarySpaceTree:: SingleTreeTraverser { public: /** * Instantiate the single tree traverser with the given rule set. */ SingleTreeTraverser(RuleType& rule); /** * Traverse the tree with the given point. * * @param queryIndex The index of the point in the query set which is being * used as the query point. 
* @param referenceNode The tree node to be traversed. */ void Traverse(const size_t queryIndex, BinarySpaceTree& referenceNode); //! Get the number of prunes. size_t NumPrunes() const { return numPrunes; } //! Modify the number of prunes. size_t& NumPrunes() { return numPrunes; } private: //! Reference to the rules with which the tree will be traversed. RuleType& rule; //! The number of nodes which have been pruned during traversal. size_t numPrunes; }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "single_tree_traverser_impl.hpp" #endif RcppMLPACK/src/mlpack/core/tree/binary_space_tree/binary_space_tree_impl.hpp0000644000176200001440000005240313647512751026700 0ustar liggesusers/** * @file binary_space_tree_impl.hpp * * Implementation of generalized space partitioning tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_BINARY_SPACE_TREE_IMPL_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_BINARY_SPACE_TREE_IMPL_HPP // In case it wasn't included already for some reason. #include "binary_space_tree.hpp" //#include //#include #include namespace mlpack { namespace tree { // Each of these overloads is kept as a separate function to keep the overhead // from the two std::vectors out, if possible. template BinarySpaceTree::BinarySpaceTree( MatType& data, const size_t maxLeafSize) : left(NULL), right(NULL), parent(NULL), begin(0), /* This root node starts at index 0, */ count(data.n_cols), /* and spans all of the dataset. */ maxLeafSize(maxLeafSize), bound(data.n_rows), parentDistance(0), // Parent distance for the root is 0: it has no parent. dataset(data) { // Do the actual splitting of this node. SplitNode(data); // Create the statistic depending on if we are a leaf or not. stat = StatisticType(*this); } template BinarySpaceTree::BinarySpaceTree( MatType& data, std::vector& oldFromNew, const size_t maxLeafSize) : left(NULL), right(NULL), parent(NULL), begin(0), count(data.n_cols), maxLeafSize(maxLeafSize), bound(data.n_rows), parentDistance(0), // Parent distance for the root is 0: it has no parent. dataset(data) { // Initialize oldFromNew correctly. oldFromNew.resize(data.n_cols); for (size_t i = 0; i < data.n_cols; i++) oldFromNew[i] = i; // Fill with unharmed indices. // Now do the actual splitting. SplitNode(data, oldFromNew); // Create the statistic depending on if we are a leaf or not. stat = StatisticType(*this); } template BinarySpaceTree::BinarySpaceTree( MatType& data, std::vector& oldFromNew, std::vector& newFromOld, const size_t maxLeafSize) : left(NULL), right(NULL), parent(NULL), begin(0), count(data.n_cols), maxLeafSize(maxLeafSize), bound(data.n_rows), parentDistance(0), // Parent distance for the root is 0: it has no parent. dataset(data) { // Initialize the oldFromNew vector correctly. 
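// (The mapping starts as the identity permutation; SplitNode() then swaps entries of oldFromNew in lockstep with the column swaps it performs on the data matrix, so the map stays consistent.)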
oldFromNew.resize(data.n_cols); for (size_t i = 0; i < data.n_cols; i++) oldFromNew[i] = i; // Fill with unharmed indices. // Now do the actual splitting. SplitNode(data, oldFromNew); // Create the statistic depending on if we are a leaf or not. stat = StatisticType(*this); // Map the newFromOld indices correctly. newFromOld.resize(data.n_cols); for (size_t i = 0; i < data.n_cols; i++) newFromOld[oldFromNew[i]] = i; } template BinarySpaceTree::BinarySpaceTree( MatType& data, const size_t begin, const size_t count, BinarySpaceTree* parent, const size_t maxLeafSize) : left(NULL), right(NULL), parent(parent), begin(begin), count(count), maxLeafSize(maxLeafSize), bound(data.n_rows), dataset(data) { // Perform the actual splitting. SplitNode(data); // Create the statistic depending on if we are a leaf or not. stat = StatisticType(*this); } template BinarySpaceTree::BinarySpaceTree( MatType& data, const size_t begin, const size_t count, std::vector& oldFromNew, BinarySpaceTree* parent, const size_t maxLeafSize) : left(NULL), right(NULL), parent(parent), begin(begin), count(count), maxLeafSize(maxLeafSize), bound(data.n_rows), dataset(data) { // Hopefully the vector is initialized correctly! We can't check that // entirely but we can do a minor sanity check. assert(oldFromNew.size() == data.n_cols); // Perform the actual splitting. SplitNode(data, oldFromNew); // Create the statistic depending on if we are a leaf or not. stat = StatisticType(*this); } template BinarySpaceTree::BinarySpaceTree( MatType& data, const size_t begin, const size_t count, std::vector& oldFromNew, std::vector& newFromOld, BinarySpaceTree* parent, const size_t maxLeafSize) : left(NULL), right(NULL), parent(parent), begin(begin), count(count), maxLeafSize(maxLeafSize), bound(data.n_rows), dataset(data) { // Hopefully the vector is initialized correctly! We can't check that // entirely but we can do a minor sanity check. //Log::Assert(oldFromNew.size() == data.n_cols); // Perform the actual splitting. SplitNode(data, oldFromNew); // Create the statistic depending on if we are a leaf or not. stat = StatisticType(*this); // Map the newFromOld indices correctly. newFromOld.resize(data.n_cols); for (size_t i = 0; i < data.n_cols; i++) newFromOld[oldFromNew[i]] = i; } /* template BinarySpaceTree::BinarySpaceTree() : left(NULL), right(NULL), parent(NULL), begin(0), count(0), bound(), stat(), maxLeafSize(20) // Default max leaf size is 20. { // Nothing to do. }*/ /** * Create a binary space tree by copying the other tree. Be careful! This can * take a long time and use a lot of memory. */ template BinarySpaceTree::BinarySpaceTree( const BinarySpaceTree& other) : left(NULL), right(NULL), parent(other.parent), begin(other.begin), count(other.count), maxLeafSize(other.maxLeafSize), bound(other.bound), stat(other.stat), splitDimension(other.splitDimension), parentDistance(other.parentDistance), furthestDescendantDistance(other.furthestDescendantDistance), dataset(other.dataset) { // Create left and right children (if any). if (other.Left()) { left = new BinarySpaceTree(*other.Left()); left->Parent() = this; // Set parent to this, not other tree. } if (other.Right()) { right = new BinarySpaceTree(*other.Right()); right->Parent() = this; // Set parent to this, not other tree. } } /** * Deletes this node, deallocating the memory for the children and calling their * destructors in turn. This will invalidate any pointers or references to any * nodes which are children of this one. 
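* * (Because each child deletes its own children in turn, deleting the root node is enough to free the whole tree.)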
*/ template BinarySpaceTree:: ~BinarySpaceTree() { if (left) delete left; if (right) delete right; } /** * Find a node in this tree by its begin and count. * * Every node is uniquely identified by these two numbers. * This is useful for communicating position over the network, * when pointers would be invalid. * * @param queryBegin The Begin() of the node to find. * @param queryCount The Count() of the node to find. * @return The found node, or NULL if nothing is found. */ template const BinarySpaceTree* BinarySpaceTree::FindByBeginCount( size_t queryBegin, size_t queryCount) const { //Log::Assert(queryBegin >= begin); //Log::Assert(queryCount <= count); if (begin == queryBegin && count == queryCount) return this; else if (IsLeaf()) return NULL; else if (queryBegin < right->Begin()) return left->FindByBeginCount(queryBegin, queryCount); else return right->FindByBeginCount(queryBegin, queryCount); } /** * Find a node in this tree by its begin and count (const). * * Every node is uniquely identified by these two numbers. * This is useful for communicating position over the network, * when pointers would be invalid. * * @param queryBegin the Begin() of the node to find * @param queryCount the Count() of the node to find * @return the found node, or NULL */ template BinarySpaceTree* BinarySpaceTree::FindByBeginCount( const size_t queryBegin, const size_t queryCount) { //mlpack::Log::Assert(begin >= queryBegin); //mlpack::Log::Assert(count <= queryCount); if (begin == queryBegin && count == queryCount) return this; else if (IsLeaf()) return NULL; else if (queryBegin < left->End()) return left->FindByBeginCount(queryBegin, queryCount); else if (right) return right->FindByBeginCount(queryBegin, queryCount); else return NULL; } template size_t BinarySpaceTree::ExtendTree( size_t level) { --level; // Return the number of nodes duplicated. size_t nodesDuplicated = 0; if (level > 0) { if (!left) { left = CopyMe(); ++nodesDuplicated; } nodesDuplicated += left->ExtendTree(level); if (right) { nodesDuplicated += right->ExtendTree(level); } } return nodesDuplicated; } /* TODO: we can likely calculate this earlier, then store the * result in a private member variable; for now, we can * just calculate as needed... * * Also, perhaps we should rewrite these recursive functions * to avoid exceeding the stack limit */ template size_t BinarySpaceTree:: TreeSize() const { // Recursively count the nodes on each side of the tree. The plus one is // because we have to count this node, too. return 1 + (left ? left->TreeSize() : 0) + (right ? right->TreeSize() : 0); } template size_t BinarySpaceTree:: TreeDepth() const { // Recursively count the depth on each side of the tree. The plus one is // because we have to count this node, too. return 1 + std::max((left ? left->TreeDepth() : 0), (right ? right->TreeDepth() : 0)); } template inline bool BinarySpaceTree:: IsLeaf() const { return !left; } /** * Returns the number of children in this node. */ template inline size_t BinarySpaceTree:: NumChildren() const { if (left && right) return 2; if (left) return 1; return 0; } /** * Return a bound on the furthest point in the node from the centroid. This * returns 0 unless the node is a leaf. */ template inline double BinarySpaceTree:: FurthestPointDistance() const { if (!IsLeaf()) return 0.0; // Otherwise return the distance from the centroid to a corner of the bound. return 0.5 * bound.Diameter(); } /** * Return the furthest possible descendant distance. 
This returns the maximum * distance from the centroid to the edge of the bound and not the empirical * quantity which is the actual furthest descendant distance. So the actual * furthest descendant distance may be less than what this method returns (but * it will never be greater than this). */ template inline double BinarySpaceTree:: FurthestDescendantDistance() const { return furthestDescendantDistance; } //! Return the minimum distance from the center to any bound edge. template inline double BinarySpaceTree:: MinimumBoundDistance() const { return bound.MinWidth() / 2.0; } /** * Return the specified child. */ template inline BinarySpaceTree& BinarySpaceTree:: Child(const size_t child) const { if (child == 0) return *left; else return *right; } /** * Return the number of points contained in this node. */ template inline size_t BinarySpaceTree:: NumPoints() const { if (left) return 0; return count; } /** * Return the number of descendants contained in the node. */ template inline size_t BinarySpaceTree:: NumDescendants() const { return count; } /** * Return the index of a particular descendant contained in this node. */ template inline size_t BinarySpaceTree:: Descendant(const size_t index) const { return (begin + index); } /** * Return the index of a particular point contained in this node. */ template inline size_t BinarySpaceTree:: Point(const size_t index) const { return (begin + index); } /** * Gets the index one beyond the last index in the series. */ template inline size_t BinarySpaceTree:: End() const { return begin + count; } template void BinarySpaceTree::SplitNode( MatType& data) { // We need to expand the bounds of this node properly. bound |= data.cols(begin, begin + count - 1); // Calculate the furthest descendant distance. furthestDescendantDistance = 0.5 * bound.Diameter(); // Now, check if we need to split at all. if (count <= maxLeafSize) return; // We can't split this. // splitCol denotes the two partitions of the dataset after the split. The // points on its left go to the left child and the others go to the right // child. size_t splitCol; // Split the node. The elements of 'data' are reordered by the splitting // algorithm. This function call updates splitDimension and splitCol. const bool split = SplitType::SplitNode(bound, data, begin, count, splitDimension, splitCol); // The node may not be always split. For instance, if all the points are the // same, we can't split them. if (!split) return; // Now that we know the split column, we will recursively split the children // by calling their constructors (which perform this splitting process). left = new BinarySpaceTree(data, begin, splitCol - begin, this, maxLeafSize); right = new BinarySpaceTree(data, splitCol, begin + count - splitCol, this, maxLeafSize); // Calculate parent distances for those two nodes. arma::vec centroid, leftCentroid, rightCentroid; Centroid(centroid); left->Centroid(leftCentroid); right->Centroid(rightCentroid); const double leftParentDistance = bound.Metric().Evaluate(centroid, leftCentroid); const double rightParentDistance = bound.Metric().Evaluate(centroid, rightCentroid); left->ParentDistance() = leftParentDistance; right->ParentDistance() = rightParentDistance; } template void BinarySpaceTree::SplitNode( MatType& data, std::vector& oldFromNew) { // This should be a single function for Bound. // We need to expand the bounds of this node properly. bound |= data.cols(begin, begin + count - 1); // Calculate the furthest descendant distance. 
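// (Half the bound's diameter upper-bounds the distance from the bound's centroid to any point inside it, so this may over-estimate, but never under-estimate, the true furthest descendant distance.)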
furthestDescendantDistance = 0.5 * bound.Diameter(); // First, check if we need to split at all. if (count <= maxLeafSize) return; // We can't split this. // splitCol denotes the two partitions of the dataset after the split. The // points on its left go to the left child and the others go to the right // child. size_t splitCol; // Split the node. The elements of 'data' are reordered by the splitting // algorithm. This function call updates splitDimension, splitCol and // oldFromNew. const bool split = SplitType::SplitNode(bound, data, begin, count, splitDimension, splitCol, oldFromNew); // The node may not be always split. For instance, if all the points are the // same, we can't split them. if (!split) return; // Now that we know the split column, we will recursively split the children // by calling their constructors (which perform this splitting process). left = new BinarySpaceTree(data, begin, splitCol - begin, oldFromNew, this, maxLeafSize); right = new BinarySpaceTree(data, splitCol, begin + count - splitCol, oldFromNew, this, maxLeafSize); // Calculate parent distances for those two nodes. arma::vec centroid, leftCentroid, rightCentroid; Centroid(centroid); left->Centroid(leftCentroid); right->Centroid(rightCentroid); const double leftParentDistance = bound.Metric().Evaluate(centroid, leftCentroid); const double rightParentDistance = bound.Metric().Evaluate(centroid, rightCentroid); left->ParentDistance() = leftParentDistance; right->ParentDistance() = rightParentDistance; } /** * Returns a string representation of this object. */ template std::string BinarySpaceTree:: ToString() const { std::ostringstream convert; convert << "BinarySpaceTree [" << this << "]" << std::endl; convert << " First point: " << begin << std::endl; convert << " Number of descendants: " << count << std::endl; convert << " Bound: " << std::endl; convert << mlpack::util::Indent(bound.ToString(), 2); convert << " Statistic: " << std::endl; convert << mlpack::util::Indent(stat.ToString(), 2); convert << " Max leaf size: " << maxLeafSize << std::endl; convert << " Split dimension: " << splitDimension << std::endl; // How many levels should we print? This will print the top two tree levels. if (left != NULL && parent == NULL) { convert << " Left child:" << std::endl; convert << mlpack::util::Indent(left->ToString(), 2); } if (right != NULL && parent == NULL) { convert << " Right child:" << std::endl; convert << mlpack::util::Indent(right->ToString(), 2); } return convert.str(); } }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/tree/binary_space_tree/single_tree_traverser_impl.hpp0000644000176200001440000001012613647512751027613 0ustar liggesusers/** * @file single_tree_traverser_impl.hpp * @author Ryan Curtin * * A nested class of BinarySpaceTree which traverses the entire tree with a * given set of rules which indicate the branches which can be pruned and the * order in which to recurse. This traverser is a depth-first traverser. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). 
* * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_SINGLE_TREE_TRAVERSER_IMPL_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_SINGLE_TREE_TRAVERSER_IMPL_HPP // In case it hasn't been included yet. #include "single_tree_traverser.hpp" #include namespace mlpack { namespace tree { template template BinarySpaceTree:: SingleTreeTraverser::SingleTreeTraverser(RuleType& rule) : rule(rule), numPrunes(0) { /* Nothing to do. */ } template template void BinarySpaceTree:: SingleTreeTraverser::Traverse( const size_t queryIndex, BinarySpaceTree& referenceNode) { // If this is a leaf, the base cases have already been run. if (referenceNode.IsLeaf()) return; // If either score is DBL_MAX, we do not recurse into that node. double leftScore = rule.Score(queryIndex, *referenceNode.Left()); // Immediately run the base case if it's not pruned. if ((leftScore != DBL_MAX) && (referenceNode.Left()->IsLeaf())) { for (size_t i = referenceNode.Left()->Begin(); i < referenceNode.Left()->End(); ++i) rule.BaseCase(queryIndex, i); } double rightScore = rule.Score(queryIndex, *referenceNode.Right()); // Immediately run the base case if it's not pruned. if ((rightScore != DBL_MAX) && (referenceNode.Right()->IsLeaf())) { for (size_t i = referenceNode.Right()->Begin(); i < referenceNode.Right()->End(); ++i) rule.BaseCase(queryIndex, i); } if (leftScore < rightScore) { // Recurse to the left. Traverse(queryIndex, *referenceNode.Left()); // Is it still valid to recurse to the right? rightScore = rule.Rescore(queryIndex, *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) Traverse(queryIndex, *referenceNode.Right()); // Recurse to the right. else ++numPrunes; } else if (rightScore < leftScore) { // Recurse to the right. Traverse(queryIndex, *referenceNode.Right()); // Is it still valid to recurse to the left? leftScore = rule.Rescore(queryIndex, *referenceNode.Left(), leftScore); if (leftScore != DBL_MAX) Traverse(queryIndex, *referenceNode.Left()); // Recurse to the left. else ++numPrunes; } else // leftScore is equal to rightScore. { if (leftScore == DBL_MAX) { numPrunes += 2; // Pruned both left and right. } else { // Choose the left first. Traverse(queryIndex, *referenceNode.Left()); // Is it still valid to recurse to the right? rightScore = rule.Rescore(queryIndex, *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) Traverse(queryIndex, *referenceNode.Right()); else ++numPrunes; } } } }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/tree/binary_space_tree/traits.hpp0000644000176200001440000000421713647512751023507 0ustar liggesusers/** * @file traits.hpp * @author Ryan Curtin * * Specialization of the TreeTraits class for the BinarySpaceTree type of tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
*/ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_TRAITS_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_TRAITS_HPP #include namespace mlpack { namespace tree { /** * This is a specialization of the TreeType class to the BinarySpaceTree tree * type. It defines characteristics of the binary space tree, and is used to * help write tree-independent (but still optimized) tree-based algorithms. See * mlpack/core/tree/tree_traits.hpp for more information. */ template class TreeTraits > { public: /** * Each binary space tree node has two children which represent * non-overlapping subsets of the space which the node represents. Therefore, * children are not overlapping. */ static const bool HasOverlappingChildren = false; /** * There is no guarantee that the first point in a node is its centroid. */ static const bool FirstPointIsCentroid = false; /** * Points are not contained at multiple levels of the binary space tree. */ static const bool HasSelfChildren = false; /** * Points are rearranged during building of the tree. */ static const bool RearrangesDataset = true; }; }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/tree/binary_space_tree/dual_tree_traverser_impl.hpp0000644000176200001440000002554013647512751027265 0ustar liggesusers/** * @file dual_tree_traverser_impl.hpp * @author Ryan Curtin * * Implementation of the DualTreeTraverser for BinarySpaceTree. This is a way * to perform a dual-tree traversal of two trees. The trees must be the same * type. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_DUAL_TREE_TRAVERSER_IMPL_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_DUAL_TREE_TRAVERSER_IMPL_HPP // In case it hasn't been included yet. #include "dual_tree_traverser.hpp" namespace mlpack { namespace tree { template template BinarySpaceTree:: DualTreeTraverser::DualTreeTraverser(RuleType& rule) : rule(rule), numPrunes(0), numVisited(0), numScores(0), numBaseCases(0) { /* Nothing to do. */ } template template void BinarySpaceTree:: DualTreeTraverser::Traverse( BinarySpaceTree& queryNode, BinarySpaceTree& referenceNode) { // Increment the visit counter. ++numVisited; // Store the current traversal info. traversalInfo = rule.TraversalInfo(); // If both are leaves, we must evaluate the base case. if (queryNode.IsLeaf() && referenceNode.IsLeaf()) { // Loop through each of the points in each node. for (size_t query = queryNode.Begin(); query < queryNode.End(); ++query) { // See if we need to investigate this point (this function should be // implemented for the single-tree recursion too). Restore the traversal // information first. rule.TraversalInfo() = traversalInfo; const double childScore = rule.Score(query, referenceNode); if (childScore == DBL_MAX) continue; // We can't improve this particular point. 
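// (A DBL_MAX score prunes every base case between this query point and the reference leaf in one shot.)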
for (size_t ref = referenceNode.Begin(); ref < referenceNode.End(); ++ref) rule.BaseCase(query, ref); numBaseCases += referenceNode.Count(); } } else if ((!queryNode.IsLeaf()) && referenceNode.IsLeaf()) { // We have to recurse down the query node. In this case the recursion order // does not matter. const double leftScore = rule.Score(*queryNode.Left(), referenceNode); ++numScores; if (leftScore != DBL_MAX) Traverse(*queryNode.Left(), referenceNode); else ++numPrunes; // Before recursing, we have to set the traversal information correctly. rule.TraversalInfo() = traversalInfo; const double rightScore = rule.Score(*queryNode.Right(), referenceNode); ++numScores; if (rightScore != DBL_MAX) Traverse(*queryNode.Right(), referenceNode); else ++numPrunes; } else if (queryNode.IsLeaf() && (!referenceNode.IsLeaf())) { // We have to recurse down the reference node. In this case the recursion // order does matter. Before recursing, though, we have to set the // traversal information correctly. double leftScore = rule.Score(queryNode, *referenceNode.Left()); typename RuleType::TraversalInfoType leftInfo = rule.TraversalInfo(); rule.TraversalInfo() = traversalInfo; double rightScore = rule.Score(queryNode, *referenceNode.Right()); numScores += 2; if (leftScore < rightScore) { // Recurse to the left. Restore the left traversal info. Store the right // traversal info. traversalInfo = rule.TraversalInfo(); rule.TraversalInfo() = leftInfo; Traverse(queryNode, *referenceNode.Left()); // Is it still valid to recurse to the right? rightScore = rule.Rescore(queryNode, *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) { // Restore the right traversal info. rule.TraversalInfo() = traversalInfo; Traverse(queryNode, *referenceNode.Right()); } else ++numPrunes; } else if (rightScore < leftScore) { // Recurse to the right. Traverse(queryNode, *referenceNode.Right()); // Is it still valid to recurse to the left? leftScore = rule.Rescore(queryNode, *referenceNode.Left(), leftScore); if (leftScore != DBL_MAX) { // Restore the left traversal info. rule.TraversalInfo() = leftInfo; Traverse(queryNode, *referenceNode.Left()); } else ++numPrunes; } else // leftScore is equal to rightScore. { if (leftScore == DBL_MAX) { numPrunes += 2; } else { // Choose the left first. Restore the left traversal info. Store the // right traversal info. traversalInfo = rule.TraversalInfo(); rule.TraversalInfo() = leftInfo; Traverse(queryNode, *referenceNode.Left()); rightScore = rule.Rescore(queryNode, *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) { // Restore the right traversal info. rule.TraversalInfo() = traversalInfo; Traverse(queryNode, *referenceNode.Right()); } else ++numPrunes; } } } else { // We have to recurse down both query and reference nodes. Because the // query descent order does not matter, we will go to the left query child // first. Before recursing, we have to set the traversal information // correctly. double leftScore = rule.Score(*queryNode.Left(), *referenceNode.Left()); typename RuleType::TraversalInfoType leftInfo = rule.TraversalInfo(); rule.TraversalInfo() = traversalInfo; double rightScore = rule.Score(*queryNode.Left(), *referenceNode.Right()); typename RuleType::TraversalInfoType rightInfo; numScores += 2; if (leftScore < rightScore) { // Recurse to the left. Restore the left traversal info. Store the right // traversal info. 
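// (Score() leaves its bookkeeping in rule.TraversalInfo(), so each child's info must be stashed before the other child's Score() or Traverse() call overwrites it.)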
rightInfo = rule.TraversalInfo(); rule.TraversalInfo() = leftInfo; Traverse(*queryNode.Left(), *referenceNode.Left()); // Is it still valid to recurse to the right? rightScore = rule.Rescore(*queryNode.Left(), *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) { // Restore the right traversal info. rule.TraversalInfo() = rightInfo; Traverse(*queryNode.Left(), *referenceNode.Right()); } else ++numPrunes; } else if (rightScore < leftScore) { // Recurse to the right. Traverse(*queryNode.Left(), *referenceNode.Right()); // Is it still valid to recurse to the left? leftScore = rule.Rescore(*queryNode.Left(), *referenceNode.Left(), leftScore); if (leftScore != DBL_MAX) { // Restore the left traversal info. rule.TraversalInfo() = leftInfo; Traverse(*queryNode.Left(), *referenceNode.Left()); } else ++numPrunes; } else { if (leftScore == DBL_MAX) { numPrunes += 2; } else { // Choose the left first. Restore the left traversal info and store the // right traversal info. rightInfo = rule.TraversalInfo(); rule.TraversalInfo() = leftInfo; Traverse(*queryNode.Left(), *referenceNode.Left()); // Is it still valid to recurse to the right? rightScore = rule.Rescore(*queryNode.Left(), *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) { // Restore the right traversal information. rule.TraversalInfo() = rightInfo; Traverse(*queryNode.Left(), *referenceNode.Right()); } else ++numPrunes; } } // Restore the main traversal information. rule.TraversalInfo() = traversalInfo; // Now recurse down the right query node. leftScore = rule.Score(*queryNode.Right(), *referenceNode.Left()); leftInfo = rule.TraversalInfo(); rule.TraversalInfo() = traversalInfo; rightScore = rule.Score(*queryNode.Right(), *referenceNode.Right()); numScores += 2; if (leftScore < rightScore) { // Recurse to the left. Restore the left traversal info. Store the right // traversal info. rightInfo = rule.TraversalInfo(); rule.TraversalInfo() = leftInfo; Traverse(*queryNode.Right(), *referenceNode.Left()); // Is it still valid to recurse to the right? rightScore = rule.Rescore(*queryNode.Right(), *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) { // Restore the right traversal info. rule.TraversalInfo() = rightInfo; Traverse(*queryNode.Right(), *referenceNode.Right()); } else ++numPrunes; } else if (rightScore < leftScore) { // Recurse to the right. Traverse(*queryNode.Right(), *referenceNode.Right()); // Is it still valid to recurse to the left? leftScore = rule.Rescore(*queryNode.Right(), *referenceNode.Left(), leftScore); if (leftScore != DBL_MAX) { // Restore the left traversal info. rule.TraversalInfo() = leftInfo; Traverse(*queryNode.Right(), *referenceNode.Left()); } else ++numPrunes; } else { if (leftScore == DBL_MAX) { numPrunes += 2; } else { // Choose the left first. Restore the left traversal info. Store the // right traversal info. rightInfo = rule.TraversalInfo(); rule.TraversalInfo() = leftInfo; Traverse(*queryNode.Right(), *referenceNode.Left()); // Is it still valid to recurse to the right? rightScore = rule.Rescore(*queryNode.Right(), *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) { // Restore the right traversal info. 
rule.TraversalInfo() = rightInfo; Traverse(*queryNode.Right(), *referenceNode.Right()); } else ++numPrunes; } } } } }; // namespace tree }; // namespace mlpack #endif // __MLPACK_CORE_TREE_BINARY_SPACE_TREE_DUAL_TREE_TRAVERSER_IMPL_HPP RcppMLPACK/src/mlpack/core/tree/rectangle_tree.hpp0000644000176200001440000000313313647512751021502 0ustar liggesusers/** * @file rectangle_tree.hpp * @author Andrew Wells * * Include all the necessary files to use the Rectangle Type Trees (RTree, RStarTree, XTree, * and HilbertRTree). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_RECTANGLE_TREE_HPP #define __MLPACK_CORE_TREE_RECTANGLE_TREE_HPP /* We include bounds.hpp since it gives us the necessary files. * However, we will not use the "ballbounds" option. */ #include "bounds.hpp" #include "rectangle_tree/rectangle_tree.hpp" #include "rectangle_tree/single_tree_traverser.hpp" #include "rectangle_tree/single_tree_traverser_impl.hpp" #include "rectangle_tree/dual_tree_traverser.hpp" #include "rectangle_tree/dual_tree_traverser_impl.hpp" #include "rectangle_tree/r_tree_split.hpp" #include "rectangle_tree/r_star_tree_split.hpp" #include "rectangle_tree/r_tree_descent_heuristic.hpp" #include "rectangle_tree/r_star_tree_descent_heuristic.hpp" #include "rectangle_tree/traits.hpp" #endif RcppMLPACK/src/mlpack/core/tree/statistic.hpp0000644000176200001440000000361013647512751020526 0ustar liggesusers/** * @file statistic.hpp * * Definition of the policy type for the statistic class. * * You should define your own statistic that looks like EmptyStatistic. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_STATISTIC_HPP #define __MLPACK_CORE_TREE_STATISTIC_HPP namespace mlpack { namespace tree { /** * Empty statistic if you are not interested in storing statistics in your * tree. Use this as a template for your own. */ class EmptyStatistic { public: EmptyStatistic() { } ~EmptyStatistic() { } /** * This constructor is called when a node is finished being created. The * node is finished, and its children are finished, but it is not * necessarily true that the statistics of other nodes are initialized yet. * * @param node Node which this corresponds to. */ template EmptyStatistic(TreeType& /* node */) { } public: /** * Returns a string representation of this object.
*/ std::string ToString() const { std::stringstream convert; convert << "EmptyStatistic [" << this << "]" << std::endl; return convert.str(); } }; }; // namespace tree }; // namespace mlpack #endif // __MLPACK_CORE_TREE_STATISTIC_HPP RcppMLPACK/src/mlpack/core/tree/bounds.hpp0000644000176200001440000000202713647512751020012 0ustar liggesusers/** * @file bounds.hpp * * Bounds that are useful for binary space partitioning trees. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BOUNDS_HPP #define __MLPACK_CORE_TREE_BOUNDS_HPP #include #include #include "hrectbound.hpp" #include "ballbound.hpp" #endif // __MLPACK_CORE_TREE_BOUNDS_HPP RcppMLPACK/src/mlpack/core/tree/ballbound_impl.hpp0000644000176200001440000001761513647512751021514 0ustar liggesusers/** * @file ballbound_impl.hpp * * Bounds that are useful for binary space partitioning trees. * Implementation of BallBound ball bound metric policy class. * * @experimental * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BALLBOUND_IMPL_HPP #define __MLPACK_CORE_TREE_BALLBOUND_IMPL_HPP // In case it hasn't been included already. #include "ballbound.hpp" #include namespace mlpack { namespace bound { //! Empty Constructor. template BallBound::BallBound() : radius(-DBL_MAX), metric(new TMetricType()), ownsMetric(true) { /* Nothing to do. */ } /** * Create the ball bound with the specified dimensionality. * * @param dimension Dimensionality of ball bound. */ template BallBound::BallBound(const size_t dimension) : radius(-DBL_MAX), center(dimension), metric(new TMetricType()), ownsMetric(true) { /* Nothing to do. */ } /** * Create the ball bound with the specified radius and center. * * @param radius Radius of ball bound. * @param center Center of ball bound. */ template BallBound::BallBound(const double radius, const VecType& center) : radius(radius), center(center), metric(new TMetricType()), ownsMetric(true) { /* Nothing to do. */ } //! Copy Constructor. To prevent memory leaks. template BallBound::BallBound(const BallBound& other) : radius(other.radius), center(other.center), metric(other.metric), ownsMetric(false) { /* Nothing to do. */ } //! For the same reason as the Copy Constructor. To prevent memory leaks. 
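// Note that, as with the copy constructor above, assignment shares the other
// bound's metric pointer rather than deep-copying it; ownsMetric is set to
// false so that this object never deletes a metric it does not own.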
template BallBound& BallBound::operator=( const BallBound& other) { radius = other.radius; center = other.center; metric = other.metric; ownsMetric = false; return *this; } //! Destructor to release allocated memory. template BallBound::~BallBound() { if (ownsMetric) delete metric; } //! Get the range in a certain dimension. template math::Range BallBound::operator[](const size_t i) const { if (radius < 0) return math::Range(); else return math::Range(center[i] - radius, center[i] + radius); } /** * Determines if a point is within the bound. */ template bool BallBound::Contains(const VecType& point) const { if (radius < 0) return false; else return metric->Evaluate(center, point) <= radius; } /** * Calculates minimum bound-to-point squared distance. */ template template double BallBound::MinDistance( const OtherVecType& point, typename boost::enable_if >* /* junk */) const { if (radius < 0) return DBL_MAX; else return math::ClampNonNegative(metric->Evaluate(point, center) - radius); } /** * Calculates minimum bound-to-bound squared distance. */ template double BallBound::MinDistance(const BallBound& other) const { if (radius < 0) return DBL_MAX; else { const double delta = metric->Evaluate(center, other.center) - radius - other.radius; return math::ClampNonNegative(delta); } } /** * Computes maximum distance. */ template template double BallBound::MaxDistance( const OtherVecType& point, typename boost::enable_if >* /* junk */) const { if (radius < 0) return DBL_MAX; else return metric->Evaluate(point, center) + radius; } /** * Computes maximum distance. */ template double BallBound::MaxDistance(const BallBound& other) const { if (radius < 0) return DBL_MAX; else return metric->Evaluate(other.center, center) + radius + other.radius; } /** * Calculates minimum and maximum bound-to-point distance. */ template template math::Range BallBound::RangeDistance( const OtherVecType& point, typename boost::enable_if >* /* junk */) const { if (radius < 0) return math::Range(DBL_MAX, DBL_MAX); else { const double dist = metric->Evaluate(center, point); return math::Range(math::ClampNonNegative(dist - radius), dist + radius); } } template math::Range BallBound::RangeDistance( const BallBound& other) const { if (radius < 0) return math::Range(DBL_MAX, DBL_MAX); else { const double dist = metric->Evaluate(center, other.center); const double sumradius = radius + other.radius; return math::Range(math::ClampNonNegative(dist - sumradius), dist + sumradius); } } /** * Expand the bound to include the given bound. * template const BallBound& BallBound::operator|=( const BallBound& other) { double dist = metric->Evaluate(center, other); // Now expand the radius as necessary. if (dist > radius) radius = dist; return *this; }*/ /** * Expand the bound to include the given point. Algorithm adapted from * Jack Ritter, "An Efficient Bounding Sphere" in Graphics Gems (1990). * The difference lies in the way we initialize the ball bound. The way we * expand the bound is the same. */ template template const BallBound& BallBound::operator|=(const MatType& data) { if (radius < 0) { center = data.col(0); radius = 0; } // Now iteratively add points. for (size_t i = 0; i < data.n_cols; ++i) { const double dist = metric->Evaluate(center, (VecType) data.col(i)); // See if the new point lies outside the bound. if (dist > radius) { // Move towards the new point and increase the radius just enough to // accommodate the new point.
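// Ritter's update: shift the center toward the new point by half of the
// slack (dist - radius) and set the new radius to the average of dist and
// the old radius. The enlarged ball then touches the new point exactly
// while still containing all of the old ball.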
arma::vec diff = data.col(i) - center; center += ((dist - radius) / (2 * dist)) * diff; radius = 0.5 * (dist + radius); } } return *this; } /** * Returns a string representation of this object. */ template std::string BallBound::ToString() const { std::ostringstream convert; convert << "BallBound [" << this << "]" << std::endl; convert << " Radius: " << radius << std::endl; convert << " Center:" << std::endl << center; convert << " ownsMetric: " << ownsMetric << std::endl; convert << " Metric:" << std::endl << metric->ToString(); return convert.str(); } }; // namespace bound }; // namespace mlpack #endif // __MLPACK_CORE_TREE_DBALLBOUND_IMPL_HPP RcppMLPACK/src/mlpack/core/tree/example_tree.hpp0000644000176200001440000002503613647512751021177 0ustar liggesusers/** * @file example_tree.hpp * @author Ryan Curtin * * An example tree. This contains all the functions that mlpack trees must * implement (although the actual implementations here don't make any sense * because this is just an example). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_EXAMPLE_TREE_HPP #define __MLPACK_CORE_TREE_EXAMPLE_TREE_HPP namespace mlpack { namespace tree { /** * This is not an actual space tree but instead an example tree that exists to * show and document all the functions that mlpack trees must implement. For a * better overview of trees, see @ref trees. Also be aware that the * implementations of each of the methods in this example tree are entirely fake * and do not work; this example tree exists for its API, not its * implementation. * * Note that trees often have different properties. These properties are known * at compile-time through the mlpack::tree::TreeTraits class, and some * properties may imply the existence (or non-existence) of certain functions. * Refer to the TreeTraits for more documentation on that. * * The three template parameters below must be template parameters to the tree, * in the order given below. More template parameters are fine, but they must * come after the first three. * * @tparam MetricType This defines the space in which the tree will be built. * For some trees, arbitrary metrics cannot be used, and a template * metaprogramming approach should be used to issue a compile-time error if * a metric cannot be used with a specific tree type. One example is the * tree::BinarySpaceTree tree type, which cannot work with the * metric::IPMetric class. * @tparam StatisticType A tree node can hold a statistic, which is sometimes * useful for various dual-tree algorithms. The tree itself does not need * to know anything about how the statistic works, but it needs to hold a * StatisticType in each node. It can be assumed that the StatisticType * class has a constructor StatisticType(const ExampleTree&). * @tparam MatType A tree could be built on a dense matrix or a sparse matrix. * All mlpack trees should be able to support any Armadillo-compatible * matrix type. 
When the tree is written it should be assumed that MatType * has the same functionality as arma::mat. */ template, typename StatisticType = EmptyStatistic, typename MatType = arma::mat> class ExampleTree { public: /** * This constructor will build the tree given a dataset and an instantiated * metric. Note that the parameter is a MatType& and not an arma::mat&. The * dataset is not modified by the tree-building process (if it is, see the * documentation for mlpack::tree::TreeTraits::RearrangesDataset for how to * deal with that situation). The MetricType parameter is necessary even * though some metrics do not hold any state. This is so that the tree does * not have to worry about instantiating the metric (if the tree had to worry * about this, this would almost certainly incur additional runtime complexity * and a larger runtime size of the tree node objects, which is to be * avoided). The metric can't be const, in case MetricType::Evaluate() is * non-const. * * When this constructor is finished, the entire tree will be built and ready * to use. The constructor should call the constructor of the statistic for * each node that is built (see tree::EmptyStatistic for more information). * * @param dataset The dataset that the tree will be built on. * @param metric The instantiated metric to use to build the dataset. */ ExampleTree(const MatType& dataset, MetricType& metric); //! Return the number of children of this node. size_t NumChildren() const; //! Return a particular child of this node. const ExampleTree& Child(const size_t i) const; //! Modify a particular child of this node. ExampleTree& Child(const size_t i); //! Return the parent node (NULL if this is the root of the tree). ExampleTree* Parent() const; //! Return the number of points held in this node. size_t NumPoints() const; /** * Return the index of a particular point of this node. mlpack trees do not, * in general, hold the actual dataset, and instead just hold the indices of * the points they contain. Thus, you might use this function in code like * this: * * @code * arma::vec thirdPoint = dataset.col(treeNode.Point(2)); * @endcode */ size_t Point(const size_t i) const; /** * Get the number of descendant points. This is the number of unique points * held in this node plus the number of points held in all descendant nodes. * This could be calculated at build-time and cached, or could be calculated * at run-time. This may be harder to calculate for trees that may hold * points in multiple nodes (like cover trees and spill trees, for instance). */ size_t NumDescendants() const; /** * Get the index of a particular descendant point. The ordering of the * descendants does not matter, as long as calling Descendant(0) through * Descendant(NumDescendants() - 1) will return the indices of every * unique descendant point of the node. */ size_t Descendant(const size_t i) const; //! Get the statistic for this node. const StatisticType& Stat() const; //! Modify the statistic for this node. StatisticType& Stat(); //! Get the instantiated metric for this node. const MetricType& Metric() const; //! Modify the instantiated metric for this node. MetricType& Metric(); /** * Return the minimum distance between this node and a point. It is not * required that the exact minimum distance between the node and the point is * returned but instead a lower bound on the minimum distance will suffice. * See the definitions in @ref trees for more information. * * @param point Point to return [lower bound on] minimum distance to. 
*/ double MinDistance(const MatType& point) const; /** * Return the minimum distance between this node and another node. It is not * required that the exact minimum distance between the two nodes be returned * but instead a lower bound on the minimum distance will suffice. See the * definitions in @ref trees for more information. * * @param node Node to return [lower bound on] minimum distance to. */ double MinDistance(const ExampleTree& other) const; /** * Return the maximum distance between this node and a point. It is not * required that the exact maximum distance between the node and the point is * returned but instead an upper bound on the maximum distance will suffice. * See the definitions in @ref trees for more information. * * @param point Point to return [upper bound on] maximum distance to. */ double MaxDistance(const MatType& point) const; /** * Return the maximum distance between this node and another node. It is not * required that the exact maximum distance between the two nodes be returned * but instead an upper bound on the maximum distance will suffice. See the * definitions in @ref trees for more information. * * @param node Node to return [upper bound on] maximum distance to. */ double MaxDistance(const ExampleTree& other) const; /** * Return both the minimum and maximum distances between this node and a point * as a math::Range object. This overload is given because it is possible * that, for some tree types, calculation of both at once is faster than a * call to MinDistance() then MaxDistance(). It is not necessary that the * minimum and maximum distances be exact; it is sufficient to return a lower * bound on the minimum distance and an upper bound on the maximum distance. * See the definitions in @ref trees for more information. * * @param point Point to return [bounds on] minimum and maximum distances to. */ math::Range RangeDistance(const MatType& point) const; /** * Return both the minimum and maximum distances between this node and another * node as a math::Range object. This overload is given because it is * possible that, for some tree types, calculation of both at once is faster * than a call to MinDistance() then MaxDistance(). It is not necessary that * the minimum and maximum distances be exact; it is sufficient to return a * lower bound on the minimum distance and an upper bound on the maximum * distance. See the definitions in @ref trees for more information. * * @param node Node to return [bounds on] minimum and maximum distances to. */ math::Range RangeDistance(const ExampleTree& other) const; /** * Fill the given vector with the center of the node. * * @param centroid Vector to be filled with the center of the node. */ void Centroid(arma::vec& centroid) const; /** * Get the distance from the center of the node to the furthest descendant * point of this node. This does not necessarily need to be the exact * furthest descendant distance but instead can be an upper bound. See the * definitions in @ref trees for more information. */ double FurthestDescendantDistance() const; /** * Get the distance from the center of this node to the center of the parent * node. */ double ParentDistance() const; private: //! This member is just here so the ExampleTree compiles without warnings. It //! is not required to be a member in every type of tree. StatisticType stat; /** * This member is just here so the ExampleTree compiles without warnings. It * is not required to be a member in every type of tree. 
Be aware that * storing the metric as a member and not a reference may mean that some * metrics (such as metric::MahalanobisDistance in high dimensionality) * incur lots of unnecessary matrix copying. */ MetricType& metric; }; }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/tree/mrkd_statistic.hpp0000644000176200001440000000670213647512751021550 0ustar liggesusers/** * @file mrkd_statistic.hpp * @author James Cline * * Definition of the statistic for multi-resolution kd-trees. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_MRKD_STATISTIC_HPP #define __MLPACK_CORE_TREE_MRKD_STATISTIC_HPP #include namespace mlpack { namespace tree { /** * Statistic for multi-resolution kd-trees. */ class MRKDStatistic { public: //! Initialize an empty statistic. MRKDStatistic(); /** * This constructor is called when a node is finished initializing. * * @param node The node that has been finished. */ template MRKDStatistic(const TreeType& /* node */); /** * Returns a string representation of this object. */ std::string ToString() const; //! Get the index of the initial item in the dataset. size_t Begin() const { return begin; } //! Modify the index of the initial item in the dataset. size_t& Begin() { return begin; } //! Get the number of items in the dataset. size_t Count() const { return count; } //! Modify the number of items in the dataset. size_t& Count() { return count; } //! Get the center of mass. const arma::colvec& CenterOfMass() const { return centerOfMass; } //! Modify the center of mass. arma::colvec& CenterOfMass() { return centerOfMass; } //! Get the index of the dominating centroid. size_t DominatingCentroid() const { return dominatingCentroid; } //! Modify the index of the dominating centroid. size_t& DominatingCentroid() { return dominatingCentroid; } //! Access the whitelist. const std::vector& Whitelist() const { return whitelist; } //! Modify the whitelist. std::vector& Whitelist() { return whitelist; } private: //! The data points this object contains. const arma::mat* dataset; //! The initial item in the dataset, so we don't have to make a copy. size_t begin; //! The number of items in the dataset. size_t count; //! The left child. const MRKDStatistic* leftStat; //! The right child. const MRKDStatistic* rightStat; //! A link to the parent node; NULL if this is the root. const MRKDStatistic* parentStat; // Computed statistics. //! The center of mass for this dataset. arma::colvec centerOfMass; //! The sum of the squared Euclidean norms for this dataset. double sumOfSquaredNorms; // There may be a better place to store this -- HRectBound? //! The index of the dominating centroid of the associated hyperrectangle. size_t dominatingCentroid; //! The list of centroids that cannot own this hyperrectangle. std::vector whitelist; //! Whether or not the whitelist is valid.
bool isWhitelistValid; }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "mrkd_statistic_impl.hpp" #endif // __MLPACK_CORE_TREE_MRKD_STATISTIC_HPP RcppMLPACK/src/mlpack/core/tree/cosine_tree/0000755000176200001440000000000013647514343020304 5ustar liggesusersRcppMLPACK/src/mlpack/core/tree/cosine_tree/cosine_tree.cpp0000644000176200001440000003241213647514343023311 0ustar liggesusers/** * @file cosine_tree_impl.hpp * @author Siddharth Agrawal * * Implementation of cosine tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "cosine_tree.hpp" #include namespace mlpack { namespace tree { CosineTree::CosineTree(const arma::mat& dataset) : dataset(dataset), parent(NULL), right(NULL), left(NULL), numColumns(dataset.n_cols) { // Initialize sizes of column indices and l2 norms. indices.resize(numColumns); l2NormsSquared.zeros(numColumns); // Set indices and calculate squared norms of the columns. for(size_t i = 0; i < numColumns; i++) { indices[i] = i; double l2Norm = arma::norm(dataset.col(i), 2); l2NormsSquared(i) = l2Norm * l2Norm; } // Frobenius norm of columns in the node. frobNormSquared = arma::accu(l2NormsSquared); // Calculate centroid of columns in the node. CalculateCentroid(); splitPointIndex = ColumnSampleLS(); } CosineTree::CosineTree(CosineTree& parentNode, const std::vector& subIndices) : dataset(parentNode.GetDataset()), parent(&parentNode), right(NULL), left(NULL), numColumns(subIndices.size()) { // Initialize sizes of column indices and l2 norms. indices.resize(numColumns); l2NormsSquared.zeros(numColumns); // Set indices and squared norms of the columns. for(size_t i = 0; i < numColumns; i++) { indices[i] = parentNode.indices[subIndices[i]]; l2NormsSquared(i) = parentNode.l2NormsSquared(subIndices[i]); } // Frobenius norm of columns in the node. frobNormSquared = arma::accu(l2NormsSquared); // Calculate centroid of columns in the node. CalculateCentroid(); splitPointIndex = ColumnSampleLS(); } CosineTree::CosineTree(const arma::mat& dataset, const double epsilon, const double delta) : dataset(dataset), epsilon(epsilon), delta(delta) { // Declare the cosine tree priority queue. CosineNodeQueue treeQueue; // Define root node of the tree and add it to the queue. CosineTree root(dataset); arma::vec tempVector = arma::zeros(dataset.n_rows); root.L2Error(0); root.BasisVector(tempVector); treeQueue.push(&root); // Initialize Monte Carlo error estimate for comparison. double monteCarloError = root.FrobNormSquared(); while(monteCarloError > epsilon * root.FrobNormSquared()) { // Pop node from queue with highest projection error. CosineTree* currentNode; currentNode = treeQueue.top(); treeQueue.pop(); // Split the node into left and right children. currentNode->CosineNodeSplit(); // Obtain pointers to the left and right children of the current node. 
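// Each split removes one node from the basis frontier and adds its two
// children; their centroids, orthonormalized against the current basis
// below, become the next candidate basis vectors.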
CosineTree *currentLeft, *currentRight; currentLeft = currentNode->Left(); currentRight = currentNode->Right(); // Calculate basis vectors of left and right children. arma::vec lBasisVector, rBasisVector; ModifiedGramSchmidt(treeQueue, currentLeft->Centroid(), lBasisVector); ModifiedGramSchmidt(treeQueue, currentRight->Centroid(), rBasisVector, &lBasisVector); // Add basis vectors to their respective nodes. currentLeft->BasisVector(lBasisVector); currentRight->BasisVector(rBasisVector); // Calculate Monte Carlo error estimates for child nodes. MonteCarloError(currentLeft, treeQueue, &lBasisVector, &rBasisVector); MonteCarloError(currentRight, treeQueue, &lBasisVector, &rBasisVector); // Push child nodes into the priority queue. treeQueue.push(currentLeft); treeQueue.push(currentRight); // Calculate Monte Carlo error estimate for the root node. monteCarloError = MonteCarloError(&root, treeQueue); } // Construct the subspace basis from the current priority queue. ConstructBasis(treeQueue); } void CosineTree::ModifiedGramSchmidt(CosineNodeQueue& treeQueue, arma::vec& centroid, arma::vec& newBasisVector, arma::vec* addBasisVector) { // Set new basis vector to centroid. newBasisVector = centroid; // Variables for iterating through the priority queue. CosineTree *currentNode; CosineNodeQueue::const_iterator i = treeQueue.begin(); // For every vector in the current basis, remove its projection from the // centroid. for(; i != treeQueue.end(); i++) { currentNode = *i; double projection = arma::dot(currentNode->BasisVector(), centroid); newBasisVector -= projection * currentNode->BasisVector(); } // If an additional basis vector is passed, take it into account. if(addBasisVector) { double projection = arma::dot(*addBasisVector, centroid); newBasisVector -= *addBasisVector * projection; } // Normalize the modified centroid vector. if(arma::norm(newBasisVector, 2)) newBasisVector /= arma::norm(newBasisVector, 2); } double CosineTree::MonteCarloError(CosineTree* node, CosineNodeQueue& treeQueue, arma::vec* addBasisVector1, arma::vec* addBasisVector2) { std::vector sampledIndices; arma::vec probabilities; // Sample O(log m) points from the input node's distribution. // 'm' is the number of columns present in the node. size_t numSamples = (int)log((double)node->NumColumns()) + 1; node->ColumnSamplesLS(sampledIndices, probabilities, numSamples); // Get a reference to the original dataset (avoids copying the matrix). const arma::mat& dataset = node->GetDataset(); // Initialize weighted projection magnitudes as zeros. arma::vec weightedMagnitudes; weightedMagnitudes.zeros(numSamples); // Set size of projection vector, depending on whether additional basis // vectors are passed. size_t projectionSize; if(addBasisVector1 && addBasisVector2) projectionSize = treeQueue.size() + 2; else projectionSize = treeQueue.size(); // For each sample, calculate the weighted projection onto the current basis. for(size_t i = 0; i < numSamples; i++) { // Initialize projection as a vector of zeros. arma::vec projection; projection.zeros(projectionSize); CosineTree *currentNode; CosineNodeQueue::const_iterator j = treeQueue.begin(); size_t k = 0; // Compute the projection of the sampled vector onto the existing subspace. for(; j != treeQueue.end(); j++, k++) { currentNode = *j; projection(k) = arma::dot(dataset.col(sampledIndices[i]), currentNode->BasisVector()); } // If two additional vectors are passed, take their projections.
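// (These are the basis vectors of freshly split children that have not been
// pushed into the priority queue yet, so they cannot be reached through the
// queue iteration above and must be handled separately.)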
if(addBasisVector1 && addBasisVector2) { projection(k++) = arma::dot(dataset.col(sampledIndices[i]), *addBasisVector1); projection(k) = arma::dot(dataset.col(sampledIndices[i]), *addBasisVector2); } // Calculate the Frobenius norm squared of the projected vector. double frobProjection = arma::norm(projection, "frob"); double frobProjectionSquared = frobProjection * frobProjection; // Calculate the weighted projection magnitude. weightedMagnitudes(i) = frobProjectionSquared / probabilities(i); } // Compute mean and standard deviation of the weighted samples. double mu = arma::mean(weightedMagnitudes); double sigma = arma::stddev(weightedMagnitudes); if(!sigma) { node->L2Error(node->FrobNormSquared() - mu); return (node->FrobNormSquared() - mu); } // Fit a normal distribution using the calculated statistics, and calculate a // lower bound on the magnitudes for the passed 'delta' parameter. boost::math::normal dist(mu, sigma); double lowerBound = boost::math::quantile(dist, delta); // Upper bound on the subspace reconstruction error. node->L2Error(node->FrobNormSquared() - lowerBound); return (node->FrobNormSquared() - lowerBound); } void CosineTree::ConstructBasis(CosineNodeQueue& treeQueue) { // Initialize basis as matrix of zeros. basis.zeros(dataset.n_rows, treeQueue.size()); // Variables for iterating through the priority queue. CosineTree *currentNode; CosineNodeQueue::const_iterator i = treeQueue.begin(); // Transfer basis vectors from the queue to the basis matrix. size_t j = 0; for(; i != treeQueue.end(); i++, j++) { currentNode = *i; basis.col(j) = currentNode->BasisVector(); } } void CosineTree::CosineNodeSplit() { //! If the node has fewer than three columns, splitting does not make sense. if(numColumns < 3) return; //! Calculate cosines with respect to the splitting point. arma::vec cosines; CalculateCosines(cosines); //! Compute maximum and minimum cosine values. double cosineMax, cosineMin; cosineMax = arma::max(cosines % (cosines < 1)); cosineMin = arma::min(cosines); std::vector leftIndices, rightIndices; // Split columns into left and right children. The splitting condition for the // column to be in the left child is as follows: // cos_max - cos(i) <= cos(i) - cos_min for(size_t i = 0; i < numColumns; i++) { if(cosineMax - cosines(i) <= cosines(i) - cosineMin) { leftIndices.push_back(i); } else { rightIndices.push_back(i); } } // Split the node into left and right children. left = new CosineTree(*this, leftIndices); right = new CosineTree(*this, rightIndices); } void CosineTree::ColumnSamplesLS(std::vector& sampledIndices, arma::vec& probabilities, size_t numSamples) { // Initialize the cumulative distribution vector size. arma::vec cDistribution; cDistribution.zeros(numColumns + 1); // Calculate cumulative length-squared distribution for the node. for(size_t i = 0; i < numColumns; i++) { cDistribution(i+1) = cDistribution(i) + l2NormsSquared(i) / frobNormSquared; } // Initialize sizes of the 'sampledIndices' and 'probabilities' vectors. sampledIndices.resize(numSamples); probabilities.zeros(numSamples); for(size_t i = 0; i < numSamples; i++) { // Generate a random value for sampling. double randValue = arma::randu(); size_t start = 0, end = numColumns, searchIndex; // Sample from the distribution and store corresponding probability.
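// The stored probability is the sampled column's length-squared weight,
// ||a_i||^2 / ||A||_F^2; MonteCarloError() later divides each sampled
// projection magnitude by this weight to keep the estimate unbiased.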
searchIndex = BinarySearch(cDistribution, randValue, start, end); sampledIndices[i] = indices[searchIndex]; probabilities(i) = l2NormsSquared(searchIndex) / frobNormSquared; } } size_t CosineTree::ColumnSampleLS() { // If only one element is present, there can only be one sample. if(numColumns < 2) { return 0; } // Initialize the cumulative distribution vector size. arma::vec cDistribution; cDistribution.zeros(numColumns + 1); // Calculate cumulative length-squared distribution for the node. for(size_t i = 0; i < numColumns; i++) { cDistribution(i+1) = cDistribution(i) + l2NormsSquared(i) / frobNormSquared; } // Generate a random value for sampling. double randValue = arma::randu(); size_t start = 0, end = numColumns; // Sample from the distribution. return BinarySearch(cDistribution, randValue, start, end); } size_t CosineTree::BinarySearch(arma::vec& cDistribution, double value, size_t start, size_t end) { size_t pivot = (start + end) / 2; // If pivot is zero, first point is the sampled point. if(!pivot) { return pivot; } // Binary search recursive algorithm. if(value > cDistribution(pivot - 1) && value <= cDistribution(pivot)) { return (pivot - 1); } else if(value < cDistribution(pivot - 1)) { return BinarySearch(cDistribution, value, start, pivot - 1); } else { return BinarySearch(cDistribution, value, pivot + 1, end); } } void CosineTree::CalculateCosines(arma::vec& cosines) { // Initialize cosine vector as a vector of zeros. cosines.zeros(numColumns); for(size_t i = 0; i < numColumns; i++) { // If norm is zero, store cosine value as zero. Else, calculate cosine value // between two vectors. if(l2NormsSquared(i) == 0) { cosines(i) = 0; } else { cosines(i) = arma::norm_dot(dataset.col(indices[splitPointIndex]), dataset.col(indices[i])); } } } void CosineTree::CalculateCentroid() { // Initialize centroid as vector of zeros. centroid.zeros(dataset.n_rows); // Calculate centroid of columns in the node. for(size_t i = 0; i < numColumns; i++) { centroid += dataset.col(indices[i]); } centroid /= numColumns; } }; // namespace tree }; // namespace mlpack RcppMLPACK/src/mlpack/core/tree/cosine_tree/cosine_tree.hpp0000644000176200001440000002337313647512751023325 0ustar liggesusers/** * @file cosine_tree.hpp * @author Siddharth Agrawal * * Definition of Cosine Tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_COSINE_TREE_COSINE_TREE_HPP #define __MLPACK_CORE_TREE_COSINE_TREE_COSINE_TREE_HPP #include #include namespace mlpack { namespace tree { // Predeclare classes for CosineNodeQueue typedef. class CompareCosineNode; class CosineTree; // CosineNodeQueue typedef. typedef boost::heap::priority_queue > CosineNodeQueue; class CosineTree { public: /** * CosineTree constructor for the root node of the tree. It initializes the * necessary variables required for splitting of the node, and building the * tree further. 
It takes a reference to the input matrix and calculates the * relevant variables using it. * * @param dataset Matrix for which cosine tree is constructed. */ CosineTree(const arma::mat& dataset); /** * CosineTree constructor for nodes other than the root node of the tree. It * takes in a reference to the parent node and a list of column indices which * specifies the columns to be included in the node. The function calculates the * relevant variables just like the constructor above. * * @param parentNode Reference to the parent cosine node. * @param subIndices Vector of column indices to be included. */ CosineTree(CosineTree& parentNode, const std::vector& subIndices); /** * Construct the CosineTree and the basis for the given matrix, and passed * 'epsilon' and 'delta' parameters. The CosineTree is constructed by * splitting nodes in the direction of maximum error, stored using a priority * queue. Basis vectors are added from the left and right children of the * split node. The basis vector from a node is the orthonormalized centroid of * its columns. The splitting continues until the Monte Carlo estimate of the * input matrix's projection on the obtained subspace is less than a fraction * of the norm of the input matrix. * * @param dataset Matrix for which the CosineTree is constructed. * @param epsilon Error tolerance fraction for calculated subspace. * @param delta Cumulative probability for Monte Carlo error lower bound. */ CosineTree(const arma::mat& dataset, const double epsilon, const double delta); /** * Calculates the orthonormalization of the passed centroid, with respect to * the current vector subspace. * * @param treeQueue Priority queue of cosine nodes. * @param centroid Centroid of the node being added to the basis. * @param newBasisVector Orthonormalized centroid of the node. * @param addBasisVector Address to additional basis vector. */ void ModifiedGramSchmidt(CosineNodeQueue& treeQueue, arma::vec& centroid, arma::vec& newBasisVector, arma::vec* addBasisVector = NULL); /** * Estimates the squared error of the projection of the input node's matrix * onto the current vector subspace. A normal distribution is fit using * weighted norms of projections of samples drawn from the input node's matrix * columns. The error is calculated as the difference between the Frobenius * norm of the input node's matrix and lower bound of the normal distribution. * * @param node Node for which Monte Carlo estimate is calculated. * @param treeQueue Priority queue of cosine nodes. * @param addBasisVector1 Address to first additional basis vector. * @param addBasisVector2 Address to second additional basis vector. */ double MonteCarloError(CosineTree* node, CosineNodeQueue& treeQueue, arma::vec* addBasisVector1 = NULL, arma::vec* addBasisVector2 = NULL); /** * Constructs the final basis matrix, after the cosine tree construction. * * @param treeQueue Priority queue of cosine nodes. */ void ConstructBasis(CosineNodeQueue& treeQueue); /** * This function splits the cosine node into two children based on the cosines * of the columns contained in the node, with respect to the sampled splitting * point. The function also calls the CosineTree constructor for the children. */ void CosineNodeSplit(); /** * Sample 'numSamples' points from the Length-Squared distribution of the * cosine node. The function uses 'l2NormsSquared' to calculate the cumulative * probability distribution of the column vectors. The sampling is based on * randomly generated values in the range [0, 1].
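 *
 * As a brief, hypothetical usage sketch (assuming 'node' is an
 * already-constructed CosineTree):
 * @code
 * std::vector<size_t> sampled;
 * arma::vec probs;
 * node.ColumnSamplesLS(sampled, probs, 10); // ten length-squared draws
 * @endcode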
*/ void ColumnSamplesLS(std::vector& sampledIndices, arma::vec& probabilities, size_t numSamples); /** * Sample a point from the Length-Squared distribution of the cosine node. The * function uses 'l2NormsSquared' to calculate the cumulative probability * distribution of the column vectors. The sampling is based on a randomly * generated value in the range [0, 1]. */ size_t ColumnSampleLS(); /** * Sample a column based on the cumulative Length-Squared distribution of the * cosine node, and a randomly generated value in the range [0, 1]. Binary * search is more efficient than searching linearly for the same. This leads * to a significant speedup when there is a large number of columns to choose from * and when a number of samples are to be drawn from the distribution. * * @param cDistribution Cumulative LS distribution of columns in the node. * @param value Randomly generated value in the range [0, 1]. * @param start Starting index of the distribution interval to search in. * @param end Ending index of the distribution interval to search in. */ size_t BinarySearch(arma::vec& cDistribution, double value, size_t start, size_t end); /** * Calculate cosines of the columns present in the node, with respect to the * sampled splitting point. The calculated cosine values are useful for * splitting the node into its children. * * @param cosines Vector to store the cosine values in. */ void CalculateCosines(arma::vec& cosines); /** * Calculate centroid of the columns present in the node. The calculated * centroid is used as a basis vector for the cosine tree being constructed. */ void CalculateCentroid(); //! Returns the basis of the constructed subspace. void GetFinalBasis(arma::mat& finalBasis) { finalBasis = basis; } //! Get a reference to the dataset matrix. const arma::mat& GetDataset() const { return dataset; } //! Get the indices of columns in the node. std::vector& VectorIndices() { return indices; } //! Set the Monte Carlo error. void L2Error(const double error) { this->l2Error = error; } //! Get the Monte Carlo error. double L2Error() const { return l2Error; } //! Get a reference to the centroid vector. arma::vec& Centroid() { return centroid; } //! Set the basis vector of the node. void BasisVector(arma::vec& bVector) { this->basisVector = bVector; } //! Get the basis vector of the node. arma::vec& BasisVector() { return basisVector; } //! Get pointer to the left child of the node. CosineTree* Left() { return left; } //! Get pointer to the right child of the node. CosineTree* Right() { return right; } //! Get number of columns of input matrix in the node. size_t NumColumns() const { return numColumns; } //! Get the Frobenius norm squared of columns in the node. double FrobNormSquared() const { return frobNormSquared; } //! Get the column index of split point of the node. size_t SplitPointIndex() const { return indices[splitPointIndex]; } private: //! Matrix for which cosine tree is constructed. const arma::mat& dataset; //! Error tolerance fraction for calculated subspace. double epsilon; //! Cumulative probability for Monte Carlo error lower bound. double delta; //! Subspace basis of the input dataset. arma::mat basis; //! Parent of the node. CosineTree* parent; //! Right child of the node. CosineTree* right; //! Left child of the node. CosineTree* left; //! Indices of columns of input matrix in the node. std::vector indices; //! L2-norm squared of columns in the node. arma::vec l2NormsSquared; //! Centroid of columns of input matrix in the node. arma::vec centroid; //!
Orthonormalized basis vector of the node. arma::vec basisVector; //! Index of split point of cosine node. size_t splitPointIndex; //! Number of columns of input matrix in the node. size_t numColumns; //! Monte Carlo error for this node. double l2Error; //! Frobenius norm squared of columns in the node. double frobNormSquared; }; class CompareCosineNode { public: // Comparison function for construction of priority queue. bool operator() (const CosineTree* a, const CosineTree* b) const { return a->L2Error() < b->L2Error(); } }; }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/tree/mrkd_statistic.cpp0000644000176200001440000000324613647514343021542 0ustar liggesusers/** * @file mrkd_statistic.cpp * @author James Cline * * Definition of the statistic for multi-resolution kd-trees. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "mrkd_statistic.hpp" using namespace mlpack; using namespace mlpack::tree; MRKDStatistic::MRKDStatistic() : dataset(NULL), begin(0), count(0), leftStat(NULL), rightStat(NULL), parentStat(NULL) { } /** * Returns a string representation of this object. */ std::string MRKDStatistic::ToString() const { std::ostringstream convert; convert << "MRKDStatistic [" << this << "]" << std::endl; convert << "begin: " << begin << std::endl; convert << "count: " << count << std::endl; convert << "sumOfSquaredNorms: " << sumOfSquaredNorms << std::endl; if (leftStat != NULL) { convert << "leftStat:" << std::endl; convert << mlpack::util::Indent(leftStat->ToString()); } if (rightStat != NULL) { convert << "rightStat:" << std::endl; convert << mlpack::util::Indent(rightStat->ToString()); } return convert.str(); } RcppMLPACK/src/mlpack/core/tree/hrectbound_impl.hpp0000644000176200001440000002577413647512751021704 0ustar liggesusers/** * @file hrectbound_impl.hpp * * Implementation of hyper-rectangle bound policy class. * Template parameter Power is the metric to use; use 2 for Euclidean (L2). * * @experimental * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_HRECTBOUND_IMPL_HPP #define __MLPACK_CORE_TREE_HRECTBOUND_IMPL_HPP #include // In case it has not been included yet. #include "hrectbound.hpp" namespace mlpack { namespace bound { /** * Empty constructor.
*/ template HRectBound::HRectBound() : dim(0), bounds(NULL), minWidth(0) { /* Nothing to do. */ } /** * Initializes to specified dimensionality with each dimension the empty * set. */ template HRectBound::HRectBound(const size_t dimension) : dim(dimension), bounds(new math::Range[dim]), minWidth(0) { /* Nothing to do. */ } /*** * Copy constructor necessary to prevent memory leaks. */ template HRectBound::HRectBound(const HRectBound& other) : dim(other.Dim()), bounds(new math::Range[dim]), minWidth(other.MinWidth()) { // Copy other bounds over. for (size_t i = 0; i < dim; i++) bounds[i] = other[i]; } /*** * Same as the copy constructor. */ template HRectBound& HRectBound::operator=( const HRectBound& other) { if (dim != other.Dim()) { // Reallocation is necessary. if (bounds) delete[] bounds; dim = other.Dim(); bounds = new math::Range[dim]; } // Now copy each of the bound values. for (size_t i = 0; i < dim; i++) bounds[i] = other[i]; minWidth = other.MinWidth(); return *this; } /** * Destructor: clean up memory. */ template HRectBound::~HRectBound() { if (bounds) delete[] bounds; } /** * Resets all dimensions to the empty set. */ template void HRectBound::Clear() { for (size_t i = 0; i < dim; i++) bounds[i] = math::Range(); minWidth = 0; } /*** * Calculates the centroid of the range, placing it into the given vector. * * @param centroid Vector which the centroid will be written to. */ template void HRectBound::Centroid(arma::vec& centroid) const { // Set size correctly if necessary. if (!(centroid.n_elem == dim)) centroid.set_size(dim); for (size_t i = 0; i < dim; i++) centroid(i) = bounds[i].Mid(); } /** * Calculates minimum bound-to-point squared distance. */ template template double HRectBound::MinDistance( const VecType& point, typename boost::enable_if >* /* junk */) const { //Log::Assert(point.n_elem == dim); double sum = 0; double lower, higher; for (size_t d = 0; d < dim; d++) { lower = bounds[d].Lo() - point[d]; higher = point[d] - bounds[d].Hi(); // Since only one of 'lower' or 'higher' is negative, if we add each's // absolute value to itself and then sum those two, our result is the // nonnegative half of the equation times two; then we raise to power Power. sum += pow((lower + fabs(lower)) + (higher + fabs(higher)), (double) Power); } // Now take the Power'th root (but make sure our result is squared if it needs // to be); then cancel out the constant of 2 (which may have been squared now) // that was introduced earlier. The compiler should optimize out the if // statement entirely. if (TakeRoot) return pow(sum, 1.0 / (double) Power) / 2.0; else return sum / pow(2.0, Power); } /** * Calculates minimum bound-to-bound squared distance. */ template double HRectBound::MinDistance(const HRectBound& other) const { //Log::Assert(dim == other.dim); double sum = 0; const math::Range* mbound = bounds; const math::Range* obound = other.bounds; double lower, higher; for (size_t d = 0; d < dim; d++) { lower = obound->Lo() - mbound->Hi(); higher = mbound->Lo() - obound->Hi(); // We invoke the following: // x + fabs(x) = max(x * 2, 0) // (x * 2)^2 / 4 = x^2 sum += pow((lower + fabs(lower)) + (higher + fabs(higher)), (double) Power); // Move bound pointers. mbound++; obound++; } // The compiler should optimize out this if statement entirely. if (TakeRoot) return pow(sum, 1.0 / (double) Power) / 2.0; else return sum / pow(2.0, Power); } /** * Calculates maximum bound-to-point squared distance. 
*/ template template double HRectBound::MaxDistance( const VecType& point, typename boost::enable_if >* /* junk */) const { double sum = 0; //Log::Assert(point.n_elem == dim); for (size_t d = 0; d < dim; d++) { double v = std::max(fabs(point[d] - bounds[d].Lo()), fabs(bounds[d].Hi() - point[d])); sum += pow(v, (double) Power); } // The compiler should optimize out this if statement entirely. if (TakeRoot) return pow(sum, 1.0 / (double) Power); else return sum; } /** * Computes maximum distance. */ template double HRectBound::MaxDistance(const HRectBound& other) const { double sum = 0; //Log::Assert(dim == other.dim); double v; for (size_t d = 0; d < dim; d++) { v = std::max(fabs(other.bounds[d].Hi() - bounds[d].Lo()), fabs(bounds[d].Hi() - other.bounds[d].Lo())); sum += pow(v, (double) Power); // v is non-negative. } // The compiler should optimize out this if statement entirely. if (TakeRoot) return pow(sum, 1.0 / (double) Power); else return sum; } /** * Calculates minimum and maximum bound-to-bound squared distance. */ template math::Range HRectBound::RangeDistance(const HRectBound& other) const { double loSum = 0; double hiSum = 0; //Log::Assert(dim == other.dim); double v1, v2, vLo, vHi; for (size_t d = 0; d < dim; d++) { v1 = other.bounds[d].Lo() - bounds[d].Hi(); v2 = bounds[d].Lo() - other.bounds[d].Hi(); // One of v1 or v2 is negative. if (v1 >= v2) { vHi = -v2; // Make it nonnegative. vLo = (v1 > 0) ? v1 : 0; // Force to be 0 if negative. } else { vHi = -v1; // Make it nonnegative. vLo = (v2 > 0) ? v2 : 0; // Force to be 0 if negative. } loSum += pow(vLo, (double) Power); hiSum += pow(vHi, (double) Power); } if (TakeRoot) return math::Range(pow(loSum, 1.0 / (double) Power), pow(hiSum, 1.0 / (double) Power)); else return math::Range(loSum, hiSum); } /** * Calculates minimum and maximum bound-to-point squared distance. */ template template math::Range HRectBound::RangeDistance( const VecType& point, typename boost::enable_if >* /* junk */) const { double loSum = 0; double hiSum = 0; //Log::Assert(point.n_elem == dim); double v1, v2, vLo, vHi; for (size_t d = 0; d < dim; d++) { v1 = bounds[d].Lo() - point[d]; // Negative if point[d] > lo. v2 = point[d] - bounds[d].Hi(); // Negative if point[d] < hi. // One of v1 or v2 (or both) is negative. if (v1 >= 0) // point[d] <= bounds_[d].Lo(). { vHi = -v2; // v2 will be larger but must be negated. vLo = v1; } else // point[d] is between lo and hi, or greater than hi. { if (v2 >= 0) { vHi = -v1; // v1 will be larger, but must be negated. vLo = v2; } else { vHi = -std::min(v1, v2); // Both are negative, but we need the larger. vLo = 0; } } loSum += pow(vLo, (double) Power); hiSum += pow(vHi, (double) Power); } if (TakeRoot) return math::Range(pow(loSum, 1.0 / (double) Power), pow(hiSum, 1.0 / (double) Power)); else return math::Range(loSum, hiSum); } /** * Expands this region to include a new point. */ template template HRectBound& HRectBound::operator|=( const MatType& data) { //Log::Assert(data.n_rows == dim); arma::vec mins(min(data, 1)); arma::vec maxs(max(data, 1)); minWidth = DBL_MAX; for (size_t i = 0; i < dim; i++) { bounds[i] |= math::Range(mins[i], maxs[i]); const double width = bounds[i].Width(); if (width < minWidth) minWidth = width; } return *this; } /** * Expands this region to encompass another bound. 
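 *
 * A small, hypothetical usage sketch (both bounds must have the same
 * dimensionality):
 * @code
 * HRectBound<2> a(3);
 * HRectBound<2> b(3);
 * a |= b; // a is now the smallest box enclosing both a and b
 * @endcode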
*/ template HRectBound& HRectBound::operator|=( const HRectBound& other) { assert(other.dim == dim); minWidth = DBL_MAX; for (size_t i = 0; i < dim; i++) { bounds[i] |= other.bounds[i]; const double width = bounds[i].Width(); if (width < minWidth) minWidth = width; } return *this; } /** * Determines if a point is within this bound. */ template template bool HRectBound::Contains(const VecType& point) const { for (size_t i = 0; i < point.n_elem; i++) { if (!bounds[i].Contains(point(i))) return false; } return true; } /** * Returns the diameter of the hyperrectangle (that is, the longest diagonal). */ template double HRectBound::Diameter() const { double d = 0; for (size_t i = 0; i < dim; ++i) d += std::pow(bounds[i].Hi() - bounds[i].Lo(), (double) Power); if (TakeRoot) return std::pow(d, 1.0 / (double) Power); else return d; } /** * Returns a string representation of this object. */ template std::string HRectBound::ToString() const { std::ostringstream convert; convert << "HRectBound [" << this << "]" << std::endl; convert << " Power: " << Power << std::endl; convert << " TakeRoot: " << (TakeRoot ? "true" : "false") << std::endl; convert << " Dimensionality: " << dim << std::endl; convert << " Bounds: " << std::endl; for (size_t i = 0; i < dim; ++i) convert << util::Indent(bounds[i].ToString()) << std::endl; convert << " Minimum width: " << minWidth << std::endl; return convert.str(); } }; // namespace bound }; // namespace mlpack #endif // __MLPACK_CORE_TREE_HRECTBOUND_IMPL_HPP RcppMLPACK/src/mlpack/core/tree/hrectbound.hpp0000644000176200001440000001360713647512751020663 0ustar liggesusers/** * @file hrectbound.hpp * * Bounds that are useful for binary space partitioning trees. * * This file describes the interface for the HRectBound class, which implements * a hyperrectangle bound. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_HRECTBOUND_HPP #define __MLPACK_CORE_TREE_HRECTBOUND_HPP #include #include #include namespace mlpack { namespace bound { /** * Hyper-rectangle bound for an L-metric. This should be used in conjunction * with the LMetric class. Be sure to use the same template parameters for * LMetric as you do for HRectBound -- otherwise odd results may occur. * * @tparam Power The metric to use; use 2 for Euclidean (L2). * @tparam TakeRoot Whether or not the root should be taken (see LMetric * documentation). */ template class HRectBound { public: //! This is the metric type that this bound is using. typedef metric::LMetric MetricType; /** * Empty constructor; creates a bound of dimensionality 0. */ HRectBound(); /** * Initializes to specified dimensionality with each dimension the empty * set. */ HRectBound(const size_t dimension); //! Copy constructor; necessary to prevent memory leaks. HRectBound(const HRectBound& other); //! Same as copy constructor; necessary to prevent memory leaks. HRectBound& operator=(const HRectBound& other); //! 
Destructor: clean up memory. ~HRectBound(); /** * Resets all dimensions to the empty set (so that this bound contains * nothing). */ void Clear(); //! Gets the dimensionality. size_t Dim() const { return dim; } //! Get the range for a particular dimension. No bounds checking. Be //! careful: this may make MinWidth() invalid. math::Range& operator[](const size_t i) { return bounds[i]; } //! Modify the range for a particular dimension. No bounds checking. const math::Range& operator[](const size_t i) const { return bounds[i]; } //! Get the minimum width of the bound. double MinWidth() const { return minWidth; } //! Modify the minimum width of the bound. double& MinWidth() { return minWidth; } /** * Calculates the centroid of the range, placing it into the given vector. * * @param centroid Vector which the centroid will be written to. */ void Centroid(arma::vec& centroid) const; /** * Calculates minimum bound-to-point distance. * * @param point Point to which the minimum distance is requested. */ template double MinDistance(const VecType& point, typename boost::enable_if >* = 0) const; /** * Calculates minimum bound-to-bound distance. * * @param other Bound to which the minimum distance is requested. */ double MinDistance(const HRectBound& other) const; /** * Calculates maximum bound-to-point squared distance. * * @param point Point to which the maximum distance is requested. */ template double MaxDistance(const VecType& point, typename boost::enable_if >* = 0) const; /** * Computes maximum distance. * * @param other Bound to which the maximum distance is requested. */ double MaxDistance(const HRectBound& other) const; /** * Calculates minimum and maximum bound-to-bound distance. * * @param other Bound to which the minimum and maximum distances are * requested. */ math::Range RangeDistance(const HRectBound& other) const; /** * Calculates minimum and maximum bound-to-point distance. * * @param point Point to which the minimum and maximum distances are * requested. */ template math::Range RangeDistance(const VecType& point, typename boost::enable_if >* = 0) const; /** * Expands this region to include new points. * * @tparam MatType Type of matrix; could be Mat, SpMat, a subview, or just a * vector. * @param data Data points to expand this region to include. */ template HRectBound& operator|=(const MatType& data); /** * Expands this region to encompass another bound. */ HRectBound& operator|=(const HRectBound& other); /** * Determines if a point is within this bound. */ template bool Contains(const VecType& point) const; /** * Returns the diameter of the hyperrectangle (that is, the longest diagonal). */ double Diameter() const; /** * Returns a string representation of this object. */ std::string ToString() const; /** * Return the metric associated with this bound. Because it is an LMetric, it * cannot store state, so we can make it on the fly. It is also static * because the metric is only dependent on the template arguments. */ static MetricType Metric() { return metric::LMetric(); } private: //! The dimensionality of the bound. size_t dim; //! The bounds for each dimension. math::Range* bounds; //! Cached minimum width of bound. double minWidth; }; }; // namespace bound }; // namespace mlpack #include "hrectbound_impl.hpp" #endif // __MLPACK_CORE_TREE_HRECTBOUND_HPP RcppMLPACK/src/mlpack/core/tree/ballbound.hpp0000644000176200001440000001340513647512751020464 0ustar liggesusers/** * @file ballbound.hpp * * Bounds that are useful for binary space partitioning trees. 
* Interface to a ball bound that works in arbitrary metric spaces. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BALLBOUND_HPP #define __MLPACK_CORE_TREE_BALLBOUND_HPP #include #include namespace mlpack { namespace bound { /** * Ball bound encloses a set of points at a specific distance (radius) from a * specific point (center). TMetricType is the custom metric type that defaults * to the Euclidean (L2) distance. * * @tparam VecType Type of vector (arma::vec or arma::sp_vec). * @tparam TMetricType metric type used in the distance measure. */ template > class BallBound { public: typedef VecType Vec; //! Need this for Binary Space Partion Tree typedef TMetricType MetricType; private: //! The radius of the ball bound. double radius; //! The center of the ball bound. VecType center; //! The metric used in this bound. TMetricType* metric; /** * To know whether this object allocated memory to the metric member * variable. This will be true except in the copy constructor and the * overloaded assignment operator. We need this to know whether we should * delete the metric member variable in the destructor. */ bool ownsMetric; public: //! Empty Constructor. BallBound(); /** * Create the ball bound with the specified dimensionality. * * @param dimension Dimensionality of ball bound. */ BallBound(const size_t dimension); /** * Create the ball bound with the specified radius and center. * * @param radius Radius of ball bound. * @param center Center of ball bound. */ BallBound(const double radius, const VecType& center); //! Copy constructor. To prevent memory leaks. BallBound(const BallBound& other); //! For the same reason as the Copy Constructor. To prevent memory leaks. BallBound& operator=(const BallBound& other); //! Destructor to release allocated memory. ~BallBound(); //! Get the radius of the ball. double Radius() const { return radius; } //! Modify the radius of the ball. double& Radius() { return radius; } //! Get the center point of the ball. const VecType& Center() const { return center; } //! Modify the center point of the ball. VecType& Center() { return center; } //! Get the dimensionality of the ball. double Dim() const { return center.n_elem; } /** * Get the minimum width of the bound (this is same as the diameter). * For ball bounds, width along all dimensions remain same. */ double MinWidth() const { return radius * 2.0; } //! Get the range in a certain dimension. math::Range operator[](const size_t i) const; /** * Determines if a point is within this bound. */ bool Contains(const VecType& point) const; /** * Place the centroid of BallBound into the given vector. * * @param centroid Vector which the centroid will be written to. */ void Centroid(VecType& centroid) const { centroid = center; } /** * Calculates minimum bound-to-point squared distance. 
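 *
 * A small usage sketch (illustrative only; `data` stands for any d x n
 * matrix of points assumed to exist elsewhere):
 *
 * @code
 * BallBound<> ball(data.n_rows);  // Empty bound of matching dimensionality.
 * ball |= data;                   // Recenter and grow to cover all columns.
 * arma::vec query = arma::zeros<arma::vec>(data.n_rows);
 * const double lower = ball.MinDistance(query); // 0 if query lies inside.
 * @endcode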
*/ template double MinDistance(const OtherVecType& point, typename boost::enable_if >* = 0) const; /** * Calculates minimum bound-to-bound squared distance. */ double MinDistance(const BallBound& other) const; /** * Computes maximum distance. */ template double MaxDistance(const OtherVecType& point, typename boost::enable_if >* = 0) const; /** * Computes maximum distance. */ double MaxDistance(const BallBound& other) const; /** * Calculates minimum and maximum bound-to-point distance. */ template math::Range RangeDistance( const OtherVecType& other, typename boost::enable_if >* = 0) const; /** * Calculates minimum and maximum bound-to-bound distance. * * Example: bound1.MinDistanceSq(other) for minimum distance. */ math::Range RangeDistance(const BallBound& other) const; /** * Expand the bound to include the given node. */ const BallBound& operator|=(const BallBound& other); /** * Expand the bound to include the given point. The centroid is recalculated * to be the center of all of the given points. * * @tparam MatType Type of matrix; could be arma::mat, arma::spmat, or a * vector. * @tparam data Data points to add. */ template const BallBound& operator|=(const MatType& data); /** * Returns the diameter of the ballbound. */ double Diameter() const { return 2 * radius; } /** * Returns the distance metric used in this bound. */ TMetricType Metric() const { return *metric; } /** * Returns a string representation of this object. */ std::string ToString() const; }; }; // namespace bound }; // namespace mlpack #include "ballbound_impl.hpp" #endif // __MLPACK_CORE_TREE_DBALLBOUND_HPP RcppMLPACK/src/mlpack/core/tree/binary_space_tree.hpp0000644000176200001440000000240113647512751022172 0ustar liggesusers/** * @file binary_space_tree.hpp * @author Ryan Curtin * * Include all the necessary files to use the BinarySpaceTree class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_HPP #include "bounds.hpp" #include "binary_space_tree/binary_space_tree.hpp" #include "binary_space_tree/single_tree_traverser.hpp" #include "binary_space_tree/single_tree_traverser_impl.hpp" #include "binary_space_tree/dual_tree_traverser.hpp" #include "binary_space_tree/dual_tree_traverser_impl.hpp" #include "binary_space_tree/traits.hpp" #endif RcppMLPACK/src/mlpack/core/tree/cover_tree/0000755000176200001440000000000013647521523020140 5ustar liggesusersRcppMLPACK/src/mlpack/core/tree/cover_tree/cover_tree.hpp0000644000176200001440000004577213647512751023030 0ustar liggesusers/** * @file cover_tree.hpp * @author Ryan Curtin * * Definition of CoverTree, which can be used in place of the BinarySpaceTree. * * This file is part of MLPACK 1.0.10. 
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_TREE_COVER_TREE_COVER_TREE_HPP
#define __MLPACK_CORE_TREE_COVER_TREE_COVER_TREE_HPP

#include <mlpack/core.hpp>
#include <mlpack/core/metrics/lmetric.hpp>

#include "first_point_is_root.hpp"
#include "../statistic.hpp"

namespace mlpack {
namespace tree {

/**
 * A cover tree is a tree specifically designed to speed up nearest-neighbor
 * computation in high-dimensional spaces.  Each non-leaf node references a
 * point and has a nonzero number of children, including a "self-child" which
 * references the same point.  A leaf node represents only one point.
 *
 * The tree can be thought of as a hierarchy with the root node at the top
 * level and the leaf nodes at the bottom level.  Each level in the tree has an
 * assigned 'scale' i.  The tree follows these three conditions:
 *
 * - nesting: the level C_i is a subset of the level C_{i - 1}.
 * - covering: all nodes in level C_{i - 1} have at least one node in the
 *     level C_i with distance less than or equal to b^i (exactly one of these
 *     is a parent of the point in level C_{i - 1}).
 * - separation: all nodes in level C_i have distance greater than b^i to all
 *     other nodes in level C_i.
 *
 * The value 'b' refers to the base, which is a parameter of the tree.  These
 * three properties make the cover tree very good for fast, high-dimensional
 * nearest-neighbor search.
 *
 * The theoretical structure of the tree contains many 'implicit' nodes which
 * only have a "self-child" (a child referencing the same point, but at a lower
 * scale level).  This practical implementation only constructs explicit nodes
 * -- non-leaf nodes with more than one child.  A leaf node has no children,
 * and its scale level is INT_MIN.
 *
 * For more information on cover trees, see
 *
 * @code
 * @inproceedings{
 *   author = {Beygelzimer, Alina and Kakade, Sham and Langford, John},
 *   title = {Cover trees for nearest neighbor},
 *   booktitle = {Proceedings of the 23rd International Conference on Machine
 *       Learning},
 *   series = {ICML '06},
 *   year = {2006},
 *   pages = {97--104}
 * }
 * @endcode
 *
 * For information on runtime bounds of the nearest-neighbor computation using
 * cover trees, see the following paper, presented at NIPS 2009:
 *
 * @code
 * @inproceedings{
 *   author = {Ram, P., and Lee, D., and March, W.B., and Gray, A.G.},
 *   title = {Linear-time Algorithms for Pairwise Statistical Problems},
 *   booktitle = {Advances in Neural Information Processing Systems 22},
 *   editor = {Y. Bengio and D. Schuurmans and J. Lafferty and C.K.I. Williams
 *       and A. Culotta},
 *   pages = {1527--1535},
 *   year = {2009}
 * }
 * @endcode
 *
 * The CoverTree class offers three template parameters; a custom metric type
 * can be used with MetricType (this class defaults to the Euclidean (L2)
 * metric).  The root node's point can be chosen with the RootPointPolicy; by
 * default, the FirstPointIsRoot policy is used, meaning the first point in the
 * dataset is used.
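 *
 * A minimal construction sketch (not part of the original documentation; the
 * random dataset is a stand-in):
 *
 * @code
 * arma::mat data = arma::randu<arma::mat>(10, 1000); // 1000 points in 10-D.
 * CoverTree<> tree(data);       // Built with the default base of 2.0.
 * CoverTree<> deep(data, 1.3);  // A smaller base yields a deeper tree.
 * @endcode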
 * The StatisticType policy allows you to define statistics which can be
 * gathered during the creation of the tree.
 *
 * @tparam MetricType Metric type to use during tree construction.
 * @tparam RootPointPolicy Determines which point to use as the root node.
 * @tparam StatisticType Statistic to be used during tree creation.
 */
template<typename MetricType = metric::LMetric<2, true>,
         typename RootPointPolicy = FirstPointIsRoot,
         typename StatisticType = EmptyStatistic>
class CoverTree
{
 public:
  typedef arma::mat Mat;

  /**
   * Create the cover tree with the given dataset and given base.
   * The dataset will not be modified during the building procedure (unlike
   * BinarySpaceTree).
   *
   * The last argument will be removed in mlpack 1.1.0 (see #274 and #273).
   *
   * @param dataset Reference to the dataset to build a tree on.
   * @param base Base to use during tree building (default 2.0).
   */
  CoverTree(const arma::mat& dataset,
            const double base = 2.0,
            MetricType* metric = NULL);

  /**
   * Create the cover tree with the given dataset and the given instantiated
   * metric.  Optionally, set the base.  The dataset will not be modified
   * during the building procedure (unlike BinarySpaceTree).
   *
   * @param dataset Reference to the dataset to build a tree on.
   * @param metric Instantiated metric to use during tree building.
   * @param base Base to use during tree building (default 2.0).
   */
  CoverTree(const arma::mat& dataset,
            MetricType& metric,
            const double base = 2.0);

  /**
   * Construct a child cover tree node.  This constructor is not meant to be
   * used externally, but it could be used to insert another node into a tree.
   * This procedure uses only one vector for the near set, the far set, and
   * the used set (this is to prevent unnecessary memory allocation in
   * recursive calls to this constructor).  Therefore, the size of the near
   * set, far set, and used set must be passed in.  The near set will be
   * entirely used up, and some of the far set may be used.  The value of
   * usedSetSize will be increased by the number of points used in the
   * construction of this node, and the value of farSetSize will be modified
   * to reflect the number of points in the far set _after_ the construction
   * of this node.
   *
   * If you are calling this manually, be careful that the given scale is
   * as small as possible, or you may be creating an implicit node in your
   * tree.
   *
   * @param dataset Reference to the dataset to build a tree on.
   * @param base Base to use during tree building.
   * @param pointIndex Index of the point this node references.
   * @param scale Scale of this level in the tree.
   * @param parent Parent of this node (NULL indicates no parent).
   * @param parentDistance Distance to the parent node.
   * @param indices Array of indices, ordered [ nearSet | farSet | usedSet ];
   *     will be modified to [ farSet | usedSet ].
   * @param distances Array of distances, ordered the same way as the indices.
   *     These represent the distances between the point specified by
   *     pointIndex and each point in the indices array.
   * @param nearSetSize Size of the near set; if 0, this will be a leaf.
   * @param farSetSize Size of the far set; may be modified (if this node uses
   *     any points in the far set).
   * @param usedSetSize The number of points used will be added to this
   *     number.
   */
  CoverTree(const arma::mat& dataset,
            const double base,
            const size_t pointIndex,
            const int scale,
            CoverTree* parent,
            const double parentDistance,
            arma::Col<size_t>& indices,
            arma::vec& distances,
            size_t nearSetSize,
            size_t& farSetSize,
            size_t& usedSetSize,
            MetricType& metric = NULL);

  /**
   * Manually construct a cover tree node; no tree assembly is done in this
   * constructor, and children must be added manually (use Children()).  This
   * constructor is useful when the tree is being "imported" into the
   * CoverTree class after being created in some other manner.
   *
   * @param dataset Reference to the dataset this node is a part of.
   * @param base Base that was used for tree building.
   * @param pointIndex Index of the point in the dataset which this node
   *     refers to.
   * @param scale Scale of this node's level in the tree.
   * @param parent Parent node (NULL indicates no parent).
   * @param parentDistance Distance to parent node point.
   * @param furthestDescendantDistance Distance to furthest descendant point.
   * @param metric Instantiated metric (optional).
   */
  CoverTree(const arma::mat& dataset,
            const double base,
            const size_t pointIndex,
            const int scale,
            CoverTree* parent,
            const double parentDistance,
            const double furthestDescendantDistance,
            MetricType* metric = NULL);

  /**
   * Create a cover tree from another tree.  Be careful!  This may use a lot
   * of memory and take a lot of time.
   *
   * @param other Cover tree to copy from.
   */
  CoverTree(const CoverTree& other);

  /**
   * Delete this cover tree node and its children.
   */
  ~CoverTree();

  //! A single-tree cover tree traverser; see single_tree_traverser.hpp for
  //! implementation.
  template<typename RuleType>
  class SingleTreeTraverser;

  //! A dual-tree cover tree traverser; see dual_tree_traverser.hpp.
  template<typename RuleType>
  class DualTreeTraverser;

  //! Get a reference to the dataset.
  const arma::mat& Dataset() const { return dataset; }

  //! Get the index of the point which this node represents.
  size_t Point() const { return point; }
  //! For compatibility with other trees; the argument is ignored.
  size_t Point(const size_t) const { return point; }

  bool IsLeaf() const { return (children.size() == 0); }
  size_t NumPoints() const { return 1; }

  //! Get a particular child node.
  const CoverTree& Child(const size_t index) const { return *children[index]; }
  //! Modify a particular child node.
  CoverTree& Child(const size_t index) { return *children[index]; }

  //! Get the number of children.
  size_t NumChildren() const { return children.size(); }

  //! Get the children.
  const std::vector<CoverTree*>& Children() const { return children; }
  //! Modify the children manually (maybe not a great idea).
  std::vector<CoverTree*>& Children() { return children; }

  //! Get the number of descendant points.
  size_t NumDescendants() const;

  //! Get the index of a particular descendant point.
  size_t Descendant(const size_t index) const;

  //! Get the scale of this node.
  int Scale() const { return scale; }
  //! Modify the scale of this node.  Be careful...
  int& Scale() { return scale; }

  //! Get the base.
  double Base() const { return base; }
  //! Modify the base; don't do this, you'll break everything.
  double& Base() { return base; }

  //! Get the statistic for this node.
  const StatisticType& Stat() const { return stat; }
  //! Modify the statistic for this node.
  StatisticType& Stat() { return stat; }

  //! Return the minimum distance to another node.
  double MinDistance(const CoverTree* other) const;

  //! Return the minimum distance to another node given that the
  //! point-to-point distance has already been calculated.
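  //!
  //! (Illustrative sketch, not from the original docs; `a` and `b` are
  //! assumed to be existing CoverTree nodes.)
  //!
  //! @code
  //! const double d = a.Metric().Evaluate(a.Dataset().col(a.Point()),
  //!                                      b.Dataset().col(b.Point()));
  //! const double lo = a.MinDistance(&b, d); // No second metric evaluation.
  //! @endcode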
  double MinDistance(const CoverTree* other, const double distance) const;

  //! Return the minimum distance to another point.
  double MinDistance(const arma::vec& other) const;

  //! Return the minimum distance to another point given that the distance
  //! from the center to the point has already been calculated.
  double MinDistance(const arma::vec& other, const double distance) const;

  //! Return the maximum distance to another node.
  double MaxDistance(const CoverTree* other) const;

  //! Return the maximum distance to another node given that the
  //! point-to-point distance has already been calculated.
  double MaxDistance(const CoverTree* other, const double distance) const;

  //! Return the maximum distance to another point.
  double MaxDistance(const arma::vec& other) const;

  //! Return the maximum distance to another point given that the distance
  //! from the center to the point has already been calculated.
  double MaxDistance(const arma::vec& other, const double distance) const;

  //! Return the minimum and maximum distance to another node.
  math::Range RangeDistance(const CoverTree* other) const;

  //! Return the minimum and maximum distance to another node given that the
  //! point-to-point distance has already been calculated.
  math::Range RangeDistance(const CoverTree* other, const double distance)
      const;

  //! Return the minimum and maximum distance to another point.
  math::Range RangeDistance(const arma::vec& other) const;

  //! Return the minimum and maximum distance to another point given that the
  //! point-to-point distance has already been calculated.
  math::Range RangeDistance(const arma::vec& other, const double distance)
      const;

  //! Returns true: this tree does have self-children.
  static bool HasSelfChildren() { return true; }

  //! Get the parent node.
  CoverTree* Parent() const { return parent; }
  //! Modify the parent node.
  CoverTree*& Parent() { return parent; }

  //! Get the distance to the parent.
  double ParentDistance() const { return parentDistance; }
  //! Modify the distance to the parent.
  double& ParentDistance() { return parentDistance; }

  //! Get the distance to the furthest point.  This is always 0 for cover
  //! trees.
  double FurthestPointDistance() const { return 0.0; }

  //! Get the distance from the center of the node to the furthest descendant.
  double FurthestDescendantDistance() const
  { return furthestDescendantDistance; }
  //! Modify the distance from the center of the node to the furthest
  //! descendant.
  double& FurthestDescendantDistance() { return furthestDescendantDistance; }

  //! Get the minimum distance from the center to any bound edge (this is the
  //! same as furthestDescendantDistance).
  double MinimumBoundDistance() const { return furthestDescendantDistance; }

  //! Get the centroid of the node and store it in the given vector.
  void Centroid(arma::vec& centroid) const { centroid = dataset.col(point); }

  //! Get the instantiated metric.
  MetricType& Metric() const { return *metric; }

 private:
  //! Reference to the matrix which this tree is built on.
  const arma::mat& dataset;

  //! Index of the point in the matrix which this node represents.
  size_t point;

  //! The list of children; the first is the self-child.
  std::vector<CoverTree*> children;

  //! Scale level of the node.
  int scale;

  //! The base used to construct the tree.
  double base;

  //! The instantiated statistic.
  StatisticType stat;

  //! The number of descendant points.
  size_t numDescendants;

  //! The parent node (NULL if this is the root of the tree).
  CoverTree* parent;

  //! Distance to the parent.
  double parentDistance;

  //! Distance to the furthest descendant.
  double furthestDescendantDistance;

  //! Whether or not we need to destroy the metric in the destructor.
  bool localMetric;

  //! The metric used for this tree.
  MetricType* metric;

  /**
   * Create the children for this node.
   */
  void CreateChildren(arma::Col<size_t>& indices,
                      arma::vec& distances,
                      size_t nearSetSize,
                      size_t& farSetSize,
                      size_t& usedSetSize);

  /**
   * Fill the vector of distances with the distances between the point
   * specified by pointIndex and each point in the indices array.  The
   * distances of the first pointSetSize points in indices are calculated (so,
   * this does not necessarily need to use all of the points in the arrays).
   *
   * @param pointIndex Point to build the distances for.
   * @param indices List of indices to compute distances for.
   * @param distances Vector to store calculated distances in.
   * @param pointSetSize Number of points in arrays to calculate distances
   *     for.
   */
  void ComputeDistances(const size_t pointIndex,
                        const arma::Col<size_t>& indices,
                        arma::vec& distances,
                        const size_t pointSetSize);

  /**
   * Split the given indices and distances into a near and a far set,
   * returning the number of points in the near set.  The distances must
   * already be initialized.  This will order the indices and distances such
   * that the points in the near set make up the first part of the array and
   * the far set makes up the rest:  [ nearSet | farSet ].
   *
   * @param indices List of indices; will be reordered.
   * @param distances List of distances; will be reordered.
   * @param bound If the distance is less than or equal to this bound, the
   *     point is placed into the near set.
   * @param pointSetSize Size of point set (because we may be sorting a
   *     smaller list than the indices vector will hold).
   */
  size_t SplitNearFar(arma::Col<size_t>& indices,
                      arma::vec& distances,
                      const double bound,
                      const size_t pointSetSize);

  /**
   * Assuming that the list of indices and distances is sorted as
   * [ childFarSet | childUsedSet | farSet | usedSet ],
   * resort the sets so the organization is
   * [ childFarSet | farSet | childUsedSet | usedSet ].
   *
   * The size_t parameters specify the sizes of each set in the array.  Only
   * the ordering of the indices and distances arrays will be modified (not
   * their actual contents).
   *
   * The size of any of the four sets can be zero and this method will handle
   * that case accordingly.
   *
   * @param indices List of indices to sort.
   * @param distances List of distances to sort.
   * @param childFarSetSize Number of points in child far set (childFarSet).
   * @param childUsedSetSize Number of points in child used set
   *     (childUsedSet).
   * @param farSetSize Number of points in far set (farSet).
   */
  size_t SortPointSet(arma::Col<size_t>& indices,
                      arma::vec& distances,
                      const size_t childFarSetSize,
                      const size_t childUsedSetSize,
                      const size_t farSetSize);

  void MoveToUsedSet(arma::Col<size_t>& indices,
                     arma::vec& distances,
                     size_t& nearSetSize,
                     size_t& farSetSize,
                     size_t& usedSetSize,
                     arma::Col<size_t>& childIndices,
                     const size_t childFarSetSize,
                     const size_t childUsedSetSize);

  size_t PruneFarSet(arma::Col<size_t>& indices,
                     arma::vec& distances,
                     const double bound,
                     const size_t nearSetSize,
                     const size_t pointSetSize);

  /**
   * Take a look at the last child (the most recently created one) and remove
   * any implicit nodes that have been created.
   */
  void RemoveNewImplicitNodes();

 public:
  /**
   * Returns a string representation of this object.
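   *
   * For example (sketch; `tree` is any built CoverTree), the top two levels
   * can be printed with:
   *
   * @code
   * Rcpp::Rcout << tree.ToString() << std::endl;
   * @endcode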
*/ std::string ToString() const; size_t DistanceComps() const { return distanceComps; } size_t& DistanceComps() { return distanceComps; } private: size_t distanceComps; }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "cover_tree_impl.hpp" #endif RcppMLPACK/src/mlpack/core/tree/cover_tree/first_point_is_root.hpp0000644000176200001440000000343313647512751024755 0ustar liggesusers/** * @file first_point_is_root.hpp * @author Ryan Curtin * * A very simple policy for the cover tree; the first point in the dataset is * chosen as the root of the cover tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_FIRST_POINT_IS_ROOT_HPP #define __MLPACK_CORE_TREE_FIRST_POINT_IS_ROOT_HPP #include namespace mlpack { namespace tree { /** * This class is meant to be used as a choice for the policy class * RootPointPolicy of the CoverTree class. This policy determines which point * is used for the root node of the cover tree. This particular implementation * simply chooses the first point in the dataset as the root. A more complex * implementation might choose, for instance, the point with least maximum * distance to other points (the closest to the "middle"). */ class FirstPointIsRoot { public: /** * Return the point to be used as the root point of the cover tree. This just * returns 0. */ static size_t ChooseRoot(const arma::mat& /* dataset */) { return 0; } }; }; // namespace tree }; // namespace mlpack #endif // __MLPACK_CORE_TREE_FIRST_POINT_IS_ROOT_HPP RcppMLPACK/src/mlpack/core/tree/cover_tree/dual_tree_traverser.hpp0000644000176200001440000000672613647512751024730 0ustar liggesusers/** * @file dual_tree_traverser.hpp * @author Ryan Curtin * * A dual-tree traverser for the cover tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_COVER_TREE_DUAL_TREE_TRAVERSER_HPP #define __MLPACK_CORE_TREE_COVER_TREE_DUAL_TREE_TRAVERSER_HPP #include #include namespace mlpack { namespace tree { template template class CoverTree::DualTreeTraverser { public: /** * Initialize the dual tree traverser with the given rule type. */ DualTreeTraverser(RuleType& rule); /** * Traverse the two specified trees. * * @param queryNode Root of query tree. * @param referenceNode Root of reference tree. 
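   *
   * A sketch of typical use (nothing here is prescribed by this header;
   * RuleType and its constructor arguments are supplied by the calling
   * algorithm):
   *
   * @code
   * RuleType rules(...); // Constructed however the algorithm requires.
   * CoverTree<>::DualTreeTraverser<RuleType> traverser(rules);
   * traverser.Traverse(queryTree, referenceTree);
   * @endcode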
*/ void Traverse(CoverTree& queryNode, CoverTree& referenceNode); //! Get the number of pruned nodes. size_t NumPrunes() const { return numPrunes; } //! Modify the number of pruned nodes. size_t& NumPrunes() { return numPrunes; } ///// These are all fake because this is a patch for kd-trees only and I still ///// want it to compile! size_t NumVisited() const { return 0; } size_t NumScores() const { return 0; } size_t NumBaseCases() const { return 0; } private: //! The instantiated rule set for pruning branches. RuleType& rule; //! The number of pruned nodes. size_t numPrunes; //! Struct used for traversal. struct DualCoverTreeMapEntry { //! The node this entry refers to. CoverTree* referenceNode; //! The score of the node. double score; //! The base case. double baseCase; //! The traversal info associated with the call to Score() for this entry. typename RuleType::TraversalInfoType traversalInfo; //! Comparison operator, for sorting within the map. bool operator<(const DualCoverTreeMapEntry& other) const { if (score == other.score) return (baseCase < other.baseCase); else return (score < other.score); } }; /** * Helper function for traversal of the two trees. */ void Traverse(CoverTree& queryNode, std::map >& referenceMap); //! Prepare map for recursion. void PruneMap(CoverTree& queryNode, std::map >& referenceMap, std::map >& childMap); void ReferenceRecursion(CoverTree& queryNode, std::map >& referenceMap); }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "dual_tree_traverser_impl.hpp" #endif RcppMLPACK/src/mlpack/core/tree/cover_tree/cover_tree_impl.hpp0000644000176200001440000010625613647512751024044 0ustar liggesusers/** * @file cover_tree_impl.hpp * @author Ryan Curtin * * Implementation of CoverTree class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_COVER_TREE_COVER_TREE_IMPL_HPP #define __MLPACK_CORE_TREE_COVER_TREE_COVER_TREE_IMPL_HPP // In case it hasn't already been included. #include "cover_tree.hpp" #include #include namespace mlpack { namespace tree { // Create the cover tree. template CoverTree::CoverTree( const arma::mat& dataset, const double base, MetricType* metric) : dataset(dataset), point(RootPointPolicy::ChooseRoot(dataset)), scale(INT_MAX), base(base), numDescendants(0), parent(NULL), parentDistance(0), furthestDescendantDistance(0), localMetric(metric == NULL), metric(metric), distanceComps(0) { // If we need to create a metric, do that. We'll just do it on the heap. if (localMetric) this->metric = new MetricType(); // If there is only one point in the dataset... uh, we're done. if (dataset.n_cols == 1) return; // Kick off the building. Create the indices array and the distances array. arma::Col indices = arma::linspace >(1, dataset.n_cols - 1, dataset.n_cols - 1); // This is now [1 2 3 4 ... n]. We must be sure that our point does not // occur. 
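  // For example (illustrative): with dataset.n_cols == 5 and root point 3,
  // 'indices' starts as [1 2 3 4]; the swap below turns it into [1 2 0 4],
  // so index 0 re-enters the set and the root index 3 leaves it.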
if (point != 0) indices[point - 1] = 0; // Put 0 back into the set; remove what was there. arma::vec distances(dataset.n_cols - 1); // Build the initial distances. ComputeDistances(point, indices, distances, dataset.n_cols - 1); // Create the children. size_t farSetSize = 0; size_t usedSetSize = 0; CreateChildren(indices, distances, dataset.n_cols - 1, farSetSize, usedSetSize); // If we ended up creating only one child, remove the implicit node. while (children.size() == 1) { // Prepare to delete the implicit child node. CoverTree* old = children[0]; // Now take its children and set their parent correctly. children.erase(children.begin()); for (size_t i = 0; i < old->NumChildren(); ++i) { children.push_back(&(old->Child(i))); // Set its parent correctly. old->Child(i).Parent() = this; } // Remove all the children so they don't get erased. old->Children().clear(); // Reduce our own scale. scale = old->Scale(); // Now delete it. delete old; } // Use the furthest descendant distance to determine the scale of the root // node. scale = (int) ceil(log(furthestDescendantDistance) / log(base)); // Initialize statistic. stat = StatisticType(*this); Rcpp::Rcout << distanceComps << " distance computations during tree " << "construction." << std::endl; } template CoverTree::CoverTree( const arma::mat& dataset, MetricType& metric, const double base) : dataset(dataset), point(RootPointPolicy::ChooseRoot(dataset)), scale(INT_MAX), base(base), numDescendants(0), parent(NULL), parentDistance(0), furthestDescendantDistance(0), localMetric(false), metric(&metric), distanceComps(0) { // If there is only one point in the dataset, uh, we're done. if (dataset.n_cols == 1) return; // Kick off the building. Create the indices array and the distances array. arma::Col indices = arma::linspace >(1, dataset.n_cols - 1, dataset.n_cols - 1); // This is now [1 2 3 4 ... n]. We must be sure that our point does not // occur. if (point != 0) indices[point - 1] = 0; // Put 0 back into the set; remove what was there. arma::vec distances(dataset.n_cols - 1); // Build the initial distances. ComputeDistances(point, indices, distances, dataset.n_cols - 1); // Create the children. size_t farSetSize = 0; size_t usedSetSize = 0; CreateChildren(indices, distances, dataset.n_cols - 1, farSetSize, usedSetSize); // If we ended up creating only one child, remove the implicit node. while (children.size() == 1) { // Prepare to delete the implicit child node. CoverTree* old = children[0]; // Now take its children and set their parent correctly. children.erase(children.begin()); for (size_t i = 0; i < old->NumChildren(); ++i) { children.push_back(&(old->Child(i))); // Set its parent correctly. old->Child(i).Parent() = this; } // Remove all the children so they don't get erased. old->Children().clear(); // Reduce our own scale. scale = old->Scale(); // Now delete it. delete old; } // Use the furthest descendant distance to determine the scale of the root // node. scale = (int) ceil(log(furthestDescendantDistance) / log(base)); // Initialize statistic. stat = StatisticType(*this); Rcpp::Rcout << distanceComps << " distance computations during tree " << "construction." 
<< std::endl; } template CoverTree::CoverTree( const arma::mat& dataset, const double base, const size_t pointIndex, const int scale, CoverTree* parent, const double parentDistance, arma::Col& indices, arma::vec& distances, size_t nearSetSize, size_t& farSetSize, size_t& usedSetSize, MetricType& metric) : dataset(dataset), point(pointIndex), scale(scale), base(base), numDescendants(0), parent(parent), parentDistance(parentDistance), furthestDescendantDistance(0), localMetric(false), metric(&metric), distanceComps(0) { // If the size of the near set is 0, this is a leaf. if (nearSetSize == 0) { this->scale = INT_MIN; numDescendants = 1; stat = StatisticType(*this); return; } // Otherwise, create the children. CreateChildren(indices, distances, nearSetSize, farSetSize, usedSetSize); // Initialize statistic. stat = StatisticType(*this); } // Manually create a cover tree node. template CoverTree::CoverTree( const arma::mat& dataset, const double base, const size_t pointIndex, const int scale, CoverTree* parent, const double parentDistance, const double furthestDescendantDistance, MetricType* metric) : dataset(dataset), point(pointIndex), scale(scale), base(base), numDescendants(0), parent(parent), parentDistance(parentDistance), furthestDescendantDistance(furthestDescendantDistance), localMetric(metric == NULL), metric(metric), distanceComps(0) { // If necessary, create a local metric. if (localMetric) this->metric = new MetricType(); // Initialize the statistic. stat = StatisticType(*this); } template CoverTree::CoverTree( const CoverTree& other) : dataset(other.dataset), point(other.point), scale(other.scale), base(other.base), stat(other.stat), numDescendants(other.numDescendants), parent(other.parent), parentDistance(other.parentDistance), furthestDescendantDistance(other.furthestDescendantDistance), localMetric(false), metric(other.metric), distanceComps(0) { // Copy each child by hand. for (size_t i = 0; i < other.NumChildren(); ++i) { children.push_back(new CoverTree(other.Child(i))); children[i]->Parent() = this; } } template CoverTree::~CoverTree() { // Delete each child. for (size_t i = 0; i < children.size(); ++i) delete children[i]; // Delete the local metric, if necessary. if (localMetric) delete metric; } //! Return the number of descendant points. template inline size_t CoverTree::NumDescendants() const { return numDescendants; } //! Return the index of a particular descendant point. template inline size_t CoverTree::Descendant( const size_t index) const { // The first descendant is the point contained within this node. if (index == 0) return point; // Is it in the self-child? if (index < children[0]->NumDescendants()) return children[0]->Descendant(index); // Now check the other children. size_t sum = children[0]->NumDescendants(); for (size_t i = 1; i < children.size(); ++i) { if (index - sum < children[i]->NumDescendants()) return children[i]->Descendant(index - sum); sum += children[i]->NumDescendants(); } // This should never happen. return (size_t() - 1); } template double CoverTree::MinDistance( const CoverTree* other) const { // Every cover tree node will contain points up to base^(scale + 1) away. return std::max(metric->Evaluate(dataset.unsafe_col(point), other->Dataset().unsafe_col(other->Point())) - furthestDescendantDistance - other->FurthestDescendantDistance(), 0.0); } template double CoverTree::MinDistance( const CoverTree* other, const double distance) const { // We already have the distance as evaluated by the metric. 
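  // Illustrative numbers: if the point-to-point distance is 5.0 and each
  // node's furthest-descendant distance is 1.5, then no pair of descendants
  // can be closer than 5.0 - 1.5 - 1.5 = 2.0, which is what is returned.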
return std::max(distance - furthestDescendantDistance - other->FurthestDescendantDistance(), 0.0); } template double CoverTree::MinDistance( const arma::vec& other) const { return std::max(metric->Evaluate(dataset.unsafe_col(point), other) - furthestDescendantDistance, 0.0); } template double CoverTree::MinDistance( const arma::vec& /* other */, const double distance) const { return std::max(distance - furthestDescendantDistance, 0.0); } template double CoverTree::MaxDistance( const CoverTree* other) const { return metric->Evaluate(dataset.unsafe_col(point), other->Dataset().unsafe_col(other->Point())) + furthestDescendantDistance + other->FurthestDescendantDistance(); } template double CoverTree::MaxDistance( const CoverTree* other, const double distance) const { // We already have the distance as evaluated by the metric. return distance + furthestDescendantDistance + other->FurthestDescendantDistance(); } template double CoverTree::MaxDistance( const arma::vec& other) const { return metric->Evaluate(dataset.unsafe_col(point), other) + furthestDescendantDistance; } template double CoverTree::MaxDistance( const arma::vec& /* other */, const double distance) const { return distance + furthestDescendantDistance; } //! Return the minimum and maximum distance to another node. template math::Range CoverTree:: RangeDistance(const CoverTree* other) const { const double distance = metric->Evaluate(dataset.unsafe_col(point), other->Dataset().unsafe_col(other->Point())); math::Range result; result.Lo() = distance - furthestDescendantDistance - other->FurthestDescendantDistance(); result.Hi() = distance + furthestDescendantDistance + other->FurthestDescendantDistance(); return result; } //! Return the minimum and maximum distance to another node given that the //! point-to-point distance has already been calculated. template math::Range CoverTree:: RangeDistance(const CoverTree* other, const double distance) const { math::Range result; result.Lo() = distance - furthestDescendantDistance - other->FurthestDescendantDistance(); result.Hi() = distance + furthestDescendantDistance + other->FurthestDescendantDistance(); return result; } //! Return the minimum and maximum distance to another point. template math::Range CoverTree:: RangeDistance(const arma::vec& other) const { const double distance = metric->Evaluate(dataset.unsafe_col(point), other); return math::Range(distance - furthestDescendantDistance, distance + furthestDescendantDistance); } //! Return the minimum and maximum distance to another point given that the //! point-to-point distance has already been calculated. template math::Range CoverTree:: RangeDistance(const arma::vec& /* other */, const double distance) const { return math::Range(distance - furthestDescendantDistance, distance + furthestDescendantDistance); } //! For a newly initialized node, create children using the near and far set. template inline void CoverTree::CreateChildren( arma::Col& indices, arma::vec& distances, size_t nearSetSize, size_t& farSetSize, size_t& usedSetSize) { // Determine the next scale level. This should be the first level where there // are any points in the far set. So, if we know the maximum distance in the // distances array, this will be the largest i such that // maxDistance > pow(base, i) // and using this for the scale factor should guarantee we are not creating an // implicit node. If the maximum distance is 0, every point in the near set // will be created as a leaf, and a child to this node. 
We also do not need // to change the furthestChildDistance or furthestDescendantDistance. const double maxDistance = max(distances.rows(0, nearSetSize + farSetSize - 1)); if (maxDistance == 0) { // Make the self child at the lowest possible level. // This should not modify farSetSize or usedSetSize. size_t tempSize = 0; children.push_back(new CoverTree(dataset, base, point, INT_MIN, this, 0, indices, distances, 0, tempSize, usedSetSize, *metric)); distanceComps += children.back()->DistanceComps(); // Every point in the near set should be a leaf. for (size_t i = 0; i < nearSetSize; ++i) { // farSetSize and usedSetSize will not be modified. children.push_back(new CoverTree(dataset, base, indices[i], INT_MIN, this, distances[i], indices, distances, 0, tempSize, usedSetSize, *metric)); distanceComps += children.back()->DistanceComps(); usedSetSize++; } // The number of descendants is just the number of children, because each of // them are leaves and contain one point. numDescendants = children.size(); // Re-sort the dataset. We have // [ used | far | other used ] // and we want // [ far | all used ]. SortPointSet(indices, distances, 0, usedSetSize, farSetSize); return; } const int nextScale = std::min(scale, (int) ceil(log(maxDistance) / log(base))) - 1; const double bound = pow(base, nextScale); // First, make the self child. We must split the given near set into the near // set and far set for the self child. size_t childNearSetSize = SplitNearFar(indices, distances, bound, nearSetSize); // Build the self child (recursively). size_t childFarSetSize = nearSetSize - childNearSetSize; size_t childUsedSetSize = 0; children.push_back(new CoverTree(dataset, base, point, nextScale, this, 0, indices, distances, childNearSetSize, childFarSetSize, childUsedSetSize, *metric)); // Don't double-count the self-child (so, subtract one). numDescendants += children[0]->NumDescendants(); // The self-child can't modify the furthestChildDistance away from 0, but it // can modify the furthestDescendantDistance. furthestDescendantDistance = children[0]->FurthestDescendantDistance(); // Remove any implicit nodes we may have created. RemoveNewImplicitNodes(); distanceComps += children[0]->DistanceComps(); // Now the arrays, in memory, look like this: // [ childFar | childUsed | far | used ] // but we need to move the used points past our far set: // [ childFar | far | childUsed + used ] // and keeping in mind that childFar = our near set, // [ near | far | childUsed + used ] // is what we are trying to make. SortPointSet(indices, distances, childFarSetSize, childUsedSetSize, farSetSize); // Update size of near set and used set. nearSetSize -= childUsedSetSize; usedSetSize += childUsedSetSize; // Now for each point in the near set, we need to make children. To save // computation later, we'll create an array holding the points in the near // set, and then after each run we'll check which of those (if any) were used // and we will remove them. ...if that's faster. I think it is. while (nearSetSize > 0) { size_t newPointIndex = nearSetSize - 1; // Swap to front if necessary. if (newPointIndex != 0) { const size_t tempIndex = indices[newPointIndex]; const double tempDist = distances[newPointIndex]; indices[newPointIndex] = indices[0]; distances[newPointIndex] = distances[0]; indices[0] = tempIndex; distances[0] = tempDist; } // Will this be a new furthest child? if (distances[0] > furthestDescendantDistance) furthestDescendantDistance = distances[0]; // If there's only one point left, we don't need this crap. 
if ((nearSetSize == 1) && (farSetSize == 0)) { size_t childNearSetSize = 0; children.push_back(new CoverTree(dataset, base, indices[0], nextScale, this, distances[0], indices, distances, childNearSetSize, farSetSize, usedSetSize, *metric)); distanceComps += children.back()->DistanceComps(); numDescendants += children.back()->NumDescendants(); // Because the far set size is 0, we don't have to do any swapping to // move the point into the used set. ++usedSetSize; --nearSetSize; // And we're done. break; } // Create the near and far set indices and distance vectors. We don't fill // in the self-point, yet. arma::Col childIndices(nearSetSize + farSetSize); childIndices.rows(0, (nearSetSize + farSetSize - 2)) = indices.rows(1, nearSetSize + farSetSize - 1); arma::vec childDistances(nearSetSize + farSetSize); // Build distances for the child. ComputeDistances(indices[0], childIndices, childDistances, nearSetSize + farSetSize - 1); // Split into near and far sets for this point. childNearSetSize = SplitNearFar(childIndices, childDistances, bound, nearSetSize + farSetSize - 1); childFarSetSize = PruneFarSet(childIndices, childDistances, base * bound, childNearSetSize, (nearSetSize + farSetSize - 1)); // Now that we know the near and far set sizes, we can put the used point // (the self point) in the correct place; now, when we call // MoveToUsedSet(), it will move the self-point correctly. The distance // does not matter. childIndices(childNearSetSize + childFarSetSize) = indices[0]; childDistances(childNearSetSize + childFarSetSize) = 0; // Build this child (recursively). childUsedSetSize = 1; // Mark self point as used. children.push_back(new CoverTree(dataset, base, indices[0], nextScale, this, distances[0], childIndices, childDistances, childNearSetSize, childFarSetSize, childUsedSetSize, *metric)); numDescendants += children.back()->NumDescendants(); // Remove any implicit nodes. RemoveNewImplicitNodes(); distanceComps += children.back()->DistanceComps(); // Now with the child created, it returns the childIndices and // childDistances vectors in this form: // [ childFar | childUsed ] // For each point in the childUsed set, we must move that point to the used // set in our own vector. MoveToUsedSet(indices, distances, nearSetSize, farSetSize, usedSetSize, childIndices, childFarSetSize, childUsedSetSize); } // Calculate furthest descendant. for (size_t i = (nearSetSize + farSetSize); i < (nearSetSize + farSetSize + usedSetSize); ++i) if (distances[i] > furthestDescendantDistance) furthestDescendantDistance = distances[i]; } template size_t CoverTree::SplitNearFar( arma::Col& indices, arma::vec& distances, const double bound, const size_t pointSetSize) { // Sanity check; there is no guarantee that this condition will not be true. // ...or is there? if (pointSetSize <= 1) return 0; // We'll traverse from both left and right. size_t left = 0; size_t right = pointSetSize - 1; // A modification of quicksort, with the pivot value set to the bound. // Everything on the left of the pivot will be less than or equal to the // bound; everything on the right will be greater than the bound. while ((distances[left] <= bound) && (left != right)) ++left; while ((distances[right] > bound) && (left != right)) --right; while (left != right) { // Now swap the values and indices. 
const size_t tempPoint = indices[left]; const double tempDist = distances[left]; indices[left] = indices[right]; distances[left] = distances[right]; indices[right] = tempPoint; distances[right] = tempDist; // Traverse the left, seeing how many points are correctly on that side. // When we encounter an incorrect point, stop. We will switch it later. while ((distances[left] <= bound) && (left != right)) ++left; // Traverse the right, seeing how many points are correctly on that side. // When we encounter an incorrect point, stop. We will switch it with the // wrong point from the left side. while ((distances[right] > bound) && (left != right)) --right; } // The final left value is the index of the first far value. return left; } // Returns the maximum distance between points. template void CoverTree::ComputeDistances( const size_t pointIndex, const arma::Col& indices, arma::vec& distances, const size_t pointSetSize) { // For each point, rebuild the distances. The indices do not need to be // modified. distanceComps += pointSetSize; for (size_t i = 0; i < pointSetSize; ++i) { distances[i] = metric->Evaluate(dataset.unsafe_col(pointIndex), dataset.unsafe_col(indices[i])); } } template size_t CoverTree::SortPointSet( arma::Col& indices, arma::vec& distances, const size_t childFarSetSize, const size_t childUsedSetSize, const size_t farSetSize) { // We'll use low-level memcpy calls ourselves, just to ensure it's done // quickly and the way we want it to be. Unfortunately this takes up more // memory than one-element swaps, but there's not a great way around that. const size_t bufferSize = std::min(farSetSize, childUsedSetSize); const size_t bigCopySize = std::max(farSetSize, childUsedSetSize); // Sanity check: there is no need to sort if the buffer size is going to be // zero. if (bufferSize == 0) return (childFarSetSize + farSetSize); size_t* indicesBuffer = new size_t[bufferSize]; double* distancesBuffer = new double[bufferSize]; // The start of the memory region to copy to the buffer. const size_t bufferFromLocation = ((bufferSize == farSetSize) ? (childFarSetSize + childUsedSetSize) : childFarSetSize); // The start of the memory region to move directly to the new place. const size_t directFromLocation = ((bufferSize == farSetSize) ? childFarSetSize : (childFarSetSize + childUsedSetSize)); // The destination to copy the buffer back to. const size_t bufferToLocation = ((bufferSize == farSetSize) ? childFarSetSize : (childFarSetSize + farSetSize)); // The destination of the directly moved memory region. const size_t directToLocation = ((bufferSize == farSetSize) ? (childFarSetSize + farSetSize) : childFarSetSize); // Copy the smaller piece to the buffer. memcpy(indicesBuffer, indices.memptr() + bufferFromLocation, sizeof(size_t) * bufferSize); memcpy(distancesBuffer, distances.memptr() + bufferFromLocation, sizeof(double) * bufferSize); // Now move the other memory. memmove(indices.memptr() + directToLocation, indices.memptr() + directFromLocation, sizeof(size_t) * bigCopySize); memmove(distances.memptr() + directToLocation, distances.memptr() + directFromLocation, sizeof(double) * bigCopySize); // Now copy the temporary memory to the right place. memcpy(indices.memptr() + bufferToLocation, indicesBuffer, sizeof(size_t) * bufferSize); memcpy(distances.memptr() + bufferToLocation, distancesBuffer, sizeof(double) * bufferSize); delete[] indicesBuffer; delete[] distancesBuffer; // This returns the complete size of the far set. 
return (childFarSetSize + farSetSize); } template void CoverTree::MoveToUsedSet( arma::Col& indices, arma::vec& distances, size_t& nearSetSize, size_t& farSetSize, size_t& usedSetSize, arma::Col& childIndices, const size_t childFarSetSize, // childNearSetSize is 0 in this case. const size_t childUsedSetSize) { const size_t originalSum = nearSetSize + farSetSize + usedSetSize; // Loop across the set. We will swap points as we need. It should be noted // that farSetSize and nearSetSize may change with each iteration of this loop // (depending on if we make a swap or not). size_t startChildUsedSet = 0; // Where to start in the child set. for (size_t i = 0; i < nearSetSize; ++i) { // Discover if this point was in the child's used set. for (size_t j = startChildUsedSet; j < childUsedSetSize; ++j) { if (childIndices[childFarSetSize + j] == indices[i]) { // We have found a point; a swap is necessary. // Since this point is from the near set, to preserve the near set, we // must do a swap. if (farSetSize > 0) { if ((nearSetSize - 1) != i) { // In this case it must be a three-way swap. size_t tempIndex = indices[nearSetSize + farSetSize - 1]; double tempDist = distances[nearSetSize + farSetSize - 1]; size_t tempNearIndex = indices[nearSetSize - 1]; double tempNearDist = distances[nearSetSize - 1]; indices[nearSetSize + farSetSize - 1] = indices[i]; distances[nearSetSize + farSetSize - 1] = distances[i]; indices[nearSetSize - 1] = tempIndex; distances[nearSetSize - 1] = tempDist; indices[i] = tempNearIndex; distances[i] = tempNearDist; } else { // We can do a two-way swap. size_t tempIndex = indices[nearSetSize + farSetSize - 1]; double tempDist = distances[nearSetSize + farSetSize - 1]; indices[nearSetSize + farSetSize - 1] = indices[i]; distances[nearSetSize + farSetSize - 1] = distances[i]; indices[i] = tempIndex; distances[i] = tempDist; } } else if ((nearSetSize - 1) != i) { // A two-way swap is possible. size_t tempIndex = indices[nearSetSize + farSetSize - 1]; double tempDist = distances[nearSetSize + farSetSize - 1]; indices[nearSetSize + farSetSize - 1] = indices[i]; distances[nearSetSize + farSetSize - 1] = distances[i]; indices[i] = tempIndex; distances[i] = tempDist; } else { // No swap is necessary. } // We don't need to do a complete preservation of the child index set, // but we want to make sure we only loop over points we haven't seen. // So increment the child counter by 1 and move a point if we need. if (j != startChildUsedSet) { childIndices[childFarSetSize + j] = childIndices[childFarSetSize + startChildUsedSet]; } // Update all counters from the swaps we have done. ++startChildUsedSet; --nearSetSize; --i; // Since we moved a point out of the near set we must step back. break; // Break out of this for loop; back to the first one. } } } // Now loop over the far set. This loop is different because we only require // a normal two-way swap instead of the three-way swap to preserve the near // set / far set ordering. for (size_t i = 0; i < farSetSize; ++i) { // Discover if this point was in the child's used set. for (size_t j = startChildUsedSet; j < childUsedSetSize; ++j) { if (childIndices[childFarSetSize + j] == indices[i + nearSetSize]) { // We have found a point to swap. // Perform the swap. 
size_t tempIndex = indices[nearSetSize + farSetSize - 1]; double tempDist = distances[nearSetSize + farSetSize - 1]; indices[nearSetSize + farSetSize - 1] = indices[nearSetSize + i]; distances[nearSetSize + farSetSize - 1] = distances[nearSetSize + i]; indices[nearSetSize + i] = tempIndex; distances[nearSetSize + i] = tempDist; if (j != startChildUsedSet) { childIndices[childFarSetSize + j] = childIndices[childFarSetSize + startChildUsedSet]; } // Update all counters from the swaps we have done. ++startChildUsedSet; --farSetSize; --i; break; // Break out of this for loop; back to the first one. } } } // Update used set size. usedSetSize += childUsedSetSize; //Log::Assert(originalSum == (nearSetSize + farSetSize + usedSetSize)); } template size_t CoverTree::PruneFarSet( arma::Col& indices, arma::vec& distances, const double bound, const size_t nearSetSize, const size_t pointSetSize) { // What we are trying to do is remove any points greater than the bound from // the far set. We don't care what happens to those indices and distances... // so, we don't need to properly swap points -- just drop new ones in place. size_t left = nearSetSize; size_t right = pointSetSize - 1; while ((distances[left] <= bound) && (left != right)) ++left; while ((distances[right] > bound) && (left != right)) --right; while (left != right) { // We don't care what happens to the point which should be on the right. indices[left] = indices[right]; distances[left] = distances[right]; --right; // Since we aren't changing the right. // Advance to next location which needs to switch. while ((distances[left] <= bound) && (left != right)) ++left; while ((distances[right] > bound) && (left != right)) --right; } // The far set size is the left pointer, with the near set size subtracted // from it. return (left - nearSetSize); } /** * Take a look at the last child (the most recently created one) and remove any * implicit nodes that have been created. */ template inline void CoverTree:: RemoveNewImplicitNodes() { // If we created an implicit node, take its self-child instead (this could // happen multiple times). while (children[children.size() - 1]->NumChildren() == 1) { CoverTree* old = children[children.size() - 1]; children.erase(children.begin() + children.size() - 1); // Now take its child. children.push_back(&(old->Child(0))); // Set its parent correctly. old->Child(0).Parent() = this; old->Child(0).ParentDistance() = old->ParentDistance(); old->Child(0).DistanceComps() = old->DistanceComps(); // Remove its child (so it doesn't delete it). old->Children().erase(old->Children().begin() + old->Children().size() - 1); // Now delete it. delete old; } } /** * Returns a string representation of this object. */ template std::string CoverTree::ToString() const { std::ostringstream convert; convert << "CoverTree [" << this << "]" << std::endl; convert << "dataset: " << &dataset << std::endl; convert << "point: " << point << std::endl; convert << "scale: " << scale << std::endl; convert << "base: " << base << std::endl; // convert << "StatisticType: " << stat << std::endl; convert << "parent distance : " << parentDistance << std::endl; convert << "furthest child distance: " << furthestDescendantDistance; convert << std::endl; convert << "children:"; // How many levels should we print? This will print the top two tree levels. 
if (IsLeaf() == false && parent == NULL) { for (size_t i = 0; i < children.size(); i++) { convert << std::endl << mlpack::util::Indent(children.at(i)->ToString()); } } return convert.str(); } }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/tree/cover_tree/single_tree_traverser.hpp0000644000176200001440000000433613647512751025257 0ustar liggesusers/** * @file single_tree_traverser.hpp * @author Ryan Curtin * * Defines the SingleTreeTraverser for the cover tree. This implements a * single-tree breadth-first recursion with a pruning rule and a base case (two * point) rule. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_COVER_TREE_SINGLE_TREE_TRAVERSER_HPP #define __MLPACK_CORE_TREE_COVER_TREE_SINGLE_TREE_TRAVERSER_HPP #include #include "cover_tree.hpp" namespace mlpack { namespace tree { template template class CoverTree::SingleTreeTraverser { public: /** * Initialize the single tree traverser with the given rule. */ SingleTreeTraverser(RuleType& rule); /** * Traverse the tree with the given point. * * @param queryIndex The index of the point in the query set which is used as * the query point. * @param referenceNode The tree node to be traversed. */ void Traverse(const size_t queryIndex, CoverTree& referenceNode); //! Get the number of prunes so far. size_t NumPrunes() const { return numPrunes; } //! Set the number of prunes (good for a reset to 0). size_t& NumPrunes() { return numPrunes; } private: //! Reference to the rules with which the tree will be traversed. RuleType& rule; //! The number of nodes which have been pruned during traversal. size_t numPrunes; }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "single_tree_traverser_impl.hpp" #endif RcppMLPACK/src/mlpack/core/tree/cover_tree/single_tree_traverser_impl.hpp0000644000176200001440000001716313647512751026302 0ustar liggesusers/** * @file single_tree_traverser_impl.hpp * @author Ryan Curtin * * Implementation of the single tree traverser for cover trees, which implements * a breadth-first traversal. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_COVER_TREE_SINGLE_TREE_TRAVERSER_IMPL_HPP #define __MLPACK_CORE_TREE_COVER_TREE_SINGLE_TREE_TRAVERSER_IMPL_HPP // In case it hasn't been included yet. 
#include "single_tree_traverser.hpp" #include namespace mlpack { namespace tree { //! This is the structure the cover tree map will use for traversal. template struct CoverTreeMapEntry { //! The node this entry refers to. CoverTree* node; //! The score of the node. double score; //! The index of the parent node. size_t parent; //! The base case evaluation. double baseCase; //! Comparison operator. bool operator<(const CoverTreeMapEntry& other) const { return (score < other.score); } }; template template CoverTree:: SingleTreeTraverser::SingleTreeTraverser(RuleType& rule) : rule(rule), numPrunes(0) { /* Nothing to do. */ } template template void CoverTree:: SingleTreeTraverser::Traverse( const size_t queryIndex, CoverTree& referenceNode) { // This is a non-recursive implementation (which should be faster than a // recursive implementation). typedef CoverTreeMapEntry MapEntryType; // We will use this map as a priority queue. Each key represents the scale, // and then the vector is all the nodes in that scale which need to be // investigated. Because no point in a scale can add a point in its own // scale, we know that the vector for each scale is final when we get to it. // In addition, map is organized in such a way that rbegin() will return the // largest scale. std::map > mapQueue; // Create the score for the children. double rootChildScore = rule.Score(queryIndex, referenceNode); if (rootChildScore == DBL_MAX) { numPrunes += referenceNode.NumChildren(); } else { // Manually add the children of the first node. // Often, a ruleset will return without doing any computation on cover trees // using TreeTraits::FirstPointIsCentroid; this is an optimization that // (theoretically) the compiler should get right. double rootBaseCase = rule.BaseCase(queryIndex, referenceNode.Point()); // Don't add the self-leaf. size_t i = 0; if (referenceNode.Child(0).NumChildren() == 0) { ++numPrunes; i = 1; } for (/* i was set above. */; i < referenceNode.NumChildren(); ++i) { MapEntryType newFrame; newFrame.node = &referenceNode.Child(i); newFrame.score = rootChildScore; newFrame.baseCase = rootBaseCase; newFrame.parent = referenceNode.Point(); // Put it into the map. mapQueue[newFrame.node->Scale()].push_back(newFrame); } } // Now begin the iteration through the map, but only if it has anything in it. if (mapQueue.empty()) return; typename std::map >::reverse_iterator rit = mapQueue.rbegin(); // We will treat the leaves differently (below). while ((*rit).first != INT_MIN) { // Get a reference to the current scale. std::vector& scaleVector = (*rit).second; // Before traversing all the points in this scale, sort by score. std::sort(scaleVector.begin(), scaleVector.end()); // Now loop over each element. for (size_t i = 0; i < scaleVector.size(); ++i) { // Get a reference to the current element. const MapEntryType& frame = scaleVector.at(i); CoverTree* node = frame.node; const double score = frame.score; const size_t parent = frame.parent; const size_t point = node->Point(); double baseCase = frame.baseCase; // First we recalculate the score of this node to find if we can prune it. if (rule.Rescore(queryIndex, *node, score) == DBL_MAX) { ++numPrunes; continue; } // Create the score for the children. const double childScore = rule.Score(queryIndex, *node); // Now if this childScore is DBL_MAX we can prune all children. In this // recursion setup pruning is all or nothing for children. 
if (childScore == DBL_MAX) { numPrunes += node->NumChildren(); continue; } // If we are a self-child, the base case has already been evaluated. // Often, a ruleset will return without doing any computation on cover // trees using TreeTraits::FirstPointIsCentroid; this is an optimization // that (theoretically) the compiler should get right. if (point != parent) baseCase = rule.BaseCase(queryIndex, point); // Don't add the self-leaf. size_t j = 0; if (node->Child(0).NumChildren() == 0) { ++numPrunes; j = 1; } for (/* j is already set. */; j < node->NumChildren(); ++j) { MapEntryType newFrame; newFrame.node = &node->Child(j); newFrame.score = childScore; newFrame.baseCase = baseCase; newFrame.parent = point; mapQueue[newFrame.node->Scale()].push_back(newFrame); } } // Now clear the memory for this scale; it isn't needed anymore. mapQueue.erase((*rit).first); } // Now deal with the leaves. for (size_t i = 0; i < mapQueue[INT_MIN].size(); ++i) { const MapEntryType& frame = mapQueue[INT_MIN].at(i); CoverTree* node = frame.node; const double score = frame.score; const size_t point = node->Point(); // First, recalculate the score of this node to find if we can prune it. double rescore = rule.Rescore(queryIndex, *node, score); if (rescore == DBL_MAX) { ++numPrunes; continue; } // For this to be a valid dual-tree algorithm, we *must* evaluate the // combination, even if pruning it will make no difference. It's the // definition. const double actualScore = rule.Score(queryIndex, *node); if (actualScore == DBL_MAX) { ++numPrunes; continue; } else { // Evaluate the base case, since the combination was not pruned. // Often, a ruleset will return without doing any computation on cover // trees using TreeTraits::FirstPointIsCentroid; this is an optimization // that (theoretically) the compiler should get right. rule.BaseCase(queryIndex, point); } } } }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/tree/cover_tree/traits.hpp0000644000176200001440000000412013647512751022157 0ustar liggesusers/** * @file traits.hpp * @author Ryan Curtin * * This file contains the specialization of the TreeTraits class for the * CoverTree type of tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_COVER_TREE_TRAITS_HPP #define __MLPACK_CORE_TREE_COVER_TREE_TRAITS_HPP #include namespace mlpack { namespace tree { /** * The specialization of the TreeTraits class for the CoverTree tree type. It * defines characteristics of the cover tree, and is used to help write * tree-independent (but still optimized) tree-based algorithms. See * mlpack/core/tree/tree_traits.hpp for more information. */ template class TreeTraits > { public: /** * The cover tree (or, this implementation of it) does not require that * children represent non-overlapping subsets of the parent node. 
*/ static const bool HasOverlappingChildren = true; /** * Each cover tree node contains only one point, and that point is its * centroid. */ static const bool FirstPointIsCentroid = true; /** * Cover trees do have self-children. */ static const bool HasSelfChildren = true; /** * Points are not rearranged when the tree is built. */ static const bool RearrangesDataset = false; }; }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/tree/cover_tree/dual_tree_traverser_impl.hpp0000644000176200001440000002764513647512751025754 0ustar liggesusers/** * @file dual_tree_traverser_impl.hpp * @author Ryan Curtin * * A dual-tree traverser for the cover tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_COVER_TREE_DUAL_TREE_TRAVERSER_IMPL_HPP #define __MLPACK_CORE_TREE_COVER_TREE_DUAL_TREE_TRAVERSER_IMPL_HPP #include #include namespace mlpack { namespace tree { template template CoverTree:: DualTreeTraverser::DualTreeTraverser(RuleType& rule) : rule(rule), numPrunes(0) { /* Nothing to do. */ } template template void CoverTree:: DualTreeTraverser::Traverse( CoverTree& queryNode, CoverTree& referenceNode) { // Start by creating a map and adding the reference root node to it. std::map > refMap; DualCoverTreeMapEntry rootRefEntry; rootRefEntry.referenceNode = &referenceNode; // Perform the evaluation between the roots of either tree. rootRefEntry.score = rule.Score(queryNode, referenceNode); rootRefEntry.baseCase = rule.BaseCase(queryNode.Point(), referenceNode.Point()); rootRefEntry.traversalInfo = rule.TraversalInfo(); refMap[referenceNode.Scale()].push_back(rootRefEntry); Traverse(queryNode, refMap); } template template void CoverTree:: DualTreeTraverser::Traverse( CoverTree& queryNode, std::map >& referenceMap) { if (referenceMap.size() == 0) return; // Nothing to do! // First recurse down the reference nodes as necessary. ReferenceRecursion(queryNode, referenceMap); // Did the map get emptied? if (referenceMap.size() == 0) return; // Nothing to do! // Now, reduce the scale of the query node by recursing. But we can't recurse // if the query node is a leaf node. if ((queryNode.Scale() != INT_MIN) && (queryNode.Scale() >= (*referenceMap.rbegin()).first)) { // Recurse into the non-self-children first. The recursion order cannot // affect the runtime of the algorithm, because each query child recursion's // results are separate and independent. I don't think this is true in // every case, and we may have to modify this section to consider scores in // the future. for (size_t i = 1; i < queryNode.NumChildren(); ++i) { // We need a copy of the map for this child. 
std::map > childMap; PruneMap(queryNode.Child(i), referenceMap, childMap); Traverse(queryNode.Child(i), childMap); } std::map > selfChildMap; PruneMap(queryNode.Child(0), referenceMap, selfChildMap); Traverse(queryNode.Child(0), selfChildMap); } if (queryNode.Scale() != INT_MIN) return; // No need to evaluate base cases at this level. It's all done. // If we have made it this far, all we have is a bunch of base case // evaluations to do. //Log::Assert((*referenceMap.begin()).first == INT_MIN); //Log::Assert(queryNode.Scale() == INT_MIN); std::vector& pointVector = (*referenceMap.begin()).second; for (size_t i = 0; i < pointVector.size(); ++i) { // Get a reference to the frame. const DualCoverTreeMapEntry& frame = pointVector[i]; CoverTree* refNode = frame.referenceNode; // If the point is the same as both parents, then we have already done this // base case. if ((refNode->Point() == refNode->Parent()->Point()) && (queryNode.Point() == queryNode.Parent()->Point())) { ++numPrunes; continue; } // Score the node, to see if we can prune it, after restoring the traversal // info. rule.TraversalInfo() = frame.traversalInfo; double score = rule.Score(queryNode, *refNode); if (score == DBL_MAX) { ++numPrunes; continue; } // If not, compute the base case. rule.BaseCase(queryNode.Point(), pointVector[i].referenceNode->Point()); } } template template void CoverTree:: DualTreeTraverser::PruneMap( CoverTree& queryNode, std::map >& referenceMap, std::map >& childMap) { if (referenceMap.empty()) return; // Nothing to do. // Copy the zero set first. if ((*referenceMap.begin()).first == INT_MIN) { // Get a reference to the vector representing the entries at this scale. std::vector& scaleVector = (*referenceMap.begin()).second; // Before traversing all the points in this scale, sort by score. std::sort(scaleVector.begin(), scaleVector.end()); const int thisScale = (*referenceMap.begin()).first; childMap[thisScale].reserve(scaleVector.size()); std::vector& newScaleVector = childMap[thisScale]; // Loop over each entry in the vector. for (size_t j = 0; j < scaleVector.size(); ++j) { const DualCoverTreeMapEntry& frame = scaleVector[j]; // First evaluate if we can prune without performing the base case. CoverTree* refNode = frame.referenceNode; // Perform the actual scoring, after restoring the traversal info. rule.TraversalInfo() = frame.traversalInfo; double score = rule.Score(queryNode, *refNode); if (score == DBL_MAX) { // Pruned. Move on. ++numPrunes; continue; } // If it isn't pruned, we must evaluate the base case. const double baseCase = rule.BaseCase(queryNode.Point(), refNode->Point()); // Add to child map. newScaleVector.push_back(frame); newScaleVector.back().score = score; newScaleVector.back().baseCase = baseCase; newScaleVector.back().traversalInfo = rule.TraversalInfo(); } // If we didn't add anything, then strike this vector from the map. if (newScaleVector.size() == 0) childMap.erase((*referenceMap.begin()).first); } typename std::map >::reverse_iterator it = referenceMap.rbegin(); while ((it != referenceMap.rend())) { const int thisScale = (*it).first; if (thisScale == INT_MIN) // We already did it. break; // Get a reference to the vector representing the entries at this scale. std::vector& scaleVector = (*it).second; // Before traversing all the points in this scale, sort by score. std::sort(scaleVector.begin(), scaleVector.end()); childMap[thisScale].reserve(scaleVector.size()); std::vector& newScaleVector = childMap[thisScale]; // Loop over each entry in the vector. 
for (size_t j = 0; j < scaleVector.size(); ++j) { const DualCoverTreeMapEntry& frame = scaleVector[j]; // First evaluate if we can prune without performing the base case. CoverTree* refNode = frame.referenceNode; // Perform the actual scoring, after restoring the traversal info. rule.TraversalInfo() = frame.traversalInfo; double score = rule.Score(queryNode, *refNode); if (score == DBL_MAX) { // Pruned. Move on. ++numPrunes; continue; } // If it isn't pruned, we must evaluate the base case. const double baseCase = rule.BaseCase(queryNode.Point(), refNode->Point()); // Add to child map. newScaleVector.push_back(frame); newScaleVector.back().score = score; newScaleVector.back().baseCase = baseCase; newScaleVector.back().traversalInfo = rule.TraversalInfo(); } // If we didn't add anything, then strike this vector from the map. if (newScaleVector.size() == 0) childMap.erase((*it).first); ++it; // Advance to next scale. } } template template void CoverTree:: DualTreeTraverser::ReferenceRecursion( CoverTree& queryNode, std::map >& referenceMap) { // First, reduce the maximum scale in the reference map down to the scale of // the query node. while (!referenceMap.empty()) { // Hacky bullshit to imitate jl cover tree. if (queryNode.Parent() == NULL && (*referenceMap.rbegin()).first < queryNode.Scale()) break; if (queryNode.Parent() != NULL && (*referenceMap.rbegin()).first <= queryNode.Scale()) break; // If the query node's scale is INT_MIN and the reference map's maximum // scale is INT_MIN, don't try to recurse... if ((queryNode.Scale() == INT_MIN) && ((*referenceMap.rbegin()).first == INT_MIN)) break; // Get a reference to the current largest scale. std::vector& scaleVector = (*referenceMap.rbegin()).second; // Before traversing all the points in this scale, sort by score. std::sort(scaleVector.begin(), scaleVector.end()); // Now loop over each element. for (size_t i = 0; i < scaleVector.size(); ++i) { // Get a reference to the current element. const DualCoverTreeMapEntry& frame = scaleVector.at(i); CoverTree* refNode = frame.referenceNode; // Create the score for the children. double score = rule.Rescore(queryNode, *refNode, frame.score); // Now if this childScore is DBL_MAX we can prune all children. In this // recursion setup pruning is all or nothing for children. if (score == DBL_MAX) { ++numPrunes; continue; } // If it is not pruned, we must evaluate the base case. // Add the children. for (size_t j = 0; j < refNode->NumChildren(); ++j) { rule.TraversalInfo() = frame.traversalInfo; double childScore = rule.Score(queryNode, refNode->Child(j)); if (childScore == DBL_MAX) { ++numPrunes; continue; } // It wasn't pruned; evaluate the base case. const double baseCase = rule.BaseCase(queryNode.Point(), refNode->Child(j).Point()); DualCoverTreeMapEntry newFrame; newFrame.referenceNode = &refNode->Child(j); newFrame.score = childScore; // Use the score of the parent. newFrame.baseCase = baseCase; newFrame.traversalInfo = rule.TraversalInfo(); referenceMap[newFrame.referenceNode->Scale()].push_back(newFrame); } } // Now clear the memory for this scale; it isn't needed anymore. referenceMap.erase((*referenceMap.rbegin()).first); } } }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/tree/tree_traits.hpp0000644000176200001440000000754213647512751021054 0ustar liggesusers/** * @file tree_traits.hpp * @author Ryan Curtin * * This file implements the basic, unspecialized TreeTraits class, which * provides information about tree types. 
If you create a tree class, you * should specialize this class with the characteristics of your tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_TREE_TRAITS_HPP #define __MLPACK_CORE_TREE_TREE_TRAITS_HPP namespace mlpack { namespace tree { /** * The TreeTraits class provides compile-time information on the characteristics * of a given tree type. These include traits such as whether or not a node * knows the distance to its parent node, or whether or not the subspaces * represented by children can overlap. * * These traits can be used for static compile-time optimization: * * @code * // This if statement will be optimized out at compile time! * if (TreeTraits::HasOverlappingChildren == false) * { * // Do a simpler computation because no children overlap. * } * else * { * // Do the full, complex calculation. * } * @endcode * * The traits can also be used in conjunction with SFINAE to write specialized * versions of functions: * * @code * template * void Compute(TreeType& node, * boost::enable_if< * TreeTraits::RearrangesDataset>::type*) * { * // Computation where special dataset-rearranging tree constructor is * // called. * } * * template * void Compute(TreeType& node, * boost::enable_if< * !TreeTraits::RearrangesDataset>::type*) * { * // Computation where normal tree constructor is called. * } * @endcode * * In those two examples, the boost::enable_if<> class takes a boolean template * parameter which allows that function to be called when the boolean is true. * * Each trait must be a static const value and not a function; only const values * can be used as template parameters (with the exception of constexprs, which * are a C++11 feature; but MLPACK is not using C++11). By default (the * unspecialized implementation of TreeTraits), each parameter is set to make as * few assumptions about the tree as possible; so, even if TreeTraits is not * specialized for a particular tree type, tree-based algorithms should still * work. * * When you write your own tree, you must specialize the TreeTraits class to * your tree type and set the corresponding values appropriately. See * mlpack/core/tree/binary_space_tree/traits.hpp for an example. */ template class TreeTraits { public: /** * This is true if the subspaces represented by the children of a node can * overlap. */ static const bool HasOverlappingChildren = true; /** * This is true if Point(0) is the centroid of the node. */ static const bool FirstPointIsCentroid = false; /** * This is true if the points contained in the first child of a node * (Child(0)) are also contained in that node. */ static const bool HasSelfChildren = false; /** * This is true if the tree rearranges points in the dataset when it is built. 
*/ static const bool RearrangesDataset = false; }; }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/src/mlpack/core/tree/traversal_info.hpp0000644000176200001440000000764213647512751021546 0ustar liggesusers/** * @file traversal_info.hpp * @author Ryan Curtin * * This class will hold the traversal information for dual-tree traversals. A * dual-tree traversal should be updating the members of this class before * Score() is called. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_TRAVERSAL_INFO_HPP #define __MLPACK_CORE_TREE_TRAVERSAL_INFO_HPP /** * The TraversalInfo class holds traversal information which is used in * dual-tree (and single-tree) traversals. A traversal should be updating the * members of this class before Score() is called. This class should be held as * a member of the RuleType class and the interface to it should be through a * TraversalInfo() method. * * The information held by this class is the last node combination visited * before the current node combination was recursed into, and the score * resulting from when Score() was called on that combination. However, this * information is identical for a query node and a reference node in a * particular node combination, so traversals only need to update the * TraversalInfo object in a query node (and the algorithms should only use the * TraversalInfo object from a query node). * * In general, this auxiliary traversal information is used to try and make a * prune without needing to call BaseCase() or calculate the distance between * nodes. Using this information you can place bounds on the distance between * the two nodes quickly. * * If the traversal is not updating the members of this class correctly, a * likely result is a null pointer dereference. Dual-tree algorithms should * assume that the members are set properly and should not need to check for * null pointers. * * There is one exception, which is the root node combination; the score can be * set to 0 and the query and reference nodes can just be set to the root nodes; * no algorithm should be able to prune the root combination anyway. */ template class TraversalInfo { public: /** * Create the TraversalInfo object and initialize the pointers to NULL. */ TraversalInfo() : lastQueryNode(NULL), lastReferenceNode(NULL), lastScore(0.0), lastBaseCase(0.0) { /* Nothing to do. */ } //! Get the last query node. TreeType* LastQueryNode() const { return lastQueryNode; } //! Modify the last query node. TreeType*& LastQueryNode() { return lastQueryNode; } //! Get the last reference node. TreeType* LastReferenceNode() const { return lastReferenceNode; } //! Modify the last reference node. TreeType*& LastReferenceNode() { return lastReferenceNode; } //! Get the score associated with the last query and reference nodes. double LastScore() const { return lastScore; } //! Modify the score associated with the last query and reference nodes. 
double& LastScore() { return lastScore; } //! Get the base case associated with the last node combination. double LastBaseCase() const { return lastBaseCase; } //! Modify the base case associated with the last node combination. double& LastBaseCase() { return lastBaseCase; } private: //! The last query node. TreeType* lastQueryNode; //! The last reference node. TreeType* lastReferenceNode; //! The last score. double lastScore; //! The last base case. double lastBaseCase; }; #endif RcppMLPACK/src/mlpack/core.hpp0000644000176200001440000001753513647512751015573 0ustar liggesusers/*** * @file core.hpp * * Include all of the base components required to write MLPACK methods, and the * main MLPACK Doxygen documentation. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_HPP #define __MLPACK_CORE_HPP #if _WIN64 #ifndef ARMA_64BIT_WORD #define ARMA_64BIT_WORD #endif #endif /** * @mainpage MLPACK Documentation * * @section intro_sec Introduction * * MLPACK is an intuitive, fast, scalable C++ machine learning library, meant to * be a machine learning analog to LAPACK. It aims to implement a wide array of * machine learning methods and function as a "swiss army knife" for machine * learning researchers. The MLPACK development website can be found at * http://mlpack.org. * * MLPACK uses the Armadillo C++ matrix library (http://arma.sourceforge.net) * for general matrix, vector, and linear algebra support. MLPACK also uses the * program_options, math_c99, and unit_test_framework components of the Boost * library; in addition, LibXml2 is used. * * @section howto How To Use This Documentation * * This documentation is API documentation similar to Javadoc. It isn't * necessarily a tutorial, but it does provide detailed documentation on every * namespace, method, and class. * * Each MLPACK namespace generally refers to one machine learning method, so * browsing the list of namespaces provides some insight as to the breadth of * the methods contained in the library. * * To generate this documentation in your own local copy of MLPACK, you can * simply use Doxygen, from the root directory (@c /mlpack/trunk/ ): * * @code * $ doxygen * @endcode * * @section executables Executables * * MLPACK provides several executables so that MLPACK methods can be used * without any need for knowledge of C++. These executables are all * self-documented, and that documentation can be accessed by running the * executables with the '-h' or '--help' flag. * * A full list of executables is given below: * * allkfn, allknn, emst, gmm, hmm_train, hmm_loglik, hmm_viterbi, hmm_generate, * kernel_pca, kmeans, lars, linear_regression, local_coordinate_coding, mvu, * nbc, nca, pca, radical, sparse_coding * * @section tutorial Tutorials * * A few short tutorials on how to use MLPACK are given below. 
* * - @ref build * - @ref matrices * - @ref iodoc * - @ref timer * - @ref sample * - @ref verinfo * * Tutorials on specific methods are also available. * * - @ref nstutorial * - @ref lrtutorial * - @ref rstutorial * - @ref dettutorial * - @ref emst_tutorial * - @ref kmtutorial * - @ref fmkstutorial * * @section methods Methods in MLPACK * * The following methods are included in MLPACK: * * - Decision Stump - mlpack::decision_stump::DecisionStump * - Density Estimation Trees - mlpack::det::DTree * - Euclidean Minimum Spanning Trees - mlpack::emst::DualTreeBoruvka * - Gaussian Mixture Models (GMMs) - mlpack::gmm::GMM * - Hidden Markov Models (HMMs) - mlpack::hmm::HMM * - Kernel PCA - mlpack::kpca::KernelPCA * - K-Means Clustering - mlpack::kmeans::KMeans * - Least-Angle Regression (LARS/LASSO) - mlpack::regression::LARS * - Local Coordinate Coding - mlpack::lcc::LocalCoordinateCoding * - Locality-Sensitive Hashing - mlpack::neighbor::LSHSearch * - Naive Bayes Classifier - mlpack::naive_bayes::NaiveBayesClassifier * - Neighborhood Components Analysis (NCA) - mlpack::nca::NCA * - Nonnegative Matrix Factorization (NMF) - mlpack::amf::AMF<> * - Nystroem Method - mlpack::kernel::NystroemMethod * - Perceptron - mlpack::perceptron::Perceptron * - Principal Components Analysis (PCA) - mlpack::pca::PCA * - QUIC-SVD - mlpack::svd::QUIC_SVD * - RADICAL (ICA) - mlpack::radical::Radical * - Regularized SVD - mlpack::svd::RegularizedSVD * - Simple Least-Squares Linear Regression - * mlpack::regression::LinearRegression * - Sparse Autoencoding - mlpack::nn::SparseAutoencoder * - Sparse Coding - mlpack::sparse_coding::SparseCoding * - Tree-based neighbor search (AllkNN, AllkFN) - * mlpack::neighbor::NeighborSearch * - Tree-based range search - mlpack::range::RangeSearch * * @section remarks Final Remarks * * MLPACK contributors include: * * - Ryan Curtin * - James Cline * - Neil Slagle * - Matthew Amidon * - Vlad Grantcharov * - Ajinkya Kale * - Bill March * - Dongryeol Lee * - Nishant Mehta * - Parikshit Ram * - Rajendran Mohan * - Trironk Kiatkungwanglai * - Patrick Mason * - Chip Mappus * - Hua Ouyang * - Long Quoc Tran * - Noah Kauffman * - Guillermo Colon * - Wei Guan * - Ryan Riegel * - Nikolaos Vasiloglou * - Garry Boyer * - Andreas Löf * - Marcus Edel * - Mudit Raj Gupta * - Sumedh Ghaisas * - Michael Fox * - Ryan Birmingham * - Siddharth Agrawal * - Saheb Motiani * - Yash Vadalia * - Abhishek Laddha * - Vahab Akbarzadeh * - Andrew Wells * - Zhihao Lou */ // First, include all of the prerequisites. #include // Now the core mlpack classes. #include #include //#include //#include //#include #include #include #include #include #include #include //#include #include #include // Include kernel traits. #include #include #include #include #include #include #include #include #include #include #include #endif // Clean up unfortunate Windows preprocessor definitions, even if this file was // already included. Use std::min and std::max! 
#ifdef _WIN32 #ifdef min #undef min #endif #ifdef max #undef max #endif #endif RcppMLPACK/src/RcppMLPACK.h0000644000176200001440000000733513647514343014624 0ustar liggesusers#ifndef RcppMLPACK__RcppMLPACK__h #define RcppMLPACK__RcppMLPACK__h #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #endif RcppMLPACK/src/Makevars.win0000644000176200001440000000322113647514343015135 0ustar liggesusersCXX_STD = CXX11 MLPACKOBJECTS=\ ./mlpack/methods/pca/pca.o \ ./mlpack/methods/det/dtree.o \ ./mlpack/methods/det/dt_utils.o \ ./mlpack/methods/linear_regression/linear_regression.o \ ./mlpack/methods/radical/radical.o \ ./mlpack/methods/neighbor_search/unmap.o \ ./mlpack/methods/neighbor_search/sort_policies/furthest_neighbor_sort.o \ ./mlpack/methods/neighbor_search/sort_policies/nearest_neighbor_sort.o \ ./mlpack/methods/lars/lars.o \ ./mlpack/methods/logistic_regression/logistic_regression_function.o \ ./mlpack/methods/regularized_svd/regularized_svd_function.o \ ./mlpack/methods/sparse_autoencoder/sparse_autoencoder_function.o \ ./mlpack/core/math/lin_alg.o \ ./mlpack/core/math/random.o \ ./mlpack/core/kernels/pspectrum_string_kernel.o \ ./mlpack/core/optimizers/lbfgs/test_functions.o \ ./mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_test_functions.o \ ./mlpack/core/optimizers/sgd/test_function.o \ ./mlpack/core/dists/gaussian_distribution.o \ ./mlpack/core/dists/discrete_distribution.o \ ./mlpack/core/dists/laplace_distribution.o \ ./mlpack/core/tree/mrkd_statistic.o \ ./mlpack/core/tree/cosine_tree/cosine_tree.o \ ./mlpack/core/util/string_util.o PKGOBJECTS = RcppExports.o kmeans.o OBJECTS= $(MLPACKOBJECTS) $(PKGOBJECTS) PKG_CPPFLAGS=-DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false -I. PKG_LIBS= $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) USERDIR = ../inst/lib$(R_ARCH) STATICLIB = libRcppMLPACK.a USERLIB = $(USERDIR)/$(STATICLIB) RM = rm -f .PHONY: all clean all: userlib $(SHLIB) clean: ${RM} $(OBJECTS) $(SHLIB) $(USERLIB): $(OBJECTS) userlib: $(STATICLIB) -mkdir -p $(USERDIR) -mv $(STATICLIB) $(USERLIB) $(STATICLIB): $(OBJECTS) RcppMLPACK/src/kmeans.cpp0000644000176200001440000000072113647514343014631 0ustar liggesusers#include "RcppMLPACK.h" using namespace mlpack::kmeans; using namespace Rcpp; // [[Rcpp::export]] List kmeans(SEXP data, const int& clusters) { NumericMatrix Xr(data); arma::mat X(Xr.begin(), Xr.nrow(), Xr.ncol(), false); arma::Col assignments; // Initialize with the default arguments. 
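// (In MLPACK 1.0.x, these defaults are, as a reminder rather than a
// guarantee: the squared Euclidean distance as the metric, RandomPartition
// as the initial partitioning policy, and MaxVarianceNewCluster as the
// empty-cluster policy. Each can be replaced through the template
// parameters of KMeans.)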
KMeans<> k; k.Cluster(X, clusters, assignments); return List::create(_["clusters"] = clusters, _["result"] = assignments); } RcppMLPACK/src/RcppExports.cpp0000644000176200001440000000077113647514343015651 0ustar liggesusers#include #include using namespace Rcpp; // kmeans List kmeans(SEXP data, const int& clusters); RcppExport SEXP RcppMLPACK_kmeans(SEXP dataSEXP, SEXP clustersSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< SEXP >::type data(dataSEXP); Rcpp::traits::input_parameter< const int& >::type clusters(clustersSEXP); rcpp_result_gen = Rcpp::wrap(kmeans(data, clusters)); return rcpp_result_gen; END_RCPP } RcppMLPACK/vignettes/0000755000176200001440000000000013647514343014070 5ustar liggesusersRcppMLPACK/vignettes/RcppMLPACK-intro.Rnw0000644000176200001440000002564113647512751017516 0ustar liggesusers\documentclass[letterpaper]{article} %\VignetteIndexEntry{RcppMLPACK-introduction} %\VignetteKeywords{R, C++, Armadillo, MLPACK, machine learning} %\VignettePackage{RcppMLPACK} \usepackage[utf8]{inputenc} \usepackage{amsmath,amsthm, amsfonts, array} \usepackage{graphicx,booktabs,xcolor,color} \usepackage{geometry} \geometry{left=2.5cm,right=2.5cm,top=2.3cm,bottom=2.3cm} \usepackage{url} \usepackage{amsmath} \usepackage{multirow} \usepackage{epstopdf} \usepackage{listings} \definecolor{shadecolor}{rgb}{0.95,0.95,0.95} \lstset{xrightmargin=0pt, keepspaces=true, basicstyle=\ttfamily, commentstyle=\ttfamily, numbers=left, numberstyle=\tiny, backgroundcolor=\color{shadecolor}, columns=fullflexible, showstringspaces=false, breaklines=true, framerule=0.7pt, frameround=tttt, captionpos=b, xleftmargin=-0.2em, xrightmargin=-0.25em, aboveskip=1em, } \usepackage{indentfirst} \setlength{\parindent}{2em} \linespread{1.3} \begin{document} \title{\textbf{RcppMLPACK}: R integration with MLPACK using Rcpp} \author{Qiang Kou\\qkou@umail.iu.edu} \date{\today} \maketitle \section{MLPACK} MLPACK\cite{curtin2013mlpack} is a C++ machine learning library with emphasis on scalability, speed, and ease-of-use. It outperforms competing machine learning libraries by large margins, for detailed results of benchmarking, please refer to Ref\cite{curtin2013mlpack}. \subsection{Input/Output using Armadillo} MLPACK uses Armadillo as input and output, which provides vector, matrix and cube types (integer, floating point and complex numbers are supported)\cite{arma}. By providing Matlab-like syntax and various matrix decompositions through integration with LAPACK or other high performance libraries (such as Intel MKL, AMD ACML, or OpenBLAS), Armadillo keeps a good balance between speed and ease of use. Armadillo is widely used in C++ libraries like MLPACK. However, there is one point we need to pay attention to: Armadillo matrices in MLPACK are stored in a column-major format for speed. That means observations are stored as columns and dimensions as rows. So when using MLPACK, additional transpose may be needed. \subsection{Simple API} Although MLPACK relies on template techniques in C++, it provides an intuitive and simple API. For the standard usage of methods, default arguments are provided. 
The MLPACK paper\cite{curtin2013mlpack} provides a good example: standard k-means clustering in Euclidean space can be initialized as follows:

\begin{lstlisting}[caption={k-means with all default arguments}, language=C++]
KMeans<> k;
\end{lstlisting}

If we want to use the Manhattan distance, a different cluster initialization policy, and allow empty clusters, the object can be initialized with additional template arguments:

\begin{lstlisting}[caption={k-means with additional arguments}, language=C++]
KMeans<ManhattanDistance, RefinedStart, AllowEmptyClusters> k;
\end{lstlisting}

\subsection{Available methods in MLPACK}

Commonly used machine learning methods are all implemented in MLPACK. In addition, it contains a strong set of tree-building and tree-query routines. The methods available in MLPACK:

\begin{itemize}
\item Fast Hierarchical Clustering (Euclidean Minimum Spanning Trees)
\item Gaussian Mixture Models (trained via EM)
\item Hidden Markov Models (training, prediction, and classification)
\item Kernel Principal Components Analysis
\item K-Means clustering
\item LARS/Lasso Regression
\item Least-squares Linear Regression
\item Maximum Variance Unfolding (using LRSDP)
\item Naive Bayes Classifier
\item Neighbourhood Components Analysis (using LRSDP)
\item RADICAL (Robust, Accurate, Direct ICA aLgorithm)
\item Tree-based k-nearest-neighbours search and classifier
\item Tree-based range search
\end{itemize}

For the details and usage of each algorithm, please visit the tutorial page of MLPACK (\url{http://www.mlpack.org/tutorial.html}).

\section{RcppMLPACK}

\subsection{Rcpp and RcppArmadillo}

As said above, MLPACK is a C++ library using Armadillo. Since we have \textbf{Rcpp}\cite{Eddelbuettel:Francois:2011:JSSOBK:v40i08} and \textbf{RcppArmadillo}\cite{eddelbuettel2014rcpparmadillo}, which can be used to integrate C++ and Armadillo with R seamlessly, \textbf{RcppMLPACK} becomes something very natural.

\subsection{k-means example}

Here we continue the k-means example above. Using \textbf{RcppMLPACK}, a k-means method callable from R can be implemented as in Listing \ref{kmeans}. The interface between R and C++ is handled by \textbf{Rcpp} and \textbf{RcppArmadillo}.

\begin{lstlisting}[caption={k-means example}, language=C++, label={kmeans}]
#include "RcppMLPACK.h"

using namespace mlpack::kmeans;
using namespace Rcpp;

// [[Rcpp::export]]
List kmeans(const arma::mat& data, const int& clusters) {

    arma::Col<size_t> assignments;

    // Initialize with the default arguments.
    KMeans<> k;
    k.Cluster(data, clusters, assignments);

    return List::create(_["clusters"] = clusters,
                        _["result"]   = assignments);
}
\end{lstlisting}

\subsection{Using the inline package}

The \textbf{inline}\cite{inline} package provides a complete wrapper around the compilation, linking, and loading steps, so all of these steps can be done within an R session; \textbf{RcppMLPACK} supports inline compilation as well.

\begin{lstlisting}[caption={k-means example, inline version}, language=R]
library(inline)
library(RcppMLPACK)

code <- '
    arma::mat data = as<arma::mat>(test);
    int clusters = as<int>(n);

    arma::Col<size_t> assignments;

    mlpack::kmeans::KMeans<> k;
    k.Cluster(data, clusters, assignments);

    return List::create(_["clusters"] = clusters,
                        _["result"]   = assignments);
'

mlKmeans <- cxxfunction(signature(test="numeric", n="integer"), code,
                        plugin="RcppMLPACK")

data(trees, package="datasets")
mlKmeans(t(trees), 3)
\end{lstlisting}

As said above, MLPACK stores matrices in a column-major format, so when we pass data from R to MLPACK, a transpose may be needed.
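Alternatively, the attributes support in \textbf{Rcpp} can compile a standalone source file directly from an R session. Below is a minimal sketch, assuming the code from Listing \ref{kmeans} has been saved as \texttt{kmeans.cpp} (the file name is illustrative) with an additional \texttt{// [[Rcpp::depends(RcppMLPACK)]]} attribute at its top, so that \texttt{Rcpp::sourceCpp()} picks up the RcppMLPACK compiler and linker settings through the package's inline plugin.

\begin{lstlisting}[caption={k-means example, compiled with Rcpp attributes}, language=R]
library(RcppMLPACK)

# kmeans.cpp holds the k-means example above, plus an added
# '// [[Rcpp::depends(RcppMLPACK)]]' attribute line
Rcpp::sourceCpp("kmeans.cpp")

# as before, transpose so that observations are stored as columns
data(trees, package="datasets")
kmeans(t(trees), 3)
\end{lstlisting}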
\subsection{RcppMLPACK.package.skeleton}

The package also contains a \texttt{RcppMLPACK.package.skeleton()} function for people who want to use MLPACK code in their own packages. It follows the structure of \texttt{RcppArmadillo.package.skeleton()}.

\begin{lstlisting}[caption={RcppMLPACK.package.skeleton}, language=R]
library(RcppMLPACK)
RcppMLPACK.package.skeleton("foobar")
Creating directories ...
Creating DESCRIPTION ...
Creating NAMESPACE ...
Creating Read-and-delete-me ...
Saving functions and data ...
Making help files ...
Done.
Further steps are described in './foobar/Read-and-delete-me'.

Adding RcppMLPACK settings
 >> added Imports: Rcpp
 >> added LinkingTo: Rcpp, RcppArmadillo, BH, RcppMLPACK
 >> added useDynLib and importFrom directives to NAMESPACE
 >> added Makevars file with RcppMLPACK settings
 >> added Makevars.win file with RcppMLPACK settings
 >> added example src file using MLPACK classes
 >> invoked Rcpp::compileAttributes to create wrappers
\end{lstlisting}

By using \texttt{RcppMLPACK.package.skeleton()}, a package skeleton is generated; the resulting files are listed below.

\begin{lstlisting}[caption={RcppMLPACK.package.skeleton result}, language=R]
system("ls -R foobar")
foobar:
DESCRIPTION  man  NAMESPACE  R  Read-and-delete-me  src

foobar/man:
foobar-package.Rd

foobar/R:
RcppExports.R

foobar/src:
kmeans.cpp  Makevars  Makevars.win  RcppExports.cpp
\end{lstlisting}

\subsection{Processing files on disk}

MLPACK contains two functions, \texttt{mlpack::data::Load()} and \texttt{mlpack::data::Save()}, which are used to load and save matrices from and to files. These can be really useful, since reading large files into R can be slow: we can simply pass the file names from R to C++ instead of reading the data into R first.

\begin{lstlisting}[caption={Reading and writing files on disk}, language=C++]
std::string inputFile = Rcpp::as<std::string>(input);
std::string outputFile = Rcpp::as<std::string>(output);

arma::mat data;
mlpack::data::Load(inputFile, data, true);

int clusters = as<int>(n);
arma::Col<size_t> assignments;

mlpack::kmeans::KMeans<> k;
k.Cluster(data, clusters, assignments);

arma::Mat<size_t> out = trans(assignments);
mlpack::data::Save(outputFile, out);
\end{lstlisting}

\subsection{Performance}

Even without formal benchmarking, we can expect the C++ implementation to be faster. To quantify this, a small wine data set from the UCI machine learning repository is used; it can be downloaded from \url{https://archive.ics.uci.edu/ml/datasets/Wine}. The \textbf{rbenchmark}\cite{rbenchmark} package is used for timing.

\begin{lstlisting}[caption={Benchmarking script}, language=R]
suppressMessages(library(rbenchmark))

res <- benchmark(mlKmeans(t(wine), 3),
                 kmeans(wine, 3),
                 columns=c("test", "replications", "elapsed",
                           "relative", "user.self", "sys.self"),
                 order="relative")
\end{lstlisting}

\begin{table}[h!]
\centering
\caption{Benchmarking result}
\begin{tabular}{rrrrrr}
\hline
test & replications & elapsed & relative & user.self & sys.self \\ [0.3ex]
\hline
mlKmeans(t(wine), 3) & 100 & 0.028 & 1.000 & 0.028 & 0.000 \\
kmeans(wine, 3) & 100 & 0.947 & 33.821 & 0.484 & 0.424 \\ [0.3ex]
\hline
\end{tabular}
\end{table}

From the benchmarking result, we can see that the MLPACK version of k-means is about 33 times faster than \texttt{kmeans()} in R. However, we should note that R's \texttt{kmeans()} returns much more information than the cluster assignments alone and performs many more input checks.

\section*{Acknowledgement}

Very special thanks to the developers of Rcpp and MLPACK. Huge thanks also to the people on Rcpp-devel for their help. Testing and bug reports are warmly welcomed.
If you have any questions, you can always reach me by email (\url{qkou@umail.iu.edu}) or open an issue on \url{https://github.com/thirdwing/RcppMLPACK}.

\bibliography{ref}
\bibliographystyle{unsrt}

\newpage

\appendix

\section{Modifications on the original MLPACK library}

To ease maintenance, we try to minimize the modifications to MLPACK. However, for the integration and to pass \texttt{R CMD check}, there are some changes we had to make. There are no changes in the methods; all modifications are in utility classes and are applied by a script.

\subsection*{\texttt{Log} class}

The \texttt{Log} class provides four levels of log information. The logging functions are replaced with \texttt{Rcpp::Rcout} to redirect the output stream to the R session.

\subsection*{\texttt{cli} class}

The \texttt{cli} class is used to parse command-line arguments. It relies on \texttt{boost::program\_options}, which requires additional linking. Since all arguments are passed from R, this class has been removed.

\subsection*{\texttt{SaveRestoreUtility} class}

The \texttt{SaveRestoreUtility} class is used to store and restore MLPACK models from XML files. This class has been removed to avoid additional linking against libxml2.

\subsection*{\texttt{Timer} class}

In many methods, the \texttt{Timer} class is used for timing, writing information into the log. This class has been commented out.

\end{document}
RcppMLPACK/vignettes/ref.bib0000644000176200001440000000405613647512751015330 0ustar liggesusers@article{Eddelbuettel:Francois:2011:JSSOBK:v40i08,
  author = "Dirk Eddelbuettel and Romain Francois",
  title = "Rcpp: Seamless {R} and {C++} Integration",
  journal = "Journal of Statistical Software",
  volume = "40",
  number = "8",
  pages = "1--18",
  day = "13",
  month = "4",
  year = "2011",
  CODEN = "JSSOBK",
  ISSN = "1548-7660",
  bibdate = "2011-03-21",
  URL = "http://www.jstatsoft.org/v40/i08",
  accepted = "2011-03-21",
  acknowledgement = "",
  keywords = "",
  submitted = "2010-11-15",
}
@article{eddelbuettel2014rcpparmadillo,
  title={{RcppArmadillo}: Accelerating {R} with high-performance {C++} linear algebra},
  author={Eddelbuettel, Dirk and Sanderson, Conrad},
  journal={Computational Statistics \& Data Analysis},
  volume={71},
  pages={1054--1063},
  year={2014},
  publisher={Elsevier}
}
@article{curtin2013mlpack,
  title={{MLPACK}: A scalable {C++} machine learning library},
  author={Curtin, Ryan and Cline, James and Slagle, Neil and March, William and Ram, Parikshit and Mehta, Nishant and Gray, Alexander},
  journal={The Journal of Machine Learning Research},
  volume={14},
  number={1},
  pages={801--805},
  year={2013},
  publisher={JMLR.
org} } @TECHREPORT{arma, author = {Conrad Sanderson}, title = {Armadillo: An Open Source {C++} Linear Algebra Library for Fast Prototyping and Computationally Intensive Experiments}, institution = {NICTA}, year = {2010}, owner = {Administrator}, timestamp = {2014.01.11}, url = {http://arma.sf.net/} } @MANUAL{inline, title = {{inline}: inline {C}, {C++}, {Fortran} function calls from {R}}, author = {Oleg Sklyar and Duncan Murdoch and Mike Smith and Dirk Eddelbuettel and Romain Fran\c{c}ois}, year = {2013}, note = {R package version 0.3.13}, owner = {Administrator}, timestamp = {2013.12.29}, url = {http://cran.r-project.org/web/package=inline} } @MANUAL{rbenchmark, title = {rbenchmark: Benchmarking routine for {R}}, author = {Wacek Kusnierczyk}, year = {2012}, note = {R package version 1.0}, url = {http://cran.r-project.org/packages=rbenchmark} } RcppMLPACK/R/0000755000176200001440000000000013647512751012262 5ustar liggesusersRcppMLPACK/R/inline.R0000644000176200001440000000056413647512751013670 0ustar liggesusersinlineCxxPlugin <- function(...) { plugin <- Rcpp::Rcpp.plugin.maker( include.before = "#include ", libs = sprintf("%s $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)", RcppMLPACKLdFlags()), LinkingTo = c("RcppArmadillo", "Rcpp", "BH", "RcppMLPACK"), package = "RcppMLPACK" ) settings <- plugin() settings } RcppMLPACK/R/fun.R0000644000176200001440000000021613647512751013174 0ustar liggesusersmlKmeans <- function(X, y) { X <- as.matrix(X) y <- as.numeric(y) .Call( "RcppMLPACK_kmeans", X, y, PACKAGE = "RcppMLPACK" ) } RcppMLPACK/R/flags.R0000644000176200001440000000237013647512751013503 0ustar liggesusersRcppMLPACK.system.file <- function(...){ tools::file_path_as_absolute( base::system.file( ..., package = "RcppMLPACK" ) ) } staticLinking <- function() { ! grepl( "^linux", R.version$os ) } RcppMLPACKLdPath <- function() { if (nzchar(.Platform$r_arch)) { ## eg amd64, ia64, mips path <- RcppMLPACK.system.file("lib",.Platform$r_arch) } else { path <- RcppMLPACK.system.file("lib") } path } RcppMLPACKLdFlags <- function(static=staticLinking()) { RcppMLPACKdir <- RcppMLPACKLdPath() if (static) { # static is default on Windows and OS X flags <- paste('"',RcppMLPACKdir, '/libRcppMLPACK.a"', sep="") } else { # else for dynamic linking flags <- paste("-L", RcppMLPACKdir, " -lRcppMLPACK", sep="") # baseline setting if ((.Platform$OS.type == "unix") && # on Linux, we can use rpath to encode path (length(grep("^linux",R.version$os)))) { flags <- paste(flags, " -Wl,-rpath,", RcppMLPACKdir, sep="") } } invisible(flags) } CxxFlags <- function() { path <- RcppMLPACK.system.file("include") paste("-I", path, sep="") } LdFlags <- function(static=staticLinking()) { cat(RcppMLPACKLdFlags(static=static)) } RcppMLPACK/R/RcppMLPACK.package.skeleton.R0000644000176200001440000000736713647512751017433 0ustar liggesusers## RcppMLPACK.package.skeleton.R: makes a skeleton for a package that wants to use RcppMLPACK ## ## Copyright (C) 2016 Qiang Kou ## ## This file is part of RcppMLPACK. ## ## RcppMLPACK is free software: you can redistribute it and/or modify it ## under the terms of the GNU General Public License as published by ## the Free Software Foundation, either version 2 of the License, or ## (at your option) any later version. ## ## RcppMLPACK distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ## GNU General Public License for more details. 
## ## You should have received a copy of the GNU General Public License ## along with RcppMLPACK. If not, see . RcppMLPACK.package.skeleton <- function(name="anRpackage", list=character(), environment=.GlobalEnv, path=".", force=FALSE, code_files=character(), example_code=TRUE) { env <- parent.frame(1) if (! length(list)) { fake <- TRUE assign( "Rcpp.fake.fun", function(){}, envir = env ) } else { fake <- FALSE } ## first let the traditional version do its business call <- match.call() call[[1]] <- as.name("package.skeleton") if ("example_code" %in% names(call)){ ## remove the example_code argument call[["example_code"]] <- NULL } if (fake) { call[["list"]] <- "Rcpp.fake.fun" } tryCatch(eval(call, envir=env), error = function(e) { stop("error while calling `package.skeleton`") }) message("\nAdding RcppMLPACK settings") ## now pick things up root <- file.path(path, name) ## Add Rcpp to the DESCRIPTION DESCRIPTION <- file.path(root, "DESCRIPTION") if (file.exists(DESCRIPTION)) { x <- cbind(read.dcf( DESCRIPTION), "Imports" = sprintf("Rcpp (>= %s)", packageDescription("Rcpp")[["Version"]]), "LinkingTo" = "Rcpp, RcppMLPACK, RcppArmadillo, BH") write.dcf(x, file=DESCRIPTION) message(" >> added Imports: Rcpp") message(" >> added LinkingTo: Rcpp, RcppMLPACK, RcppArmadillo, BH") } ## add a useDynLib to NAMESPACE, NAMESPACE <- file.path( root, "NAMESPACE") lines <- readLines( NAMESPACE ) if (! grepl("useDynLib", lines)) { lines <- c(sprintf("useDynLib(%s)", name), "importFrom(Rcpp, evalCpp)", ## ensures Rcpp instantiation lines) writeLines(lines, con = NAMESPACE) message( " >> added useDynLib and importFrom directives to NAMESPACE") } ## lay things out in the src directory src <- file.path(root, "src") if (!file.exists(src)) { dir.create(src) } skeleton <- system.file("skeleton", package="RcppMLPACK") Makevars <- file.path(src, "Makevars") if (!file.exists(Makevars)) { file.copy(file.path(skeleton, "Makevars"), Makevars) message(" >> added Makevars file with RcppMLPACK settings") } Makevars.win <- file.path(src, "Makevars.win") if (! 
file.exists( Makevars.win)) { file.copy(file.path(skeleton, "Makevars.win"), Makevars.win) message(" >> added Makevars.win file with RcppMLPACK settings") } if (example_code) { file.copy(file.path(skeleton, "kmeans.cpp"), src) message(" >> added example src file using MLPACK classes") Rcpp::compileAttributes(root) message(" >> invoked Rcpp::compileAttributes to create wrappers") } if (fake) { rm("Rcpp.fake.fun", envir=env) unlink(file.path(root, "R" , "Rcpp.fake.fun.R")) unlink(file.path(root, "man", "Rcpp.fake.fun.Rd")) } invisible(NULL) } RcppMLPACK/MD50000644000176200001440000012713513647521523012377 0ustar liggesusersf3f18643495b3b39a8f726c5a230c0c9 *DESCRIPTION a23bb2bd943c984d56b510b8d3edebdd *NAMESPACE 17190b03088faebdc374f4d45dbb65ff *R/RcppMLPACK.package.skeleton.R d5d3dbd4cdf2bda0c36bb044e9c5a5b9 *R/flags.R ab5a6ddf08aef281bb6199da2d7cf77b *R/fun.R e6ae8db771a0543cb675399383b9814a *R/inline.R 6e06bc43bb06ff1b133cca5f9d668b47 *README.md 30c9b1c34676f8fd5dc9b9505fe34e79 *build/vignette.rds 1b6625ad84abec837f77cbcfc1c0e328 *cleanup be061af1b67f6c02704d3d4227c7994d *configure 0724589651666b6ab524bd722decf0ae *configure.ac 49192f26625a83e1597a39ce6f3c50ca *inst/doc/RcppMLPACK-intro.Rnw 03154263a17addde68c00ed22d6a1363 *inst/doc/RcppMLPACK-intro.pdf dde1869ce28cf473420f01539ae099d7 *inst/include/RcppMLPACK.h 453d5937f3e6c4a123b7bc98f32b6bb3 *inst/include/mlpack/core.hpp 0daf541b040cc692d20517f1cc65b840 *inst/include/mlpack/core/arma_extend/SpMat_extra_bones.hpp 4d8590285527d4b07fc0d7dd2d3d1340 *inst/include/mlpack/core/arma_extend/SpMat_extra_meat.hpp db48729aa081b6b1fe7b77f763de6fb0 *inst/include/mlpack/core/arma_extend/arma_extend.hpp 947db90cb4aef09960e40a774b8a3486 *inst/include/mlpack/core/arma_extend/fn_ccov.hpp 6dcefd3e47ff7dc655f3244d8f6ef4fc *inst/include/mlpack/core/arma_extend/fn_inplace_reshape.hpp 562bff887ec2bf0faba90afa591f53cc *inst/include/mlpack/core/arma_extend/glue_ccov_meat.hpp 63bda57189595a0f9d42e7f01638fda9 *inst/include/mlpack/core/arma_extend/glue_ccov_proto.hpp 99ea37ed895b3945fbc0ed32ba2a7c12 *inst/include/mlpack/core/arma_extend/hdf5_misc.hpp 2b23cb598ca3f67253f5c80570d3d67e *inst/include/mlpack/core/arma_extend/op_ccov_meat.hpp d259f1735adcababd9b3247227776701 *inst/include/mlpack/core/arma_extend/op_ccov_proto.hpp eb15d96bcf808d8694f391f869e8193f *inst/include/mlpack/core/arma_extend/promote_type.hpp b1037f2d99573bdb250b1f5501c5bd69 *inst/include/mlpack/core/arma_extend/restrictors.hpp 698bdbe184b312d19606c6d5b3a17f3e *inst/include/mlpack/core/arma_extend/traits.hpp ce8a284891bf8636a99a7c1acdfb6640 *inst/include/mlpack/core/arma_extend/typedef.hpp cf3668e6107812004148a44a38593680 *inst/include/mlpack/core/data/load.hpp ce6e82f84c1c0156611cbd56e1673b7c *inst/include/mlpack/core/data/load_impl.hpp b448ce6892bc7e231f589efc7e30b8af *inst/include/mlpack/core/data/normalize_labels.hpp 14b08b0c0e66de7307df19917afd808d *inst/include/mlpack/core/data/normalize_labels_impl.hpp d0e9a0530bc1cd17291e742962ba8de6 *inst/include/mlpack/core/data/save.hpp d0b10209f197d1c4a1695d9b304fed3b *inst/include/mlpack/core/data/save_impl.hpp 8f859f81df8f8a59ae860648343b8393 *inst/include/mlpack/core/dists/discrete_distribution.hpp c3e812bbf7e0e017f8febd93eb212f14 *inst/include/mlpack/core/dists/gaussian_distribution.hpp 10fe95cccce642f16fcb3783caf3eae2 *inst/include/mlpack/core/dists/laplace_distribution.hpp f81296af004a261c531ffcaa4c7de703 *inst/include/mlpack/core/kernels/cosine_distance.hpp 7efc005baadee95480c3ed7450956bf9 
*inst/include/mlpack/core/kernels/cosine_distance_impl.hpp e48a653b1c7741031963b98981d3c926 *inst/include/mlpack/core/kernels/epanechnikov_kernel.hpp e72a06dc0eb263efc5d170a8e4cd911e *inst/include/mlpack/core/kernels/epanechnikov_kernel_impl.hpp 1d95e02a2d3787c754f6a1bd622a7449 *inst/include/mlpack/core/kernels/example_kernel.hpp f7b4cbd3fffcb7cb8dd746acfa31a1d0 *inst/include/mlpack/core/kernels/gaussian_kernel.hpp 246ecae342463c8ebe5127fc8c186e86 *inst/include/mlpack/core/kernels/hyperbolic_tangent_kernel.hpp 8d895a08fc66c7edac3a5ec94beb4c79 *inst/include/mlpack/core/kernels/kernel_traits.hpp ad193c067b4de1a02737ede5d0701755 *inst/include/mlpack/core/kernels/laplacian_kernel.hpp 1b0b2077f00916fa6885410207289d24 *inst/include/mlpack/core/kernels/linear_kernel.hpp 4cc55126a3c1ac6e87e63e63c16fd657 *inst/include/mlpack/core/kernels/polynomial_kernel.hpp c2613c5eb83007f1afed39599f150035 *inst/include/mlpack/core/kernels/pspectrum_string_kernel.hpp 4ef25b80ed462e6513e3213fac8a1870 *inst/include/mlpack/core/kernels/pspectrum_string_kernel_impl.hpp 0e661104e9de639ec50412f35ac8b18b *inst/include/mlpack/core/kernels/spherical_kernel.hpp 8fced8e2463e9bf85594f09788689896 *inst/include/mlpack/core/kernels/triangular_kernel.hpp 3c5acad6dbcd2c3124525e6d11104e60 *inst/include/mlpack/core/math/clamp.hpp 41220e77c7403450aeff291e011a9e62 *inst/include/mlpack/core/math/lin_alg.hpp c6c4d743e75b7f6d51b8b8dae4454381 *inst/include/mlpack/core/math/random.hpp bb8ea1e837795609e29fda6d22f1fc5f *inst/include/mlpack/core/math/range.hpp ec96563ffaf3a353c85d3eefaddc6182 *inst/include/mlpack/core/math/range_impl.hpp 362514ec8e2986a80bf5bce745d5c5ad *inst/include/mlpack/core/math/round.hpp f079601ef96f69ea3d83a989ae5304da *inst/include/mlpack/core/metrics/ip_metric.hpp b3c5274536acae1544c61996c2fba939 *inst/include/mlpack/core/metrics/ip_metric_impl.hpp a3ec4579e4495366f29d2cc3d1c360fc *inst/include/mlpack/core/metrics/lmetric.hpp fae452f1155bc218d0d50a50566f0f57 *inst/include/mlpack/core/metrics/lmetric_impl.hpp 903e866a5d26e165c426bbf9137fec17 *inst/include/mlpack/core/metrics/mahalanobis_distance.hpp b215e1646f29f8ecae6d9e0145f2b754 *inst/include/mlpack/core/metrics/mahalanobis_distance_impl.hpp 5f669757e1621b5ec483c0f80dc545de *inst/include/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian.hpp 71e4d51937cc00c64d866b661283e61f *inst/include/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_function.hpp e0b40978d9bd9228e3e9422dee9f52fe *inst/include/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_function_impl.hpp a0d558ce658cde193fe69f108898d335 *inst/include/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_impl.hpp 2fff5108346b62ccbcf178a210d184d5 *inst/include/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_test_functions.hpp 520d31ae7134f750c2bc8a35bef1317a *inst/include/mlpack/core/optimizers/lbfgs/lbfgs.hpp 61dead353ff606cdbec10f82ddc085a5 *inst/include/mlpack/core/optimizers/lbfgs/lbfgs_impl.hpp f59dfa0e705cfb03f205ad87b5c672fd *inst/include/mlpack/core/optimizers/lbfgs/test_functions.hpp 9ba06f870917b7647a4227150dd58e19 *inst/include/mlpack/core/optimizers/lrsdp/lrsdp.hpp 6d69a849fdf324e260f3e43b4491545f *inst/include/mlpack/core/optimizers/lrsdp/lrsdp_function.hpp ed3fa5eb8a86e0581da963d6f5c563a6 *inst/include/mlpack/core/optimizers/sa/exponential_schedule.hpp 8fb102f5e2c8c0d903f5ea3203227b6d *inst/include/mlpack/core/optimizers/sa/sa.hpp bdb6a65ff56fe932ca9cc1efb29eda9f *inst/include/mlpack/core/optimizers/sa/sa_impl.hpp 11d2c3bbc1dcd02d7a12ad3572720caf 
*inst/include/mlpack/core/optimizers/sgd/sgd.hpp 9bc171757250bebedbc0a0b762cbe717 *inst/include/mlpack/core/optimizers/sgd/sgd_impl.hpp 4f2a50996cfd13213aa67893edac1051 *inst/include/mlpack/core/optimizers/sgd/test_function.hpp d030d0939cf5b612a5530fd42a227384 *inst/include/mlpack/core/tree/ballbound.hpp c2249b763e9fb8912ccc965a99a44b82 *inst/include/mlpack/core/tree/ballbound_impl.hpp 156d29d36527000e9f09f8d57b99d921 *inst/include/mlpack/core/tree/binary_space_tree.hpp 77678cb0780d20a49fdf5af5b2de9e8c *inst/include/mlpack/core/tree/binary_space_tree/binary_space_tree.hpp 2d1dd4457138cc43c79c1ed90c37c4af *inst/include/mlpack/core/tree/binary_space_tree/binary_space_tree_impl.hpp a1ae1da6aac191119cdd054be22bc728 *inst/include/mlpack/core/tree/binary_space_tree/dual_tree_traverser.hpp eb1c36d1297adb158c621b157dc71fcf *inst/include/mlpack/core/tree/binary_space_tree/dual_tree_traverser_impl.hpp c316868ce59c129b84f2c2d2497b61e5 *inst/include/mlpack/core/tree/binary_space_tree/mean_split.hpp fd23b6d8d462bce8d6ec5e8ec23b85ea *inst/include/mlpack/core/tree/binary_space_tree/mean_split_impl.hpp d0b6605fa2b06cfa172d6a87a3f722a8 *inst/include/mlpack/core/tree/binary_space_tree/single_tree_traverser.hpp 5797a620b1751942b4fb7b5cd592a310 *inst/include/mlpack/core/tree/binary_space_tree/single_tree_traverser_impl.hpp 92b8ce90ffab4dd1d9c7da8542f42922 *inst/include/mlpack/core/tree/binary_space_tree/traits.hpp cd1052f34d890f0138b4baedb3070425 *inst/include/mlpack/core/tree/bounds.hpp ab6c2805c63f1b04dc6ad1d304ee6582 *inst/include/mlpack/core/tree/cosine_tree/cosine_tree.hpp ecab86725009c76f880db2b1d73c0b87 *inst/include/mlpack/core/tree/cover_tree.hpp 0757686573aa4b9e439a694e3a5edb80 *inst/include/mlpack/core/tree/cover_tree/cover_tree.hpp ddf807cd44a71fbcf0ab614754b31747 *inst/include/mlpack/core/tree/cover_tree/cover_tree_impl.hpp 996318c57be5491337f4b718d82622f4 *inst/include/mlpack/core/tree/cover_tree/dual_tree_traverser.hpp 6125216cfbf81ffe0a4401fb961e5403 *inst/include/mlpack/core/tree/cover_tree/dual_tree_traverser_impl.hpp 31237e268a1e90a0e7ba1f487aaa8264 *inst/include/mlpack/core/tree/cover_tree/first_point_is_root.hpp 513e82d9a9fed41f7b0a8e05ca7830eb *inst/include/mlpack/core/tree/cover_tree/single_tree_traverser.hpp b2f989daf0de22632d44f42e231e4fb4 *inst/include/mlpack/core/tree/cover_tree/single_tree_traverser_impl.hpp 7bd673dc2b26f1625dc62097880a424e *inst/include/mlpack/core/tree/cover_tree/traits.hpp 7afc92ca720ad54f862c6d8e3d1481a3 *inst/include/mlpack/core/tree/example_tree.hpp 847dc09228893dcb8f1292e5b0f4c2f3 *inst/include/mlpack/core/tree/hrectbound.hpp 4d6db481e4f4ab47d7859131f2f866dc *inst/include/mlpack/core/tree/hrectbound_impl.hpp a49afc2151d43dc194975d86a649b88f *inst/include/mlpack/core/tree/mrkd_statistic.hpp a685124a32bbafe382f787f2f4a3590b *inst/include/mlpack/core/tree/mrkd_statistic_impl.hpp a66dd2ea840f8673fa670e8de968053d *inst/include/mlpack/core/tree/rectangle_tree.hpp d72c5b3b901050beb978d49e9ab22f06 *inst/include/mlpack/core/tree/statistic.hpp f273d31476a0797eefcfd14b34495c99 *inst/include/mlpack/core/tree/traversal_info.hpp 05c163942fcc2bf4042f4474e191a636 *inst/include/mlpack/core/tree/tree_traits.hpp 6e5d179d1d0ff0e5e567a07f99642a95 *inst/include/mlpack/core/util/arma_traits.hpp 79713e1709ddd528c042889b4c894f7b *inst/include/mlpack/core/util/nulloutstream.hpp 12836653018653c64b13b51733ff58b1 *inst/include/mlpack/core/util/option.hpp 497ccf5a8123e20047260c4202473387 *inst/include/mlpack/core/util/option_impl.hpp cba73d7062ee224ce31a09f71b539354 
*inst/include/mlpack/core/util/prefixedoutstream.hpp 73e5596cd45747b9ee0a41dff5105ddc *inst/include/mlpack/core/util/prefixedoutstream_impl.hpp 6ed5344c79dd8c7e2341caa95945e618 *inst/include/mlpack/core/util/sfinae_utility.hpp 0f864be4065b4ab9ac8ef258a30d9e0d *inst/include/mlpack/core/util/string_util.hpp 98cf94054c6148f63e3a7dbc416a6135 *inst/include/mlpack/core/util/timers.hpp 2520a41f29bf36172eb9f7fe0e191a51 *inst/include/mlpack/core/util/version.hpp a2d9b809a17bf405be6b79f410694c7f *inst/include/mlpack/methods/amf/amf.hpp 76d5d5d38a40c97618a8b80fdac55c9a *inst/include/mlpack/methods/amf/amf_impl.hpp 2658723657f5d78eda9f6dcb5b456135 *inst/include/mlpack/methods/amf/init_rules/random_acol_init.hpp 1776d3cc53489c2b01ccbe466ca28f99 *inst/include/mlpack/methods/amf/init_rules/random_init.hpp 4c2e3baaa010e32f563da56eb17cc380 *inst/include/mlpack/methods/amf/termination_policies/complete_incremental.hpp 8ed4f2df116f5412798a93bd47cbf275 *inst/include/mlpack/methods/amf/termination_policies/complete_incremental_termination.hpp fe23ee1a7beb2af453c9f8b09a236389 *inst/include/mlpack/methods/amf/termination_policies/incomplete_incremental.hpp 7d863cc040388db428af45400af9ca2c *inst/include/mlpack/methods/amf/termination_policies/simple_residue_termination.hpp c8b32e60c8c618a82766cc750eb76d12 *inst/include/mlpack/methods/amf/termination_policies/simple_tolerance_termination.hpp 68fb88ffbfce40ebf8393a96a6fed56a *inst/include/mlpack/methods/amf/termination_policies/validation_RMSE_termination.hpp d4c0b2cc1b987c880566c0df5bc43cb0 *inst/include/mlpack/methods/amf/update_rules/nmf_als.hpp c88ca2c2aa91c6589cfbe0e9fa85839a *inst/include/mlpack/methods/amf/update_rules/nmf_mult_dist.hpp 8170a1490f040966ef3fca9a30119bdd *inst/include/mlpack/methods/amf/update_rules/nmf_mult_div.hpp bc679276b513f603e9c8474ef9f8b56e *inst/include/mlpack/methods/amf/update_rules/svd_batch_learning.hpp 956f52a40f2e8fc0e35edb42efb2f810 *inst/include/mlpack/methods/amf/update_rules/svd_complete_incremental_learning.hpp 24304c69705360bbf5f9295b1e89f970 *inst/include/mlpack/methods/amf/update_rules/svd_incomplete_incremental_learning.hpp b4eff17a7108ecb49bf0d73ffceb3089 *inst/include/mlpack/methods/cf/cf.hpp fb9db6cf0d546cd7b569bf0391b59964 *inst/include/mlpack/methods/cf/cf_impl.hpp c18db9286505427d3f83490dab0e400d *inst/include/mlpack/methods/decision_stump/decision_stump.hpp 23e3fc1e15d842a0a862ce7071824c15 *inst/include/mlpack/methods/decision_stump/decision_stump_impl.hpp b209d6f9c1824ec9aef34ecde54c0b64 *inst/include/mlpack/methods/det/dt_utils.hpp 990f3ff6260d0b7efd1566643afe84f4 *inst/include/mlpack/methods/det/dtree.hpp e7de5b62a48407a2dcbc4303eecc799f *inst/include/mlpack/methods/emst/dtb.hpp 2d3db0a78661b80a535c89a1e4feba14 *inst/include/mlpack/methods/emst/dtb_impl.hpp a80e027eeaea35d7899ec7eb03ccc44a *inst/include/mlpack/methods/emst/dtb_rules.hpp dcd115285d62c522a613b04aab2b4e33 *inst/include/mlpack/methods/emst/dtb_rules_impl.hpp 08f1935294c319674d4044552b4dfb0b *inst/include/mlpack/methods/emst/dtb_stat.hpp 165baa9f675c9de9047d08d2477e8844 *inst/include/mlpack/methods/emst/edge_pair.hpp dcde2555d083578635321a8e91d317e0 *inst/include/mlpack/methods/emst/union_find.hpp b9b68b61dc31a95c8edf46e60fa84695 *inst/include/mlpack/methods/fastmks/fastmks.hpp 8bed153b8950b1c67281f539adf191a8 *inst/include/mlpack/methods/fastmks/fastmks_impl.hpp 2f2ffe0d0fb25e2b854f1de086fa75dd *inst/include/mlpack/methods/fastmks/fastmks_rules.hpp fe32afec2ac2eb80ab44080f8f69e9dd *inst/include/mlpack/methods/fastmks/fastmks_rules_impl.hpp 
d63900ffedc199cd7e44a083eeb969d5 *inst/include/mlpack/methods/fastmks/fastmks_stat.hpp a4eb221fe22709535916f92842a2929a *inst/include/mlpack/methods/gmm/diagonal_constraint.hpp be6667e777534d8f00a62f9280ab4eba *inst/include/mlpack/methods/gmm/eigenvalue_ratio_constraint.hpp 9a4923c5a5d44259ac36aa8da1bc25db *inst/include/mlpack/methods/gmm/em_fit.hpp 1cf33c150ab8e08ac46d94b6ab2bf316 *inst/include/mlpack/methods/gmm/em_fit_impl.hpp 593b75ccb64866faf433b7093f35b776 *inst/include/mlpack/methods/gmm/gmm.hpp a94b6943b98fbd3e57e2f196f48ba762 *inst/include/mlpack/methods/gmm/gmm_impl.hpp e07d5aa86ac08df50f93c252e6b26e60 *inst/include/mlpack/methods/gmm/no_constraint.hpp cd0e6afab7b4e069470262c461052b15 *inst/include/mlpack/methods/gmm/phi.hpp 56f717d5952b8067dfef2a18d440677b *inst/include/mlpack/methods/gmm/positive_definite_constraint.hpp e1ba77e7d3c78fa88f02594daa1ad104 *inst/include/mlpack/methods/hmm/hmm.hpp e44d3c79be3db565925ec34134e96d55 *inst/include/mlpack/methods/hmm/hmm_impl.hpp 811e2b5a10473d9432be62c9297dd25a *inst/include/mlpack/methods/hmm/hmm_util.hpp ad1e8fd0da2cb4296739b350cf65cd9c *inst/include/mlpack/methods/hmm/hmm_util_impl.hpp 70b9cd14efab3e71c978716d07063f1d *inst/include/mlpack/methods/kernel_pca/kernel_pca.hpp 14fc60dce74d4e10ff269a7a6656629a *inst/include/mlpack/methods/kernel_pca/kernel_pca_impl.hpp b264cd08b6783f6d2bbedd6910ac572a *inst/include/mlpack/methods/kernel_pca/kernel_rules/naive_method.hpp 111389c0164af95c41a204797f936092 *inst/include/mlpack/methods/kernel_pca/kernel_rules/nystroem_method.hpp 63a7e232fb3c1334e5006111ab7bf5a9 *inst/include/mlpack/methods/kmeans/allow_empty_clusters.hpp 2d4d9191fdd8a061193c48245b9059d1 *inst/include/mlpack/methods/kmeans/kmeans.hpp c22261184ca34f54c8217a35d87a06f9 *inst/include/mlpack/methods/kmeans/kmeans_impl.hpp 632451104fde57cbefa5364957bfc801 *inst/include/mlpack/methods/kmeans/max_variance_new_cluster.hpp 3ec1ec3a750fcc36a3ebb81ccc4f72d1 *inst/include/mlpack/methods/kmeans/max_variance_new_cluster_impl.hpp d20e8f1bfacb02e3df7f45c8d9a43441 *inst/include/mlpack/methods/kmeans/random_partition.hpp 28dfd2929fa51d5777b75c4c90f8d68a *inst/include/mlpack/methods/kmeans/refined_start.hpp 88a1417c0c267fe4c176d78120412f98 *inst/include/mlpack/methods/kmeans/refined_start_impl.hpp 1837c95498783fc3685fddabbfe04bcf *inst/include/mlpack/methods/lars/lars.hpp 0536165729365aae2b9a052b1cbf29c6 *inst/include/mlpack/methods/linear_regression/linear_regression.hpp 4f086495027c32e68a66d5c5077c1a7f *inst/include/mlpack/methods/local_coordinate_coding/lcc.hpp 79919266f32f5e85e364ae542c4c84ff *inst/include/mlpack/methods/local_coordinate_coding/lcc_impl.hpp f257bdd291486114e6ca34195fb90ab4 *inst/include/mlpack/methods/logistic_regression/logistic_regression.hpp 2dfdf68d7a3d51b2ea75137b38935d7d *inst/include/mlpack/methods/logistic_regression/logistic_regression_function.hpp 662a0bddef4a7ad6f6c0c79c70bb2612 *inst/include/mlpack/methods/logistic_regression/logistic_regression_impl.hpp ed879e4eacf0a0127559b90628cd6c02 *inst/include/mlpack/methods/lsh/lsh_search.hpp 7af2c15e24e59da1389be295ba7ab4cf *inst/include/mlpack/methods/lsh/lsh_search_impl.hpp 9450336d51b269f44981c2b1948e79ae *inst/include/mlpack/methods/mvu/mvu.hpp 1bd877852a253c05b6bbfd5413ce503d *inst/include/mlpack/methods/naive_bayes/naive_bayes_classifier.hpp 62e42d07122c35f22fcf6b893848155f *inst/include/mlpack/methods/naive_bayes/naive_bayes_classifier_impl.hpp 296c40bbba84ea3b88524c96d3e16f88 *inst/include/mlpack/methods/nca/nca.hpp 12ff816cca37217984650c02c3c4c4ab 
*inst/include/mlpack/methods/nca/nca_impl.hpp 72e77ea066286ebb70dacfe3568adc40 *inst/include/mlpack/methods/nca/nca_softmax_error_function.hpp 68391fe6938d9e9e8aa421a68af37d22 *inst/include/mlpack/methods/nca/nca_softmax_error_function_impl.hpp a29e1096869f59f15e64b0aacb484a67 *inst/include/mlpack/methods/neighbor_search/neighbor_search.hpp 9180d6ef8cbadb60a0015dcb032b2603 *inst/include/mlpack/methods/neighbor_search/neighbor_search_impl.hpp 6a0f975d8e6b8fec050062a35a331537 *inst/include/mlpack/methods/neighbor_search/neighbor_search_rules.hpp d8e332f696a23781430df79b7753c615 *inst/include/mlpack/methods/neighbor_search/neighbor_search_rules_impl.hpp 75188fbb51919261cf1d2406ed64492a *inst/include/mlpack/methods/neighbor_search/neighbor_search_stat.hpp a6d8e40503f254399aef711c849aa85f *inst/include/mlpack/methods/neighbor_search/ns_traversal_info.hpp 0fd635ea92a0ccc8dc353d1db789ee17 *inst/include/mlpack/methods/neighbor_search/sort_policies/furthest_neighbor_sort.hpp 8dcf417f2b367b80a23a59636d152a02 *inst/include/mlpack/methods/neighbor_search/sort_policies/furthest_neighbor_sort_impl.hpp e5de9dbbbe0feefc933cfe88f567fb3b *inst/include/mlpack/methods/neighbor_search/sort_policies/nearest_neighbor_sort.hpp 6f007adaa06f6027eb99692d9ea41b65 *inst/include/mlpack/methods/neighbor_search/sort_policies/nearest_neighbor_sort_impl.hpp 08b2ed90d794f526ba04faf96552e61e *inst/include/mlpack/methods/neighbor_search/typedef.hpp f5e739ee3d2a98ccdacb8e83506c16c6 *inst/include/mlpack/methods/neighbor_search/unmap.hpp 0378bd4254548efd38db0cebad059c1a *inst/include/mlpack/methods/nystroem_method/kmeans_selection.hpp f71809b147ac1f8bfc41f83ede078db6 *inst/include/mlpack/methods/nystroem_method/nystroem_method.hpp a52813b9412ae5778584258983881bc9 *inst/include/mlpack/methods/nystroem_method/nystroem_method_impl.hpp 9f61b9227b7740d012ec42828c45b455 *inst/include/mlpack/methods/nystroem_method/ordered_selection.hpp 9a25fe4318f8794a87e08da2ba7a4109 *inst/include/mlpack/methods/nystroem_method/random_selection.hpp 9425879f2cabf5a20ef058e8eeefcaf4 *inst/include/mlpack/methods/pca/pca.hpp 7389b2c8512d8bc07234e3fee84c3d72 *inst/include/mlpack/methods/perceptron/initialization_methods/random_init.hpp 62c94d4b7b5a7f1d388ce14a10b60335 *inst/include/mlpack/methods/perceptron/initialization_methods/zero_init.hpp 39d1f572ea6314cfb9053b17daae6f27 *inst/include/mlpack/methods/perceptron/learning_policies/simple_weight_update.hpp bf5e3f335d1518d6792dbf1ea7cf31f6 *inst/include/mlpack/methods/perceptron/perceptron.hpp 4edbe4e0fac9aa285e9b41bd7b78add8 *inst/include/mlpack/methods/perceptron/perceptron_impl.hpp a04ea30fd94fdfc8f3059f4057239e29 *inst/include/mlpack/methods/quic_svd/quic_svd.hpp ccb6c218ddbfa903ad1e614e80679b61 *inst/include/mlpack/methods/quic_svd/quic_svd_impl.hpp 9665e45dc6513087ce3e9d83a5474324 *inst/include/mlpack/methods/radical/radical.hpp 74ef98b01b1a2d5efccb25e588e50366 *inst/include/mlpack/methods/range_search/range_search.hpp e13ada24ea511f2b35e540ac89946ee3 *inst/include/mlpack/methods/range_search/range_search_impl.hpp 756dd6e90f922dbbf7b39ddf8920a1da *inst/include/mlpack/methods/range_search/range_search_rules.hpp 1bf5498a1d7264f647c38ed95ab7f695 *inst/include/mlpack/methods/range_search/range_search_rules_impl.hpp fe1de85f467babe565ae69852ccb826e *inst/include/mlpack/methods/range_search/range_search_stat.hpp 611981a9936edb1740cfa8555a1e21ae *inst/include/mlpack/methods/rann/ra_query_stat.hpp f0b4ebdaeeb51071cb507dfc81e1787b *inst/include/mlpack/methods/rann/ra_search.hpp 
f4688ace61684480ed226231ebee1e48 *inst/include/mlpack/methods/rann/ra_search_impl.hpp 4df8c3458984e609e0e5eca6a60e4798 *inst/include/mlpack/methods/rann/ra_search_rules.hpp 6876ef0cbe7b479a9310a96bc2842aad *inst/include/mlpack/methods/rann/ra_search_rules_impl.hpp 565f0d7a4f2210db1ece1cb3a5d0880b *inst/include/mlpack/methods/rann/ra_typedef.hpp d092a2d3f50809d9ddd96eb0492e00dd *inst/include/mlpack/methods/regularized_svd/regularized_svd.hpp f213dd4ab841939f190aa16238fc3b98 *inst/include/mlpack/methods/regularized_svd/regularized_svd_function.hpp 50df54c97b236a500c316ff6f275111a *inst/include/mlpack/methods/regularized_svd/regularized_svd_impl.hpp fb6d2cc30537ab8aeba11b03708919bd *inst/include/mlpack/methods/sparse_autoencoder/sparse_autoencoder.hpp 4ebfd8c58fd88efec5548f1de08174ae *inst/include/mlpack/methods/sparse_autoencoder/sparse_autoencoder_function.hpp d7983afe9b235d3d389e2828445200c2 *inst/include/mlpack/methods/sparse_autoencoder/sparse_autoencoder_impl.hpp 2c922c61f29952d6e132ee1ebef778ee *inst/include/mlpack/methods/sparse_coding/data_dependent_random_initializer.hpp 7f603718c9dbb56ae659aed82ab5f6d0 *inst/include/mlpack/methods/sparse_coding/nothing_initializer.hpp 8224899b23a9ecc036f263046104efc1 *inst/include/mlpack/methods/sparse_coding/random_initializer.hpp 6e602803864575a906ef9f8ff06bbb74 *inst/include/mlpack/methods/sparse_coding/sparse_coding.hpp d293cfee2d78cfe8c7ce055a1c7eebba *inst/include/mlpack/methods/sparse_coding/sparse_coding_impl.hpp 67c3ed4bcb4cb83c19e725684f14542a *inst/include/mlpack/prereqs.hpp 187846a405b7c2a57de12c8a8a06ef9f *inst/skeleton/Makevars 5bd7709ac203f18ac08bf6bfac819535 *inst/skeleton/Makevars.win b7dae8ee457f3324d355244f864d2e38 *inst/skeleton/kmeans.cpp 92d7f79f2dd0d09d2a1e3043a1cb91aa *man/RcppMLPACK-package.Rd ed5c2115a4824cb83e6a6619418fa007 *man/RcppMLPACK.package.skeleton.Rd 0bf2ae1e4fc4e997deb5e566e7e6f174 *man/mlKmeans.Rd f8325867abcb98348b31695b25693af1 *src/Makevars a0ec9436a84a2e5c5d7fe3884f48d15e *src/Makevars.win 6076ab022a4c7f6306dbb64e6c37b406 *src/RcppExports.cpp dde1869ce28cf473420f01539ae099d7 *src/RcppMLPACK.h a8c17d533723dc59755c5b16384b54d1 *src/init.c c8a8813d94bcd3241d4b6f8bef71166f *src/kmeans.cpp 453d5937f3e6c4a123b7bc98f32b6bb3 *src/mlpack/core.hpp 0daf541b040cc692d20517f1cc65b840 *src/mlpack/core/arma_extend/SpMat_extra_bones.hpp 4d8590285527d4b07fc0d7dd2d3d1340 *src/mlpack/core/arma_extend/SpMat_extra_meat.hpp db48729aa081b6b1fe7b77f763de6fb0 *src/mlpack/core/arma_extend/arma_extend.hpp 947db90cb4aef09960e40a774b8a3486 *src/mlpack/core/arma_extend/fn_ccov.hpp 6dcefd3e47ff7dc655f3244d8f6ef4fc *src/mlpack/core/arma_extend/fn_inplace_reshape.hpp 562bff887ec2bf0faba90afa591f53cc *src/mlpack/core/arma_extend/glue_ccov_meat.hpp 63bda57189595a0f9d42e7f01638fda9 *src/mlpack/core/arma_extend/glue_ccov_proto.hpp 99ea37ed895b3945fbc0ed32ba2a7c12 *src/mlpack/core/arma_extend/hdf5_misc.hpp 2b23cb598ca3f67253f5c80570d3d67e *src/mlpack/core/arma_extend/op_ccov_meat.hpp d259f1735adcababd9b3247227776701 *src/mlpack/core/arma_extend/op_ccov_proto.hpp eb15d96bcf808d8694f391f869e8193f *src/mlpack/core/arma_extend/promote_type.hpp b1037f2d99573bdb250b1f5501c5bd69 *src/mlpack/core/arma_extend/restrictors.hpp 698bdbe184b312d19606c6d5b3a17f3e *src/mlpack/core/arma_extend/traits.hpp ce8a284891bf8636a99a7c1acdfb6640 *src/mlpack/core/arma_extend/typedef.hpp cf3668e6107812004148a44a38593680 *src/mlpack/core/data/load.hpp ce6e82f84c1c0156611cbd56e1673b7c *src/mlpack/core/data/load_impl.hpp b448ce6892bc7e231f589efc7e30b8af 
*src/mlpack/core/data/normalize_labels.hpp 14b08b0c0e66de7307df19917afd808d *src/mlpack/core/data/normalize_labels_impl.hpp d0e9a0530bc1cd17291e742962ba8de6 *src/mlpack/core/data/save.hpp d0b10209f197d1c4a1695d9b304fed3b *src/mlpack/core/data/save_impl.hpp fc89a07401692e7b4fccce4a3edc349d *src/mlpack/core/dists/discrete_distribution.cpp 8f859f81df8f8a59ae860648343b8393 *src/mlpack/core/dists/discrete_distribution.hpp e6aedefc844cd5902ba4e1055ec5ee13 *src/mlpack/core/dists/gaussian_distribution.cpp c3e812bbf7e0e017f8febd93eb212f14 *src/mlpack/core/dists/gaussian_distribution.hpp ec2c3327f5862d59f8eb9a6c69597711 *src/mlpack/core/dists/laplace_distribution.cpp 10fe95cccce642f16fcb3783caf3eae2 *src/mlpack/core/dists/laplace_distribution.hpp f81296af004a261c531ffcaa4c7de703 *src/mlpack/core/kernels/cosine_distance.hpp 7efc005baadee95480c3ed7450956bf9 *src/mlpack/core/kernels/cosine_distance_impl.hpp 7cd9f535cf049d13cd735bf43b9f1710 *src/mlpack/core/kernels/epanechnikov_kernel.cpp e48a653b1c7741031963b98981d3c926 *src/mlpack/core/kernels/epanechnikov_kernel.hpp e72a06dc0eb263efc5d170a8e4cd911e *src/mlpack/core/kernels/epanechnikov_kernel_impl.hpp 1d95e02a2d3787c754f6a1bd622a7449 *src/mlpack/core/kernels/example_kernel.hpp f7b4cbd3fffcb7cb8dd746acfa31a1d0 *src/mlpack/core/kernels/gaussian_kernel.hpp 246ecae342463c8ebe5127fc8c186e86 *src/mlpack/core/kernels/hyperbolic_tangent_kernel.hpp 8d895a08fc66c7edac3a5ec94beb4c79 *src/mlpack/core/kernels/kernel_traits.hpp ad193c067b4de1a02737ede5d0701755 *src/mlpack/core/kernels/laplacian_kernel.hpp 1b0b2077f00916fa6885410207289d24 *src/mlpack/core/kernels/linear_kernel.hpp 4cc55126a3c1ac6e87e63e63c16fd657 *src/mlpack/core/kernels/polynomial_kernel.hpp 756e10022106f00b4769d2ea3b6b4c83 *src/mlpack/core/kernels/pspectrum_string_kernel.cpp c2613c5eb83007f1afed39599f150035 *src/mlpack/core/kernels/pspectrum_string_kernel.hpp 4ef25b80ed462e6513e3213fac8a1870 *src/mlpack/core/kernels/pspectrum_string_kernel_impl.hpp 0e661104e9de639ec50412f35ac8b18b *src/mlpack/core/kernels/spherical_kernel.hpp 8fced8e2463e9bf85594f09788689896 *src/mlpack/core/kernels/triangular_kernel.hpp 3c5acad6dbcd2c3124525e6d11104e60 *src/mlpack/core/math/clamp.hpp f040d150fd110cb2a4ee90c5933962ec *src/mlpack/core/math/lin_alg.cpp 41220e77c7403450aeff291e011a9e62 *src/mlpack/core/math/lin_alg.hpp 1387f10e4c77df8f450a5c32af771687 *src/mlpack/core/math/random.cpp c6c4d743e75b7f6d51b8b8dae4454381 *src/mlpack/core/math/random.hpp bb8ea1e837795609e29fda6d22f1fc5f *src/mlpack/core/math/range.hpp ec96563ffaf3a353c85d3eefaddc6182 *src/mlpack/core/math/range_impl.hpp 362514ec8e2986a80bf5bce745d5c5ad *src/mlpack/core/math/round.hpp f079601ef96f69ea3d83a989ae5304da *src/mlpack/core/metrics/ip_metric.hpp b3c5274536acae1544c61996c2fba939 *src/mlpack/core/metrics/ip_metric_impl.hpp a3ec4579e4495366f29d2cc3d1c360fc *src/mlpack/core/metrics/lmetric.hpp fae452f1155bc218d0d50a50566f0f57 *src/mlpack/core/metrics/lmetric_impl.hpp 903e866a5d26e165c426bbf9137fec17 *src/mlpack/core/metrics/mahalanobis_distance.hpp b215e1646f29f8ecae6d9e0145f2b754 *src/mlpack/core/metrics/mahalanobis_distance_impl.hpp 5f669757e1621b5ec483c0f80dc545de *src/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian.hpp 71e4d51937cc00c64d866b661283e61f *src/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_function.hpp e0b40978d9bd9228e3e9422dee9f52fe *src/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_function_impl.hpp a0d558ce658cde193fe69f108898d335 *src/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_impl.hpp 
a37880c9bd5f097813fdad9f5b655548 *src/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_test_functions.cpp 2fff5108346b62ccbcf178a210d184d5 *src/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_test_functions.hpp 520d31ae7134f750c2bc8a35bef1317a *src/mlpack/core/optimizers/lbfgs/lbfgs.hpp 61dead353ff606cdbec10f82ddc085a5 *src/mlpack/core/optimizers/lbfgs/lbfgs_impl.hpp 5b7f552b0e743d2c9a8430c921409bb1 *src/mlpack/core/optimizers/lbfgs/test_functions.cpp f59dfa0e705cfb03f205ad87b5c672fd *src/mlpack/core/optimizers/lbfgs/test_functions.hpp 7846ea36877d67f7bb4472d8520695f6 *src/mlpack/core/optimizers/lrsdp/lrsdp.cpp 9ba06f870917b7647a4227150dd58e19 *src/mlpack/core/optimizers/lrsdp/lrsdp.hpp bcb1e7ac57875cd882c9f54d06a2cf3c *src/mlpack/core/optimizers/lrsdp/lrsdp_function.cpp 6d69a849fdf324e260f3e43b4491545f *src/mlpack/core/optimizers/lrsdp/lrsdp_function.hpp ed3fa5eb8a86e0581da963d6f5c563a6 *src/mlpack/core/optimizers/sa/exponential_schedule.hpp 8fb102f5e2c8c0d903f5ea3203227b6d *src/mlpack/core/optimizers/sa/sa.hpp bdb6a65ff56fe932ca9cc1efb29eda9f *src/mlpack/core/optimizers/sa/sa_impl.hpp 11d2c3bbc1dcd02d7a12ad3572720caf *src/mlpack/core/optimizers/sgd/sgd.hpp 9bc171757250bebedbc0a0b762cbe717 *src/mlpack/core/optimizers/sgd/sgd_impl.hpp 36eb3544b27f8413e69760e3064db9fd *src/mlpack/core/optimizers/sgd/test_function.cpp 4f2a50996cfd13213aa67893edac1051 *src/mlpack/core/optimizers/sgd/test_function.hpp d030d0939cf5b612a5530fd42a227384 *src/mlpack/core/tree/ballbound.hpp c2249b763e9fb8912ccc965a99a44b82 *src/mlpack/core/tree/ballbound_impl.hpp 156d29d36527000e9f09f8d57b99d921 *src/mlpack/core/tree/binary_space_tree.hpp 77678cb0780d20a49fdf5af5b2de9e8c *src/mlpack/core/tree/binary_space_tree/binary_space_tree.hpp 2d1dd4457138cc43c79c1ed90c37c4af *src/mlpack/core/tree/binary_space_tree/binary_space_tree_impl.hpp a1ae1da6aac191119cdd054be22bc728 *src/mlpack/core/tree/binary_space_tree/dual_tree_traverser.hpp eb1c36d1297adb158c621b157dc71fcf *src/mlpack/core/tree/binary_space_tree/dual_tree_traverser_impl.hpp c316868ce59c129b84f2c2d2497b61e5 *src/mlpack/core/tree/binary_space_tree/mean_split.hpp fd23b6d8d462bce8d6ec5e8ec23b85ea *src/mlpack/core/tree/binary_space_tree/mean_split_impl.hpp d0b6605fa2b06cfa172d6a87a3f722a8 *src/mlpack/core/tree/binary_space_tree/single_tree_traverser.hpp 5797a620b1751942b4fb7b5cd592a310 *src/mlpack/core/tree/binary_space_tree/single_tree_traverser_impl.hpp 92b8ce90ffab4dd1d9c7da8542f42922 *src/mlpack/core/tree/binary_space_tree/traits.hpp cd1052f34d890f0138b4baedb3070425 *src/mlpack/core/tree/bounds.hpp 1b312630d98c37d7fda98fbc739b8191 *src/mlpack/core/tree/cosine_tree/cosine_tree.cpp ab6c2805c63f1b04dc6ad1d304ee6582 *src/mlpack/core/tree/cosine_tree/cosine_tree.hpp ecab86725009c76f880db2b1d73c0b87 *src/mlpack/core/tree/cover_tree.hpp 0757686573aa4b9e439a694e3a5edb80 *src/mlpack/core/tree/cover_tree/cover_tree.hpp ddf807cd44a71fbcf0ab614754b31747 *src/mlpack/core/tree/cover_tree/cover_tree_impl.hpp 996318c57be5491337f4b718d82622f4 *src/mlpack/core/tree/cover_tree/dual_tree_traverser.hpp 6125216cfbf81ffe0a4401fb961e5403 *src/mlpack/core/tree/cover_tree/dual_tree_traverser_impl.hpp 31237e268a1e90a0e7ba1f487aaa8264 *src/mlpack/core/tree/cover_tree/first_point_is_root.hpp 513e82d9a9fed41f7b0a8e05ca7830eb *src/mlpack/core/tree/cover_tree/single_tree_traverser.hpp b2f989daf0de22632d44f42e231e4fb4 *src/mlpack/core/tree/cover_tree/single_tree_traverser_impl.hpp 7bd673dc2b26f1625dc62097880a424e *src/mlpack/core/tree/cover_tree/traits.hpp 7afc92ca720ad54f862c6d8e3d1481a3 
*src/mlpack/core/tree/example_tree.hpp 847dc09228893dcb8f1292e5b0f4c2f3 *src/mlpack/core/tree/hrectbound.hpp 4d6db481e4f4ab47d7859131f2f866dc *src/mlpack/core/tree/hrectbound_impl.hpp d8b6404380f28814ff5344ca7fcc464f *src/mlpack/core/tree/mrkd_statistic.cpp a49afc2151d43dc194975d86a649b88f *src/mlpack/core/tree/mrkd_statistic.hpp a685124a32bbafe382f787f2f4a3590b *src/mlpack/core/tree/mrkd_statistic_impl.hpp a66dd2ea840f8673fa670e8de968053d *src/mlpack/core/tree/rectangle_tree.hpp d72c5b3b901050beb978d49e9ab22f06 *src/mlpack/core/tree/statistic.hpp f273d31476a0797eefcfd14b34495c99 *src/mlpack/core/tree/traversal_info.hpp 05c163942fcc2bf4042f4474e191a636 *src/mlpack/core/tree/tree_traits.hpp 9601cdf4d510422f5bf187e1b261f4a1 *src/mlpack/core/util/arma_traits.hpp 79713e1709ddd528c042889b4c894f7b *src/mlpack/core/util/nulloutstream.hpp d7287d92202ad6e7e928cbcfdb42db65 *src/mlpack/core/util/option.cpp 12836653018653c64b13b51733ff58b1 *src/mlpack/core/util/option.hpp 497ccf5a8123e20047260c4202473387 *src/mlpack/core/util/option_impl.hpp a42649bf01bee25a14bea10affbcbe14 *src/mlpack/core/util/prefixedoutstream.cpp cba73d7062ee224ce31a09f71b539354 *src/mlpack/core/util/prefixedoutstream.hpp 73e5596cd45747b9ee0a41dff5105ddc *src/mlpack/core/util/prefixedoutstream_impl.hpp 6ed5344c79dd8c7e2341caa95945e618 *src/mlpack/core/util/sfinae_utility.hpp 9aa0f22d42155804ffe44a2c822cf27b *src/mlpack/core/util/string_util.cpp 0f864be4065b4ab9ac8ef258a30d9e0d *src/mlpack/core/util/string_util.hpp a1d216997b1591aa5c28aad084e05ef7 *src/mlpack/core/util/timers.cpp 98cf94054c6148f63e3a7dbc416a6135 *src/mlpack/core/util/timers.hpp eb9c12d1abd12847239ec9b7fbaf4bbe *src/mlpack/core/util/version.cpp 2520a41f29bf36172eb9f7fe0e191a51 *src/mlpack/core/util/version.hpp a2d9b809a17bf405be6b79f410694c7f *src/mlpack/methods/amf/amf.hpp 76d5d5d38a40c97618a8b80fdac55c9a *src/mlpack/methods/amf/amf_impl.hpp 2658723657f5d78eda9f6dcb5b456135 *src/mlpack/methods/amf/init_rules/random_acol_init.hpp 1776d3cc53489c2b01ccbe466ca28f99 *src/mlpack/methods/amf/init_rules/random_init.hpp 8ed4f2df116f5412798a93bd47cbf275 *src/mlpack/methods/amf/termination_policies/complete_incremental_termination.hpp 3c0f3a2cc3224e5cd4b858a7f9bb7411 *src/mlpack/methods/amf/termination_policies/incomplete_incremental_termination.hpp 7d863cc040388db428af45400af9ca2c *src/mlpack/methods/amf/termination_policies/simple_residue_termination.hpp c8b32e60c8c618a82766cc750eb76d12 *src/mlpack/methods/amf/termination_policies/simple_tolerance_termination.hpp 68fb88ffbfce40ebf8393a96a6fed56a *src/mlpack/methods/amf/termination_policies/validation_RMSE_termination.hpp d4c0b2cc1b987c880566c0df5bc43cb0 *src/mlpack/methods/amf/update_rules/nmf_als.hpp c88ca2c2aa91c6589cfbe0e9fa85839a *src/mlpack/methods/amf/update_rules/nmf_mult_dist.hpp 8170a1490f040966ef3fca9a30119bdd *src/mlpack/methods/amf/update_rules/nmf_mult_div.hpp bc679276b513f603e9c8474ef9f8b56e *src/mlpack/methods/amf/update_rules/svd_batch_learning.hpp 956f52a40f2e8fc0e35edb42efb2f810 *src/mlpack/methods/amf/update_rules/svd_complete_incremental_learning.hpp 24304c69705360bbf5f9295b1e89f970 *src/mlpack/methods/amf/update_rules/svd_incomplete_incremental_learning.hpp b4eff17a7108ecb49bf0d73ffceb3089 *src/mlpack/methods/cf/cf.hpp fb9db6cf0d546cd7b569bf0391b59964 *src/mlpack/methods/cf/cf_impl.hpp c18db9286505427d3f83490dab0e400d *src/mlpack/methods/decision_stump/decision_stump.hpp 23e3fc1e15d842a0a862ce7071824c15 *src/mlpack/methods/decision_stump/decision_stump_impl.hpp 05e0653f084179340a0e3567d1e952c4 
*src/mlpack/methods/det/dt_utils.cpp b209d6f9c1824ec9aef34ecde54c0b64 *src/mlpack/methods/det/dt_utils.hpp c41bc64ef1905085a2d9820187ce3b53 *src/mlpack/methods/det/dtree.cpp 990f3ff6260d0b7efd1566643afe84f4 *src/mlpack/methods/det/dtree.hpp e7de5b62a48407a2dcbc4303eecc799f *src/mlpack/methods/emst/dtb.hpp 2d3db0a78661b80a535c89a1e4feba14 *src/mlpack/methods/emst/dtb_impl.hpp a80e027eeaea35d7899ec7eb03ccc44a *src/mlpack/methods/emst/dtb_rules.hpp dcd115285d62c522a613b04aab2b4e33 *src/mlpack/methods/emst/dtb_rules_impl.hpp 08f1935294c319674d4044552b4dfb0b *src/mlpack/methods/emst/dtb_stat.hpp 165baa9f675c9de9047d08d2477e8844 *src/mlpack/methods/emst/edge_pair.hpp dcde2555d083578635321a8e91d317e0 *src/mlpack/methods/emst/union_find.hpp b9b68b61dc31a95c8edf46e60fa84695 *src/mlpack/methods/fastmks/fastmks.hpp 8bed153b8950b1c67281f539adf191a8 *src/mlpack/methods/fastmks/fastmks_impl.hpp 2f2ffe0d0fb25e2b854f1de086fa75dd *src/mlpack/methods/fastmks/fastmks_rules.hpp fe32afec2ac2eb80ab44080f8f69e9dd *src/mlpack/methods/fastmks/fastmks_rules_impl.hpp d63900ffedc199cd7e44a083eeb969d5 *src/mlpack/methods/fastmks/fastmks_stat.hpp a4eb221fe22709535916f92842a2929a *src/mlpack/methods/gmm/diagonal_constraint.hpp be6667e777534d8f00a62f9280ab4eba *src/mlpack/methods/gmm/eigenvalue_ratio_constraint.hpp 9a4923c5a5d44259ac36aa8da1bc25db *src/mlpack/methods/gmm/em_fit.hpp 1cf33c150ab8e08ac46d94b6ab2bf316 *src/mlpack/methods/gmm/em_fit_impl.hpp 593b75ccb64866faf433b7093f35b776 *src/mlpack/methods/gmm/gmm.hpp a94b6943b98fbd3e57e2f196f48ba762 *src/mlpack/methods/gmm/gmm_impl.hpp e07d5aa86ac08df50f93c252e6b26e60 *src/mlpack/methods/gmm/no_constraint.hpp cd0e6afab7b4e069470262c461052b15 *src/mlpack/methods/gmm/phi.hpp 56f717d5952b8067dfef2a18d440677b *src/mlpack/methods/gmm/positive_definite_constraint.hpp e1ba77e7d3c78fa88f02594daa1ad104 *src/mlpack/methods/hmm/hmm.hpp e44d3c79be3db565925ec34134e96d55 *src/mlpack/methods/hmm/hmm_impl.hpp 811e2b5a10473d9432be62c9297dd25a *src/mlpack/methods/hmm/hmm_util.hpp ad1e8fd0da2cb4296739b350cf65cd9c *src/mlpack/methods/hmm/hmm_util_impl.hpp 70b9cd14efab3e71c978716d07063f1d *src/mlpack/methods/kernel_pca/kernel_pca.hpp 14fc60dce74d4e10ff269a7a6656629a *src/mlpack/methods/kernel_pca/kernel_pca_impl.hpp b264cd08b6783f6d2bbedd6910ac572a *src/mlpack/methods/kernel_pca/kernel_rules/naive_method.hpp 111389c0164af95c41a204797f936092 *src/mlpack/methods/kernel_pca/kernel_rules/nystroem_method.hpp 63a7e232fb3c1334e5006111ab7bf5a9 *src/mlpack/methods/kmeans/allow_empty_clusters.hpp 2d4d9191fdd8a061193c48245b9059d1 *src/mlpack/methods/kmeans/kmeans.hpp c22261184ca34f54c8217a35d87a06f9 *src/mlpack/methods/kmeans/kmeans_impl.hpp 632451104fde57cbefa5364957bfc801 *src/mlpack/methods/kmeans/max_variance_new_cluster.hpp 3ec1ec3a750fcc36a3ebb81ccc4f72d1 *src/mlpack/methods/kmeans/max_variance_new_cluster_impl.hpp d20e8f1bfacb02e3df7f45c8d9a43441 *src/mlpack/methods/kmeans/random_partition.hpp 28dfd2929fa51d5777b75c4c90f8d68a *src/mlpack/methods/kmeans/refined_start.hpp 88a1417c0c267fe4c176d78120412f98 *src/mlpack/methods/kmeans/refined_start_impl.hpp 0cc995b54bc7caa3bbe64358400cbad2 *src/mlpack/methods/lars/lars.cpp 1837c95498783fc3685fddabbfe04bcf *src/mlpack/methods/lars/lars.hpp be08f783ea85e0d26248e51c254075d2 *src/mlpack/methods/linear_regression/linear_regression.cpp 0536165729365aae2b9a052b1cbf29c6 *src/mlpack/methods/linear_regression/linear_regression.hpp 4f086495027c32e68a66d5c5077c1a7f *src/mlpack/methods/local_coordinate_coding/lcc.hpp 79919266f32f5e85e364ae542c4c84ff 
*src/mlpack/methods/local_coordinate_coding/lcc_impl.hpp f257bdd291486114e6ca34195fb90ab4 *src/mlpack/methods/logistic_regression/logistic_regression.hpp 4ab0f075aa4a01cf07a8fadfd5a98bb8 *src/mlpack/methods/logistic_regression/logistic_regression_function.cpp 2dfdf68d7a3d51b2ea75137b38935d7d *src/mlpack/methods/logistic_regression/logistic_regression_function.hpp 662a0bddef4a7ad6f6c0c79c70bb2612 *src/mlpack/methods/logistic_regression/logistic_regression_impl.hpp ed879e4eacf0a0127559b90628cd6c02 *src/mlpack/methods/lsh/lsh_search.hpp 7af2c15e24e59da1389be295ba7ab4cf *src/mlpack/methods/lsh/lsh_search_impl.hpp 1c1414080fdb03bfe8b7ae461270a02c *src/mlpack/methods/mvu/mvu.cpp 9450336d51b269f44981c2b1948e79ae *src/mlpack/methods/mvu/mvu.hpp 1bd877852a253c05b6bbfd5413ce503d *src/mlpack/methods/naive_bayes/naive_bayes_classifier.hpp 62e42d07122c35f22fcf6b893848155f *src/mlpack/methods/naive_bayes/naive_bayes_classifier_impl.hpp 296c40bbba84ea3b88524c96d3e16f88 *src/mlpack/methods/nca/nca.hpp 12ff816cca37217984650c02c3c4c4ab *src/mlpack/methods/nca/nca_impl.hpp 72e77ea066286ebb70dacfe3568adc40 *src/mlpack/methods/nca/nca_softmax_error_function.hpp 68391fe6938d9e9e8aa421a68af37d22 *src/mlpack/methods/nca/nca_softmax_error_function_impl.hpp a29e1096869f59f15e64b0aacb484a67 *src/mlpack/methods/neighbor_search/neighbor_search.hpp 9180d6ef8cbadb60a0015dcb032b2603 *src/mlpack/methods/neighbor_search/neighbor_search_impl.hpp 6a0f975d8e6b8fec050062a35a331537 *src/mlpack/methods/neighbor_search/neighbor_search_rules.hpp d8e332f696a23781430df79b7753c615 *src/mlpack/methods/neighbor_search/neighbor_search_rules_impl.hpp 75188fbb51919261cf1d2406ed64492a *src/mlpack/methods/neighbor_search/neighbor_search_stat.hpp a6d8e40503f254399aef711c849aa85f *src/mlpack/methods/neighbor_search/ns_traversal_info.hpp 4136fb453a1af51226bbe6ff22078272 *src/mlpack/methods/neighbor_search/sort_policies/furthest_neighbor_sort.cpp 0fd635ea92a0ccc8dc353d1db789ee17 *src/mlpack/methods/neighbor_search/sort_policies/furthest_neighbor_sort.hpp 8dcf417f2b367b80a23a59636d152a02 *src/mlpack/methods/neighbor_search/sort_policies/furthest_neighbor_sort_impl.hpp 8f0d9308b503050cf8366770cdcc2e51 *src/mlpack/methods/neighbor_search/sort_policies/nearest_neighbor_sort.cpp e5de9dbbbe0feefc933cfe88f567fb3b *src/mlpack/methods/neighbor_search/sort_policies/nearest_neighbor_sort.hpp 6f007adaa06f6027eb99692d9ea41b65 *src/mlpack/methods/neighbor_search/sort_policies/nearest_neighbor_sort_impl.hpp 08b2ed90d794f526ba04faf96552e61e *src/mlpack/methods/neighbor_search/typedef.hpp dd4a7c7e1a07698bf9fdf3e8d9fdd935 *src/mlpack/methods/neighbor_search/unmap.cpp f5e739ee3d2a98ccdacb8e83506c16c6 *src/mlpack/methods/neighbor_search/unmap.hpp 0378bd4254548efd38db0cebad059c1a *src/mlpack/methods/nystroem_method/kmeans_selection.hpp f71809b147ac1f8bfc41f83ede078db6 *src/mlpack/methods/nystroem_method/nystroem_method.hpp a52813b9412ae5778584258983881bc9 *src/mlpack/methods/nystroem_method/nystroem_method_impl.hpp 9f61b9227b7740d012ec42828c45b455 *src/mlpack/methods/nystroem_method/ordered_selection.hpp 9a25fe4318f8794a87e08da2ba7a4109 *src/mlpack/methods/nystroem_method/random_selection.hpp 244363743822bad086b432dd83d89ae2 *src/mlpack/methods/pca/pca.cpp 9425879f2cabf5a20ef058e8eeefcaf4 *src/mlpack/methods/pca/pca.hpp 7389b2c8512d8bc07234e3fee84c3d72 *src/mlpack/methods/perceptron/initialization_methods/random_init.hpp 62c94d4b7b5a7f1d388ce14a10b60335 *src/mlpack/methods/perceptron/initialization_methods/zero_init.hpp 39d1f572ea6314cfb9053b17daae6f27 
*src/mlpack/methods/perceptron/learning_policies/simple_weight_update.hpp bf5e3f335d1518d6792dbf1ea7cf31f6 *src/mlpack/methods/perceptron/perceptron.hpp 4edbe4e0fac9aa285e9b41bd7b78add8 *src/mlpack/methods/perceptron/perceptron_impl.hpp a04ea30fd94fdfc8f3059f4057239e29 *src/mlpack/methods/quic_svd/quic_svd.hpp ccb6c218ddbfa903ad1e614e80679b61 *src/mlpack/methods/quic_svd/quic_svd_impl.hpp 79f7d46552ef136f3c334820644f70bd *src/mlpack/methods/radical/radical.cpp 9665e45dc6513087ce3e9d83a5474324 *src/mlpack/methods/radical/radical.hpp 74ef98b01b1a2d5efccb25e588e50366 *src/mlpack/methods/range_search/range_search.hpp e13ada24ea511f2b35e540ac89946ee3 *src/mlpack/methods/range_search/range_search_impl.hpp 756dd6e90f922dbbf7b39ddf8920a1da *src/mlpack/methods/range_search/range_search_rules.hpp 1bf5498a1d7264f647c38ed95ab7f695 *src/mlpack/methods/range_search/range_search_rules_impl.hpp fe1de85f467babe565ae69852ccb826e *src/mlpack/methods/range_search/range_search_stat.hpp 611981a9936edb1740cfa8555a1e21ae *src/mlpack/methods/rann/ra_query_stat.hpp f0b4ebdaeeb51071cb507dfc81e1787b *src/mlpack/methods/rann/ra_search.hpp f4688ace61684480ed226231ebee1e48 *src/mlpack/methods/rann/ra_search_impl.hpp 4df8c3458984e609e0e5eca6a60e4798 *src/mlpack/methods/rann/ra_search_rules.hpp 6876ef0cbe7b479a9310a96bc2842aad *src/mlpack/methods/rann/ra_search_rules_impl.hpp 565f0d7a4f2210db1ece1cb3a5d0880b *src/mlpack/methods/rann/ra_typedef.hpp d092a2d3f50809d9ddd96eb0492e00dd *src/mlpack/methods/regularized_svd/regularized_svd.hpp 154ce54542b8977e36f81354e491ec38 *src/mlpack/methods/regularized_svd/regularized_svd_function.cpp f213dd4ab841939f190aa16238fc3b98 *src/mlpack/methods/regularized_svd/regularized_svd_function.hpp 50df54c97b236a500c316ff6f275111a *src/mlpack/methods/regularized_svd/regularized_svd_impl.hpp fb6d2cc30537ab8aeba11b03708919bd *src/mlpack/methods/sparse_autoencoder/sparse_autoencoder.hpp e91fa58f3f5edb5967b9f487de379437 *src/mlpack/methods/sparse_autoencoder/sparse_autoencoder_function.cpp 4ebfd8c58fd88efec5548f1de08174ae *src/mlpack/methods/sparse_autoencoder/sparse_autoencoder_function.hpp d7983afe9b235d3d389e2828445200c2 *src/mlpack/methods/sparse_autoencoder/sparse_autoencoder_impl.hpp 2c922c61f29952d6e132ee1ebef778ee *src/mlpack/methods/sparse_coding/data_dependent_random_initializer.hpp 7f603718c9dbb56ae659aed82ab5f6d0 *src/mlpack/methods/sparse_coding/nothing_initializer.hpp 8224899b23a9ecc036f263046104efc1 *src/mlpack/methods/sparse_coding/random_initializer.hpp 6e602803864575a906ef9f8ff06bbb74 *src/mlpack/methods/sparse_coding/sparse_coding.hpp d293cfee2d78cfe8c7ce055a1c7eebba *src/mlpack/methods/sparse_coding/sparse_coding_impl.hpp 67c3ed4bcb4cb83c19e725684f14542a *src/mlpack/prereqs.hpp 49192f26625a83e1597a39ce6f3c50ca *vignettes/RcppMLPACK-intro.Rnw 25231a8c1ef3828de758113a77ed6842 *vignettes/ref.bib RcppMLPACK/inst/0000755000176200001440000000000013647514343013035 5ustar liggesusersRcppMLPACK/inst/doc/0000755000176200001440000000000013647514343013602 5ustar liggesusersRcppMLPACK/inst/doc/RcppMLPACK-intro.Rnw0000644000176200001440000002564113647512751017230 0ustar liggesusers\documentclass[letterpaper]{article} %\VignetteIndexEntry{RcppMLPACK-introduction} %\VignetteKeywords{R, C++, Armadillo, MLPACK, machine learning} %\VignettePackage{RcppMLPACK} \usepackage[utf8]{inputenc} \usepackage{amsmath,amsthm, amsfonts, array} \usepackage{graphicx,booktabs,xcolor,color} \usepackage{geometry} \geometry{left=2.5cm,right=2.5cm,top=2.3cm,bottom=2.3cm} \usepackage{url} \usepackage{amsmath} 
\usepackage{multirow}
\usepackage{epstopdf}
\usepackage{listings}
\definecolor{shadecolor}{rgb}{0.95,0.95,0.95}
\lstset{xrightmargin=0pt, keepspaces=true, basicstyle=\ttfamily, commentstyle=\ttfamily, numbers=left, numberstyle=\tiny, backgroundcolor=\color{shadecolor}, columns=fullflexible, showstringspaces=false, breaklines=true, framerule=0.7pt, frameround=tttt, captionpos=b, xleftmargin=-0.2em, xrightmargin=-0.25em, aboveskip=1em, }
\usepackage{indentfirst}
\setlength{\parindent}{2em}
\linespread{1.3}

\begin{document}

\title{\textbf{RcppMLPACK}: R integration with MLPACK using Rcpp}
\author{Qiang Kou\\qkou@umail.iu.edu}
\date{\today}
\maketitle

\section{MLPACK}

MLPACK\cite{curtin2013mlpack} is a C++ machine learning library with emphasis on scalability, speed, and ease-of-use. It outperforms competing machine learning libraries by large margins; for detailed benchmarking results, please refer to \cite{curtin2013mlpack}.

\subsection{Input/Output using Armadillo}

MLPACK uses Armadillo for input and output, which provides vector, matrix and cube types (integer, floating point and complex numbers are supported)\cite{arma}. By providing Matlab-like syntax and various matrix decompositions through integration with LAPACK or other high-performance libraries (such as Intel MKL, AMD ACML, or OpenBLAS), Armadillo keeps a good balance between speed and ease of use. Armadillo is widely used in C++ libraries like MLPACK. One point deserves attention, however: Armadillo matrices in MLPACK are stored in column-major format for speed, which means that observations are stored as columns and dimensions as rows. An additional transpose may therefore be needed when passing data to MLPACK.

\subsection{Simple API}

Although MLPACK relies on template techniques in C++, it provides an intuitive and simple API. For the standard usage of methods, default arguments are provided. The MLPACK paper\cite{curtin2013mlpack} provides a good example: standard k-means clustering in Euclidean space can be initialized as below:

\begin{lstlisting}[caption={k-means with all default arguments}, language=C++]
KMeans<> k;
\end{lstlisting}

If we want to use the Manhattan distance, a different cluster initialization policy, and allow empty clusters, the object can be initialized with additional template arguments:

\begin{lstlisting}[caption={k-means with additional arguments}, language=C++]
KMeans<ManhattanDistance, RefinedStart, AllowEmptyClusters> k;
\end{lstlisting}

\subsection{Available methods in MLPACK}

Commonly used machine learning methods are all implemented in MLPACK. In addition, it contains a strong set of tree-building and tree-query routines. Available methods in MLPACK:

\begin{itemize}
\item Fast Hierarchical Clustering (Euclidean Minimum Spanning Trees)
\item Gaussian Mixture Models (trained via EM)
\item Hidden Markov Models (training, prediction, and classification)
\item Kernel Principal Components Analysis
\item K-Means clustering
\item LARS/Lasso Regression
\item Least-squares Linear Regression
\item Maximum Variance Unfolding (using LRSDP)
\item Naive Bayes Classifier
\item Neighbourhood Components Analysis (using LRSDP)
\item RADICAL (Robust, Accurate, Direct ICA aLgorithm)
\item Tree-based k-nearest-neighbours search and classifier
\item Tree-based range search
\end{itemize}

For the details of each algorithm and its usage, please visit the tutorial page of MLPACK (\url{http://www.mlpack.org/tutorial.html}).

\section{RcppMLPACK}

\subsection{Rcpp and RcppArmadillo}

As said above, MLPACK is a C++ library using Armadillo. Since we already have \textbf{Rcpp}\cite{Eddelbuettel:Francois:2011:JSSOBK:v40i08} and \textbf{RcppArmadillo}\cite{eddelbuettel2014rcpparmadillo}, which can be used to integrate C++ and Armadillo with R seamlessly, \textbf{RcppMLPACK} becomes something very natural.
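To make the seamless integration concrete, here is a minimal sketch (not part of the package sources; the function name \texttt{columnMeans} is purely illustrative) of what this conversion layer gives us: an R matrix arrives in C++ as an \texttt{arma::mat}, and the Armadillo result is converted back to an R object automatically.

\begin{lstlisting}[caption={RcppArmadillo conversion sketch}, language=C++]
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// The R matrix is converted to arma::mat on the way in, and the
// arma::rowvec result is converted back to an R vector on the way out.
// [[Rcpp::export]]
arma::rowvec columnMeans(const arma::mat& X) {
    return arma::mean(X, 0);  // mean of each column
}
\end{lstlisting}

Compiled with \texttt{Rcpp::sourceCpp()}, such a function can be called from R like any other; the same mechanism underlies all the listings in the rest of this vignette.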
\subsection{k-means example}

Here we continue the k-means example above. Using \textbf{RcppMLPACK}, a k-means method callable from R can be implemented as in Listing \ref{kmeans}. The interface between R and C++ is handled by \textbf{Rcpp} and \textbf{RcppArmadillo}.

\begin{lstlisting}[caption={k-means example}, language=C++, label={kmeans}]
#include "RcppMLPACK.h"

using namespace mlpack::kmeans;
using namespace Rcpp;

// [[Rcpp::export]]
List kmeans(const arma::mat& data, const int& clusters) {

    arma::Col<size_t> assignments;

    // Initialize with the default arguments.
    KMeans<> k;
    k.Cluster(data, clusters, assignments);

    return List::create(_["clusters"] = clusters,
                        _["result"]   = assignments);
}
\end{lstlisting}

\subsection{Using inline package}

The \textbf{inline}\cite{inline} package provides a complete wrapper around the compilation, linking, and loading steps, so all of them can be done inside an R session. Naturally, \textbf{RcppMLPACK} supports inline compilation as well.

\begin{lstlisting}[caption={k-means example, inline version}, language=R]
library(inline)
library(RcppMLPACK)

code <- '
    arma::mat data = as<arma::mat>(test);
    int clusters = as<int>(n);

    arma::Col<size_t> assignments;

    mlpack::kmeans::KMeans<> k;
    k.Cluster(data, clusters, assignments);

    return List::create(_["clusters"] = clusters,
                        _["result"]   = assignments);
'

mlKmeans <- cxxfunction(signature(test="numeric", n="integer"),
                        code, plugin="RcppMLPACK")

data(trees, package="datasets")
mlKmeans(t(trees), 3)
\end{lstlisting}

As said above, MLPACK stores matrices in column-major format, so a transpose may be needed when passing data from R to MLPACK.

\subsection{RcppMLPACK.package.skeleton}

The package also contains a \texttt{RcppMLPACK.package.skeleton()} function for people who want to use MLPACK code in their own packages. It follows the structure of \texttt{RcppArmadillo.package.skeleton()}.

\begin{lstlisting}[caption={RcppMLPACK.package.skeleton}, language=R]
library(RcppMLPACK)
RcppMLPACK.package.skeleton("foobar")
Creating directories ...
Creating DESCRIPTION ...
Creating NAMESPACE ...
Creating Read-and-delete-me ...
Saving functions and data ...
Making help files ...
Done.
Further steps are described in './foobar/Read-and-delete-me'.

Adding RcppMLPACK settings
 >> added Imports: Rcpp
 >> added LinkingTo: Rcpp, RcppArmadillo, BH, RcppMLPACK
 >> added useDynLib and importFrom directives to NAMESPACE
 >> added Makevars file with RcppMLPACK settings
 >> added Makevars.win file with RcppMLPACK settings
 >> added example src file using MLPACK classes
 >> invoked Rcpp::compileAttributes to create wrappers
\end{lstlisting}

By calling \texttt{RcppMLPACK.package.skeleton()}, a package skeleton is generated; the resulting files are listed below.

\begin{lstlisting}[caption={RcppMLPACK.package.skeleton result}, language=R]
system("ls -R foobar")
foobar:
DESCRIPTION  man  NAMESPACE  R  Read-and-delete-me  src

foobar/man:
foobar-package.Rd

foobar/R:
RcppExports.R

foobar/src:
kmeans.cpp  Makevars  Makevars.win  RcppExports.cpp
\end{lstlisting}
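The generated package can be installed and tried out immediately. Below is a minimal sketch, assuming the example \texttt{kmeans.cpp} copied into \texttt{src} keeps its exported wrapper name \texttt{kmeans()}; for a real package, replace \texttt{foobar} and the function name with your own.

\begin{lstlisting}[caption={Installing and testing the generated skeleton (sketch)}, language=R]
# install the generated skeleton package from the current directory
install.packages("foobar", repos = NULL, type = "source")
library(foobar)

data(trees, package = "datasets")
# observations must be stored as columns, hence the transpose
kmeans(t(as.matrix(trees)), 3L)
\end{lstlisting}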
\subsection{Processing files on disk}

MLPACK contains two functions, \texttt{mlpack::data::Load()} and \texttt{mlpack::data::Save()}, which load and save matrices from and to files on disk. They can be really useful, since reading large files into R can be a nightmare: we can simply pass the file names from R to C++ instead of reading the data into R first.

\begin{lstlisting}[caption={Reading and writing files on disk}, language=C++]
std::string inputFile  = Rcpp::as<std::string>(input);
std::string outputFile = Rcpp::as<std::string>(output);

arma::mat data;
mlpack::data::Load(inputFile, data, true);

int clusters = as<int>(n);

arma::Col<size_t> assignments;

mlpack::kmeans::KMeans<> k;
k.Cluster(data, clusters, assignments);

arma::Mat<size_t> out = trans(assignments);
mlpack::data::Save(outputFile, out);
\end{lstlisting}

\subsection{Performance}

Even without formal performance testing, we expect the C++ implementation to be faster. A small wine data set from the UCI Machine Learning Repository is used for benchmarking; you can download it from \url{https://archive.ics.uci.edu/ml/datasets/Wine}. The \textbf{rbenchmark}\cite{rbenchmark} package is used for timing.

\begin{lstlisting}[caption={Benchmarking script}, language=R]
suppressMessages(library(rbenchmark))
res <- benchmark(mlKmeans(t(wine), 3),
                 kmeans(wine, 3),
                 columns=c("test", "replications", "elapsed",
                           "relative", "user.self", "sys.self"),
                 order="relative")
\end{lstlisting}

\begin{table}[h!]
\centering
\caption{Benchmarking result}
\begin{tabular}{rrrrrr}
\hline
test & replications & elapsed & relative & user.self & sys.self \\ [0.3ex]
\hline
mlKmeans(t(wine), 3) & 100 & 0.028 & 1.000  & 0.028 & 0.000 \\
kmeans(wine, 3)      & 100 & 0.947 & 33.821 & 0.484 & 0.424 \\[0.3ex]
\hline
\end{tabular}
\end{table}

From the benchmarking result, we can see that the MLPACK version of k-means is about 33 times faster than \texttt{kmeans()} in R on this data set. We should note, however, that \texttt{kmeans()} in R returns more information than just the clustering result and performs many more input checks.

\section*{Acknowledgement}

Very special thanks to the developers of Rcpp and MLPACK. Also huge thanks to the people on Rcpp-devel for their help. Testing and bug reports are most welcome. If you have any questions, you can always reach me by email (\url{qkou@umail.iu.edu}) or open an issue at \url{https://github.com/thirdwing/RcppMLPACK}.

\bibliography{ref}
\bibliographystyle{unsrt}

\newpage
\appendix

\section{Modifications on the original MLPACK library}

To reduce the maintenance burden, we try to minimize modifications to MLPACK. However, some changes had to be made for the integration and for passing \texttt{R CMD check}. There are no changes to the methods; all modifications are in utility classes and are applied by a script.

\subsection*{\texttt{Log} class}

The \texttt{Log} class provides four levels of log information. Its logging functions are replaced with \texttt{Rcpp::Rcout} to redirect the output stream to the R session; see the sketch at the end of this appendix.

\subsection*{\texttt{cli} class}

The \texttt{cli} class is used to parse command-line arguments. It relies on \texttt{boost::program\_options}, which requires additional linking. Since all arguments are passed from R, this class is removed.

\subsection*{\texttt{SaveRestoreUtility} class}

The \texttt{SaveRestoreUtility} class is used to store and restore MLPACK models from XML files. This class has been removed to avoid additional linking against libxml2.

\subsection*{\texttt{Timer} class}

In many methods, the \texttt{Timer} class is used for timing and for printing information to the log. Calls to this class have been commented out.
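To make the \texttt{Log} substitution above concrete, here is a minimal, purely illustrative sketch (the helper name \texttt{logClusters} is hypothetical; the actual replacement is performed mechanically by the script over the MLPACK sources):

\begin{lstlisting}[caption={Redirecting log output to the R console (sketch)}, language=C++]
#include <Rcpp.h>

// Where upstream MLPACK would write
//   Log::Info << "computed " << n << " clusters." << std::endl;
// the modified sources write to the R console via Rcpp::Rcout, so the
// message appears in (and can be captured by) the running R session.
// [[Rcpp::export]]
void logClusters(int n) {
    Rcpp::Rcout << "computed " << n << " clusters." << std::endl;
}
\end{lstlisting}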
\end{document}
RcppMLPACK/inst/doc/RcppMLPACK-intro.pdf0000644000176200001440000041770513647514343017230 0ustar liggesusers[binary PDF vignette stream omitted]
RcppMLPACK/inst/include/0000755000176200001440000000000013647512751014461 5ustar liggesusers
RcppMLPACK/inst/include/mlpack/0000755000176200001440000000000013647512751015730 5ustar liggesusers
RcppMLPACK/inst/include/mlpack/methods/0000755000176200001440000000000013647512751017373 5ustar liggesusers
RcppMLPACK/inst/include/mlpack/methods/nystroem_method/0000755000176200001440000000000013647512751022613 5ustar liggesusers
RcppMLPACK/inst/include/mlpack/methods/nystroem_method/ordered_selection.hpp0000644000176200001440000000321313647512751027014 0ustar liggesusers/**
 * @file ordered_selection.hpp
 * @author Ryan Curtin
 *
 * Select the first points of the dataset for use in the Nystroem method of
 * kernel matrix approximation. This is mostly for testing, but might have
 * other uses.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_NYSTROEM_METHOD_ORDERED_SELECTION_HPP
#define __MLPACK_METHODS_NYSTROEM_METHOD_ORDERED_SELECTION_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace kernel {

class OrderedSelection
{
 public:
  /**
   * Select the specified number of points in the dataset.
   *
   * @param data Dataset to sample from.
   * @param m Number of points to select.
   * @return Indices of selected points from the dataset.
   */
  const static arma::Col<size_t> Select(const arma::mat& /* unused */,
                                        const size_t m)
  {
    // This generates [0 1 2 3 ... (m - 1)].
    return arma::linspace<arma::Col<size_t> >(0, m - 1, m);
  }
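  // For example, Select(data, 3) always returns the index vector [0 1 2],
  // independent of the data; this determinism is what makes the policy
  // useful for testing.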
};

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/methods/nystroem_method/nystroem_method.hpp0000644000176200001440000000574013647512751026552 0ustar liggesusers/**
 * @file nystroem_method.hpp
 * @author Ryan Curtin
 * @author Marcus Edel
 *
 * Implementation of the Nystroem method for approximating a kernel matrix.
 * There are many variations on how to do this, so template parameters allow
 * the selection of many different techniques.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_NYSTROEM_METHOD_NYSTROEM_METHOD_HPP
#define __MLPACK_METHODS_NYSTROEM_METHOD_NYSTROEM_METHOD_HPP

#include <mlpack/core.hpp>
#include "kmeans_selection.hpp"

namespace mlpack {
namespace kernel {

template<
    typename KernelType,
    typename PointSelectionPolicy = KMeansSelection<>
>
class NystroemMethod
{
 public:
  /**
   * Create the NystroemMethod object. The constructor here does not really do
   * anything.
   *
   * @param data Data matrix.
   * @param kernel Kernel to be used for computation.
   * @param rank Rank to be used for matrix approximation.
   */
  NystroemMethod(const arma::mat& data, KernelType& kernel, const size_t rank);

  /**
   * Apply the low-rank factorization to obtain an output matrix G such that
   * K' = G * G^T.
   *
   * @param output Matrix to store kernel approximation into.
   */
  void Apply(arma::mat& output);

  /**
   * Construct the kernel matrix with the matrix that contains the selected
   * points.
   *
   * @param data Data matrix pointer.
   * @param miniKernel Matrix to store the constructed mini-kernel in.
   * @param semiKernel Matrix to store the constructed semi-kernel in.
   */
  void GetKernelMatrix(const arma::mat* data,
                       arma::mat& miniKernel,
                       arma::mat& semiKernel);

  /**
   * Construct the kernel matrix with the selected points.
   *
   * @param selectedPoints Indices of selected points.
   * @param miniKernel Matrix to store the constructed mini-kernel in.
   * @param semiKernel Matrix to store the constructed semi-kernel in.
   */
  void GetKernelMatrix(const arma::Col<size_t>& selectedPoints,
                       arma::mat& miniKernel,
                       arma::mat& semiKernel);

 private:
  //! The reference dataset.
  const arma::mat& data;
  //! The locally stored kernel, if it is necessary.
  KernelType& kernel;
  //! Rank used for matrix approximation.
  const size_t rank;
};
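// A minimal usage sketch (illustrative only; `data` stands for an assumed
// dataset matrix with one column per point, and the kernel bandwidth and
// rank are arbitrary choices for the example):
//
//   GaussianKernel gk(2.0);
//   NystroemMethod<GaussianKernel> nm(data, gk, 10);
//   arma::mat g;
//   nm.Apply(g); // The kernel matrix is now approximated by g * g.t().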
}; // namespace kernel
}; // namespace mlpack

// Include implementation.
#include "nystroem_method_impl.hpp"

#endif
RcppMLPACK/inst/include/mlpack/methods/nystroem_method/nystroem_method_impl.hpp0000644000176200001440000000703413647512751027571 0ustar liggesusers/**
 * @file nystroem_method_impl.hpp
 * @author Ryan Curtin
 * @author Marcus Edel
 *
 * Implementation of the Nystroem method for approximating a kernel matrix.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_NYSTROEM_METHOD_NYSTROEM_METHOD_IMPL_HPP
#define __MLPACK_METHODS_NYSTROEM_METHOD_NYSTROEM_METHOD_IMPL_HPP

// In case it hasn't been included yet.
#include "nystroem_method.hpp"

namespace mlpack {
namespace kernel {

template<typename KernelType, typename PointSelectionPolicy>
NystroemMethod<KernelType, PointSelectionPolicy>::NystroemMethod(
    const arma::mat& data,
    KernelType& kernel,
    const size_t rank) :
    data(data),
    kernel(kernel),
    rank(rank)
{ }

template<typename KernelType, typename PointSelectionPolicy>
void NystroemMethod<KernelType, PointSelectionPolicy>::GetKernelMatrix(
    const arma::mat* selectedData,
    arma::mat& miniKernel,
    arma::mat& semiKernel)
{
  // Assemble mini-kernel matrix.
  for (size_t i = 0; i < rank; ++i)
    for (size_t j = 0; j < rank; ++j)
      miniKernel(i, j) = kernel.Evaluate(selectedData->col(i),
                                         selectedData->col(j));

  // Construct semi-kernel matrix with interactions between selected data and
  // all points.
  for (size_t i = 0; i < data.n_cols; ++i)
    for (size_t j = 0; j < rank; ++j)
      semiKernel(i, j) = kernel.Evaluate(data.col(i),
                                         selectedData->col(j));

  // Clean the memory.
  delete selectedData;
}

template<typename KernelType, typename PointSelectionPolicy>
void NystroemMethod<KernelType, PointSelectionPolicy>::GetKernelMatrix(
    const arma::Col<size_t>& selectedPoints,
    arma::mat& miniKernel,
    arma::mat& semiKernel)
{
  // Assemble mini-kernel matrix.
  for (size_t i = 0; i < rank; ++i)
    for (size_t j = 0; j < rank; ++j)
      miniKernel(i, j) = kernel.Evaluate(data.col(selectedPoints(i)),
                                         data.col(selectedPoints(j)));

  // Construct semi-kernel matrix with interactions between selected points
  // and all points.
  for (size_t i = 0; i < data.n_cols; ++i)
    for (size_t j = 0; j < rank; ++j)
      semiKernel(i, j) = kernel.Evaluate(data.col(i),
                                         data.col(selectedPoints(j)));
}
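// A sketch of why Apply() below yields a valid low-rank factorization
// (assuming, as usual, that miniKernel is symmetric positive semi-definite):
// write C = semiKernel and W = miniKernel with SVD W = U * diag(s) * V^T.
// Then the output G = C * U * diag(1 / sqrt(s)) * V satisfies
//   G * G^T = C * U * diag(1 / s) * U^T * C^T = C * pinv(W) * C^T,
// which is the standard rank-`rank` Nystroem approximation of the full
// kernel matrix.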
template<typename KernelType, typename PointSelectionPolicy>
void NystroemMethod<KernelType, PointSelectionPolicy>::Apply(arma::mat& output)
{
  arma::mat miniKernel(rank, rank);
  arma::mat semiKernel(data.n_cols, rank);

  GetKernelMatrix(PointSelectionPolicy::Select(data, rank), miniKernel,
      semiKernel);

  // Singular value decomposition of the mini-kernel matrix.
  arma::mat U, V;
  arma::vec s;
  arma::svd(U, s, V, miniKernel);

  // Construct the output matrix.
  arma::mat normalization = arma::diagmat(1.0 / sqrt(s));
  output = semiKernel * U * normalization * V;
}

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/methods/nystroem_method/random_selection.hpp0000644000176200001440000000323213647512751026651 0ustar liggesusers/**
 * @file random_selection.hpp
 * @author Ryan Curtin
 *
 * Randomly select some points (with replacement) to use for the Nystroem
 * method. Replacement is suboptimal, but when rank << number of points,
 * duplicate selections are unlikely.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_NYSTROEM_METHOD_RANDOM_SELECTION_HPP
#define __MLPACK_METHODS_NYSTROEM_METHOD_RANDOM_SELECTION_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace kernel {

class RandomSelection
{
 public:
  /**
   * Randomly select the specified number of points in the dataset.
   *
   * @param data Dataset to sample from.
   * @param m Number of points to select.
   * @return Indices of selected points from the dataset.
   */
  const static arma::Col<size_t> Select(const arma::mat& data, const size_t m)
  {
    arma::Col<size_t> selectedPoints(m);
    for (size_t i = 0; i < m; ++i)
      selectedPoints(i) = math::RandInt(0, data.n_cols);

    return selectedPoints;
  }
};

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/methods/nystroem_method/kmeans_selection.hpp0000644000176200001440000000372213647512751026653 0ustar liggesusers/**
 * @file kmeans_selection.hpp
 * @author Marcus Edel
 *
 * Use the centroids of the K-Means clustering method for use in the Nystroem
 * method of kernel matrix approximation.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_NYSTROEM_METHOD_KMEANS_SELECTION_HPP
#define __MLPACK_METHODS_NYSTROEM_METHOD_KMEANS_SELECTION_HPP

#include <mlpack/core.hpp>
#include <mlpack/methods/kmeans/kmeans.hpp>

namespace mlpack {
namespace kernel {

template<typename ClusteringType = kmeans::KMeans<> >
class KMeansSelection
{
 public:
  /**
   * Use the K-Means clustering method to select the specified number of points
   * in the dataset. You are responsible for deleting the returned matrix!
   *
   * @param data Dataset to sample from.
   * @param m Number of points to select.
   * @return Matrix pointer in which centroids are stored.
   */
  const static arma::mat* Select(const arma::mat& data,
                                 const size_t m,
                                 const size_t maxIterations = 5)
  {
    arma::Col<size_t> assignments;
    arma::mat* centroids = new arma::mat;

    // Perform the K-Means clustering method.
    ClusteringType kmeans(maxIterations);
    kmeans.Cluster(data, m, assignments, *centroids);

    return centroids;
  }
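  // Call-pattern sketch (variable names are hypothetical): Select()
  // allocates the centroid matrix with `new`, so a caller using it directly
  // owns the result:
  //
  //   const arma::mat* centroids = KMeansSelection<>::Select(data, 10);
  //   // ... use *centroids ...
  //   delete centroids;
  //
  // (NystroemMethod's pointer overload of GetKernelMatrix() performs this
  // delete itself after building the kernel matrices.)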
};

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/methods/amf/0000755000176200001440000000000013647512751020136 5ustar liggesusers
RcppMLPACK/inst/include/mlpack/methods/amf/amf_impl.hpp0000644000176200001440000000501313647512751022432 0ustar liggesusers/**
 * @file amf_impl.hpp
 * @author Sumedh Ghaisas
 * @author Mohan Rajendran
 * @author Ryan Curtin
 *
 * Implementation of AMF class.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
namespace mlpack {
namespace amf {

/**
 * Construct the AMF object.
 */
template<typename TerminationPolicyType,
         typename InitializationRuleType,
         typename UpdateRuleType>
AMF<TerminationPolicyType, InitializationRuleType, UpdateRuleType>::AMF(
    const TerminationPolicyType& terminationPolicy,
    const InitializationRuleType& initializationRule,
    const UpdateRuleType& update) :
    terminationPolicy(terminationPolicy),
    initializationRule(initializationRule),
    update(update)
{ }

/**
 * Apply Alternating Matrix Factorization to the provided matrix.
 *
 * @param V Input matrix to be factorized.
 * @param r Rank r of the factorization.
 * @param W Basis matrix to be output.
 * @param H Encoding matrix to output.
 */
template<typename TerminationPolicyType,
         typename InitializationRuleType,
         typename UpdateRuleType>
template<typename MatType>
double AMF<TerminationPolicyType, InitializationRuleType, UpdateRuleType>::
Apply(const MatType& V,
      const size_t r,
      arma::mat& W,
      arma::mat& H)
{
  // Initialize W and H.
  initializationRule.Initialize(V, r, W, H);

  Rcpp::Rcout << "Initialized W and H." << std::endl;

  update.Initialize(V, r);
  terminationPolicy.Initialize(V);

  while (!terminationPolicy.IsConverged(W, H))
  {
    // Update the values of W and H based on the update rules provided.
    update.WUpdate(V, W, H);
    update.HUpdate(V, W, H);
  }

  const double residue = terminationPolicy.Index();
  const size_t iteration = terminationPolicy.Iteration();

  Rcpp::Rcout << "AMF converged to residue of " << residue << " in "
      << iteration << " iterations." << std::endl;

  return residue;
}

}; // namespace amf
}; // namespace mlpack
RcppMLPACK/inst/include/mlpack/methods/amf/update_rules/0000755000176200001440000000000013647512751022632 5ustar liggesusers
RcppMLPACK/inst/include/mlpack/methods/amf/update_rules/svd_complete_incremental_learning.hpp0000644000176200001440000001370413647512751032274 0ustar liggesusers/**
 * @file svd_complete_incremental_learning.hpp
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef SVD_COMPLETE_INCREMENTAL_LEARNING_HPP_INCLUDED
#define SVD_COMPLETE_INCREMENTAL_LEARNING_HPP_INCLUDED

#include <mlpack/core.hpp>

namespace mlpack {
namespace amf {

template<typename MatType>
class SVDCompleteIncrementalLearning
{
 public:
  SVDCompleteIncrementalLearning(double u = 0.0001,
                                 double kw = 0,
                                 double kh = 0)
      : u(u), kw(kw), kh(kh)
  { }

  void Initialize(const MatType& dataset, const size_t rank)
  {
    (void)rank;

    // Store matrix dimensions: n rows (items) by m columns (users).
    n = dataset.n_rows;
    m = dataset.n_cols;

    currentUserIndex = 0;
    currentItemIndex = 0;
  }

  /**
   * The update rule for the basis matrix W.
   * The function takes in all the matrices and only changes the
   * value of the W matrix.
   *
   * @param V Input matrix to be factorized.
   * @param W Basis matrix to be updated.
   * @param H Encoding matrix.
   */
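  // Gradient sketch for the update below (notation added here for clarity;
  // it is not part of the original comments): for one observed entry
  // V(i, j) = val, the regularized squared error
  //   e = 0.5 * (val - W.row(i) * H.col(j))^2 + 0.5 * kw * ||W.row(i)||^2
  // has the negative gradient with respect to W.row(i)
  //   (val - W.row(i) * H.col(j)) * H.col(j)^T - kw * W.row(i),
  // which is exactly deltaW; u is the step size of the stochastic gradient
  // step taken on that single entry.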
*/ inline void WUpdate(const MatType& V, arma::mat& W, const arma::mat& H) { arma::mat deltaW(1, W.n_cols); deltaW.zeros(); while(true) { double val; if((val = V(currentItemIndex, currentUserIndex)) != 0) { deltaW += (val - arma::dot(W.row(currentItemIndex), H.col(currentUserIndex))) * arma::trans(H.col(currentUserIndex)); if(kw != 0) deltaW -= kw * W.row(currentItemIndex); break; } currentUserIndex = currentUserIndex + 1; if(currentUserIndex == n) { currentUserIndex = 0; currentItemIndex = (currentItemIndex + 1) % m; } } W.row(currentItemIndex) += u*deltaW; } /** * The update rule for the encoding matrix H. * The function takes in all the matrices and only changes the * value of the H matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix. * @param H Encoding matrix to be updated. */ inline void HUpdate(const MatType& V, const arma::mat& W, arma::mat& H) { arma::mat deltaH(H.n_rows, 1); deltaH.zeros(); while(true) { double val; if((val = V(currentItemIndex, currentUserIndex)) != 0) deltaH += (val - arma::dot(W.row(currentItemIndex), H.col(currentUserIndex))) * arma::trans(W.row(currentItemIndex)); if(kh != 0) deltaH -= kh * H.col(currentUserIndex); currentUserIndex = currentUserIndex + 1; if(currentUserIndex == n) { currentUserIndex = 0; currentItemIndex = (currentItemIndex + 1) % m; } } H.col(currentUserIndex++) += u * deltaH; } private: double u; double kw; double kh; size_t n; size_t m; size_t currentUserIndex; size_t currentItemIndex; }; template<> class SVDCompleteIncrementalLearning { public: SVDCompleteIncrementalLearning(double u = 0.01, double kw = 0, double kh = 0) : u(u), kw(kw), kh(kh), it(NULL) {} ~SVDCompleteIncrementalLearning() { delete it; } void Initialize(const arma::sp_mat& dataset, const size_t rank) { (void)rank; n = dataset.n_rows; m = dataset.n_cols; it = new arma::sp_mat::const_iterator(dataset.begin()); isStart = true; } /** * The update rule for the basis matrix W. * The function takes in all the matrices and only changes the * value of the W matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix to be updated. * @param H Encoding matrix. */ inline void WUpdate(const arma::sp_mat& V, arma::mat& W, const arma::mat& H) { if(!isStart) (*it)++; else isStart = false; if(*it == V.end()) { delete it; it = new arma::sp_mat::const_iterator(V.begin()); } size_t currentUserIndex = it->col(); size_t currentItemIndex = it->row(); arma::mat deltaW(1, W.n_cols); deltaW.zeros(); deltaW += (**it - arma::dot(W.row(currentItemIndex), H.col(currentUserIndex))) * arma::trans(H.col(currentUserIndex)); if(kw != 0) deltaW -= kw * W.row(currentItemIndex); W.row(currentItemIndex) += u*deltaW; } /** * The update rule for the encoding matrix H. * The function takes in all the matrices and only changes the * value of the H matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix. * @param H Encoding matrix to be updated. 
*/ inline void HUpdate(const arma::sp_mat& V, const arma::mat& W, arma::mat& H) { (void)V; arma::mat deltaH(H.n_rows, 1); deltaH.zeros(); size_t currentUserIndex = it->col(); size_t currentItemIndex = it->row(); deltaH += (**it - arma::dot(W.row(currentItemIndex), H.col(currentUserIndex))) * arma::trans(W.row(currentItemIndex)); if(kh != 0) deltaH -= kh * H.col(currentUserIndex); H.col(currentUserIndex++) += u * deltaH; } private: double u; double kw; double kh; size_t n; size_t m; arma::sp_mat dummy; arma::sp_mat::const_iterator* it; bool isStart; }; } } #endif // SVD_COMPLETE_INCREMENTAL_LEARNING_HPP_INCLUDED RcppMLPACK/inst/include/mlpack/methods/amf/update_rules/nmf_mult_dist.hpp0000644000176200001440000000571313647512751026215 0ustar liggesusers/** * @file nmf_mult_dist.hpp * @author Mohan Rajendran * * Update rules for the Non-negative Matrix Factorization. This follows a method * described in the paper 'Algorithms for Non-negative Matrix Factorization' * by D. D. Lee and H. S. Seung. This is a multiplicative rule that ensures * that the Frobenius norm \f$ \sqrt{\sum_i \sum_j(V-WH)^2} \f$ is * non-increasing between subsequent iterations. Both of the update rules * for W and H are defined in this file. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_MULT_DIST_UPDATE_RULES_HPP #define __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_MULT_DIST_UPDATE_RULES_HPP #include namespace mlpack { namespace amf { /** * The multiplicative distance update rules for matrices W and H. */ class NMFMultiplicativeDistanceUpdate { public: // Empty constructor required for the UpdateRule template. NMFMultiplicativeDistanceUpdate() { } template void Initialize(const MatType& dataset, const size_t rank) { (void)dataset; (void)rank; } /** * The update rule for the basis matrix W. The formula used is * \f[ * W_{ia} \leftarrow W_{ia} \frac{(VH^T)_{ia}}{(WHH^T)_{ia}} * \f] * The function takes in all the matrices and only changes the * value of the W matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix to be updated. * @param H Encoding matrix. */ template inline static void WUpdate(const MatType& V, arma::mat& W, const arma::mat& H) { W = (W % (V * H.t())) / (W * H * H.t()); } /** * The update rule for the encoding matrix H. The formula used is * \f[ * H_{a\mu} \leftarrow H_{a\mu} \frac{(W^T V)_{a\mu}}{(W^T WH)_{a\mu}} * \f] * The function takes in all the matrices and only changes the * value of the H matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix. * @param H Encoding matrix to be updated. 
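   *
   * Since this class is the default UpdateRuleType of AMF<>, plain NMF can be
   * run with the defaults.  A hedged sketch, added for illustration (V and
   * the rank of 10 are assumptions):
   *
   * @code
   * arma::mat V;     // Non-negative input matrix.
   * arma::mat W, H;
   * amf::AMF<> nmf;  // NMF with the multiplicative distance update.
   * nmf.Apply(V, 10, W, H);
   * @endcode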
*/ template inline static void HUpdate(const MatType& V, const arma::mat& W, arma::mat& H) { H = (H % (W.t() * V)) / (W.t() * W * H); } }; }; // namespace amf }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/amf/update_rules/svd_batch_learning.hpp0000644000176200001440000001177013647512751027165 0ustar liggesusers/** * @file simple_residue_termination.hpp * @author Sumedh Ghaisas * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_AMF_UPDATE_RULES_SVD_BATCHLEARNING_HPP #define __MLPACK_METHODS_AMF_UPDATE_RULES_SVD_BATCHLEARNING_HPP #include namespace mlpack { namespace amf { class SVDBatchLearning { public: SVDBatchLearning(double u = 0.0002, double kw = 0, double kh = 0, double momentum = 0.9, double min = -DBL_MIN, double max = DBL_MAX) : u(u), kw(kw), kh(kh), min(min), max(max), momentum(momentum) {} template void Initialize(const MatType& dataset, const size_t rank) { const size_t n = dataset.n_rows; const size_t m = dataset.n_cols; mW.zeros(n, rank); mH.zeros(rank, m); } /** * The update rule for the basis matrix W. * The function takes in all the matrices and only changes the * value of the W matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix to be updated. * @param H Encoding matrix. */ template inline void WUpdate(const MatType& V, arma::mat& W, const arma::mat& H) { size_t n = V.n_rows; size_t m = V.n_cols; size_t r = W.n_cols; mW = momentum * mW; arma::mat deltaW(n, r); deltaW.zeros(); for(size_t i = 0;i < n;i++) { for(size_t j = 0;j < m;j++) { double val; if((val = V(i, j)) != 0) deltaW.row(i) += (val - arma::dot(W.row(i), H.col(j))) * arma::trans(H.col(j)); } if(kw != 0) deltaW.row(i) -= kw * W.row(i); } mW += u * deltaW; W += mW; } /** * The update rule for the encoding matrix H. * The function takes in all the matrices and only changes the * value of the H matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix. * @param H Encoding matrix to be updated. 
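   *
   * A hedged usage sketch, added for illustration (the matrix, rank, and
   * parameter values are assumptions, not from the original sources):
   *
   * @code
   * arma::mat V;  // Input matrix to factorize.
   * arma::mat W, H;
   * amf::AMF<amf::SimpleResidueTermination,
   *          amf::RandomInitialization,
   *          amf::SVDBatchLearning>
   *     svd(amf::SimpleResidueTermination(),
   *         amf::RandomInitialization(),
   *         amf::SVDBatchLearning(0.0002, 0, 0, 0.9));  // u, kw, kh, momentum.
   * svd.Apply(V, 5, W, H);
   * @endcode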
*/ template inline void HUpdate(const MatType& V, const arma::mat& W, arma::mat& H) { size_t n = V.n_rows; size_t m = V.n_cols; size_t r = W.n_cols; mH = momentum * mH; arma::mat deltaH(r, m); deltaH.zeros(); for(size_t j = 0;j < m;j++) { for(size_t i = 0;i < n;i++) { double val; if((val = V(i, j)) != 0) deltaH.col(j) += (val - arma::dot(W.row(i), H.col(j))) * arma::trans(W.row(i)); } if(kh != 0) deltaH.col(j) -= kh * H.col(j); } mH += u*deltaH; H += mH; } private: double u; double kw; double kh; double min; double max; double momentum; arma::mat mW; arma::mat mH; }; template<> inline void SVDBatchLearning::WUpdate(const arma::sp_mat& V, arma::mat& W, const arma::mat& H) { size_t n = V.n_rows; size_t r = W.n_cols; mW = momentum * mW; arma::mat deltaW(n, r); deltaW.zeros(); for(arma::sp_mat::const_iterator it = V.begin();it != V.end();it++) { size_t row = it.row(); size_t col = it.col(); deltaW.row(it.row()) += (*it - arma::dot(W.row(row), H.col(col))) * arma::trans(H.col(col)); } if(kw != 0) for(size_t i = 0; i < n; i++) { deltaW.row(i) -= kw * W.row(i); } mW += u * deltaW; W += mW; } template<> inline void SVDBatchLearning::HUpdate(const arma::sp_mat& V, const arma::mat& W, arma::mat& H) { size_t m = V.n_cols; size_t r = W.n_cols; mH = momentum * mH; arma::mat deltaH(r, m); deltaH.zeros(); for(arma::sp_mat::const_iterator it = V.begin();it != V.end();it++) { size_t row = it.row(); size_t col = it.col(); deltaH.col(col) += (*it - arma::dot(W.row(row), H.col(col))) * arma::trans(W.row(row)); } if(kh != 0) for(size_t j = 0; j < m; j++) { deltaH.col(j) -= kh * H.col(j); } mH += u*deltaH; H += mH; } } // namespace amf } // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/amf/update_rules/nmf_als.hpp0000644000176200001440000000644013647512751024766 0ustar liggesusers/** * @file nmf_als.hpp * @author Mohan Rajendran * * Update rules for the Non-negative Matrix Factorization. This follows a method * titled 'Alternating Least Squares' described in the paper 'Positive Matrix * Factorization: A Non-negative Factor Model with Optimal Utilization of * Error Estimates of Data Values' by P. Paatero and U. Tapper. It uses least * squares projection formula to reduce the error value of * \f$ \sqrt{\sum_i \sum_j(V-WH)^2} \f$ by alternately calculating W and H * respectively while holding the other matrix constant. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_ALS_HPP #define __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_ALS_HPP #include namespace mlpack { namespace amf { /** * The alternating least square update rules of matrices W and H. */ class NMFALSUpdate { public: // Empty constructor required for the UpdateRule template. NMFALSUpdate() { } template void Initialize(const MatType& dataset, const size_t rank) { (void)dataset; (void)rank; } /** * The update rule for the basis matrix W. 
The formula used is
   * \f[
   * W^T = \frac{HV^T}{HH^T}
   * \f]
   * The function takes in all the matrices and only changes the
   * value of the W matrix.
   *
   * @param V Input matrix to be factorized.
   * @param W Basis matrix to be updated.
   * @param H Encoding matrix.
   */
  template<typename MatType>
  inline static void WUpdate(const MatType& V,
                             arma::mat& W,
                             const arma::mat& H)
  {
    // The call to inv() sometimes fails; so we are using the pseudoinverse.
    // W = (inv(H * H.t()) * H * V.t()).t();
    W = V * H.t() * pinv(H * H.t());

    // Set all negative entries to zero, to keep the factorization
    // non-negative.
    for (size_t i = 0; i < W.n_elem; i++)
    {
      if (W(i) < 0.0)
      {
        W(i) = 0.0;
      }
    }
  }

  /**
   * The update rule for the encoding matrix H.  The formula used is
   * \f[
   * H = \frac{W^TV}{W^TW}
   * \f]
   * The function takes in all the matrices and only changes the
   * value of the H matrix.
   *
   * @param V Input matrix to be factorized.
   * @param W Basis matrix.
   * @param H Encoding matrix to be updated.
   */
  template<typename MatType>
  inline static void HUpdate(const MatType& V,
                             const arma::mat& W,
                             arma::mat& H)
  {
    H = pinv(W.t() * W) * W.t() * V;

    // Set all negative entries to zero.
    for (size_t i = 0; i < H.n_elem; i++)
    {
      if (H(i) < 0.0)
      {
        H(i) = 0.0;
      }
    }
  }
};

}; // namespace amf
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/methods/amf/update_rules/nmf_mult_div.hpp0000644000176200001440000001056213647512751026032 0ustar liggesusers/**
 * @file nmf_mult_div.hpp
 * @author Mohan Rajendran
 *
 * Update rules for the Non-negative Matrix Factorization.  This follows a
 * method described in the paper 'Algorithms for Non-negative Matrix
 * Factorization' by D. D. Lee and H. S. Seung.  This is a multiplicative rule
 * that ensures that the Kullback–Leibler divergence
 * \f$ \sum_i \sum_j (V_{ij} \log\frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij}) \f$
 * is non-increasing between subsequent iterations.  Both of the update rules
 * for W and H are defined in this file.
 *
 * This set of update rules is not meant to work with sparse matrices.  Using
 * sparse matrices often causes NaNs in the output, so other choices of update
 * rules are better in that situation.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_MULT_DIV_HPP
#define __MLPACK_METHODS_LMF_UPDATE_RULES_NMF_MULT_DIV_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace amf {

class NMFMultiplicativeDivergenceUpdate
{
 public:
  // Empty constructor required for the WUpdateRule template.
  NMFMultiplicativeDivergenceUpdate() { }

  template<typename MatType>
  void Initialize(const MatType& dataset, const size_t rank)
  {
    (void)dataset;
    (void)rank;
  }

  /**
   * The update rule for the basis matrix W.  The formula used is
   * \f[
   * W_{ia} \leftarrow W_{ia} \frac{\sum_{\mu} H_{a\mu} V_{i\mu}/(WH)_{i\mu}}
   * {\sum_{\nu} H_{a\nu}}
   * \f]
   * The function takes in all the matrices and only changes the
   * value of the W matrix.
   *
   * @param V Input matrix to be factorized.
   * @param W Basis matrix to be updated.
   * @param H Encoding matrix.
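   *
   * A hedged usage sketch, added for illustration (V and the rank are
   * assumptions).  Because each update divides by entries of (WH), this rule
   * should only be used with dense, strictly non-negative data:
   *
   * @code
   * arma::mat V;  // Dense non-negative input matrix.
   * arma::mat W, H;
   * amf::AMF<amf::SimpleResidueTermination,
   *          amf::RandomInitialization,
   *          amf::NMFMultiplicativeDivergenceUpdate> nmf;
   * nmf.Apply(V, 10, W, H);
   * @endcode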
*/ template inline static void WUpdate(const MatType& V, arma::mat& W, const arma::mat& H) { // Simple implementation left in the header file. arma::mat t1; arma::rowvec t2; t1 = W * H; for (size_t i = 0; i < W.n_rows; ++i) { for (size_t j = 0; j < W.n_cols; ++j) { // Writing this as a single expression does not work as of Armadillo // 3.920. This should be fixed in a future release, and then the code // below can be fixed. //t2 = H.row(j) % V.row(i) / t1.row(i); t2.set_size(H.n_cols); for (size_t k = 0; k < t2.n_elem; ++k) { t2(k) = H(j, k) * V(i, k) / t1(i, k); } W(i, j) = W(i, j) * sum(t2) / sum(H.row(j)); } } } /** * The update rule for the encoding matrix H. The formula used is * \f[ * H_{a\mu} \leftarrow H_{a\mu} \frac{\sum_{i} W_{ia} V_{i\mu}/(WH)_{i\mu}} * {\sum_{k} H_{ka}} * \f] * The function takes in all the matrices and only changes the value * of the H matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix. * @param H Encoding matrix to updated. */ template inline static void HUpdate(const MatType& V, const arma::mat& W, arma::mat& H) { // Simple implementation left in the header file. arma::mat t1; arma::colvec t2; t1 = W * H; for (size_t i = 0; i < H.n_rows; i++) { for (size_t j = 0; j < H.n_cols; j++) { // Writing this as a single expression does not work as of Armadillo // 3.920. This should be fixed in a future release, and then the code // below can be fixed. //t2 = W.col(i) % V.col(j) / t1.col(j); t2.set_size(W.n_rows); for (size_t k = 0; k < t2.n_elem; ++k) { t2(k) = W(k, i) * V(k, j) / t1(k, j); } H(i,j) = H(i,j) * sum(t2) / sum(W.col(i)); } } } }; }; // namespace amf }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/amf/update_rules/svd_incomplete_incremental_learning.hpp0000644000176200001440000001131413647512751032616 0ustar liggesusers#ifndef SVD_INCREMENTAL_LEARNING_HPP_INCLUDED #define SVD_INCREMENTAL_LEARNING_HPP_INCLUDED namespace mlpack { namespace amf { class SVDIncompleteIncrementalLearning { public: SVDIncompleteIncrementalLearning(double u = 0.001, double kw = 0, double kh = 0) : u(u), kw(kw), kh(kh) {} template void Initialize(const MatType& dataset, const size_t rank) { (void)rank; n = dataset.n_rows; m = dataset.n_cols; currentUserIndex = 0; } /** * The update rule for the basis matrix W. * The function takes in all the matrices and only changes the * value of the W matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix to be updated. * @param H Encoding matrix. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
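   *
   * A hedged usage sketch, added for illustration (the matrix, rank, and
   * termination wrapper are assumptions).  Incomplete incremental learning
   * visits one user (column) per iteration, so it is usually paired with
   * IncompleteIncrementalTermination:
   *
   * @code
   * arma::sp_mat V;  // Sparse (items x users) matrix to factorize.
   * arma::mat W, H;
   * amf::AMF<amf::IncompleteIncrementalTermination<
   *              amf::SimpleResidueTermination>,
   *          amf::RandomInitialization,
   *          amf::SVDIncompleteIncrementalLearning> svd;
   * svd.Apply(V, 2, W, H);
   * @endcode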
*/ template inline void WUpdate(const MatType& V, arma::mat& W, const arma::mat& H) { arma::mat deltaW(n, W.n_cols); deltaW.zeros(); for(size_t i = 0;i < n;i++) { double val; if((val = V(i, currentUserIndex)) != 0) deltaW.row(i) += (val - arma::dot(W.row(i), H.col(currentUserIndex))) * arma::trans(H.col(currentUserIndex)); if(kw != 0) deltaW.row(i) -= kw * W.row(i); } W += u*deltaW; } /** * The update rule for the encoding matrix H. * The function takes in all the matrices and only changes the * value of the H matrix. * * @param V Input matrix to be factorized. * @param W Basis matrix. * @param H Encoding matrix to be updated. */ template inline void HUpdate(const MatType& V, const arma::mat& W, arma::mat& H) { arma::mat deltaH(H.n_rows, 1); deltaH.zeros(); for(size_t i = 0;i < n;i++) { double val; if((val = V(i, currentUserIndex)) != 0) deltaH += (val - arma::dot(W.row(i), H.col(currentUserIndex))) * arma::trans(W.row(i)); } if(kh != 0) deltaH -= kh * H.col(currentUserIndex); H.col(currentUserIndex++) += u * deltaH; currentUserIndex = currentUserIndex % m; } private: double u; double kw; double kh; size_t n; size_t m; size_t currentUserIndex; }; template<> inline void SVDIncompleteIncrementalLearning:: WUpdate(const arma::sp_mat& V, arma::mat& W, const arma::mat& H) { arma::mat deltaW(n, W.n_cols); deltaW.zeros(); for(arma::sp_mat::const_iterator it = V.begin_col(currentUserIndex); it != V.end_col(currentUserIndex);it++) { double val = *it; size_t i = it.row(); deltaW.row(i) += (val - arma::dot(W.row(i), H.col(currentUserIndex))) * arma::trans(H.col(currentUserIndex)); if(kw != 0) deltaW.row(i) -= kw * W.row(i); } W += u*deltaW; } template<> inline void SVDIncompleteIncrementalLearning:: HUpdate(const arma::sp_mat& V, const arma::mat& W, arma::mat& H) { arma::mat deltaH(H.n_rows, 1); deltaH.zeros(); for(arma::sp_mat::const_iterator it = V.begin_col(currentUserIndex); it != V.end_col(currentUserIndex);it++) { double val = *it; size_t i = it.row(); if((val = V(i, currentUserIndex)) != 0) deltaH += (val - arma::dot(W.row(i), H.col(currentUserIndex))) * arma::trans(W.row(i)); } if(kh != 0) deltaH -= kh * H.col(currentUserIndex); H.col(currentUserIndex++) += u * deltaH; currentUserIndex = currentUserIndex % m; } }; // namepsace amf }; // namespace mlpack #endif // SVD_INCREMENTAL_LEARNING_HPP_INCLUDED RcppMLPACK/inst/include/mlpack/methods/amf/init_rules/0000755000176200001440000000000013647512751022313 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/amf/init_rules/random_init.hpp0000644000176200001440000000322613647512751025332 0ustar liggesusers/** * @file random_init.hpp * @author Mohan Rajendran * * Intialization rule for Non-Negative Matrix Factorization (NMF). This simple * initialization is performed by assigning a random matrix to W and H. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
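 *
 * A hedged usage sketch, added for illustration (the matrix sizes and rank
 * are assumptions).  The rule can also be invoked directly, outside of AMF:
 *
 * @code
 * arma::mat V(100, 50, arma::fill::randu);
 * arma::mat W, H;
 * // Afterwards W is 100 x 10 and H is 10 x 50, filled uniformly at random.
 * amf::RandomInitialization::Initialize(V, 10, W, H);
 * @endcode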
*/ #ifndef __MLPACK_METHODS_LMF_RANDOM_INIT_HPP #define __MLPACK_METHODS_LMF_RANDOM_INIT_HPP #include namespace mlpack { namespace amf { class RandomInitialization { public: // Empty constructor required for the InitializeRule template RandomInitialization() { } template inline static void Initialize(const MatType& V, const size_t r, arma::mat& W, arma::mat& H) { // Simple implementation (left in the header file due to its simplicity). size_t n = V.n_rows; size_t m = V.n_cols; // Intialize to random values. W.randu(n, r); H.randu(r, m); } }; }; // namespace amf }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/amf/init_rules/random_acol_init.hpp0000644000176200001440000000515013647512751026326 0ustar liggesusers/** * @file random_acol_init.hpp * @author Mohan Rajendran * * Intialization rule for Non-Negative Matrix Factorization. This simple * initialization is performed by the random Acol initialization introduced in * the paper 'Algorithms, Initializations and Convergence' by Langville et al. * This method sets each of the columns of W by averaging p randomly chosen * columns of V. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LMF_RANDOM_ACOL_INIT_HPP #define __MLPACK_METHODS_LMF_RANDOM_ACOL_INIT_HPP #include namespace mlpack { namespace amf { /** * This class initializes the W matrix of the NMF algorithm by averaging p * randomly chosen columns of V. In this case, p is a template parameter. H is * then set randomly. * * @tparam The number of random columns to average for each column of W. */ template class RandomAcolInitialization { public: // Empty constructor required for the InitializeRule template RandomAcolInitialization() { } template inline static void Initialize(const MatType& V, const size_t r, arma::mat& W, arma::mat& H) { const size_t n = V.n_rows; const size_t m = V.n_cols; if (p > m) { Log::Warn << "Number of random columns is more than the number of columns" << "available in the V matrix; weird results may ensue!" << std::endl; } W.zeros(n, r); // Initialize W matrix with random columns. for (size_t col = 0; col < r; col++) { for (size_t randCol = 0; randCol < p; randCol++) { // .col() does not work in this case, as of Armadillo 3.920. W.unsafe_col(col) += V.col(math::RandInt(0, m)); } } // Now divide by p. W /= p; // Initialize H to random values. H.randu(r, m); } }; // Class RandomAcolInitialization }; // namespace amf }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/amf/termination_policies/0000755000176200001440000000000013647512751024356 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/amf/termination_policies/simple_tolerance_termination.hpp0000644000176200001440000000676013647512751033036 0ustar liggesusers/** * @file simple_tolerance_termination.hpp * @author Sumedh Ghaisas * * This file is part of MLPACK 1.0.10. 
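 *
 * In brief: this policy stops the factorization once the relative
 * improvement of the root-mean-squared residue over the nonzero entries of V
 * falls below a tolerance, with a reverse-step guard against oscillation.  A
 * hedged sketch, added for illustration (the tolerance and iteration cap are
 * assumptions):
 *
 * @code
 * amf::SimpleToleranceTermination<arma::mat> t(1e-6, 500);
 * amf::AMF<amf::SimpleToleranceTermination<arma::mat> > nmf(t);
 * @endcode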
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef _MLPACK_METHODS_AMF_SIMPLE_TOLERANCE_TERMINATION_HPP_INCLUDED #define _MLPACK_METHODS_AMF_SIMPLE_TOLERANCE_TERMINATION_HPP_INCLUDED #include namespace mlpack { namespace amf { template class SimpleToleranceTermination { public: SimpleToleranceTermination(const double tolerance = 1e-5, const size_t maxIterations = 10000, const size_t reverseStepTolerance = 3) : tolerance(tolerance), maxIterations(maxIterations), reverseStepTolerance(reverseStepTolerance) {} void Initialize(const MatType& V) { residueOld = DBL_MAX; iteration = 1; residue = DBL_MIN; reverseStepCount = 0; this->V = &V; c_index = 0; c_indexOld = 0; reverseStepCount = 0; } bool IsConverged(arma::mat& W, arma::mat& H) { // Calculate norm of WH after each iteration. arma::mat WH; WH = W * H; residueOld = residue; size_t n = V->n_rows; size_t m = V->n_cols; double sum = 0; size_t count = 0; for(size_t i = 0;i < n;i++) { for(size_t j = 0;j < m;j++) { double temp = 0; if((temp = (*V)(i,j)) != 0) { temp = (temp - WH(i, j)); temp = temp * temp; sum += temp; count++; } } } residue = sum / count; residue = sqrt(residue); iteration++; if((residueOld - residue) / residueOld < tolerance && iteration > 4) { if(reverseStepCount == 0 && isCopy == false) { isCopy = true; this->W = W; this->H = H; c_index = residue; c_indexOld = residueOld; } reverseStepCount++; } else { reverseStepCount = 0; if(residue <= c_indexOld && isCopy == true) { isCopy = false; } } if(reverseStepCount == reverseStepTolerance || iteration > maxIterations) { if(isCopy) { W = this->W; H = this->H; residue = c_index; } return true; } else return false; } const double& Index() { return residue; } const size_t& Iteration() { return iteration; } const size_t& MaxIterations() { return maxIterations; } private: double tolerance; size_t maxIterations; const MatType* V; size_t iteration; double residueOld; double residue; double normOld; size_t reverseStepTolerance; size_t reverseStepCount; bool isCopy; arma::mat W; arma::mat H; double c_indexOld; double c_index; }; // class SimpleToleranceTermination }; // namespace amf }; // namespace mlpack #endif // _MLPACK_METHODS_AMF_SIMPLE_TOLERANCE_TERMINATION_HPP_INCLUDED RcppMLPACK/inst/include/mlpack/methods/amf/termination_policies/simple_residue_termination.hpp0000644000176200001440000000434113647512751032513 0ustar liggesusers/** * @file simple_residue_termination.hpp * @author Sumedh Ghaisas * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. 
See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef _MLPACK_METHODS_AMF_SIMPLERESIDUETERMINATION_HPP_INCLUDED #define _MLPACK_METHODS_AMF_SIMPLERESIDUETERMINATION_HPP_INCLUDED #include namespace mlpack { namespace amf { class SimpleResidueTermination { public: SimpleResidueTermination(const double minResidue = 1e-10, const size_t maxIterations = 10000) : minResidue(minResidue), maxIterations(maxIterations) { } template void Initialize(const MatType& V) { residue = minResidue; iteration = 1; normOld = 0; const size_t n = V.n_rows; const size_t m = V.n_cols; nm = n * m; } bool IsConverged(arma::mat& W, arma::mat& H) { // Calculate norm of WH after each iteration. arma::mat WH; WH = W * H; double norm = sqrt(accu(WH % WH) / nm); if (iteration != 0) { residue = fabs(normOld - norm); residue /= normOld; } normOld = norm; iteration++; if(residue < minResidue || iteration > maxIterations) return true; else return false; } const double& Index() { return residue; } const size_t& Iteration() { return iteration; } const size_t& MaxIterations() { return maxIterations; } public: double minResidue; size_t maxIterations; double residue; size_t iteration; double normOld; size_t nm; }; // class SimpleResidueTermination }; // namespace amf }; // namespace mlpack #endif // _MLPACK_METHODS_AMF_SIMPLERESIDUETERMINATION_HPP_INCLUDED RcppMLPACK/inst/include/mlpack/methods/amf/termination_policies/incomplete_incremental.hpp0000644000176200001440000000346513647512751031617 0ustar liggesusers/** * @file incomplete_incremental_termination.hpp * @author Sumedh Ghaisas * * This file is part of MLPACK 1.0.9. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef _INCOMPLETE_INCREMENTAL_TERMINATION_HPP_INCLUDED #define _INCOMPLETE_INCREMENTAL_TERMINATION_HPP_INCLUDED #include namespace mlpack { namespace amf { template class IncompleteIncrementalTermination { public: IncompleteIncrementalTermination(TerminationPolicy t_policy = TerminationPolicy()) : t_policy(t_policy) {} template void Initialize(const MatType& V) { t_policy.Initialize(V); incrementalIndex = V.n_rows; iteration = 0; } bool IsConverged(arma::mat& W, arma::mat& H) { iteration++; if(iteration % incrementalIndex == 0) return t_policy.IsConverged(W, H); else return false; } const double& Index() { return t_policy.Index(); } const size_t& Iteration() { return iteration; } const size_t& MaxIterations() { return t_policy.MaxIterations(); } private: TerminationPolicy t_policy; size_t incrementalIndex; size_t iteration; }; }; // namespace amf }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/amf/termination_policies/validation_RMSE_termination.hpp0000644000176200001440000000757713647512751032500 0ustar liggesusers/** * @file validation_RMSE_termination.hpp * @author Sumedh Ghaisas * * Termination policy that checks validation RMSE. 
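 *
 * In brief: the constructor removes num_test_points randomly chosen nonzero
 * entries from V to form a held-out set; IsConverged() then tracks the RMSE
 * of W * H on that set and stops once it stops improving.  A hedged sketch,
 * added for illustration (the matrix and the count of 100 held-out entries
 * are assumptions):
 *
 * @code
 * arma::mat V;  // Note: the chosen test entries are zeroed out in V.
 * amf::ValidationRMSETermination<arma::mat> t(V, 100);
 * amf::AMF<amf::ValidationRMSETermination<arma::mat> > svd(t);
 * @endcode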
* * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef VALIDATION_RMSE_TERMINATION_HPP_INCLUDED #define VALIDATION_RMSE_TERMINATION_HPP_INCLUDED #include namespace mlpack { namespace amf { template class ValidationRMSETermination { public: ValidationRMSETermination(MatType& V, size_t num_test_points, double tolerance = 1e-5, size_t maxIterations = 10000, size_t reverseStepTolerance = 3) : tolerance(tolerance), maxIterations(maxIterations), num_test_points(num_test_points), reverseStepTolerance(reverseStepTolerance) { size_t n = V.n_rows; size_t m = V.n_cols; test_points.zeros(num_test_points, 3); for(size_t i = 0; i < num_test_points; i++) { double t_val; size_t t_row; size_t t_col; do { t_row = rand() % n; t_col = rand() % m; } while((t_val = V(t_row, t_col)) == 0); test_points(i, 0) = t_row; test_points(i, 1) = t_col; test_points(i, 2) = t_val; V(t_row, t_col) = 0; } } void Initialize(const MatType& /* V */) { iteration = 1; rmse = DBL_MAX; rmseOld = DBL_MAX; c_index = 0; c_indexOld = 0; reverseStepCount = 0; isCopy = false; } bool IsConverged(arma::mat& W, arma::mat& H) { // Calculate norm of WH after each iteration. arma::mat WH; WH = W * H; if (iteration != 0) { rmseOld = rmse; rmse = 0; for(size_t i = 0; i < num_test_points; i++) { size_t t_row = test_points(i, 0); size_t t_col = test_points(i, 1); double t_val = test_points(i, 2); double temp = (t_val - WH(t_row, t_col)); temp *= temp; rmse += temp; } rmse /= num_test_points; rmse = sqrt(rmse); } iteration++; if((rmseOld - rmse) / rmseOld < tolerance && iteration > 4) { if(reverseStepCount == 0 && isCopy == false) { isCopy = true; this->W = W; this->H = H; c_indexOld = rmseOld; c_index = rmse; } reverseStepCount++; } else { reverseStepCount = 0; if(rmse <= c_indexOld && isCopy == true) { isCopy = false; } } if(reverseStepCount == reverseStepTolerance || iteration > maxIterations) { if(isCopy) { W = this->W; H = this->H; rmse = c_index; } return true; } else return false; } const double& Index() { return rmse; } const size_t& Iteration() { return iteration; } const size_t& MaxIterations() { return maxIterations; } private: double tolerance; size_t maxIterations; size_t num_test_points; size_t iteration; arma::Mat test_points; double rmseOld; double rmse; size_t reverseStepTolerance; size_t reverseStepCount; bool isCopy; arma::mat W; arma::mat H; double c_indexOld; double c_index; }; } // namespace amf } // namespace mlpack #endif // VALIDATION_RMSE_TERMINATION_HPP_INCLUDED RcppMLPACK/inst/include/mlpack/methods/amf/termination_policies/complete_incremental_termination.hpp0000644000176200001440000000377713647512751033707 0ustar liggesusers/** * @file complete_incremental_termination_hpp * @author Sumedh Ghaisas * * Complete incremental learning termination policy. * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef COMPLETE_INCREMENTAL_TERMINATION_HPP_INCLUDED #define COMPLETE_INCREMENTAL_TERMINATION_HPP_INCLUDED namespace mlpack { namespace amf { template class CompleteIncrementalTermination { public: CompleteIncrementalTermination(TerminationPolicy t_policy = TerminationPolicy()) : t_policy(t_policy) {} template void Initialize(const MatType& V) { t_policy.Initialize(V); incrementalIndex = accu(V != 0); iteration = 0; } void Initialize(const arma::sp_mat& V) { t_policy.Initialize(V); incrementalIndex = V.n_nonzero; iteration = 0; } bool IsConverged(arma::mat& W, arma::mat& H) { iteration++; if(iteration % incrementalIndex == 0) return t_policy.IsConverged(W, H); else return false; } const double& Index() { return t_policy.Index(); } const size_t& Iteration() { return iteration; } const size_t& MaxIterations() { return t_policy.MaxIterations(); } private: TerminationPolicy t_policy; size_t incrementalIndex; size_t iteration; }; } // namespace amf } // namespace mlpack #endif // COMPLETE_INCREMENTAL_TERMINATION_HPP_INCLUDED RcppMLPACK/inst/include/mlpack/methods/amf/termination_policies/complete_incremental.hpp0000644000176200001440000000377613647512751031275 0ustar liggesusers/** * @file complete_incremental_termination_hpp * @author Sumedh Ghaisas * * Complete incremental learning termination policy. * * This file is part of MLPACK 1.0.9. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
*/ #ifndef COMPLETE_INCREMENTAL_TERMINATION_HPP_INCLUDED #define COMPLETE_INCREMENTAL_TERMINATION_HPP_INCLUDED namespace mlpack { namespace amf { template class CompleteIncrementalTermination { public: CompleteIncrementalTermination(TerminationPolicy t_policy = TerminationPolicy()) : t_policy(t_policy) {} template void Initialize(const MatType& V) { t_policy.Initialize(V); incrementalIndex = accu(V != 0); iteration = 0; } void Initialize(const arma::sp_mat& V) { t_policy.Initialize(V); incrementalIndex = V.n_nonzero; iteration = 0; } bool IsConverged(arma::mat& W, arma::mat& H) { iteration++; if(iteration % incrementalIndex == 0) return t_policy.IsConverged(W, H); else return false; } const double& Index() { return t_policy.Index(); } const size_t& Iteration() { return iteration; } const size_t& MaxIterations() { return t_policy.MaxIterations(); } private: TerminationPolicy t_policy; size_t incrementalIndex; size_t iteration; }; } // namespace amf } // namespace mlpack #endif // COMPLETE_INCREMENTAL_TERMINATION_HPP_INCLUDED RcppMLPACK/inst/include/mlpack/methods/amf/amf.hpp0000644000176200001440000001315513647512751021417 0ustar liggesusers/** * @file amf.hpp * @author Sumedh Ghaisas * @author Mohan Rajendran * @author Ryan Curtin * * The AMF (alternating matrix factorization) class, from which more commonly * known techniques such as incremental SVD, NMF, and batch-learning SVD can be * derived. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_AMF_AMF_HPP #define __MLPACK_METHODS_AMF_AMF_HPP #include #include #include #include namespace mlpack { namespace amf { /** * This class implements AMF (alternating matrix factorization) on the given * matrix V. Alternating matrix factorization decomposes V in the form * \f$ V \approx WH \f$ where W is called the basis matrix and H is called the * encoding matrix. V is taken to be of size n x m and the obtained W is n x r * and H is r x m. The size r is called the rank of the factorization. * * The implementation requires three template types; the first contains the * policy used to determine when the algorithm has converged; the second * contains the initialization rule for the W and H matrix; the last contains * the update rule to be used during each iteration. This templatization allows * the user to try various update rules, initialization rules, and termination * policies (including ones not supplied with MLPACK) for factorization. By * default, the template parameters to AMF implement non-negative matrix * factorization with the multiplicative distance update. * * A simple example of how to run AMF (or NMF) is shown below. * * @code * extern arma::mat V; // Matrix that we want to perform LMF on. * size_t r = 10; // Rank of decomposition * arma::mat W; // Basis matrix * arma::mat H; // Encoding matrix * * AMF<> amf; // Default options: NMF with multiplicative distance update rules. 
 * amf.Apply(V, r, W, H);
 * @endcode
 *
 * @tparam TerminationPolicyType The policy to use for determining when the
 *     factorization has converged.
 * @tparam InitializationRuleType The initialization rule for initializing W
 *     and H matrix.
 * @tparam UpdateRuleType The update rule for calculating W and H matrix at
 *     each iteration.
 *
 * @see NMFMultiplicativeDistanceUpdate
 */
template<typename TerminationPolicyType = SimpleResidueTermination,
         typename InitializationRuleType = RandomInitialization,
         typename UpdateRuleType = NMFMultiplicativeDistanceUpdate>
class AMF
{
 public:
  /**
   * Create the AMF object and (optionally) set the parameters which AMF will
   * run with.  The minimum residue refers to the root mean square of the
   * difference between two subsequent iterations of the product W * H.  A low
   * residue indicates that subsequent iterations are not producing much change
   * in W and H.  Once the residue goes below the specified minimum residue,
   * the algorithm terminates.
   *
   * @param terminationPolicy Optional instantiated TerminationPolicy object.
   * @param initializeRule Optional instantiated InitializationRule object for
   *     initializing the W and H matrices.
   * @param update Optional instantiated UpdateRule object; this parameter is
   *     useful when the update rule for the W and H matrices has state that
   *     it needs to store (i.e. HUpdate() and WUpdate() are not static
   *     functions).
   */
  AMF(const TerminationPolicyType& terminationPolicy = TerminationPolicyType(),
      const InitializationRuleType& initializeRule = InitializationRuleType(),
      const UpdateRuleType& update = UpdateRuleType());

  /**
   * Apply Alternating Matrix Factorization to the provided matrix.
   *
   * @param V Input matrix to be factorized.
   * @param r Rank r of the factorization.
   * @param W Basis matrix to be output.
   * @param H Encoding matrix to output.
   */
  template<typename MatType>
  double Apply(const MatType& V, const size_t r, arma::mat& W, arma::mat& H);

  //! Access the termination policy.
  const TerminationPolicyType& TerminationPolicy() const
  { return terminationPolicy; }
  //! Modify the termination policy.
  TerminationPolicyType& TerminationPolicy() { return terminationPolicy; }

  //! Access the initialization rule.
  const InitializationRuleType& InitializeRule() const
  { return initializationRule; }
  //! Modify the initialization rule.
  InitializationRuleType& InitializeRule() { return initializationRule; }

  //! Access the update rule.
  const UpdateRuleType& Update() const { return update; }
  //! Modify the update rule.
  UpdateRuleType& Update() { return update; }

 private:
  //! Termination policy.
  TerminationPolicyType terminationPolicy;
  //! Instantiated initialization rule.
  InitializationRuleType initializationRule;
  //! Instantiated update rule.
  UpdateRuleType update;
}; // class AMF

}; // namespace amf
}; // namespace mlpack

// Include implementation.
#include "amf_impl.hpp"

#endif
RcppMLPACK/inst/include/mlpack/methods/radical/0000755000176200001440000000000013647512751020772 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/radical/radical.hpp0000644000176200001440000001243113647512751023103 0ustar liggesusers/**
 * @file radical.hpp
 * @author Nishant Mehta
 *
 * Declaration of Radical class (RADICAL is Robust, Accurate, Direct ICA
 * aLgorithm).
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RADICAL_RADICAL_HPP #define __MLPACK_METHODS_RADICAL_RADICAL_HPP #include namespace mlpack { namespace radical { /** * An implementation of RADICAL, an algorithm for independent component * analysis (ICA). * * Let X be a matrix where each column is a point and each row a dimension. * The goal is to find a square unmixing matrix W such that Y = W X and * the rows of Y are independent components. * * For more details, see the following paper: * * @code * @article{learned2003ica, * title = {ICA Using Spacings Estimates of Entropy}, * author = {Learned-Miller, E.G. and Fisher III, J.W.}, * journal = {Journal of Machine Learning Research}, * volume = {4}, * pages = {1271--1295}, * year = {2003} * } * @endcode */ class Radical { public: /** * Set the parameters to RADICAL. * * @param noiseStdDev Standard deviation of the Gaussian noise added to the * replicates of the data points during Radical2D * @param replicates Number of Gaussian-perturbed replicates to use (per * point) in Radical2D * @param angles Number of angles to consider in brute-force search during * Radical2D * @param sweeps Number of sweeps. Each sweep calls Radical2D once for each * pair of dimensions * @param m The variable m from Vasicek's m-spacing estimator of entropy. */ Radical(const double noiseStdDev = 0.175, const size_t replicates = 30, const size_t angles = 150, const size_t sweeps = 0, const size_t m = 0); /** * Run RADICAL. * * @param matX Input data into the algorithm - a matrix where each column is * a point and each row is a dimension. * @param matY Estimated independent components - a matrix where each column * is a point and each row is an estimated independent component. * @param matW Estimated unmixing matrix, where matY = matW * matX. */ void DoRadical(const arma::mat& matX, arma::mat& matY, arma::mat& matW); /** * Vasicek's m-spacing estimator of entropy, with overlap modification from * (Learned-Miller and Fisher, 2003). * * @param x Empirical sample (one-dimensional) over which to estimate entropy. */ double Vasicek(arma::vec& x) const; /** * Make replicates of each data point (the number of replicates is set in * either the constructor or with Replicates()) and perturb data with Gaussian * noise with standard deviation noiseStdDev. */ void CopyAndPerturb(arma::mat& xNew, const arma::mat& x) const; //! Two-dimensional version of RADICAL. double DoRadical2D(const arma::mat& matX); //! Get the standard deviation of the additive Gaussian noise. double NoiseStdDev() const { return noiseStdDev; } //! Modify the standard deviation of the additive Gaussian noise. double& NoiseStdDev() { return noiseStdDev; } //! Get the number of Gaussian-perturbed replicates used per point. size_t Replicates() const { return replicates; } //! Modify the number of Gaussian-perturbed replicates used per point. size_t& Replicates() { return replicates; } //! Get the number of angles considered during brute-force search. size_t Angles() const { return angles; } //! Modify the number of angles considered during brute-force search. size_t& Angles() { return angles; } //! Get the number of sweeps. size_t Sweeps() const { return sweeps; } //! Modify the number of sweeps. size_t& Sweeps() { return sweeps; } // Returns a string representation of this object. std::string ToString() const; private: //! 
Standard deviation of the Gaussian noise added to the replicates of //! the data points during Radical2D. double noiseStdDev; //! Number of Gaussian-perturbed replicates to use (per point) in Radical2D. size_t replicates; //! Number of angles to consider in brute-force search during Radical2D. size_t angles; //! Number of sweeps; each sweep calls Radical2D once for each pair of //! dimensions. size_t sweeps; //! Value of m to use for Vasicek's m-spacing estimator of entropy. size_t m; //! Internal matrix, held as member variable to prevent memory reallocations. arma::mat perturbed; //! Internal matrix, held as member variable to prevent memory reallocations. arma::mat candidate; }; void WhitenFeatureMajorMatrix(const arma::mat& matX, arma::mat& matXWhitened, arma::mat& matWhitening); }; // namespace radical }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/naive_bayes/0000755000176200001440000000000013647512751021660 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/naive_bayes/naive_bayes_classifier.hpp0000644000176200001440000001112713647512751027064 0ustar liggesusers/** * @file naive_bayes_classifier.hpp * @author Parikshit Ram (pram@cc.gatech.edu) * * A Naive Bayes Classifier which parametrically estimates the distribution of * the features. It is assumed that the features have been sampled from a * Gaussian PDF. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NAIVE_BAYES_NAIVE_BAYES_CLASSIFIER_HPP #define __MLPACK_METHODS_NAIVE_BAYES_NAIVE_BAYES_CLASSIFIER_HPP #include #include namespace mlpack { namespace naive_bayes /** The Naive Bayes Classifier. */ { /** * The simple Naive Bayes classifier. This class trains on the data by * calculating the sample mean and variance of the features with respect to each * of the labels, and also the class probabilities. The class labels are * assumed to be positive integers (starting with 0), and are expected to be the * last row of the data input to the constructor. * * Mathematically, it computes P(X_i = x_i | Y = y_j) for each feature X_i for * each of the labels y_j. Alongwith this, it also computes the classs * probabilities P(Y = y_j). * * For classifying a data point (x_1, x_2, ..., x_n), it computes the following: * arg max_y(P(Y = y)*P(X_1 = x_1 | Y = y) * ... * P(X_n = x_n | Y = y)) * * Example use: * * @code * extern arma::mat training_data, testing_data; * NaiveBayesClassifier<> nbc(training_data, 5); * arma::vec results; * * nbc.Classify(testing_data, results); * @endcode */ template class NaiveBayesClassifier { private: //! Sample mean for each class. MatType means; //! Sample variances for each class. MatType variances; //! Class probabilities. arma::vec probabilities; public: /** * Initializes the classifier as per the input and then trains it by * calculating the sample mean and variances. 
The input data is expected to * have integer labels as the last row (starting with 0 and not greater than * the number of classes). * * Example use: * @code * extern arma::mat training_data, testing_data; * extern arma::Col labels; * NaiveBayesClassifier nbc(training_data, labels, 5); * @endcode * * @param data Training data points. * @param labels Labels corresponding to training data points. * @param classes Number of classes in this classifier. * @param incrementalVariance If true, an incremental algorithm is used to * calculate the variance; this can prevent loss of precision in some * cases, but will be somewhat slower to calculate. */ NaiveBayesClassifier(const MatType& data, const arma::Col& labels, const size_t classes, const bool incrementalVariance = false); /** * Given a bunch of data points, this function evaluates the class of each of * those data points, and puts it in the vector 'results'. * * @code * arma::mat test_data; // each column is a test point * arma::Col results; * ... * nbc.Classify(test_data, &results); * @endcode * * @param data List of data points. * @param results Vector that class predictions will be placed into. */ void Classify(const MatType& data, arma::Col& results); //! Get the sample means for each class. const MatType& Means() const { return means; } //! Modify the sample means for each class. MatType& Means() { return means; } //! Get the sample variances for each class. const MatType& Variances() const { return variances; } //! Modify the sample variances for each class. MatType& Variances() { return variances; } //! Get the prior probabilities for each class. const arma::vec& Probabilities() const { return probabilities; } //! Modify the prior probabilities for each class. arma::vec& Probabilities() { return probabilities; } }; }; // namespace naive_bayes }; // namespace mlpack // Include implementation. #include "naive_bayes_classifier_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/naive_bayes/naive_bayes_classifier_impl.hpp0000644000176200001440000001313113647512751030102 0ustar liggesusers/** * @file naive_bayes_classifier_impl.hpp * @author Parikshit Ram (pram@cc.gatech.edu) * @author Vahab Akbarzadeh (v.akbarzadeh@gmail.com) * * A Naive Bayes Classifier which parametrically estimates the distribution of * the features. This classifier makes its predictions based on the assumption * that the features have been sampled from a set of Gaussians with diagonal * covariance. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NAIVE_BAYES_NAIVE_BAYES_CLASSIFIER_IMPL_HPP #define __MLPACK_METHODS_NAIVE_BAYES_NAIVE_BAYES_CLASSIFIER_IMPL_HPP #include // In case it hasn't been included already. 
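// A hedged usage sketch, added for illustration (the matrices, label vector,
// and class count below are assumptions, not part of the original sources):
//
//   arma::mat train, test;     // One point per column.
//   arma::Col<size_t> labels;  // One label in [0, 5) per training column.
//   NaiveBayesClassifier<> nbc(train, labels, 5);
//   arma::Col<size_t> results;
//   nbc.Classify(test, results);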
#include "naive_bayes_classifier.hpp" namespace mlpack { namespace naive_bayes { template NaiveBayesClassifier::NaiveBayesClassifier( const MatType& data, const arma::Col& labels, const size_t classes, const bool incrementalVariance) { const size_t dimensionality = data.n_rows; // Update the variables according to the number of features and classes // present in the data. probabilities.zeros(classes); means.zeros(dimensionality, classes); variances.zeros(dimensionality, classes); Log::Info << "Training Naive Bayes classifier on " << data.n_cols << " examples with " << dimensionality << " features each." << std::endl; // Calculate the class probabilities as well as the sample mean and variance // for each of the features with respect to each of the labels. if (incrementalVariance) { // Use incremental algorithm. for (size_t j = 0; j < data.n_cols; ++j) { const size_t label = labels[j]; ++probabilities[label]; arma::vec delta = data.col(j) - means.col(label); means.col(label) += delta / probabilities[label]; variances.col(label) += delta % (data.col(j) - means.col(label)); } for (size_t i = 0; i < classes; ++i) { if (probabilities[i] > 2) variances.col(i) /= (probabilities[i] - 1); } } else { // Don't use incremental algorithm. This is a two-pass algorithm. It is // possible to calculate the means and variances using a faster one-pass // algorithm but there are some precision and stability issues. If this is // too slow, it's an option to use the faster algorithm by default and then // have this (and the incremental algorithm) be other options. // Calculate the means. for (size_t j = 0; j < data.n_cols; ++j) { const size_t label = labels[j]; ++probabilities[label]; means.col(label) += data.col(j); } // Normalize means. for (size_t i = 0; i < classes; ++i) if (probabilities[i] != 0.0) means.col(i) /= probabilities[i]; // Calculate variances. for (size_t j = 0; j < data.n_cols; ++j) { const size_t label = labels[j]; variances.col(label) += square(data.col(j) - means.col(label)); } // Normalize variances. for (size_t i = 0; i < classes; ++i) if (probabilities[i] > 1) variances.col(i) /= (probabilities[i] - 1); } // Ensure that the variances are invertible. for (size_t i = 0; i < variances.n_elem; ++i) if (variances[i] == 0.0) variances[i] = 1e-50; probabilities /= data.n_cols; } template void NaiveBayesClassifier::Classify(const MatType& data, arma::Col& results) { // Check that the number of features in the test data is same as in the // training data. Log::Assert(data.n_rows == means.n_rows); arma::vec probs = arma::log(probabilities); arma::mat invVar = 1.0 / variances; arma::mat testProbs = arma::repmat(probs.t(), data.n_cols, 1); results.set_size(data.n_cols); // No need to fill with anything yet. Log::Info << "Running Naive Bayes classifier on " << data.n_cols << " data points with " << data.n_rows << " features each." << std::endl; // Calculate the joint probability for each of the data points for each of the // means.n_cols. // Loop over every class. for (size_t i = 0; i < means.n_cols; i++) { // This is an adaptation of gmm::phi() for the case where the covariance is // a diagonal matrix. 
arma::mat diffs = data - arma::repmat(means.col(i), 1, data.n_cols); arma::mat rhs = -0.5 * arma::diagmat(invVar.col(i)) * diffs; arma::vec exponents(diffs.n_cols); for (size_t j = 0; j < diffs.n_cols; ++j) exponents(j) = std::exp(arma::accu(diffs.col(j) % rhs.unsafe_col(j))); testProbs.col(i) += log(pow(2 * M_PI, (double) data.n_rows / -2.0) * pow(det(arma::diagmat(invVar.col(i))), -0.5) * exponents); } // Now calculate the label. for (size_t i = 0; i < data.n_cols; ++i) { // Find the index of the class with maximum probability for this point. arma::uword maxIndex = 0; arma::vec pointProbs = testProbs.row(i).t(); pointProbs.max(maxIndex); results[i] = maxIndex; } return; } }; // namespace naive_bayes }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/nca/0000755000176200001440000000000013647512751020134 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/nca/nca_softmax_error_function_impl.hpp0000644000176200001440000002416713647512751027320 0ustar liggesusers/** * @file nca_softmax_impl.h * @author Ryan Curtin * * Implementation of the Softmax error function. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NCA_NCA_SOFTMAX_ERROR_FUNCTCLIN_IMPL_H #define __MLPACK_METHODS_NCA_NCA_SOFTMAX_ERROR_FUNCTCLIN_IMPL_H // In case it hasn't been included already. #include "nca_softmax_error_function.hpp" namespace mlpack { namespace nca { // Initialize with the given kernel. template SoftmaxErrorFunction::SoftmaxErrorFunction( const arma::mat& dataset, const arma::Col& labels, MetricType metric) : dataset(dataset), labels(labels), metric(metric), precalculated(false) { /* nothing to do */ } //! The non-separable implementation, which uses Precalculate() to save time. template double SoftmaxErrorFunction::Evaluate(const arma::mat& coordinates) { // Calculate the denominators and numerators, if necessary. Precalculate(coordinates); return -accu(p); // Sum of p_i for all i. We negate because our solver // minimizes, not maximizes. }; //! The separated objective function, which does not use Precalculate(). template double SoftmaxErrorFunction::Evaluate(const arma::mat& coordinates, const size_t i) { // Unfortunately each evaluation will take O(N) time because it requires a // scan over all points in the dataset. Our objective is to compute p_i. double denominator = 0; double numerator = 0; // It's quicker to do this now than one point at a time later. stretchedDataset = coordinates * dataset; for (size_t k = 0; k < dataset.n_cols; ++k) { // Don't consider the case where the points are the same. if (k == i) continue; // We want to evaluate exp(-D(A x_i, A x_k)). double eval = std::exp(-metric.Evaluate(stretchedDataset.unsafe_col(i), stretchedDataset.unsafe_col(k))); // If they are in the same if (labels[i] == labels[k]) numerator += eval; denominator += eval; } // Now the result is just a simple division, but we have to be sure that the // denominator is not 0. 
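  // At this point the two accumulators hold
  //   numerator   = sum_{k != i, label(k) == label(i)} exp(-d(A x_i, A x_k))
  //   denominator = sum_{k != i}                       exp(-d(A x_i, A x_k))
  // so numerator / denominator is exactly p_i from the NCA objective.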
if (denominator == 0.0) { Rcpp::Rcout << "Denominator of p_" << i << " is 0!" << std::endl; return 0; } return -(numerator / denominator); // Negate because the optimizer is a // minimizer. } //! The non-separable implementation, where Precalculate() is used. template void SoftmaxErrorFunction::Gradient(const arma::mat& coordinates, arma::mat& gradient) { // Calculate the denominators and numerators, if necessary. Precalculate(coordinates); // Now, we handle the summation over i: // sum_i (p_i sum_k (p_ik x_ik x_ik^T) - // sum_{j in class of i} (p_ij x_ij x_ij^T) // We can algebraically manipulate the whole thing to produce a more // memory-friendly way to calculate this. Looping over each i and k (again // O((n * (n + 1)) / 2) as with the last step, we can add the following to the // sum: // // if class of i is the same as the class of k, add // (((p_i - (1 / p_i)) p_ik) + ((p_k - (1 / p_k)) p_ki)) x_ik x_ik^T // otherwise, add // (p_i p_ik + p_k p_ki) x_ik x_ik^T arma::mat sum; sum.zeros(stretchedDataset.n_rows, stretchedDataset.n_rows); for (size_t i = 0; i < stretchedDataset.n_cols; i++) { for (size_t k = (i + 1); k < stretchedDataset.n_cols; k++) { // Calculate p_ik and p_ki first. double eval = exp(-metric.Evaluate(stretchedDataset.unsafe_col(i), stretchedDataset.unsafe_col(k))); double p_ik = 0, p_ki = 0; p_ik = eval / denominators(i); p_ki = eval / denominators(k); // Subtract x_i from x_k. We are not using stretched points here. arma::vec x_ik = dataset.col(i) - dataset.col(k); arma::mat secondTerm = (x_ik * trans(x_ik)); if (labels[i] == labels[k]) sum += ((p[i] - 1) * p_ik + (p[k] - 1) * p_ki) * secondTerm; else sum += (p[i] * p_ik + p[k] * p_ki) * secondTerm; } } // Assemble the final gradient. gradient = -2 * coordinates * sum; } //! The separable implementation. template void SoftmaxErrorFunction::Gradient(const arma::mat& coordinates, const size_t i, arma::mat& gradient) { // We will need to calculate p_i before this evaluation is done, so these two // variables will hold the information necessary for that. double numerator = 0; double denominator = 0; // The gradient involves two matrix terms which are eventually combined into // one. arma::mat firstTerm; arma::mat secondTerm; firstTerm.zeros(coordinates.n_rows, coordinates.n_cols); secondTerm.zeros(coordinates.n_rows, coordinates.n_cols); // Compute the stretched dataset. stretchedDataset = coordinates * dataset; for (size_t k = 0; k < dataset.n_cols; ++k) { // Don't consider the case where the points are the same. if (i == k) continue; // Calculate the numerator of p_ik. double eval = exp(-metric.Evaluate(stretchedDataset.unsafe_col(i), stretchedDataset.unsafe_col(k))); // If the points are in the same class, we must add to the second term of // the gradient as well as the numerator of p_i. We will divide by the // denominator of p_ik later. For x_ik we are not using stretched points. arma::vec x_ik = dataset.col(i) - dataset.col(k); if (labels[i] == labels[k]) { numerator += eval; secondTerm += eval * x_ik * trans(x_ik); } // We always have to add to the denominator of p_i and the first term of the // gradient computation. We will divide by the denominator of p_ik later. denominator += eval; firstTerm += eval * x_ik * trans(x_ik); } // Calculate p_i. double p = 0; if (denominator == 0) { Rcpp::Rcout << "Denominator of p_" << i << " is 0!" << std::endl; // If the denominator is zero, then all p_ik should be zero and there is // no gradient contribution from this point. 
gradient.zeros(coordinates.n_rows, coordinates.n_rows);
    return;
  }
  else
  {
    p = numerator / denominator;
    firstTerm /= denominator;
    secondTerm /= denominator;
  }

  // Now multiply the first term by p_i, and add the two together and multiply
  // all by 2 * A.  We negate it though, because our optimizer is a minimizer.
  gradient = -2 * coordinates * (p * firstTerm - secondTerm);
}

template<typename MetricType>
const arma::mat SoftmaxErrorFunction<MetricType>::GetInitialPoint() const
{
  return arma::eye(dataset.n_rows, dataset.n_rows);
}

template<typename MetricType>
void SoftmaxErrorFunction<MetricType>::Precalculate(
    const arma::mat& coordinates)
{
  // Ensure it is the right size.
  lastCoordinates.set_size(coordinates.n_rows, coordinates.n_cols);

  // Make sure the calculation is necessary.
  if ((accu(coordinates == lastCoordinates) == coordinates.n_elem) &&
      precalculated)
    return; // No need to calculate; we already have this stuff saved.

  // Coordinates are different; save the new ones, and stretch the dataset.
  lastCoordinates = coordinates;
  stretchedDataset = coordinates * dataset;

  // For each point i, we must evaluate the softmax function:
  //   p_ij = exp( -K(x_i, x_j) ) / ( sum_{k != i} ( exp( -K(x_i, x_k) )))
  //   p_i = sum_{j in class of i} p_ij
  // We will do this by keeping track of the denominators for each i as well
  // as the numerators (the sum for all j in class of i).  This will be on the
  // order of O((n * (n + 1)) / 2), which really isn't all that great.
  p.zeros(stretchedDataset.n_cols);
  denominators.zeros(stretchedDataset.n_cols);
  for (size_t i = 0; i < stretchedDataset.n_cols; i++)
  {
    for (size_t j = (i + 1); j < stretchedDataset.n_cols; j++)
    {
      // Evaluate exp(-d(x_i, x_j)).
      double eval = exp(-metric.Evaluate(stretchedDataset.unsafe_col(i),
                                         stretchedDataset.unsafe_col(j)));

      // Add this to the denominators of both p_i and p_j: K(i, j) = K(j, i).
      denominators[i] += eval;
      denominators[j] += eval;

      // If i and j are the same class, add to numerator of both.
      if (labels[i] == labels[j])
      {
        p[i] += eval;
        p[j] += eval;
      }
    }
  }

  // Divide p_i by their denominators.
  p /= denominators;

  // Clean up any bad values.
  for (size_t i = 0; i < stretchedDataset.n_cols; i++)
  {
    if (denominators[i] == 0.0)
    {
      Rcpp::Rcout << "Denominator of p_{" << i << ", j} is 0." << std::endl;

      // Set to usable values.
      denominators[i] = std::numeric_limits<double>::infinity();
      p[i] = 0;
    }
  }

  // We've done a precalculation.  Mark it as done.
  precalculated = true;
}

template<typename MetricType>
std::string SoftmaxErrorFunction<MetricType>::ToString() const
{
  std::ostringstream convert;
  convert << "Softmax Error Function [" << this << "]" << std::endl;
  convert << "  Dataset: " << dataset.n_rows << "x" << dataset.n_cols
      << std::endl;
  convert << "  Labels: " << labels.n_elem << std::endl;
  //convert << "Metric: " << metric << std::endl;
  convert << "  Precalculated: " << precalculated << std::endl;
  return convert.str();
}

}; // namespace nca
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/methods/nca/nca_impl.hpp0000644000176200001440000000461113647512751022431 0ustar liggesusers/**
 * @file nca_impl.hpp
 * @author Ryan Curtin
 *
 * Implementation of templated NCA class.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NCA_NCA_IMPL_HPP #define __MLPACK_METHODS_NCA_NCA_IMPL_HPP // In case it was not already included. #include "nca.hpp" namespace mlpack { namespace nca { // Just set the internal matrix reference. template class OptimizerType> NCA::NCA(const arma::mat& dataset, const arma::Col& labels, MetricType metric) : dataset(dataset), labels(labels), metric(metric), errorFunction(dataset, labels, metric), optimizer(OptimizerType >(errorFunction)) { /* Nothing to do. */ } template class OptimizerType> void NCA::LearnDistance(arma::mat& outputMatrix) { // See if we were passed an initialized matrix. if ((outputMatrix.n_rows != dataset.n_rows) || (outputMatrix.n_cols != dataset.n_rows)) outputMatrix.eye(dataset.n_rows, dataset.n_rows); //Timer::Start("nca_sgd_optimization"); optimizer.Optimize(outputMatrix); //Timer::Stop("nca_sgd_optimization"); } template class OptimizerType> std::string NCA::ToString() const { std::ostringstream convert; convert << "NCA [" << this << "]" << std::endl; convert << " Dataset: " << dataset.n_rows << "x" << dataset.n_cols << std::endl; convert << " Metric: " << std::endl << mlpack::util::Indent(metric.ToString(),2); return convert.str(); } }; // namespace nca }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/nca/nca.hpp0000644000176200001440000001040313647512751021404 0ustar liggesusers/** * @file nca.hpp * @author Ryan Curtin * * Declaration of NCA class (Neighborhood Components Analysis). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NCA_NCA_HPP #define __MLPACK_METHODS_NCA_NCA_HPP #include #include #include #include "nca_softmax_error_function.hpp" namespace mlpack { namespace nca /** Neighborhood Components Analysis. */ { /** * An implementation of Neighborhood Components Analysis, both a linear * dimensionality reduction technique and a distance learning technique. The * method seeks to improve k-nearest-neighbor classification on a dataset by * scaling the dimensions. The method is nonparametric, and does not require a * value of k. It works by using stochastic ("soft") neighbor assignments and * using optimization techniques over the gradient of the accuracy of the * neighbor assignments. * * For more details, see the following published paper: * * @code * @inproceedings{Goldberger2004, * author = {Goldberger, Jacob and Roweis, Sam and Hinton, Geoff and * Salakhutdinov, Ruslan}, * booktitle = {Advances in Neural Information Processing Systems 17}, * pages = {513--520}, * publisher = {MIT Press}, * title = {{Neighbourhood Components Analysis}}, * year = {2004} * } * @endcode */ template class OptimizerType = optimization::SGD> class NCA { public: /** * Construct the Neighborhood Components Analysis object. 
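 * A usage sketch is given below (mlpack's metric::SquaredEuclideanDistance
 * typedef is assumed here as the metric type; any class satisfying the
 * MetricType policy can be substituted):
 *
 * @code
 * extern arma::mat dataset;        // Each column is a data point.
 * extern arma::Col<size_t> labels; // One label per column of the dataset.
 *
 * NCA<metric::SquaredEuclideanDistance> nca(dataset, labels);
 *
 * arma::mat distance; // Will hold the learned linear transformation A.
 * nca.LearnDistance(distance);
 * @endcode
 *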
This simply stores * the reference to the dataset and labels as well as the parameters for * optimization before the actual optimization is performed. * * @param dataset Input dataset. * @param labels Input dataset labels. * @param stepSize Step size for stochastic gradient descent. * @param maxIterations Maximum iterations for stochastic gradient descent. * @param tolerance Tolerance for termination of stochastic gradient descent. * @param shuffle Whether or not to shuffle the dataset during SGD. * @param metric Instantiated metric to use. */ NCA(const arma::mat& dataset, const arma::Col& labels, MetricType metric = MetricType()); /** * Perform Neighborhood Components Analysis. The output distance learning * matrix is written into the passed reference. If LearnDistance() is called * with an outputMatrix which has the correct size (dataset.n_rows x * dataset.n_rows), that matrix will be used as the starting point for * optimization. * * @param output_matrix Covariance matrix of Mahalanobis distance. */ void LearnDistance(arma::mat& outputMatrix); //! Get the dataset reference. const arma::mat& Dataset() const { return dataset; } //! Get the labels reference. const arma::Col& Labels() const { return labels; } //! Get the optimizer. const OptimizerType >& Optimizer() const { return optimizer; } OptimizerType >& Optimizer() { return optimizer; } // Returns a string representation of this object. std::string ToString() const; private: //! Dataset reference. const arma::mat& dataset; //! Labels reference. const arma::Col& labels; //! Metric to be used. MetricType metric; //! The function to optimize. SoftmaxErrorFunction errorFunction; //! The optimizer to use. OptimizerType > optimizer; }; }; // namespace nca }; // namespace mlpack // Include the implementation. #include "nca_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/nca/nca_softmax_error_function.hpp0000644000176200001440000001466113647512751026275 0ustar liggesusers/** * @file nca_softmax_error_function.hpp * @author Ryan Curtin * * Implementation of the stochastic neighbor assignment probability error * function (the "softmax error"). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NCA_NCA_SOFTMAX_ERROR_FUNCTION_HPP #define __MLPACK_METHODS_NCA_NCA_SOFTMAX_ERROR_FUNCTION_HPP #include namespace mlpack { namespace nca { /** * The "softmax" stochastic neighbor assignment probability function. * * The actual function is * * p_ij = (exp(-|| A x_i - A x_j || ^ 2)) / * (sum_{k != i} (exp(-|| A x_i - A x_k || ^ 2))) * * where x_n represents a point and A is the current scaling matrix. * * This class is more flexible than the original paper, allowing an arbitrary * metric function to be used in place of || A x_i - A x_j ||^2, meaning that * the squared Euclidean distance is not the only allowed metric for NCA. * However, that is probably the best way to use this class. 
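 *
 * The objective that NCA maximizes is f(A) = sum_i p_i with
 * p_i = sum_{j in class of i} p_ij; because the mlpack optimizers are
 * minimizers, Evaluate() actually returns -p_i (or -sum_i p_i for the
 * non-separable overload).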
* * In addition to the standard Evaluate() and Gradient() functions which
 * MLPACK optimizers use, overloads of Evaluate() and Gradient() are given
 * which only operate on one point in the dataset.  This is useful for
 * optimizers like stochastic gradient descent (see
 * mlpack::optimization::SGD).
 */
template<typename MetricType>
class SoftmaxErrorFunction
{
 public:
  /**
   * Initialize with the given metric; useful when the metric has some state
   * to store, which is set elsewhere.  If no metric is given, a
   * default-constructed metric is used; this way, you can call the
   * constructor with no metric argument.  A reference to the dataset we will
   * be optimizing over is also required.
   *
   * @param dataset Matrix containing the dataset.
   * @param labels Vector of class labels for each point in the dataset.
   * @param metric Instantiated metric (optional).
   */
  SoftmaxErrorFunction(const arma::mat& dataset,
                       const arma::Col<size_t>& labels,
                       MetricType metric = MetricType());

  /**
   * Evaluate the softmax function for the given covariance matrix.  This is
   * the non-separable implementation, where the objective function is not
   * decomposed into the sum of several objective functions.
   *
   * @param covariance Covariance matrix of Mahalanobis distance.
   */
  double Evaluate(const arma::mat& covariance);

  /**
   * Evaluate the softmax objective function for the given covariance matrix
   * on only one point of the dataset.  This is the separable implementation,
   * where the objective function is decomposed into the sum of many objective
   * functions, and here, only one of those constituent objective functions is
   * returned.
   *
   * @param covariance Covariance matrix of Mahalanobis distance.
   * @param i Index of point to use for objective function.
   */
  double Evaluate(const arma::mat& covariance, const size_t i);

  /**
   * Evaluate the gradient of the softmax function for the given covariance
   * matrix.  This is the non-separable implementation, where the objective
   * function is not decomposed into the sum of several objective functions.
   *
   * @param covariance Covariance matrix of Mahalanobis distance.
   * @param gradient Matrix to store the calculated gradient in.
   */
  void Gradient(const arma::mat& covariance, arma::mat& gradient);

  /**
   * Evaluate the gradient of the softmax function for the given covariance
   * matrix on only one point of the dataset.  This is the separable
   * implementation, where the objective function is decomposed into the sum
   * of many objective functions, and here, only one of those constituent
   * objective functions is returned.
   *
   * @param covariance Covariance matrix of Mahalanobis distance.
   * @param i Index of point to use for objective function.
   * @param gradient Matrix to store the calculated gradient in.
   */
  void Gradient(const arma::mat& covariance,
                const size_t i,
                arma::mat& gradient);

  /**
   * Get the initial point.
   */
  const arma::mat GetInitialPoint() const;

  /**
   * Get the number of functions the objective function can be decomposed
   * into.  This is just the number of points in the dataset.
   */
  size_t NumFunctions() const { return dataset.n_cols; }

  // Convert the object into a string.
  std::string ToString() const;

 private:
  //! The dataset.
  const arma::mat& dataset;
  //! Labels for each point in the dataset.
  const arma::Col<size_t>& labels;
  //! The instantiated metric.
  MetricType metric;

  //! Last coordinates.  Used for the non-separable Evaluate() and Gradient().
  arma::mat lastCoordinates;
  //! Stretched dataset.  Kept internal to avoid memory reallocations.
  arma::mat stretchedDataset;
  //! Holds calculated p_i, for the non-separable Evaluate() and Gradient().
  arma::vec p;
  //!
Holds denominators for calculation of p_ij, for the non-separable //! Evaluate() and Gradient(). arma::vec denominators; //! False if nothing has ever been precalculated (only at construction time). bool precalculated; /** * Precalculate the denominators and numerators that will make up the p_ij, * but only if the coordinates matrix is different than the last coordinates * the Precalculate() method was run with. This method is only called by the * non-separable Evaluate() and Gradient(). * * This will update last_coordinates_ and stretched_dataset_, and also * calculate the p_i and denominators_ which are used in the calculation of * p_i or p_ij. The calculation will be O((n * (n + 1)) / 2), which is not * great. * * @param coordinates Coordinates matrix to use for precalculation. */ void Precalculate(const arma::mat& coordinates); }; }; // namespace nca }; // namespace mlpack // Include implementation. #include "nca_softmax_error_function_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/hmm/0000755000176200001440000000000013647512751020154 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/hmm/hmm.hpp0000644000176200001440000003531713647512751021457 0ustar liggesusers/** * @file hmm.hpp * @author Ryan Curtin * @author Tran Quoc Long * * Definition of HMM class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_HMM_HMM_HPP #define __MLPACK_METHODS_HMM_HMM_HPP #include namespace mlpack { namespace hmm /** Hidden Markov Models. */ { /** * A class that represents a Hidden Markov Model with an arbitrary type of * emission distribution. This HMM class supports training (supervised and * unsupervised), prediction of state sequences via the Viterbi algorithm, * estimation of state probabilities, generation of random sequences, and * calculation of the log-likelihood of a given sequence. * * The template parameter, Distribution, specifies the distribution which the * emissions follow. The class should implement the following functions: * * @code * class Distribution * { * public: * // The type of observation used by this distribution. * typedef something DataType; * * // Return the probability of the given observation. * double Probability(const DataType& observation) const; * * // Estimate the distribution based on the given observations. * void Estimate(const std::vector& observations); * * // Estimate the distribution based on the given observations, given also * // the probability of each observation coming from this distribution. * void Estimate(const std::vector& observations, * const std::vector& probabilities); * }; * @endcode * * See the mlpack::distribution::DiscreteDistribution class for an example. One * would use the DiscreteDistribution class when the observations are * non-negative integers. Other distributions could be Gaussians, a mixture of * Gaussians (GMM), or any other probability distribution implementing the * four Distribution functions. 
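 *
 * For instance, an HMM whose observations are non-negative integers from a
 * finite alphabet could use the mlpack::distribution::DiscreteDistribution
 * class; a declaration sketch (a 3-state model over a 6-symbol alphabet is
 * assumed) is:
 *
 * @code
 * HMM<distribution::DiscreteDistribution> hmm(3,
 *     distribution::DiscreteDistribution(6));
 * @endcode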
* * Usage of the HMM class generally involves either training an HMM or loading * an already-known HMM and taking probability measurements of sequences. * Example code for supervised training of a Gaussian HMM (that is, where the * emission output distribution is a single Gaussian for each hidden state) is * given below. * * @code * extern arma::mat observations; // Each column is an observation. * extern arma::Col states; // Hidden states for each observation. * // Create an untrained HMM with 5 hidden states and default (N(0, 1)) * // Gaussian distributions with the dimensionality of the dataset. * HMM hmm(5, GaussianDistribution(observations.n_rows)); * * // Train the HMM (the labels could be omitted to perform unsupervised * // training). * hmm.Train(observations, states); * @endcode * * Once initialized, the HMM can evaluate the probability of a certain sequence * (with LogLikelihood()), predict the most likely sequence of hidden states * (with Predict()), generate a sequence (with Generate()), or estimate the * probabilities of each state for a sequence of observations (with Estimate()). * * @tparam Distribution Type of emission distribution for this HMM. */ template class HMM { public: /** * Create the Hidden Markov Model with the given number of hidden states and * the given default distribution for emissions. The dimensionality of the * observations is taken from the emissions variable, so it is important that * the given default emission distribution is set with the correct * dimensionality. Alternately, set the dimensionality with Dimensionality(). * Optionally, the tolerance for convergence of the Baum-Welch algorithm can * be set. * * By default, the transition matrix and initial probability vector are set to * contain equal probability for each state. * * @param states Number of states. * @param emissions Default distribution for emissions. * @param tolerance Tolerance for convergence of training algorithm * (Baum-Welch). */ HMM(const size_t states, const Distribution emissions, const double tolerance = 1e-5); /** * Create the Hidden Markov Model with the given initial probability vector, * the given transition matrix, and the given emission distributions. The * dimensionality of the observations of the HMM are taken from the given * emission distributions. Alternately, the dimensionality can be set with * Dimensionality(). * * The initial state probability vector should have length equal to the number * of states, and each entry represents the probability of being in the given * state at time T = 0 (the beginning of a sequence). * * The transition matrix should be such that T(i, j) is the probability of * transition to state i from state j. The columns of the matrix should sum * to 1. * * The emission matrix should be such that E(i, j) is the probability of * emission i while in state j. The columns of the matrix should sum to 1. * * Optionally, the tolerance for convergence of the Baum-Welch algorithm can * be set. * * @param initial Initial state probabilities. * @param transition Transition matrix. * @param emission Emission distributions. * @param tolerance Tolerance for convergence of training algorithm * (Baum-Welch). */ HMM(const arma::vec& initial, const arma::mat& transition, const std::vector& emission, const double tolerance = 1e-5); /** * Train the model using the Baum-Welch algorithm, with only the given * unlabeled observations. Instead of giving a guess transition and emission * matrix here, do that in the constructor. 
Each matrix in the vector of data * sequences holds an individual data sequence; each point in each individual * data sequence should be a column in the matrix. The number of rows in each * matrix should be equal to the dimensionality of the HMM (which is set in * the constructor). * * It is preferable to use the other overload of Train(), with labeled data. * That will produce much better results. However, if labeled data is * unavailable, this will work. In addition, it is possible to use Train() * with labeled data first, and then continue to train the model using this * overload of Train() with unlabeled data. * * The tolerance of the Baum-Welch algorithm can be set either in the * constructor or with the Tolerance() method. When the change in * log-likelihood of the model between iterations is less than the tolerance, * the Baum-Welch algorithm terminates. * * @note * Train() can be called multiple times with different sequences; each time it * is called, it uses the current parameters of the HMM as a starting point * for training. * @endnote * * @param dataSeq Vector of observation sequences. */ void Train(const std::vector& dataSeq); /** * Train the model using the given labeled observations; the transition and * emission matrices are directly estimated. Each matrix in the vector of * data sequences corresponds to a vector in the vector of state sequences. * Each point in each individual data sequence should be a column in the * matrix, and its state should be the corresponding element in the state * sequence vector. For instance, dataSeq[0].col(3) corresponds to the fourth * observation in the first data sequence, and its state is stateSeq[0][3]. * The number of rows in each matrix should be equal to the dimensionality of * the HMM (which is set in the constructor). * * @note * Train() can be called multiple times with different sequences; each time it * is called, it uses the current parameters of the HMM as a starting point * for training. * @endnote * * @param dataSeq Vector of observation sequences. * @param stateSeq Vector of state sequences, corresponding to each * observation. */ void Train(const std::vector& dataSeq, const std::vector >& stateSeq); /** * Estimate the probabilities of each hidden state at each time step for each * given data observation, using the Forward-Backward algorithm. Each matrix * which is returned has columns equal to the number of data observations, and * rows equal to the number of hidden states in the model. The log-likelihood * of the most probable sequence is returned. * * @param dataSeq Sequence of observations. * @param stateProb Matrix in which the probabilities of each state at each * time interval will be stored. * @param forwardProb Matrix in which the forward probabilities of each state * at each time interval will be stored. * @param backwardProb Matrix in which the backward probabilities of each * state at each time interval will be stored. * @param scales Vector in which the scaling factors at each time interval * will be stored. * @return Log-likelihood of most likely state sequence. */ double Estimate(const arma::mat& dataSeq, arma::mat& stateProb, arma::mat& forwardProb, arma::mat& backwardProb, arma::vec& scales) const; /** * Estimate the probabilities of each hidden state at each time step of each * given data observation, using the Forward-Backward algorithm. The returned * matrix of state probabilities has columns equal to the number of data * observations, and rows equal to the number of hidden states in the model. 
* The log-likelihood of the most probable sequence is returned. * * @param dataSeq Sequence of observations. * @param stateProb Probabilities of each state at each time interval. * @return Log-likelihood of most likely state sequence. */ double Estimate(const arma::mat& dataSeq, arma::mat& stateProb) const; /** * Generate a random data sequence of the given length. The data sequence is * stored in the dataSequence parameter, and the state sequence is stored in * the stateSequence parameter. Each column of dataSequence represents a * random observation. * * @param length Length of random sequence to generate. * @param dataSequence Vector to store data in. * @param stateSequence Vector to store states in. * @param startState Hidden state to start sequence in (default 0). */ void Generate(const size_t length, arma::mat& dataSequence, arma::Col& stateSequence, const size_t startState = 0) const; /** * Compute the most probable hidden state sequence for the given data * sequence, using the Viterbi algorithm, returning the log-likelihood of the * most likely state sequence. * * @param dataSeq Sequence of observations. * @param stateSeq Vector in which the most probable state sequence will be * stored. * @return Log-likelihood of most probable state sequence. */ double Predict(const arma::mat& dataSeq, arma::Col& stateSeq) const; /** * Compute the log-likelihood of the given data sequence. * * @param dataSeq Data sequence to evaluate the likelihood of. * @return Log-likelihood of the given sequence. */ double LogLikelihood(const arma::mat& dataSeq) const; //! Return the vector of initial state probabilities. const arma::vec& Initial() const { return initial; } //! Modify the vector of initial state probabilities. arma::vec& Initial() { return initial; } //! Return the transition matrix. const arma::mat& Transition() const { return transition; } //! Return a modifiable transition matrix reference. arma::mat& Transition() { return transition; } //! Return the emission distributions. const std::vector& Emission() const { return emission; } //! Return a modifiable emission probability matrix reference. std::vector& Emission() { return emission; } //! Get the dimensionality of observations. size_t Dimensionality() const { return dimensionality; } //! Set the dimensionality of observations. size_t& Dimensionality() { return dimensionality; } //! Get the tolerance of the Baum-Welch algorithm. double Tolerance() const { return tolerance; } //! Modify the tolerance of the Baum-Welch algorithm. double& Tolerance() { return tolerance; } /** * Returns a string representation of this object. */ std::string ToString() const; private: // Helper functions. /** * The Forward algorithm (part of the Forward-Backward algorithm). Computes * forward probabilities for each state for each observation in the given data * sequence. The returned matrix has rows equal to the number of hidden * states and columns equal to the number of observations. * * @param dataSeq Data sequence to compute probabilities for. * @param scales Vector in which scaling factors will be saved. * @param forwardProb Matrix in which forward probabilities will be saved. */ void Forward(const arma::mat& dataSeq, arma::vec& scales, arma::mat& forwardProb) const; /** * The Backward algorithm (part of the Forward-Backward algorithm). Computes * backward probabilities for each state for each observation in the given * data sequence, using the scaling factors found (presumably) by Forward(). 
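   * In scaled form, b_j(T - 1) = 1 and
   *
   *   b_j(t) = sum_i T(i, j) * P(o_{t + 1} | i) * b_i(t + 1) / scales[t + 1],
   *
   * where T(i, j) is the probability of transitioning to state i from state j.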
* The returned matrix has rows equal to the number of hidden states and * columns equal to the number of observations. * * @param dataSeq Data sequence to compute probabilities for. * @param scales Vector of scaling factors. * @param backwardProb Matrix in which backward probabilities will be saved. */ void Backward(const arma::mat& dataSeq, const arma::vec& scales, arma::mat& backwardProb) const; //! Initial state probability vector. arma::vec initial; //! Transition probability matrix. arma::mat transition; //! Set of emission probability distributions; one for each state. std::vector emission; //! Dimensionality of observations. size_t dimensionality; //! Tolerance of Baum-Welch algorithm. double tolerance; }; }; // namespace hmm }; // namespace mlpack // Include implementation. #include "hmm_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/hmm/hmm_util_impl.hpp0000644000176200001440000001560113647512751023527 0ustar liggesusers/** * @file hmm_util_impl.hpp * @author Ryan Curtin * * Implementation of HMM load/save functions. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_HMM_HMM_UTIL_IMPL_HPP #define __MLPACK_METHODS_HMM_HMM_UTIL_IMPL_HPP // In case it hasn't already been included. #include "hmm_util.hpp" #include namespace mlpack { namespace hmm { template void SaveHMM(const HMM& hmm, util::SaveRestoreUtility& sr) { Rcpp::Rcout << "HMM save not implemented for arbitrary distributions." << std::endl; } template<> void SaveHMM(const HMM& hmm, util::SaveRestoreUtility& sr) { std::string type = "discrete"; size_t states = hmm.Transition().n_rows; sr.SaveParameter(type, "hmm_type"); sr.SaveParameter(states, "hmm_states"); sr.SaveParameter(hmm.Transition(), "hmm_transition"); // Now the emissions. for (size_t i = 0; i < states; ++i) { // Generate name. std::stringstream s; s << "hmm_emission_distribution_" << i; sr.SaveParameter(hmm.Emission()[i].Probabilities(), s.str()); } } template<> void SaveHMM(const HMM& hmm, util::SaveRestoreUtility& sr) { std::string type = "gaussian"; size_t states = hmm.Transition().n_rows; sr.SaveParameter(type, "hmm_type"); sr.SaveParameter(states, "hmm_states"); sr.SaveParameter(hmm.Transition(), "hmm_transition"); // Now the emissions. for (size_t i = 0; i < states; ++i) { // Generate name. std::stringstream s; s << "hmm_emission_mean_" << i; sr.SaveParameter(hmm.Emission()[i].Mean(), s.str()); s.str(""); s << "hmm_emission_covariance_" << i; sr.SaveParameter(hmm.Emission()[i].Covariance(), s.str()); } } template<> void SaveHMM(const HMM >& hmm, util::SaveRestoreUtility& sr) { std::string type = "gmm"; size_t states = hmm.Transition().n_rows; sr.SaveParameter(type, "hmm_type"); sr.SaveParameter(states, "hmm_states"); sr.SaveParameter(hmm.Transition(), "hmm_transition"); // Now the emissions. for (size_t i = 0; i < states; ++i) { // Generate name. 
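      // The parameters of state i go under hierarchical keys of the form
      // "hmm_emission_<i>_gaussians", "hmm_emission_<i>_weights", and
      // "hmm_emission_<i>_gaussian_<g>_mean" / "..._covariance", mirroring
      // exactly the keys that LoadHMM() reads back further below.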
std::stringstream s; s << "hmm_emission_" << i << "_gaussians"; sr.SaveParameter(hmm.Emission()[i].Gaussians(), s.str()); s.str(""); s << "hmm_emission_" << i << "_weights"; sr.SaveParameter(hmm.Emission()[i].Weights(), s.str()); for (size_t g = 0; g < hmm.Emission()[i].Gaussians(); ++g) { s.str(""); s << "hmm_emission_" << i << "_gaussian_" << g << "_mean"; sr.SaveParameter(hmm.Emission()[i].Means()[g], s.str()); s.str(""); s << "hmm_emission_" << i << "_gaussian_" << g << "_covariance"; sr.SaveParameter(hmm.Emission()[i].Covariances()[g], s.str()); } } } template void LoadHMM(HMM& hmm, util::SaveRestoreUtility& sr) { Rcpp::Rcout << "HMM load not implemented for arbitrary distributions." << std::endl; } template<> void LoadHMM(HMM& hmm, util::SaveRestoreUtility& sr) { std::string type; size_t states; sr.LoadParameter(type, "hmm_type"); if (type != "discrete") { Rcpp::Rcout << "Cannot load non-discrete HMM (of type " << type << ") as " << "discrete HMM!" << std::endl; } sr.LoadParameter(states, "hmm_states"); // Load transition matrix. sr.LoadParameter(hmm.Transition(), "hmm_transition"); // Now each emission distribution. hmm.Emission().resize(states); for (size_t i = 0; i < states; ++i) { std::stringstream s; s << "hmm_emission_distribution_" << i; sr.LoadParameter(hmm.Emission()[i].Probabilities(), s.str()); } hmm.Dimensionality() = 1; } template<> void LoadHMM(HMM& hmm, util::SaveRestoreUtility& sr) { std::string type; size_t states; sr.LoadParameter(type, "hmm_type"); if (type != "gaussian") { Rcpp::Rcout << "Cannot load non-Gaussian HMM (of type " << type << ") as " << "a Gaussian HMM!" << std::endl; } sr.LoadParameter(states, "hmm_states"); // Load transition matrix. sr.LoadParameter(hmm.Transition(), "hmm_transition"); // Now each emission distribution. hmm.Emission().resize(states); for (size_t i = 0; i < states; ++i) { std::stringstream s; s << "hmm_emission_mean_" << i; sr.LoadParameter(hmm.Emission()[i].Mean(), s.str()); s.str(""); s << "hmm_emission_covariance_" << i; sr.LoadParameter(hmm.Emission()[i].Covariance(), s.str()); } hmm.Dimensionality() = hmm.Emission()[0].Mean().n_elem; } template<> void LoadHMM(HMM >& hmm, util::SaveRestoreUtility& sr) { std::string type; size_t states; sr.LoadParameter(type, "hmm_type"); if (type != "gmm") { Rcpp::Rcout << "Cannot load non-GMM HMM (of type " << type << ") as " << "a Gaussian Mixture Model HMM!" << std::endl; } sr.LoadParameter(states, "hmm_states"); // Load transition matrix. sr.LoadParameter(hmm.Transition(), "hmm_transition"); // Now each emission distribution. hmm.Emission().resize(states, gmm::GMM<>(1, 1)); for (size_t i = 0; i < states; ++i) { std::stringstream s; s << "hmm_emission_" << i << "_gaussians"; size_t gaussians; sr.LoadParameter(gaussians, s.str()); s.str(""); // Extract dimensionality. arma::vec meanzero; s << "hmm_emission_" << i << "_gaussian_0_mean"; sr.LoadParameter(meanzero, s.str()); size_t dimensionality = meanzero.n_elem; // Initialize GMM correctly. 
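    // The component count and dimensionality recovered above are applied
    // first, so that the means and covariances loaded next have a correctly
    // sized mixture to go into.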
hmm.Emission()[i].Gaussians() = gaussians; hmm.Emission()[i].Dimensionality() = dimensionality; for (size_t g = 0; g < gaussians; ++g) { s.str(""); s << "hmm_emission_" << i << "_gaussian_" << g << "_mean"; sr.LoadParameter(hmm.Emission()[i].Means()[g], s.str()); s.str(""); s << "hmm_emission_" << i << "_gaussian_" << g << "_covariance"; sr.LoadParameter(hmm.Emission()[i].Covariances()[g], s.str()); } s.str(""); s << "hmm_emission_" << i << "_weights"; sr.LoadParameter(hmm.Emission()[i].Weights(), s.str()); } hmm.Dimensionality() = hmm.Emission()[0].Dimensionality(); } }; // namespace hmm }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/hmm/hmm_util.hpp0000644000176200001440000000327313647512751022510 0ustar liggesusers/** * @file hmm_util.hpp * @author Ryan Curtin * * Save/load utilities for HMMs. This should be eventually merged into the HMM * class itself. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_HMM_HMM_UTIL_HPP #define __MLPACK_METHODS_HMM_HMM_UTIL_HPP #include "hmm.hpp" namespace mlpack { namespace hmm { /** * Save an HMM to file. This only works for GMMs, DiscreteDistributions, and * GaussianDistributions. * * @tparam Distribution Distribution type of HMM. * @param sr SaveRestoreUtility to use. */ template void SaveHMM(const HMM& hmm, util::SaveRestoreUtility& sr); /** * Load an HMM from file. This only works for GMMs, DiscreteDistributions, and * GaussianDistributions. * * @tparam Distribution Distribution type of HMM. * @param sr SaveRestoreUtility to use. */ template void LoadHMM(HMM& hmm, util::SaveRestoreUtility& sr); }; // namespace hmm }; // namespace mlpack // Include implementation. #include "hmm_util_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/hmm/hmm_impl.hpp0000644000176200001440000004601113647512751022471 0ustar liggesusers/** * @file hmm_impl.hpp * @author Ryan Curtin * @author Tran Quoc Long * * Implementation of HMM class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_HMM_HMM_IMPL_HPP #define __MLPACK_METHODS_HMM_HMM_IMPL_HPP // Just in case... #include "hmm.hpp" namespace mlpack { namespace hmm { /** * Create the Hidden Markov Model with the given number of hidden states and the * given number of emission states. 
*/ template HMM::HMM(const size_t states, const Distribution emissions, const double tolerance) : initial(arma::ones(states) / (double) states), transition(arma::ones(states, states) / (double) states), emission(states, /* default distribution */ emissions), dimensionality(emissions.Dimensionality()), tolerance(tolerance) { /* nothing to do */ } /** * Create the Hidden Markov Model with the given transition matrix and the given * emission probability matrix. */ template HMM::HMM(const arma::vec& initial, const arma::mat& transition, const std::vector& emission, const double tolerance) : initial(initial), transition(transition), emission(emission), tolerance(tolerance) { // Set the dimensionality, if we can. if (emission.size() > 0) dimensionality = emission[0].Dimensionality(); else { Rcpp::Rcout << "HMM::HMM(): no emission distributions given; assuming a " << "dimensionality of 0 and hoping it gets set right later." << std::endl; dimensionality = 0; } } /** * Train the model using the Baum-Welch algorithm, with only the given unlabeled * observations. Each matrix in the vector of data sequences holds an * individual data sequence; each point in each individual data sequence should * be a column in the matrix. The number of rows in each matrix should be equal * to the dimensionality of the HMM. * * It is preferable to use the other overload of Train(), with labeled data. * That will produce much better results. However, if labeled data is * unavailable, this will work. In addition, it is possible to use Train() with * labeled data first, and then continue to train the model using this overload * of Train() with unlabeled data. * * @param dataSeq Set of data sequences to train on. */ template void HMM::Train(const std::vector& dataSeq) { // We should allow a guess at the transition and emission matrices. double loglik = 0; double oldLoglik = 0; // Maximum iterations? size_t iterations = 1000; // Find length of all sequences and ensure they are the correct size. size_t totalLength = 0; for (size_t seq = 0; seq < dataSeq.size(); seq++) { totalLength += dataSeq[seq].n_cols; if (dataSeq[seq].n_rows != dimensionality) Rcpp::Rcout << "HMM::Train(): data sequence " << seq << " has " << "dimensionality " << dataSeq[seq].n_rows << " (expected " << dimensionality << " dimensions)." << std::endl; } // These are used later for training of each distribution. We initialize it // all now so we don't have to do any allocation later on. std::vector emissionProb(transition.n_cols, arma::vec(totalLength)); arma::mat emissionList(dimensionality, totalLength); // This should be the Baum-Welch algorithm (EM for HMM estimation). This // follows the procedure outlined in Elliot, Aggoun, and Moore's book "Hidden // Markov Models: Estimation and Control", pp. 36-40. for (size_t iter = 0; iter < iterations; iter++) { // Clear new transition matrix and emission probabilities. arma::vec newInitial(transition.n_rows); newInitial.zeros(); arma::mat newTransition(transition.n_rows, transition.n_cols); newTransition.zeros(); // Reset log likelihood. loglik = 0; // Sum over time. size_t sumTime = 0; // Loop over each sequence. for (size_t seq = 0; seq < dataSeq.size(); seq++) { arma::mat stateProb; arma::mat forward; arma::mat backward; arma::vec scales; // Add the log-likelihood of this sequence. This is the E-step. loglik += Estimate(dataSeq[seq], stateProb, forward, backward, scales); // Now re-estimate the parameters. This is the M-step. 
// pi_i = sum_d ((1 / P(seq[d])) sum_t (f(i, 0) b(i, 0))
      // T_ij = sum_d ((1 / P(seq[d])) sum_t (f(i, t) T_ij E_i(seq[d][t]) b(i,
      //           t + 1)))
      // E_ij = sum_d ((1 / P(seq[d])) sum_{t | seq[d][t] = j} f(i, t) b(i, t)
      // We store the new estimates in a different matrix.

      // Add this sequence's contribution to the estimate of the initial
      // probabilities; only the state probabilities at t = 0 are relevant.
      for (size_t j = 0; j < transition.n_cols; j++)
        newInitial[j] += stateProb(j, 0);

      for (size_t t = 0; t < dataSeq[seq].n_cols; t++)
      {
        for (size_t j = 0; j < transition.n_cols; j++)
        {
          if (t < dataSeq[seq].n_cols - 1)
          {
            // Estimate of T_ij (probability of transition from state j to
            // state i).  We postpone multiplication of the old T_ij until
            // later.
            for (size_t i = 0; i < transition.n_rows; i++)
              newTransition(i, j) += forward(j, t) * backward(i, t + 1) *
                  emission[i].Probability(dataSeq[seq].unsafe_col(t + 1)) /
                  scales[t + 1];
          }

          // Add to the emission probabilities, for Distribution::Estimate().
          emissionProb[j][sumTime] = stateProb(j, t);
        }

        // Add to the list of emission observations.
        emissionList.col(sumTime) = dataSeq[seq].col(t);
        sumTime++;
      }
    }

    // Normalize the new initial probabilities (averaging over the number of
    // sequences); guard against division by zero when no sequences are given.
    if (dataSeq.size() > 0)
      initial = newInitial / dataSeq.size();

    // Assign the new transition matrix.  We use %= (element-wise
    // multiplication) because every element of the new transition matrix must
    // still be multiplied by the old elements (this is the multiplication we
    // earlier postponed).
    transition %= newTransition;

    // Now we normalize the transition matrix.
    for (size_t i = 0; i < transition.n_cols; i++)
      transition.col(i) /= accu(transition.col(i));

    // Now estimate emission probabilities.
    for (size_t state = 0; state < transition.n_cols; state++)
      emission[state].Estimate(emissionList, emissionProb[state]);

    Rcpp::Rcout << "Iteration " << iter << ": log-likelihood " << loglik
        << std::endl;

    if (std::abs(oldLoglik - loglik) < tolerance)
    {
      Rcpp::Rcout << "Converged after " << iter << " iterations." << std::endl;
      break;
    }

    oldLoglik = loglik;
  }
}

/**
 * Train the model using the given labeled observations; the transition and
 * emission matrices are directly estimated.
 */
template<typename Distribution>
void HMM<Distribution>::Train(const std::vector<arma::mat>& dataSeq,
                              const std::vector<arma::Col<size_t> >& stateSeq)
{
  // Simple error checking.
  if (dataSeq.size() != stateSeq.size())
  {
    Rcpp::Rcout << "HMM::Train(): number of data sequences ("
        << dataSeq.size() << ") not equal to number of state sequences ("
        << stateSeq.size() << ")." << std::endl;
  }

  initial.zeros();
  transition.zeros();

  // Estimate the transition and emission matrices directly from the
  // observations.  The emission list holds the time indices for observations
  // from each state.
  std::vector<std::vector<std::pair<size_t, size_t> > >
      emissionList(transition.n_cols);
  for (size_t seq = 0; seq < dataSeq.size(); seq++)
  {
    // Simple error checking.
    if (dataSeq[seq].n_cols != stateSeq[seq].n_elem)
    {
      Rcpp::Rcout << "HMM::Train(): number of observations ("
          << dataSeq[seq].n_cols << ") in sequence " << seq
          << " not equal to number of states (" << stateSeq[seq].n_elem
          << ") in sequence " << seq << "." << std::endl;
    }

    if (dataSeq[seq].n_rows != dimensionality)
    {
      Rcpp::Rcout << "HMM::Train(): data sequence " << seq << " has "
          << "dimensionality " << dataSeq[seq].n_rows << " (expected "
          << dimensionality << " dimensions)." << std::endl;
    }

    // Loop over each observation in the sequence.  For estimation of the
    // transition matrix, we must ignore the last observation.
    initial[stateSeq[seq][0]]++;
    for (size_t t = 0; t < dataSeq[seq].n_cols - 1; t++)
    {
      transition(stateSeq[seq][t + 1], stateSeq[seq][t])++;
      emissionList[stateSeq[seq][t]].push_back(std::make_pair(seq, t));
    }

    // Last observation.
emissionList[stateSeq[seq][stateSeq[seq].n_elem - 1]].push_back( std::make_pair(seq, stateSeq[seq].n_elem - 1)); } // Normalize initial weights. initial /= accu(initial); // Normalize transition matrix. for (size_t col = 0; col < transition.n_cols; col++) { // If the transition probability sum is greater than 0 in this column, the // emission probability sum will also be greater than 0. We want to avoid // division by 0. double sum = accu(transition.col(col)); if (sum > 0) transition.col(col) /= sum; } // Estimate emission matrix. for (size_t state = 0; state < transition.n_cols; state++) { // Generate full sequence of observations for this state from the list of // emissions that are from this state. arma::mat emissions(dimensionality, emissionList[state].size()); for (size_t i = 0; i < emissions.n_cols; i++) { emissions.col(i) = dataSeq[emissionList[state][i].first].col( emissionList[state][i].second); } emission[state].Estimate(emissions); } } /** * Estimate the probabilities of each hidden state at each time step for each * given data observation. */ template double HMM::Estimate(const arma::mat& dataSeq, arma::mat& stateProb, arma::mat& forwardProb, arma::mat& backwardProb, arma::vec& scales) const { // First run the forward-backward algorithm. Forward(dataSeq, scales, forwardProb); Backward(dataSeq, scales, backwardProb); // Now assemble the state probability matrix based on the forward and backward // probabilities. stateProb = forwardProb % backwardProb; // Finally assemble the log-likelihood and return it. return accu(log(scales)); } /** * Estimate the probabilities of each hidden state at each time step for each * given data observation. */ template double HMM::Estimate(const arma::mat& dataSeq, arma::mat& stateProb) const { // We don't need to save these. arma::mat forwardProb, backwardProb; arma::vec scales; return Estimate(dataSeq, stateProb, forwardProb, backwardProb, scales); } /** * Generate a random data sequence of a given length. The data sequence is * stored in the dataSequence parameter, and the state sequence is stored in * the stateSequence parameter. */ template void HMM::Generate(const size_t length, arma::mat& dataSequence, arma::Col& stateSequence, const size_t startState) const { // Set vectors to the right size. stateSequence.set_size(length); dataSequence.set_size(dimensionality, length); // Set start state (default is 0). stateSequence[0] = startState; // Choose first emission state. double randValue = math::Random(); // We just have to find where our random value sits in the probability // distribution of emissions for our starting state. dataSequence.col(0) = emission[startState].Random(); // Now choose the states and emissions for the rest of the sequence. for (size_t t = 1; t < length; t++) { // First choose the hidden state. randValue = math::Random(); // Now find where our random value sits in the probability distribution of // state changes. double probSum = 0; for (size_t st = 0; st < transition.n_rows; st++) { probSum += transition(st, stateSequence[t - 1]); if (randValue <= probSum) { stateSequence[t] = st; break; } } // Now choose the emission. dataSequence.col(t) = emission[stateSequence[t]].Random(); } } /** * Compute the most probable hidden state sequence for the given observation * using the Viterbi algorithm. Returns the log-likelihood of the most likely * sequence. 
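 *
 * In log space, the dynamic program computed here is
 *
 *   v_j(0) = log(initial_j) + log P(o_0 | j)
 *   v_j(t) = max_i [ v_i(t - 1) + log T(j, i) ] + log P(o_t | j)
 *
 * where T(j, i) is the probability of transitioning to state j from state i;
 * backpointers are recorded so that the optimal state sequence can be
 * recovered by a final backtracking pass.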
*/ template double HMM::Predict(const arma::mat& dataSeq, arma::Col& stateSeq) const { // This is an implementation of the Viterbi algorithm for finding the most // probable sequence of states to produce the observed data sequence. We // don't use log-likelihoods to save that little bit of time, but we'll // calculate the log-likelihood at the end of it all. stateSeq.set_size(dataSeq.n_cols); arma::mat logStateProb(transition.n_rows, dataSeq.n_cols); arma::mat stateSeqBack(transition.n_rows, dataSeq.n_cols); // Store the logs of the transposed transition matrix. This is because we // will be using the rows of the transition matrix. arma::mat logTrans(log(trans(transition))); // The calculation of the first state is slightly different; the probability // of the first state being state j is the maximum probability that the state // came to be j from another state. logStateProb.col(0).zeros(); for (size_t state = 0; state < transition.n_rows; state++) { logStateProb(state, 0) = log(initial[state] * emission[state].Probability(dataSeq.unsafe_col(0))); stateSeqBack(state, 0) = state; } // Store the best first state. arma::uword index; for (size_t t = 1; t < dataSeq.n_cols; t++) { // Assemble the state probability for this element. // Given that we are in state j, we use state with the highest probability // of being the previous state. for (size_t j = 0; j < transition.n_rows; j++) { arma::vec prob = logStateProb.col(t - 1) + logTrans.col(j); logStateProb(j, t) = prob.max(index) + log(emission[j].Probability(dataSeq.unsafe_col(t))); stateSeqBack(j, t) = index; } } // Backtrack to find the most probable state sequence. logStateProb.unsafe_col(dataSeq.n_cols - 1).max(index); stateSeq[dataSeq.n_cols - 1] = index; for (size_t t = 2; t <= dataSeq.n_cols; t++) stateSeq[dataSeq.n_cols - t] = stateSeqBack(stateSeq[dataSeq.n_cols - t + 1], dataSeq.n_cols - t + 1); return logStateProb(stateSeq(dataSeq.n_cols - 1), dataSeq.n_cols - 1); } /** * Compute the log-likelihood of the given data sequence. */ template double HMM::LogLikelihood(const arma::mat& dataSeq) const { arma::mat forward; arma::vec scales; Forward(dataSeq, scales, forward); // The log-likelihood is the log of the scales for each time step. return accu(log(scales)); } /** * The Forward procedure (part of the Forward-Backward algorithm). */ template void HMM::Forward(const arma::mat& dataSeq, arma::vec& scales, arma::mat& forwardProb) const { // Our goal is to calculate the forward probabilities: // P(X_k | o_{1:k}) for all possible states X_k, for each time point k. forwardProb.zeros(transition.n_rows, dataSeq.n_cols); scales.zeros(dataSeq.n_cols); // The first entry in the forward algorithm uses the initial state // probabilities. Note that MATLAB assumes that the starting state (at // t = -1) is state 0; this is not our assumption here. To force that // behavior, you could append a single starting state to every single data // sequence and that should produce results in line with MATLAB. for (size_t state = 0; state < transition.n_rows; state++) forwardProb(state, 0) = initial(state) * emission[state].Probability(dataSeq.unsafe_col(0)); // Then normalize the column. scales[0] = accu(forwardProb.col(0)); forwardProb.col(0) /= scales[0]; // Now compute the probabilities for each successive observation. 
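  // In scaled form, the recurrence implemented below is
  //
  //   f_j(t) = ( sum_i T(j, i) * f_i(t - 1) ) * P(o_t | j),
  //   scales[t] = sum_j f_j(t),   f(t) /= scales[t],
  //
  // so that the product of the scales over all t equals the likelihood of the
  // whole sequence (hence LogLikelihood() returns accu(log(scales))).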
  for (size_t t = 1; t < dataSeq.n_cols; t++)
  {
    for (size_t j = 0; j < transition.n_rows; j++)
    {
      // The forward probability of state j at time t is the sum over all
      // states of the probability of the previous state transitioning to the
      // current state and emitting the given observation.
      forwardProb(j, t) = accu(forwardProb.col(t - 1) %
          trans(transition.row(j))) *
          emission[j].Probability(dataSeq.unsafe_col(t));
    }

    // Normalize probability.
    scales[t] = accu(forwardProb.col(t));
    forwardProb.col(t) /= scales[t];
  }
}

template<typename Distribution>
void HMM<Distribution>::Backward(const arma::mat& dataSeq,
                                 const arma::vec& scales,
                                 arma::mat& backwardProb) const
{
  // Our goal is to calculate the backward probabilities:
  //   P(X_k | o_{k + 1:T}) for all possible states X_k, for each time point k.
  backwardProb.zeros(transition.n_rows, dataSeq.n_cols);

  // The probability of the last element is 1.
  backwardProb.col(dataSeq.n_cols - 1).fill(1);

  // Now step backwards through all other observations.
  for (size_t t = dataSeq.n_cols - 2; t + 1 > 0; t--)
  {
    for (size_t j = 0; j < transition.n_rows; j++)
    {
      // The backward probability of state j at time t is the sum over all
      // states of the probability of the next state having been a transition
      // from the current state multiplied by the probability of each of those
      // states emitting the given observation.
      for (size_t state = 0; state < transition.n_rows; state++)
        backwardProb(j, t) += transition(state, j) * backwardProb(state, t + 1)
            * emission[state].Probability(dataSeq.unsafe_col(t + 1));

      // Normalize by the weights from the forward algorithm.
      backwardProb(j, t) /= scales[t + 1];
    }
  }
}

template<typename Distribution>
std::string HMM<Distribution>::ToString() const
{
  std::ostringstream convert;
  convert << "HMM [" << this << "]" << std::endl;
  convert << "  Dimensionality: " << dimensionality << std::endl;
  return convert.str();
}

}; // namespace hmm
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/methods/sparse_autoencoder/sparse_autoencoder.hpp
/**
 * @file sparse_autoencoder.hpp
 * @author Siddharth Agrawal
 *
 * An implementation of sparse autoencoders.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_SPARSE_AUTOENCODER_SPARSE_AUTOENCODER_HPP
#define __MLPACK_METHODS_SPARSE_AUTOENCODER_SPARSE_AUTOENCODER_HPP

#include <mlpack/core.hpp>
#include <mlpack/core/optimizers/lbfgs/lbfgs.hpp>

#include "sparse_autoencoder_function.hpp"

namespace mlpack {
namespace nn {

/**
 * A sparse autoencoder is a neural network whose aim is to learn compressed
 * representations of the data, typically for dimensionality reduction, with a
 * constraint on the activity of the neurons in the network.  Sparse
 * autoencoders can be stacked together to learn a hierarchy of features, which
 * provide a better representation of the data for classification.  This is a
 * method used in the recently developed field of deep learning.  More
 * technical details about the model can be found on the following webpage:
 *
 * http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial
 *
 * An example of how to use the interface is shown below:
 *
 * @code
 * arma::mat data; // Data matrix.
 * const size_t vSize = 64; // Size of visible layer, depends on the data.
 * const size_t hSize = 25; // Size of hidden layer, depends on requirements.
 *
 * // Train the model using default options.
 * SparseAutoencoder<> encoder1(data, vSize, hSize);
 *
 * const size_t numBasis = 5; // Parameter required for L-BFGS algorithm.
 * const size_t numIterations = 100; // Maximum number of iterations.
 *
 * // Use an instantiated optimizer for the training.
 * SparseAutoencoderFunction saf(data, vSize, hSize);
 * L_BFGS<SparseAutoencoderFunction> optimizer(saf, numBasis, numIterations);
 * SparseAutoencoder<> encoder2(optimizer);
 *
 * arma::mat features1, features2; // Matrices for storing new representations.
 *
 * // Get new representations from the trained models.
 * encoder1.GetNewFeatures(data, features1);
 * encoder2.GetNewFeatures(data, features2);
 * @endcode
 *
 * This implementation allows the use of arbitrary mlpack optimizers via the
 * OptimizerType template parameter.
 *
 * @tparam OptimizerType The optimizer to use; by default this is L-BFGS.  Any
 *     mlpack optimizer can be used here.
 */
template<
  template<typename> class OptimizerType = mlpack::optimization::L_BFGS
>
class SparseAutoencoder
{
 public:
  /**
   * Construct the sparse autoencoder model with the given training data.  This
   * will train the model.  The parameters 'lambda', 'beta' and 'rho' can be
   * set optionally.  Changing these parameters will have an effect on
   * regularization and sparsity of the model.
   *
   * @param data Input data with each column as one example.
   * @param visibleSize Size of input vector expected at the visible layer.
   * @param hiddenSize Size of input vector expected at the hidden layer.
   * @param lambda L2-regularization parameter.
   * @param beta KL divergence parameter.
   * @param rho Sparsity parameter.
   */
  SparseAutoencoder(const arma::mat& data,
                    const size_t visibleSize,
                    const size_t hiddenSize,
                    const double lambda = 0.0001,
                    const double beta = 3,
                    const double rho = 0.01);

  /**
   * Construct the sparse autoencoder model with the given training data.  This
   * will train the model.  This overload takes an already instantiated
   * optimizer and uses it to train the model.  The optimizer should hold an
   * instantiated SparseAutoencoderFunction object for the function to operate
   * upon.  This option should be preferred when the optimizer options are to
   * be changed.
   *
   * @param optimizer Instantiated optimizer with instantiated error function.
   */
  SparseAutoencoder(OptimizerType<SparseAutoencoderFunction>& optimizer);

  /**
   * Transforms the provided data into the representation learned by the sparse
   * autoencoder.  The function basically performs a feedforward computation
   * using the learned weights, and returns the hidden layer activations.
   *
   * @param data Matrix of the provided data.
   * @param features The hidden layer representation of the provided data.
   */
  void GetNewFeatures(arma::mat& data, arma::mat& features);
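  // A note on what GetNewFeatures() computes (the block names W1 and b1 are
  // introduced here only for exposition): if W1 is the hiddenSize x
  // visibleSize encoder weight block of 'parameters' and b1 the hidden bias
  // column, then
  //
  //   features = Sigmoid(W1 * data + b1 * ones(1, n_points)),
  //
  // i.e. an affine map into the hidden layer followed by the elementwise
  // sigmoid defined below.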
  /**
   * Returns the elementwise sigmoid of the passed matrix, where the sigmoid
   * function of a real number 'x' is [1 / (1 + exp(-x))].
   *
   * @param x Matrix of real values for which we require the sigmoid
   *     activation.
   */
  void Sigmoid(const arma::mat& x, arma::mat& output) const
  {
    output = (1.0 / (1 + arma::exp(-x)));
  }

  //! Sets size of the visible layer.
  void VisibleSize(const size_t visible) { this->visibleSize = visible; }

  //! Gets size of the visible layer.
  size_t VisibleSize() const { return visibleSize; }

  //! Sets size of the hidden layer.
  void HiddenSize(const size_t hidden) { this->hiddenSize = hidden; }

  //! Gets the size of the hidden layer.
  size_t HiddenSize() const { return hiddenSize; }

  //! Sets the L2-regularization parameter.
  void Lambda(const double l) { this->lambda = l; }

  //! Gets the L2-regularization parameter.
  double Lambda() const { return lambda; }

  //! Sets the KL divergence parameter.
  void Beta(const double b) { this->beta = b; }

  //! Gets the KL divergence parameter.
  double Beta() const { return beta; }

  //! Sets the sparsity parameter.
  void Rho(const double r) { this->rho = r; }

  //! Gets the sparsity parameter.
  double Rho() const { return rho; }

 private:
  //! Parameters after optimization.
  arma::mat parameters;

  //! Size of the visible layer.
  size_t visibleSize;

  //! Size of the hidden layer.
  size_t hiddenSize;

  //! L2-regularization parameter.
  double lambda;

  //! KL divergence parameter.
  double beta;

  //!
Sparsity parameter. double rho; }; }; // namespace nn }; // namespace mlpack // Include implementation. #include "sparse_autoencoder_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/sparse_autoencoder/sparse_autoencoder_impl.hpp0000644000176200001440000000655413647512751030711 0ustar liggesusers/** * @file sparse_autoencoder_impl.hpp * @author Siddharth Agrawal * * Implementation of sparse autoencoders. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_SPARSE_AUTOENCODER_SPARSE_AUTOENCODER_IMPL_HPP #define __MLPACK_METHODS_SPARSE_AUTOENCODER_SPARSE_AUTOENCODER_IMPL_HPP // In case it hasn't been included yet. #include "sparse_autoencoder.hpp" namespace mlpack { namespace nn { template class OptimizerType> SparseAutoencoder::SparseAutoencoder(const arma::mat& data, const size_t visibleSize, const size_t hiddenSize, double lambda, double beta, double rho) : visibleSize(visibleSize), hiddenSize(hiddenSize), lambda(lambda), beta(beta), rho(rho) { SparseAutoencoderFunction encoderFunction(data, visibleSize, hiddenSize, lambda, beta, rho); OptimizerType optimizer(encoderFunction); parameters = encoderFunction.GetInitialPoint(); // Train the model. Timer::Start("sparse_autoencoder_optimization"); const double out = optimizer.Optimize(parameters); Timer::Stop("sparse_autoencoder_optimization"); Log::Info << "SparseAutoencoder::SparseAutoencoder(): final objective of " << "trained model is " << out << "." << std::endl; } template class OptimizerType> SparseAutoencoder::SparseAutoencoder( OptimizerType &optimizer) : parameters(optimizer.Function().GetInitialPoint()), visibleSize(optimizer.Function().VisibleSize()), hiddenSize(optimizer.Function().HiddenSize()), lambda(optimizer.Function().Lambda()), beta(optimizer.Function().Beta()), rho(optimizer.Function().Rho()) { Timer::Start("sparse_autoencoder_optimization"); const double out = optimizer.Optimize(parameters); Timer::Stop("sparse_autoencoder_optimization"); Log::Info << "SparseAutoencoder::SparseAutoencoder(): final objective of " << "trained model is " << out << "." << std::endl; } template class OptimizerType> void SparseAutoencoder::GetNewFeatures(arma::mat& data, arma::mat& features) { const size_t l1 = hiddenSize; const size_t l2 = visibleSize; Sigmoid(parameters.submat(0, 0, l1 - 1, l2 - 1) * data + arma::repmat(parameters.submat(0, l2, l1 - 1, l2), 1, data.n_cols), features); } }; // namespace nn }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/sparse_autoencoder/sparse_autoencoder_function.hpp0000644000176200001440000001211113647512751031557 0ustar liggesusers/** * @file sparse_autoencoder_function.hpp * @author Siddharth Agrawal * * The function to be optimized for sparse autoencoders. Any mlpack optimizer * can be used. * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_SPARSE_AUTOENCODER_SPARSE_AUTOENCODER_FUNCTION_HPP #define __MLPACK_METHODS_SPARSE_AUTOENCODER_SPARSE_AUTOENCODER_FUNCTION_HPP #include namespace mlpack { namespace nn { /** * This is a class for the sparse autoencoder objective function. It can be used * to create learning models like self-taught learning, stacked autoencoders, * conditional random fields (CRFs), and so forth. */ class SparseAutoencoderFunction { public: /** * Construct the sparse autoencoder objective function with the given * parameters. * * @param data The data matrix. * @param visibleSize Size of input vector expected at the visible layer. * @param hiddenSize Size of input vector expected at the hidden layer. * @param lambda L2-regularization parameter. * @param beta KL divergence parameter. * @param rho Sparsity parameter. */ SparseAutoencoderFunction(const arma::mat& data, const size_t visibleSize, const size_t hiddenSize, const double lambda = 0.0001, const double beta = 3, const double rho = 0.01); //! Initializes the parameters of the model to suitable values. const arma::mat InitializeWeights(); /** * Evaluates the objective function of the sparse autoencoder model using the * given parameters. The cost function has terms for the reconstruction * error, regularization cost and the sparsity cost. The objective function * takes a low value when the model is able to reconstruct the data well * using weights which are low in value and when the average activations of * neurons in the hidden layers agrees well with the sparsity parameter 'rho'. * * @param parameters Current values of the model parameters. */ double Evaluate(const arma::mat& parameters) const; /** * Evaluates the gradient values of the objective function given the current * set of parameters. The function performs a feedforward pass and computes * the error in reconstructing the data points. It then uses the * backpropagation algorithm to compute the gradient values. * * @param parameters Current values of the model parameters. * @param gradient Matrix where gradient values will be stored. */ void Gradient(const arma::mat& parameters, arma::mat& gradient) const; /** * Returns the elementwise sigmoid of the passed matrix, where the sigmoid * function of a real number 'x' is [1 / (1 + exp(-x))]. * * @param x Matrix of real values for which we require the sigmoid activation. */ void Sigmoid(const arma::mat& x, arma::mat& output) const { output = (1.0 / (1 + arma::exp(-x))); } //! Return the initial point for the optimization. const arma::mat& GetInitialPoint() const { return initialPoint; } //! Sets size of the visible layer. void VisibleSize(const size_t visible) { this->visibleSize = visible; } //! Gets size of the visible layer. size_t VisibleSize() const { return visibleSize; } //! Sets size of the hidden layer. void HiddenSize(const size_t hidden) { this->hiddenSize = hidden; } //! Gets the size of the hidden layer. 
size_t HiddenSize() const { return hiddenSize; } //! Sets the L2-regularization parameter. void Lambda(const double l) { this->lambda = l; } //! Gets the L2-regularization parameter. double Lambda() const { return lambda; } //! Sets the KL divergence parameter. void Beta(const double b) { this->beta = b; } //! Gets the KL divergence parameter. double Beta() const { return beta; } //! Sets the sparsity parameter. void Rho(const double r) { this->rho = r; } //! Gets the sparsity parameter. double Rho() const { return rho; } private: //! The matrix of data points. const arma::mat& data; //! Intial parameter vector. arma::mat initialPoint; //! Size of the visible layer. size_t visibleSize; //! Size of the hidden layer. size_t hiddenSize; //! L2-regularization parameter. double lambda; //! KL divergence parameter. double beta; //! Sparsity parameter. double rho; }; }; // namespace nn }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/regularized_svd/0000755000176200001440000000000013647512751022564 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/regularized_svd/regularized_svd.hpp0000644000176200001440000000532413647512751026472 0ustar liggesusers/** * @file regularized_svd.hpp * @author Siddharth Agrawal * * An implementation of Regularized SVD. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_REGULARIZED_SVD_REGULARIZED_SVD_HPP #define __MLPACK_METHODS_REGULARIZED_SVD_REGULARIZED_SVD_HPP #include #include #include "regularized_svd_function.hpp" namespace mlpack { namespace svd { template< template class OptimizerType = mlpack::optimization::SGD > class RegularizedSVD { public: /** * Constructor for Regularized SVD. Obtains the user and item matrices after * training on the passed data. The constructor initiates an object of class * RegularizedSVDFunction for optimization. It uses the SGD optimizer by * default. The optimizer uses a template specialization of Optimize(). * * @param data Dataset for which SVD is calculated. * @param u User matrix in the matrix decomposition. * @param v Item matrix in the matrix decomposition. * @param rank Rank used for matrix factorization. * @param iterations Number of optimization iterations. * @param lambda Regularization parameter for the optimization. */ RegularizedSVD(const arma::mat& data, arma::mat& u, arma::mat& v, const size_t rank, const size_t iterations = 10, const double alpha = 0.01, const double lambda = 0.02); private: //! Rating data. const arma::mat& data; //! Rank used for matrix factorization. size_t rank; //! Number of optimization iterations. size_t iterations; //! Learning rate for the SGD optimizer. double alpha; //! Regularization parameter for the optimization. double lambda; //! Function that will be held by the optimizer. RegularizedSVDFunction rSVDFunc; //! Default SGD optimizer for the class. 
mlpack::optimization::SGD optimizer; }; }; // namespace svd }; // namespace mlpack // Include implementation. #include "regularized_svd_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/regularized_svd/regularized_svd_function.hpp0000644000176200001440000001015313647512751030373 0ustar liggesusers/** * @file regularized_svd_function.hpp * @author Siddharth Agrawal * * An implementation of the RegularizedSVDFunction class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_REGULARIZED_SVD_REGULARIZED_FUNCTION_SVD_HPP #define __MLPACK_METHODS_REGULARIZED_SVD_REGULARIZED_FUNCTION_SVD_HPP #include #include namespace mlpack { namespace svd { class RegularizedSVDFunction { public: /** * Constructor for RegularizedSVDFunction class. The constructor calculates * the number of users and items in the passed data. It also randomly * initializes the parameter values. * * @param data Dataset for which SVD is calculated. * @param rank Rank used for matrix factorization. * @param lambda Regularization parameter used for optimization. */ RegularizedSVDFunction(const arma::mat& data, const size_t rank, const double lambda); /** * Evaluates the cost function over all examples in the data. * * @param parameters Parameters(user/item matrices) of the decomposition. */ double Evaluate(const arma::mat& parameters) const; /** * Evaluates the cost function for one training example. Useful for the SGD * optimizer abstraction which uses one training example at a time. * * @param parameters Parameters(user/item matrices) of the decomposition. * @param i Index of the training example to be used. */ double Evaluate(const arma::mat& parameters, const size_t i) const; /** * Evaluates the full gradient of the cost function over all the training * examples. * * @param parameters Parameters(user/item matrices) of the decomposition. * @param gradient Calculated gradient for the parameters. */ void Gradient(const arma::mat& parameters, arma::mat& gradient) const; //! Return the initial point for the optimization. const arma::mat& GetInitialPoint() const { return initialPoint; } //! Return the dataset passed into the constructor. const arma::mat& Dataset() const { return data; } //! Return the number of training examples. Useful for SGD optimizer. size_t NumFunctions() const { return data.n_cols; } //! Return the number of users in the data. size_t NumUsers() const { return numUsers; } //! Return the number of items in the data. size_t NumItems() const { return numItems; } //! Return the regularization parameters. double Lambda() const { return lambda; } //! Return the rank used for the factorization. size_t Rank() const { return rank; } private: //! Rating data. const arma::mat& data; //! Initial parameter point. arma::mat initialPoint; //! Rank used for matrix factorization. size_t rank; //! Regularization parameter for the optimization. double lambda; //! Number of users in the given dataset. 
size_t numUsers; //! Number of items in the given dataset. size_t numItems; }; }; // namespace svd }; // namespace mlpack namespace mlpack { namespace optimization { /** * Template specialization for SGD optimizer. Used because the gradient * affects only a small number of parameters per example, and thus the normal * abstraction does not work as fast as we might like it to. */ template<> double SGD::Optimize( arma::mat& parameters); }; // namespace optimization }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/regularized_svd/regularized_svd_impl.hpp0000644000176200001440000000422213647512751027507 0ustar liggesusers/** * @file regularized_svd_impl.hpp * @author Siddharth Agrawal * * An implementation of Regularized SVD. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_REGULARIZED_SVD_REGULARIZED_SVD_IMPL_HPP #define __MLPACK_METHODS_REGULARIZED_SVD_REGULARIZED_SVD_IMPL_HPP namespace mlpack { namespace svd { template class OptimizerType> RegularizedSVD::RegularizedSVD(const arma::mat& data, arma::mat& u, arma::mat& v, const size_t rank, const size_t iterations, const double alpha, const double lambda) : data(data), rank(rank), iterations(iterations), alpha(alpha), lambda(lambda), rSVDFunc(data, rank, lambda), optimizer(rSVDFunc, alpha, iterations * data.n_cols) { arma::mat parameters = rSVDFunc.GetInitialPoint(); // Train the model. Timer::Start("regularized_svd_optimization"); const double out = optimizer.Optimize(parameters); Timer::Stop("regularized_svd_optimization"); const size_t numUsers = max(data.row(0)) + 1; const size_t numItems = max(data.row(1)) + 1; u = parameters.submat(0, 0, rank - 1, numUsers - 1); v = parameters.submat(0, numUsers, rank - 1, numUsers + numItems - 1); } }; // namespace svd }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/logistic_regression/0000755000176200001440000000000013647512751023450 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/logistic_regression/logistic_regression_function.hpp0000644000176200001440000001254613647512751032153 0ustar liggesusers/** * @file logistic_regression_function.hpp * @author Sumedh Ghaisas * * Implementation of the logistic regression function, which is meant to be * optimized by a separate optimizer class that takes LogisticRegressionFunction * as its FunctionType class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). 
* * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_FUNCTION_HPP #define __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_FUNCTION_HPP #include namespace mlpack { namespace regression { /** * The log-likelihood function for the logistic regression objective function. * This is used by various mlpack optimizers to train a logistic regression * model. */ class LogisticRegressionFunction { public: LogisticRegressionFunction(const arma::mat& predictors, const arma::vec& responses, const double lambda = 0); LogisticRegressionFunction(const arma::mat& predictors, const arma::vec& responses, const arma::mat& initialPoint, const double lambda = 0); //! Return the initial point for the optimization. const arma::mat& InitialPoint() const { return initialPoint; } //! Modify the initial point for the optimization. arma::mat& InitialPoint() { return initialPoint; } //! Return the regularization parameter (lambda). const double& Lambda() const { return lambda; } //! Modify the regularization parameter (lambda). double& Lambda() { return lambda; } //! Return the matrix of predictors. const arma::mat& Predictors() const { return predictors; } //! Return the vector of responses. const arma::vec& Responses() const { return responses; } /** * Evaluate the logistic regression log-likelihood function with the given * parameters. Note that if a point has 0 probability of being classified * directly with the given parameters, then Evaluate() will return nan (this * is kind of a corner case and should not happen for reasonable models). * * The optimum (minimum) of this function is 0.0, and occurs when each point * is classified correctly with very high probability. * * @param parameters Vector of logistic regression parameters. */ double Evaluate(const arma::mat& parameters) const; /** * Evaluate the logistic regression log-likelihood function with the given * parameters, but using only one data point. This is useful for optimizers * such as SGD, which require a separable objective function. Note that if * the point has 0 probability of being classified correctly with the given * parameters, then Evaluate() will return nan (this is kind of a corner case * and should not happen for reasonable models). * * The optimum (minimum) of this function is 0.0, and occurs when the point is * classified correctly with very high probability. * * @param parameters Vector of logistic regression parameters. * @param i Index of point to use for objective function evaluation. */ double Evaluate(const arma::mat& parameters, const size_t i) const; /** * Evaluate the gradient of the logistic regression log-likelihood function * with the given parameters. * * @param parameters Vector of logistic regression parameters. * @param gradient Vector to output gradient into. */ void Gradient(const arma::mat& parameters, arma::mat& gradient) const; /** * Evaluate the gradient of the logistic regression log-likelihood function * with the given parameters, and with respect to only one point in the * dataset. This is useful for optimizers such as SGD, which require a * separable objective function. * * @param parameters Vector of logistic regression parameters. * @param i Index of points to use for objective function gradient evaluation. * @param gradient Vector to output gradient into. */ void Gradient(const arma::mat& parameters, const size_t i, arma::mat& gradient) const; //! 
Return the initial point for the optimization. const arma::mat& GetInitialPoint() const { return initialPoint; } //! Return the number of separable functions (the number of predictor points). size_t NumFunctions() const { return predictors.n_cols; } private: //! The initial point, from which to start the optimization. arma::mat initialPoint; //! The matrix of data points (predictors). const arma::mat& predictors; //! The vector of responses to the input data points. const arma::vec& responses; //! The regularization parameter for L2-regularization. double lambda; }; }; // namespace regression }; // namespace mlpack #endif // __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_FUNCTION_HPP RcppMLPACK/inst/include/mlpack/methods/logistic_regression/logistic_regression_impl.hpp0000644000176200001440000001267513647512751031272 0ustar liggesusers/** * @file logistic_regression_impl.hpp * @author Sumedh Ghaisas * * Implementation of the LogisticRegression class. This implementation supports * L2-regularization. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_IMPL_HPP #define __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_IMPL_HPP // In case it hasn't been included yet. #include "logistic_regression.hpp" namespace mlpack { namespace regression { template class OptimizerType> LogisticRegression::LogisticRegression( const arma::mat& predictors, const arma::vec& responses, const double lambda) : parameters(arma::zeros(predictors.n_rows + 1)), lambda(lambda) { LogisticRegressionFunction errorFunction(predictors, responses, lambda); OptimizerType optimizer(errorFunction); // Train the model. //Timer::Start("logistic_regression_optimization"); const double out = optimizer.Optimize(parameters); //Timer::Stop("logistic_regression_optimization"); Rcpp::Rcout << "LogisticRegression::LogisticRegression(): final objective of " << "trained model is " << out << "." << std::endl; } template class OptimizerType> LogisticRegression::LogisticRegression( const arma::mat& predictors, const arma::vec& responses, const arma::mat& initialPoint, const double lambda) : parameters(arma::zeros(predictors.n_rows + 1)), lambda(lambda) { LogisticRegressionFunction errorFunction(predictors, responses, lambda); errorFunction.InitialPoint() = initialPoint; OptimizerType optimizer(errorFunction); // Train the model. //Timer::Start("logistic_regression_optimization"); const double out = optimizer.Optimize(parameters); //Timer::Stop("logistic_regression_optimization"); Rcpp::Rcout << "LogisticRegression::LogisticRegression(): final objective of " << "trained model is " << out << "." 
<< std::endl; } template class OptimizerType> LogisticRegression::LogisticRegression( OptimizerType& optimizer) : parameters(optimizer.Function().GetInitialPoint()), lambda(optimizer.Function().Lambda()) { //Timer::Start("logistic_regression_optimization"); const double out = optimizer.Optimize(parameters); //Timer::Stop("logistic_regression_optimization"); Rcpp::Rcout << "LogisticRegression::LogisticRegression(): final objective of " << "trained model is " << out << "." << std::endl; } template class OptimizerType> LogisticRegression::LogisticRegression( const arma::vec& parameters, const double lambda) : parameters(parameters), lambda(lambda) { // Nothing to do. } template class OptimizerType> void LogisticRegression::Predict(const arma::mat& predictors, arma::vec& responses, const double decisionBoundary) const { // Calculate sigmoid function for each point. The (1.0 - decisionBoundary) // term correctly sets an offset so that floor() returns 0 or 1 correctly. responses = arma::floor((1.0 / (1.0 + arma::exp(-parameters(0) - predictors.t() * parameters.subvec(1, parameters.n_elem - 1)))) + (1.0 - decisionBoundary)); } template class OptimizerType> double LogisticRegression::ComputeError( const arma::mat& predictors, const arma::vec& responses) const { // Construct a new error function. LogisticRegressionFunction newErrorFunction(predictors, responses, lambda); return newErrorFunction.Evaluate(parameters); } template class OptimizerType> double LogisticRegression::ComputeAccuracy( const arma::mat& predictors, const arma::vec& responses, const double decisionBoundary) const { // Predict responses using the current model. arma::vec tempResponses; Predict(predictors, tempResponses, decisionBoundary); // Count the number of responses that were correct. size_t count = 0; for (size_t i = 0; i < responses.n_elem; i++) if (responses(i) == tempResponses(i)) count++; return (double) (count * 100) / responses.n_rows; } template class OptimizerType> std::string LogisticRegression::ToString() const { std::ostringstream convert; convert << "Logistic Regression [" << this << "]" << std::endl; convert << " Parameters: " << parameters.n_rows << std::endl; convert << " Lambda: " << lambda << std::endl; return convert.str(); } }; // namespace regression }; // namespace mlpack #endif // __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_IMPL_HPP RcppMLPACK/inst/include/mlpack/methods/logistic_regression/logistic_regression.hpp0000644000176200001440000001473513647512751030250 0ustar liggesusers/** * @file logistic_regression.hpp * @author Sumedh Ghaisas * * The LogisticRegression class, which implements logistic regression. This * implements supports L2-regularization. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
*/ #ifndef __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_HPP #define __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_HPP #include #include #include "logistic_regression_function.hpp" namespace mlpack { namespace regression { template< template class OptimizerType = mlpack::optimization::L_BFGS > class LogisticRegression { public: /** * Construct the LogisticRegression class with the given labeled training * data. This will train the model. Optionally, specify lambda, which is the * penalty parameter for L2-regularization. If not specified, it is set to 0, * which results in standard (unregularized) logistic regression. * * @param predictors Input training variables. * @param responses Outputs resulting from input training variables. * @param lambda L2-regularization parameter. */ LogisticRegression(const arma::mat& predictors, const arma::vec& responses, const double lambda = 0); /** * Construct the LogisticRegression class with the given labeled training * data. This will train the model. Optionally, specify lambda, which is the * penalty parameter for L2-regularization. If not specified, it is set to 0, * which results in standard (unregularized) logistic regression. * * @param predictors Input training variables. * @param responses Outputs results from input training variables. * @param initialPoint Initial model to train with. * @param lambda L2-regularization parameter. */ LogisticRegression(const arma::mat& predictors, const arma::vec& responses, const arma::mat& initialPoint, const double lambda = 0); /** * Construct the LogisticRegression class with the given labeled training * data. This will train the model. This overload takes an already * instantiated optimizer (which holds the LogisticRegressionFunction error * function, which must also be instantiated), so that the optimizer can be * configured before the training is run by this constructor. The predictors * and responses and initial point are all taken from the error function * contained in the optimizer. * * @param optimizer Instantiated optimizer with instantiated error function. */ LogisticRegression(OptimizerType& optimizer); /** * Construct a logistic regression model from the given parameters, without * performing any training. The lambda parameter is used for the * ComputeAccuracy() and ComputeError() functions; this constructor does not * train the model itself. * * @param parameters Parameters making up the model. * @param lambda L2-regularization penalty parameter. */ LogisticRegression(const arma::vec& parameters, const double lambda = 0); //! Return the parameters (the b vector). const arma::vec& Parameters() const { return parameters; } //! Modify the parameters (the b vector). arma::vec& Parameters() { return parameters; } //! Return the lambda value for L2-regularization. const double& Lambda() const { return lambda; } //! Modify the lambda value for L2-regularization. double& Lambda() { return lambda; } /** * Predict the responses to a given set of predictors. The responses will be * either 0 or 1. Optionally, specify the decision boundary; logistic * regression returns a value between 0 and 1. If the value is greater than * the decision boundary, the response is taken to be 1; otherwise, it is 0. * By default the decision boundary is 0.5. * * @param predictors Input predictors. * @param responses Vector to put output predictions of responses into. * @param decisionBoundary Decision boundary (default 0.5). 
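 *
 * A short illustrative sketch (the data and variable names here are
 * hypothetical; LogisticRegression<> selects the default L-BFGS optimizer):
 *
 * @code
 * arma::mat X = arma::randu<arma::mat>(3, 200); // 200 points, 3 dimensions.
 * arma::vec y(200);
 * for (size_t i = 0; i < y.n_elem; ++i)         // Label by a threshold on
 *   y(i) = (X(0, i) > 0.5) ? 1.0 : 0.0;         // the first dimension.
 *
 * LogisticRegression<> lr(X, y);                // Train with lambda = 0.
 * arma::vec predicted;
 * lr.Predict(X, predicted);                     // Default 0.5 boundary.
 * @endcode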
*/ void Predict(const arma::mat& predictors, arma::vec& responses, const double decisionBoundary = 0.5) const; /** * Compute the accuracy of the model on the given predictors and responses, * optionally using the given decision boundary. The responses should be * either 0 or 1. Logistic regression returns a value between 0 and 1. If * the value is greater than the decision boundary, the response is taken to * be 1; otherwise, it is 0. By default, the decision boundary is 0.5. * * The accuracy is returned as a percentage, between 0 and 100. * * @param predictors Input predictors. * @param responses Vector of responses. * @param decisionBoundary Decision boundary (default 0.5). * @return Percentage of responses that are predicted correctly. */ double ComputeAccuracy(const arma::mat& predictors, const arma::vec& responses, const double decisionBoundary = 0.5) const; /** * Compute the error of the model. This returns the negative objective * function of the logistic regression log-likelihood function. For the model * to be optimal, the negative log-likelihood function should be minimized. * * @param predictors Input predictors. * @param responses Vector of responses. */ double ComputeError(const arma::mat& predictors, const arma::vec& responses) const; // Returns a string representation of this object. std::string ToString() const; private: //! Vector of trained parameters. arma::vec parameters; //! L2-regularization penalty parameter. double lambda; }; }; // namespace regression }; // namespace mlpack // Include implementation. #include "logistic_regression_impl.hpp" #endif // __MLPACK_METHODS_LOGISTIC_REGRESSION_LOGISTIC_REGRESSION_HPP RcppMLPACK/inst/include/mlpack/methods/rann/0000755000176200001440000000000013647512751020331 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/rann/ra_search.hpp0000644000176200001440000003247013647512751022777 0ustar liggesusers/** * @file ra_search.hpp * @author Parikshit Ram * * Defines the RASearch class, which performs an abstract rank-approximate * nearest/farthest neighbor query on two datasets. * * The details of this method can be found in the following paper: * * @inproceedings{ram2009rank, * title={{Rank-Approximate Nearest Neighbor Search: Retaining Meaning and * Speed in High Dimensions}}, * author={{Ram, P. and Lee, D. and Ouyang, H. and Gray, A. G.}}, * booktitle={{Advances of Neural Information Processing Systems}}, * year={2009} * } * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANN_RA_SEARCH_HPP #define __MLPACK_METHODS_RANN_RA_SEARCH_HPP #include #include #include #include #include "ra_query_stat.hpp" /** * The RASearch class: This class provides a generic manner to perform * rank-approximate search via random-sampling. If the 'naive' option is chosen, * this rank-approximate search will be done by randomly sampled from the whole * set. 
If the 'naive' option is not chosen, the sampling is done in a * stratified manner in the tree as mentioned in the algorithms in Figure 2 of * the following paper: * * @inproceedings{ram2009rank, * title={{Rank-Approximate Nearest Neighbor Search: Retaining Meaning and * Speed in High Dimensions}}, * author={{Ram, P. and Lee, D. and Ouyang, H. and Gray, A. G.}}, * booktitle={{Advances of Neural Information Processing Systems}}, * year={2009} * } * * RASearch is currently known to not work with ball trees (#356). * * @tparam SortPolicy The sort policy for distances; see NearestNeighborSort. * @tparam MetricType The metric to use for computation. * @tparam TreeType The tree type to use. */ template, RAQueryStat > > class RASearch { public: /** * Initialize the RASearch object, passing both a query and reference dataset. * Optionally, perform the computation in naive mode or single-tree mode, and * set the leaf size used for tree-building. An initialized distance metric * can be given, for cases where the metric has internal data (i.e. the * distance::MahalanobisDistance class). * * This method will copy the matrices to internal copies, which are rearranged * during tree-building. You can avoid this extra copy by pre-constructing * the trees and passing them using a diferent constructor. * * @param referenceSet Set of reference points. * @param querySet Set of query points. * @param naive If true, the rank-approximate search will be performed by * directly sampling the whole set instead of using the stratified * sampling on the tree. * @param singleMode If true, single-tree search will be used (as opposed to * dual-tree search). * @param leafSize Leaf size for tree construction (ignored if tree is given). * @param metric An optional instance of the MetricType class. */ RASearch(const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool naive = false, const bool singleMode = false, const MetricType metric = MetricType()); /** * Initialize the RASearch object, passing only one dataset, which is * used as both the query and the reference dataset. Optionally, perform the * computation in naive mode or single-tree mode, and set the leaf size used * for tree-building. An initialized distance metric can be given, for cases * where the metric has internal data (i.e. the distance::MahalanobisDistance * class). * * If naive mode is being used and a pre-built tree is given, it may not work: * naive mode operates by building a one-node tree (the root node holds all * the points). If that condition is not satisfied with the pre-built tree, * then naive mode will not work. * * @param referenceSet Set of reference points. * @param naive If true, the rank-approximate search will be performed * by directly sampling the whole set instead of using the stratified * sampling on the tree. * @param singleMode If true, single-tree search will be used (as opposed to * dual-tree search). * @param leafSize Leaf size for tree construction (ignored if tree is given). * @param metric An optional instance of the MetricType class. */ RASearch(const typename TreeType::Mat& referenceSet, const bool naive = false, const bool singleMode = false, const MetricType metric = MetricType()); /** * Initialize the RASearch object with the given datasets and * pre-constructed trees. It is assumed that the points in referenceSet and * querySet correspond to the points in referenceTree and queryTree, * respectively. Optionally, choose to use single-tree mode. 
Naive mode is * not available as an option for this constructor; instead, to run naive * computation, construct a tree with all of the points in one leaf (i.e. * leafSize = number of points). Additionally, an instantiated distance * metric can be given, for cases where the distance metric holds data. * * There is no copying of the data matrices in this constructor (because * tree-building is not necessary), so this is the constructor to use when * copies absolutely must be avoided. * * @note * Because tree-building (at least with BinarySpaceTree) modifies the ordering * of a matrix, be sure you pass the modified matrix to this object! In * addition, mapping the points of the matrix back to their original indices * is not done when this constructor is used. * @endnote * * @param referenceTree Pre-built tree for reference points. * @param queryTree Pre-built tree for query points. * @param referenceSet Set of reference points corresponding to referenceTree. * @param querySet Set of query points corresponding to queryTree. * @param singleMode Whether single-tree computation should be used (as * opposed to dual-tree computation). * @param metric Instantiated distance metric. */ RASearch(TreeType* referenceTree, TreeType* queryTree, const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool singleMode = false, const MetricType metric = MetricType()); /** * Initialize the RASearch object with the given reference dataset and * pre-constructed tree. It is assumed that the points in referenceSet * correspond to the points in referenceTree. Optionally, choose to use * single-tree mode. Naive mode is not available as an option for this * constructor; instead, to run naive computation, construct a tree with all * the points in one leaf (i.e. leafSize = number of points). Additionally, * an instantiated distance metric can be given, for the case where the * distance metric holds data. * * There is no copying of the data matrices in this constructor (because * tree-building is not necessary), so this is the constructor to use when * copies absolutely must be avoided. * * @note * Because tree-building (at least with BinarySpaceTree) modifies the ordering * of a matrix, be sure you pass the modified matrix to this object! In * addition, mapping the points of the matrix back to their original indices * is not done when this constructor is used. * @endnote * * @param referenceTree Pre-built tree for reference points. * @param referenceSet Set of reference points corresponding to referenceTree. * @param singleMode Whether single-tree computation should be used (as * opposed to dual-tree computation). * @param metric Instantiated distance metric. */ RASearch(TreeType* referenceTree, const typename TreeType::Mat& referenceSet, const bool singleMode = false, const MetricType metric = MetricType()); /** * Delete the RASearch object. The tree is the only member we are * responsible for deleting. The others will take care of themselves. */ ~RASearch(); /** * Compute the rank approximate nearest neighbors and store the output in the * given matrices. The matrices will be set to the size of n columns by k * rows, where n is the number of points in the query dataset and k is the * number of neighbors being searched for. * * Note that tau, the rank-approximation parameter, specifies that we are * looking for k neighbors with probability alpha of being in the top tau * percent of nearest neighbors. 
So, as an example, if our dataset has 1000 * points, and we want 5 nearest neighbors with 95% probability of being in * the top 5% of nearest neighbors (or, the top 50 nearest neighbors), we set * k = 5, tau = 5, and alpha = 0.95. * * The method will fail (and issue a failure message) if the value of tau is * too low: tau must be set such that the number of points in the * corresponding percentile of the data is greater than k. Thus, if we choose * tau = 0.1 with a dataset of 1000 points and k = 5, then we are attempting * to choose 5 nearest neighbors out of the closest 1 point -- this is * invalid. * * @param k Number of neighbors to search for. * @param resultingNeighbors Matrix storing lists of neighbors for each query * point. * @param distances Matrix storing distances of neighbors for each query * point. * @param tau The rank-approximation in percentile of the data. The default * value is 5%. * @param alpha The desired success probability. The default value is 0.95. * @param sampleAtLeaves Sample at leaves for faster but less accurate * computation. This defaults to 'false'. * @param firstLeafExact Traverse to the first leaf without approximation. * This can ensure that the query definitely finds its (near) duplicate * if there exists one. This defaults to 'false' for now. * @param singleSampleLimit The limit on the largest node that can be * approximated by sampling. This defaults to 20. */ void Search(const size_t k, arma::Mat& resultingNeighbors, arma::mat& distances, const double tau = 5, const double alpha = 0.95, const bool sampleAtLeaves = false, const bool firstLeafExact = false, const size_t singleSampleLimit = 20); /** * This function recursively resets the RAQueryStat of the queryTree to set * 'bound' to WorstDistance and the 'numSamplesMade' to 0. This allows a user * to perform multiple searches on the same pair of trees, possibly with * different levels of approximation without requiring to build a new pair of * trees for every new (approximate) search. */ void ResetQueryTree(); // Returns a string representation of this object. std::string ToString() const; private: //! Copy of reference dataset (if we need it, because tree building modifies //! it). arma::mat referenceCopy; //! Copy of query dataset (if we need it, because tree building modifies it). arma::mat queryCopy; //! Reference dataset. const arma::mat& referenceSet; //! Query dataset (may not be given). const arma::mat& querySet; //! Pointer to the root of the reference tree. TreeType* referenceTree; //! Pointer to the root of the query tree (might not exist). TreeType* queryTree; //! If true, this object created the trees and is responsible for them. bool treeOwner; //! Indicates if a separate query set was passed. bool hasQuerySet; //! Indicates if naive random sampling on the set is being used. bool naive; //! Indicates if single-tree search is being used (opposed to dual-tree). bool singleMode; //! Instantiation of kernel. MetricType metric; //! Permutations of reference points during tree building. std::vector oldFromNewReferences; //! Permutations of query points during tree building. std::vector oldFromNewQueries; //! Total number of pruned nodes during the neighbor search. size_t numberOfPrunes; /** * @param treeNode The node of the tree whose RAQueryStat is reset * and whose children are to be explored recursively. */ void ResetRAQueryStat(TreeType* treeNode); }; // class RASearch }; // namespace neighbor }; // namespace mlpack // Include implementation. 
#include "ra_search_impl.hpp" // Include convenient typedefs. #include "ra_typedef.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/rann/ra_query_stat.hpp0000644000176200001440000000474113647512751023732 0ustar liggesusers/** * @file ra_query_stat.hpp * @author Parikshit Ram * * Defines the RAQueryStat class, which is the statistic used for * rank-approximate nearest neighbor search (RASearch). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANN_RA_QUERY_STAT_HPP #define __MLPACK_METHODS_RANN_RA_QUERY_STAT_HPP #include #include #include #include namespace mlpack { namespace neighbor { /** * Extra data for each node in the tree. For neighbor searches, each node only * needs to store a bound on neighbor distances. * * Every query is required to make a minimum number of samples to guarantee the * desired approximation error. The 'numSamplesMade' keeps track of the minimum * number of samples made by all queries in the node in question. */ template class RAQueryStat { public: /** * Initialize the statistic with the worst possible distance according to our * sorting policy. */ RAQueryStat() : bound(SortPolicy::WorstDistance()), numSamplesMade(0) { } /** * Initialization for a node. */ template RAQueryStat(const TreeType& /* node */) : bound(SortPolicy::WorstDistance()), numSamplesMade(0) { } //! Get the bound. double Bound() const { return bound; } //! Modify the bound. double& Bound() { return bound; } //! Get the number of samples made. size_t NumSamplesMade() const { return numSamplesMade; } //! Modify the number of samples made. size_t& NumSamplesMade() { return numSamplesMade; } private: //! The bound on the node's neighbor distances. double bound; //! The minimum number of samples made by any query in this node. size_t numSamplesMade; }; #endif RcppMLPACK/inst/include/mlpack/methods/rann/ra_search_impl.hpp0000644000176200001440000003420713647512751024020 0ustar liggesusers/** * @file ra_search_impl.hpp * @author Parikshit Ram * * Implementation of RASearch class to perform rank-approximate * all-nearest-neighbors on two specified data sets. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
*/ #ifndef __MLPACK_METHODS_RANN_RA_SEARCH_IMPL_HPP #define __MLPACK_METHODS_RANN_RA_SEARCH_IMPL_HPP #include #include "ra_search_rules.hpp" namespace mlpack { namespace neighbor { namespace aux { //! Call the tree constructor that does mapping. template TreeType* BuildTree( typename TreeType::Mat& dataset, std::vector& oldFromNew, typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == true, TreeType* >::type = 0) { return new TreeType(dataset, oldFromNew); } //! Call the tree constructor that does not do mapping. template TreeType* BuildTree( const typename TreeType::Mat& dataset, const std::vector& /* oldFromNew */, const typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == false, TreeType* >::type = 0) { return new TreeType(dataset); } }; // namespace aux // Construct the object. template RASearch:: RASearch(const typename TreeType::Mat& referenceSetIn, const typename TreeType::Mat& querySetIn, const bool naive, const bool singleMode, const MetricType metric) : referenceSet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), querySet((tree::TreeTraits::RearrangesDataset && !singleMode) ? queryCopy : querySetIn), referenceTree(NULL), queryTree(NULL), treeOwner(!naive), hasQuerySet(true), naive(naive), singleMode(!naive && singleMode), // No single mode if naive. metric(metric), numberOfPrunes(0) { // We'll time tree building. //Timer::Start("tree_building"); if (tree::TreeTraits::RearrangesDataset) { referenceCopy = referenceSetIn; if (!singleMode) queryCopy = querySetIn; } // Construct as a naive object if we need to. if (!naive) { referenceTree = aux::BuildTree(const_cast(referenceSet), oldFromNewReferences); if (!singleMode) queryTree = aux::BuildTree(const_cast(querySet), oldFromNewQueries); } // Stop the timer we started above. //Timer::Stop("tree_building"); } // Construct the object. template RASearch:: RASearch(const typename TreeType::Mat& referenceSetIn, const bool naive, const bool singleMode, const MetricType metric) : referenceSet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), querySet(tree::TreeTraits::RearrangesDataset && !singleMode ? referenceCopy : referenceSetIn), referenceTree(NULL), queryTree(NULL), treeOwner(!naive), hasQuerySet(false), naive(naive), singleMode(!naive && singleMode), // No single mode if naive. metric(metric), numberOfPrunes(0) { // We'll time tree building. //Timer::Start("tree_building"); if (tree::TreeTraits::RearrangesDataset) referenceCopy = referenceSetIn; // Construct as a naive object if we need to. if (!naive) referenceTree = aux::BuildTree(const_cast(referenceSet), oldFromNewReferences); // Stop the timer we started above. //Timer::Stop("tree_building"); } // Construct the object. template RASearch:: RASearch(TreeType* referenceTree, TreeType* queryTree, const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool singleMode, const MetricType metric) : referenceSet(referenceSet), querySet(querySet), referenceTree(referenceTree), queryTree(queryTree), treeOwner(false), hasQuerySet(true), naive(false), singleMode(singleMode), metric(metric), numberOfPrunes(0) // Nothing else to initialize. { } // Construct the object. 
template RASearch:: RASearch(TreeType* referenceTree, const typename TreeType::Mat& referenceSet, const bool singleMode, const MetricType metric) : referenceSet(referenceSet), querySet(referenceSet), referenceTree(referenceTree), queryTree(NULL), treeOwner(false), hasQuerySet(false), naive(false), singleMode(singleMode), metric(metric), numberOfPrunes(0) // Nothing else to initialize. { } /** * The tree is the only member we may be responsible for deleting. The others * will take care of themselves. */ template RASearch:: ~RASearch() { if (treeOwner) { if (referenceTree) delete referenceTree; if (queryTree) delete queryTree; } } /** * Computes the best neighbors and stores them in resultingNeighbors and * distances. */ template void RASearch:: Search(const size_t k, arma::Mat& resultingNeighbors, arma::mat& distances, const double tau, const double alpha, const bool sampleAtLeaves, const bool firstLeafExact, const size_t singleSampleLimit) { //Timer::Start("computing_neighbors"); // If we have built the trees ourselves, then we will have to map all the // indices back to their original indices when this computation is finished. // To avoid an extra copy, we will store the neighbors and distances in a // separate matrix. arma::Mat* neighborPtr = &resultingNeighbors; arma::mat* distancePtr = &distances; // Mapping is only required if this tree type rearranges points and we are not // in naive mode. if (tree::TreeTraits::RearrangesDataset) { if (treeOwner && !(singleMode && hasQuerySet)) distancePtr = new arma::mat; // Query indices need to be mapped. if (treeOwner) neighborPtr = new arma::Mat; // All indices need mapping. } // Set the size of the neighbor and distance matrices. neighborPtr->set_size(k, querySet.n_cols); distancePtr->set_size(k, querySet.n_cols); distancePtr->fill(SortPolicy::WorstDistance()); size_t numPrunes = 0; if (naive) { // We don't need to run the base case on every possible combination of // points; we can achieve the rank approximation guarantee with probability // alpha by sampling the reference set. typedef RASearchRules RuleType; RuleType rules(referenceSet, querySet, *neighborPtr, *distancePtr, metric, tau, alpha, naive, sampleAtLeaves, firstLeafExact, singleSampleLimit); // Find how many samples from the reference set we need and sample uniformly // from the reference set without replacement. const size_t numSamples = rules.MinimumSamplesReqd(referenceSet.n_cols, k, tau, alpha); arma::uvec distinctSamples; rules.ObtainDistinctSamples(numSamples, referenceSet.n_cols, distinctSamples); // Run the base case on each combination of query point and sampled // reference point. for (size_t i = 0; i < querySet.n_cols; ++i) for (size_t j = 0; j < distinctSamples.n_elem; ++j) rules.BaseCase(i, (size_t) distinctSamples[j]); } else if (singleMode) { // Create the helper object for the tree traversal. Initialization of // RASearchRules already implicitly performs the naive tree traversal. typedef RASearchRules RuleType; RuleType rules(referenceSet, querySet, *neighborPtr, *distancePtr, metric, tau, alpha, naive, sampleAtLeaves, firstLeafExact, singleSampleLimit); // If the reference root node is a leaf, then the sampling has already been // done in the RASearchRules constructor. This happens when naive = true. if (!referenceTree->IsLeaf()) { Rcpp::Rcout << "Performing single-tree traversal..." << std::endl; // Create the traverser. typename TreeType::template SingleTreeTraverser traverser(rules); // Now have it traverse for each point. 
for (size_t i = 0; i < querySet.n_cols; ++i) traverser.Traverse(i, *referenceTree); numPrunes = traverser.NumPrunes(); Rcpp::Rcout << "Single-tree traversal complete." << std::endl; Rcpp::Rcout << "Average number of distance calculations per query point: " << (rules.NumDistComputations() / querySet.n_cols) << "." << std::endl; } } else // Dual-tree recursion. { Rcpp::Rcout << "Performing dual-tree traversal..." << std::endl; typedef RASearchRules RuleType; RuleType rules(referenceSet, querySet, *neighborPtr, *distancePtr, metric, tau, alpha, sampleAtLeaves, firstLeafExact, singleSampleLimit); typename TreeType::template DualTreeTraverser traverser(rules); if (queryTree) { Rcpp::Rcout << "Query statistic pre-search: " << queryTree->Stat().NumSamplesMade() << std::endl; traverser.Traverse(*queryTree, *referenceTree); } else { Rcpp::Rcout << "Query statistic pre-search: " << referenceTree->Stat().NumSamplesMade() << std::endl; traverser.Traverse(*referenceTree, *referenceTree); } numPrunes = traverser.NumPrunes(); Rcpp::Rcout << "Dual-tree traversal complete." << std::endl; Rcpp::Rcout << "Average number of distance calculations per query point: " << (rules.NumDistComputations() / querySet.n_cols) << "." << std::endl; } //Timer::Stop("computing_neighbors"); Rcpp::Rcout << "Pruned " << numPrunes << " nodes." << std::endl; // Now, do we need to do mapping of indices? if (!treeOwner || !tree::TreeTraits::RearrangesDataset) { // No mapping needed. We are done. return; } else if (treeOwner && hasQuerySet && !singleMode) // Map both sets. { // Set size of output matrices correctly. resultingNeighbors.set_size(k, querySet.n_cols); distances.set_size(k, querySet.n_cols); for (size_t i = 0; i < distances.n_cols; i++) { // Map distances (copy a column). distances.col(oldFromNewQueries[i]) = distancePtr->col(i); // Map indices of neighbors. for (size_t j = 0; j < distances.n_rows; j++) { resultingNeighbors(j, oldFromNewQueries[i]) = oldFromNewReferences[(*neighborPtr)(j, i)]; } } // Finished with temporary matrices. delete neighborPtr; delete distancePtr; } else if (treeOwner && !hasQuerySet) { // No query tree -- map both references and queries. resultingNeighbors.set_size(k, querySet.n_cols); distances.set_size(k, querySet.n_cols); for (size_t i = 0; i < distances.n_cols; i++) { // Map distances (copy a column). distances.col(oldFromNewReferences[i]) = distancePtr->col(i); // Map indices of neighbors. for (size_t j = 0; j < distances.n_rows; j++) { resultingNeighbors(j, oldFromNewReferences[i]) = oldFromNewReferences[(*neighborPtr)(j, i)]; } } } else if (treeOwner && hasQuerySet && singleMode) // Map only references. { // Set size of neighbor indices matrix correctly. resultingNeighbors.set_size(k, querySet.n_cols); // Map indices of neighbors. for (size_t i = 0; i < resultingNeighbors.n_cols; i++) { for (size_t j = 0; j < resultingNeighbors.n_rows; j++) { resultingNeighbors(j, i) = oldFromNewReferences[(*neighborPtr)(j, i)]; } } // Finished with temporary matrix. delete neighborPtr; } } // Search template void RASearch::ResetQueryTree() { if (!singleMode) { if (queryTree) ResetRAQueryStat(queryTree); else ResetRAQueryStat(referenceTree); } } template void RASearch::ResetRAQueryStat( TreeType* treeNode) { treeNode->Stat().Bound() = SortPolicy::WorstDistance(); treeNode->Stat().NumSamplesMade() = 0; for (size_t i = 0; i < treeNode->NumChildren(); i++) ResetRAQueryStat(&treeNode->Child(i)); } // Returns a String of the Object. 
template<typename SortPolicy, typename MetricType, typename TreeType>
std::string RASearch<SortPolicy, MetricType, TreeType>::ToString() const
{
  std::ostringstream convert;
  convert << "RA Search [" << this << "]" << std::endl;
  convert << "  Reference Set: " << referenceSet.n_rows << "x"
      << referenceSet.n_cols << std::endl;
  if (&referenceSet != &querySet)
    convert << "  QuerySet: " << querySet.n_rows << "x" << querySet.n_cols
        << std::endl;
  if (naive)
    convert << "  Naive: TRUE" << std::endl;
  if (singleMode)
    convert << "  Single Node: TRUE" << std::endl;
  convert << "  Metric: " << std::endl
      << mlpack::util::Indent(metric.ToString(), 2);
  return convert.str();
}

}; // namespace neighbor
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/methods/rann/ra_typedef.hpp0000644000176200001440000000503513647512751023167 0ustar liggesusers/**
 * @file ra_typedef.hpp
 * @author Parikshit Ram
 *
 * Simple typedefs describing template instantiations of the RASearch
 * class which are commonly used.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_RANN_RA_TYPEDEF_HPP
#define __MLPACK_RANN_RA_TYPEDEF_HPP

// In case someone included this directly.
#include "ra_search.hpp"

#include <mlpack/core.hpp>

#include <mlpack/methods/neighbor_search/sort_policies/nearest_neighbor_sort.hpp>
#include <mlpack/methods/neighbor_search/sort_policies/furthest_neighbor_sort.hpp>

namespace mlpack {
namespace neighbor {

/**
 * The AllkRANN class is the all-k-rank-approximate-nearest-neighbors method.
 * It returns squared L2 distances (squared Euclidean distances) for each of
 * the k rank-approximate nearest-neighbors. Squared distances are used because
 * they are slightly faster than non-squared distances (they have one fewer
 * call to sqrt()).
 *
 * The approximation is controlled with two parameters (see allkrann_main.cpp)
 * which can be specified at search time. So the tree building is done only
 * once while the search can be performed multiple times with different
 * approximation levels.
 */
typedef RASearch<> AllkRANN;

/**
 * The AllkRAFN class is the all-k-rank-approximate-farthest-neighbors method.
 * It returns squared L2 distances (squared Euclidean distances) for each of
 * the k rank-approximate farthest-neighbors. Squared distances are used
 * because they are slightly faster than non-squared distances (they have one
 * fewer call to sqrt()).
 *
 * The approximation is controlled with two parameters (see allkrann_main.cpp)
 * which can be specified at search time. So the tree building is done only
 * once while the search can be performed multiple times with different
 * approximation levels.
 */
typedef RASearch<FurthestNeighborSort> AllkRAFN;

}; // namespace neighbor
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/methods/rann/ra_search_rules.hpp0000644000176200001440000003110413647512751024202 0ustar liggesusers/**
 * @file ra_search_rules.hpp
 * @author Parikshit Ram
 *
 * Defines the pruning rules and base case rules necessary to perform a
 * tree-based rank-approximate search (with an arbitrary tree) for the RASearch
 * class.
 *
 * This file is part of MLPACK 1.0.10.
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANN_RA_SEARCH_RULES_HPP #define __MLPACK_METHODS_RANN_RA_SEARCH_RULES_HPP #include "../neighbor_search/ns_traversal_info.hpp" #include "ra_search.hpp" // For friend declaration. namespace mlpack { namespace neighbor { template class RASearchRules { public: RASearchRules(const arma::mat& referenceSet, const arma::mat& querySet, arma::Mat& neighbors, arma::mat& distances, MetricType& metric, const double tau = 5, const double alpha = 0.95, const bool naive = false, const bool sampleAtLeaves = false, const bool firstLeafExact = false, const size_t singleSampleLimit = 20); double BaseCase(const size_t queryIndex, const size_t referenceIndex); /** * Get the score for recursion order. A low score indicates priority for * recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). * * For rank-approximation, the scoring function first checks if pruning * by distance is possible. * If yes, then the node is given the score of * 'DBL_MAX' and the expected number of samples from that node are * added to the number of samples made for the query. * * If no, then the function tries to see if the node can be pruned by * approximation. If number of samples required from this node is small * enough, then that number of samples are acquired from this node * and the score is set to be 'DBL_MAX'. * * If the pruning by approximation is not possible either, the algorithm * continues with the usual tree-traversal. * * @param queryIndex Index of query point. * @param referenceNode Candidate node to be recursed into. */ double Score(const size_t queryIndex, TreeType& referenceNode); /** * Get the score for recursion order. A low score indicates priority for * recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). * * For rank-approximation, the scoring function first checks if pruning * by distance is possible. * If yes, then the node is given the score of * 'DBL_MAX' and the expected number of samples from that node are * added to the number of samples made for the query. * * If no, then the function tries to see if the node can be pruned by * approximation. If number of samples required from this node is small * enough, then that number of samples are acquired from this node * and the score is set to be 'DBL_MAX'. * * If the pruning by approximation is not possible either, the algorithm * continues with the usual tree-traversal. * * @param queryIndex Index of query point. * @param referenceNode Candidate node to be recursed into. * @param baseCaseResult Result of BaseCase(queryIndex, referenceNode). */ double Score(const size_t queryIndex, TreeType& referenceNode, const double baseCaseResult); /** * Re-evaluate the score for recursion order. 
A low score indicates priority
   * for recursion, while DBL_MAX indicates that the node should not be
   * recursed into at all (it should be pruned). This is used when the score
   * has already been calculated, but another recursion may have modified the
   * bounds for pruning. So the old score is checked against the new pruning
   * bound.
   *
   * For rank-approximation, it also checks if the number of samples left
   * for a query to satisfy the rank constraint is small enough at this
   * point of the algorithm, then this node is approximated by sampling
   * and given a new score of 'DBL_MAX'.
   *
   * @param queryIndex Index of query point.
   * @param referenceNode Candidate node to be recursed into.
   * @param oldScore Old score produced by Score() (or Rescore()).
   */
  double Rescore(const size_t queryIndex,
                 TreeType& referenceNode,
                 const double oldScore);

  /**
   * Get the score for recursion order. A low score indicates priority for
   * recursion, while DBL_MAX indicates that the node should not be recursed
   * into at all (it should be pruned).
   *
   * For the rank-approximation, we check if the referenceNode can be
   * approximated by sampling. If it can be, enough samples are made for
   * every query in the queryNode. No further query-tree traversal is
   * performed.
   *
   * The 'NumSamplesMade' query stat is propagated up the tree. And then
   * if pruning occurs (by distance or by sampling), the 'NumSamplesMade'
   * stat is not propagated down the tree. If no pruning occurs, the
   * stat is propagated down the tree.
   *
   * @param queryNode Candidate query node to recurse into.
   * @param referenceNode Candidate reference node to recurse into.
   */
  double Score(TreeType& queryNode, TreeType& referenceNode);

  /**
   * Get the score for recursion order, passing the base case result (in the
   * situation where it may be needed to calculate the recursion order). A low
   * score indicates priority for recursion, while DBL_MAX indicates that the
   * node should not be recursed into at all (it should be pruned).
   *
   * For the rank-approximation, we check if the referenceNode can be
   * approximated by sampling. If it can be, enough samples are made for
   * every query in the queryNode. No further query-tree traversal is
   * performed.
   *
   * The 'NumSamplesMade' query stat is propagated up the tree. And then
   * if pruning occurs (by distance or by sampling), the 'NumSamplesMade'
   * stat is not propagated down the tree. If no pruning occurs, the
   * stat is propagated down the tree.
   *
   * @param queryNode Candidate query node to recurse into.
   * @param referenceNode Candidate reference node to recurse into.
   * @param baseCaseResult Result of BaseCase(queryIndex, referenceNode).
   */
  double Score(TreeType& queryNode,
               TreeType& referenceNode,
               const double baseCaseResult);

  /**
   * Re-evaluate the score for recursion order. A low score indicates priority
   * for recursion, while DBL_MAX indicates that the node should not be
   * recursed into at all (it should be pruned). This is used when the score
   * has already been calculated, but another recursion may have modified the
   * bounds for pruning. So the old score is checked against the new pruning
   * bound.
   *
   * For the rank-approximation, we check if the referenceNode can be
   * approximated by sampling. If it can be, enough samples are made for
   * every query in the queryNode. No further query-tree traversal is
   * performed.
   *
   * The 'NumSamplesMade' query stat is propagated up the tree. And then
   * if pruning occurs (by distance or by sampling), the 'NumSamplesMade'
   * stat is not propagated down the tree. If no pruning occurs, the
   * stat is propagated down the tree.
   *
   * @param queryNode Candidate query node to recurse into.
   * @param referenceNode Candidate reference node to recurse into.
   * @param oldScore Old score produced by Score() (or Rescore()).
   */
  double Rescore(TreeType& queryNode,
                 TreeType& referenceNode,
                 const double oldScore);

  size_t NumDistComputations() { return numDistComputations; }

  size_t NumEffectiveSamples()
  {
    if (numSamplesMade.n_elem == 0)
      return 0;
    else
      return arma::sum(numSamplesMade);
  }

  typedef neighbor::NeighborSearchTraversalInfo<TreeType> TraversalInfoType;

  const TraversalInfoType& TraversalInfo() const { return traversalInfo; }
  TraversalInfoType& TraversalInfo() { return traversalInfo; }

 private:
  //! The reference set.
  const arma::mat& referenceSet;

  //! The query set.
  const arma::mat& querySet;

  //! The matrix the resultant neighbor indices should be stored in.
  arma::Mat<size_t>& neighbors;

  //! The matrix the resultant neighbor distances should be stored in.
  arma::mat& distances;

  //! The instantiated metric.
  MetricType& metric;

  //! Whether to sample at leaves or just use all of it.
  bool sampleAtLeaves;

  //! Whether to do exact computation on the first leaf before any sampling.
  bool firstLeafExact;

  //! The limit on the largest node that can be approximated by sampling.
  size_t singleSampleLimit;

  //! The minimum number of samples required per query.
  size_t numSamplesReqd;

  //! The number of samples made for every query.
  arma::Col<size_t> numSamplesMade;

  //! The sampling ratio.
  double samplingRatio;

  // TO REMOVE: just for testing.
  size_t numDistComputations;

  TraversalInfoType traversalInfo;

  /**
   * Insert a point into the neighbors and distances matrices; this is a helper
   * function.
   *
   * @param queryIndex Index of point whose neighbors we are inserting into.
   * @param pos Position in list to insert into.
   * @param neighbor Index of reference point which is being inserted.
   * @param distance Distance from query point to reference point.
   */
  void InsertNeighbor(const size_t queryIndex,
                      const size_t pos,
                      const size_t neighbor,
                      const double distance);

  /**
   * Compute the minimum number of samples required to guarantee
   * the given rank-approximation and success probability.
   *
   * @param n Size of the set to be sampled from.
   * @param k The number of neighbors required within the rank-approximation.
   * @param tau The rank-approximation in percentile of the data.
   * @param alpha The success probability desired.
   */
  size_t MinimumSamplesReqd(const size_t n,
                            const size_t k,
                            const double tau,
                            const double alpha) const;

  /**
   * Compute the success probability of obtaining 'k'-neighbors from a
   * set of size 'n' within the top 't' neighbors if 'm' samples are made.
   *
   * @param n Size of the set being sampled from.
   * @param k The number of neighbors required within the rank-approximation.
   * @param m The number of random samples.
   * @param t The desired rank-approximation.
   */
  double SuccessProbability(const size_t n,
                            const size_t k,
                            const size_t m,
                            const size_t t) const;

  /**
   * Pick up desired number of samples (with replacement) from a given range
   * of integers so that only the distinct samples are returned from
   * the range [0 - specified upper bound).
   *
   * @param numSamples Number of random samples.
   * @param rangeUpperBound The upper bound on the range of integers.
   * @param distinctSamples The list of the distinct samples.
   */
  void ObtainDistinctSamples(const size_t numSamples,
                             const size_t rangeUpperBound,
                             arma::uvec& distinctSamples) const;

  /**
   * Perform actual scoring for single-tree case.
*/ double Score(const size_t queryIndex, TreeType& referenceNode, const double distance, const double bestDistance); /** * Perform actual scoring for dual-tree case. */ double Score(TreeType& queryNode, TreeType& referenceNode, const double distance, const double bestDistance); // So that RASearch can access ObtainDistinctSamples() and // MinimumSamplesReqd(). Maybe refactoring is a better solution but this is // okay for now. friend class RASearch; }; // class RASearchRules }; // namespace neighbor }; // namespace mlpack // Include implementation. #include "ra_search_rules_impl.hpp" #endif // __MLPACK_METHODS_RANN_RA_SEARCH_RULES_HPP RcppMLPACK/inst/include/mlpack/methods/rann/ra_search_rules_impl.hpp0000644000176200001440000010657713647512751025244 0ustar liggesusers/** * @file ra_search_rules_impl.hpp * @author Parikshit Ram * * Implementation of RASearchRules. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANN_RA_SEARCH_RULES_IMPL_HPP #define __MLPACK_METHODS_RANN_RA_SEARCH_RULES_IMPL_HPP // In case it hasn't been included yet. #include "ra_search_rules.hpp" namespace mlpack { namespace neighbor { template RASearchRules:: RASearchRules(const arma::mat& referenceSet, const arma::mat& querySet, arma::Mat& neighbors, arma::mat& distances, MetricType& metric, const double tau, const double alpha, const bool naive, const bool sampleAtLeaves, const bool firstLeafExact, const size_t singleSampleLimit) : referenceSet(referenceSet), querySet(querySet), neighbors(neighbors), distances(distances), metric(metric), sampleAtLeaves(sampleAtLeaves), firstLeafExact(firstLeafExact), singleSampleLimit(singleSampleLimit) { // Validate tau to make sure that the rank approximation is greater than the // number of neighbors requested. // The rank approximation. const size_t n = referenceSet.n_cols; const size_t k = neighbors.n_rows; const size_t t = (size_t) std::ceil(tau * (double) n / 100.0); if (t < k) { Rcpp::Rcout << "Rank-approximation percentile " << tau << " corresponds to " << t << " points, which is less than k (" << k << ")."; Rcpp::Rcout << "Cannot return " << k << " approximate nearest neighbors " << "from the nearest " << t << " points. Increase tau!" << std::endl; } else if (t == k) Rcpp::Rcout << "Rank-approximation percentile " << tau << " corresponds to " << t << " points; because k = " << k << ", this is exact search!" << std::endl; //Timer::Start("computing_number_of_samples_reqd"); numSamplesReqd = MinimumSamplesReqd(n, k, tau, alpha); //Timer::Stop("computing_number_of_samples_reqd"); // Initialize some statistics to be collected during the search. numSamplesMade = arma::zeros >(querySet.n_cols); numDistComputations = 0; samplingRatio = (double) numSamplesReqd / (double) n; Rcpp::Rcout << "Minimum samples required per query: " << numSamplesReqd << ", sampling ratio: " << samplingRatio << std::endl; if (naive) // No tree traversal; just do naive sampling here. 
{ // Sample enough points. for (size_t i = 0; i < querySet.n_cols; ++i) { arma::uvec distinctSamples; ObtainDistinctSamples(numSamplesReqd, n, distinctSamples); for (size_t j = 0; j < distinctSamples.n_elem; j++) BaseCase(i, (size_t) distinctSamples[j]); } } } template inline force_inline void RASearchRules:: ObtainDistinctSamples(const size_t numSamples, const size_t rangeUpperBound, arma::uvec& distinctSamples) const { // Keep track of the points that are sampled. arma::Col sampledPoints; sampledPoints.zeros(rangeUpperBound); for (size_t i = 0; i < numSamples; i++) sampledPoints[(size_t) math::RandInt(rangeUpperBound)]++; distinctSamples = arma::find(sampledPoints > 0); return; } template size_t RASearchRules:: MinimumSamplesReqd(const size_t n, const size_t k, const double tau, const double alpha) const { size_t ub = n; // The upper bound on the binary search. size_t lb = k; // The lower bound on the binary search. size_t m = lb; // The minimum number of random samples. // The rank-approximation. const size_t t = (size_t) std::ceil(tau * (double) n / 100.0); double prob; //Log::Assert(alpha <= 1.0); // going through all values of sample sizes // to find the minimum samples required to satisfy the // desired bound bool done = false; // This performs a binary search on the integer values between 'lb = k' // and 'ub = n' to find the minimum number of samples 'm' required to obtain // the desired success probability 'alpha'. do { prob = SuccessProbability(n, k, m, t); if (prob > alpha) { if (prob - alpha < 0.001 || ub < lb + 2) { done = true; break; } else ub = m; } else { if (prob < alpha) { if (m == lb) { m++; continue; } else lb = m; } else { done = true; break; } } m = (ub + lb) / 2; } while (!done); return (std::min(m + 1, n)); } template double RASearchRules::SuccessProbability( const size_t n, const size_t k, const size_t m, const size_t t) const { if (k == 1) { if (m > n - t) return 1.0; double eps = (double) t / (double) n; return 1.0 - std::pow(1.0 - eps, (double) m); } // Faster implementation for topK = 1. else { if (m < k) return 0.0; if (m > n - t + k - 1) return 1.0; double eps = (double) t / (double) n; double sum = 0.0; // The probability that 'k' of the 'm' samples lie within the top 't' // of the neighbors is given by: // sum_{j = k}^m Choose(m, j) (t/n)^j (1 - t/n)^{m - j} // which is also equal to // 1 - sum_{j = 0}^{k - 1} Choose(m, j) (t/n)^j (1 - t/n)^{m - j} // // So this is a m - k term summation or a k term summation. So if // m > 2k, do the k term summation, otherwise do the m term summation. size_t lb; size_t ub; bool topHalf; if (2 * k < m) { // Compute 1 - sum_{j = 0}^{k - 1} Choose(m, j) eps^j (1 - eps)^{m - j} // eps = t/n. // // Choosing 'lb' as 1 and 'ub' as k so as to sum from 1 to (k - 1), and // add the term (1 - eps)^m term separately. lb = 1; ub = k; topHalf = true; sum = std::pow(1 - eps, (double) m); } else { // Compute sum_{j = k}^m Choose(m, j) eps^j (1 - eps)^{m - j} // eps = t/n. // // Choosing 'lb' as k and 'ub' as m so as to sum from k to (m - 1), and // add the term eps^m term separately. lb = k; ub = m; topHalf = false; sum = std::pow(eps, (double) m); } for (size_t j = lb; j < ub; j++) { // Compute Choose(m, j). double mCj = (double) m; size_t jTrans; // If j < m - j, compute Choose(m, j). // If j > m - j, compute Choose(m, m - j). 
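#if 0
// Direct, unoptimized cross-check of the probability computed by
// SuccessProbability() (a sketch, not library code): the chance that at
// least k of m uniform draws land in the top t of n points is
//   sum_{j = k}^{m} C(m, j) * (t/n)^j * (1 - t/n)^(m - j).
// The implementation above evaluates whichever tail of this sum is shorter.
#include <cmath>
#include <cstddef>

double NaiveSuccessProbability(const std::size_t n, const std::size_t k,
                               const std::size_t m, const std::size_t t)
{
  const double eps = (double) t / (double) n;
  double sum = 0.0;
  for (std::size_t j = k; j <= m; ++j)
  {
    // Build C(m, j) incrementally as prod_{i = 1}^{j} (m - i + 1) / i.
    double mCj = 1.0;
    for (std::size_t i = 1; i <= j; ++i)
      mCj *= (double) (m - i + 1) / (double) i;
    sum += mCj * std::pow(eps, (double) j) *
        std::pow(1.0 - eps, (double) (m - j));
  }
  return sum;
}
#endif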
if (topHalf) jTrans = j; else jTrans = m - j; for(size_t i = 2; i <= jTrans; i++) { mCj *= (double) (m - (i - 1)); mCj /= (double) i; } sum += (mCj * std::pow(eps, (double) j) * std::pow(1.0 - eps, (double) (m - j))); } if (topHalf) sum = 1.0 - sum; return sum; } // For k > 1. } template inline force_inline double RASearchRules::BaseCase( const size_t queryIndex, const size_t referenceIndex) { // If the datasets are the same, then this search is only using one dataset // and we should not return identical points. if ((&querySet == &referenceSet) && (queryIndex == referenceIndex)) return 0.0; double distance = metric.Evaluate(querySet.unsafe_col(queryIndex), referenceSet.unsafe_col(referenceIndex)); // If this distance is better than any of the current candidates, the // SortDistance() function will give us the position to insert it into. arma::vec queryDist = distances.unsafe_col(queryIndex); arma::Col queryIndices = neighbors.unsafe_col(queryIndex); size_t insertPosition = SortPolicy::SortDistance(queryDist, queryIndices, distance); // SortDistance() returns (size_t() - 1) if we shouldn't add it. if (insertPosition != (size_t() - 1)) InsertNeighbor(queryIndex, insertPosition, referenceIndex, distance); numSamplesMade[queryIndex]++; // TO REMOVE numDistComputations++; return distance; } template inline double RASearchRules::Score( const size_t queryIndex, TreeType& referenceNode) { const arma::vec queryPoint = querySet.unsafe_col(queryIndex); const double distance = SortPolicy::BestPointToNodeDistance(queryPoint, &referenceNode); const double bestDistance = distances(distances.n_rows - 1, queryIndex); return Score(queryIndex, referenceNode, distance, bestDistance); } template inline double RASearchRules::Score( const size_t queryIndex, TreeType& referenceNode, const double baseCaseResult) { const arma::vec queryPoint = querySet.unsafe_col(queryIndex); const double distance = SortPolicy::BestPointToNodeDistance(queryPoint, &referenceNode, baseCaseResult); const double bestDistance = distances(distances.n_rows - 1, queryIndex); return Score(queryIndex, referenceNode, distance, bestDistance); } template inline double RASearchRules::Score( const size_t queryIndex, TreeType& referenceNode, const double distance, const double bestDistance) { // If this is better than the best distance we've seen so far, maybe there // will be something down this node. Also check if enough samples are already // made for this query. if (SortPolicy::IsBetter(distance, bestDistance) && numSamplesMade[queryIndex] < numSamplesReqd) { // We cannot prune this node; try approximating it by sampling. // If we are required to visit the first leaf (to find possible duplicates), // make sure we do not approximate. if (numSamplesMade[queryIndex] > 0 || !firstLeafExact) { // Check if this node can be approximated by sampling. size_t samplesReqd = (size_t) std::ceil(samplingRatio * (double) referenceNode.NumDescendants()); samplesReqd = std::min(samplesReqd, numSamplesReqd - numSamplesMade[queryIndex]); if (samplesReqd > singleSampleLimit && !referenceNode.IsLeaf()) { // If too many samples required and not at a leaf, then can't prune. return distance; } else { if (!referenceNode.IsLeaf()) { // Then samplesReqd <= singleSampleLimit. // Hence, approximate the node by sampling enough number of points. 
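#if 0
// Worked example of the per-node sample budget computed above (hypothetical
// numbers): with samplingRatio = 0.02, a reference node with 400
// descendants, numSamplesReqd = 60, and 50 samples already made for this
// query,
//   samplesReqd = min(ceil(0.02 * 400), 60 - 50) = min(8, 10) = 8,
// and since 8 <= singleSampleLimit (20 by default), the node is approximated
// by sampling 8 of its descendants and then pruned with a score of DBL_MAX.
#include <algorithm>
#include <cmath>
#include <cstddef>

std::size_t NodeSampleBudget(const double samplingRatio,
                             const std::size_t numDescendants,
                             const std::size_t numSamplesReqd,
                             const std::size_t numSamplesMade)
{
  const std::size_t byRatio =
      (std::size_t) std::ceil(samplingRatio * (double) numDescendants);
  return std::min(byRatio, numSamplesReqd - numSamplesMade);
}
#endif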
arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t i = 0; i < distinctSamples.n_elem; i++) // The counting of the samples are done in the 'BaseCase' function // so no book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[i])); // Node approximated, so we can prune it. return DBL_MAX; } else // We are at a leaf. { if (sampleAtLeaves) // If allowed to sample at leaves. { // Approximate node by sampling enough number of points. arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t i = 0; i < distinctSamples.n_elem; i++) // The counting of the samples are done in the 'BaseCase' function // so no book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[i])); // (Leaf) node approximated, so we can prune it. return DBL_MAX; } else { // Not allowed to sample from leaves, so cannot prune. return distance; } } } } else { // Try first to visit the first leaf to boost your accuracy and find // (near) duplicates if they exist. return distance; } } else { // Either there cannot be anything better in this node, or enough number of // samples are already made. So prune it. // Add 'fake' samples from this node; they are fake because the distances to // these samples need not be computed. // If enough samples are already made, this step does not change the result // of the search. numSamplesMade[queryIndex] += (size_t) std::floor( samplingRatio * (double) referenceNode.NumDescendants()); return DBL_MAX; } } template inline double RASearchRules:: Rescore(const size_t queryIndex, TreeType& referenceNode, const double oldScore) { // If we are already pruning, still prune. if (oldScore == DBL_MAX) return oldScore; // Just check the score again against the distances. const double bestDistance = distances(distances.n_rows - 1, queryIndex); // If this is better than the best distance we've seen so far, // maybe there will be something down this node. // Also check if enough samples are already made for this query. if (SortPolicy::IsBetter(oldScore, bestDistance) && numSamplesMade[queryIndex] < numSamplesReqd) { // We cannot prune this node; thus, we try approximating this node by // sampling. // Here, we assume that since we are re-scoring, the algorithm has already // sampled some candidates, and if specified, also traversed to the first // leaf. So no check regarding that is made any more. // Check if this node can be approximated by sampling. size_t samplesReqd = (size_t) std::ceil(samplingRatio * (double) referenceNode.NumDescendants()); samplesReqd = std::min(samplesReqd, numSamplesReqd - numSamplesMade[queryIndex]); if (samplesReqd > singleSampleLimit && !referenceNode.IsLeaf()) { // If too many samples are required and we are not at a leaf, then we // can't prune. return oldScore; } else { if (!referenceNode.IsLeaf()) { // Then, samplesReqd <= singleSampleLimit. Hence, approximate the node // by sampling enough number of points. arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t i = 0; i < distinctSamples.n_elem; i++) // The counting of the samples are done in the 'BaseCase' function so // no book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[i])); // Node approximated, so we can prune it. return DBL_MAX; } else // We are at a leaf. 
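#if 0
// Worked example of the 'fake sample' credit used by the prune branches of
// Score() and Rescore() (hypothetical numbers, not library code): pruning a
// reference node with 250 descendants under samplingRatio = 0.02 credits the
// query with
//   floor(0.02 * 250) = 5
// samples without evaluating any distances, matching the expected number of
// draws that uniform sampling of the whole reference set would have spent
// inside that node.
#include <cmath>
#include <cstddef>

std::size_t FakeSampleCredit(const double samplingRatio,
                             const std::size_t numDescendants)
{
  return (std::size_t) std::floor(samplingRatio * (double) numDescendants);
}
#endif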
{ if (sampleAtLeaves) { // Approximate node by sampling enough points. arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t i = 0; i < distinctSamples.n_elem; i++) // The counting of the samples are done in the 'BaseCase' function // so no book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[i])); // (Leaf) node approximated, so we can prune it. return DBL_MAX; } else { // We cannot sample from leaves, so we cannot prune. return oldScore; } } } } else { // Either there cannot be anything better in this node, or enough number of // samples are already made, so prune it. // Add 'fake' samples from this node; they are fake because the distances to // these samples need not be computed. If enough samples are already made, // this step does not change the result of the search. numSamplesMade[queryIndex] += (size_t) std::floor(samplingRatio * (double) referenceNode.NumDescendants()); return DBL_MAX; } } // Rescore(point, node, oldScore) template inline double RASearchRules::Score( TreeType& queryNode, TreeType& referenceNode) { // First try to find the distance bound to check if we can prune by distance. // Calculate the best node-to-node distance. const double distance = SortPolicy::BestNodeToNodeDistance(&queryNode, &referenceNode); double pointBound = DBL_MAX; double childBound = DBL_MAX; const double maxDescendantDistance = queryNode.FurthestDescendantDistance(); for (size_t i = 0; i < queryNode.NumPoints(); i++) { const double bound = distances(distances.n_rows - 1, queryNode.Point(i)) + maxDescendantDistance; if (bound < pointBound) pointBound = bound; } for (size_t i = 0; i < queryNode.NumChildren(); i++) { const double bound = queryNode.Child(i).Stat().Bound(); if (bound < childBound) childBound = bound; } // Update the bound. queryNode.Stat().Bound() = std::min(pointBound, childBound); const double bestDistance = queryNode.Stat().Bound(); return Score(queryNode, referenceNode, distance, bestDistance); } template inline double RASearchRules::Score( TreeType& queryNode, TreeType& referenceNode, const double baseCaseResult) { // First try to find the distance bound to check if we can prune // by distance. // Find the best node-to-node distance. const double distance = SortPolicy::BestNodeToNodeDistance(&queryNode, &referenceNode, baseCaseResult); double pointBound = DBL_MAX; double childBound = DBL_MAX; const double maxDescendantDistance = queryNode.FurthestDescendantDistance(); for (size_t i = 0; i < queryNode.NumPoints(); i++) { const double bound = distances(distances.n_rows - 1, queryNode.Point(i)) + maxDescendantDistance; if (bound < pointBound) pointBound = bound; } for (size_t i = 0; i < queryNode.NumChildren(); i++) { const double bound = queryNode.Child(i).Stat().Bound(); if (bound < childBound) childBound = bound; } // update the bound queryNode.Stat().Bound() = std::min(pointBound, childBound); const double bestDistance = queryNode.Stat().Bound(); return Score(queryNode, referenceNode, distance, bestDistance); } template inline double RASearchRules::Score( TreeType& queryNode, TreeType& referenceNode, const double distance, const double bestDistance) { // Update the number of samples made for this node -- propagate up from child // nodes if child nodes have made samples that the parent node is not aware // of. Remember, we must propagate down samples made to the child nodes if // 'queryNode' descend is deemed necessary. 
// Only update from children if a non-leaf node, obviously. if (!queryNode.IsLeaf()) { size_t numSamplesMadeInChildNodes = std::numeric_limits::max(); // Find the minimum number of samples made among all children. for (size_t i = 0; i < queryNode.NumChildren(); i++) { const size_t numSamples = queryNode.Child(i).Stat().NumSamplesMade(); if (numSamples < numSamplesMadeInChildNodes) numSamplesMadeInChildNodes = numSamples; } // The number of samples made for a node is propagated up from the child // nodes if the child nodes have made samples that the parent (which is the // current 'queryNode') is not aware of. queryNode.Stat().NumSamplesMade() = std::max( queryNode.Stat().NumSamplesMade(), numSamplesMadeInChildNodes); } // Now check if the node-pair interaction can be pruned. // If this is better than the best distance we've seen so far, maybe there // will be something down this node. Also check if enough samples are already // made for this 'queryNode'. if (SortPolicy::IsBetter(distance, bestDistance) && queryNode.Stat().NumSamplesMade() < numSamplesReqd) { // We cannot prune this node; try approximating this node by sampling. // If we are required to visit the first leaf (to find possible duplicates), // make sure we do not approximate. if (queryNode.Stat().NumSamplesMade() > 0 || !firstLeafExact) { // Check if this node can be approximated by sampling. size_t samplesReqd = (size_t) std::ceil(samplingRatio * (double) referenceNode.NumDescendants()); samplesReqd = std::min(samplesReqd, numSamplesReqd - queryNode.Stat().NumSamplesMade()); if (samplesReqd > singleSampleLimit && !referenceNode.IsLeaf()) { // If too many samples are required and we are not at a leaf, then we // can't prune. Since query tree descent is necessary now, propagate // the number of samples made down to the children. // Iterate through all children and propagate the number of samples made // to the children. Only update if the parent node has made samples the // children have not seen. for (size_t i = 0; i < queryNode.NumChildren(); i++) queryNode.Child(i).Stat().NumSamplesMade() = std::max( queryNode.Stat().NumSamplesMade(), queryNode.Child(i).Stat().NumSamplesMade()); return distance; } else { if (!referenceNode.IsLeaf()) { // Then samplesReqd <= singleSampleLimit. Hence, approximate node by // sampling enough number of points for every query in the query node. for (size_t i = 0; i < queryNode.NumDescendants(); ++i) { const size_t queryIndex = queryNode.Descendant(i); arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t j = 0; j < distinctSamples.n_elem; j++) // The counting of the samples are done in the 'BaseCase' function // so no book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[j])); } // Update the number of samples made for the queryNode and also update // the number of sample made for the child nodes. queryNode.Stat().NumSamplesMade() += samplesReqd; // Since we are not going to descend down the query tree for this // reference node, there is no point updating the number of samples // made for the child nodes of this query node. // Node is approximated, so we can prune it. return DBL_MAX; } else { if (sampleAtLeaves) { // Approximate node by sampling enough number of points for every // query in the query node. 
for (size_t i = 0; i < queryNode.NumDescendants(); ++i) { const size_t queryIndex = queryNode.Descendant(i); arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t j = 0; j < distinctSamples.n_elem; j++) // The counting of the samples are done in the 'BaseCase' // function so no book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[j])); } // Update the number of samples made for the queryNode and also // update the number of sample made for the child nodes. queryNode.Stat().NumSamplesMade() += samplesReqd; // Since we are not going to descend down the query tree for this // reference node, there is no point updating the number of samples // made for the child nodes of this query node. // (Leaf) node is approximated, so we can prune it. return DBL_MAX; } else { // We cannot sample from leaves, so we cannot prune. Propagate the // number of samples made down to the children. // Go through all children and propagate the number of // samples made to the children. for (size_t i = 0; i < queryNode.NumChildren(); i++) queryNode.Child(i).Stat().NumSamplesMade() = std::max( queryNode.Stat().NumSamplesMade(), queryNode.Child(i).Stat().NumSamplesMade()); return distance; } } } } else { // We must first visit the first leaf to boost accuracy. // Go through all children and propagate the number of // samples made to the children. for (size_t i = 0; i < queryNode.NumChildren(); i++) queryNode.Child(i).Stat().NumSamplesMade() = std::max( queryNode.Stat().NumSamplesMade(), queryNode.Child(i).Stat().NumSamplesMade()); return distance; } } else { // Either there cannot be anything better in this node, or enough number of // samples are already made, so prune it. // Add 'fake' samples from this node; fake because the distances to // these samples need not be computed. If enough samples are already made, // this step does not change the result of the search since this queryNode // will never be descended anymore. queryNode.Stat().NumSamplesMade() += (size_t) std::floor(samplingRatio * (double) referenceNode.NumDescendants()); // Since we are not going to descend down the query tree for this reference // node, there is no point updating the number of samples made for the child // nodes of this query node. return DBL_MAX; } } template inline double RASearchRules:: Rescore(TreeType& queryNode, TreeType& referenceNode, const double oldScore) { if (oldScore == DBL_MAX) return oldScore; // First try to find the distance bound to check if we can prune by distance. double pointBound = DBL_MAX; double childBound = DBL_MAX; const double maxDescendantDistance = queryNode.FurthestDescendantDistance(); for (size_t i = 0; i < queryNode.NumPoints(); i++) { const double bound = distances(distances.n_rows - 1, queryNode.Point(i)) + maxDescendantDistance; if (bound < pointBound) pointBound = bound; } for (size_t i = 0; i < queryNode.NumChildren(); i++) { const double bound = queryNode.Child(i).Stat().Bound(); if (bound < childBound) childBound = bound; } // Update the bound. queryNode.Stat().Bound() = std::min(pointBound, childBound); const double bestDistance = queryNode.Stat().Bound(); // Now check if the node-pair interaction can be pruned by sampling. // Update the number of samples made for that node. Propagate up from child // nodes if child nodes have made samples that the parent node is not aware // of. Remember, we must propagate down samples made to the child nodes if // the parent samples. 
// Only update from children if a non-leaf node, obviously. if (!queryNode.IsLeaf()) { size_t numSamplesMadeInChildNodes = std::numeric_limits::max(); // Find the minimum number of samples made among all children for (size_t i = 0; i < queryNode.NumChildren(); i++) { const size_t numSamples = queryNode.Child(i).Stat().NumSamplesMade(); if (numSamples < numSamplesMadeInChildNodes) numSamplesMadeInChildNodes = numSamples; } // The number of samples made for a node is propagated up from the child // nodes if the child nodes have made samples that the parent (which is the // current 'queryNode') is not aware of. queryNode.Stat().NumSamplesMade() = std::max( queryNode.Stat().NumSamplesMade(), numSamplesMadeInChildNodes); } // Now check if the node-pair interaction can be pruned by sampling. // If this is better than the best distance we've seen so far, maybe there // will be something down this node. Also check if enough samples are already // made for this query. if (SortPolicy::IsBetter(oldScore, bestDistance) && queryNode.Stat().NumSamplesMade() < numSamplesReqd) { // We cannot prune this node, so approximate by sampling. // Here we assume that since we are re-scoring, the algorithm has already // sampled some candidates, and if specified, also traversed to the first // leaf. So no checks regarding that are made any more. size_t samplesReqd = (size_t) std::ceil( samplingRatio * (double) referenceNode.NumDescendants()); samplesReqd = std::min(samplesReqd, numSamplesReqd - queryNode.Stat().NumSamplesMade()); if (samplesReqd > singleSampleLimit && !referenceNode.IsLeaf()) { // If too many samples are required and we are not at a leaf, then we // can't prune. // Since query tree descent is necessary now, propagate the number of // samples made down to the children. // Go through all children and propagate the number of samples made to the // children. Only update if the parent node has made samples the children // have not seen. for (size_t i = 0; i < queryNode.NumChildren(); i++) queryNode.Child(i).Stat().NumSamplesMade() = std::max( queryNode.Stat().NumSamplesMade(), queryNode.Child(i).Stat().NumSamplesMade()); return oldScore; } else { if (!referenceNode.IsLeaf()) // If not a leaf, { // then samplesReqd <= singleSampleLimit. Hence, approximate the node // by sampling enough points for every query in the query node. for (size_t i = 0; i < queryNode.NumDescendants(); ++i) { const size_t queryIndex = queryNode.Descendant(i); arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t j = 0; j < distinctSamples.n_elem; j++) // The counting of the samples are done in the 'BaseCase' // function so no book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[j])); } // Update the number of samples made for the query node and also update // the number of samples made for the child nodes. queryNode.Stat().NumSamplesMade() += samplesReqd; // Since we are not going to descend down the query tree for this // reference node, there is no point updating the number of samples made // for the child nodes of this query node. // Node approximated, so we can prune it. return DBL_MAX; } else // We are at a leaf. { if (sampleAtLeaves) { // Approximate node by sampling enough points for every query in the // query node. 
for (size_t i = 0; i < queryNode.NumDescendants(); ++i) { const size_t queryIndex = queryNode.Descendant(i); arma::uvec distinctSamples; ObtainDistinctSamples(samplesReqd, referenceNode.NumDescendants(), distinctSamples); for (size_t j = 0; j < distinctSamples.n_elem; j++) // The counting of the samples are done in BaseCase() so no // book-keeping is required here. BaseCase(queryIndex, referenceNode.Descendant(distinctSamples[j])); } // Update the number of samples made for the query node and also // update the number of samples made for the child nodes. queryNode.Stat().NumSamplesMade() += samplesReqd; // Since we are not going to descend down the query tree for this // reference node, there is no point updating the number of samples // made for the child nodes of this query node. // (Leaf) node approximated, so we can prune it. return DBL_MAX; } else { // We cannot sample from leaves, so we cannot prune. // Propagate the number of samples made down to the children. for (size_t i = 0; i < queryNode.NumChildren(); i++) queryNode.Child(i).Stat().NumSamplesMade() = std::max( queryNode.Stat().NumSamplesMade(), queryNode.Child(i).Stat().NumSamplesMade()); return oldScore; } } } } else { // Either there cannot be anything better in this node, or enough samples // are already made, so prune it. // Add 'fake' samples from this node; fake because the distances to // these samples need not be computed. If enough samples are already made, // this step does not change the result of the search since this query node // will never be descended anymore. queryNode.Stat().NumSamplesMade() += (size_t) std::floor(samplingRatio * (double) referenceNode.NumDescendants()); // Since we are not going to descend down the query tree for this reference // node, there is no point updating the number of samples made for the child // nodes of this query node. return DBL_MAX; } } // Rescore(node, node, oldScore) /** * Helper function to insert a point into the neighbors and distances matrices. * * @param queryIndex Index of point whose neighbors we are inserting into. * @param pos Position in list to insert into. * @param neighbor Index of reference point which is being inserted. * @param distance Distance from query point to reference point. */ template void RASearchRules::InsertNeighbor( const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance) { // We only memmove() if there is actually a need to shift something. if (pos < (distances.n_rows - 1)) { int len = (distances.n_rows - 1) - pos; memmove(distances.colptr(queryIndex) + (pos + 1), distances.colptr(queryIndex) + pos, sizeof(double) * len); memmove(neighbors.colptr(queryIndex) + (pos + 1), neighbors.colptr(queryIndex) + pos, sizeof(size_t) * len); } // Now put the new information in the right index. distances(pos, queryIndex) = distance; neighbors(pos, queryIndex) = neighbor; } }; // namespace neighbor }; // namespace mlpack #endif // __MLPACK_METHODS_RANN_RA_SEARCH_RULES_IMPL_HPP RcppMLPACK/inst/include/mlpack/methods/fastmks/0000755000176200001440000000000013647512751021043 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/fastmks/fastmks_stat.hpp0000644000176200001440000000733213647512751024264 0ustar liggesusers/** * @file fastmks_stat.hpp * @author Ryan Curtin * * The statistic used in trees with FastMKS. * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_FASTMKS_FASTMKS_STAT_HPP #define __MLPACK_METHODS_FASTMKS_FASTMKS_STAT_HPP #include #include namespace mlpack { namespace fastmks { /** * The statistic used in trees with FastMKS. This stores both the bound and the * self-kernels for each node in the tree. */ class FastMKSStat { public: /** * Default initialization. */ FastMKSStat() : bound(-DBL_MAX), selfKernel(0.0), lastKernel(0.0), lastKernelNode(NULL) { } /** * Initialize this statistic for the given tree node. The TreeType's metric * better be IPMetric with some kernel type (that is, Metric().Kernel() must * exist). * * @param node Node that this statistic is built for. */ template FastMKSStat(const TreeType& node) : bound(-DBL_MAX), lastKernel(0.0), lastKernelNode(NULL) { // Do we have to calculate the centroid? if (tree::TreeTraits::FirstPointIsCentroid) { // If this type of tree has self-children, then maybe the evaluation is // already done. These statistics are built bottom-up, so the child stat // should already be done. if ((tree::TreeTraits::HasSelfChildren) && (node.NumChildren() > 0) && (node.Point(0) == node.Child(0).Point(0))) { selfKernel = node.Child(0).Stat().SelfKernel(); } else { selfKernel = sqrt(node.Metric().Kernel().Evaluate( node.Dataset().unsafe_col(node.Point(0)), node.Dataset().unsafe_col(node.Point(0)))); } } else { // Calculate the centroid. arma::vec centroid; node.Centroid(centroid); selfKernel = sqrt(node.Metric().Kernel().Evaluate(centroid, centroid)); } } //! Get the self-kernel. double SelfKernel() const { return selfKernel; } //! Modify the self-kernel. double& SelfKernel() { return selfKernel; } //! Get the bound. double Bound() const { return bound; } //! Modify the bound. double& Bound() { return bound; } //! Get the last kernel evaluation. double LastKernel() const { return lastKernel; } //! Modify the last kernel evaluation. double& LastKernel() { return lastKernel; } //! Get the address of the node corresponding to the last distance evaluation. void* LastKernelNode() const { return lastKernelNode; } //! Modify the address of the node corresponding to the last distance //! evaluation. void*& LastKernelNode() { return lastKernelNode; } private: //! The bound for pruning. double bound; //! The self-kernel evaluation: sqrt(K(centroid, centroid)). double selfKernel; //! The last kernel evaluation. double lastKernel; //! The node corresponding to the last kernel evaluation. This has to be void //! otherwise we get recursive template arguments. void* lastKernelNode; }; }; // namespace fastmks }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/fastmks/fastmks_rules_impl.hpp0000644000176200001440000004517513647512751025473 0ustar liggesusers/** * @file fastmks_rules_impl.hpp * @author Ryan Curtin * * Implementation of FastMKSRules for cover tree search. * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_FASTMKS_FASTMKS_RULES_IMPL_HPP #define __MLPACK_METHODS_FASTMKS_FASTMKS_RULES_IMPL_HPP // In case it hasn't already been included. #include "fastmks_rules.hpp" namespace mlpack { namespace fastmks { template FastMKSRules::FastMKSRules(const arma::mat& referenceSet, const arma::mat& querySet, arma::Mat& indices, arma::mat& products, KernelType& kernel) : referenceSet(referenceSet), querySet(querySet), indices(indices), products(products), kernel(kernel), lastQueryIndex(-1), lastReferenceIndex(-1), lastKernel(0.0), baseCases(0), scores(0) { // Precompute each self-kernel. queryKernels.set_size(querySet.n_cols); for (size_t i = 0; i < querySet.n_cols; ++i) queryKernels[i] = sqrt(kernel.Evaluate(querySet.unsafe_col(i), querySet.unsafe_col(i))); referenceKernels.set_size(referenceSet.n_cols); for (size_t i = 0; i < referenceSet.n_cols; ++i) referenceKernels[i] = sqrt(kernel.Evaluate(referenceSet.unsafe_col(i), referenceSet.unsafe_col(i))); // Set to invalid memory, so that the first node combination does not try to // dereference null pointers. traversalInfo.LastQueryNode() = (TreeType*) this; traversalInfo.LastReferenceNode() = (TreeType*) this; } template inline force_inline double FastMKSRules::BaseCase( const size_t queryIndex, const size_t referenceIndex) { // Score() always happens before BaseCase() for a given node combination. For // cover trees, the kernel evaluation between the two centroid points already // happened. So we don't need to do it. Note that this optimizes out if the // first conditional is false (its result is known at compile time). if (tree::TreeTraits::FirstPointIsCentroid) { if ((queryIndex == lastQueryIndex) && (referenceIndex == lastReferenceIndex)) return lastKernel; // Store new values. lastQueryIndex = queryIndex; lastReferenceIndex = referenceIndex; } ++baseCases; double kernelEval = kernel.Evaluate(querySet.unsafe_col(queryIndex), referenceSet.unsafe_col(referenceIndex)); // Update the last kernel value, if we need to. if (tree::TreeTraits::FirstPointIsCentroid) lastKernel = kernelEval; // If the reference and query sets are identical, we still need to compute the // base case (so that things can be bounded properly), but we won't add it to // the results. if ((&querySet == &referenceSet) && (queryIndex == referenceIndex)) return kernelEval; // If this is a better candidate, insert it into the list. if (kernelEval < products(products.n_rows - 1, queryIndex)) return kernelEval; size_t insertPosition = 0; for ( ; insertPosition < products.n_rows; ++insertPosition) if (kernelEval >= products(insertPosition, queryIndex)) break; InsertNeighbor(queryIndex, insertPosition, referenceIndex, kernelEval); return kernelEval; } template double FastMKSRules::Score(const size_t queryIndex, TreeType& referenceNode) { // Compare with the current best. 
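#if 0
// Standalone sketch of the max-kernel bound used by Score() below for
// normalized kernels (i.e. K(x, x) = 1 for all x; not library code): for a
// node with centroid c and a distance bound d on its descendants, the
// maximum of K(q, r) over descendants r is at most
//   K(q, c) * (1 - d^2 / 2) + d * sqrt(1 - d^2 / 4) * sqrt(1 - K(q, c)^2),
// and the bound saturates at 1 once K(q, c) exceeds 1 - d^2 / 2.
#include <cmath>

double MaxKernelBoundNormalized(const double centroidKernel,  // K(q, c).
                                const double distBound)       // d.
{
  const double squaredDist = distBound * distBound;
  const double delta = 1.0 - 0.5 * squaredDist;
  if (centroidKernel > delta)
    return 1.0;  // Some point in the ball may align exactly with q.
  const double gamma = distBound * std::sqrt(1.0 - 0.25 * squaredDist);
  return centroidKernel * delta +
      gamma * std::sqrt(1.0 - centroidKernel * centroidKernel);
}
#endif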
const double bestKernel = products(products.n_rows - 1, queryIndex); // See if we can perform a parent-child prune. const double furthestDist = referenceNode.FurthestDescendantDistance(); if (referenceNode.Parent() != NULL) { double maxKernelBound; const double parentDist = referenceNode.ParentDistance(); const double combinedDistBound = parentDist + furthestDist; const double lastKernel = referenceNode.Parent()->Stat().LastKernel(); if (kernel::KernelTraits::IsNormalized) { const double squaredDist = std::pow(combinedDistBound, 2.0); const double delta = (1 - 0.5 * squaredDist); if (lastKernel <= delta) { const double gamma = combinedDistBound * sqrt(1 - 0.25 * squaredDist); maxKernelBound = lastKernel * delta + gamma * sqrt(1 - std::pow(lastKernel, 2.0)); } else { maxKernelBound = 1.0; } } else { maxKernelBound = lastKernel + combinedDistBound * queryKernels[queryIndex]; } if (maxKernelBound < bestKernel) return DBL_MAX; } // Calculate the maximum possible kernel value, either by calculating the // centroid or, if the centroid is a point, use that. ++scores; double kernelEval; if (tree::TreeTraits::FirstPointIsCentroid) { // Could it be that this kernel evaluation has already been calculated? if (tree::TreeTraits::HasSelfChildren && referenceNode.Parent() != NULL && referenceNode.Point(0) == referenceNode.Parent()->Point(0)) { kernelEval = referenceNode.Parent()->Stat().LastKernel(); } else { kernelEval = BaseCase(queryIndex, referenceNode.Point(0)); } } else { const arma::vec queryPoint = querySet.unsafe_col(queryIndex); arma::vec refCentroid; referenceNode.Centroid(refCentroid); kernelEval = kernel.Evaluate(queryPoint, refCentroid); } referenceNode.Stat().LastKernel() = kernelEval; double maxKernel; if (kernel::KernelTraits::IsNormalized) { const double squaredDist = std::pow(furthestDist, 2.0); const double delta = (1 - 0.5 * squaredDist); if (kernelEval <= delta) { const double gamma = furthestDist * sqrt(1 - 0.25 * squaredDist); maxKernel = kernelEval * delta + gamma * sqrt(1 - std::pow(kernelEval, 2.0)); } else { maxKernel = 1.0; } } else { maxKernel = kernelEval + furthestDist * queryKernels[queryIndex]; } // We return the inverse of the maximum kernel so that larger kernels are // recursed into first. return (maxKernel > bestKernel) ? (1.0 / maxKernel) : DBL_MAX; } template double FastMKSRules::Score(TreeType& queryNode, TreeType& referenceNode) { // Update and get the query node's bound. queryNode.Stat().Bound() = CalculateBound(queryNode); const double bestKernel = queryNode.Stat().Bound(); // First, see if we can make a parent-child or parent-parent prune. These // four bounds on the maximum kernel value are looser than the bound normally // used, but they can prevent a base case from needing to be calculated. // Convenience caching so lines are shorter. 
const double queryParentDist = queryNode.ParentDistance(); const double queryDescDist = queryNode.FurthestDescendantDistance(); const double refParentDist = referenceNode.ParentDistance(); const double refDescDist = referenceNode.FurthestDescendantDistance(); double adjustedScore = traversalInfo.LastBaseCase(); const double queryDistBound = (queryParentDist + queryDescDist); const double refDistBound = (refParentDist + refDescDist); double dualQueryTerm; double dualRefTerm; // The parent-child and parent-parent prunes work by applying the same pruning // condition as when the parent node was used, except they are tighter because // queryDistBound < queryNode.Parent()->FurthestDescendantDistance() // and // refDistBound < referenceNode.Parent()->FurthestDescendantDistance() // so we construct the same bounds that were used when Score() was called with // the parents, except with the tighter distance bounds. Sometimes this // allows us to prune nodes without evaluating the base cases between them. if (traversalInfo.LastQueryNode() == queryNode.Parent()) { // We can assume that queryNode.Parent() != NULL, because at the root node // combination, the traversalInfo.LastQueryNode() pointer will _not_ be // NULL. We also should be guaranteed that // traversalInfo.LastReferenceNode() is either the reference node or the // parent of the reference node. adjustedScore += queryDistBound * traversalInfo.LastReferenceNode()->Stat().SelfKernel(); dualQueryTerm = queryDistBound; } else { // The query parent could be NULL, which does weird things and we have to // consider. if (traversalInfo.LastReferenceNode() != NULL) { adjustedScore += queryDescDist * traversalInfo.LastReferenceNode()->Stat().SelfKernel(); dualQueryTerm = queryDescDist; } else { // This makes it so a child-parent (or parent-parent) prune is not // possible. dualQueryTerm = 0.0; adjustedScore = bestKernel; } } if (traversalInfo.LastReferenceNode() == referenceNode.Parent()) { // We can assume that referenceNode.Parent() != NULL, because at the root // node combination, the traversalInfo.LastReferenceNode() pointer will // _not_ be NULL. adjustedScore += refDistBound * traversalInfo.LastQueryNode()->Stat().SelfKernel(); dualRefTerm = refDistBound; } else { // The reference parent could be NULL, which does weird things and we have // to consider. if (traversalInfo.LastQueryNode() != NULL) { adjustedScore += refDescDist * traversalInfo.LastQueryNode()->Stat().SelfKernel(); dualRefTerm = refDescDist; } else { // This makes it so a child-parent (or parent-parent) prune is not // possible. dualRefTerm = 0.0; adjustedScore = bestKernel; } } // Now add the dual term. adjustedScore += (dualQueryTerm * dualRefTerm); if (adjustedScore < bestKernel) { // It is not possible that this node combination can contain a point // combination with kernel value better than the minimum kernel value to // improve any of the results, so we can prune it. return DBL_MAX; } // We were unable to perform a parent-child or parent-parent prune, so now we // must calculate kernel evaluation, if necessary. double kernelEval = 0.0; if (tree::TreeTraits::FirstPointIsCentroid) { // For this type of tree, we may have already calculated the base case in // the parents. if ((traversalInfo.LastQueryNode() != NULL) && (traversalInfo.LastReferenceNode() != NULL) && (traversalInfo.LastQueryNode()->Point(0) == queryNode.Point(0)) && (traversalInfo.LastReferenceNode()->Point(0) == referenceNode.Point(0))) { // Base case already done. 
kernelEval = traversalInfo.LastBaseCase(); // When BaseCase() is called after Score(), these must be correct so that // another kernel evaluation is not performed. lastQueryIndex = queryNode.Point(0); lastReferenceIndex = referenceNode.Point(0); } else { // The kernel must be evaluated, but it is between points in the dataset, // so we can call BaseCase(). BaseCase() will set lastQueryIndex and // lastReferenceIndex correctly. kernelEval = BaseCase(queryNode.Point(0), referenceNode.Point(0)); } traversalInfo.LastBaseCase() = kernelEval; } else { // Calculate the maximum possible kernel value. arma::vec queryCentroid; arma::vec refCentroid; queryNode.Centroid(queryCentroid); referenceNode.Centroid(refCentroid); kernelEval = kernel.Evaluate(queryCentroid, refCentroid); traversalInfo.LastBaseCase() = kernelEval; } ++scores; double maxKernel; if (kernel::KernelTraits::IsNormalized) { // We have a tighter bound for normalized kernels. const double querySqDist = std::pow(queryDescDist, 2.0); const double refSqDist = std::pow(refDescDist, 2.0); const double bothSqDist = std::pow((queryDescDist + refDescDist), 2.0); if (kernelEval <= (1 - 0.5 * bothSqDist)) { const double queryDelta = (1 - 0.5 * querySqDist); const double queryGamma = queryDescDist * sqrt(1 - 0.25 * querySqDist); const double refDelta = (1 - 0.5 * refSqDist); const double refGamma = refDescDist * sqrt(1 - 0.25 * refSqDist); maxKernel = kernelEval * (queryDelta * refDelta - queryGamma * refGamma) + sqrt(1 - std::pow(kernelEval, 2.0)) * (queryGamma * refDelta + queryDelta * refGamma); } else { maxKernel = 1.0; } } else { // Use standard bound; kernel is not normalized. const double refKernelTerm = queryDescDist * referenceNode.Stat().SelfKernel(); const double queryKernelTerm = refDescDist * queryNode.Stat().SelfKernel(); maxKernel = kernelEval + refKernelTerm + queryKernelTerm + (queryDescDist * refDescDist); } // Store relevant information for parent-child pruning. traversalInfo.LastQueryNode() = &queryNode; traversalInfo.LastReferenceNode() = &referenceNode; // We return the inverse of the maximum kernel so that larger kernels are // recursed into first. return (maxKernel > bestKernel) ? (1.0 / maxKernel) : DBL_MAX; } template double FastMKSRules::Rescore(const size_t queryIndex, TreeType& /*referenceNode*/, const double oldScore) const { const double bestKernel = products(products.n_rows - 1, queryIndex); return ((1.0 / oldScore) > bestKernel) ? oldScore : DBL_MAX; } template double FastMKSRules::Rescore(TreeType& queryNode, TreeType& /*referenceNode*/, const double oldScore) const { queryNode.Stat().Bound() = CalculateBound(queryNode); const double bestKernel = queryNode.Stat().Bound(); return ((1.0 / oldScore) > bestKernel) ? oldScore : DBL_MAX; } /** * Calculate the bound for the given query node. This bound represents the * minimum value which a node combination must achieve to guarantee an * improvement in the results. * * @param queryNode Query node to calculate bound for. */ template double FastMKSRules::CalculateBound(TreeType& queryNode) const { // We have four possible bounds -- just like NeighborSearchRules, but they are // slightly different in this context. // // (1) min ( min_{all points p in queryNode} P_p[k], // min_{all children c in queryNode} B(c) ); // (2) max_{all points p in queryNode} P_p[k] + (worst child distance + worst // descendant distance) sqrt(K(I_p[k], I_p[k])); // (3) max_{all children c in queryNode} B(c) + <-- not done yet. ignored. 
// (4) B(parent of queryNode); double worstPointKernel = DBL_MAX; double bestAdjustedPointKernel = -DBL_MAX; const double queryDescendantDistance = queryNode.FurthestDescendantDistance(); // Loop over all points in this node to find the best and worst. for (size_t i = 0; i < queryNode.NumPoints(); ++i) { const size_t point = queryNode.Point(i); if (products(products.n_rows - 1, point) < worstPointKernel) worstPointKernel = products(products.n_rows - 1, point); if (products(products.n_rows - 1, point) == -DBL_MAX) continue; // Avoid underflow. // This should be (queryDescendantDistance + centroidDistance) for any tree // but it works for cover trees since centroidDistance = 0 for cover trees. const double candidateKernel = products(products.n_rows - 1, point) - queryDescendantDistance * referenceKernels[indices(indices.n_rows - 1, point)]; if (candidateKernel > bestAdjustedPointKernel) bestAdjustedPointKernel = candidateKernel; } // Loop over all the children in the node. double worstChildKernel = DBL_MAX; for (size_t i = 0; i < queryNode.NumChildren(); ++i) { if (queryNode.Child(i).Stat().Bound() < worstChildKernel) worstChildKernel = queryNode.Child(i).Stat().Bound(); } // Now assemble bound (1). const double firstBound = (worstPointKernel < worstChildKernel) ? worstPointKernel : worstChildKernel; // Bound (2) is bestAdjustedPointKernel. const double fourthBound = (queryNode.Parent() == NULL) ? -DBL_MAX : queryNode.Parent()->Stat().Bound(); // Pick the best of these bounds. const double interA = (firstBound > bestAdjustedPointKernel) ? firstBound : bestAdjustedPointKernel; // const double interA = 0.0; const double interB = fourthBound; return (interA > interB) ? interA : interB; } /** * Helper function to insert a point into the neighbors and distances matrices. * * @param queryIndex Index of point whose neighbors we are inserting into. * @param pos Position in list to insert into. * @param neighbor Index of reference point which is being inserted. * @param distance Distance from query point to reference point. */ template void FastMKSRules::InsertNeighbor(const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance) { // We only memmove() if there is actually a need to shift something. if (pos < (products.n_rows - 1)) { int len = (products.n_rows - 1) - pos; memmove(products.colptr(queryIndex) + (pos + 1), products.colptr(queryIndex) + pos, sizeof(double) * len); memmove(indices.colptr(queryIndex) + (pos + 1), indices.colptr(queryIndex) + pos, sizeof(size_t) * len); } // Now put the new information in the right index. products(pos, queryIndex) = distance; indices(pos, queryIndex) = neighbor; } }; // namespace fastmks }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/fastmks/fastmks_impl.hpp0000644000176200001440000004433413647512751024255 0ustar liggesusers/** * @file fastmks_impl.hpp * @author Ryan Curtin * * Implementation of the FastMKS class (fast max-kernel search). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). 
* * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_FASTMKS_FASTMKS_IMPL_HPP #define __MLPACK_METHODS_FASTMKS_FASTMKS_IMPL_HPP // In case it hasn't yet been included. #include "fastmks.hpp" #include "fastmks_rules.hpp" #include #include namespace mlpack { namespace fastmks { // Single dataset, no instantiated kernel. template FastMKS::FastMKS(const arma::mat& referenceSet, const bool single, const bool naive) : referenceSet(referenceSet), querySet(referenceSet), referenceTree(NULL), queryTree(NULL), treeOwner(true), single(single), naive(naive) { //Timer::Start("tree_building"); if (!naive) referenceTree = new TreeType(referenceSet); if (!naive && !single) queryTree = new TreeType(referenceSet); //Timer::Stop("tree_building"); } // Two datasets, no instantiated kernel. template FastMKS::FastMKS(const arma::mat& referenceSet, const arma::mat& querySet, const bool single, const bool naive) : referenceSet(referenceSet), querySet(querySet), referenceTree(NULL), queryTree(NULL), treeOwner(true), single(single), naive(naive) { //Timer::Start("tree_building"); // If necessary, the trees should be built. if (!naive) referenceTree = new TreeType(referenceSet); if (!naive && !single) queryTree = new TreeType(querySet); //Timer::Stop("tree_building"); } // One dataset, instantiated kernel. template FastMKS::FastMKS(const arma::mat& referenceSet, KernelType& kernel, const bool single, const bool naive) : referenceSet(referenceSet), querySet(referenceSet), referenceTree(NULL), queryTree(NULL), treeOwner(true), single(single), naive(naive), metric(kernel) { //Timer::Start("tree_building"); // If necessary, the reference tree should be built. There is no query tree. if (!naive) referenceTree = new TreeType(referenceSet, metric); if (!naive && !single) queryTree = new TreeType(referenceSet, metric); //Timer::Stop("tree_building"); } // Two datasets, instantiated kernel. template FastMKS::FastMKS(const arma::mat& referenceSet, const arma::mat& querySet, KernelType& kernel, const bool single, const bool naive) : referenceSet(referenceSet), querySet(querySet), referenceTree(NULL), queryTree(NULL), treeOwner(true), single(single), naive(naive), metric(kernel) { //Timer::Start("tree_building"); // If necessary, the trees should be built. if (!naive) referenceTree = new TreeType(referenceSet, metric); if (!naive && !single) queryTree = new TreeType(querySet, metric); //Timer::Stop("tree_building"); } // One dataset, pre-built tree. template FastMKS::FastMKS(const arma::mat& referenceSet, TreeType* referenceTree, const bool single, const bool naive) : referenceSet(referenceSet), querySet(referenceSet), referenceTree(referenceTree), queryTree(NULL), treeOwner(false), single(single), naive(naive), metric(referenceTree->Metric()) { // The query tree cannot be the same as the reference tree. if (referenceTree) queryTree = new TreeType(*referenceTree); } // Two datasets, pre-built trees. template FastMKS::FastMKS(const arma::mat& referenceSet, TreeType* referenceTree, const arma::mat& querySet, TreeType* queryTree, const bool single, const bool naive) : referenceSet(referenceSet), querySet(querySet), referenceTree(referenceTree), queryTree(queryTree), treeOwner(false), single(single), naive(naive), metric(referenceTree->Metric()) { // Nothing to do. } template FastMKS::~FastMKS() { // If we created the trees, we must delete them. 
if (treeOwner) { if (queryTree) delete queryTree; if (referenceTree) delete referenceTree; } else if (&querySet == &referenceSet) { // The user passed in a reference tree which we needed to copy. if (queryTree) delete queryTree; } } template void FastMKS::Search(const size_t k, arma::Mat& indices, arma::mat& products) { // No remapping will be necessary because we are using the cover tree. indices.set_size(k, querySet.n_cols); products.set_size(k, querySet.n_cols); products.fill(-DBL_MAX); //Timer::Start("computing_products"); // Naive implementation. if (naive) { // Simple double loop. Stupid, slow, but a good benchmark. for (size_t q = 0; q < querySet.n_cols; ++q) { for (size_t r = 0; r < referenceSet.n_cols; ++r) { if ((&querySet == &referenceSet) && (q == r)) continue; const double eval = metric.Kernel().Evaluate(querySet.unsafe_col(q), referenceSet.unsafe_col(r)); size_t insertPosition; for (insertPosition = 0; insertPosition < indices.n_rows; ++insertPosition) if (eval > products(insertPosition, q)) break; if (insertPosition < indices.n_rows) InsertNeighbor(indices, products, q, insertPosition, r, eval); } } //Timer::Stop("computing_products"); return; } // Single-tree implementation. if (single) { // Create rules object (this will store the results). This constructor // precalculates each self-kernel value. typedef FastMKSRules RuleType; RuleType rules(referenceSet, querySet, indices, products, metric.Kernel()); typename TreeType::template SingleTreeTraverser traverser(rules); for (size_t i = 0; i < querySet.n_cols; ++i) traverser.Traverse(i, *referenceTree); // Save the number of pruned nodes. const size_t numPrunes = traverser.NumPrunes(); Rcpp::Rcout << "Pruned " << numPrunes << " nodes." << std::endl; Rcpp::Rcout << rules.BaseCases() << " base cases." << std::endl; Rcpp::Rcout << rules.Scores() << " scores." << std::endl; //Timer::Stop("computing_products"); return; } // Dual-tree implementation. typedef FastMKSRules RuleType; RuleType rules(referenceSet, querySet, indices, products, metric.Kernel()); typename TreeType::template DualTreeTraverser traverser(rules); traverser.Traverse(*queryTree, *referenceTree); const size_t numPrunes = traverser.NumPrunes(); Rcpp::Rcout << "Pruned " << numPrunes << " nodes." << std::endl; Rcpp::Rcout << rules.BaseCases() << " base cases." << std::endl; Rcpp::Rcout << rules.Scores() << " scores." << std::endl; //Timer::Stop("computing_products"); return; } /** * Helper function to insert a point into the neighbors and distances matrices. * * @param queryIndex Index of point whose neighbors we are inserting into. * @param pos Position in list to insert into. * @param neighbor Index of reference point which is being inserted. * @param distance Distance from query point to reference point. */ template void FastMKS::InsertNeighbor(arma::Mat& indices, arma::mat& products, const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance) { // We only memmove() if there is actually a need to shift something. if (pos < (products.n_rows - 1)) { int len = (products.n_rows - 1) - pos; memmove(products.colptr(queryIndex) + (pos + 1), products.colptr(queryIndex) + pos, sizeof(double) * len); memmove(indices.colptr(queryIndex) + (pos + 1), indices.colptr(queryIndex) + pos, sizeof(size_t) * len); } // Now put the new information in the right index. products(pos, queryIndex) = distance; indices(pos, queryIndex) = neighbor; } // Return string of object. 
template std::string FastMKS::ToString() const { std::ostringstream convert; convert << "FastMKS [" << this << "]" << std::endl; convert << " Naive: " << naive << std::endl; convert << " Single: " << single << std::endl; convert << " Metric: " << std::endl; convert << mlpack::util::Indent(metric.ToString(),2); convert << std::endl; return convert.str(); } // Specialized implementation for tighter bounds for Gaussian. /* template<> void FastMKS::Search(const size_t k, arma::Mat& indices, arma::mat& products) { Log::Warn << "Alternate implementation!" << std::endl; // Terrible copypasta. Bad bad bad. // No remapping will be necessary. indices.set_size(k, querySet.n_cols); products.set_size(k, querySet.n_cols); products.fill(-1.0); Timer::Start("computing_products"); size_t kernelEvaluations = 0; // Naive implementation. if (naive) { // Simple double loop. Stupid, slow, but a good benchmark. for (size_t q = 0; q < querySet.n_cols; ++q) { for (size_t r = 0; r < referenceSet.n_cols; ++r) { const double eval = metric.Kernel().Evaluate(querySet.unsafe_col(q), referenceSet.unsafe_col(r)); ++kernelEvaluations; size_t insertPosition; for (insertPosition = 0; insertPosition < indices.n_rows; ++insertPosition) if (eval > products(insertPosition, q)) break; if (insertPosition < indices.n_rows) InsertNeighbor(indices, products, q, insertPosition, r, eval); } } Timer::Stop("computing_products"); Rcpp::Rcout << "Kernel evaluations: " << kernelEvaluations << "." << std::endl; return; } // Single-tree implementation. if (single) { // Calculate number of pruned nodes. size_t numPrunes = 0; // Precalculate query products ( || q || for all q). arma::vec queryProducts(querySet.n_cols); for (size_t queryIndex = 0; queryIndex < querySet.n_cols; ++queryIndex) queryProducts[queryIndex] = sqrt(metric.Kernel().Evaluate( querySet.unsafe_col(queryIndex), querySet.unsafe_col(queryIndex))); kernelEvaluations += querySet.n_cols; // Screw the CoverTreeTraverser, we'll implement it by hand. for (size_t queryIndex = 0; queryIndex < querySet.n_cols; ++queryIndex) { // Use an array of priority queues? std::priority_queue< SearchFrame > >, std::vector > > >, SearchFrameCompare > > > frameQueue; // Add initial frame. SearchFrame > > nextFrame; nextFrame.node = referenceTree; nextFrame.eval = metric.Kernel().Evaluate(querySet.unsafe_col(queryIndex), referenceSet.unsafe_col(referenceTree->Point())); Log::Assert(nextFrame.eval <= 1); ++kernelEvaluations; // The initial evaluation will be the best so far. indices(0, queryIndex) = referenceTree->Point(); products(0, queryIndex) = nextFrame.eval; frameQueue.push(nextFrame); tree::CoverTree >* referenceNode; double eval; double maxProduct; while (!frameQueue.empty()) { // Get the information for this node. const SearchFrame > >& frame = frameQueue.top(); referenceNode = frame.node; eval = frame.eval; // Loop through the children, seeing if we can prune them; if not, add // them to the queue. The self-child is different -- it has the same // parent (and therefore the same kernel evaluation). if (referenceNode->NumChildren() > 0) { SearchFrame > > childFrame; // We must handle the self-child differently, to avoid adding it to // the results twice. childFrame.node = &(referenceNode->Child(0)); childFrame.eval = eval; // Alternate pruning rule. 
const double mdd = childFrame.node->FurthestDescendantDistance(); if (eval >= (1 - std::pow(mdd, 2.0) / 2.0)) maxProduct = 1; else maxProduct = eval * (1 - std::pow(mdd, 2.0) / 2.0) + sqrt(1 - std::pow(eval, 2.0)) * mdd * sqrt(1 - std::pow(mdd, 2.0) / 4.0); // Add self-child if we can't prune it. if (maxProduct > products(products.n_rows - 1, queryIndex)) { // But only if it has children of its own. if (childFrame.node->NumChildren() > 0) frameQueue.push(childFrame); } else ++numPrunes; for (size_t i = 1; i < referenceNode->NumChildren(); ++i) { // Before we evaluate the child, let's see if it can possibly have // a better evaluation. const double mpdd = std::min( referenceNode->Child(i).ParentDistance() + referenceNode->Child(i).FurthestDescendantDistance(), 2.0); double maxChildEval = 1; if (eval < (1 - std::pow(mpdd, 2.0) / 2.0)) maxChildEval = eval * (1 - std::pow(mpdd, 2.0) / 2.0) + sqrt(1 - std::pow(eval, 2.0)) * mpdd * sqrt(1 - std::pow(mpdd, 2.0) / 4.0); if (maxChildEval > products(products.n_rows - 1, queryIndex)) { // Evaluate child. childFrame.node = &(referenceNode->Child(i)); childFrame.eval = metric.Kernel().Evaluate( querySet.unsafe_col(queryIndex), referenceSet.unsafe_col(referenceNode->Child(i).Point())); ++kernelEvaluations; // Can we prune it? If we can, we can avoid putting it in the // queue (saves time). const double cmdd = childFrame.node->FurthestDescendantDistance(); if (childFrame.eval > (1 - std::pow(cmdd, 2.0) / 2.0)) maxProduct = 1; else maxProduct = childFrame.eval * (1 - std::pow(cmdd, 2.0) / 2.0) + sqrt(1 - std::pow(eval, 2.0)) * cmdd * sqrt(1 - std::pow(cmdd, 2.0) / 4.0); if (maxProduct > products(products.n_rows - 1, queryIndex)) { // Good enough to recurse into. While we're at it, check the // actual evaluation and see if it's an improvement. if (childFrame.eval > products(products.n_rows - 1, queryIndex)) { // This is a better result. Find out where to insert. size_t insertPosition = 0; for ( ; insertPosition < products.n_rows - 1; ++insertPosition) if (childFrame.eval > products(insertPosition, queryIndex)) break; // Insert into the correct position; we are guaranteed that // insertPosition is valid. InsertNeighbor(indices, products, queryIndex, insertPosition, childFrame.node->Point(), childFrame.eval); } // Now add this to the queue (if it has any children which may // need to be recursed into). if (childFrame.node->NumChildren() > 0) frameQueue.push(childFrame); } else ++numPrunes; } else ++numPrunes; } } frameQueue.pop(); } } Rcpp::Rcout << "Pruned " << numPrunes << " nodes." << std::endl; Rcpp::Rcout << "Kernel evaluations: " << kernelEvaluations << "." << std::endl; Rcpp::Rcout << "Distance evaluations: " << distanceEvaluations << "." << std::endl; Timer::Stop("computing_products"); return; } // Double-tree implementation. Log::Fatal << "Dual-tree search not implemented yet... oops..." << std::endl; } */ }; // namespace fastmks }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/fastmks/fastmks.hpp0000644000176200001440000002172113647512751023227 0ustar liggesusers/** * @file fastmks.hpp * @author Ryan Curtin * * Definition of the FastMKS class, which implements fast exact max-kernel * search. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_FASTMKS_FASTMKS_HPP #define __MLPACK_METHODS_FASTMKS_FASTMKS_HPP #include #include #include "fastmks_stat.hpp" #include namespace mlpack { namespace fastmks /** Fast max-kernel search. */ { /** * An implementation of fast exact max-kernel search. Given a query dataset and * a reference dataset (or optionally just a reference dataset which is also * used as the query dataset), fast exact max-kernel search finds, for each * point in the query dataset, the k points in the reference set with maximum * kernel value K(p_q, p_r), where k is a specified parameter and K() is a * Mercer kernel. * * For more information, see the following paper. * * @code * @inproceedings{curtin2013fast, * title={Fast Exact Max-Kernel Search}, * author={Curtin, Ryan R. and Ram, Parikshit and Gray, Alexander G.}, * booktitle={Proceedings of the 2013 SIAM International Conference on Data * Mining (SDM 13)}, * year={2013} * } * @endcode * * This class allows specification of the type of kernel and also of the type of * tree. FastMKS can be run on kernels that work on arbitrary objects -- * however, this only works with cover trees and other trees that are built only * on points in the dataset (and not centroids of regions or anything like * that). * * @tparam KernelType Type of kernel to run FastMKS with. * @tparam TreeType Type of tree to run FastMKS with; it must have metric * IPMetric. */ template< typename KernelType, typename TreeType = tree::CoverTree, tree::FirstPointIsRoot, FastMKSStat> > class FastMKS { public: /** * Create the FastMKS object using the reference set as the query set. * Optionally, specify whether or not single-tree search or naive * (brute-force) search should be used. * * @param referenceSet Set of data to run FastMKS on. * @param single Whether or not to run single-tree search. * @param naive Whether or not to run brute-force (naive) search. */ FastMKS(const arma::mat& referenceSet, const bool single = false, const bool naive = false); /** * Create the FastMKS object using separate reference and query sets. * Optionally, specify whether or not single-tree search or naive * (brute-force) search should be used. * * @param referenceSet Reference set of data for FastMKS. * @param querySet Set of query points for FastMKS. * @param single Whether or not to run single-tree search. * @param naive Whether or not to run brute-force (naive) search. */ FastMKS(const arma::mat& referenceSet, const arma::mat& querySet, const bool single = false, const bool naive = false); /** * Create the FastMKS object using the reference set as the query set, and * with an initialized kernel. This is useful for when the kernel stores * state. Optionally, specify whether or not single-tree search or naive * (brute-force) search should be used. * * @param referenceSet Reference set of data for FastMKS. * @param kernel Initialized kernel. * @param single Whether or not to run single-tree search. * @param naive Whether or not to run brute-force (naive) search. 
*/ FastMKS(const arma::mat& referenceSet, KernelType& kernel, const bool single = false, const bool naive = false); /** * Create the FastMKS object using separate reference and query sets, and with * an initialized kernel. This is useful for when the kernel stores state. * Optionally, specify whether or not single-tree search or naive * (brute-force) search should be used. * * @param referenceSet Reference set of data for FastMKS. * @param querySet Set of query points for FastMKS. * @param kernel Initialized kernel. * @param single Whether or not to run single-tree search. * @param naive Whether or not to run brute-force (naive) search. */ FastMKS(const arma::mat& referenceSet, const arma::mat& querySet, KernelType& kernel, const bool single = false, const bool naive = false); /** * Create the FastMKS object with an already-initialized tree built on the * reference points. Be sure that the tree is built with the metric type * IPMetric. For this constructor, the reference set and the * query set are the same points. Optionally, whether or not to run * single-tree search or brute-force (naive) search can be specified. * * @param referenceSet Reference set of data for FastMKS. * @param referenceTree Tree built on reference data. * @param single Whether or not to run single-tree search. * @param naive Whether or not to run brute-force (naive) search. */ FastMKS(const arma::mat& referenceSet, TreeType* referenceTree, const bool single = false, const bool naive = false); /** * Create the FastMKS object with already-initialized trees built on the * reference and query points. Be sure that the trees are built with the * metric type IPMetric. Optionally, whether or not to run * single-tree search or naive (brute-force) search can be specified. * * @param referenceSet Reference set of data for FastMKS. * @param referenceTree Tree built on reference data. * @param querySet Set of query points for FastMKS. * @param queryTree Tree built on query data. * @param single Whether or not to use single-tree search. * @param naive Whether or not to use naive (brute-force) search. */ FastMKS(const arma::mat& referenceSet, TreeType* referenceTree, const arma::mat& querySet, TreeType* queryTree, const bool single = false, const bool naive = false); //! Destructor for the FastMKS object. ~FastMKS(); /** * Search for the maximum inner products of the query set (or if no query set * was passed, the reference set is used). The resulting maximum inner * products are stored in the products matrix and the corresponding point * indices are stores in the indices matrix. The results for each point in * the query set are stored in the corresponding column of the indices and * products matrices; for instance, the index of the point with maximum inner * product to point 4 in the query set will be stored in row 0 and column 4 of * the indices matrix. * * @param k The number of maximum kernels to find. * @param indices Matrix to store resulting indices of max-kernel search in. * @param products Matrix to store resulting max-kernel values in. */ void Search(const size_t k, arma::Mat& indices, arma::mat& products); //! Get the inner-product metric induced by the given kernel. const metric::IPMetric& Metric() const { return metric; } //! Modify the inner-product metric induced by the given kernel. metric::IPMetric& Metric() { return metric; } /** * Returns a string representation of this object. */ std::string ToString() const; private: //! The reference dataset. const arma::mat& referenceSet; //! The query dataset. 
const arma::mat& querySet; //! The tree built on the reference dataset. TreeType* referenceTree; //! The tree built on the query dataset. This is NULL if there is no query //! set. TreeType* queryTree; //! If true, this object created the trees and is responsible for them. bool treeOwner; //! If true, single-tree search is used. bool single; //! If true, naive (brute-force) search is used. bool naive; //! The instantiated inner-product metric induced by the given kernel. metric::IPMetric metric; //! Utility function. Copied too many times from too many places. void InsertNeighbor(arma::Mat& indices, arma::mat& products, const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance); }; }; // namespace fastmks }; // namespace mlpack // Include implementation. #include "fastmks_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/fastmks/fastmks_rules.hpp0000644000176200001440000001323413647512751024441 0ustar liggesusers/** * @file fastmks_rules.hpp * @author Ryan Curtin * * Rules for the single or dual tree traversal for fast max-kernel search. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_FASTMKS_FASTMKS_RULES_HPP #define __MLPACK_METHODS_FASTMKS_FASTMKS_RULES_HPP #include #include #include "../neighbor_search/ns_traversal_info.hpp" namespace mlpack { namespace fastmks { /** * The base case and pruning rules for FastMKS (fast max-kernel search). */ template class FastMKSRules { public: FastMKSRules(const arma::mat& referenceSet, const arma::mat& querySet, arma::Mat& indices, arma::mat& products, KernelType& kernel); //! Compute the base case (kernel value) between two points. double BaseCase(const size_t queryIndex, const size_t referenceIndex); /** * Get the score for recursion order. A low score indicates priority for * recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). * * @param queryIndex Index of query point. * @param referenceNode Candidate to be recursed into. */ double Score(const size_t queryIndex, TreeType& referenceNode); /** * Get the score for recursion order. A low score indicates priority for * recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). * * @param queryNode Candidate query node to be recursed into. * @param referenceNode Candidate reference node to be recursed into. */ double Score(TreeType& queryNode, TreeType& referenceNode); /** * Re-evaluate the score for recursion order. A low score indicates priority * for recursion, while DBL_MAX indicates that a node should not be recursed * into at all (it should be pruned). This is used when the score has already * been calculated, but another recursion may have modified the bounds for * pruning. So the old score is checked against the new pruning bound. * * @param queryIndex Index of query point. * @param referenceNode Candidate node to be recursed into. 
* @param oldScore Old score produced by Score() (or Rescore()). */ double Rescore(const size_t queryIndex, TreeType& referenceNode, const double oldScore) const; /** * Re-evaluate the score for recursion order. A low score indicates priority * for recursion, while DBL_MAX indicates that a node should not be recursed * into at all (it should be pruned). This is used when the score has already * been calculated, but another recursion may have modified the bounds for * pruning. So the old score is checked against the new pruning bound. * * @param queryNode Candidate query node to be recursed into. * @param referenceNode Candidate reference node to be recursed into. * @param oldScore Old score produced by Score() (or Rescore()). */ double Rescore(TreeType& queryNode, TreeType& referenceNode, const double oldScore) const; //! Get the number of times BaseCase() was called. size_t BaseCases() const { return baseCases; } //! Modify the number of times BaseCase() was called. size_t& BaseCases() { return baseCases; } //! Get the number of times Score() was called. size_t Scores() const { return scores; } //! Modify the number of times Score() was called. size_t& Scores() { return scores; } typedef neighbor::NeighborSearchTraversalInfo TraversalInfoType; const TraversalInfoType& TraversalInfo() const { return traversalInfo; } TraversalInfoType& TraversalInfo() { return traversalInfo; } private: //! The reference dataset. const arma::mat& referenceSet; //! The query dataset. const arma::mat& querySet; //! The indices of the maximum kernel results. arma::Mat& indices; //! The maximum kernels. arma::mat& products; //! Cached query set self-kernels (|| q || for each q). arma::vec queryKernels; //! Cached reference set self-kernels (|| r || for each r). arma::vec referenceKernels; //! The instantiated kernel. KernelType& kernel; //! The last query index BaseCase() was called with. size_t lastQueryIndex; //! The last reference index BaseCase() was called with. size_t lastReferenceIndex; //! The last kernel evaluation resulting from BaseCase(). double lastKernel; //! Calculate the bound for a given query node. double CalculateBound(TreeType& queryNode) const; //! Utility function to insert neighbor into list of results. void InsertNeighbor(const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance); //! For benchmarking. size_t baseCases; //! For benchmarking. size_t scores; TraversalInfoType traversalInfo; }; }; // namespace fastmks }; // namespace mlpack // Include implementation. #include "fastmks_rules_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/gmm/0000755000176200001440000000000013647512751020153 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/gmm/gmm_impl.hpp0000644000176200001440000003763213647512751022500 0ustar liggesusers/** * @file gmm_impl.hpp * @author Parikshit Ram (pram@cc.gatech.edu) * @author Ryan Curtin * * Implementation of template-based GMM methods. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). 
* * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_GMM_GMM_IMPL_HPP #define __MLPACK_METHODS_GMM_GMM_IMPL_HPP // In case it hasn't already been included. #include "gmm.hpp" //#include namespace mlpack { namespace gmm { /** * Create a GMM with the given number of Gaussians, each of which have the * specified dimensionality. The means and covariances will be set to 0. * * @param gaussians Number of Gaussians in this GMM. * @param dimensionality Dimensionality of each Gaussian. */ template GMM::GMM(const size_t gaussians, const size_t dimensionality) : gaussians(gaussians), dimensionality(dimensionality), means(gaussians, arma::vec(dimensionality)), covariances(gaussians, arma::mat(dimensionality, dimensionality)), weights(gaussians), localFitter(FittingType()), fitter(localFitter) { // Clear the memory; set it to 0. Technically this model is still valid, but // only barely. weights.fill(1.0 / gaussians); for (size_t i = 0; i < gaussians; ++i) { means[i].zeros(); covariances[i].eye(); } } /** * Create a GMM with the given number of Gaussians, each of which have the * specified dimensionality. Also, pass in an initialized FittingType class; * this is useful in cases where the FittingType class needs to store some * state. * * @param gaussians Number of Gaussians in this GMM. * @param dimensionality Dimensionality of each Gaussian. * @param fitter Initialized fitting mechanism. */ template GMM::GMM(const size_t gaussians, const size_t dimensionality, FittingType& fitter) : gaussians(gaussians), dimensionality(dimensionality), means(gaussians, arma::vec(dimensionality)), covariances(gaussians, arma::mat(dimensionality, dimensionality)), weights(gaussians), fitter(fitter) { // Clear the memory; set it to 0. Technically this model is still valid, but // only barely. weights.fill(1.0 / gaussians); for (size_t i = 0; i < gaussians; ++i) { means[i].zeros(); covariances[i].eye(); } } // Copy constructor. template template GMM::GMM(const GMM& other) : gaussians(other.Gaussians()), dimensionality(other.Dimensionality()), means(other.Means()), covariances(other.Covariances()), weights(other.Weights()), localFitter(FittingType()), fitter(localFitter) { /* Nothing to do. */ } // Copy constructor for when the other GMM uses the same fitting type. template GMM::GMM(const GMM& other) : gaussians(other.Gaussians()), dimensionality(other.Dimensionality()), means(other.Means()), covariances(other.Covariances()), weights(other.Weights()), localFitter(other.Fitter()), fitter(localFitter) { /* Nothing to do. */ } template template GMM& GMM::operator=( const GMM& other) { gaussians = other.Gaussians(); dimensionality = other.Dimensionality(); means = other.Means(); covariances = other.Covariances(); weights = other.Weights(); return *this; } template GMM& GMM::operator=(const GMM& other) { gaussians = other.Gaussians(); dimensionality = other.Dimensionality(); means = other.Means(); covariances = other.Covariances(); weights = other.Weights(); localFitter = other.Fitter(); return *this; } // Load a GMM from file. /* template void GMM::Load(const std::string& filename) { util::SaveRestoreUtility load; if (!load.ReadFile(filename)) Rcpp::Rcout << "GMM::Load(): could not read file '" << filename << "'!\n"; load.LoadParameter(gaussians, "gaussians"); load.LoadParameter(dimensionality, "dimensionality"); load.LoadParameter(weights, "weights"); // We need to do a little error checking here. 
if (weights.n_elem != gaussians) { Rcpp::Rcout << "GMM::Load('" << filename << "'): file reports " << gaussians << " gaussians but weights vector only contains " << weights.n_elem << " elements!" << std::endl; } means.resize(gaussians); covariances.resize(gaussians); for (size_t i = 0; i < gaussians; ++i) { std::stringstream o; o << i; std::string meanName = "mean" + o.str(); std::string covName = "covariance" + o.str(); load.LoadParameter(means[i], meanName); load.LoadParameter(covariances[i], covName); } }*/ // Save a GMM to a file. /* template void GMM::Save(const std::string& filename) const { util::SaveRestoreUtility save; save.SaveParameter(gaussians, "gaussians"); save.SaveParameter(dimensionality, "dimensionality"); save.SaveParameter(weights, "weights"); for (size_t i = 0; i < gaussians; ++i) { // Generate names for the XML nodes. std::stringstream o; o << i; std::string meanName = "mean" + o.str(); std::string covName = "covariance" + o.str(); // Now save them. save.SaveParameter(means[i], meanName); save.SaveParameter(covariances[i], covName); } if (!save.WriteFile(filename)) Rcpp::Rcout << "GMM::Save(): error saving to '" << filename << "'.\n"; }*/ /** * Return the probability of the given observation being from this GMM. */ template double GMM::Probability(const arma::vec& observation) const { // Sum the probability for each Gaussian in our mixture (and we have to // multiply by the prior for each Gaussian too). double sum = 0; for (size_t i = 0; i < gaussians; i++) sum += weights[i] * phi(observation, means[i], covariances[i]); return sum; } /** * Return the probability of the given observation being from the given * component in the mixture. */ template double GMM::Probability(const arma::vec& observation, const size_t component) const { // We are only considering one Gaussian component -- so we only need to call // phi() once. We do consider the prior probability! return weights[component] * phi(observation, means[component], covariances[component]); } /** * Return a randomly generated observation according to the probability * distribution defined by this object. */ template arma::vec GMM::Random() const { // Determine which Gaussian it will be coming from. double gaussRand = math::Random(); size_t gaussian = 0; double sumProb = 0; for (size_t g = 0; g < gaussians; g++) { sumProb += weights(g); if (gaussRand <= sumProb) { gaussian = g; break; } } return trans(chol(covariances[gaussian])) * arma::randn(dimensionality) + means[gaussian]; } /** * Fit the GMM to the given observations. */ template double GMM::Estimate(const arma::mat& observations, const size_t trials, const bool useExistingModel) { double bestLikelihood; // This will be reported later. // We don't need to store temporary models if we are only doing one trial. if (trials == 1) { // Train the model. The user will have been warned earlier if the GMM was // initialized with no parameters (0 gaussians, dimensionality of 0). fitter.Estimate(observations, means, covariances, weights, useExistingModel); bestLikelihood = LogLikelihood(observations, means, covariances, weights); } else { if (trials == 0) return -DBL_MAX; // It's what they asked for... // If each trial must start from the same initial location, we must save it. std::vector meansOrig; std::vector covariancesOrig; arma::vec weightsOrig; if (useExistingModel) { meansOrig = means; covariancesOrig = covariances; weightsOrig = weights; } // We need to keep temporary copies. 
We'll do the first training into the // actual model position, so that if it's the best we don't need to copy it. fitter.Estimate(observations, means, covariances, weights, useExistingModel); bestLikelihood = LogLikelihood(observations, means, covariances, weights); Rcpp::Rcout << "GMM::Estimate(): Log-likelihood of trial 0 is " << bestLikelihood << "." << std::endl; // Now the temporary model. std::vector meansTrial(gaussians, arma::vec(dimensionality)); std::vector covariancesTrial(gaussians, arma::mat(dimensionality, dimensionality)); arma::vec weightsTrial(gaussians); for (size_t trial = 1; trial < trials; ++trial) { if (useExistingModel) { meansTrial = meansOrig; covariancesTrial = covariancesOrig; weightsTrial = weightsOrig; } fitter.Estimate(observations, meansTrial, covariancesTrial, weightsTrial, useExistingModel); // Check to see if the log-likelihood of this one is better. double newLikelihood = LogLikelihood(observations, meansTrial, covariancesTrial, weightsTrial); Rcpp::Rcout << "GMM::Estimate(): Log-likelihood of trial " << trial << " is " << newLikelihood << "." << std::endl; if (newLikelihood > bestLikelihood) { // Save new likelihood and copy new model. bestLikelihood = newLikelihood; means = meansTrial; covariances = covariancesTrial; weights = weightsTrial; } } } // Report final log-likelihood and return it. Rcpp::Rcout << "GMM::Estimate(): log-likelihood of trained GMM is " << bestLikelihood << "." << std::endl; return bestLikelihood; } /** * Fit the GMM to the given observations, each of which has a certain * probability of being from this distribution. */ template double GMM::Estimate(const arma::mat& observations, const arma::vec& probabilities, const size_t trials, const bool useExistingModel) { double bestLikelihood; // This will be reported later. // We don't need to store temporary models if we are only doing one trial. if (trials == 1) { // Train the model. The user will have been warned earlier if the GMM was // initialized with no parameters (0 gaussians, dimensionality of 0). fitter.Estimate(observations, probabilities, means, covariances, weights, useExistingModel); bestLikelihood = LogLikelihood(observations, means, covariances, weights); } else { if (trials == 0) return -DBL_MAX; // It's what they asked for... // If each trial must start from the same initial location, we must save it. std::vector meansOrig; std::vector covariancesOrig; arma::vec weightsOrig; if (useExistingModel) { meansOrig = means; covariancesOrig = covariances; weightsOrig = weights; } // We need to keep temporary copies. We'll do the first training into the // actual model position, so that if it's the best we don't need to copy it. fitter.Estimate(observations, probabilities, means, covariances, weights, useExistingModel); bestLikelihood = LogLikelihood(observations, means, covariances, weights); Rcpp::Rcout << "GMM::Estimate(): Log-likelihood of trial 0 is " << bestLikelihood << "." << std::endl; // Now the temporary model. std::vector meansTrial(gaussians, arma::vec(dimensionality)); std::vector covariancesTrial(gaussians, arma::mat(dimensionality, dimensionality)); arma::vec weightsTrial(gaussians); for (size_t trial = 1; trial < trials; ++trial) { if (useExistingModel) { meansTrial = meansOrig; covariancesTrial = covariancesOrig; weightsTrial = weightsOrig; } fitter.Estimate(observations, meansTrial, covariancesTrial, weightsTrial, useExistingModel); // Check to see if the log-likelihood of this one is better. 
double newLikelihood = LogLikelihood(observations, meansTrial, covariancesTrial, weightsTrial); Rcpp::Rcout << "GMM::Estimate(): Log-likelihood of trial " << trial << " is " << newLikelihood << "." << std::endl; if (newLikelihood > bestLikelihood) { // Save new likelihood and copy new model. bestLikelihood = newLikelihood; means = meansTrial; covariances = covariancesTrial; weights = weightsTrial; } } } // Report final log-likelihood and return it. Rcpp::Rcout << "GMM::Estimate(): log-likelihood of trained GMM is " << bestLikelihood << "." << std::endl; return bestLikelihood; } /** * Classify the given observations as being from an individual component in this * GMM. */ template void GMM::Classify(const arma::mat& observations, arma::Col& labels) const { // This is not the best way to do this! // We should not have to fill this with values, because each one should be // overwritten. labels.set_size(observations.n_cols); for (size_t i = 0; i < observations.n_cols; ++i) { // Find maximum probability component. double probability = 0; for (size_t j = 0; j < gaussians; ++j) { double newProb = Probability(observations.unsafe_col(i), j); if (newProb >= probability) { probability = newProb; labels[i] = j; } } } } /** * Get the log-likelihood of this data's fit to the model. */ template double GMM::LogLikelihood( const arma::mat& data, const std::vector& meansL, const std::vector& covariancesL, const arma::vec& weightsL) const { double loglikelihood = 0; arma::vec phis; arma::mat likelihoods(gaussians, data.n_cols); for (size_t i = 0; i < gaussians; i++) { phi(data, meansL[i], covariancesL[i], phis); likelihoods.row(i) = weightsL(i) * trans(phis); } // Now sum over every point. for (size_t j = 0; j < data.n_cols; j++) loglikelihood += log(accu(likelihoods.col(j))); return loglikelihood; } template std::string GMM::ToString() const { std::ostringstream convert; std::ostringstream data; convert << "GMM [" << this << "]" << std::endl; convert << " Gaussians: " << gaussians << std::endl; convert << " Dimensionality: "<(). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_GMM_EM_FIT_HPP #define __MLPACK_METHODS_GMM_EM_FIT_HPP #include // Default clustering mechanism. #include // Default covariance matrix constraint. #include "positive_definite_constraint.hpp" namespace mlpack { namespace gmm { /** * This class contains methods which can fit a GMM to observations using the EM * algorithm. It requires an initial clustering mechanism, which is by default * the KMeans algorithm. The clustering mechanism must implement the following * method: * * - void Cluster(const arma::mat& observations, * const size_t clusters, * arma::Col& assignments); * * This method should create 'clusters' clusters, and return the assignment of * each point to a cluster. 
*/ template, typename CovarianceConstraintPolicy = PositiveDefiniteConstraint> class EMFit { public: /** * Construct the EMFit object, optionally passing an InitialClusteringType * object (just in case it needs to store state). Setting the maximum number * of iterations to 0 means that the EM algorithm will iterate until * convergence (with the given tolerance). * * The parameter forcePositive controls whether or not the covariance matrices * are checked for positive definiteness at each iteration. This could be a * time-consuming task, so, if you know your data is well-behaved, you can set * it to false and save some runtime. * * @param maxIterations Maximum number of iterations for EM. * @param tolerance Log-likelihood tolerance required for convergence. * @param forcePositive Check for positive-definiteness of each covariance * matrix at each iteration. * @param clusterer Object which will perform the initial clustering. */ EMFit(const size_t maxIterations = 300, const double tolerance = 1e-10, InitialClusteringType clusterer = InitialClusteringType(), CovarianceConstraintPolicy constraint = CovarianceConstraintPolicy()); /** * Fit the observations to a Gaussian mixture model (GMM) using the EM * algorithm. The size of the vectors (indicating the number of components) * must already be set. Optionally, if useInitialModel is set to true, then * the model given in the means, covariances, and weights parameters is used * as the initial model, instead of using the InitialClusteringType::Cluster() * option. * * @param observations List of observations to train on. * @param means Vector to store trained means in. * @param covariances Vector to store trained covariances in. * @param weights Vector to store a priori weights in. * @param useInitialModel If true, the given model is used for the initial * clustering. */ void Estimate(const arma::mat& observations, std::vector& means, std::vector& covariances, arma::vec& weights, const bool useInitialModel = false); /** * Fit the observations to a Gaussian mixture model (GMM) using the EM * algorithm, taking into account the probabilities of each point being from * this mixture. The size of the vectors (indicating the number of * components) must already be set. Optionally, if useInitialModel is set to * true, then the model given in the means, covariances, and weights * parameters is used as the initial model, instead of using the * InitialClusteringType::Cluster() option. * * @param observations List of observations to train on. * @param probabilities Probability of each point being from this model. * @param means Vector to store trained means in. * @param covariances Vector to store trained covariances in. * @param weights Vector to store a priori weights in. * @param useInitialModel If true, the given model is used for the initial * clustering. */ void Estimate(const arma::mat& observations, const arma::vec& probabilities, std::vector& means, std::vector& covariances, arma::vec& weights, const bool useInitialModel = false); //! Get the clusterer. const InitialClusteringType& Clusterer() const { return clusterer; } //! Modify the clusterer. InitialClusteringType& Clusterer() { return clusterer; } //! Get the covariance constraint policy class. const CovarianceConstraintPolicy& Constraint() const { return constraint; } //! Modify the covariance constraint policy class. CovarianceConstraintPolicy& Constraint() { return constraint; } //! Get the maximum number of iterations of the EM algorithm. 
size_t MaxIterations() const { return maxIterations; } //! Modify the maximum number of iterations of the EM algorithm. size_t& MaxIterations() { return maxIterations; } //! Get the tolerance for the convergence of the EM algorithm. double Tolerance() const { return tolerance; } //! Modify the tolerance for the convergence of the EM algorithm. double& Tolerance() { return tolerance; } private: /** * Run the clusterer, and then turn the cluster assignments into Gaussians. * This is a helper function for both overloads of Estimate(). The vectors * must be already set to the number of clusters. * * @param observations List of observations. * @param means Vector to store means in. * @param covariances Vector to store covariances in. * @param weights Vector to store a priori weights in. */ void InitialClustering(const arma::mat& observations, std::vector& means, std::vector& covariances, arma::vec& weights); /** * Calculate the log-likelihood of a model. Yes, this is reimplemented in the * GMM code. Intuition suggests that the log-likelihood is not the best way * to determine if the EM algorithm has converged. * * @param data Data matrix. * @param means Vector of means. * @param covariances Vector of covariance matrices. * @param weights Vector of a priori weights. */ double LogLikelihood(const arma::mat& data, const std::vector& means, const std::vector& covariances, const arma::vec& weights) const; //! Maximum iterations of EM algorithm. size_t maxIterations; //! Tolerance for convergence of EM. double tolerance; //! Object which will perform the clustering. InitialClusteringType clusterer; //! Object which applies constraints to the covariance matrix. CovarianceConstraintPolicy constraint; }; }; // namespace gmm }; // namespace mlpack // Include implementation. #include "em_fit_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/gmm/positive_definite_constraint.hpp0000644000176200001440000000337113647512751026645 0ustar liggesusers/** * @file positive_definite_constraint.hpp * @author Ryan Curtin * * Restricts a covariance matrix to being positive definite. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_GMM_POSITIVE_DEFINITE_CONSTRAINT_HPP #define __MLPACK_METHODS_GMM_POSITIVE_DEFINITE_CONSTRAINT_HPP #include namespace mlpack { namespace gmm { /** * Given a covariance matrix, force the matrix to be positive definite. */ class PositiveDefiniteConstraint { public: /** * Apply the positive definiteness constraint to the given covariance matrix. * * @param covariance Covariance matrix. */ static void ApplyConstraint(arma::mat& covariance) { // TODO: make this more efficient. if (arma::det(covariance) <= 1e-50) { Rcpp::Rcout << "Covariance matrix is not positive definite. Adding " << "perturbation." 
<< std::endl; double perturbation = 1e-30; while (arma::det(covariance) <= 1e-50) { covariance.diag() += perturbation; perturbation *= 10; } } } }; }; // namespace gmm }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/gmm/em_fit_impl.hpp0000644000176200001440000002501113647512751023147 0ustar liggesusers/** * @file em_fit_impl.hpp * @author Ryan Curtin * * Implementation of EM algorithm for fitting GMMs. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_GMM_EM_FIT_IMPL_HPP #define __MLPACK_METHODS_GMM_EM_FIT_IMPL_HPP // In case it hasn't been included yet. #include "em_fit.hpp" // Definition of phi(). #include "phi.hpp" namespace mlpack { namespace gmm { //! Constructor. template EMFit::EMFit( const size_t maxIterations, const double tolerance, InitialClusteringType clusterer, CovarianceConstraintPolicy constraint) : maxIterations(maxIterations), tolerance(tolerance), clusterer(clusterer), constraint(constraint) { /* Nothing to do. */ } template void EMFit::Estimate( const arma::mat& observations, std::vector& means, std::vector& covariances, arma::vec& weights, const bool useInitialModel) { // Only perform initial clustering if the user wanted it. if (!useInitialModel) InitialClustering(observations, means, covariances, weights); double l = LogLikelihood(observations, means, covariances, weights); Rcpp::Rcout << "EMFit::Estimate(): initial clustering log-likelihood: " << l << std::endl; double lOld = -DBL_MAX; arma::mat condProb(observations.n_cols, means.size()); // Iterate to update the model until no more improvement is found. size_t iteration = 1; while (std::abs(l - lOld) > tolerance && iteration != maxIterations) { Rcpp::Rcout << "EMFit::Estimate(): iteration " << iteration << ", " << "log-likelihood " << l << "." << std::endl; // Calculate the conditional probabilities of choosing a particular // Gaussian given the observations and the present theta value. for (size_t i = 0; i < means.size(); i++) { // Store conditional probabilities into condProb vector for each // Gaussian. First we make an alias of the condProb vector. arma::vec condProbAlias = condProb.unsafe_col(i); phi(observations, means[i], covariances[i], condProbAlias); condProbAlias *= weights[i]; } // Normalize row-wise. for (size_t i = 0; i < condProb.n_rows; i++) { // Avoid dividing by zero; if the probability for everything is 0, we // don't want to make it NaN. const double probSum = accu(condProb.row(i)); if (probSum != 0.0) condProb.row(i) /= probSum; } // Store the sum of the probability of each state over all the observations. arma::vec probRowSums = trans(arma::sum(condProb, 0 /* columnwise */)); // Calculate the new value of the means using the updated conditional // probabilities. for (size_t i = 0; i < means.size(); i++) { // Don't update if there's no probability of the Gaussian having points. 
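      // In EM terms: with responsibility p_ij = condProb(i, j) of Gaussian j
      // for point i, the update below computes the weighted mean
      //   means[j] = (sum_i p_ij * x_i) / (sum_i p_ij),
      // where probRowSums[j] = sum_i p_ij; the covariance update that follows
      // is the p_ij-weighted scatter of the centered points over the same sum.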
if (probRowSums[i] != 0) means[i] = (observations * condProb.col(i)) / probRowSums[i]; // Calculate the new value of the covariances using the updated // conditional probabilities and the updated means. arma::mat tmp = observations - (means[i] * arma::ones(observations.n_cols)); arma::mat tmpB = tmp % (arma::ones(observations.n_rows) * trans(condProb.col(i))); // Don't update if there's no probability of the Gaussian having points. if (probRowSums[i] != 0.0) covariances[i] = (tmp * trans(tmpB)) / probRowSums[i]; // Apply covariance constraint. constraint.ApplyConstraint(covariances[i]); } // Calculate the new values for omega using the updated conditional // probabilities. weights = probRowSums / observations.n_cols; // Update values of l; calculate new log-likelihood. lOld = l; l = LogLikelihood(observations, means, covariances, weights); iteration++; } } template void EMFit::Estimate( const arma::mat& observations, const arma::vec& probabilities, std::vector& means, std::vector& covariances, arma::vec& weights, const bool useInitialModel) { if (!useInitialModel) InitialClustering(observations, means, covariances, weights); double l = LogLikelihood(observations, means, covariances, weights); Rcpp::Rcout << "EMFit::Estimate(): initial clustering log-likelihood: " << l << std::endl; double lOld = -DBL_MAX; arma::mat condProb(observations.n_cols, means.size()); // Iterate to update the model until no more improvement is found. size_t iteration = 1; while (std::abs(l - lOld) > tolerance && iteration != maxIterations) { // Calculate the conditional probabilities of choosing a particular // Gaussian given the observations and the present theta value. for (size_t i = 0; i < means.size(); i++) { // Store conditional probabilities into condProb vector for each // Gaussian. First we make an alias of the condProb vector. arma::vec condProbAlias = condProb.unsafe_col(i); phi(observations, means[i], covariances[i], condProbAlias); condProbAlias *= weights[i]; } // Normalize row-wise. for (size_t i = 0; i < condProb.n_rows; i++) { // Avoid dividing by zero; if the probability for everything is 0, we // don't want to make it NaN. const double probSum = accu(condProb.row(i)); if (probSum != 0.0) condProb.row(i) /= probSum; } // This will store the sum of probabilities of each state over all the // observations. arma::vec probRowSums(means.size()); // Calculate the new value of the means using the updated conditional // probabilities. for (size_t i = 0; i < means.size(); i++) { // Calculate the sum of probabilities of points, which is the // conditional probability of each point being from Gaussian i // multiplied by the probability of the point being from this mixture // model. probRowSums[i] = accu(condProb.col(i) % probabilities); means[i] = (observations * (condProb.col(i) % probabilities)) / probRowSums[i]; // Calculate the new value of the covariances using the updated // conditional probabilities and the updated means. arma::mat tmp = observations - (means[i] * arma::ones(observations.n_cols)); arma::mat tmpB = tmp % (arma::ones(observations.n_rows) * trans(condProb.col(i) % probabilities)); covariances[i] = (tmp * trans(tmpB)) / probRowSums[i]; // Apply covariance constraint. constraint.ApplyConstraint(covariances[i]); } // Calculate the new values for omega using the updated conditional // probabilities. weights = probRowSums / accu(probabilities); // Update values of l; calculate new log-likelihood. 
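    // Convergence for this weighted variant is judged exactly as in the
    // unweighted Estimate() overload above: iterate until the log-likelihood
    // changes by no more than 'tolerance', or maxIterations is reached.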
lOld = l; l = LogLikelihood(observations, means, covariances, weights); iteration++; } } template void EMFit:: InitialClustering(const arma::mat& observations, std::vector& means, std::vector& covariances, arma::vec& weights) { // Assignments from clustering. arma::Col assignments; // Run clustering algorithm. clusterer.Cluster(observations, means.size(), assignments); // Now calculate the means, covariances, and weights. weights.zeros(); for (size_t i = 0; i < means.size(); ++i) { means[i].zeros(); covariances[i].zeros(); } // From the assignments, generate our means, covariances, and weights. for (size_t i = 0; i < observations.n_cols; ++i) { const size_t cluster = assignments[i]; // Add this to the relevant mean. means[cluster] += observations.col(i); // Add this to the relevant covariance. // covariances[cluster] += observations.col(i) * trans(observations.col(i)); // Now add one to the weights (we will normalize). weights[cluster]++; } // Now normalize the mean and covariance. for (size_t i = 0; i < means.size(); ++i) { // covariances[i] -= means[i] * trans(means[i]); means[i] /= (weights[i] > 1) ? weights[i] : 1; // covariances[i] /= (weights[i] > 1) ? weights[i] : 1; } for (size_t i = 0; i < observations.n_cols; ++i) { const size_t cluster = assignments[i]; const arma::vec normObs = observations.col(i) - means[cluster]; covariances[cluster] += normObs * normObs.t(); } for (size_t i = 0; i < means.size(); ++i) { covariances[i] /= (weights[i] > 1) ? weights[i] : 1; // Apply constraints to covariance matrix. constraint.ApplyConstraint(covariances[i]); } // Finally, normalize weights. weights /= accu(weights); } template double EMFit::LogLikelihood( const arma::mat& observations, const std::vector& means, const std::vector& covariances, const arma::vec& weights) const { double logLikelihood = 0; arma::vec phis; arma::mat likelihoods(means.size(), observations.n_cols); for (size_t i = 0; i < means.size(); ++i) { phi(observations, means[i], covariances[i], phis); likelihoods.row(i) = weights(i) * trans(phis); } // Now sum over every point. for (size_t j = 0; j < observations.n_cols; ++j) { if (accu(likelihoods.col(j)) == 0) Rcpp::Rcout << "Likelihood of point " << j << " is 0! It is probably an " << "outlier." << std::endl; logLikelihood += log(accu(likelihoods.col(j))); } return logLikelihood; } }; // namespace gmm }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/gmm/eigenvalue_ratio_constraint.hpp0000644000176200001440000000656613647512751026467 0ustar liggesusers/** * @file eigenvalue_ratio_constraint.hpp * @author Ryan Curtin * * Constrain a covariance matrix to have a certain ratio of eigenvalues. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
*/ #ifndef __MLPACK_METHODS_GMM_EIGENVALUE_RATIO_CONSTRAINT_HPP #define __MLPACK_METHODS_GMM_EIGENVALUE_RATIO_CONSTRAINT_HPP #include namespace mlpack { namespace gmm { /** * Given a vector of eigenvalue ratios, ensure that the covariance matrix always * has those eigenvalue ratios. When you create this object, make sure that the * vector of ratios that you pass does not go out of scope, because this object * holds a reference to that vector instead of copying it. */ class EigenvalueRatioConstraint { public: /** * Create the EigenvalueRatioConstraint object with the given vector of * eigenvalue ratios. These ratios are with respect to the first eigenvalue, * which is the largest eigenvalue, so the first element of the vector should * be 1. In addition, all other elements should be less than or equal to 1. */ EigenvalueRatioConstraint(const arma::vec& ratios) : ratios(ratios) { // Check validity of ratios. if (std::abs(ratios[0] - 1.0) > 1e-20) Rcpp::Rcout << "EigenvalueRatioConstraint::EigenvalueRatioConstraint(): " << "first element of ratio vector is not 1.0!" << std::endl; for (size_t i = 1; i < ratios.n_elem; ++i) { if (ratios[i] > 1.0) Rcpp::Rcout << "EigenvalueRatioConstraint::EigenvalueRatioConstraint(): " << "element " << i << " of ratio vector is greater than 1.0!" << std::endl; if (ratios[i] < 0.0) Rcpp::Rcout << "EigenvalueRatioConstraint::EigenvalueRatioConstraint(): " << "element " << i << " of ratio vectors is negative and will " << "probably cause the covariance to be non-invertible..." << std::endl; } } /** * Apply the eigenvalue ratio constraint to the given covariance matrix. */ void ApplyConstraint(arma::mat& covariance) const { // Eigendecompose the matrix. arma::vec eigenvalues; arma::mat eigenvectors; arma::eig_sym(eigenvalues, eigenvectors, covariance); // Change the eigenvalues to what we are forcing them to be. There // shouldn't be any negative eigenvalues anyway, so it doesn't matter if we // are suddenly forcing them to be positive. If the first eigenvalue is // negative, well, there are going to be some problems later... eigenvalues = (eigenvalues[0] * ratios); // Reassemble the matrix. covariance = eigenvectors * arma::diagmat(eigenvalues) * eigenvectors.t(); } private: //! Ratios for eigenvalues. const arma::vec& ratios; }; }; // namespace gmm }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/gmm/phi.hpp0000644000176200001440000001174413647512751021453 0ustar liggesusers/** * @author Parikshit Ram (pram@cc.gatech.edu) * @file phi.hpp * * This file computes the Gaussian probability * density function * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_MOG_PHI_HPP #define __MLPACK_METHODS_MOG_PHI_HPP #include namespace mlpack { namespace gmm { /** * Calculates the univariate Gaussian probability density function. * * Example use: * @code * double x, mean, var; * .... 
 * double f = phi(x, mean, var);
 * @endcode
 *
 * @param x Observation.
 * @param mean Mean of univariate Gaussian.
 * @param var Variance of univariate Gaussian.
 * @return Probability of x being observed from the given univariate Gaussian.
 */
inline double phi(const double x, const double mean, const double var)
{
  return exp(-1.0 * ((x - mean) * (x - mean) / (2 * var))) /
      sqrt(2 * M_PI * var);
}

/**
 * Calculates the multivariate Gaussian probability density function.
 *
 * Example use:
 * @code
 * extern arma::vec x, mean;
 * extern arma::mat cov;
 * ....
 * double f = phi(x, mean, cov);
 * @endcode
 *
 * @param x Observation.
 * @param mean Mean of multivariate Gaussian.
 * @param cov Covariance of multivariate Gaussian.
 * @return Probability of x being observed from the given multivariate Gaussian.
 */
inline double phi(const arma::vec& x,
                  const arma::vec& mean,
                  const arma::mat& cov)
{
  arma::vec diff = mean - x;

  // Parentheses required for Armadillo 3.0.0 bug.
  arma::vec exponent = -0.5 * (trans(diff) * inv(cov) * diff);

  // TODO: What if det(cov) < 0?
  return pow(2 * M_PI, (double) x.n_elem / -2.0) * pow(det(cov), -0.5) *
      exp(exponent[0]);
}

/**
 * Calculates the multivariate Gaussian probability density function and also
 * the gradients with respect to the mean and the variance.
 *
 * Example use:
 * @code
 * extern arma::vec x, mean, g_mean, g_cov;
 * std::vector<arma::mat> d_cov; // the dSigma
 * ....
 * double f = phi(x, mean, cov, d_cov, g_mean, g_cov);
 * @endcode
 */
inline double phi(const arma::vec& x,
                  const arma::vec& mean,
                  const arma::mat& cov,
                  const std::vector<arma::mat>& d_cov,
                  arma::vec& g_mean,
                  arma::vec& g_cov)
{
  // We don't call out to another version of the function to avoid inverting
  // the covariance matrix more than once.
  arma::mat cinv = inv(cov);

  arma::vec diff = mean - x;
  // Parentheses required for Armadillo 3.0.0 bug; reuse the precomputed
  // inverse rather than inverting the covariance a second time.
  arma::vec exponent = -0.5 * (trans(diff) * cinv * diff);

  // The normalizing constant is (2 * pi)^(-d / 2), matching the other
  // overloads of phi().
  long double f = pow(2 * M_PI, (double) x.n_elem / -2.0) *
      pow(det(cov), -0.5) * exp(exponent[0]);

  // Calculate the g_mean values; this is a (1 x dim) vector.
  arma::vec invDiff = cinv * diff;
  g_mean = f * invDiff;

  // Calculate the g_cov values; this is a (1 x (dim * (dim + 1) / 2)) vector.
  for (size_t i = 0; i < d_cov.size(); i++)
  {
    arma::mat inv_d = cinv * d_cov[i];
    g_cov[i] = f * dot(d_cov[i] * invDiff, invDiff) +
        accu(inv_d.diag()) / 2;
  }

  return f;
}

/**
 * Calculates the multivariate Gaussian probability density function for each
 * data point (column) in the given matrix, with respect to the given mean and
 * variance.
 *
 * @param x List of observations.
 * @param mean Mean of multivariate Gaussian.
 * @param cov Covariance of multivariate Gaussian.
 * @param probabilities Output probabilities for each input observation.
 */
inline void phi(const arma::mat& x,
                const arma::vec& mean,
                const arma::mat& cov,
                arma::vec& probabilities)
{
  // Column i of 'diffs' is the difference between x.col(i) and the mean.
  arma::mat diffs = x - (mean * arma::ones<arma::rowvec>(x.n_cols));

  // Now, we only want to calculate the diagonal elements of (diffs' * cov^-1 *
  // diffs).  We just don't need any of the other elements.  We can calculate
  // the right hand part of the equation (instead of the left side) so that
  // later we are referencing columns, not rows -- that is faster.
  arma::mat rhs = -0.5 * inv(cov) * diffs;
  arma::vec exponents(diffs.n_cols); // We will now fill this.
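  // Each exponent below is the quadratic form -0.5 * (x_i - mean)' * cov^-1 *
  // (x_i - mean), computed as the dot product of column i of 'diffs' with
  // column i of 'rhs'.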
for (size_t i = 0; i < diffs.n_cols; i++) exponents(i) = exp(accu(diffs.unsafe_col(i) % rhs.unsafe_col(i))); probabilities = pow(2 * M_PI, (double) mean.n_elem / -2.0) * pow(det(cov), -0.5) * exponents; } }; // namespace gmm }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/gmm/no_constraint.hpp0000644000176200001440000000253213647512751023546 0ustar liggesusers/** * @file no_constraint.hpp * @author Ryan Curtin * * No constraint on the covariance matrix. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_GMM_NO_CONSTRAINT_HPP #define __MLPACK_METHODS_GMM_NO_CONSTRAINT_HPP #include namespace mlpack { namespace gmm { /** * This class enforces no constraint on the covariance matrix. It's faster this * way, although depending on your situation you may end up with a * non-invertible covariance matrix. */ class NoConstraint { public: //! Do nothing, and do not modify the covariance matrix. static void ApplyConstraint(const arma::mat& /* covariance */) { } }; }; // namespace gmm }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/gmm/diagonal_constraint.hpp0000644000176200001440000000251613647512751024712 0ustar liggesusers/** * @file diagonal_constraint.hpp * @author Ryan Curtin * * Constrain a covariance matrix to be diagonal. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_GMM_DIAGONAL_CONSTRAINT_HPP #define __MLPACK_METHODS_GMM_DIAGONAL_CONSTRAINT_HPP #include namespace mlpack { namespace gmm { /** * Force a covariance matrix to be diagonal. */ class DiagonalConstraint { public: //! Force a covariance matrix to be diagonal. static void ApplyConstraint(arma::mat& covariance) { // Save the diagonal only. arma::vec diagonal = covariance.diag(); covariance = arma::diagmat(diagonal); } }; }; // namespace gmm }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/gmm/gmm.hpp0000644000176200001440000003372113647512751021452 0ustar liggesusers/** * @author Parikshit Ram (pram@cc.gatech.edu) * @file gmm.hpp * * Defines a Gaussian Mixture model and * estimates the parameters of the model * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_MOG_MOG_EM_HPP #define __MLPACK_METHODS_MOG_MOG_EM_HPP #include // This is the default fitting method class. #include "em_fit.hpp" namespace mlpack { namespace gmm /** Gaussian Mixture Models. */ { /** * A Gaussian Mixture Model (GMM). This class uses maximum likelihood loss * functions to estimate the parameters of the GMM on a given dataset via the * given fitting mechanism, defined by the FittingType template parameter. The * GMM can be trained using normal data, or data with probabilities of being * from this GMM (see GMM::Estimate() for more information). * * The FittingType template class must provide a way for the GMM to train on * data. It must provide the following two functions: * * @code * void Estimate(const arma::mat& observations, * std::vector& means, * std::vector& covariances, * arma::vec& weights); * * void Estimate(const arma::mat& observations, * const arma::vec& probabilities, * std::vector& means, * std::vector& covariances, * arma::vec& weights); * @endcode * * These functions should produce a trained GMM from the given observations and * probabilities. These may modify the size of the model (by increasing the * size of the mean and covariance vectors as well as the weight vectors), but * the method should expect that these vectors are already set to the size of * the GMM as specified in the constructor. * * For a sample implementation, see the EMFit class; this class uses the EM * algorithm to train a GMM, and is the default fitting type. * * The GMM, once trained, can be used to generate random points from the * distribution and estimate the probability of points being from the * distribution. The parameters of the GMM can be obtained through the * accessors and mutators. * * Example use: * * @code * // Set up a mixture of 5 gaussians in a 4-dimensional space (uses the default * // EM fitting mechanism). * GMM<> g(5, 4); * * // Train the GMM given the data observations. * g.Estimate(data); * * // Get the probability of 'observation' being observed from this GMM. * double probability = g.Probability(observation); * * // Get a random observation from the GMM. * arma::vec observation = g.Random(); * @endcode */ template > class GMM { private: //! The number of Gaussians in the model. size_t gaussians; //! The dimensionality of the model. size_t dimensionality; //! Vector of means; one for each Gaussian. std::vector means; //! Vector of covariances; one for each Gaussian. std::vector covariances; //! Vector of a priori weights for each Gaussian. arma::vec weights; public: /** * Create an empty Gaussian Mixture Model, with zero gaussians. */ GMM() : gaussians(0), dimensionality(0), localFitter(FittingType()), fitter(localFitter) { // Warn the user. They probably don't want to do this. If this constructor // is being used (because it is required by some template classes), the user // should know that it is potentially dangerous. 
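    // Before Estimate() or Probability() can be used on such an object,
    // Gaussians(), Dimensionality(), Means(), Covariances(), and Weights()
    // must all be set to consistent values by the caller.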
Rcpp::Rcout << "GMM::GMM(): no parameters given; Estimate() may fail " << "unless parameters are set." << std::endl; } /** * Create a GMM with the given number of Gaussians, each of which have the * specified dimensionality. The means and covariances will be set to 0. * * @param gaussians Number of Gaussians in this GMM. * @param dimensionality Dimensionality of each Gaussian. */ GMM(const size_t gaussians, const size_t dimensionality); /** * Create a GMM with the given number of Gaussians, each of which have the * specified dimensionality. Also, pass in an initialized FittingType class; * this is useful in cases where the FittingType class needs to store some * state. * * @param gaussians Number of Gaussians in this GMM. * @param dimensionality Dimensionality of each Gaussian. * @param fitter Initialized fitting mechanism. */ GMM(const size_t gaussians, const size_t dimensionality, FittingType& fitter); /** * Create a GMM with the given means, covariances, and weights. * * @param means Means of the model. * @param covariances Covariances of the model. * @param weights Weights of the model. */ GMM(const std::vector& means, const std::vector& covariances, const arma::vec& weights) : gaussians(means.size()), dimensionality((!means.empty()) ? means[0].n_elem : 0), means(means), covariances(covariances), weights(weights), localFitter(FittingType()), fitter(localFitter) { /* Nothing to do. */ } /** * Create a GMM with the given means, covariances, and weights, and use the * given initialized FittingType class. This is useful in cases where the * FittingType class needs to store some state. * * @param means Means of the model. * @param covariances Covariances of the model. * @param weights Weights of the model. */ GMM(const std::vector& means, const std::vector& covariances, const arma::vec& weights, FittingType& fitter) : gaussians(means.size()), dimensionality((!means.empty()) ? means[0].n_elem : 0), means(means), covariances(covariances), weights(weights), fitter(fitter) { /* Nothing to do. */ } /** * Copy constructor for GMMs which use different fitting types. */ template GMM(const GMM& other); /** * Copy constructor for GMMs using the same fitting type. This also copies * the fitter. */ GMM(const GMM& other); /** * Copy operator for GMMs which use different fitting types. */ template GMM& operator=(const GMM& other); /** * Copy operator for GMMs which use the same fitting type. This also copies * the fitter. */ GMM& operator=(const GMM& other); /** * Load a GMM from an XML file. The format of the XML file should be the same * as is generated by the Save() method. * * @param filename Name of XML file containing model to be loaded. */ //void Load(const std::string& filename); /** * Save a GMM to an XML file. * * @param filename Name of XML file to write to. */ //void Save(const std::string& filename) const; //! Return the number of gaussians in the model. size_t Gaussians() const { return gaussians; } //! Modify the number of gaussians in the model. Careful! You will have to //! resize the means, covariances, and weights yourself. size_t& Gaussians() { return gaussians; } //! Return the dimensionality of the model. size_t Dimensionality() const { return dimensionality; } //! Modify the dimensionality of the model. Careful! You will have to update //! each mean and covariance matrix yourself. size_t& Dimensionality() { return dimensionality; } //! Return a const reference to the vector of means (mu). const std::vector& Means() const { return means; } //! 
Return a reference to the vector of means (mu). std::vector& Means() { return means; } //! Return a const reference to the vector of covariance matrices (sigma). const std::vector& Covariances() const { return covariances; } //! Return a reference to the vector of covariance matrices (sigma). std::vector& Covariances() { return covariances; } //! Return a const reference to the a priori weights of each Gaussian. const arma::vec& Weights() const { return weights; } //! Return a reference to the a priori weights of each Gaussian. arma::vec& Weights() { return weights; } //! Return a const reference to the fitting type. const FittingType& Fitter() const { return fitter; } //! Return a reference to the fitting type. FittingType& Fitter() { return fitter; } /** * Return the probability that the given observation came from this * distribution. * * @param observation Observation to evaluate the probability of. */ double Probability(const arma::vec& observation) const; /** * Return the probability that the given observation came from the given * Gaussian component in this distribution. * * @param observation Observation to evaluate the probability of. * @param component Index of the component of the GMM to be considered. */ double Probability(const arma::vec& observation, const size_t component) const; /** * Return a randomly generated observation according to the probability * distribution defined by this object. * * @return Random observation from this GMM. */ arma::vec Random() const; /** * Estimate the probability distribution directly from the given observations, * using the given algorithm in the FittingType class to fit the data. * * The fitting will be performed 'trials' times; from these trials, the model * with the greatest log-likelihood will be selected. By default, only one * trial is performed. The log-likelihood of the best fitting is returned. * * Optionally, the existing model can be used as an initial model for the * estimation by setting 'useExistingModel' to true. If the fitting procedure * is deterministic after the initial position is given, then 'trials' should * be set to 1. * * @tparam FittingType The type of fitting method which should be used * (EMFit<> is suggested). * @param observations Observations of the model. * @param trials Number of trials to perform; the model in these trials with * the greatest log-likelihood will be selected. * @param useExistingModel If true, the existing model is used as an initial * model for the estimation. * @return The log-likelihood of the best fit. */ double Estimate(const arma::mat& observations, const size_t trials = 1, const bool useExistingModel = false); /** * Estimate the probability distribution directly from the given observations, * taking into account the probability of each observation actually being from * this distribution, and using the given algorithm in the FittingType class * to fit the data. * * The fitting will be performed 'trials' times; from these trials, the model * with the greatest log-likelihood will be selected. By default, only one * trial is performed. The log-likelihood of the best fitting is returned. * * Optionally, the existing model can be used as an initial model for the * estimation by setting 'useExistingModel' to true. If the fitting procedure * is deterministic after the initial position is given, then 'trials' should * be set to 1. * * @param observations Observations of the model. * @param probabilities Probability of each observation being from this * distribution. 
* @param trials Number of trials to perform; the model in these trials with * the greatest log-likelihood will be selected. * @param useExistingModel If true, the existing model is used as an initial * model for the estimation. * @return The log-likelihood of the best fit. */ double Estimate(const arma::mat& observations, const arma::vec& probabilities, const size_t trials = 1, const bool useExistingModel = false); /** * Classify the given observations as being from an individual component in * this GMM. The resultant classifications are stored in the 'labels' object, * and each label will be between 0 and (Gaussians() - 1). Supposing that a * point was classified with label 2, and that our GMM object was called * 'gmm', one could access the relevant Gaussian distribution as follows: * * @code * arma::vec mean = gmm.Means()[2]; * arma::mat covariance = gmm.Covariances()[2]; * double priorWeight = gmm.Weights()[2]; * @endcode * * @param observations List of observations to classify. * @param labels Object which will be filled with labels. */ void Classify(const arma::mat& observations, arma::Col& labels) const; /** * Returns a string representation of this object. */ std::string ToString() const; private: /** * This function computes the loglikelihood of the given model. This function * is used by GMM::Estimate(). * * @param dataPoints Observations to calculate the likelihood for. * @param means Means of the given mixture model. * @param covars Covariances of the given mixture model. * @param weights Weights of the given mixture model. */ double LogLikelihood(const arma::mat& dataPoints, const std::vector& means, const std::vector& covars, const arma::vec& weights) const; //! Locally-stored fitting object; in case the user did not pass one. FittingType localFitter; //! Reference to the fitting object we should use. FittingType& fitter; }; }; // namespace gmm }; // namespace mlpack // Include implementation. #include "gmm_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/det/0000755000176200001440000000000013647512751020147 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/det/dt_utils.hpp0000644000176200001440000000611513647512751022512 0ustar liggesusers/** * @file dt_utils.hpp * @author Parikshit Ram (pram@cc.gatech.edu) * * This file implements functions to perform different tasks with the Density * Tree class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_DET_DT_UTILS_HPP #define __MLPACK_METHODS_DET_DT_UTILS_HPP #include #include "dtree.hpp" namespace mlpack { namespace det { /** * Print the membership of leaves of a density estimation tree given the labels * and number of classes. Optionally, pass the name of a file to print this * information to (otherwise stdout is used). * * @param dtree Tree to print membership of. * @param data Dataset tree is built upon. * @param labels Class labels of dataset. 
* @param numClasses Number of classes in dataset. * @param leafClassMembershipFile Name of file to print to (optional). */ void PrintLeafMembership(DTree* dtree, const arma::mat& data, const arma::Mat& labels, const size_t numClasses, const std::string leafClassMembershipFile = ""); /** * Print the variable importance of each dimension of a density estimation tree. * Optionally, pass the name of a file to print this information to (otherwise * stdout is used). * * @param dtree Density tree to use. * @param viFile Name of file to print to (optional). */ void PrintVariableImportance(const DTree* dtree, const std::string viFile = ""); /** * Train the optimal decision tree using cross-validation with the given number * of folds. Optionally, give a filename to print the unpruned tree to. This * initializes a tree on the heap, so you are responsible for deleting it. * * @param dataset Dataset for the tree to use. * @param folds Number of folds to use for cross-validation. * @param useVolumeReg If true, use volume regularization. * @param maxLeafSize Maximum number of points allowed in a leaf. * @param minLeafSize Minimum number of points allowed in a leaf. * @param unprunedTreeOutput Filename to print unpruned tree to (optional). */ DTree* Trainer(arma::mat& dataset, const size_t folds, const bool useVolumeReg = false, const size_t maxLeafSize = 10, const size_t minLeafSize = 5, const std::string unprunedTreeOutput = ""); }; // namespace det }; // namespace mlpack #endif // __MLPACK_METHODS_DET_DT_UTILS_HPP RcppMLPACK/inst/include/mlpack/methods/det/dtree.hpp0000644000176200001440000002503313647512751021766 0ustar liggesusers/** * @file dtree.hpp * @author Parikshit Ram (pram@cc.gatech.edu) * * Density Estimation Tree class * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_DET_DTREE_HPP #define __MLPACK_METHODS_DET_DTREE_HPP #include namespace mlpack { namespace det /** Density Estimation Trees */ { /** * A density estimation tree is similar to both a decision tree and a space * partitioning tree (like a kd-tree). Each leaf represents a constant-density * hyper-rectangle. The tree is constructed in such a way as to minimize the * integrated square error between the probability distribution of the tree and * the observed probability distribution of the data. Because the tree is * similar to a decision tree, the density estimation tree can provide very fast * density estimates for a given point. * * For more information, see the following paper: * * @code * @incollection{ram2011, * author = {Ram, Parikshit and Gray, Alexander G.}, * title = {Density estimation trees}, * booktitle = {{Proceedings of the 17th ACM SIGKDD International Conference * on Knowledge Discovery and Data Mining}}, * series = {KDD '11}, * year = {2011}, * pages = {627--635} * } * @endcode */ class DTree { public: /** * Create an empty density estimation tree. 
*/ DTree(); /** * Create a density estimation tree with the given bounds and the given number * of total points. Children will not be created. * * @param maxVals Maximum values of the bounding box. * @param minVals Minimum values of the bounding box. * @param totalPoints Total number of points in the dataset. */ DTree(const arma::vec& maxVals, const arma::vec& minVals, const size_t totalPoints); /** * Create a density estimation tree on the given data. Children will be * created following the procedure outlined in the paper. The data will be * modified; it will be reordered similar to the way BinarySpaceTree modifies * datasets. * * @param data Dataset to build tree on. */ DTree(arma::mat& data); /** * Create a child node of a density estimation tree given the bounding box * specified by maxVals and minVals, using the size given in start and end and * the specified error. Children of this node will not be created * recursively. * * @param maxVals Upper bound of bounding box. * @param minVals Lower bound of bounding box. * @param start Start of points represented by this node in the data matrix. * @param end End of points represented by this node in the data matrix. * @param error log-negative error of this node. */ DTree(const arma::vec& maxVals, const arma::vec& minVals, const size_t start, const size_t end, const double logNegError); /** * Create a child node of a density estimation tree given the bounding box * specified by maxVals and minVals, using the size given in start and end, * and calculating the error with the total number of points given. Children * of this node will not be created recursively. * * @param maxVals Upper bound of bounding box. * @param minVals Lower bound of bounding box. * @param start Start of points represented by this node in the data matrix. * @param end End of points represented by this node in the data matrix. */ DTree(const arma::vec& maxVals, const arma::vec& minVals, const size_t totalPoints, const size_t start, const size_t end); //! Clean up memory allocated by the tree. ~DTree(); /** * Greedily expand the tree. The points in the dataset will be reordered * during tree growth. * * @param data Dataset to build tree on. * @param oldFromNew Mappings from old points to new points. * @param useVolReg If true, volume regularization is used. * @param maxLeafSize Maximum size of a leaf. * @param minLeafSize Minimum size of a leaf. */ double Grow(arma::mat& data, arma::Col& oldFromNew, const bool useVolReg = false, const size_t maxLeafSize = 10, const size_t minLeafSize = 5); /** * Perform alpha pruning on a tree. Returns the new value of alpha. * * @param oldAlpha Old value of alpha. * @param points Total number of points in dataset. * @param useVolReg If true, volume regularization is used. * @return New value of alpha. */ double PruneAndUpdate(const double oldAlpha, const size_t points, const bool useVolReg = false); /** * Compute the logarithm of the density estimate of a given query point. * * @param query Point to estimate density of. */ double ComputeValue(const arma::vec& query) const; /** * Print the tree in a depth-first manner (this function is called * recursively). * * @param fp File to write the tree to. * @param level Level of the tree (should start at 0). */ //void WriteTree(FILE *fp, const size_t level = 0) const; /** * Index the buckets for possible usage later; this results in every leaf in * the tree having a specific tag (accessible with BucketTag()). This * function calls itself recursively. 
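 *
 * A typical use (sketch; 'tree' and 'query' are assumed to already exist) is
 * to tag the tree once and then map query points to leaf identifiers:
 *
 * @code
 * tree.TagTree();                          // Leaves get tags 0, 1, 2, ...
 * const int leaf = tree.FindBucket(query); // Tag of the leaf owning query.
 * @endcode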
* * @param tag Tag for the next leaf; leave at 0 for the initial call. */ int TagTree(const int tag = 0); /** * Return the tag of the leaf containing the query. This is useful for * generating class memberships. * * @param query Query to search for. */ int FindBucket(const arma::vec& query) const; /** * Compute the variable importance of each dimension in the learned tree. * * @param importances Vector to store the calculated importances in. */ void ComputeVariableImportance(arma::vec& importances) const; /** * Compute the log-negative-error for this point, given the total number of * points in the dataset. * * @param totalPoints Total number of points in the dataset. */ double LogNegativeError(const size_t totalPoints) const; /** * Return whether a query point is within the range of this node. */ bool WithinRange(const arma::vec& query) const; private: // The indices in the complete set of points // (after all forms of swapping in the original data // matrix to align all the points in a node // consecutively in the matrix. The 'old_from_new' array // maps the points back to their original indices. //! The index of the first point in the dataset contained in this node (and //! its children). size_t start; //! The index of the last point in the dataset contained in this node (and its //! children). size_t end; //! Upper half of bounding box for this node. arma::vec maxVals; //! Lower half of bounding box for this node. arma::vec minVals; //! The splitting dimension for this node. size_t splitDim; //! The split value on the splitting dimension for this node. double splitValue; //! log-negative-L2-error of the node. double logNegError; //! Sum of the error of the leaves of the subtree. double subtreeLeavesLogNegError; //! Number of leaves of the subtree. size_t subtreeLeaves; //! If true, this node is the root of the tree. bool root; //! Ratio of the number of points in the node to the total number of points. double ratio; //! The logarithm of the volume of the node. double logVolume; //! The tag for the leaf, used for hashing points. int bucketTag; //! Upper part of alpha sum; used for pruning. double alphaUpper; //! The left child. DTree* left; //! The right child. DTree* right; public: //! Return the starting index of points contained in this node. size_t Start() const { return start; } //! Return the first index of a point not contained in this node. size_t End() const { return end; } //! Return the split dimension of this node. size_t SplitDim() const { return splitDim; } //! Return the split value of this node. double SplitValue() const { return splitValue; } //! Return the log negative error of this node. double LogNegError() const { return logNegError; } //! Return the log negative error of all descendants of this node. double SubtreeLeavesLogNegError() const { return subtreeLeavesLogNegError; } //! Return the number of leaves which are descendants of this node. size_t SubtreeLeaves() const { return subtreeLeaves; } //! Return the ratio of points in this node to the points in the whole //! dataset. double Ratio() const { return ratio; } //! Return the inverse of the volume of this node. double LogVolume() const { return logVolume; } //! Return the left child. DTree* Left() const { return left; } //! Return the right child. DTree* Right() const { return right; } //! Return whether or not this is the root of the tree. bool Root() const { return root; } //! Return the upper part of the alpha sum. double AlphaUpper() const { return alphaUpper; } //! Return the maximum values. 
const arma::vec& MaxVals() const { return maxVals; } //! Modify the maximum values. arma::vec& MaxVals() { return maxVals; } //! Return the minimum values. const arma::vec& MinVals() const { return minVals; } //! Modify the minimum values. arma::vec& MinVals() { return minVals; } /** * Returns a string representation of this object. */ std::string ToString() const; private: // Utility methods. /** * Find the dimension to split on. */ bool FindSplit(const arma::mat& data, size_t& splitDim, double& splitValue, double& leftError, double& rightError, const size_t minLeafSize = 5) const; /** * Split the data, returning the number of points left of the split. */ size_t SplitData(arma::mat& data, const size_t splitDim, const double splitValue, arma::Col& oldFromNew) const; }; }; // namespace det }; // namespace mlpack #endif // __MLPACK_METHODS_DET_DTREE_HPP RcppMLPACK/inst/include/mlpack/methods/pca/0000755000176200001440000000000013647512751020136 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/pca/pca.hpp0000644000176200001440000001127213647512751021415 0ustar liggesusers/** * @file pca.hpp * @author Ajinkya Kale * * Defines the PCA class to perform Principal Components Analysis on the * specified data set. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_PCA_PCA_HPP #define __MLPACK_METHODS_PCA_PCA_HPP #include namespace mlpack { namespace pca { /** * This class implements principal components analysis (PCA). This is a common, * widely-used technique that is often used for either dimensionality reduction * or transforming data into a better basis. Further information on PCA can be * found in almost any statistics or machine learning textbook, and all over the * internet. */ class PCA { public: /** * Create the PCA object, specifying if the data should be scaled in each * dimension by standard deviation when PCA is performed. * * @param scaleData Whether or not to scale the data. */ PCA(const bool scaleData = false); /** * Apply Principal Component Analysis to the provided data set. It is safe to * pass the same matrix reference for both data and transformedData. * * @param data Data matrix. * @param transformedData Matrix to put results of PCA into. * @param eigval Vector to put eigenvalues into. * @param eigvec Matrix to put eigenvectors (loadings) into. */ void Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval, arma::mat& eigvec) const; /** * Apply Principal Component Analysis to the provided data set. It is safe to * pass the same matrix reference for both data and transformedData. * * @param data Data matrix. * @param transformedData Matrix to store results of PCA in. * @param eigval Vector to put eigenvalues into. */ void Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigVal) const; /** * Use PCA for dimensionality reduction on the given dataset. 
This will save * the newDimension largest principal components of the data and remove the * rest. The parameter returned is the amount of variance of the data that is * retained; this is a value between 0 and 1. For instance, a value of 0.9 * indicates that 90% of the variance present in the data was retained. * * @param data Data matrix. * @param newDimension New dimension of the data. * @return Amount of the variance of the data retained (between 0 and 1). */ double Apply(arma::mat& data, const size_t newDimension) const; //! This overload is here to make sure int gets casted right to size_t. inline double Apply(arma::mat& data, const int newDimension) const { return Apply(data, size_t(newDimension)); } /** * Use PCA for dimensionality reduction on the given dataset. This will save * as many dimensions as necessary to retain at least the given amount of * variance (specified by parameter varRetained). The amount should be * between 0 and 1; if the amount is 0, then only 1 dimension will be * retained. If the amount is 1, then all dimensions will be retained. * * The method returns the actual amount of variance retained, which will * always be greater than or equal to the varRetained parameter. * * @param data Data matrix. * @param varRetained Lower bound on amount of variance to retain; should be * between 0 and 1. * @return Actual amount of variance retained (between 0 and 1). */ double Apply(arma::mat& data, const double varRetained) const; //! Get whether or not this PCA object will scale (by standard deviation) the //! data when PCA is performed. bool ScaleData() const { return scaleData; } //! Modify whether or not this PCA object will scale (by standard deviation) //! the data when PCA is performed. bool& ScaleData() { return scaleData; } // Returns a string representation of this object. std::string ToString() const; private: //! Whether or not the data will be scaled by standard deviation when PCA is //! performed. bool scaleData; }; // class PCA }; // namespace pca }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/mvu/0000755000176200001440000000000013647512751020202 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/mvu/mvu.hpp0000644000176200001440000000326413647512751021527 0ustar liggesusers/** * @file mvu.hpp * @author Ryan Curtin * * An implementation of Maximum Variance Unfolding. This file defines an MVU * class as well as a class representing the objective function (a semidefinite * program) which MVU seeks to minimize. Minimization is performed by the * Augmented Lagrangian optimizer (which in turn uses the L-BFGS optimizer). * * Note: this implementation of MVU does not work. See #189. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_MVU_MVU_HPP #define __MLPACK_METHODS_MVU_MVU_HPP #include namespace mlpack { namespace mvu { /** * The MVU class is meant to provide a good abstraction for users. 
The dataset * needs to be provided, as well as several parameters. * * - dataset * - new dimensionality */ class MVU { public: MVU(const arma::mat& dataIn); void Unfold(const size_t newDim, const size_t numNeighbors, arma::mat& outputCoordinates); private: const arma::mat& data; }; }; // namespace mvu }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/kmeans/0000755000176200001440000000000013647512751020651 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/kmeans/random_partition.hpp0000644000176200001440000000440513647512751024736 0ustar liggesusers/** * @file random_partition.hpp * @author Ryan Curtin * * Very simple partitioner which partitions the data randomly into the number of * desired clusters. Used as the default InitialPartitionPolicy for KMeans. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KMEANS_RANDOM_PARTITION_HPP #define __MLPACK_METHODS_KMEANS_RANDOM_PARTITION_HPP #include namespace mlpack { namespace kmeans { /** * A very simple partitioner which partitions the data randomly into the number * of desired clusters. It has no parameters, and so an instance of the class * is not even necessary. */ class RandomPartition { public: //! Empty constructor, required by the InitialPartitionPolicy policy. RandomPartition() { } /** * Partition the given dataset into the given number of clusters. Assignments * are random, and the number of points in each cluster should be equal (or * approximately equal). * * @tparam MatType Type of data (arma::mat or arma::sp_mat). * @param data Dataset to partition. * @param clusters Number of clusters to split dataset into. * @param assignments Vector to store cluster assignments into. Values will * be between 0 and (clusters - 1). */ template inline static void Cluster(const MatType& data, const size_t clusters, arma::Col& assignments) { // Implementation is so simple we'll put it here in the header file. assignments = arma::shuffle(arma::linspace >(0, (clusters - 1), data.n_cols)); } }; }; }; #endif RcppMLPACK/inst/include/mlpack/methods/kmeans/kmeans_impl.hpp0000644000176200001440000003233313647512751023665 0ustar liggesusers/** * @file kmeans_impl.hpp * @author Parikshit Ram (pram@cc.gatech.edu) * @author Ryan Curtin * * Implementation for the K-means method for getting an initial point. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). 
* * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #include "kmeans.hpp" #include #include #include #include namespace mlpack { namespace kmeans { /** * Construct the K-Means object. */ template KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy>:: KMeans(const size_t maxIterations, const double overclusteringFactor, const MetricType metric, const InitialPartitionPolicy partitioner, const EmptyClusterPolicy emptyClusterAction) : maxIterations(maxIterations), metric(metric), partitioner(partitioner), emptyClusterAction(emptyClusterAction) { // Validate overclustering factor. if (overclusteringFactor < 1.0) { Rcpp::Rcout << "KMeans::KMeans(): overclustering factor must be >= 1.0 (" << overclusteringFactor << " given). Setting factor to 1.0.\n"; this->overclusteringFactor = 1.0; } else { this->overclusteringFactor = overclusteringFactor; } } /** * Perform k-means clustering on the data, returning a list of cluster * assignments. This just forward to the other function, which returns the * centroids too. If this is properly inlined, there shouldn't be any * performance penalty whatsoever. */ template template inline void KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy>:: Cluster(const MatType& data, const size_t clusters, arma::Col& assignments, const bool initialGuess) const { MatType centroids(data.n_rows, clusters); Cluster(data, clusters, assignments, centroids, initialGuess); } /** * Perform k-means clustering on the data, returning a list of cluster * assignments and the centroids of each cluster. */ template template void KMeans< MetricType, InitialPartitionPolicy, EmptyClusterPolicy>:: Cluster(const MatType& data, const size_t clusters, arma::Col& assignments, MatType& centroids, const bool initialAssignmentGuess, const bool initialCentroidGuess) const { // Make sure we have more points than clusters. if (clusters > data.n_cols) Rcpp::Rcout << "KMeans::Cluster(): more clusters requested than points given." << std::endl; // Make sure our overclustering factor is valid. size_t actualClusters = size_t(overclusteringFactor * clusters); if (actualClusters > data.n_cols && overclusteringFactor != 1.0) { Rcpp::Rcout << "KMeans::Cluster(): overclustering factor is too large. No " << "overclustering will be done." << std::endl; actualClusters = clusters; } // Now, the initial assignments. First determine if they are necessary. if (initialAssignmentGuess) { if (assignments.n_elem != data.n_cols) Rcpp::Rcerr << "KMeans::Cluster(): initial cluster assignments (length " << assignments.n_elem << ") not the same size as the dataset (size " << data.n_cols << ")!" << std::endl; } else if (initialCentroidGuess) { if (centroids.n_cols != clusters) Rcpp::Rcerr << "KMeans::Cluster(): wrong number of initial cluster " << "centroids (" << centroids.n_cols << ", should be " << clusters << ")!" << std::endl; if (centroids.n_rows != data.n_rows) Rcpp::Rcerr << "KMeans::Cluster(): initial cluster centroids have wrong " << " dimensionality (" << centroids.n_rows << ", should be " << data.n_rows << ")!" << std::endl; // If there were no problems, construct the initial assignments from the // given centroids. assignments.set_size(data.n_cols); for (size_t i = 0; i < data.n_cols; ++i) { // Find the closest centroid to this point. double minDistance = std::numeric_limits::infinity(); size_t closestCluster = clusters; // Invalid value. 
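      // The loop below is a brute-force nearest-centroid search: minDistance
      // starts at infinity, so the first centroid examined always replaces the
      // sentinel value of 'clusters' (an intentionally invalid index).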
for (size_t j = 0; j < clusters; j++) { double distance = metric.Evaluate(data.col(i), centroids.col(j)); if (distance < minDistance) { minDistance = distance; closestCluster = j; } } // Assign the point to the closest cluster that we found. assignments[i] = closestCluster; } } else { // Use the partitioner to come up with the partition assignments. partitioner.Cluster(data, actualClusters, assignments); } // Counts of points in each cluster. arma::Col counts(actualClusters); counts.zeros(); // Resize to correct size. centroids.set_size(data.n_rows, actualClusters); // Set counts correctly. for (size_t i = 0; i < assignments.n_elem; i++) counts[assignments[i]]++; size_t changedAssignments = 0; size_t iteration = 0; do { // Update step. // Calculate centroids based on given assignments. centroids.zeros(); for (size_t i = 0; i < data.n_cols; i++) centroids.col(assignments[i]) += data.col(i); for (size_t i = 0; i < actualClusters; i++) centroids.col(i) /= counts[i]; // Assignment step. // Find the closest centroid to each point. We will keep track of how many // assignments change. When no assignments change, we are done. changedAssignments = 0; for (size_t i = 0; i < data.n_cols; i++) { // Find the closest centroid to this point. double minDistance = std::numeric_limits::infinity(); size_t closestCluster = actualClusters; // Invalid value. for (size_t j = 0; j < actualClusters; j++) { double distance = metric.Evaluate(data.col(i), centroids.col(j)); if (distance < minDistance) { minDistance = distance; closestCluster = j; } } // Reassign this point to the closest cluster. if (assignments[i] != closestCluster) { // Update counts. counts[assignments[i]]--; counts[closestCluster]++; // Update assignment. assignments[i] = closestCluster; changedAssignments++; } } // If we are not allowing empty clusters, then check that all of our // clusters have points. for (size_t i = 0; i < actualClusters; i++) if (counts[i] == 0) changedAssignments += emptyClusterAction.EmptyCluster(data, i, centroids, counts, assignments); iteration++; } while (changedAssignments > 0 && iteration != maxIterations); if (iteration != maxIterations) { Rcpp::Rcout << "KMeans::Cluster(): converged after " << iteration << " iterations." << std::endl; } else { Rcpp::Rcout << "KMeans::Cluster(): terminated after limit of " << iteration << " iterations." << std::endl; // Recalculate final clusters. centroids.zeros(); for (size_t i = 0; i < data.n_cols; i++) centroids.col(assignments[i]) += data.col(i); for (size_t i = 0; i < actualClusters; i++) centroids.col(i) /= counts[i]; } // If we have overclustered, we need to merge the nearest clusters. if (actualClusters != clusters) { // Generate a list of all the clusters' distances from each other. This // list will become mangled and unused as the number of clusters decreases. size_t numDistances = ((actualClusters - 1) * actualClusters) / 2; size_t clustersLeft = actualClusters; arma::vec distances(numDistances); arma::Col firstCluster(numDistances); arma::Col secondCluster(numDistances); // Keep the mappings of clusters that we are changing. 
arma::Col mappings = arma::linspace >(0, actualClusters - 1, actualClusters); size_t i = 0; for (size_t first = 0; first < actualClusters; first++) { for (size_t second = first + 1; second < actualClusters; second++) { distances(i) = metric.Evaluate(centroids.col(first), centroids.col(second)); firstCluster(i) = first; secondCluster(i) = second; i++; } } while (clustersLeft != clusters) { arma::uword minIndex; distances.min(minIndex); // Now we merge the clusters which that distance belongs to. size_t first = firstCluster(minIndex); size_t second = secondCluster(minIndex); for (size_t j = 0; j < assignments.n_elem; j++) if (assignments(j) == second) assignments(j) = first; // Now merge the centroids. centroids.col(first) *= counts[first]; centroids.col(first) += (counts[second] * centroids.col(second)); centroids.col(first) /= (counts[first] + counts[second]); // Update the counts. counts[first] += counts[second]; counts[second] = 0; // Now update all the relevant distances. // First the distances where either cluster is the second cluster. for (size_t cluster = 0; cluster < second; cluster++) { // The offset is sum^n i - sum^(n - m) i, where n is actualClusters and // m is the cluster we are trying to offset to. size_t offset = (size_t) (((actualClusters - 1) * cluster) + (cluster - pow(cluster, 2.0)) / 2) - 1; // See if we need to update the distance from this cluster to the first // cluster. if (cluster < first) { // Make sure it isn't already DBL_MAX. if (distances(offset + (first - cluster)) != DBL_MAX) distances(offset + (first - cluster)) = metric.Evaluate( centroids.col(first), centroids.col(cluster)); } distances(offset + (second - cluster)) = DBL_MAX; } // Now the distances where the first cluster is the first cluster. size_t offset = (size_t) (((actualClusters - 1) * first) + (first - pow(first, 2.0)) / 2) - 1; for (size_t cluster = first + 1; cluster < actualClusters; cluster++) { // Make sure it isn't already DBL_MAX. if (distances(offset + (cluster - first)) != DBL_MAX) { distances(offset + (cluster - first)) = metric.Evaluate( centroids.col(first), centroids.col(cluster)); } } // Max the distance between the first and second clusters. distances(offset + (second - first)) = DBL_MAX; // Now max the distances for the second cluster (which no longer has // anything in it). offset = (size_t) (((actualClusters - 1) * second) + (second - pow(second, 2.0)) / 2) - 1; for (size_t cluster = second + 1; cluster < actualClusters; cluster++) distances(offset + (cluster - second)) = DBL_MAX; clustersLeft--; // Update the cluster mappings. mappings(second) = first; // Also update any mappings that were pointed at the previous cluster. for (size_t cluster = 0; cluster < actualClusters; cluster++) if (mappings(cluster) == second) mappings(cluster) = first; } // Now remap the mappings down to the smallest possible numbers. // Could this process be sped up? arma::Col remappings(actualClusters); remappings.fill(actualClusters); size_t remap = 0; // Counter variable. for (size_t cluster = 0; cluster < actualClusters; cluster++) { // If the mapping of the current cluster has not been assigned a value // yet, we will assign it a cluster number. if (remappings(mappings(cluster)) == actualClusters) { remappings(mappings(cluster)) = remap; remap++; } } // Fix the assignments using the mappings we created. 
for (size_t j = 0; j < assignments.n_elem; j++)
      assignments(j) = remappings(mappings(assignments(j)));
  }
}

template<typename MetricType,
         typename InitialPartitionPolicy,
         typename EmptyClusterPolicy>
std::string KMeans<MetricType,
                   InitialPartitionPolicy,
                   EmptyClusterPolicy>::ToString() const
{
  std::ostringstream convert;
  convert << "KMeans [" << this << "]" << std::endl;
  convert << "  Overclustering Factor: " << overclusteringFactor << std::endl;
  convert << "  Max Iterations: " << maxIterations << std::endl;
  return convert.str();
}

}; // namespace kmeans
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/methods/kmeans/refined_start_impl.hpp
/**
 * @file refined_start_impl.hpp
 * @author Ryan Curtin
 *
 * An implementation of Bradley and Fayyad's "Refining Initial Points for
 * K-Means clustering".
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_KMEANS_REFINED_START_IMPL_HPP
#define __MLPACK_METHODS_KMEANS_REFINED_START_IMPL_HPP

// In case it hasn't been included yet.
#include "refined_start.hpp"

namespace mlpack {
namespace kmeans {

//! Partition the given dataset according to Bradley and Fayyad's algorithm.
template<typename MatType>
void RefinedStart::Cluster(const MatType& data,
                           const size_t clusters,
                           arma::Col<size_t>& assignments) const
{
  math::RandomSeed(std::time(NULL));

  // This will hold the sampled datasets.
  const size_t numPoints = size_t(percentage * data.n_cols);
  MatType sampledData(data.n_rows, numPoints);
  // vector<bool> is packed so each bool is 1 bit.
  std::vector<bool> pointsUsed(data.n_cols, false);
  arma::mat sampledCentroids(data.n_rows, samplings * clusters);

  // We will use these objects repeatedly for clustering.
  arma::Col<size_t> sampledAssignments;
  arma::mat centroids;
  KMeans<> kmeans;

  for (size_t i = 0; i < samplings; ++i)
  {
    // First, assemble the sampled dataset.
    size_t curSample = 0;
    while (curSample < numPoints)
    {
      // Pick a random point in [0, data.n_cols).
      size_t sample = (size_t) math::RandInt(data.n_cols);

      if (!pointsUsed[sample])
      {
        // This point isn't used yet.  So we'll put it in our sample.
        pointsUsed[sample] = true;
        sampledData.col(curSample) = data.col(sample);
        ++curSample;
      }
    }

    // Now, using the sampled dataset, run k-means.  In the case of an empty
    // cluster, we re-initialize that cluster as the point furthest away from
    // the cluster with maximum variance.  This is not *exactly* what the paper
    // implements, but it is quite similar, and we'll call it "good enough".
    kmeans.Cluster(sampledData, clusters, sampledAssignments, centroids);

    // Store the sampled centroids.
    sampledCentroids.cols(i * clusters, (i + 1) * clusters - 1) = centroids;

    pointsUsed.assign(data.n_cols, false);
  }

  // Now, we run k-means on the sampled centroids to get our final clusters.
  kmeans.Cluster(sampledCentroids, clusters, sampledAssignments, centroids);

  // Turn the final centroids into assignments.
  assignments.set_size(data.n_cols);
  for (size_t i = 0; i < data.n_cols; ++i)
  {
    // Find the closest centroid to this point.
    double minDistance = std::numeric_limits<double>::infinity();
    size_t closestCluster = clusters;

    for (size_t j = 0; j < clusters; ++j)
    {
      const double distance = kmeans.Metric().Evaluate(data.col(i),
          centroids.col(j));

      if (distance < minDistance)
      {
        minDistance = distance;
        closestCluster = j;
      }
    }

    // Assign the point to its closest cluster.
    assignments[i] = closestCluster;
  }
}

}; // namespace kmeans
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/methods/kmeans/max_variance_new_cluster.hpp0000644000176200001440000000463013647512751026434 0ustar liggesusers/**
 * @file max_variance_new_cluster.hpp
 * @author Ryan Curtin
 *
 * An implementation of the EmptyClusterPolicy policy class for K-Means.  When
 * an empty cluster is detected, the point furthest from the centroid of the
 * cluster with maximum variance is taken to be a new cluster.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KMEANS_MAX_VARIANCE_NEW_CLUSTER_HPP #define __MLPACK_METHODS_KMEANS_MAX_VARIANCE_NEW_CLUSTER_HPP #include namespace mlpack { namespace kmeans { /** * When an empty cluster is detected, this class takes the point furthest from * the centroid of the cluster with maximum variance as a new cluster. */ class MaxVarianceNewCluster { public: //! Default constructor required by EmptyClusterPolicy. MaxVarianceNewCluster() { } /** * Take the point furthest from the centroid of the cluster with maximum * variance to be a new cluster. * * @tparam MatType Type of data (arma::mat or arma::sp_mat). * @param data Dataset on which clustering is being performed. * @param emptyCluster Index of cluster which is empty. * @param centroids Centroids of each cluster (one per column). * @param clusterCounts Number of points in each cluster. * @param assignments Cluster assignments of each point. * * @return Number of points changed. */ template static size_t EmptyCluster(const MatType& data, const size_t emptyCluster, const MatType& centroids, arma::Col& clusterCounts, arma::Col& assignments); }; }; // namespace kmeans }; // namespace mlpack // Include implementation. #include "max_variance_new_cluster_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/kmeans/kmeans.hpp0000644000176200001440000002107213647512751022642 0ustar liggesusers/** * @file kmeans.hpp * @author Parikshit Ram (pram@cc.gatech.edu) * * K-Means clustering. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KMEANS_KMEANS_HPP #define __MLPACK_METHODS_KMEANS_KMEANS_HPP #include #include #include "random_partition.hpp" #include "max_variance_new_cluster.hpp" #include namespace mlpack { namespace kmeans /** K-Means clustering. */ { /** * This class implements K-Means clustering. This implementation supports * overclustering, which means that more clusters than are requested will be * found; then, those clusters will be merged together to produce the desired * number of clusters. * * Two template parameters can (optionally) be supplied: the policy for how to * find the initial partition of the data, and the actions to be taken when an * empty cluster is encountered, as well as the distance metric to be used. * * A simple example of how to run K-Means clustering is shown below. * * @code * extern arma::mat data; // Dataset we want to run K-Means on. * arma::Col assignments; // Cluster assignments. * * KMeans<> k; // Default options. * k.Cluster(data, 3, assignments); // 3 clusters. 
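 *
 * // A sketch of the four-argument overload declared below, which also
 * // returns the cluster centroids (one column per cluster); 'centroids'
 * // here is a hypothetical variable name:
 * arma::mat centroids;
 * k.Cluster(data, 3, assignments, centroids);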
 *
 * // Cluster using the Manhattan distance, 100 iterations maximum, and an
 * // overclustering factor of 4.0.
 * KMeans<metric::ManhattanDistance> k(100, 4.0);
 * k.Cluster(data, 6, assignments);  // 6 clusters.
 * @endcode
 *
 * @tparam MetricType The distance metric to use for this KMeans; see
 *     metric::LMetric for an example.
 * @tparam InitialPartitionPolicy Initial partitioning policy; must implement a
 *     default constructor and 'void Cluster(const arma::mat&, const size_t,
 *     arma::Col<size_t>&)'.
 * @tparam EmptyClusterPolicy Policy for what to do on an empty cluster; must
 *     implement a default constructor and 'void EmptyCluster(const arma::mat&,
 *     arma::Col<size_t>&, arma::Col<size_t>&)'.
 */
template<typename MetricType = metric::SquaredEuclideanDistance,
         typename InitialPartitionPolicy = RandomPartition,
         typename EmptyClusterPolicy = MaxVarianceNewCluster>
class KMeans
{
 public:
  /**
   * Create a K-Means object and (optionally) set the parameters which K-Means
   * will be run with.  This implementation allows a few strategies to improve
   * the performance of K-Means, including "overclustering" and disallowing
   * empty clusters.
   *
   * The overclustering factor controls how many clusters are actually found;
   * for instance, with an overclustering factor of 4, if K-Means is run to
   * find 3 clusters, it will actually find 12, then merge the nearest
   * clusters until only 3 are left.
   *
   * @param maxIterations Maximum number of iterations allowed before giving up
   *     (0 is valid, but the algorithm may never terminate).
   * @param overclusteringFactor Factor controlling how many extra clusters are
   *     found and then merged to get the desired number of clusters.
   * @param metric Optional MetricType object; for when the metric has state
   *     it needs to store.
   * @param partitioner Optional InitialPartitionPolicy object; for when a
   *     specially initialized partitioning policy is required.
   * @param emptyClusterAction Optional EmptyClusterPolicy object; for when a
   *     specially initialized empty cluster policy is required.
   */
  KMeans(const size_t maxIterations = 1000,
         const double overclusteringFactor = 1.0,
         const MetricType metric = MetricType(),
         const InitialPartitionPolicy partitioner = InitialPartitionPolicy(),
         const EmptyClusterPolicy emptyClusterAction = EmptyClusterPolicy());

  /**
   * Perform k-means clustering on the data, returning a list of cluster
   * assignments.  Optionally, the vector of assignments can be set to an
   * initial guess of the cluster assignments; to do this, set initialGuess to
   * true.
   *
   * @tparam MatType Type of matrix (arma::mat or arma::sp_mat).
   * @param data Dataset to cluster.
   * @param clusters Number of clusters to compute.
   * @param assignments Vector to store cluster assignments in.
   * @param initialGuess If true, then it is assumed that assignments has a
   *     list of initial cluster assignments.
   */
  template<typename MatType>
  void Cluster(const MatType& data,
               const size_t clusters,
               arma::Col<size_t>& assignments,
               const bool initialGuess = false) const;

  /**
   * Perform k-means clustering on the data, returning a list of cluster
   * assignments and also the centroids of each cluster.  Optionally, the
   * vector of assignments can be set to an initial guess of the cluster
   * assignments; to do this, set initialAssignmentGuess to true.  Another way
   * to set initial cluster guesses is to fill the centroids matrix with the
   * centroid guesses, and then set initialCentroidGuess to true.
   * initialAssignmentGuess supersedes initialCentroidGuess, so if both are
   * set to true, the assignments vector is used.
   *
   * Note that if the overclustering factor is greater than 1, the centroids
   * matrix will be resized in the method.
Regardless of the overclustering * factor, the centroid guess matrix (if initialCentroidGuess is set to true) * should have the same number of rows as the data matrix, and number of * columns equal to 'clusters'. * * @tparam MatType Type of matrix (arma::mat or arma::sp_mat). * @param data Dataset to cluster. * @param clusters Number of clusters to compute. * @param assignments Vector to store cluster assignments in. * @param centroids Matrix in which centroids are stored. * @param initialAssignmentGuess If true, then it is assumed that assignments * has a list of initial cluster assignments. * @param initialCentroidGuess If true, then it is assumed that centroids * contains the initial centroids of each cluster. */ template void Cluster(const MatType& data, const size_t clusters, arma::Col& assignments, MatType& centroids, const bool initialAssignmentGuess = false, const bool initialCentroidGuess = false) const; //! Return the overclustering factor. double OverclusteringFactor() const { return overclusteringFactor; } //! Set the overclustering factor. Must be greater than 1. double& OverclusteringFactor() { return overclusteringFactor; } //! Get the maximum number of iterations. size_t MaxIterations() const { return maxIterations; } //! Set the maximum number of iterations. size_t& MaxIterations() { return maxIterations; } //! Get the distance metric. const MetricType& Metric() const { return metric; } //! Modify the distance metric. MetricType& Metric() { return metric; } //! Get the initial partitioning policy. const InitialPartitionPolicy& Partitioner() const { return partitioner; } //! Modify the initial partitioning policy. InitialPartitionPolicy& Partitioner() { return partitioner; } //! Get the empty cluster policy. const EmptyClusterPolicy& EmptyClusterAction() const { return emptyClusterAction; } //! Modify the empty cluster policy. EmptyClusterPolicy& EmptyClusterAction() { return emptyClusterAction; } // Returns a string representation of this object. std::string ToString() const; private: //! Factor controlling how many clusters are actually found. double overclusteringFactor; //! Maximum number of iterations before giving up. size_t maxIterations; //! Instantiated distance metric. MetricType metric; //! Instantiated initial partitioning policy. InitialPartitionPolicy partitioner; //! Instantiated empty cluster policy. EmptyClusterPolicy emptyClusterAction; }; }; // namespace kmeans }; // namespace mlpack // Include implementation. #include "kmeans_impl.hpp" #endif // __MLPACK_METHODS_MOG_KMEANS_HPP RcppMLPACK/inst/include/mlpack/methods/kmeans/max_variance_new_cluster_impl.hpp0000644000176200001440000000661013647512751027455 0ustar liggesusers/** * @file max_variance_new_cluster_impl.hpp * @author Ryan Curtin * * Implementation of MaxVarianceNewCluster class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
*/ #ifndef __MLPACK_METHODS_KMEANS_MAX_VARIANCE_NEW_CLUSTER_IMPL_HPP #define __MLPACK_METHODS_KMEANS_MAX_VARIANCE_NEW_CLUSTER_IMPL_HPP // Just in case it has not been included. #include "max_variance_new_cluster.hpp" namespace mlpack { namespace kmeans { /** * Take action about an empty cluster. */ template size_t MaxVarianceNewCluster::EmptyCluster(const MatType& data, const size_t emptyCluster, const MatType& centroids, arma::Col& clusterCounts, arma::Col& assignments) { // First, we need to find the cluster with maximum variance (by which I mean // the sum of the covariance matrices). arma::vec variances; variances.zeros(clusterCounts.n_elem); // Start with 0. // Add the variance of each point's distance away from the cluster. I think // this is the sensible thing to do. for (size_t i = 0; i < data.n_cols; ++i) { variances[assignments[i]] += metric::SquaredEuclideanDistance::Evaluate( data.col(i), centroids.col(assignments[i])); } // Divide by the number of points in the cluster to produce the variance. // Although a -nan will occur here for the empty cluster(s), this doesn't // matter because variances.max() won't pick it up. If the number of points // in the cluster is 1, we ensure that cluster is not selected by forcing the // variance to 0. for (size_t i = 0; i < clusterCounts.n_elem; ++i) variances[i] /= (clusterCounts[i] == 1) ? DBL_MAX : clusterCounts[i]; // Now find the cluster with maximum variance. arma::uword maxVarCluster; variances.max(maxVarCluster); // Now, inside this cluster, find the point which is furthest away. size_t furthestPoint = data.n_cols; double maxDistance = -DBL_MAX; for (size_t i = 0; i < data.n_cols; ++i) { if (assignments[i] == maxVarCluster) { double distance = arma::as_scalar( arma::var(data.col(i) - centroids.col(maxVarCluster))); if (distance > maxDistance) { maxDistance = distance; furthestPoint = i; } } } // Take that point and add it to the empty cluster. clusterCounts[maxVarCluster]--; clusterCounts[emptyCluster]++; assignments[furthestPoint] = emptyCluster; // Output some debugging information. Rcpp::Rcout << "Point " << furthestPoint << " assigned to empty cluster " << emptyCluster << ".\n"; return 1; // We only changed one point. } }; // namespace kmeans }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/kmeans/refined_start.hpp0000644000176200001440000000652313647512751024221 0ustar liggesusers/** * @file refined_start.hpp * @author Ryan Curtin * * An implementation of Bradley and Fayyad's "Refining Initial Points for * K-Means clustering". This class is meant to provide better initial points * for the k-means algorithm. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KMEANS_REFINED_START_HPP #define __MLPACK_METHODS_KMEANS_REFINED_START_HPP #include namespace mlpack { namespace kmeans { /** * A refined approach for choosing initial points for k-means clustering. 
This * approach runs k-means several times on random subsets of the data, and then * clusters those solutions to select refined initial cluster assignments. It * is an implementation of the following paper: * * @inproceedings{bradley1998refining, * title={Refining initial points for k-means clustering}, * author={Bradley, Paul S and Fayyad, Usama M}, * booktitle={Proceedings of the Fifteenth International Conference on Machine * Learning (ICML 1998)}, * volume={66}, * year={1998} * } */ class RefinedStart { public: /** * Create the RefinedStart object, optionally specifying parameters for the * number of samplings to perform and the percentage of the dataset to use in * each sampling. */ RefinedStart(const size_t samplings = 100, const double percentage = 0.02) : samplings(samplings), percentage(percentage) { } /** * Partition the given dataset into the given number of clusters according to * the random sampling scheme outlined in Bradley and Fayyad's paper. * * @tparam MatType Type of data (arma::mat or arma::sp_mat). * @param data Dataset to partition. * @param clusters Number of clusters to split dataset into. * @param assignments Vector to store cluster assignments into. Values will * be between 0 and (clusters - 1). */ template void Cluster(const MatType& data, const size_t clusters, arma::Col& assignments) const; //! Get the number of samplings that will be performed. size_t Samplings() const { return samplings; } //! Modify the number of samplings that will be performed. size_t& Samplings() { return samplings; } //! Get the percentage of the data used by each subsampling. double Percentage() const { return percentage; } //! Modify the percentage of the data used by each subsampling. double& Percentage() { return percentage; } private: //! The number of samplings to perform. size_t samplings; //! The percentage of the data to use for each subsampling. double percentage; }; }; // namespace kmeans }; // namespace mlpack // Include implementation. #include "refined_start_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/kmeans/allow_empty_clusters.hpp0000644000176200001440000000434313647512751025646 0ustar liggesusers/** * @file allow_empty_clusters.hpp * @author Ryan Curtin * * This very simple policy is used when K-Means is allowed to return empty * clusters. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KMEANS_ALLOW_EMPTY_CLUSTERS_HPP #define __MLPACK_METHODS_KMEANS_ALLOW_EMPTY_CLUSTERS_HPP #include namespace mlpack { namespace kmeans { /** * Policy which allows K-Means to create empty clusters without any error being * reported. */ class AllowEmptyClusters { public: //! Default constructor required by EmptyClusterPolicy policy. AllowEmptyClusters() { } /** * This function does nothing. It is called by K-Means when K-Means detects * an empty cluster. * * @tparam MatType Type of data (arma::mat or arma::spmat). 
* @param data Dataset on which clustering is being performed. * @param emptyCluster Index of cluster which is empty. * @param centroids Centroids of each cluster (one per column). * @param clusterCounts Number of points in each cluster. * @param assignments Cluster assignments of each point. * * @return Number of points changed (0). */ template static size_t EmptyCluster(const MatType& /* data */, const size_t /* emptyCluster */, const MatType& /* centroids */, arma::Col& /* clusterCounts */, arma::Col& /* assignments */) { // Empty clusters are okay! Do nothing. return 0; } }; }; // namespace kmeans }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/perceptron/0000755000176200001440000000000013647512751021554 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/perceptron/perceptron_impl.hpp0000644000176200001440000001324013647512751025467 0ustar liggesusers/** * @file perceptron_impl.hpp * @author Udit Saxena * * Implementation of Perceptron class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_PERCEPTRON_PERCEPTRON_IMPL_HPP #define __MLPACK_METHODS_PERCEPTRON_PERCEPTRON_IMPL_HPP #include "perceptron.hpp" namespace mlpack { namespace perceptron { /** * Constructor - constructs the perceptron. Or rather, builds the weightVectors * matrix, which is later used in Classification. * It adds a bias input vector of 1 to the input data to take care of the bias * weights. * * @param data Input, training data. * @param labels Labels of dataset. * @param iterations Maximum number of iterations for the perceptron learning * algorithm. */ template< typename LearnPolicy, typename WeightInitializationPolicy, typename MatType > Perceptron::Perceptron( const MatType& data, const arma::Row& labels, int iterations) { WeightInitializationPolicy WIP; WIP.Initialize(weightVectors, arma::max(labels) + 1, data.n_rows + 1); // Start training. classLabels = labels; trainData = data; // Insert a row of ones at the top of the training data set. MatType zOnes(1, data.n_cols); zOnes.fill(1); trainData.insert_rows(0, zOnes); iter = iterations; arma::rowvec D(data.n_cols); D.fill(1.0);// giving equal weight to all the points. Train(D); } /** * Classification function. After training, use the weightVectors matrix to * classify test, and put the predicted classes in predictedLabels. * * @param test testing data or data to classify. 
* @param predictedLabels vector to store the predicted classes after * classifying test */ template void Perceptron::Classify( const MatType& test, arma::Row& predictedLabels) { arma::mat tempLabelMat; arma::uword maxIndexRow, maxIndexCol; for (size_t i = 0; i < test.n_cols; i++) { tempLabelMat = weightVectors.submat(0, 1, weightVectors.n_rows - 1, weightVectors.n_cols - 1) * test.col(i) + weightVectors.col(0); tempLabelMat.max(maxIndexRow, maxIndexCol); predictedLabels(0, i) = maxIndexRow; } // predictedLabels.print("These are the labels predicted by the perceptron"); } /** * Alternate constructor which copies parameters from an already initiated * perceptron. * * @param other The other initiated Perceptron object from which we copy the * values from. * @param data The data on which to train this Perceptron object on. * @param D Weight vector to use while training. For boosting purposes. * @param labels The labels of data. */ template Perceptron::Perceptron( const Perceptron<>& other, MatType& data, const arma::rowvec& D, const arma::Row& labels) { classLabels = labels; trainData = data; iter = other.iter; // Insert a row of ones at the top of the training data set. MatType zOnes(1, data.n_cols); zOnes.fill(1); trainData.insert_rows(0, zOnes); WeightInitializationPolicy WIP; WIP.Initialize(weightVectors, arma::max(labels) + 1, data.n_rows + 1); Train(D); } /** * Training Function. It trains on trainData using the cost matrix D * * @param D Cost matrix. Stores the cost of mispredicting instances */ template< typename LearnPolicy, typename WeightInitializationPolicy, typename MatType > void Perceptron::Train( const arma::rowvec& D) { size_t j, i = 0; bool converged = false; size_t tempLabel; arma::uword maxIndexRow, maxIndexCol; arma::mat tempLabelMat; LearnPolicy LP; while ((i < iter) && (!converged)) { // This outer loop is for each iteration, and we use the 'converged' // variable for noting whether or not convergence has been reached. i++; converged = true; // Now this inner loop is for going through the dataset in each iteration. for (j = 0; j < trainData.n_cols; j++) { // Multiply for each variable and check whether the current weight vector // correctly classifies this. tempLabelMat = weightVectors * trainData.col(j); tempLabelMat.max(maxIndexRow, maxIndexCol); // Check whether prediction is correct. if (maxIndexRow != classLabels(0, j)) { // Due to incorrect prediction, convergence set to false. converged = false; tempLabel = classLabels(0, j); // Send maxIndexRow for knowing which weight to update, send j to know // the value of the vector to update it with. Send tempLabel to know // the correct class. LP.UpdateWeights(trainData, weightVectors, j, tempLabel, maxIndexRow, D); } } } } }; // namespace perceptron }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/perceptron/learning_policies/0000755000176200001440000000000013647512751025242 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/perceptron/learning_policies/simple_weight_update.hpp0000644000176200001440000000512013647512751032153 0ustar liggesusers/** * @file simple_weight_update.hpp * @author Udit Saxena * * Simple weight update rule for the perceptron. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef _MLPACK_METHODS_PERCEPTRON_LEARNING_POLICIES_SIMPLE_WEIGHT_UPDATE_HPP #define _MLPACK_METHODS_PERCEPTRON_LEARNING_POLICIES_SIMPLE_WEIGHT_UPDATE_HPP #include /** * This class is used to update the weightVectors matrix according to the simple * update rule as discussed by Rosenblatt: * * if a vector x has been incorrectly classified by a weight w, * then w = w - x * and w'= w'+ x * * where w' is the weight vector which correctly classifies x. */ namespace mlpack { namespace perceptron { class SimpleWeightUpdate { public: /** * This function is called to update the weightVectors matrix. * It decreases the weights of the incorrectly classified class while * increasing the weight of the correct class it should have been classified to. * * @param trainData The training dataset. * @param weightVectors Matrix of weight vectors. * @param rowIndex Index of the row which has been incorrectly predicted. * @param labelIndex Index of the vector in trainData. * @param vectorIndex Index of the class which should have been predicted. * @param D Cost of mispredicting the labelIndex instance. */ void UpdateWeights(const arma::mat& trainData, arma::mat& weightVectors, const size_t labelIndex, const size_t vectorIndex, const size_t rowIndex, const arma::rowvec& D) { weightVectors.row(rowIndex) = weightVectors.row(rowIndex) - D(labelIndex) * trainData.col(labelIndex).t(); weightVectors.row(vectorIndex) = weightVectors.row(vectorIndex) + D(labelIndex) * trainData.col(labelIndex).t(); } }; }; // namespace perceptron }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/perceptron/initialization_methods/0000755000176200001440000000000013647512751026326 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/perceptron/initialization_methods/random_init.hpp0000644000176200001440000000271713647512751031351 0ustar liggesusers/** * @file random_init.hpp * @author Udit Saxena * * Random initialization for perceptron weights. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef _MLPACK_METHOS_PERCEPTRON_INITIALIZATION_METHODS_RANDOM_INIT_HPP #define _MLPACK_METHOS_PERCEPTRON_INITIALIZATION_METHODS_RANDOM_INIT_HPP #include namespace mlpack { namespace perceptron { /** * This class is used to initialize weights for the weightVectors matrix in a * random manner. 
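 *
 * A minimal usage sketch (hypothetical sizes): arma::randu fills the matrix
 * with independent draws from the uniform distribution on [0, 1].
 *
 * @code
 * arma::mat W;
 * RandomInitialization::Initialize(W, 3, 5);  // W becomes a random 3x5 matrix.
 * @endcode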
*/ class RandomInitialization { public: RandomInitialization() { } inline static void Initialize(arma::mat& W, const size_t row, const size_t col) { W = arma::randu(row, col); } }; // class RandomInitialization }; // namespace perceptron }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/perceptron/initialization_methods/zero_init.hpp0000644000176200001440000000274413647512751031050 0ustar liggesusers/** * @file zero_init.hpp * @author Udit Saxena * * Implementation of ZeroInitialization policy for perceptrons. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef _MLPACK_METHOS_PERCEPTRON_INITIALIZATION_METHODS_ZERO_INIT_HPP #define _MLPACK_METHOS_PERCEPTRON_INITIALIZATION_METHODS_ZERO_INIT_HPP #include namespace mlpack { namespace perceptron { /** * This class is used to initialize the matrix weightVectors to zero. */ class ZeroInitialization { public: ZeroInitialization() { } inline static void Initialize(arma::mat& W, const size_t row, const size_t col) { arma::mat tempWeights(row, col); tempWeights.fill(0.0); W = tempWeights; } }; // class ZeroInitialization }; // namespace perceptron }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/perceptron/perceptron.hpp0000644000176200001440000000716113647512751024453 0ustar liggesusers/** * @file perceptron.hpp * @author Udit Saxena * * Definition of Perceptron class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_PERCEPTRON_PERCEPTRON_HPP #define __MLPACK_METHODS_PERCEPTRON_PERCEPTRON_HPP #include #include "initialization_methods/zero_init.hpp" #include "initialization_methods/random_init.hpp" #include "learning_policies/simple_weight_update.hpp" namespace mlpack { namespace perceptron { /** * This class implements a simple perceptron (i.e., a single layer neural * network). It converges if the supplied training dataset is linearly * separable. * * @tparam LearnPolicy Options of SimpleWeightUpdate and GradientDescent. * @tparam WeightInitializationPolicy Option of ZeroInitialization and * RandomInitialization. */ template class Perceptron { public: /** * Constructor - constructs the perceptron by building the weightVectors * matrix, which is later used in Classification. It adds a bias input vector * of 1 to the input data to take care of the bias weights. 
* * @param data Input, training data. * @param labels Labels of dataset. * @param iterations Maximum number of iterations for the perceptron learning * algorithm. */ Perceptron(const MatType& data, const arma::Row& labels, int iterations); /** * Classification function. After training, use the weightVectors matrix to * classify test, and put the predicted classes in predictedLabels. * * @param test Testing data or data to classify. * @param predictedLabels Vector to store the predicted classes after * classifying test. */ void Classify(const MatType& test, arma::Row& predictedLabels); /** * Alternate constructor which copies parameters from an already initiated * perceptron. * * @param other The other initiated Perceptron object from which we copy the * values from. * @param data The data on which to train this Perceptron object on. * @param D Weight vector to use while training. For boosting purposes. * @param labels The labels of data. */ Perceptron(const Perceptron<>& other, MatType& data, const arma::rowvec& D, const arma::Row& labels); private: //! To store the number of iterations size_t iter; //! Stores the class labels for the input data. arma::Row classLabels; //! Stores the weight vectors for each of the input class labels. arma::mat weightVectors; //! Stores the training data to be used later on in UpdateWeights. arma::mat trainData; /** * Training Function. It trains on trainData using the cost matrix D * * @param D Cost matrix. Stores the cost of mispredicting instances */ void Train(const arma::rowvec& D); }; } // namespace perceptron } // namespace mlpack #include "perceptron_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/decision_stump/0000755000176200001440000000000013647512751022420 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/decision_stump/decision_stump_impl.hpp0000644000176200001440000002772613647512751027215 0ustar liggesusers/** * @file decision_stump_impl.hpp * @author Udit Saxena * * Implementation of DecisionStump class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_DECISION_STUMP_DECISION_STUMP_IMPL_HPP #define __MLPACK_METHODS_DECISION_STUMP_DECISION_STUMP_IMPL_HPP // In case it hasn't been included yet. #include "decision_stump.hpp" #include #include namespace mlpack { namespace decision_stump { /** * Constructor. Train on the provided data. Generate a decision stump from data. * * @param data Input, training data. * @param labels Labels of data. * @param classes Number of distinct classes in labels. * @param inpBucketSize Minimum size of bucket when splitting. */ template DecisionStump::DecisionStump(const MatType& data, const arma::Row& labels, const size_t classes, size_t inpBucketSize) { numClass = classes; bucketSize = inpBucketSize; // If classLabels are not all identical, proceed with training. 
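  // A note on sign conventions: CalculateEntropy() returns the sum of
  // p_j * log2(p_j) over the classes, which is the negative of the Shannon
  // entropy.  So 'gain' (rootEntropy - entropy) below is the negative of the
  // usual information gain, and the best split is the one that minimizes it;
  // that is why the comparison further down reads 'gain < bestGain' rather
  // than 'gain > bestGain'.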
size_t bestAtt = 0; double entropy; const double rootEntropy = CalculateEntropy( labels.subvec(0, labels.n_elem - 1)); double gain, bestGain = 0.0; for (size_t i = 0; i < data.n_rows; i++) { // Go through each attribute of the data. if (IsDistinct(data.row(i))) { // For each attribute with non-identical values, treat it as a potential // splitting attribute and calculate entropy if split on it. entropy = SetupSplitAttribute(data.row(i), labels); // Log::Debug << "Entropy for attribute " << i << " is " << entropy << ".\n"; gain = rootEntropy - entropy; // Find the attribute with the best entropy so that the gain is // maximized. // if (entropy < bestEntropy) // Instead of the above rule, we are maximizing gain, which was // what is returned from SetupSplitAttribute. if (gain < bestGain) { bestAtt = i; bestGain = gain; } } } splitAttribute = bestAtt; // Once the splitting column/attribute has been decided, train on it. TrainOnAtt(data.row(splitAttribute), labels); } /** * Classification function. After training, classify test, and put the predicted * classes in predictedLabels. * * @param test Testing data or data to classify. * @param predictedLabels Vector to store the predicted classes after * classifying test */ template void DecisionStump::Classify(const MatType& test, arma::Row& predictedLabels) { for (size_t i = 0; i < test.n_cols; i++) { // Determine which bin the test point falls into. // Assume first that it falls into the first bin, then proceed through the // bins until it is known which bin it falls into. size_t bin = 0; const double val = test(splitAttribute, i); while (bin < split.n_elem - 1) { if (val < split(bin + 1)) break; ++bin; } predictedLabels(i) = binLabels(bin); } } /** * * * * * */ template DecisionStump::DecisionStump(const DecisionStump<>& ds) { numClass = ds.numClass; splitAttribute = ds.splitAttribute; bucketSize = ds.bucketSize; split = ds.split; binLabels = ds.binLabels; } /** * * * * * * template DecisionStump::ModifyData(MatType& data, const arma::Row& D) */ /** * Sets up attribute as if it were splitting on it and finds entropy when * splitting on attribute. * * @param attribute A row from the training data, which might be a candidate for * the splitting attribute. */ template double DecisionStump::SetupSplitAttribute( const arma::rowvec& attribute, const arma::Row& labels) { size_t i, count, begin, end; double entropy = 0.0; // Sort the attribute in order to calculate splitting ranges. arma::rowvec sortedAtt = arma::sort(attribute); // Store the indices of the sorted attribute to build a vector of sorted // labels. This sort is stable. arma::uvec sortedIndexAtt = arma::stable_sort_index(attribute.t()); arma::Row sortedLabels(attribute.n_elem); sortedLabels.fill(0); for (i = 0; i < attribute.n_elem; i++) sortedLabels(i) = labels(sortedIndexAtt(i)); i = 0; count = 0; // This splits the sorted into buckets of size greater than or equal to // inpBucketSize. while (i < sortedLabels.n_elem) { count++; if (i == sortedLabels.n_elem - 1) { // If we're at the end, then don't worry about the bucket size; just take // this as the last bin. begin = i - count + 1; end = i; // Use ratioEl to calculate the ratio of elements in this split. const double ratioEl = ((double) (end - begin + 1) / sortedLabels.n_elem); entropy += ratioEl * CalculateEntropy( sortedLabels.subvec(begin, end)); i++; } else if (sortedLabels(i) != sortedLabels(i + 1)) { // If we're not at the last element of sortedLabels, then check whether // count is less than the current bucket size. 
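      // Worked example of the padding below: with bucketSize = 6, if the label
      // changes after a run of only 3 points, the bin is widened to cover 6
      // points (end = begin + bucketSize - 1, clamped to the last element), so
      // a new bin is not created for every short run of labels.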
if (count < bucketSize)
      {
        // If it is, then take the minimum bucket size anyways.
        // This is where the inpBucketSize comes into use.
        // This makes sure there isn't a bucket for every change in labels.
        begin = i - count + 1;
        end = begin + bucketSize - 1;

        if (end > sortedLabels.n_elem - 1)
          end = sortedLabels.n_elem - 1;
      }
      else
      {
        // If it is not, then take the bucket size as the value of count.
        begin = i - count + 1;
        end = i;
      }

      const double ratioEl = ((double) (end - begin + 1) / sortedLabels.n_elem);
      entropy += ratioEl * CalculateEntropy<size_t>(
          sortedLabels.subvec(begin, end));

      i = end + 1;
      count = 0;
    }
    else
      i++;
  }

  return entropy;
}

/**
 * After having decided the attribute on which to split, train on that
 * attribute.
 *
 * @param attribute The attribute, decided by the constructor, on which we now
 *     train the decision stump.
 */
template<typename MatType>
template<typename rType>
void DecisionStump<MatType>::TrainOnAtt(const arma::rowvec& attribute,
                                        const arma::Row<size_t>& labels)
{
  size_t i, count, begin, end;

  arma::rowvec sortedSplitAtt = arma::sort(attribute);
  arma::uvec sortedSplitIndexAtt = arma::stable_sort_index(attribute.t());
  arma::Row<size_t> sortedLabels(attribute.n_elem);
  sortedLabels.fill(0);
  arma::vec tempSplit;
  arma::Row<size_t> tempLabel;

  for (i = 0; i < attribute.n_elem; i++)
    sortedLabels(i) = labels(sortedSplitIndexAtt(i));

  arma::rowvec subCols;
  rType mostFreq;
  i = 0;
  count = 0;
  while (i < sortedLabels.n_elem)
  {
    count++;
    if (i == sortedLabels.n_elem - 1)
    {
      begin = i - count + 1;
      end = i;

      arma::rowvec zSubCols((sortedLabels.cols(begin, end)).n_elem);
      zSubCols.fill(0.0);

      subCols = sortedLabels.cols(begin, end) + zSubCols;

      mostFreq = CountMostFreq<double>(subCols);

      split.resize(split.n_elem + 1);
      split(split.n_elem - 1) = sortedSplitAtt(begin);
      binLabels.resize(binLabels.n_elem + 1);
      binLabels(binLabels.n_elem - 1) = mostFreq;

      i++;
    }
    else if (sortedLabels(i) != sortedLabels(i + 1))
    {
      if (count < bucketSize)
      {
        // Test for different values of bucketSize, especially extreme cases.
        begin = i - count + 1;
        end = begin + bucketSize - 1;

        if (end > sortedLabels.n_elem - 1)
          end = sortedLabels.n_elem - 1;
      }
      else
      {
        begin = i - count + 1;
        end = i;
      }

      arma::rowvec zSubCols((sortedLabels.cols(begin, end)).n_elem);
      zSubCols.fill(0.0);

      subCols = sortedLabels.cols(begin, end) + zSubCols;

      // Find the most frequent element in subCols so as to assign a label to
      // the bucket of subCols.
      mostFreq = CountMostFreq<double>(subCols);

      split.resize(split.n_elem + 1);
      split(split.n_elem - 1) = sortedSplitAtt(begin);
      binLabels.resize(binLabels.n_elem + 1);
      binLabels(binLabels.n_elem - 1) = mostFreq;

      i = end + 1;
      count = 0;
    }
    else
      i++;
  }

  // Now trim the split matrix so that buckets which lie one after the other
  // and point to the same classLabel are merged into one big bucket.
  MergeRanges();
}

/**
 * After the "split" matrix has been set up, merge ranges with identical class
 * labels.
 */
template<typename MatType>
void DecisionStump<MatType>::MergeRanges()
{
  for (size_t i = 1; i < split.n_rows; i++)
  {
    if (binLabels(i) == binLabels(i - 1))
    {
      // Remove this row, as it has the same label as the previous bucket.
      binLabels.shed_row(i);
      split.shed_row(i);
      // Go back to previous row.
      i--;
    }
  }
}

template<typename MatType>
template<typename rType>
rType DecisionStump<MatType>::CountMostFreq(const arma::Row<rType>& subCols)
{
  // Sort subCols for easier processing.
  arma::Row<rType> sortCounts = arma::sort(subCols);
  rType element = sortCounts[0];
  size_t count = 0, localCount = 0;

  if (sortCounts.n_elem == 1)
    return sortCounts[0];

  // An O(n) loop which counts the most frequent element in sortCounts.
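  // Because sortCounts is sorted, equal values form contiguous runs, so a
  // single pass that tracks the length of the current run is enough to find
  // the mode; e.g. sortCounts = {1, 1, 2, 2, 2, 3} has runs of lengths 2, 3,
  // and 1, and the most frequent element is 2.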
for (size_t i = 0; i < sortCounts.n_elem; ++i) { if (i == sortCounts.n_elem - 1) { if (sortCounts(i - 1) == sortCounts(i)) { // element = sortCounts(i - 1); localCount++; } else if (localCount > count) count = localCount; } else if (sortCounts(i) != sortCounts(i + 1)) { localCount = 0; count++; } else { localCount++; if (localCount > count) { count = localCount; if (localCount == 1) element = sortCounts(i); } } } return element; } /** * Returns 1 if all the values of featureRow are not same. * * @param featureRow The attribute which is checked for identical values. */ template template int DecisionStump::IsDistinct(const arma::Row& featureRow) { rType val = featureRow(0); for (size_t i = 1; i < featureRow.n_elem; ++i) if (val != featureRow(i)) return 1; return 0; } /** * Calculate entropy of attribute. * * @param attribute The attribute for which we calculate the entropy. * @param labels Corresponding labels of the attribute. */ template template double DecisionStump::CalculateEntropy(arma::subview_row labels) { double entropy = 0.0; size_t j; arma::Row numElem(numClass); numElem.fill(0); // Populate numElem; they are used as helpers to calculate // entropy. for (j = 0; j < labels.n_elem; j++) numElem(labels(j))++; // The equation for entropy uses log2(), but log2() is from C99 and thus // Visual Studio will not have it. Therefore, we will use std::log(), and // then apply the change-of-base formula at the end of the calculation. for (j = 0; j < numClass; j++) { const double p1 = ((double) numElem(j) / labels.n_elem); entropy += (p1 == 0) ? 0 : p1 * std::log(p1); } return entropy / std::log(2.0); } }; // namespace decision_stump }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/decision_stump/decision_stump.hpp0000644000176200001440000001254613647512751026166 0ustar liggesusers/** * @file decision_stump.hpp * @author Udit Saxena * * Definition of decision stumps. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_DECISION_STUMP_DECISION_STUMP_HPP #define __MLPACK_METHODS_DECISION_STUMP_DECISION_STUMP_HPP #include namespace mlpack { namespace decision_stump { /** * This class implements a decision stump. It constructs a single level * decision tree, i.e., a decision stump. It uses entropy to decide splitting * ranges. * * The stump is parameterized by a splitting attribute (the dimension on which * points are split), a vector of bin split values, and a vector of labels for * each bin. Bin i is specified by the range [split[i], split[i + 1]). The * last bin has range up to \infty (split[i + 1] does not exist in that case). * Points that are below the first bin will take the label of the first bin. * * @tparam MatType Type of matrix that is being used (sparse or dense). */ template class DecisionStump { public: /** * Constructor. Train on the provided data. Generate a decision stump from * data. * * @param data Input, training data. 
* @param labels Labels of training data. * @param classes Number of distinct classes in labels. * @param inpBucketSize Minimum size of bucket when splitting. */ DecisionStump(const MatType& data, const arma::Row& labels, const size_t classes, size_t inpBucketSize); /** * Classification function. After training, classify test, and put the * predicted classes in predictedLabels. * * @param test Testing data or data to classify. * @param predictedLabels Vector to store the predicted classes after * classifying test data. */ void Classify(const MatType& test, arma::Row& predictedLabels); /** * * * * */ DecisionStump(const DecisionStump<>& ds); /** * * * * * * ModifyData(MatType& data, const arma::Row& D); */ //! Access the splitting attribute. int SplitAttribute() const { return splitAttribute; } //! Modify the splitting attribute (be careful!). int& SplitAttribute() { return splitAttribute; } //! Access the splitting values. const arma::vec& Split() const { return split; } //! Modify the splitting values (be careful!). arma::vec& Split() { return split; } //! Access the labels for each split bin. const arma::Col BinLabels() const { return binLabels; } //! Modify the labels for each split bin (be careful!). arma::Col& BinLabels() { return binLabels; } private: //! Stores the number of classes. size_t numClass; //! Stores the value of the attribute on which to split. int splitAttribute; //! Size of bucket while determining splitting criterion. size_t bucketSize; //! Stores the splitting values after training. arma::vec split; //! Stores the labels for each splitting bin. arma::Col binLabels; /** * Sets up attribute as if it were splitting on it and finds entropy when * splitting on attribute. * * @param attribute A row from the training data, which might be a * candidate for the splitting attribute. */ double SetupSplitAttribute(const arma::rowvec& attribute, const arma::Row& labels); /** * After having decided the attribute on which to split, train on that * attribute. * * @param attribute attribute is the attribute decided by the constructor * on which we now train the decision stump. */ template void TrainOnAtt(const arma::rowvec& attribute, const arma::Row& labels); /** * After the "split" matrix has been set up, merge ranges with identical class * labels. */ void MergeRanges(); /** * Count the most frequently occurring element in subCols. * * @param subCols The vector in which to find the most frequently * occurring element. */ template rType CountMostFreq(const arma::Row& subCols); /** * Returns 1 if all the values of featureRow are not same. * * @param featureRow The attribute which is checked for identical values. */ template int IsDistinct(const arma::Row& featureRow); /** * Calculate the entropy of the given attribute. * * @param attribute The attribute of which we calculate the entropy. * @param labels Corresponding labels of the attribute. 
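 *
 * The quantity computed is \f$ \sum_j p_j \log_2 p_j \f$, where \f$ p_j \f$ is
 * the fraction of the given labels belonging to class j. As implemented, the
 * sign is not flipped, so this is the negative of the usual Shannon entropy;
 * larger (less negative) values therefore indicate a purer set of labels.
 * For example, labels {0, 0, 1, 1} give \f$ p_0 = p_1 = 0.5 \f$ and a value
 * of -1.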
*/ template double CalculateEntropy(arma::subview_row labels); }; }; // namespace decision_stump }; // namespace mlpack #include "decision_stump_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/kernel_pca/0000755000176200001440000000000013647512751021476 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/kernel_pca/kernel_rules/0000755000176200001440000000000013647512751024170 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/kernel_pca/kernel_rules/naive_method.hpp0000644000176200001440000000736413647512751027355 0ustar liggesusers/** * @file naive_method.hpp * @author Ajinkya Kale * * Use the naive method to construct the kernel matrix. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KERNEL_PCA_NAIVE_METHOD_HPP #define __MLPACK_METHODS_KERNEL_PCA_NAIVE_METHOD_HPP #include namespace mlpack { namespace kpca { template class NaiveKernelRule { public: public: /** * Construct the kernel matrix approximation using the nystroem method. * * @param data Input data points. * @param transformedData Matrix to output results into. * @param eigval KPCA eigenvalues will be written to this vector. * @param eigvec KPCA eigenvectors will be written to this matrix. * @param rank Rank to be used for matrix approximation. * @param kernel Kernel to be used for computation. */ static void ApplyKernelMatrix(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval, arma::mat& eigvec, const size_t /* unused */, KernelType kernel = KernelType()) { // Construct the kernel matrix. arma::mat kernelMatrix; // Resize the kernel matrix to the right size. kernelMatrix.set_size(data.n_cols, data.n_cols); // Note that we only need to calculate the upper triangular part of the // kernel matrix, since it is symmetric. This helps minimize the number of // kernel evaluations. for (size_t i = 0; i < data.n_cols; ++i) { for (size_t j = i; j < data.n_cols; ++j) { // Evaluate the kernel on these two points. kernelMatrix(i, j) = kernel.Evaluate(data.unsafe_col(i), data.unsafe_col(j)); } } // Copy to the lower triangular part of the matrix. for (size_t i = 1; i < data.n_cols; ++i) for (size_t j = 0; j < i; ++j) kernelMatrix(i, j) = kernelMatrix(j, i); // For PCA the data has to be centered, even if the data is centered. But it // is not guaranteed that the data, when mapped to the kernel space, is also // centered. Since we actually never work in the feature space we cannot // center the data. So, we perform a "psuedo-centering" using the kernel // matrix. arma::rowvec rowMean = arma::sum(kernelMatrix, 0) / kernelMatrix.n_cols; kernelMatrix.each_col() -= arma::sum(kernelMatrix, 1) / kernelMatrix.n_cols; kernelMatrix.each_row() -= rowMean; kernelMatrix += arma::sum(rowMean) / kernelMatrix.n_cols; // Eigendecompose the centered kernel matrix. arma::eig_sym(eigval, eigvec, kernelMatrix); // Swap the eigenvalues since they are ordered backwards (we need largest to // smallest). 
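    // arma::eig_sym() returns the eigenvalues in ascending order; the
    // pairwise swaps below reverse the vector in place so that the largest
    // eigenvalue comes first, and fliplr() then applies the matching
    // reversal to the eigenvector columns.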
for (size_t i = 0; i < floor(eigval.n_elem / 2.0); ++i) eigval.swap_rows(i, (eigval.n_elem - 1) - i); // Flip the coefficients to produce the same effect. eigvec = arma::fliplr(eigvec); transformedData = eigvec.t() * kernelMatrix; } }; }; // namespace kpca }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/kernel_pca/kernel_rules/nystroem_method.hpp0000644000176200001440000000620313647512751030122 0ustar liggesusers/** * @file nystroem_method.hpp * @author Marcus Edel * * Use the Nystroem method for approximating a kernel matrix. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KERNEL_PCA_NYSTROEM_METHOD_HPP #define __MLPACK_METHODS_KERNEL_PCA_NYSTROEM_METHOD_HPP #include #include #include namespace mlpack { namespace kpca { template< typename KernelType, typename PointSelectionPolicy = kernel::KMeansSelection<> > class NystroemKernelRule { public: /** * Construct the kernel matrix approximation using the nystroem method. * * @param data Input data points. * @param transformedData Matrix to output results into. * @param eigval KPCA eigenvalues will be written to this vector. * @param eigvec KPCA eigenvectors will be written to this matrix. * @param rank Rank to be used for matrix approximation. * @param kernel Kernel to be used for computation. */ static void ApplyKernelMatrix(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval, arma::mat& eigvec, const size_t rank, KernelType kernel = KernelType()) { arma::mat G, v; kernel::NystroemMethod nm(data, kernel, rank); nm.Apply(G); transformedData = G.t() * G; // For PCA the data has to be centered, even if the data is centered. But // it is not guaranteed that the data, when mapped to the kernel space, is // also centered. Since we actually never work in the feature space we // cannot center the data. So, we perform a "psuedo-centering" using the // kernel matrix. arma::rowvec rowMean = arma::sum(transformedData, 0) / transformedData.n_cols; transformedData.each_col() -= arma::sum(transformedData, 1) / transformedData.n_cols; transformedData.each_row() -= rowMean; transformedData += arma::sum(rowMean) / transformedData.n_cols; // Eigendecompose the centered kernel matrix. arma::svd(eigvec, eigval, v, transformedData); eigval %= eigval / (data.n_cols - 1); transformedData = eigvec.t() * G.t(); } }; }; // namespace kpca }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/kernel_pca/kernel_pca.hpp0000644000176200001440000001240313647512751024312 0ustar liggesusers/** * @file kernel_pca.hpp * @author Ajinkya Kale * @author Marcus Edel * * Defines the KernelPCA class to perform Kernel Principal Components Analysis * on the specified data set. * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KERNEL_PCA_KERNEL_PCA_HPP #define __MLPACK_METHODS_KERNEL_PCA_KERNEL_PCA_HPP #include #include namespace mlpack { namespace kpca { /** * This class performs kernel principal components analysis (Kernel PCA), for a * given kernel. This is a standard machine learning technique and is * well-documented on the Internet and in standard texts. It is often used as a * dimensionality reduction technique, and can also be useful in mapping * linearly inseparable classes of points to different spaces where they are * linearly separable. * * The performance of the method is highly dependent on the kernel choice. * There are numerous available kernels in the mlpack::kernel namespace (see * files in mlpack/core/kernels/) and it is easy to write your own; see other * implementations for examples. */ template < typename KernelType, typename KernelRule = NaiveKernelRule > class KernelPCA { public: /** * Construct the KernelPCA object, optionally passing a kernel. Optionally, * the transformed data can be centered about the origin; to do this, pass * 'true' for centerTransformedData. This will take slightly longer (but not * much). * * @param kernel Kernel to be used for computation. * @param centerTransformedData Center transformed data. */ KernelPCA(const KernelType kernel = KernelType(), const bool centerTransformedData = false); /** * Apply Kernel Principal Components Analysis to the provided data set. * * @param data Data matrix. * @param transformedData Matrix to output results into. * @param eigval KPCA eigenvalues will be written to this vector. * @param eigvec KPCA eigenvectors will be written to this matrix. * @param newDimension New dimension for the dataset. */ void Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval, arma::mat& eigvec, const size_t newDimension); /** * Apply Kernel Principal Components Analysis to the provided data set. * * @param data Data matrix. * @param transformedData Matrix to output results into. * @param eigval KPCA eigenvalues will be written to this vector. * @param eigvec KPCA eigenvectors will be written to this matrix. */ void Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval, arma::mat& eigvec); /** * Apply Kernel Principal Component Analysis to the provided data set. * * @param data Data matrix. * @param transformedData Matrix to output results into. * @param eigval KPCA eigenvalues will be written to this vector. */ void Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval); /** * Apply dimensionality reduction using Kernel Principal Component Analysis * to the provided data set. The data matrix will be modified in-place. Note * that the dimension can be larger than the existing dimension because KPCA * works on the kernel matrix, not the covariance matrix. This means the new * dimension can be as large as the number of points (columns) in the dataset. 
* Note that if you specify newDimension to be larger than the current * dimension of the data (the number of rows), then it's not really * "dimensionality reduction"... * * @param data Data matrix. * @param newDimension New dimension for the dataset. */ void Apply(arma::mat& data, const size_t newDimension); //! Get the kernel. const KernelType& Kernel() const { return kernel; } //! Modify the kernel. KernelType& Kernel() { return kernel; } //! Return whether or not the transformed data is centered. bool CenterTransformedData() const { return centerTransformedData; } //! Return whether or not the transformed data is centered. bool& CenterTransformedData() { return centerTransformedData; } // Returns a string representation of this object. std::string ToString() const; private: //! The instantiated kernel. KernelType kernel; //! If true, the data will be scaled (by standard deviation) when Apply() is //! run. bool centerTransformedData; }; // class KernelPCA }; // namespace kpca }; // namespace mlpack // Include implementation. #include "kernel_pca_impl.hpp" #endif // __MLPACK_METHODS_KERNEL_PCA_KERNEL_PCA_HPP RcppMLPACK/inst/include/mlpack/methods/kernel_pca/kernel_pca_impl.hpp0000644000176200001440000000766513647512751025351 0ustar liggesusers/** * @file kernel_pca_impl.hpp * @author Ajinkya Kale * @author Marcus Edel * * Implementation of Kernel PCA class to perform Kernel Principal Components * Analysis on the specified data set. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_KERNEL_PCA_KERNEL_PCA_IMPL_HPP #define __MLPACK_METHODS_KERNEL_PCA_KERNEL_PCA_IMPL_HPP // In case it hasn't already been included. #include "kernel_pca.hpp" namespace mlpack { namespace kpca { template KernelPCA::KernelPCA(const KernelType kernel, const bool centerTransformedData) : kernel(kernel), centerTransformedData(centerTransformedData) { } //! Apply Kernel Principal Component Analysis to the provided data set. template void KernelPCA::Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval, arma::mat& eigvec, const size_t newDimension) { KernelRule::ApplyKernelMatrix(data, transformedData, eigval, eigvec, newDimension, kernel); // Center the transformed data, if the user asked for it. if (centerTransformedData) { arma::colvec transformedDataMean = arma::mean(transformedData, 1); transformedData = transformedData - (transformedDataMean * arma::ones(transformedData.n_cols)); } } //! Apply Kernel Principal Component Analysis to the provided data set. template void KernelPCA::Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigval, arma::mat& eigvec) { Apply(data, transformedData, eigval, eigvec, data.n_cols); } //! Apply Kernel Principal Component Analysis to the provided data set. 
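// A minimal usage sketch for this overload, with hypothetical random data
// (kernel::GaussianKernel ships with mlpack in mlpack/core/kernels):
//
//   arma::mat data(10, 100, arma::fill::randu); // 100 points in 10 dims.
//   arma::mat transformed;
//   arma::vec eigval;
//   KernelPCA<kernel::GaussianKernel> kpca;
//   kpca.Apply(data, transformed, eigval);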
template void KernelPCA::Apply(const arma::mat& data, arma::mat& transformedData, arma::vec& eigVal) { arma::mat coeffs; Apply(data, transformedData, eigVal, coeffs, data.n_cols); } //! Use KPCA for dimensionality reduction. template void KernelPCA::Apply(arma::mat& data, const size_t newDimension) { arma::mat coeffs; arma::vec eigVal; Apply(data, data, eigVal, coeffs, newDimension); if (newDimension < coeffs.n_rows && newDimension > 0) data.shed_rows(newDimension, data.n_rows - 1); } //! Returns a string representation of the object. template std::string KernelPCA::ToString() const { std::ostringstream convert; convert << "KernelPCA [" << this << "]" << std::endl; convert << " Center Transformed: " << centerTransformedData <. */ #ifndef __MLPACK_METHODS_SPARSE_CODING_RANDOM_INITIALIZER_HPP #define __MLPACK_METHODS_SPARSE_CODING_RANDOM_INITIALIZER_HPP #include namespace mlpack { namespace sparse_coding { /** * A DictionaryInitializer for use with the SparseCoding class. This provides a * random, normally distributed dictionary, such that each atom has a norm of 1. */ class RandomInitializer { public: /** * Initialize the dictionary randomly from a normal distribution, such that * each atom has a norm of 1. This is simple enough to be included with the * definition. * * @param data Dataset to use for initialization. * @param atoms Number of atoms (columns) in the dictionary. * @param dictionary Dictionary to initialize. */ static void Initialize(const arma::mat& data, const size_t atoms, arma::mat& dictionary) { // Create random dictionary. dictionary.randn(data.n_rows, atoms); // Normalize each atom. for (size_t j = 0; j < atoms; ++j) dictionary.col(j) /= norm(dictionary.col(j), 2); } }; }; // namespace sparse_coding }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/sparse_coding/sparse_coding_impl.hpp0000644000176200001440000002675613647512751026605 0ustar liggesusers/** * @file sparse_coding_impl.hpp * @author Nishant Mehta * * Implementation of Sparse Coding with Dictionary Learning using l1 (LASSO) or * l1+l2 (Elastic Net) regularization. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_SPARSE_CODING_SPARSE_CODING_IMPL_HPP #define __MLPACK_METHODS_SPARSE_CODING_SPARSE_CODING_IMPL_HPP // In case it hasn't already been included. #include "sparse_coding.hpp" namespace mlpack { namespace sparse_coding { template SparseCoding::SparseCoding(const arma::mat& data, const size_t atoms, const double lambda1, const double lambda2) : atoms(atoms), data(data), codes(atoms, data.n_cols), lambda1(lambda1), lambda2(lambda2) { // Initialize the dictionary. 
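  // DictionaryInitializer is a compile-time policy class. The default,
  // DataDependentRandomInitializer, sums three randomly chosen data points
  // per atom and normalizes the result; RandomInitializer draws each atom
  // from a normal distribution (also normalized to unit norm); and
  // NothingInitializer leaves the dictionary alone so that it can be set
  // manually via Dictionary().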
DictionaryInitializer::Initialize(data, atoms, dictionary); } template void SparseCoding::Encode(const size_t maxIterations, const double objTolerance, const double newtonTolerance) { //Timer::Start("sparse_coding"); double lastObjVal = DBL_MAX; // Take the initial coding step, which has to happen before entering the main // optimization loop. Rcpp::Rcout << "Initial Coding Step." << std::endl; OptimizeCode(); arma::uvec adjacencies = find(codes); Rcpp::Rcout << " Sparsity level: " << 100.0 * ((double) (adjacencies.n_elem)) / ((double) (atoms * data.n_cols)) << "%." << std::endl; Rcpp::Rcout << " Objective value: " << Objective() << "." << std::endl; for (size_t t = 1; t != maxIterations; ++t) { // Print current iteration, and maximum number of iterations (if it isn't // 0). Rcpp::Rcout << "Iteration " << t; if (maxIterations != 0) Rcpp::Rcout << " of " << maxIterations; Rcpp::Rcout << "." << std::endl; // First step: optimize the dictionary. Rcpp::Rcout << "Performing dictionary step... " << std::endl; OptimizeDictionary(adjacencies, newtonTolerance); Rcpp::Rcout << " Objective value: " << Objective() << "." << std::endl; // Second step: perform the coding. Rcpp::Rcout << "Performing coding step..." << std::endl; OptimizeCode(); // Get the indices of all the nonzero elements in the codes. adjacencies = find(codes); Rcpp::Rcout << " Sparsity level: " << 100.0 * ((double) (adjacencies.n_elem)) / ((double) (atoms * data.n_cols)) << "%." << std::endl; // Find the new objective value and improvement so we can check for // convergence. double curObjVal = Objective(); double improvement = lastObjVal - curObjVal; Rcpp::Rcout << " Objective value: " << curObjVal << " (improvement " << std::scientific << improvement << ")." << std::endl; // Have we converged? if (improvement < objTolerance) { Rcpp::Rcout << "Converged within tolerance " << objTolerance << ".\n"; break; } lastObjVal = curObjVal; } //Timer::Stop("sparse_coding"); } template void SparseCoding::OptimizeCode() { // When using the Cholesky version of LARS, this is correct even if // lambda2 > 0. arma::mat matGram = trans(dictionary) * dictionary; for (size_t i = 0; i < data.n_cols; ++i) { // Report progress. if ((i % 100) == 0) Rcpp::Rcout << "Optimization at point " << i << "." << std::endl; bool useCholesky = true; regression::LARS lars(useCholesky, matGram, lambda1, lambda2); // Create an alias of the code (using the same memory), and then LARS will // place the result directly into that; then we will not need to have an // extra copy. arma::vec code = codes.unsafe_col(i); lars.Regress(dictionary, data.unsafe_col(i), code, false); } } // Dictionary step for optimization. template double SparseCoding::OptimizeDictionary( const arma::uvec& adjacencies, const double newtonTolerance) { // Count the number of atomic neighbors for each point x^i. arma::uvec neighborCounts = arma::zeros(data.n_cols, 1); if (adjacencies.n_elem > 0) { // This gets the column index. Intentional integer division. size_t curPointInd = (size_t) (adjacencies(0) / atoms); size_t nextColIndex = (curPointInd + 1) * atoms; for (size_t l = 1; l < adjacencies.n_elem; ++l) { // If l no longer refers to an element in this column, advance the column // number accordingly. if (adjacencies(l) >= nextColIndex) { curPointInd = (size_t) (adjacencies(l) / atoms); nextColIndex = (curPointInd + 1) * atoms; } ++neighborCounts(curPointInd); } } // Handle the case of inactive atoms (atoms not used in the given coding). 
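  // An atom is "inactive" when its row of the codes matrix is identically
  // zero, i.e. no data point uses it. Such atoms contribute nothing to the
  // dual problem and can make the Newton system below ill-conditioned, so
  // they are excluded from the dual solve and re-initialized randomly
  // afterwards.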
std::vector inactiveAtoms; for (size_t j = 0; j < atoms; ++j) { if (accu(codes.row(j) != 0) == 0) inactiveAtoms.push_back(j); } const size_t nInactiveAtoms = inactiveAtoms.size(); const size_t nActiveAtoms = atoms - nInactiveAtoms; // Efficient construction of Z restricted to active atoms. arma::mat matActiveZ; if (nInactiveAtoms > 0) { math::RemoveRows(codes, inactiveAtoms, matActiveZ); } if (nInactiveAtoms > 0) { Rcpp::Rcout << "There are " << nInactiveAtoms << " inactive atoms. They will be re-initialized randomly.\n"; } Rcpp::Rcout << "Solving Dual via Newton's Method.\n"; // Solve using Newton's method in the dual - note that the final dot // multiplication with inv(A) seems to be unavoidable. Although more // expensive, the code written this way (we use solve()) should be more // numerically stable than just using inv(A) for everything. arma::vec dualVars = arma::zeros(nActiveAtoms); //vec dualVars = 1e-14 * ones(nActiveAtoms); // Method used by feature sign code - fails miserably here. Perhaps the // MATLAB optimizer fmincon does something clever? //vec dualVars = 10.0 * randu(nActiveAtoms, 1); //vec dualVars = diagvec(solve(dictionary, data * trans(codes)) // - codes * trans(codes)); //for (size_t i = 0; i < dualVars.n_elem; i++) // if (dualVars(i) < 0) // dualVars(i) = 0; bool converged = false; // If we have any inactive atoms, we must construct these differently. arma::mat codesXT; arma::mat codesZT; if (inactiveAtoms.empty()) { codesXT = codes * trans(data); codesZT = codes * trans(codes); } else { codesXT = matActiveZ * trans(data); codesZT = matActiveZ * trans(matActiveZ); } double normGradient; double improvement; for (size_t t = 1; !converged; ++t) { arma::mat A = codesZT + diagmat(dualVars); arma::mat matAInvZXT = solve(A, codesXT); arma::vec gradient = -arma::sum(arma::square(matAInvZXT), 1); gradient += 1; arma::mat hessian = -(-2 * (matAInvZXT * trans(matAInvZXT)) % inv(A)); arma::vec searchDirection = -solve(hessian, gradient); //printf("%e\n", norm(searchDirection, 2)); // Armijo line search. const double c = 1e-4; double alpha = 1.0; const double rho = 0.9; double sufficientDecrease = c * dot(gradient, searchDirection); while (true) { // Calculate objective. double sumDualVars = sum(dualVars); double fOld = -(-trace(trans(codesXT) * matAInvZXT) - sumDualVars); double fNew = -(-trace(trans(codesXT) * solve(codesZT + diagmat(dualVars + alpha * searchDirection), codesXT)) - (sumDualVars + alpha * sum(searchDirection))); if (fNew <= fOld + alpha * sufficientDecrease) { searchDirection = alpha * searchDirection; improvement = fOld - fNew; break; } alpha *= rho; } // Take step and print useful information. dualVars += searchDirection; normGradient = norm(gradient, 2); Rcpp::Rcout << "Newton Method iteration " << t << ":" << std::endl; Rcpp::Rcout << " Gradient norm: " << std::scientific << normGradient << "." << std::endl; Rcpp::Rcout << " Improvement: " << std::scientific << improvement << ".\n"; if (improvement < newtonTolerance) converged = true; } if (inactiveAtoms.empty()) { // Directly update dictionary. dictionary = trans(solve(codesZT + diagmat(dualVars), codesXT)); } else { arma::mat activeDictionary = trans(solve(codesZT + diagmat(dualVars), codesXT)); // Update all atoms. size_t currentInactiveIndex = 0; for (size_t i = 0; i < atoms; ++i) { if (inactiveAtoms[currentInactiveIndex] == i) { // This atom is inactive. Reinitialize it randomly. 
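        // This mirrors the DataDependentRandomInitializer heuristic: set the
        // atom to the sum of three randomly chosen data points, then rescale
        // it to unit l2-norm.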
dictionary.col(i) = (data.col(math::RandInt(data.n_cols)) + data.col(math::RandInt(data.n_cols)) + data.col(math::RandInt(data.n_cols))); dictionary.col(i) /= norm(dictionary.col(i), 2); // Increment inactive index counter. ++currentInactiveIndex; } else { // Update estimate. dictionary.col(i) = activeDictionary.col(i - currentInactiveIndex); } } } //printf("final reconstruction error: %e\n", norm(data - dictionary * codes, "fro")); return normGradient; } // Project each atom of the dictionary back into the unit ball (if necessary). template void SparseCoding::ProjectDictionary() { for (size_t j = 0; j < atoms; j++) { double atomNorm = norm(dictionary.col(j), 2); if (atomNorm > 1) { Rcpp::Rcout << "Norm of atom " << j << " exceeds 1 (" << std::scientific << atomNorm << "). Shrinking...\n"; dictionary.col(j) /= atomNorm; } } } // Compute the objective function. template double SparseCoding::Objective() const { double l11NormZ = sum(sum(abs(codes))); double froNormResidual = norm(data - (dictionary * codes), "fro"); if (lambda2 > 0) { double froNormZ = norm(codes, "fro"); return 0.5 * (std::pow(froNormResidual, 2.0) + (lambda2 * std::pow(froNormZ, 2.0))) + (lambda1 * l11NormZ); } else // It can be simpler. { return 0.5 * std::pow(froNormResidual, 2.0) + lambda1 * l11NormZ; } } template std::string SparseCoding::ToString() const { std::ostringstream convert; convert << "Sparse Coding [" << this << "]" << std::endl; convert << " Data: " << data.n_rows << "x" ; convert << data.n_cols << std::endl; convert << " Atoms: " << atoms << std::endl; convert << " Lambda 1: " << lambda1 << std::endl; convert << " Lambda 2: " << lambda2 << std::endl; return convert.str(); } }; // namespace sparse_coding }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/sparse_coding/sparse_coding.hpp0000644000176200001440000001662513647512751025556 0ustar liggesusers/** * @file sparse_coding.hpp * @author Nishant Mehta * * Definition of the SparseCoding class, which performs L1 (LASSO) or * L1+L2 (Elastic Net)-regularized sparse coding with dictionary learning * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_SPARSE_CODING_SPARSE_CODING_HPP #define __MLPACK_METHODS_SPARSE_CODING_SPARSE_CODING_HPP #include #include // Include our three simple dictionary initializers. #include "nothing_initializer.hpp" #include "data_dependent_random_initializer.hpp" #include "random_initializer.hpp" namespace mlpack { namespace sparse_coding { /** * An implementation of Sparse Coding with Dictionary Learning that achieves * sparsity via an l1-norm regularizer on the codes (LASSO) or an (l1+l2)-norm * regularizer on the codes (the Elastic Net). * * Let d be the number of dimensions in the original space, m the number of * training points, and k the number of atoms in the dictionary (the dimension * of the learned feature space). 
The training data X is a d-by-m matrix where * each column is a point and each row is a dimension. The dictionary D is a * d-by-k matrix, and the sparse codes matrix Z is a k-by-m matrix. * This program seeks to minimize the objective: * * \f[ * \min_{D,Z} 0.5 ||X - D Z||_{F}^2\ + \lambda_1 \sum_{i=1}^m ||Z_i||_1 * + 0.5 \lambda_2 \sum_{i=1}^m ||Z_i||_2^2 * \f] * * subject to \f$ ||D_j||_2 <= 1 \f$ for \f$ 1 <= j <= k \f$ * where typically \f$ lambda_1 > 0 \f$ and \f$ lambda_2 = 0 \f$. * * This problem is solved by an algorithm that alternates between a dictionary * learning step and a sparse coding step. The dictionary learning step updates * the dictionary D using a Newton method based on the Lagrange dual (see the * paper below for details). The sparse coding step involves solving a large * number of sparse linear regression problems; this can be done efficiently * using LARS, an algorithm that can solve the LASSO or the Elastic Net (papers * below). * * Here are those papers: * * @code * @incollection{lee2007efficient, * title = {Efficient sparse coding algorithms}, * author = {Honglak Lee and Alexis Battle and Rajat Raina and Andrew Y. Ng}, * booktitle = {Advances in Neural Information Processing Systems 19}, * editor = {B. Sch\"{o}lkopf and J. Platt and T. Hoffman}, * publisher = {MIT Press}, * address = {Cambridge, MA}, * pages = {801--808}, * year = {2007} * } * @endcode * * @code * @article{efron2004least, * title={Least angle regression}, * author={Efron, B. and Hastie, T. and Johnstone, I. and Tibshirani, R.}, * journal={The Annals of statistics}, * volume={32}, * number={2}, * pages={407--499}, * year={2004}, * publisher={Institute of Mathematical Statistics} * } * @endcode * * @code * @article{zou2005regularization, * title={Regularization and variable selection via the elastic net}, * author={Zou, H. and Hastie, T.}, * journal={Journal of the Royal Statistical Society Series B}, * volume={67}, * number={2}, * pages={301--320}, * year={2005}, * publisher={Royal Statistical Society} * } * @endcode * * Before the method is run, the dictionary is initialized using the * DictionaryInitializationPolicy class. Possible choices include the * RandomInitializer, which provides an entirely random dictionary, the * DataDependentRandomInitializer, which provides a random dictionary based * loosely on characteristics of the dataset, and the NothingInitializer, which * does not initialize the dictionary -- instead, the user should set the * dictionary using the Dictionary() mutator method. * * @tparam DictionaryInitializationPolicy The class to use to initialize the * dictionary; must have 'void Initialize(const arma::mat& data, arma::mat& * dictionary)' function. */ template class SparseCoding { public: /** * Set the parameters to SparseCoding. lambda2 defaults to 0. * * @param data Data matrix * @param atoms Number of atoms in dictionary * @param lambda1 Regularization parameter for l1-norm penalty * @param lambda2 Regularization parameter for l2-norm penalty */ SparseCoding(const arma::mat& data, const size_t atoms, const double lambda1, const double lambda2 = 0); /** * Run Sparse Coding with Dictionary Learning. * * @param maxIterations Maximum number of iterations to run algorithm. If 0, * the algorithm will run until convergence (or forever). * @param objTolerance Tolerance for objective function. When an iteration of * the algorithm produces an improvement smaller than this, the algorithm * will terminate. 
* @param newtonTolerance Tolerance for the Newton's method dictionary * optimization step. */ void Encode(const size_t maxIterations = 0, const double objTolerance = 0.01, const double newtonTolerance = 1e-6); /** * Sparse code each point via LARS. */ void OptimizeCode(); /** * Learn dictionary via Newton method based on Lagrange dual. * * @param adjacencies Indices of entries (unrolled column by column) of * the coding matrix Z that are non-zero (the adjacency matrix for the * bipartite graph of points and atoms). * @param newtonTolerance Tolerance of the Newton's method optimizer. * @return the norm of the gradient of the Lagrange dual with respect to * the dual variables */ double OptimizeDictionary(const arma::uvec& adjacencies, const double newtonTolerance = 1e-6); /** * Project each atom of the dictionary back onto the unit ball, if necessary. */ void ProjectDictionary(); /** * Compute the objective function. */ double Objective() const; //! Access the data. const arma::mat& Data() const { return data; } //! Access the dictionary. const arma::mat& Dictionary() const { return dictionary; } //! Modify the dictionary. arma::mat& Dictionary() { return dictionary; } //! Access the sparse codes. const arma::mat& Codes() const { return codes; } //! Modify the sparse codes. arma::mat& Codes() { return codes; } // Returns a string representation of this object. std::string ToString() const; private: //! Number of atoms. size_t atoms; //! Data matrix (columns are points). const arma::mat& data; //! Dictionary (columns are atoms). arma::mat dictionary; //! Sparse codes (columns are points). arma::mat codes; //! l1 regularization term. double lambda1; //! l2 regularization term. double lambda2; }; }; // namespace sparse_coding }; // namespace mlpack // Include implementation. #include "sparse_coding_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/sparse_coding/data_dependent_random_initializer.hpp0000644000176200001440000000452513647512751031634 0ustar liggesusers/** * @file data_dependent_random_initializer.hpp * @author Nishant Mehta * * A sensible heuristic for initializing dictionaries for sparse coding. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_SPARSE_CODING_DATA_DEPENDENT_RANDOM_INITIALIZER_HPP #define __MLPACK_METHODS_SPARSE_CODING_DATA_DEPENDENT_RANDOM_INITIALIZER_HPP #include namespace mlpack { namespace sparse_coding { /** * A data-dependent random dictionary initializer for SparseCoding. This * creates random dictionary atoms by adding three random observations from the * data together, and then normalizing the atom. */ class DataDependentRandomInitializer { public: /** * Initialize the dictionary by adding together three random observations from * the data, and then normalizing the atom. This implementation is simple * enough to be included with the definition. * * @param data Dataset to initialize the dictionary with. 
* @param atoms Number of atoms in dictionary. * @param dictionary Dictionary to initialize. */ static void Initialize(const arma::mat& data, const size_t atoms, arma::mat& dictionary) { // Set the size of the dictionary. dictionary.set_size(data.n_rows, atoms); // Create each atom. for (size_t i = 0; i < atoms; ++i) { // Add three atoms together. dictionary.col(i) = (data.col(math::RandInt(data.n_cols)) + data.col(math::RandInt(data.n_cols)) + data.col(math::RandInt(data.n_cols))); // Now normalize the atom. dictionary.col(i) /= norm(dictionary.col(i), 2); } } }; }; // namespace sparse_coding }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/sparse_coding/nothing_initializer.hpp0000644000176200001440000000344713647512751027005 0ustar liggesusers/** * @file nothing_initializer.hpp * @author Ryan Curtin * * An initializer for SparseCoding which does precisely nothing. It is useful * for when you have an already defined dictionary and you plan on setting it * with SparseCoding::Dictionary(). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_SPARSE_CODING_NOTHING_INITIALIZER_HPP #define __MLPACK_METHODS_SPARSE_CODING_NOTHING_INITIALIZER_HPP #include namespace mlpack { namespace sparse_coding { /** * A DictionaryInitializer for SparseCoding which does not initialize anything; * it is useful for when the dictionary is already known and will be set with * SparseCoding::Dictionary(). */ class NothingInitializer { public: /** * This function does not initialize the dictionary. This will cause problems * for SparseCoding if the dictionary is not set manually before running the * method. */ static void Initialize(const arma::mat& /* data */, const size_t /* atoms */, arma::mat& /* dictionary */) { // Do nothing! } }; }; // namespace sparse_coding }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/cf/0000755000176200001440000000000013647512751017763 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/cf/cf_impl.hpp0000644000176200001440000002104113647512751022103 0ustar liggesusers/** * @file cf.cpp * @author Mudit Raj Gupta * * Collaborative Filtering. * * Implementation of CF class to perform Collaborative Filtering on the * specified data set. * * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
*/ namespace mlpack { namespace cf { /** * Construct the CF object. */ template CF::CF(arma::mat& data, const size_t numUsersForSimilarity, const size_t rank) : data(data), numUsersForSimilarity(numUsersForSimilarity), rank(rank), factorizer() { // Validate neighbourhood size. if (numUsersForSimilarity < 1) { Rcpp::Rcout << "CF::CF(): neighbourhood size should be > 0(" << numUsersForSimilarity << " given). Setting value to 5.\n"; //Setting Default Value of 5 this->numUsersForSimilarity = 5; } CleanData(); } template void CF::GetRecommendations(const size_t numRecs, arma::Mat& recommendations) { // Generate list of users. Maybe it would be more efficient to pass an empty // users list, and then have the other overload of GetRecommendations() assume // that if users is empty, then recommendations should be generated for all // users? arma::Col users = arma::linspace >(0, cleanedData.n_cols - 1, cleanedData.n_cols); // Call the main overload for recommendations. GetRecommendations(numRecs, recommendations, users); } template void CF::GetRecommendations(const size_t numRecs, arma::Mat& recommendations, arma::Col& users) { // Base function for calculating recommendations. // Check if the user wanted us to choose a rank for them. if (rank == 0) { // This is a simple heuristic that picks a rank based on the density of the // dataset between 5 and 105. const double density = (cleanedData.n_nonzero * 100.0) / cleanedData.n_elem; const size_t rankEstimate = size_t(density) + 5; // Set to heuristic value. Rcpp::Rcout << "No rank given for decomposition; using rank of " << rankEstimate << " calculated by density-based heuristic." << std::endl; rank = rankEstimate; } // Operations independent of the query: // Decompose the sparse data matrix to user and data matrices. factorizer.Apply(cleanedData, rank, w, h); // Generate new table by multiplying approximate values. rating = w * h; // Now, we will use the decomposed w and h matrices to estimate what the user // would have rated items as, and then pick the best items. // Temporarily store feature vector of queried users. arma::mat query(rating.n_rows, users.n_elem); // Select feature vectors of queried users. for (size_t i = 0; i < users.n_elem; i++) query.col(i) = rating.col(users(i)); // Temporary storage for neighborhood of the queried users. arma::Mat neighborhood; // Calculate the neighborhood of the queried users. // This should be a templatized option. neighbor::AllkNN a(rating, query); arma::mat resultingDistances; // Temporary storage. a.Search(numUsersForSimilarity, neighborhood, resultingDistances); // Temporary storage for storing the average rating for each user in their // neighborhood. arma::mat averages = arma::zeros(rating.n_rows, query.n_cols); // Iterate over each query user. for (size_t i = 0; i < neighborhood.n_cols; ++i) { // Iterate over each neighbor of the query user. for (size_t j = 0; j < neighborhood.n_rows; ++j) averages.col(i) += rating.col(neighborhood(j, i)); // Normalize average. averages.col(i) /= neighborhood.n_rows; } // Generate recommendations for each query user by finding the maximum numRecs // elements in the averages matrix. recommendations.set_size(numRecs, users.n_elem); recommendations.fill(cleanedData.n_rows); // Invalid item number. arma::mat values(numRecs, users.n_elem); values.fill(-DBL_MAX); // The smallest possible value. for (size_t i = 0; i < users.n_elem; i++) { // Look through the averages column corresponding to the current user. 
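    // The loop below maintains, for each query user, the numRecs best
    // candidate items sorted by estimated rating: the values matrix starts
    // out at -DBL_MAX, every unrated item is compared against the current
    // worst candidate, and better estimates are placed at their sorted
    // position by InsertNeighbor(), which shifts the tail of the list with
    // memmove().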
for (size_t j = 0; j < averages.n_rows; ++j) { // Ensure that the user hasn't already rated the item. if (cleanedData(j, users(i)) != 0.0) continue; // The user already rated the item. // Is the estimated value better than the worst candidate? const double value = averages(j, i); if (value > values(values.n_rows - 1, i)) { // It should be inserted. Which position? size_t insertPosition = values.n_rows - 1; while (insertPosition > 0) { if (value <= values(insertPosition - 1, i)) break; // The current value is the right one. insertPosition--; } // Now insert it into the list. InsertNeighbor(i, insertPosition, j, value, recommendations, values); } } // If we were not able to come up with enough recommendations, issue a // warning. if (recommendations(values.n_rows - 1, i) == cleanedData.n_rows + 1) Rcpp::Rcout << "Could not provide " << values.n_rows << " recommendations " << "for user " << users(i) << " (not enough un-rated items)!" << std::endl; } } template void CF::CleanData() { // Generate list of locations for batch insert constructor for sparse // matrices. arma::umat locations(2, data.n_cols); arma::vec values(data.n_cols); for (size_t i = 0; i < data.n_cols; ++i) { // We have to transpose it because items are rows, and users are columns. locations(1, i) = ((arma::uword) data(0, i)); locations(0, i) = ((arma::uword) data(1, i)); values(i) = data(2, i); } // Find maximum user and item IDs. const size_t maxItemID = (size_t) max(locations.row(0)) + 1; const size_t maxUserID = (size_t) max(locations.row(1)) + 1; // Fill sparse matrix. cleanedData = arma::sp_mat(locations, values, maxItemID, maxUserID); } /** * Helper function to insert a point into the recommendation matrices. * * @param queryIndex Index of point whose recommendations we are inserting into. * @param pos Position in list to insert into. * @param neighbor Index of item being inserted as a recommendation. * @param value Value of recommendation. */ template void CF::InsertNeighbor(const size_t queryIndex, const size_t pos, const size_t neighbor, const double value, arma::Mat& recommendations, arma::mat& values) const { // We only memmove() if there is actually a need to shift something. if (pos < (recommendations.n_rows - 1)) { const int len = (values.n_rows - 1) - pos; memmove(values.colptr(queryIndex) + (pos + 1), values.colptr(queryIndex) + pos, sizeof(double) * len); memmove(recommendations.colptr(queryIndex) + (pos + 1), recommendations.colptr(queryIndex) + pos, sizeof(size_t) * len); } // Now put the new information in the right index. values(pos, queryIndex) = value; recommendations(pos, queryIndex) = neighbor; } // Return string of object. template std::string CF::ToString() const { std::ostringstream convert; convert << "Collaborative Filtering [" << this << "]" << std::endl; //convert << " Number of Recommendations: " << numRecs << std::endl; //convert << " Number of Users for Similarity: " << numUsersForSimilarity; //convert << std::endl; //convert << " Data: " << data.n_rows << "x" << data.n_cols << std::endl; return convert.str(); } }; // namespace mlpack }; // namespace cf RcppMLPACK/inst/include/mlpack/methods/cf/cf.hpp0000644000176200001440000001511613647512751021070 0ustar liggesusers/** * @file cf.hpp * @author Mudit Raj Gupta * * Collaborative filtering. * * Defines the CF class to perform collaborative filtering on the specified data * set using alternating least squares (ALS). * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_CF_CF_HPP #define __MLPACK_METHODS_CF_CF_HPP #include #include #include #include #include #include #include #include namespace mlpack { namespace cf /** Collaborative filtering. */{ /** * This class implements Collaborative Filtering (CF). This implementation * presently supports Alternating Least Squares (ALS) for collaborative * filtering. * * A simple example of how to run Collaborative Filtering is shown below. * * @code * extern arma::mat data; // (user, item, rating) table * extern arma::Col users; // users seeking recommendations * arma::Mat recommendations; // Recommendations * * CF<> cf(data); // Default options. * * // Generate 10 recommendations for all users. * cf.GetRecommendations(10, recommendations); * * // Generate 10 recommendations for specified users. * cf.GetRecommendations(10, recommendations, users); * * @endcode * * The data matrix is a (user, item, rating) table. Each column in the matrix * should have three rows. The first represents the user; the second represents * the item; and the third represents the rating. The user and item, while they * are in a matrix that holds doubles, should hold integer (or size_t) values. * The user and item indices are assumed to start at 0. * * @tparam FactorizerType The type of matrix factorization to use to decompose * the rating matrix (a W and H matrix). This must implement the method * Apply(arma::sp_mat& data, size_t rank, arma::mat& W, arma::mat& H). */ template< typename FactorizerType = amf::AMF > class CF { public: /** * Initialize the CF object. Store a reference to the data that we * will be using. There are parameters that can be set; default values * are provided for each of them. If the rank is left unset (or is set to 0), * a simple density-based heuristic will be used to choose a rank. * * @param data Initial (user, item, rating) matrix. * @param numUsersForSimilarity Size of the neighborhood. * @param rank Rank parameter for matrix factorization. */ CF(arma::mat& data, const size_t numUsersForSimilarity = 5, const size_t rank = 0); //! Sets number of users for calculating similarity. void NumUsersForSimilarity(const size_t num) { if (num < 1) { Rcpp::Rcout << "CF::NumUsersForSimilarity(): invalid value (< 1) " "ignored." << std::endl; return; } this->numUsersForSimilarity = num; } //! Gets number of users for calculating similarity. size_t NumUsersForSimilarity() const { return numUsersForSimilarity; } //! Sets rank parameter for matrix factorization. void Rank(const size_t rankValue) { this->rank = rankValue; } //! Gets rank parameter for matrix factorization. size_t Rank() const { return rank; } //! Sets factorizer for NMF void Factorizer(const FactorizerType& f) { this->factorizer = f; } //! Get the User Matrix. const arma::mat& W() const { return w; } //! Get the Item Matrix. const arma::mat& H() const { return h; } //! Get the Rating Matrix. 
const arma::mat& Rating() const { return rating; } //! Get the data matrix. const arma::mat& Data() const { return data; } //! Get the cleaned data matrix. const arma::sp_mat& CleanedData() const { return cleanedData; } /** * Generates the given number of recommendations for all users. * * @param numRecs Number of Recommendations * @param recommendations Matrix to save recommendations into. */ void GetRecommendations(const size_t numRecs, arma::Mat& recommendations); /** * Generates the given number of recommendations for the specified users. * * @param numRecs Number of Recommendations * @param recommendations Matrix to save recommendations * @param users Users for which recommendations are to be generated */ void GetRecommendations(const size_t numRecs, arma::Mat& recommendations, arma::Col& users); /** * Returns a string representation of this object. */ std::string ToString() const; private: //! Initial data matrix. arma::mat data; //! Number of users for similarity. size_t numUsersForSimilarity; //! Rank used for matrix factorization. size_t rank; //! Instantiated factorizer object. FactorizerType factorizer; //! User matrix. arma::mat w; //! Item matrix. arma::mat h; //! Rating matrix. arma::mat rating; //! Cleaned data matrix. arma::sp_mat cleanedData; //! Converts the User, Item, Value Matrix to User-Item Table void CleanData(); /** * Helper function to insert a point into the recommendation matrices. * * @param queryIndex Index of point whose recommendations we are inserting * into. * @param pos Position in list to insert into. * @param neighbor Index of item being inserted as a recommendation. * @param value Value of recommendation. */ void InsertNeighbor(const size_t queryIndex, const size_t pos, const size_t neighbor, const double value, arma::Mat& recommendations, arma::mat& values) const; }; // class CF }; // namespace cf }; // namespace mlpack //Include implementation #include "cf_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/quic_svd/0000755000176200001440000000000013647512751021210 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/quic_svd/quic_svd_impl.hpp0000644000176200001440000000561613647512751024567 0ustar liggesusers/** * @file quic_svd_impl.hpp * @author Siddharth Agrawal * * An implementation of QUIC-SVD. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_QUIC_SVD_QUIC_SVD_IMPL_HPP #define __MLPACK_METHODS_QUIC_SVD_QUIC_SVD_IMPL_HPP // In case it hasn't been included yet. #include "quic_svd.hpp" using namespace mlpack::tree; namespace mlpack { namespace svd { QUIC_SVD::QUIC_SVD(const arma::mat& dataset, arma::mat& u, arma::mat& v, arma::mat& sigma, const double epsilon, const double delta) : dataset(dataset), epsilon(epsilon), delta(delta) { // Since columns are sample in the implementation, the matrix is transposed if // necessary for maximum speedup. 
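  // Concretely: the cosine tree samples columns, so whichever orientation of
  // the matrix has more columns than rows is the one handed to the tree;
  // ExtractSVD() later swaps the roles of u and v when the untransposed
  // matrix was used, so the caller always receives the SVD of the original
  // dataset.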
CosineTree* ctree; if (dataset.n_cols > dataset.n_rows) ctree = new CosineTree(dataset, epsilon, delta); else ctree = new CosineTree(dataset.t(), epsilon, delta); // Get subspace basis by creating the cosine tree. ctree->GetFinalBasis(basis); // Use the ExtractSVD algorithm mentioned in the paper to extract the SVD of // the original dataset in the obtained subspace. ExtractSVD(u, v, sigma); } void QUIC_SVD::ExtractSVD(arma::mat& u, arma::mat& v, arma::mat& sigma) { // Calculate A * V_hat, necessary for further calculations. arma::mat projectedMat; if (dataset.n_cols > dataset.n_rows) projectedMat = dataset.t() * basis; else projectedMat = dataset * basis; // Calculate the squared projected matrix. arma::mat projectedMatSquared = projectedMat.t() * projectedMat; // Calculate the SVD of the above matrix. arma::mat uBar, vBar; arma::vec sigmaBar; arma::svd(uBar, sigmaBar, vBar, projectedMatSquared); // Calculate the approximate SVD of the original matrix, using the SVD of the // squared projected matrix. v = basis * vBar; sigma = arma::sqrt(diagmat(sigmaBar)); u = projectedMat * vBar * sigma.i(); // Since columns are sampled, the unitary matrices have to be exchanged, if // the transposed matrix is not passed. if (dataset.n_cols > dataset.n_rows) { arma::mat tempMat = u; u = v; v = tempMat; } } }; // namespace svd }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/quic_svd/quic_svd.hpp0000644000176200001440000000523413647512751023542 0ustar liggesusers/** * @file quic_svd.hpp * @author Siddharth Agrawal * * An implementation of QUIC-SVD. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_QUIC_SVD_QUIC_SVD_HPP #define __MLPACK_METHODS_QUIC_SVD_QUIC_SVD_HPP #include #include namespace mlpack { namespace svd { class QUIC_SVD { public: /** * Constructor which implements the QUIC-SVD algorithm. The function calls the * CosineTree constructor to create a subspace basis, where the original * matrix's projection has minimum reconstruction error. The constructor then * uses the ExtractSVD() function to calculate the SVD of the original dataset * in that subspace. * * @param dataset Matrix for which SVD is calculated. * @param u First unitary matrix. * @param v Second unitary matrix. * @param sigma Diagonal matrix of singular values. * @param epsilon Error tolerance fraction for calculated subspace. * @param delta Cumulative probability for Monte Carlo error lower bound. */ QUIC_SVD(const arma::mat& dataset, arma::mat& u, arma::mat& v, arma::mat& sigma, const double epsilon = 0.03, const double delta = 0.1); /** * This function uses the vector subspace created using a cosine tree to * calculate an approximate SVD of the original matrix. * * @param u First unitary matrix. * @param v Second unitary matrix. * @param sigma Diagonal matrix of singular values. */ void ExtractSVD(arma::mat& u, arma::mat& v, arma::mat& sigma); private: //! 
Matrix for which cosine tree is constructed. const arma::mat& dataset; //! Error tolerance fraction for calculated subspace. double epsilon; //! Cumulative probability for Monte Carlo error lower bound. double delta; //! Subspace basis of the input dataset. arma::mat basis; }; }; // namespace svd }; // namespace mlpack // Include implementation. #include "quic_svd_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/local_coordinate_coding/0000755000176200001440000000000013647512751024217 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/local_coordinate_coding/lcc_impl.hpp0000644000176200001440000002557713647512751026532 0ustar liggesusers/** * @file lcc_impl.hpp * @author Nishant Mehta * * Implementation of Local Coordinate Coding * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LOCAL_COORDINATE_CODING_LCC_IMPL_HPP #define __MLPACK_METHODS_LOCAL_COORDINATE_CODING_LCC_IMPL_HPP // In case it hasn't been included yet. #include "lcc.hpp" namespace mlpack { namespace lcc { template LocalCoordinateCoding::LocalCoordinateCoding( const arma::mat& data, const size_t atoms, const double lambda) : atoms(atoms), data(data), codes(atoms, data.n_cols), lambda(lambda) { // Initialize the dictionary. DictionaryInitializer::Initialize(data, atoms, dictionary); } template void LocalCoordinateCoding::Encode( const size_t maxIterations, const double objTolerance) { //Timer::Start("local_coordinate_coding"); double lastObjVal = DBL_MAX; // Take the initial coding step, which has to happen before entering the main // loop. Rcpp::Rcout << "Initial Coding Step." << std::endl; OptimizeCode(); arma::uvec adjacencies = find(codes); Rcpp::Rcout << " Sparsity level: " << 100.0 * ((double)(adjacencies.n_elem)) / ((double)(atoms * data.n_cols)) << "%.\n"; Rcpp::Rcout << " Objective value: " << Objective(adjacencies) << "." << std::endl; for (size_t t = 1; t != maxIterations; t++) { Rcpp::Rcout << "Iteration " << t << " of " << maxIterations << "." << std::endl; // First step: optimize the dictionary. Rcpp::Rcout << "Performing dictionary step..." << std::endl; OptimizeDictionary(adjacencies); double dsObjVal = Objective(adjacencies); Rcpp::Rcout << " Objective value: " << Objective(adjacencies) << "." << std::endl; // Second step: perform the coding. Rcpp::Rcout << "Performing coding step..." << std::endl; OptimizeCode(); adjacencies = find(codes); Rcpp::Rcout << " Sparsity level: " << 100.0 * ((double) (adjacencies.n_elem)) / ((double)(atoms * data.n_cols)) << "%.\n"; // Terminate if the objective increased in the coding step. double curObjVal = Objective(adjacencies); if (curObjVal > dsObjVal) { Rcpp::Rcout << "Objective increased in coding step! Terminating." << std::endl; break; } // Find the new objective value and improvement so we can check for // convergence. 
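    // [Added note, not from the original source.]  Concretely, writing f(t)
    // for the objective value after iteration t, the check below terminates
    // the alternation as soon as f(t - 1) - f(t) < objTolerance.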
double improvement = lastObjVal - curObjVal; Rcpp::Rcout << "Objective value: " << curObjVal << " (improvement " << std::scientific << improvement << ")." << std::endl; if (improvement < objTolerance) { Rcpp::Rcout << "Converged within tolerance " << objTolerance << ".\n"; break; } lastObjVal = curObjVal; } //Timer::Stop("local_coordinate_coding"); } template void LocalCoordinateCoding::OptimizeCode() { arma::mat invSqDists = 1.0 / (repmat(trans(sum(square(dictionary))), 1, data.n_cols) + repmat(sum(square(data)), atoms, 1) - 2 * trans(dictionary) * data); arma::mat dictGram = trans(dictionary) * dictionary; arma::mat dictGramTD(dictGram.n_rows, dictGram.n_cols); for (size_t i = 0; i < data.n_cols; i++) { // report progress if ((i % 100) == 0) { Rcpp::Rcout << "Optimization at point " << i << "." << std::endl; } arma::vec invW = invSqDists.unsafe_col(i); arma::mat dictPrime = dictionary * diagmat(invW); arma::mat dictGramTD = diagmat(invW) * dictGram * diagmat(invW); bool useCholesky = false; regression::LARS lars(useCholesky, dictGramTD, 0.5 * lambda); // Run LARS for this point, by making an alias of the point and passing // that. arma::vec beta = codes.unsafe_col(i); lars.Regress(dictPrime, data.unsafe_col(i), beta, false); beta %= invW; // Remember, beta is an alias of codes.col(i). } } template void LocalCoordinateCoding::OptimizeDictionary( arma::uvec adjacencies) { // Count number of atomic neighbors for each point x^i. arma::uvec neighborCounts = arma::zeros(data.n_cols, 1); if (adjacencies.n_elem > 0) { // This gets the column index. Intentional integer division. size_t curPointInd = (size_t) (adjacencies(0) / atoms); ++neighborCounts(curPointInd); size_t nextColIndex = (curPointInd + 1) * atoms; for (size_t l = 1; l < adjacencies.n_elem; l++) { // If l no longer refers to an element in this column, advance the column // number accordingly. if (adjacencies(l) >= nextColIndex) { curPointInd = (size_t) (adjacencies(l) / atoms); nextColIndex = (curPointInd + 1) * atoms; } ++neighborCounts(curPointInd); } } // Build dataPrime := [X x^1 ... x^1 ... x^n ... x^n] // where each x^i is repeated for the number of neighbors x^i has. arma::mat dataPrime = arma::zeros(data.n_rows, data.n_cols + adjacencies.n_elem); dataPrime(arma::span::all, arma::span(0, data.n_cols - 1)) = data; size_t curCol = data.n_cols; for (size_t i = 0; i < data.n_cols; i++) { if (neighborCounts(i) > 0) { dataPrime(arma::span::all, arma::span(curCol, curCol + neighborCounts(i) - 1)) = repmat(data.col(i), 1, neighborCounts(i)); } curCol += neighborCounts(i); } // Handle the case of inactive atoms (atoms not used in the given coding). std::vector inactiveAtoms; for (size_t j = 0; j < atoms; ++j) if (accu(codes.row(j) != 0) == 0) inactiveAtoms.push_back(j); const size_t nInactiveAtoms = inactiveAtoms.size(); const size_t nActiveAtoms = atoms - nInactiveAtoms; // Efficient construction of codes restricted to active atoms. arma::mat codesPrime = arma::zeros(nActiveAtoms, data.n_cols + adjacencies.n_elem); arma::vec wSquared = arma::ones(data.n_cols + adjacencies.n_elem, 1); if (nInactiveAtoms > 0) { Rcpp::Rcout << "There are " << nInactiveAtoms << " inactive atoms. They will be re-initialized randomly.\n"; // Create matrix holding only active codes. arma::mat activeCodes; math::RemoveRows(codes, inactiveAtoms, activeCodes); // Create reverse atom lookup for active atoms. 
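    // [Added worked example, not from the original source.]  The loop below
    // inverts the row-removal mapping: with atoms = 5 and
    // inactiveAtoms = {1, 3}, the active atoms {0, 2, 4} occupy rows
    // {0, 1, 2} of activeCodes, so atomReverseLookup ends up as [0, 2, 4],
    // and row r of the reduced matrix corresponds to original atom
    // atomReverseLookup(r).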
arma::uvec atomReverseLookup(atoms); size_t inactiveOffset = 0; for (size_t i = 0; i < atoms; ++i) { if (inactiveAtoms[inactiveOffset] == i) ++inactiveOffset; else atomReverseLookup(i - inactiveOffset) = i; } codesPrime(arma::span::all, arma::span(0, data.n_cols - 1)) = activeCodes; // Fill the rest of codesPrime. for (size_t l = 0; l < adjacencies.n_elem; ++l) { // Recover the location in the codes matrix that this adjacency refers to. size_t atomInd = adjacencies(l) % atoms; size_t pointInd = (size_t) (adjacencies(l) / atoms); // Fill matrix. codesPrime(atomReverseLookup(atomInd), data.n_cols + l) = 1.0; wSquared(data.n_cols + l) = codes(atomInd, pointInd); } } else { // All atoms are active. codesPrime(arma::span::all, arma::span(0, data.n_cols - 1)) = codes; for (size_t l = 0; l < adjacencies.n_elem; ++l) { // Recover the location in the codes matrix that this adjacency refers to. size_t atomInd = adjacencies(l) % atoms; size_t pointInd = (size_t) (adjacencies(l) / atoms); // Fill matrix. codesPrime(atomInd, data.n_cols + l) = 1.0; wSquared(data.n_cols + l) = codes(atomInd, pointInd); } } wSquared.subvec(data.n_cols, wSquared.n_elem - 1) = lambda * abs(wSquared.subvec(data.n_cols, wSquared.n_elem - 1)); // Solve system. if (nInactiveAtoms == 0) { // No inactive atoms. We can solve directly. arma::mat A = codesPrime * diagmat(wSquared) * trans(codesPrime); arma::mat B = codesPrime * diagmat(wSquared) * trans(dataPrime); dictionary = trans(solve(A, B)); /* dictionary = trans(solve(codesPrime * diagmat(wSquared) * trans(codesPrime), codesPrime * diagmat(wSquared) * trans(dataPrime))); */ } else { // Inactive atoms must be reinitialized randomly, so we cannot solve // directly for the entire dictionary estimate. arma::mat dictionaryActive = trans(solve(codesPrime * diagmat(wSquared) * trans(codesPrime), codesPrime * diagmat(wSquared) * trans(dataPrime))); // Update all atoms. size_t currentInactiveIndex = 0; for (size_t i = 0; i < atoms; ++i) { if (inactiveAtoms[currentInactiveIndex] == i) { // This atom is inactive. Reinitialize it randomly. dictionary.col(i) = (data.col(math::RandInt(data.n_cols)) + data.col(math::RandInt(data.n_cols)) + data.col(math::RandInt(data.n_cols))); // Now normalize the atom. dictionary.col(i) /= norm(dictionary.col(i), 2); // Increment inactive atom counter. ++currentInactiveIndex; } else { // Update estimate. dictionary.col(i) = dictionaryActive.col(i - currentInactiveIndex); } } } } template double LocalCoordinateCoding::Objective( arma::uvec adjacencies) const { double weightedL1NormZ = 0; for (size_t l = 0; l < adjacencies.n_elem; l++) { // Map adjacency back to its location in the codes matrix. 
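    // [Added worked example, not from the original source.]  adjacencies
    // holds column-major linear indices into the atoms-by-n_cols codes
    // matrix, so an index a decomposes as row a % atoms (the atom) and
    // column a / atoms (the point, integer division); e.g. with atoms = 4,
    // a = 10 refers to atom 2 of point 2.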
    const size_t atomInd = adjacencies(l) % atoms;
    const size_t pointInd = (size_t) (adjacencies(l) / atoms);
    weightedL1NormZ += fabs(codes(atomInd, pointInd)) * arma::as_scalar(
        arma::sum(arma::square(dictionary.col(atomInd) - data.col(pointInd))));
  }

  double froNormResidual = norm(data - dictionary * codes, "fro");
  return std::pow(froNormResidual, 2.0) + lambda * weightedL1NormZ;
}

template<typename DictionaryInitializer>
std::string LocalCoordinateCoding<DictionaryInitializer>::ToString() const
{
  std::ostringstream convert;
  convert << "Local Coordinate Coding [" << this << "]" << std::endl;
  convert << "  Number of Atoms: " << atoms << std::endl;
  convert << "  Data: " << data.n_rows << "x" << data.n_cols << std::endl;
  convert << "  Lambda: " << lambda << std::endl;
  return convert.str();
}

}; // namespace lcc
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/methods/local_coordinate_coding/lcc.hpp0000644000176200001440000001322613647512751025475 0ustar liggesusers/**
 * @file lcc.hpp
 * @author Nishant Mehta
 *
 * Definition of the LocalCoordinateCoding class, which performs the Local
 * Coordinate Coding algorithm.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_LOCAL_COORDINATE_CODING_LCC_HPP
#define __MLPACK_METHODS_LOCAL_COORDINATE_CODING_LCC_HPP

#include <mlpack/core.hpp>
#include <mlpack/methods/lars/lars.hpp>

// Include three simple dictionary initializers from sparse coding.
#include "../sparse_coding/nothing_initializer.hpp"
#include "../sparse_coding/data_dependent_random_initializer.hpp"
#include "../sparse_coding/random_initializer.hpp"

namespace mlpack {
namespace lcc {

/**
 * An implementation of Local Coordinate Coding (LCC) that codes data which
 * approximately lives on a manifold using a variation of l1-norm regularized
 * sparse coding; in LCC, the penalty on the absolute value of each point's
 * coefficient for each atom is weighted by the squared distance of that point
 * to that atom.
 *
 * Let d be the number of dimensions in the original space, m the number of
 * training points, and k the number of atoms in the dictionary (the dimension
 * of the learned feature space).  The training data X is a d-by-m matrix where
 * each column is a point and each row is a dimension.  The dictionary D is a
 * d-by-k matrix, and the sparse codes matrix Z is a k-by-m matrix.  This
 * program seeks to minimize the objective:
 *
 *   min_{D,Z} ||X - D Z||_{Fro}^2
 *             + lambda sum_{i=1}^m sum_{j=1}^k dist(X_i, D_j)^2 Z_i^j
 *
 * where lambda > 0.
 *
 * This problem is solved by an algorithm that alternates between a dictionary
 * learning step and a sparse coding step.  The dictionary learning step
 * updates the dictionary D by solving a linear system (note that the objective
 * is a positive definite quadratic program).  The sparse coding step involves
 * solving a large number of weighted l1-norm regularized linear regression
 * problems; this can be done efficiently using LARS, an algorithm that can
 * solve the LASSO (paper below).
 *
 * The papers are listed below.
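 *
 * A minimal usage sketch (an illustration, not from the original header; it
 * assumes the default dictionary initializer):
 *
 * @code
 * arma::mat data = arma::randu<arma::mat>(10, 200); // d-by-m, points as columns.
 * mlpack::lcc::LocalCoordinateCoding<> lcc(data, 50, 0.05); // 50 atoms.
 * lcc.Encode(20); // At most 20 alternating optimization iterations.
 * const arma::mat& dictionary = lcc.Dictionary(); // d-by-50 matrix D.
 * const arma::mat& codes = lcc.Codes();           // 50-by-200 matrix Z.
 * @endcode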
* * @code * @incollection{NIPS2009_0719, * title = {Nonlinear Learning using Local Coordinate Coding}, * author = {Kai Yu and Tong Zhang and Yihong Gong}, * booktitle = {Advances in Neural Information Processing Systems 22}, * editor = {Y. Bengio and D. Schuurmans and J. Lafferty and C. K. I. Williams * and A. Culotta}, * pages = {2223--2231}, * year = {2009} * } * @endcode * * @code * @article{efron2004least, * title={Least angle regression}, * author={Efron, B. and Hastie, T. and Johnstone, I. and Tibshirani, R.}, * journal={The Annals of statistics}, * volume={32}, * number={2}, * pages={407--499}, * year={2004}, * publisher={Institute of Mathematical Statistics} * } * @endcode */ template class LocalCoordinateCoding { public: /** * Set the parameters to LocalCoordinateCoding. * * @param data Data matrix. * @param atoms Number of atoms in dictionary. * @param lambda Regularization parameter for weighted l1-norm penalty. */ LocalCoordinateCoding(const arma::mat& data, const size_t atoms, const double lambda); /** * Run local coordinate coding. * * @param nIterations Maximum number of iterations to run algorithm. * @param objTolerance Tolerance of objective function. When the objective * function changes by a value lower than this tolerance, the optimization * terminates. */ void Encode(const size_t maxIterations = 0, const double objTolerance = 0.01); /** * Code each point via distance-weighted LARS. */ void OptimizeCode(); /** * Learn dictionary by solving linear system. * * @param adjacencies Indices of entries (unrolled column by column) of * the coding matrix Z that are non-zero (the adjacency matrix for the * bipartite graph of points and atoms) */ void OptimizeDictionary(arma::uvec adjacencies); /** * Compute objective function given the list of adjacencies. */ double Objective(arma::uvec adjacencies) const; //! Access the data. const arma::mat& Data() const { return data; } //! Accessor for dictionary. const arma::mat& Dictionary() const { return dictionary; } //! Mutator for dictionary. arma::mat& Dictionary() { return dictionary; } //! Accessor the codes. const arma::mat& Codes() const { return codes; } //! Modify the codes. arma::mat& Codes() { return codes; } // Returns a string representation of this object. std::string ToString() const; private: //! Number of atoms in dictionary. size_t atoms; //! Data matrix (columns are points). const arma::mat& data; //! Dictionary (columns are atoms). arma::mat dictionary; //! Codes (columns are points). arma::mat codes; //! l1 regularization term. double lambda; }; }; // namespace lcc }; // namespace mlpack // Include implementation. #include "lcc_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/range_search/0000755000176200001440000000000013647512751022014 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/range_search/range_search_rules_impl.hpp0000644000176200001440000002224713647512751027410 0ustar liggesusers/** * @file range_search_rules_impl.hpp * @author Ryan Curtin * * Implementation of rules for range search with generic trees. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. 
See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_RULES_IMPL_HPP #define __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_RULES_IMPL_HPP // In case it hasn't been included yet. #include "range_search_rules.hpp" namespace mlpack { namespace range { template RangeSearchRules::RangeSearchRules( const arma::mat& referenceSet, const arma::mat& querySet, const math::Range& range, std::vector >& neighbors, std::vector >& distances, MetricType& metric) : referenceSet(referenceSet), querySet(querySet), range(range), neighbors(neighbors), distances(distances), metric(metric), lastQueryIndex(querySet.n_cols), lastReferenceIndex(referenceSet.n_cols) { // Nothing to do. } //! The base case. Evaluate the distance between the two points and add to the //! results if necessary. template inline force_inline double RangeSearchRules::BaseCase( const size_t queryIndex, const size_t referenceIndex) { // If the datasets are the same, don't return the point as in its own range. if ((&referenceSet == &querySet) && (queryIndex == referenceIndex)) return 0.0; // If we have just performed this base case, don't do it again. if ((lastQueryIndex == queryIndex) && (lastReferenceIndex == referenceIndex)) return 0.0; // No value to return... this shouldn't do anything bad. const double distance = metric.Evaluate(querySet.unsafe_col(queryIndex), referenceSet.unsafe_col(referenceIndex)); // Update last indices, so we don't accidentally perform a base case twice. lastQueryIndex = queryIndex; lastReferenceIndex = referenceIndex; if (range.Contains(distance)) { neighbors[queryIndex].push_back(referenceIndex); distances[queryIndex].push_back(distance); } return distance; } //! Single-tree scoring function. template double RangeSearchRules::Score(const size_t queryIndex, TreeType& referenceNode) { // We must get the minimum and maximum distances and store them in this // object. math::Range distances; if (tree::TreeTraits::FirstPointIsCentroid) { // In this situation, we calculate the base case. So we should check to be // sure we haven't already done that. double baseCase; if (tree::TreeTraits::HasSelfChildren && (referenceNode.Parent() != NULL) && (referenceNode.Point(0) == referenceNode.Parent()->Point(0))) { // If the tree has self-children and this is a self-child, the base case // was already calculated. baseCase = referenceNode.Parent()->Stat().LastDistance(); lastQueryIndex = queryIndex; lastReferenceIndex = referenceNode.Point(0); } else { // We must calculate the base case by hand. baseCase = BaseCase(queryIndex, referenceNode.Point(0)); } // This may be possibly loose for non-ball bound trees. distances.Lo() = baseCase - referenceNode.FurthestDescendantDistance(); distances.Hi() = baseCase + referenceNode.FurthestDescendantDistance(); // Update last distance calculation. referenceNode.Stat().LastDistance() = baseCase; } else { distances = referenceNode.RangeDistance(querySet.unsafe_col(queryIndex)); } // If the ranges do not overlap, prune this node. if (!distances.Contains(range)) return DBL_MAX; // In this case, all of the points in the reference node will be part of the // results. if ((distances.Lo() >= range.Lo()) && (distances.Hi() <= range.Hi())) { AddResult(queryIndex, referenceNode); return DBL_MAX; // We don't need to go any deeper. } // Otherwise the score doesn't matter. Recursion order is irrelevant in // range search. 
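  // [Added summary, not from the original source.]  Score() thus has three
  // possible outcomes: DBL_MAX with no side effects (node and search ranges
  // are disjoint, so the node is pruned); DBL_MAX after AddResult() (the node
  // lies entirely inside the search range, so every descendant is claimed and
  // recursion stops); or 0.0 (partial overlap, so the node must be recursed
  // into).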
return 0.0; } //! Single-tree rescoring function. template double RangeSearchRules::Rescore( const size_t /* queryIndex */, TreeType& /* referenceNode */, const double oldScore) const { // If it wasn't pruned before, it isn't pruned now. return oldScore; } //! Dual-tree scoring function. template double RangeSearchRules::Score(TreeType& queryNode, TreeType& referenceNode) { math::Range distances; if (tree::TreeTraits::FirstPointIsCentroid) { // It is possible that the base case has already been calculated. double baseCase = 0.0; if ((traversalInfo.LastQueryNode() != NULL) && (traversalInfo.LastReferenceNode() != NULL) && (traversalInfo.LastQueryNode()->Point(0) == queryNode.Point(0)) && (traversalInfo.LastReferenceNode()->Point(0) == referenceNode.Point(0))) { baseCase = traversalInfo.LastBaseCase(); // Make sure that if BaseCase() is called, we don't duplicate results. lastQueryIndex = queryNode.Point(0); lastReferenceIndex = referenceNode.Point(0); } else { // We must calculate the base case. baseCase = BaseCase(queryNode.Point(0), referenceNode.Point(0)); } distances.Lo() = baseCase - queryNode.FurthestDescendantDistance() - referenceNode.FurthestDescendantDistance(); distances.Hi() = baseCase + queryNode.FurthestDescendantDistance() + referenceNode.FurthestDescendantDistance(); // Update the last distances performed for the query and reference node. traversalInfo.LastBaseCase() = baseCase; } else { // Just perform the calculation. distances = referenceNode.RangeDistance(&queryNode); } // If the ranges do not overlap, prune this node. if (!distances.Contains(range)) return DBL_MAX; // In this case, all of the points in the reference node will be part of all // the results for each point in the query node. if ((distances.Lo() >= range.Lo()) && (distances.Hi() <= range.Hi())) { for (size_t i = 0; i < queryNode.NumDescendants(); ++i) AddResult(queryNode.Descendant(i), referenceNode); return DBL_MAX; // We don't need to go any deeper. } // Otherwise the score doesn't matter. Recursion order is irrelevant in range // search. traversalInfo.LastQueryNode() = &queryNode; traversalInfo.LastReferenceNode() = &referenceNode; return 0.0; } //! Dual-tree rescoring function. template double RangeSearchRules::Rescore( TreeType& /* queryNode */, TreeType& /* referenceNode */, const double oldScore) const { // If it wasn't pruned before, it isn't pruned now. return oldScore; } //! Add all the points in the given node to the results for the given query //! point. template void RangeSearchRules::AddResult(const size_t queryIndex, TreeType& referenceNode) { // Some types of trees calculate the base case evaluation before Score() is // called, so if the base case has already been calculated, then we must avoid // adding that point to the results again. size_t baseCaseMod = 0; if (tree::TreeTraits::FirstPointIsCentroid && (queryIndex == lastQueryIndex) && (referenceNode.Point(0) == lastReferenceIndex)) { baseCaseMod = 1; } // Resize distances and neighbors vectors appropriately. We have to use // reserve() and not resize(), because we don't know if we will encounter the // case where the datasets and points are the same (and we skip in that case). 
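  // [Added note, not from the original source.]  reserve() only grows
  // capacity; the push_back() calls below still determine the logical size,
  // so skipped self-matches never leave default-constructed holes the way
  // resize() would.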
const size_t oldSize = neighbors[queryIndex].size(); neighbors[queryIndex].reserve(oldSize + referenceNode.NumDescendants() - baseCaseMod); distances[queryIndex].reserve(oldSize + referenceNode.NumDescendants() - baseCaseMod); for (size_t i = baseCaseMod; i < referenceNode.NumDescendants(); ++i) { if ((&referenceSet == &querySet) && (queryIndex == referenceNode.Descendant(i))) continue; const double distance = metric.Evaluate(querySet.unsafe_col(queryIndex), referenceNode.Dataset().unsafe_col(referenceNode.Descendant(i))); neighbors[queryIndex].push_back(referenceNode.Descendant(i)); distances[queryIndex].push_back(distance); } } }; // namespace range }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/range_search/range_search_impl.hpp0000644000176200001440000002636713647512751026205 0ustar liggesusers/** * @file range_search_impl.hpp * @author Ryan Curtin * * Implementation of the RangeSearch class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_IMPL_HPP #define __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_IMPL_HPP // Just in case it hasn't been included. #include "range_search.hpp" // The rules for traversal. #include "range_search_rules.hpp" namespace mlpack { namespace range { template TreeType* BuildTree( typename TreeType::Mat& dataset, std::vector& oldFromNew, typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == true, TreeType* >::type = 0) { return new TreeType(dataset, oldFromNew); } //! Call the tree constructor that does not do mapping. template TreeType* BuildTree( const typename TreeType::Mat& dataset, const std::vector& /* oldFromNew */, const typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == false, TreeType* >::type = 0) { return new TreeType(dataset); } template RangeSearch::RangeSearch( const typename TreeType::Mat& referenceSetIn, const typename TreeType::Mat& querySetIn, const bool naive, const bool singleMode, const MetricType metric) : referenceSet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), querySet(tree::TreeTraits::RearrangesDataset ? queryCopy : querySetIn), treeOwner(!naive), // If in naive mode, we are not building any trees. hasQuerySet(true), naive(naive), singleMode(!naive && singleMode), // Naive overrides single mode. metric(metric), numPrunes(0) { // Build the trees. //Timer::Start("range_search/tree_building"); // Copy the datasets, if they will be modified during tree building. if (tree::TreeTraits::RearrangesDataset) { referenceCopy = referenceSetIn; queryCopy = querySetIn; } // If in naive mode, then we do not need to build trees. if (!naive) { // The const_cast is safe; if RearrangesDataset == false, then it'll be // casted back to const anyway, and if not, referenceSet points to // referenceCopy, which isn't const. 
referenceTree = BuildTree( const_cast(referenceSet), oldFromNewReferences); if (!singleMode) queryTree = BuildTree( const_cast(querySet), oldFromNewQueries); } //Timer::Stop("range_search/tree_building"); } template RangeSearch::RangeSearch( const typename TreeType::Mat& referenceSetIn, const bool naive, const bool singleMode, const MetricType metric) : referenceSet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), querySet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), queryTree(NULL), treeOwner(!naive), // If in naive mode, we are not building any trees. hasQuerySet(false), naive(naive), singleMode(!naive && singleMode), // Naive overrides single mode. metric(metric), numPrunes(0) { // Build the trees. //Timer::Start("range_search/tree_building"); // Copy the dataset, if it will be modified during tree building. if (tree::TreeTraits::RearrangesDataset) referenceCopy = referenceSetIn; // If in naive mode, then we do not need to build trees. if (!naive) { // The const_cast is safe; if RearrangesDataset == false, then it'll be // casted back to const anyway, and if not, referenceSet points to // referenceCopy, which isn't const. referenceTree = BuildTree( const_cast(referenceSet), oldFromNewReferences); if (!singleMode) queryTree = new TreeType(*referenceTree); } //Timer::Stop("range_search/tree_building"); } template RangeSearch::RangeSearch( TreeType* referenceTree, TreeType* queryTree, const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool singleMode, const MetricType metric) : referenceSet(referenceSet), querySet(querySet), referenceTree(referenceTree), queryTree(queryTree), treeOwner(false), hasQuerySet(true), naive(false), singleMode(singleMode), metric(metric), numPrunes(0) { // Nothing else to initialize. } template RangeSearch::RangeSearch( TreeType* referenceTree, const typename TreeType::Mat& referenceSet, const bool singleMode, const MetricType metric) : referenceSet(referenceSet), querySet(referenceSet), referenceTree(referenceTree), queryTree(NULL), treeOwner(false), hasQuerySet(false), naive(false), singleMode(singleMode), metric(metric), numPrunes(0) { // If doing dual-tree range search, we must clone the reference tree. if (!singleMode) queryTree = new TreeType(*referenceTree); } template RangeSearch::~RangeSearch() { if (treeOwner) { if (referenceTree) delete referenceTree; if (queryTree) delete queryTree; } // If doing dual-tree search with one dataset, we cloned the reference tree. if (!treeOwner && !hasQuerySet && !(singleMode || naive)) delete queryTree; } template void RangeSearch::Search( const math::Range& range, std::vector >& neighbors, std::vector >& distances) { //Timer::Start("range_search/computing_neighbors"); // Set size of prunes to 0. numPrunes = 0; // If we have built the trees ourselves, then we will have to map all the // indices back to their original indices when this computation is finished. // To avoid extra copies, we will store the unmapped neighbors and distances // in a separate object. std::vector >* neighborPtr = &neighbors; std::vector >* distancePtr = &distances; // Mapping is only necessary if the tree rearranges points. if (tree::TreeTraits::RearrangesDataset) { if (treeOwner && !(singleMode && hasQuerySet)) distancePtr = new std::vector >; // Query indices need to be mapped. if (treeOwner) neighborPtr = new std::vector >; // All indices need mapping. } // Resize each vector. neighborPtr->clear(); // Just in case there was anything in it. 
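  // [Illustrative sketch, not from the original source; assumes the default
  // template arguments.]  A typical external driver of this method:
  //
  //   mlpack::range::RangeSearch<> rs(data); // data: column-major arma::mat.
  //   std::vector<std::vector<size_t> > neighbors;
  //   std::vector<std::vector<double> > distances;
  //   rs.Search(mlpack::math::Range(0.0, 2.0), neighbors, distances);
  //
  // Afterwards, neighbors[i] and distances[i] list, unsorted, every reference
  // point whose distance to query point i falls in [0.0, 2.0].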
neighborPtr->resize(querySet.n_cols); distancePtr->clear(); distancePtr->resize(querySet.n_cols); // Create the helper object for the traversal. typedef RangeSearchRules RuleType; RuleType rules(referenceSet, querySet, range, *neighborPtr, *distancePtr, metric); if (naive) { // The naive brute-force solution. for (size_t i = 0; i < querySet.n_cols; ++i) for (size_t j = 0; j < referenceSet.n_cols; ++j) rules.BaseCase(i, j); } else if (singleMode) { // Create the traverser. typename TreeType::template SingleTreeTraverser traverser(rules); // Now have it traverse for each point. for (size_t i = 0; i < querySet.n_cols; ++i) traverser.Traverse(i, *referenceTree); numPrunes = traverser.NumPrunes(); } else // Dual-tree recursion. { // Create the traverser. typename TreeType::template DualTreeTraverser traverser(rules); traverser.Traverse(*queryTree, *referenceTree); numPrunes = traverser.NumPrunes(); } //Timer::Stop("range_search/computing_neighbors"); // Output number of prunes. Rcpp::Rcout << "Number of pruned nodes during computation: " << numPrunes << "." << std::endl; // Map points back to original indices, if necessary. if (!treeOwner || !tree::TreeTraits::RearrangesDataset) { // No mapping needed. We are done. return; } else if (treeOwner && hasQuerySet && !singleMode) // Map both sets. { neighbors.clear(); neighbors.resize(querySet.n_cols); distances.clear(); distances.resize(querySet.n_cols); for (size_t i = 0; i < distances.size(); i++) { // Map distances (copy a column). size_t queryMapping = oldFromNewQueries[i]; distances[queryMapping] = (*distancePtr)[i]; // Copy each neighbor individually, because we need to map it. neighbors[queryMapping].resize(distances[queryMapping].size()); for (size_t j = 0; j < distances[queryMapping].size(); j++) { neighbors[queryMapping][j] = oldFromNewReferences[(*neighborPtr)[i][j]]; } } // Finished with temporary objects. delete neighborPtr; delete distancePtr; } else if (treeOwner && !hasQuerySet) { neighbors.clear(); neighbors.resize(querySet.n_cols); distances.clear(); distances.resize(querySet.n_cols); for (size_t i = 0; i < distances.size(); i++) { // Map distances (copy a column). size_t refMapping = oldFromNewReferences[i]; distances[refMapping] = (*distancePtr)[i]; // Copy each neighbor individually, because we need to map it. neighbors[refMapping].resize(distances[refMapping].size()); for (size_t j = 0; j < distances[refMapping].size(); j++) { neighbors[refMapping][j] = oldFromNewReferences[(*neighborPtr)[i][j]]; } } // Finished with temporary objects. delete neighborPtr; delete distancePtr; } else if (treeOwner && hasQuerySet && singleMode) // Map only references. { neighbors.clear(); neighbors.resize(querySet.n_cols); // Map indices of neighbors. for (size_t i = 0; i < neighbors.size(); i++) { neighbors[i].resize((*neighborPtr)[i].size()); for (size_t j = 0; j < neighbors[i].size(); j++) { neighbors[i][j] = oldFromNewReferences[(*neighborPtr)[i][j]]; } } // Finished with temporary object. 
delete neighborPtr; } } template std::string RangeSearch::ToString() const { std::ostringstream convert; convert << "Range Search [" << this << "]" << std::endl; if (treeOwner) convert << " Tree Owner: TRUE" << std::endl; if (naive) convert << " Naive: TRUE" << std::endl; convert << " Metric: " << std::endl << mlpack::util::Indent(metric.ToString(),2); return convert.str(); } }; // namespace range }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/range_search/range_search.hpp0000644000176200001440000002422013647512751025146 0ustar liggesusers/** * @file range_search.hpp * @author Ryan Curtin * * Defines the RangeSearch class, which performs a generalized range search on * points. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_HPP #define __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_HPP #include #include #include #include "range_search_stat.hpp" namespace mlpack { namespace range /** Range-search routines. */ { /** * The RangeSearch class is a template class for performing range searches. It * is implemented in the style of a generalized tree-independent dual-tree * algorithm; for more details on the actual algorithm, see the RangeSearchRules * class. */ template, RangeSearchStat> > class RangeSearch { public: /** * Initialize the RangeSearch object with a different reference set and a * query set. Optionally, perform the computation in naive mode or * single-tree mode, and set the leaf size used for tree-building. * Additionally, an instantiated metric can be given, for cases where the * distance metric holds data. * * This method will copy the matrices to internal copies, which are rearranged * during tree-building. You can avoid this extra copy by pre-constructing * the trees and passing them using a different constructor. * * @param referenceSet Reference dataset. * @param querySet Query dataset. * @param naive Whether the computation should be done in O(n^2) naive mode. * @param singleMode Whether single-tree computation should be used (as * opposed to dual-tree computation). * @param leafSize The leaf size to be used during tree construction. * @param metric Instantiated distance metric. */ RangeSearch(const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool naive = false, const bool singleMode = false, const MetricType metric = MetricType()); /** * Initialize the RangeSearch object with only a reference set, which will * also be used as a query set. Optionally, perform the computation in naive * mode or single-tree mode, and set the leaf size used for tree-building. * Additionally an instantiated metric can be given, for cases where the * distance metric holds data. * * This method will copy the reference matrix to an internal copy, which is * rearranged during tree-building. 
You can avoid this extra copy by * pre-constructing the reference tree and passing it using a different * constructor. * * @param referenceSet Reference dataset. * @param naive Whether the computation should be done in O(n^2) naive mode. * @param singleMode Whether single-tree computation should be used (as * opposed to dual-tree computation). * @param leafSize The leaf size to be used during tree construction. * @param metric Instantiated distance metric. */ RangeSearch(const typename TreeType::Mat& referenceSet, const bool naive = false, const bool singleMode = false, const MetricType metric = MetricType()); /** * Initialize the RangeSearch object with the given datasets and * pre-constructed trees. It is assumed that the points in referenceSet and * querySet correspond to the points in referenceTree and queryTree, * respectively. Optionally, choose to use single-tree mode. Naive * mode is not available as an option for this constructor; instead, to run * naive computation, construct a tree with all the points in one leaf (i.e. * leafSize = number of points). Additionally, an instantiated distance * metric can be given, for cases where the distance metric holds data. * * There is no copying of the data matrices in this constructor (because * tree-building is not necessary), so this is the constructor to use when * copies absolutely must be avoided. * * @note * Because tree-building (at least with BinarySpaceTree) modifies the ordering * of a matrix, be sure you pass the modified matrix to this object! In * addition, mapping the points of the matrix back to their original indices * is not done when this constructor is used. * @endnote * * @param referenceTree Pre-built tree for reference points. * @param queryTree Pre-built tree for query points. * @param referenceSet Set of reference points corresponding to referenceTree. * @param querySet Set of query points corresponding to queryTree. * @param singleMode Whether single-tree computation should be used (as * opposed to dual-tree computation). * @param metric Instantiated distance metric. */ RangeSearch(TreeType* referenceTree, TreeType* queryTree, const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool singleMode = false, const MetricType metric = MetricType()); /** * Initialize the RangeSearch object with the given reference dataset and * pre-constructed tree. It is assumed that the points in referenceSet * correspond to the points in referenceTree. Optionally, choose to use * single-tree mode. Naive mode is not available as an option for this * constructor; instead, to run naive computation, construct a tree with all * the points in one leaf (i.e. leafSize = number of points). Additionally, * an instantiated distance metric can be given, for the case where the * distance metric holds data. * * There is no copying of the data matrices in this constructor (because * tree-building is not necessary), so this is the constructor to use when * copies absolutely must be avoided. * * @note * Because tree-building (at least with BinarySpaceTree) modifies the ordering * of a matrix, be sure you pass the modified matrix to this object! In * addition, mapping the points of the matrix back to their original indices * is not done when this constructor is used. * @endnote * * @param referenceTree Pre-built tree for reference points. * @param referenceSet Set of reference points corresponding to referenceTree. 
* @param singleMode Whether single-tree computation should be used (as * opposed to dual-tree computation). * @param metric Instantiated distance metric. */ RangeSearch(TreeType* referenceTree, const typename TreeType::Mat& referenceSet, const bool singleMode = false, const MetricType metric = MetricType()); /** * Destroy the RangeSearch object. If trees were created, they will be * deleted. */ ~RangeSearch(); /** * Search for all points in the given range, returning the results in the * neighbors and distances objects. Each entry in the external vector * corresponds to a query point. Each of these entries holds a vector which * contains the indices and distances of the reference points falling into the * given range. * * That is: * * - neighbors.size() and distances.size() both equal the number of query * points. * * - neighbors[i] contains the indices of all the points in the reference set * which have distances inside the given range to query point i. * * - distances[i] contains all of the distances corresponding to the indices * contained in neighbors[i]. * * - neighbors[i] and distances[i] are not sorted in any particular order. * * @param range Range of distances in which to search. * @param neighbors Object which will hold the list of neighbors for each * point which fell into the given range, for each query point. * @param distances Object which will hold the list of distances for each * point which fell into the given range, for each query point. */ void Search(const math::Range& range, std::vector >& neighbors, std::vector >& distances); // Returns a string representation of this object. std::string ToString() const; private: //! Copy of reference matrix; used when a tree is built internally. typename TreeType::Mat referenceCopy; //! Copy of query matrix; used when a tree is built internally. typename TreeType::Mat queryCopy; //! Reference set (data should be accessed using this). const typename TreeType::Mat& referenceSet; //! Query set (data should be accessed using this). const typename TreeType::Mat& querySet; //! Reference tree. TreeType* referenceTree; //! Query tree (may be NULL). TreeType* queryTree; //! Mappings to old reference indices (used when this object builds trees). std::vector oldFromNewReferences; //! Mappings to old query indices (used when this object builds trees). std::vector oldFromNewQueries; //! If true, this object is responsible for deleting the trees. bool treeOwner; //! If true, a query set was passed; if false, the query set is the reference //! set. bool hasQuerySet; //! If true, O(n^2) naive computation is used. bool naive; //! If true, single-tree computation is used. bool singleMode; //! Instantiated distance metric. MetricType metric; //! The number of pruned nodes during computation. size_t numPrunes; }; }; // namespace range }; // namespace mlpack // Include implementation. #include "range_search_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/range_search/range_search_rules.hpp0000644000176200001440000001302613647512751026362 0ustar liggesusers/** * @file range_search_rules.hpp * @author Ryan Curtin * * Rules for range search, so that it can be done with arbitrary tree types. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_RULES_HPP #define __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_RULES_HPP #include "../neighbor_search/ns_traversal_info.hpp" namespace mlpack { namespace range { template class RangeSearchRules { public: /** * Construct the RangeSearchRules object. This is usually done from within * the RangeSearch class at search time. * * @param referenceSet Set of reference data. * @param querySet Set of query data. * @param range Range to search for. * @param neighbors Vector to store resulting neighbors in. * @param distances Vector to store resulting distances in. * @param metric Instantiated metric. */ RangeSearchRules(const arma::mat& referenceSet, const arma::mat& querySet, const math::Range& range, std::vector >& neighbors, std::vector >& distances, MetricType& metric); /** * Compute the base case between the given query point and reference point. * * @param queryIndex Index of query point. * @param referenceIndex Index of reference point. */ double BaseCase(const size_t queryIndex, const size_t referenceIndex); /** * Get the score for recursion order. A low score indicates priority for * recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). * * @param queryIndex Index of query point. * @param referenceNode Candidate node to be recursed into. */ double Score(const size_t queryIndex, TreeType& referenceNode); /** * Re-evaluate the score for recursion order. A low score indicates priority * for recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). This is used when the score has already * been calculated, but another recursion may have modified the bounds for * pruning. So the old score is checked against the new pruning bound. * * @param queryIndex Index of query point. * @param referenceNode Candidate node to be recursed into. * @param oldScore Old score produced by Score() (or Rescore()). */ double Rescore(const size_t queryIndex, TreeType& referenceNode, const double oldScore) const; /** * Get the score for recursion order. A low score indicates priority for * recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). * * @param queryNode Candidate query node to recurse into. * @param referenceNode Candidate reference node to recurse into. */ double Score(TreeType& queryNode, TreeType& referenceNode); /** * Re-evaluate the score for recursion order. A low score indicates priority * for recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). This is used when the score has already * been calculated, but another recursion may have modified the bounds for * pruning. So the old score is checked against the new pruning bound. * * @param queryNode Candidate query node to recurse into. * @param referenceNode Candidate reference node to recurse into. * @param oldScore Old score produced by Score() (or Rescore()). 
*/ double Rescore(TreeType& queryNode, TreeType& referenceNode, const double oldScore) const; typedef neighbor::NeighborSearchTraversalInfo TraversalInfoType; const TraversalInfoType& TraversalInfo() const { return traversalInfo; } TraversalInfoType& TraversalInfo() { return traversalInfo; } private: //! The reference set. const arma::mat& referenceSet; //! The query set. const arma::mat& querySet; //! The range of distances for which we are searching. const math::Range& range; //! The vector the resultant neighbor indices should be stored in. std::vector >& neighbors; //! The vector the resultant neighbor distances should be stored in. std::vector >& distances; //! The instantiated metric. MetricType& metric; //! The last query index. size_t lastQueryIndex; //! The last reference index. size_t lastReferenceIndex; //! Add all the points in the given node to the results for the given query //! point. If the base case has already been calculated, we make sure to not //! add that to the results twice. void AddResult(const size_t queryIndex, TreeType& referenceNode); TraversalInfoType traversalInfo; }; }; // namespace range }; // namespace mlpack // Include implementation. #include "range_search_rules_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/range_search/range_search_stat.hpp0000644000176200001440000000437513647512751026212 0ustar liggesusers/** * @file range_search_stat.hpp * @author Ryan Curtin * * Statistic class for RangeSearch, which just holds the last visited node and * the corresponding base case result. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_STAT_HPP #define __MLPACK_METHODS_RANGE_SEARCH_RANGE_SEARCH_STAT_HPP #include namespace mlpack { namespace range { /** * Statistic class for RangeSearch, to be set to the StatisticType of the tree * type that range search is being performed with. This class just holds the * last visited node and the corresponding base case result. */ class RangeSearchStat { public: /** * Initialize the statistic. */ RangeSearchStat() : lastDistanceNode(NULL), lastDistance(0.0) { } /** * Initialize the statistic given a tree node that this statistic belongs to. * In this case, we ignore the node. */ template RangeSearchStat(TreeType& /* node */) : lastDistanceNode(NULL), lastDistance(0.0) { } //! Get the last distance evaluation node. void* LastDistanceNode() const { return lastDistanceNode; } //! Modify the last distance evaluation node. void*& LastDistanceNode() { return lastDistanceNode; } //! Get the last distance evaluation. double LastDistance() const { return lastDistance; } //! Modify the last distance evaluation. double& LastDistance() { return lastDistance; } private: //! The last distance evaluation node. void* lastDistanceNode; //! The last distance evaluation. 
  double lastDistance;
};

}; // namespace range
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/methods/lars/0000755000176200001440000000000013647512751020334 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/lars/lars.hpp0000644000176200001440000002045313647512751022012 0ustar liggesusers/**
 * @file lars.hpp
 * @author Nishant Mehta (niche)
 *
 * Definition of the LARS class, which performs Least Angle Regression and the
 * LASSO.
 *
 * Only minor modifications of LARS are necessary to handle the constrained
 * version of the problem:
 *
 * \f[
 * \min_{\beta} 0.5 || X \beta - y ||_2^2 + 0.5 \lambda_2 || \beta ||_2^2
 * \f]
 * subject to \f$ ||\beta||_1 <= \tau \f$
 *
 * Although this option is not currently implemented, it will be implemented
 * very soon.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_LARS_LARS_HPP
#define __MLPACK_METHODS_LARS_LARS_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace regression {

// beta is the estimator
// yHat is the prediction from the current estimator

/**
 * An implementation of LARS, a stage-wise homotopy-based algorithm for
 * l1-regularized linear regression (LASSO) and l1+l2 regularized linear
 * regression (Elastic Net).
 *
 * Let \f$ X \f$ be a matrix where each row is a point and each column is a
 * dimension and let \f$ y \f$ be a vector of responses.
 *
 * The Elastic Net problem is to solve
 *
 * \f[ \min_{\beta} 0.5 || X \beta - y ||_2^2 + \lambda_1 || \beta ||_1 +
 *     0.5 \lambda_2 || \beta ||_2^2 \f]
 *
 * where \f$ \beta \f$ is the vector of regression coefficients.
 *
 * If \f$ \lambda_1 > 0 \f$ and \f$ \lambda_2 = 0 \f$, the problem is the LASSO.
 * If \f$ \lambda_1 > 0 \f$ and \f$ \lambda_2 > 0 \f$, the problem is the
 *   elastic net.
 * If \f$ \lambda_1 = 0 \f$ and \f$ \lambda_2 > 0 \f$, the problem is ridge
 *   regression.
 * If \f$ \lambda_1 = 0 \f$ and \f$ \lambda_2 = 0 \f$, the problem is
 *   unregularized linear regression.
 *
 * Note: This algorithm is not recommended for use (in terms of efficiency)
 * when \f$ \lambda_1 \f$ = 0.
 *
 * For more details, see the following papers:
 *
 * @code
 * @article{efron2004least,
 *   title={Least angle regression},
 *   author={Efron, B. and Hastie, T. and Johnstone, I. and Tibshirani, R.},
 *   journal={The Annals of statistics},
 *   volume={32},
 *   number={2},
 *   pages={407--499},
 *   year={2004},
 *   publisher={Institute of Mathematical Statistics}
 * }
 * @endcode
 *
 * @code
 * @article{zou2005regularization,
 *   title={Regularization and variable selection via the elastic net},
 *   author={Zou, H. and Hastie, T.},
 *   journal={Journal of the Royal Statistical Society Series B},
 *   volume={67},
 *   number={2},
 *   pages={301--320},
 *   year={2005},
 *   publisher={Royal Statistical Society}
 * }
 * @endcode
 */
class LARS
{
 public:
  /**
   * Set the parameters to LARS.  Both lambda1 and lambda2 default to 0.
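   *
   * An illustrative call sequence (added example, not from the original
   * header; X is a d-by-m column-major matrix and y has length m):
   *
   * @code
   * mlpack::regression::LARS lars(false, 0.1); // LASSO with lambda1 = 0.1.
   * arma::vec beta;
   * lars.Regress(X, y, beta); // beta receives the final coefficients.
   * @endcode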
* * @param useCholesky Whether or not to use Cholesky decomposition when * solving linear system (as opposed to using the full Gram matrix). * @param lambda1 Regularization parameter for l1-norm penalty. * @param lambda2 Regularization parameter for l2-norm penalty. * @param tolerance Run until the maximum correlation of elements in (X^T y) * is less than this. */ LARS(const bool useCholesky, const double lambda1 = 0.0, const double lambda2 = 0.0, const double tolerance = 1e-16); /** * Set the parameters to LARS, and pass in a precalculated Gram matrix. Both * lambda1 and lambda2 default to 0. * * @param useCholesky Whether or not to use Cholesky decomposition when * solving linear system (as opposed to using the full Gram matrix). * @param gramMatrix Gram matrix. * @param lambda1 Regularization parameter for l1-norm penalty. * @param lambda2 Regularization parameter for l2-norm penalty. * @param tolerance Run until the maximum correlation of elements in (X^T y) * is less than this. */ LARS(const bool useCholesky, const arma::mat& gramMatrix, const double lambda1 = 0.0, const double lambda2 = 0.0, const double tolerance = 1e-16); /** * Run LARS. The input matrix (like all MLPACK matrices) should be * column-major -- each column is an observation and each row is a dimension. * However, because LARS is more efficient on a row-major matrix, this method * will (internally) transpose the matrix. If this transposition is not * necessary (i.e., you want to pass in a row-major matrix), pass 'false' for * the transposeData parameter. * * @param data Column-major input data (or row-major input data if rowMajor = * true). * @param responses A vector of targets. * @param beta Vector to store the solution (the coefficients) in. * @param rowMajor Set to false if the data is row-major. */ void Regress(const arma::mat& data, const arma::vec& responses, arma::vec& beta, const bool transposeData = true); //! Access the set of active dimensions. const std::vector& ActiveSet() const { return activeSet; } //! Access the set of coefficients after each iteration; the solution is the //! last element. const std::vector& BetaPath() const { return betaPath; } //! Access the set of values for lambda1 after each iteration; the solution is //! the last element. const std::vector& LambdaPath() const { return lambdaPath; } //! Access the upper triangular cholesky factor const arma::mat& MatUtriCholFactor() const { return matUtriCholFactor; } // Returns a string representation of this object. std::string ToString() const; private: //! Gram matrix. arma::mat matGramInternal; //! Reference to the Gram matrix we will use. const arma::mat& matGram; //! Upper triangular cholesky factor; initially 0x0 matrix. arma::mat matUtriCholFactor; //! Whether or not to use Cholesky decomposition when solving linear system. bool useCholesky; //! True if this is the LASSO problem. bool lasso; //! Regularization parameter for l1 penalty. double lambda1; //! True if this is the elastic net problem. bool elasticNet; //! Regularization parameter for l2 penalty. double lambda2; //! Tolerance for main loop. double tolerance; //! Solution path. std::vector betaPath; //! Value of lambda_1 for each solution in solution path. std::vector lambdaPath; //! Active set of dimensions. std::vector activeSet; //! Active set membership indicator (for each dimension). std::vector isActive; // Set of variables that are ignored (if any). //! Set of ignored variables (for dimensions in span{active set dimensions}). std::vector ignoreSet; //! 
Membership indicator for set of ignored variables. std::vector isIgnored; /** * Remove activeVarInd'th element from active set. * * @param activeVarInd Index of element to remove from active set. */ void Deactivate(const size_t activeVarInd); /** * Add dimension varInd to active set. * * @param varInd Dimension to add to active set. */ void Activate(const size_t varInd); /** * Add dimension varInd to ignores set (never removed). * * @param varInd Dimension to add to ignores set. */ void Ignore(const size_t varInd); // compute "equiangular" direction in output space void ComputeYHatDirection(const arma::mat& matX, const arma::vec& betaDirection, arma::vec& yHatDirection); // interpolate to compute last solution vector void InterpolateBeta(); void CholeskyInsert(const arma::vec& newX, const arma::mat& X); void CholeskyInsert(double sqNormNewX, const arma::vec& newGramCol); void GivensRotate(const arma::vec::fixed<2>& x, arma::vec::fixed<2>& rotatedX, arma::mat& G); void CholeskyDelete(const size_t colToKill); }; }; // namespace regression }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/linear_regression/0000755000176200001440000000000013647512751023105 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/linear_regression/linear_regression.hpp0000644000176200001440000000736113647512751027337 0ustar liggesusers/** * @file linear_regression.hpp * @author James Cline * * Simple least-squares linear regression. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_LINEAR_REGRESSION_LINEAR_REGRESSION_HPP #define __MLPACK_METHODS_LINEAR_REGRESSION_LINEAR_REGRESSION_HPP #include namespace mlpack { namespace regression /** Regression methods. */ { /** * A simple linear regression algorithm using ordinary least squares. * Optionally, this class can perform ridge regression, if the lambda parameter * is set to a number greater than zero. */ class LinearRegression { public: /** * Creates the model. * * @param predictors X, matrix of data points to create B with. * @param responses y, the measured data for each point in X */ LinearRegression(const arma::mat& predictors, const arma::vec& responses, const double lambda = 0); /** * Initialize the model from a file. * * @param filename the name of the file to load the model from. */ //LinearRegression(const std::string& filename); /** * Copy constructor. * * @param linearRegression the other instance to copy parameters from. */ LinearRegression(const LinearRegression& linearRegression); /** * Empty constructor. */ LinearRegression() { } /** * Calculate y_i for each data point in points. * * @param points the data points to calculate with. * @param predictions y, will contain calculated values on completion. */ void Predict(const arma::mat& points, arma::vec& predictions) const; /** * Calculate the L2 squared error on the given predictors and responses using * this linear regression model. 
This calculation returns
   *
   * \f[
   * (1 / n) * \| y - X B \|^2_2
   * \f]
   *
   * where \f$ y \f$ is the responses vector, \f$ X \f$ is the matrix of
   * predictors, and \f$ B \f$ is the parameters of the trained linear
   * regression model.
   *
   * As this number decreases to 0, the linear regression fit is better.
   *
   * @param points Matrix of predictors (X).
   * @param responses Vector of responses (y).
   */
  double ComputeError(const arma::mat& points,
                      const arma::vec& responses) const;

  //! Return the parameters (the b vector).
  const arma::vec& Parameters() const { return parameters; }
  //! Modify the parameters (the b vector).
  arma::vec& Parameters() { return parameters; }

  //! Return the Tikhonov regularization parameter for ridge regression.
  double Lambda() const { return lambda; }
  //! Modify the Tikhonov regularization parameter for ridge regression.
  double& Lambda() { return lambda; }

  // Returns a string representation of this object.
  std::string ToString() const;

 private:
  /**
   * The calculated B.
   * Initialized and filled by constructor to hold the least squares solution.
   */
  arma::vec parameters;

  /**
   * The Tikhonov regularization parameter for ridge regression (0 for linear
   * regression).
   */
  double lambda;
};

}; // namespace regression
}; // namespace mlpack

#endif // __MLPACK_METHODS_LINEAR_REGRESSION_LINEAR_REGRESSION_HPP
RcppMLPACK/inst/include/mlpack/methods/emst/0000755000176200001440000000000013647512751020343 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/emst/dtb_rules.hpp0000644000176200001440000001460613647512751023046 0ustar liggesusers/**
 * @file dtb_rules.hpp
 * @author Bill March (march@gatech.edu)
 *
 * Tree traverser rules for the DualTreeBoruvka algorithm.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_EMST_DTB_RULES_HPP
#define __MLPACK_METHODS_EMST_DTB_RULES_HPP

#include <mlpack/core.hpp>

#include "../neighbor_search/ns_traversal_info.hpp"

namespace mlpack {
namespace emst {

template<typename MetricType, typename TreeType>
class DTBRules
{
 public:
  DTBRules(const arma::mat& dataSet,
           UnionFind& connections,
           arma::vec& neighborsDistances,
           arma::Col<size_t>& neighborsInComponent,
           arma::Col<size_t>& neighborsOutComponent,
           MetricType& metric);

  double BaseCase(const size_t queryIndex, const size_t referenceIndex);

  /**
   * Get the score for recursion order.  A low score indicates priority for
   * recursion, while DBL_MAX indicates that the node should not be recursed
   * into at all (it should be pruned).
   *
   * @param queryIndex Index of query point.
   * @param referenceNode Candidate node to be recursed into.
   */
  double Score(const size_t queryIndex, TreeType& referenceNode);

  /**
   * Get the score for recursion order, passing the base case result (in the
   * situation where it may be needed to calculate the recursion order).  A low
   * score indicates priority for recursion, while DBL_MAX indicates that the
   * node should not be recursed into at all (it should be pruned).
   *
   * @param queryIndex Index of query point.
   * @param referenceNode Candidate node to be recursed into.
   * @param baseCaseResult Result of BaseCase(queryIndex, referenceNode).
   */
  double Score(const size_t queryIndex,
               TreeType& referenceNode,
               const double baseCaseResult);

  /**
   * Re-evaluate the score for recursion order.  A low score indicates priority
   * for recursion, while DBL_MAX indicates that the node should not be recursed
   * into at all (it should be pruned).  This is used when the score has already
   * been calculated, but another recursion may have modified the bounds for
   * pruning.  So the old score is checked against the new pruning bound.
   *
   * @param queryIndex Index of query point.
   * @param referenceNode Candidate node to be recursed into.
   * @param oldScore Old score produced by Score() (or Rescore()).
   */
  double Rescore(const size_t queryIndex,
                 TreeType& referenceNode,
                 const double oldScore);

  /**
   * Get the score for recursion order.  A low score indicates priority for
   * recursion, while DBL_MAX indicates that the node should not be recursed
   * into at all (it should be pruned).
   *
   * @param queryNode Candidate query node to recurse into.
   * @param referenceNode Candidate reference node to recurse into.
   */
  double Score(TreeType& queryNode, TreeType& referenceNode);

  /**
   * Get the score for recursion order, passing the base case result (in the
   * situation where it may be needed to calculate the recursion order).  A low
   * score indicates priority for recursion, while DBL_MAX indicates that the
   * node should not be recursed into at all (it should be pruned).
   *
   * @param queryNode Candidate query node to recurse into.
   * @param referenceNode Candidate reference node to recurse into.
   * @param baseCaseResult Result of BaseCase(queryIndex, referenceNode).
   */
  double Score(TreeType& queryNode,
               TreeType& referenceNode,
               const double baseCaseResult);

  /**
   * Re-evaluate the score for recursion order.  A low score indicates priority
   * for recursion, while DBL_MAX indicates that the node should not be recursed
   * into at all (it should be pruned).  This is used when the score has already
   * been calculated, but another recursion may have modified the bounds for
   * pruning.  So the old score is checked against the new pruning bound.
   *
   * @param queryNode Candidate query node to recurse into.
   * @param referenceNode Candidate reference node to recurse into.
   * @param oldScore Old score produced by Score() (or Rescore()).
   */
  double Rescore(TreeType& queryNode,
                 TreeType& referenceNode,
                 const double oldScore) const;

  typedef neighbor::NeighborSearchTraversalInfo<TreeType> TraversalInfoType;

  const TraversalInfoType& TraversalInfo() const { return traversalInfo; }
  TraversalInfoType& TraversalInfo() { return traversalInfo; }

  //! Get the number of base cases performed.
  size_t BaseCases() const { return baseCases; }
  //! Modify the number of base cases performed.
  size_t& BaseCases() { return baseCases; }

  //! Get the number of node combinations that have been scored.
  size_t Scores() const { return scores; }
  //! Modify the number of node combinations that have been scored.
  size_t& Scores() { return scores; }

 private:
  //! The data points.
  const arma::mat& dataSet;

  //! Stores the tree structure so far.
  UnionFind& connections;

  //! The distance to the candidate nearest neighbor for each component.
  arma::vec& neighborsDistances;

  //! The index of the point in the component that is an endpoint of the
  //! candidate edge.
  arma::Col<size_t>& neighborsInComponent;

  //! The index of the point outside of the component that is an endpoint
  //! of the candidate edge.
  arma::Col<size_t>& neighborsOutComponent;

  //!
The instantiated metric. MetricType& metric; /** * Update the bound for the given query node. */ inline double CalculateBound(TreeType& queryNode) const; TraversalInfoType traversalInfo; //! The number of base cases calculated. size_t baseCases; //! The number of node combinations that have been scored. size_t scores; }; // class DTBRules } // emst namespace } // mlpack namespace #include "dtb_rules_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/emst/edge_pair.hpp0000644000176200001440000000462413647512751023001 0ustar liggesusers/** * @file edge_pair.hpp * * @author Bill March (march@gatech.edu) * * This file contains utilities necessary for all of the minimum spanning tree * algorithms. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_EMST_EDGE_PAIR_HPP #define __MLPACK_METHODS_EMST_EDGE_PAIR_HPP #include #include "union_find.hpp" namespace mlpack { namespace emst { /** * An edge pair is simply two indices and a distance. It is used as the * basic element of an edge list when computing a minimum spanning tree. */ class EdgePair { private: //! Lesser index. size_t lesser; //! Greater index. size_t greater; //! Distance between two indices. double distance; public: /** * Initialize an EdgePair with two indices and a distance. The indices are * called lesser and greater, implying that they be sorted before calling * Init. However, this is not necessary for functionality; it is just a way * to keep the edge list organized in other code. */ EdgePair(const size_t lesser, const size_t greater, const double dist) : lesser(lesser), greater(greater), distance(dist) { //Log::Assert(lesser != greater, //"EdgePair::EdgePair(): indices cannot be equal."); } //! Get the lesser index. size_t Lesser() const { return lesser; } //! Modify the lesser index. size_t& Lesser() { return lesser; } //! Get the greater index. size_t Greater() const { return greater; } //! Modify the greater index. size_t& Greater() { return greater; } //! Get the distance. double Distance() const { return distance; } //! Modify the distance. double& Distance() { return distance; } }; // class EdgePair }; // namespace emst }; // namespace mlpack #endif // __MLPACK_METHODS_EMST_EDGE_PAIR_HPP RcppMLPACK/inst/include/mlpack/methods/emst/dtb_rules_impl.hpp0000644000176200001440000002203013647512751024055 0ustar liggesusers/** * @file dtb_impl.hpp * @author Bill March (march@gatech.edu) * * Tree traverser rules for the DualTreeBoruvka algorithm. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_EMST_DTB_RULES_IMPL_HPP #define __MLPACK_METHODS_EMST_DTB_RULES_IMPL_HPP namespace mlpack { namespace emst { template DTBRules:: DTBRules(const arma::mat& dataSet, UnionFind& connections, arma::vec& neighborsDistances, arma::Col& neighborsInComponent, arma::Col& neighborsOutComponent, MetricType& metric) : dataSet(dataSet), connections(connections), neighborsDistances(neighborsDistances), neighborsInComponent(neighborsInComponent), neighborsOutComponent(neighborsOutComponent), metric(metric), baseCases(0), scores(0) { // Nothing else to do. } template inline force_inline double DTBRules::BaseCase(const size_t queryIndex, const size_t referenceIndex) { // Check if the points are in the same component at this iteration. // If not, return the distance between them. Also, store a better result as // the current neighbor, if necessary. double newUpperBound = -1.0; // Find the index of the component the query is in. size_t queryComponentIndex = connections.Find(queryIndex); size_t referenceComponentIndex = connections.Find(referenceIndex); if (queryComponentIndex != referenceComponentIndex) { ++baseCases; double distance = metric.Evaluate(dataSet.col(queryIndex), dataSet.col(referenceIndex)); if (distance < neighborsDistances[queryComponentIndex]) { //Log::Assert(queryIndex != referenceIndex); neighborsDistances[queryComponentIndex] = distance; neighborsInComponent[queryComponentIndex] = queryIndex; neighborsOutComponent[queryComponentIndex] = referenceIndex; } } if (newUpperBound < neighborsDistances[queryComponentIndex]) newUpperBound = neighborsDistances[queryComponentIndex]; //Log::Assert(newUpperBound >= 0.0); return newUpperBound; } template double DTBRules::Score(const size_t queryIndex, TreeType& referenceNode) { size_t queryComponentIndex = connections.Find(queryIndex); // If the query belongs to the same component as all of the references, // then prune. The cast is to stop a warning about comparing unsigned to // signed values. if (queryComponentIndex == (size_t) referenceNode.Stat().ComponentMembership()) return DBL_MAX; const arma::vec queryPoint = dataSet.unsafe_col(queryIndex); const double distance = referenceNode.MinDistance(queryPoint); // If all the points in the reference node are farther than the candidate // nearest neighbor for the query's component, we prune. return neighborsDistances[queryComponentIndex] < distance ? DBL_MAX : distance; } template double DTBRules::Score(const size_t queryIndex, TreeType& referenceNode, const double baseCaseResult) { // I don't really understand the last argument here // It just gets passed in the distance call, otherwise this function // is the same as the one above. size_t queryComponentIndex = connections.Find(queryIndex); // If the query belongs to the same component as all of the references, // then prune. 
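  // (ComponentMembership() is negative when the reference node spans more
  // than one component, so this equality can only hold when every reference
  // descendant lies in the query's component.)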
if (queryComponentIndex == referenceNode.Stat().ComponentMembership()) return DBL_MAX; const arma::vec queryPoint = dataSet.unsafe_col(queryIndex); const double distance = referenceNode.MinDistance(queryPoint, baseCaseResult); // If all the points in the reference node are farther than the candidate // nearest neighbor for the query's component, we prune. return (neighborsDistances[queryComponentIndex] < distance) ? DBL_MAX : distance; } template double DTBRules::Rescore(const size_t queryIndex, TreeType& referenceNode, const double oldScore) { // We don't need to check component membership again, because it can't // change inside a single iteration. return (oldScore > neighborsDistances[connections.Find(queryIndex)]) ? DBL_MAX : oldScore; } template double DTBRules::Score(TreeType& queryNode, TreeType& referenceNode) { // If all the queries belong to the same component as all the references // then we prune. if ((queryNode.Stat().ComponentMembership() >= 0) && (queryNode.Stat().ComponentMembership() == referenceNode.Stat().ComponentMembership())) return DBL_MAX; ++scores; const double distance = queryNode.MinDistance(&referenceNode); const double bound = CalculateBound(queryNode); // If all the points in the reference node are farther than the candidate // nearest neighbor for all queries in the node, we prune. return (bound < distance) ? DBL_MAX : distance; } template double DTBRules::Score(TreeType& queryNode, TreeType& referenceNode, const double baseCaseResult) { // If all the queries belong to the same component as all the references // then we prune. if ((queryNode.Stat().ComponentMembership() >= 0) && (queryNode.Stat().ComponentMembership() == referenceNode.Stat().ComponentMembership())) return DBL_MAX; ++scores; const double distance = queryNode.MinDistance(referenceNode, baseCaseResult); const double bound = CalculateBound(queryNode); // If all the points in the reference node are farther than the candidate // nearest neighbor for all queries in the node, we prune. return (bound < distance) ? DBL_MAX : distance; } template double DTBRules::Rescore(TreeType& queryNode, TreeType& /* referenceNode */, const double oldScore) const { const double bound = CalculateBound(queryNode); return (oldScore > bound) ? DBL_MAX : oldScore; } // Calculate the bound for a given query node in its current state and update // it. template inline double DTBRules::CalculateBound( TreeType& queryNode) const { double worstPointBound = -DBL_MAX; double bestPointBound = DBL_MAX; double worstChildBound = -DBL_MAX; double bestChildBound = DBL_MAX; // Now, find the best and worst point bounds. for (size_t i = 0; i < queryNode.NumPoints(); ++i) { const size_t pointComponent = connections.Find(queryNode.Point(i)); const double bound = neighborsDistances[pointComponent]; if (bound > worstPointBound) worstPointBound = bound; if (bound < bestPointBound) bestPointBound = bound; } // Find the best and worst child bounds. for (size_t i = 0; i < queryNode.NumChildren(); ++i) { const double maxBound = queryNode.Child(i).Stat().MaxNeighborDistance(); if (maxBound > worstChildBound) worstChildBound = maxBound; const double minBound = queryNode.Child(i).Stat().MinNeighborDistance(); if (minBound < bestChildBound) bestChildBound = minBound; } // Now calculate the actual bounds. const double worstBound = std::max(worstPointBound, worstChildBound); const double bestBound = std::min(bestPointBound, bestChildBound); // We must check that bestBound != DBL_MAX; otherwise, we risk overflow. 
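  // Adding twice the furthest descendant distance to bestBound gives, by the
  // triangle inequality, a bound that remains valid for every point held
  // below this node.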
const double bestAdjustedBound = (bestBound == DBL_MAX) ? DBL_MAX : bestBound + 2 * queryNode.FurthestDescendantDistance(); // Update the relevant quantities in the node. queryNode.Stat().MaxNeighborDistance() = worstBound; queryNode.Stat().MinNeighborDistance() = bestBound; queryNode.Stat().Bound() = std::min(worstBound, bestAdjustedBound); return queryNode.Stat().Bound(); } }; // namespace emst }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/emst/union_find.hpp0000644000176200001440000000564113647512751023212 0ustar liggesusers/** * @file union_find.hpp * @author Bill March (march@gatech.edu) * * Implements a union-find data structure. This structure tracks the components * of a graph. Each point in the graph is initially in its own component. * Calling unionfind.Union(x, y) unites the components indexed by x and y. * unionfind.Find(x) returns the index of the component containing point x. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_EMST_UNION_FIND_HPP #define __MLPACK_METHODS_EMST_UNION_FIND_HPP #include namespace mlpack { namespace emst { /** * A Union-Find data structure. See Cormen, Rivest, & Stein for details. The * structure tracks the components of a graph. Each point in the graph is * initially in its own component. Calling Union(x, y) unites the components * indexed by x and y. Find(x) returns the index of the component containing * point x. */ class UnionFind { private: arma::Col parent; arma::ivec rank; public: //! Construct the object with the given size. UnionFind(const size_t size) : parent(size), rank(size) { for (size_t i = 0; i < size; ++i) { parent[i] = i; rank[i] = 0; } } //! Destroy the object (nothing to do). ~UnionFind() { } /** * Returns the component containing an element. * * @param x the component to be found * @return The index of the component containing x */ size_t Find(const size_t x) { if (parent[x] == x) { return x; } else { // This ensures that the tree has a small depth parent[x] = Find(parent[x]); return parent[x]; } } /** * Union the components containing x and y. * * @param x one component * @param y the other component */ void Union(const size_t x, const size_t y) { const size_t xRoot = Find(x); const size_t yRoot = Find(y); if (xRoot == yRoot) { return; } else if (rank[xRoot] == rank[yRoot]) { parent[yRoot] = parent[xRoot]; rank[xRoot] = rank[xRoot] + 1; } else if (rank[xRoot] > rank[yRoot]) { parent[yRoot] = xRoot; } else { parent[xRoot] = yRoot; } } }; // class UnionFind }; // namespace emst }; // namespace mlpack #endif // __MLPACK_METHODS_EMST_UNION_FIND_HPP RcppMLPACK/inst/include/mlpack/methods/emst/dtb_impl.hpp0000644000176200001440000002261413647512751022653 0ustar liggesusers/** * @file dtb_impl.hpp * @author Bill March (march@gatech.edu) * * Implementation of DTB. * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_EMST_DTB_IMPL_HPP #define __MLPACK_METHODS_EMST_DTB_IMPL_HPP #include "dtb_rules.hpp" namespace mlpack { namespace emst { //! Call the tree constructor that does mapping. template TreeType* BuildTree( typename TreeType::Mat& dataset, std::vector& oldFromNew, typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == true, TreeType* >::type = 0) { return new TreeType(dataset, oldFromNew); } //! Call the tree constructor that does not do mapping. template TreeType* BuildTree( const typename TreeType::Mat& dataset, const std::vector& /* oldFromNew */, const typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == false, TreeType* >::type = 0) { return new TreeType(dataset); } /** * Takes in a reference to the data set. Copies the data, builds the tree, * and initializes all of the member variables. */ template DualTreeBoruvka::DualTreeBoruvka( const typename TreeType::Mat& dataset, const bool naive, const MetricType metric) : data((tree::TreeTraits::RearrangesDataset && !naive) ? dataCopy : dataset), ownTree(!naive), naive(naive), connections(dataset.n_cols), totalDist(0.0), metric(metric) { //Timer::Start("emst/tree_building"); if (!naive) { // Copy the dataset, if it will be modified during tree construction. if (tree::TreeTraits::RearrangesDataset) dataCopy = dataset; tree = BuildTree(const_cast(data), oldFromNew); } //Timer::Stop("emst/tree_building"); edges.reserve(data.n_cols - 1); // Set size. neighborsInComponent.set_size(data.n_cols); neighborsOutComponent.set_size(data.n_cols); neighborsDistances.set_size(data.n_cols); neighborsDistances.fill(DBL_MAX); } // Constructor template DualTreeBoruvka::DualTreeBoruvka( TreeType* tree, const typename TreeType::Mat& dataset, const MetricType metric) : data(dataset), tree(tree), ownTree(false), naive(false), connections(data.n_cols), totalDist(0.0), metric(metric) { edges.reserve(data.n_cols - 1); // Fill with EdgePairs. neighborsInComponent.set_size(data.n_cols); neighborsOutComponent.set_size(data.n_cols); neighborsDistances.set_size(data.n_cols); neighborsDistances.fill(DBL_MAX); } template DualTreeBoruvka::~DualTreeBoruvka() { if (ownTree) delete tree; } /** * Iteratively find the nearest neighbor of each component until the MST is * complete. */ template void DualTreeBoruvka::ComputeMST(arma::mat& results) { //Timer::Start("emst/mst_computation"); totalDist = 0; // Reset distance. typedef DTBRules RuleType; RuleType rules(data, connections, neighborsDistances, neighborsInComponent, neighborsOutComponent, metric); while (edges.size() < (data.n_cols - 1)) { if (naive) { // Full O(N^2) traversal. for (size_t i = 0; i < data.n_cols; ++i) for (size_t j = 0; j < data.n_cols; ++j) rules.BaseCase(i, j); } else { typename TreeType::template DualTreeTraverser traverser(rules); traverser.Traverse(*tree, *tree); } AddAllEdges(); Cleanup(); Rcpp::Rcout << edges.size() << " edges found so far." 
<< std::endl; if (!naive) { Rcpp::Rcout << rules.BaseCases() << " cumulative base cases." << std::endl; Rcpp::Rcout << rules.Scores() << " cumulative node combinations scored." << std::endl; } } //Timer::Stop("emst/mst_computation"); EmitResults(results); Rcpp::Rcout << "Total spanning tree length: " << totalDist << std::endl; } /** * Adds a single edge to the edge list */ template void DualTreeBoruvka::AddEdge(const size_t e1, const size_t e2, const double distance) { //Log::Assert((distance >= 0.0), // "DualTreeBoruvka::AddEdge(): distance cannot be negative."); if (e1 < e2) edges.push_back(EdgePair(e1, e2, distance)); else edges.push_back(EdgePair(e2, e1, distance)); } // AddEdge /** * Adds all the edges found in one iteration to the list of neighbors. */ template void DualTreeBoruvka::AddAllEdges() { for (size_t i = 0; i < data.n_cols; i++) { size_t component = connections.Find(i); size_t inEdge = neighborsInComponent[component]; size_t outEdge = neighborsOutComponent[component]; if (connections.Find(inEdge) != connections.Find(outEdge)) { //totalDist = totalDist + dist; // changed to make this agree with the cover tree code totalDist += neighborsDistances[component]; AddEdge(inEdge, outEdge, neighborsDistances[component]); connections.Union(inEdge, outEdge); } } } // AddAllEdges /** * Unpermute the edge list (if necessary) and output it to results. */ template void DualTreeBoruvka::EmitResults(arma::mat& results) { // Sort the edges. std::sort(edges.begin(), edges.end(), SortFun); //Log::Assert(edges.size() == data.n_cols - 1); results.set_size(3, edges.size()); // Need to unpermute the point labels. if (!naive && ownTree && tree::TreeTraits::RearrangesDataset) { for (size_t i = 0; i < (data.n_cols - 1); i++) { // Make sure the edge list stores the smaller index first to // make checking correctness easier size_t ind1 = oldFromNew[edges[i].Lesser()]; size_t ind2 = oldFromNew[edges[i].Greater()]; if (ind1 < ind2) { edges[i].Lesser() = ind1; edges[i].Greater() = ind2; } else { edges[i].Lesser() = ind2; edges[i].Greater() = ind1; } results(0, i) = edges[i].Lesser(); results(1, i) = edges[i].Greater(); results(2, i) = edges[i].Distance(); } } else { for (size_t i = 0; i < edges.size(); i++) { results(0, i) = edges[i].Lesser(); results(1, i) = edges[i].Greater(); results(2, i) = edges[i].Distance(); } } } // EmitResults /** * This function resets the values in the nodes of the tree nearest neighbor * distance and checks for fully connected nodes. */ template void DualTreeBoruvka::CleanupHelper(TreeType* tree) { // Reset the statistic information. tree->Stat().MaxNeighborDistance() = DBL_MAX; tree->Stat().MinNeighborDistance() = DBL_MAX; tree->Stat().Bound() = DBL_MAX; // Recurse into all children. for (size_t i = 0; i < tree->NumChildren(); ++i) CleanupHelper(&tree->Child(i)); // Get the component of the first child or point. Then we will check to see // if all other components of children and points are the same. const int component = (tree->NumChildren() != 0) ? tree->Child(0).Stat().ComponentMembership() : connections.Find(tree->Point(0)); // Check components of children. for (size_t i = 0; i < tree->NumChildren(); ++i) if (tree->Child(i).Stat().ComponentMembership() != component) return; // Check components of points. for (size_t i = 0; i < tree->NumPoints(); ++i) if (connections.Find(tree->Point(i)) != size_t(component)) return; // If we made it this far, all components are the same. 
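  // Record the shared component in the statistic so that Score() can prune
  // this entire node in one shot on later iterations.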
tree->Stat().ComponentMembership() = component; } /** * The values stored in the tree must be reset on each iteration. */ template void DualTreeBoruvka::Cleanup() { for (size_t i = 0; i < data.n_cols; i++) neighborsDistances[i] = DBL_MAX; if (!naive) CleanupHelper(tree); } // convert the object to a string template std::string DualTreeBoruvka::ToString() const { std::ostringstream convert; convert << "DualTreeBoruvka [" << this << "]" << std::endl; convert << " Data: " << data.n_rows << "x" << data.n_cols <. */ #ifndef __MLPACK_METHODS_EMST_DTB_HPP #define __MLPACK_METHODS_EMST_DTB_HPP #include "dtb_stat.hpp" #include "edge_pair.hpp" #include #include #include namespace mlpack { namespace emst /** Euclidean Minimum Spanning Trees. */ { /** * Performs the MST calculation using the Dual-Tree Boruvka algorithm, using any * type of tree. * * For more information on the algorithm, see the following citation: * * @code * @inproceedings{ * author = {March, W.B., Ram, P., and Gray, A.G.}, * title = {{Fast Euclidean Minimum Spanning Tree: Algorithm, Analysis, * Applications.}}, * booktitle = {Proceedings of the 16th ACM SIGKDD International Conference * on Knowledge Discovery and Data Mining} * series = {KDD 2010}, * year = {2010} * } * @endcode * * General usage of this class might be like this: * * @code * extern arma::mat data; // We want to find the MST of this dataset. * DualTreeBoruvka<> dtb(data); // Create the tree with default options. * * // Find the MST. * arma::mat mstResults; * dtb.ComputeMST(mstResults); * @endcode * * More advanced usage of the class can use different types of trees, pass in an * already-built tree, or compute the MST using the O(n^2) naive algorithm. * * @tparam MetricType The metric to use. IMPORTANT: this hasn't really been * tested with anything other than the L2 metric, so user beware. Note that the * tree type needs to compute bounds using the same metric as the type * specified here. * @tparam TreeType Type of tree to use. Should use DTBStat as a statistic. */ template< typename MetricType = metric::EuclideanDistance, typename TreeType = tree::BinarySpaceTree, DTBStat> > class DualTreeBoruvka { private: //! Copy of the data (if necessary). typename TreeType::Mat dataCopy; //! Reference to the data (this is what should be used for accessing data). const typename TreeType::Mat& data; //! Pointer to the root of the tree. TreeType* tree; //! Indicates whether or not we "own" the tree. bool ownTree; //! Indicates whether or not O(n^2) naive mode will be used. bool naive; //! Edges. std::vector edges; // We must use vector with non-numerical types. //! Connections. UnionFind connections; //! Permutations of points during tree building. std::vector oldFromNew; //! List of edge nodes. arma::Col neighborsInComponent; //! List of edge nodes. arma::Col neighborsOutComponent; //! List of edge distances. arma::vec neighborsDistances; //! Total distance of the tree. double totalDist; //! The instantiated metric. MetricType metric; //! For sorting the edge list after the computation. struct SortEdgesHelper { bool operator()(const EdgePair& pairA, const EdgePair& pairB) { return (pairA.Distance() < pairB.Distance()); } } SortFun; public: /** * Create the tree from the given dataset. This copies the dataset to an * internal copy, because tree-building modifies the dataset. * * @param data Dataset to build a tree for. * @param naive Whether the computation should be done in O(n^2) naive mode. * @param leafSize The leaf size to be used during tree construction. 
*/ DualTreeBoruvka(const typename TreeType::Mat& dataset, const bool naive = false, const MetricType metric = MetricType()); /** * Create the DualTreeBoruvka object with an already initialized tree. This * will not copy the dataset, and can save a little processing power. Naive * mode is not available as an option for this constructor; instead, to run * naive computation, construct a tree with all the points in one leaf (i.e. * leafSize = number of points). * * @note * Because tree-building (at least with BinarySpaceTree) modifies the ordering * of a matrix, be sure you pass the modified matrix to this object! In * addition, mapping the points of the matrix back to their original indices * is not done when this constructor is used. * @endnote * * @param tree Pre-built tree. * @param dataset Dataset corresponding to the pre-built tree. */ DualTreeBoruvka(TreeType* tree, const typename TreeType::Mat& dataset, const MetricType metric = MetricType()); /** * Delete the tree, if it was created inside the object. */ ~DualTreeBoruvka(); /** * Iteratively find the nearest neighbor of each component until the MST is * complete. The results will be a 3xN matrix (with N equal to the number of * edges in the minimum spanning tree). The first row will contain the lesser * index of the edge; the second row will contain the greater index of the * edge; and the third row will contain the distance between the two edges. * * @param results Matrix which results will be stored in. */ void ComputeMST(arma::mat& results); /** * Returns a string representation of this object. */ std::string ToString() const; private: /** * Adds a single edge to the edge list */ void AddEdge(const size_t e1, const size_t e2, const double distance); /** * Adds all the edges found in one iteration to the list of neighbors. */ void AddAllEdges(); /** * Unpermute the edge list and output it to results. */ void EmitResults(arma::mat& results); /** * This function resets the values in the nodes of the tree nearest neighbor * distance, and checks for fully connected nodes. */ void CleanupHelper(TreeType* tree); /** * The values stored in the tree must be reset on each iteration. */ void Cleanup(); }; // class DualTreeBoruvka }; // namespace emst }; // namespace mlpack #include "dtb_impl.hpp" #endif // __MLPACK_METHODS_EMST_DTB_HPP RcppMLPACK/inst/include/mlpack/methods/emst/dtb_stat.hpp0000644000176200001440000000675113647512751022671 0ustar liggesusers/** * @file dtb.hpp * @author Bill March (march@gatech.edu) * * DTBStat is the StatisticType used by trees when performing EMST. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_EMST_DTB_STAT_HPP #define __MLPACK_METHODS_EMST_DTB_STAT_HPP #include namespace mlpack { namespace emst { /** * A statistic for use with MLPACK trees, which stores the upper bound on * distance to nearest neighbors and the component which this node belongs to. */ class DTBStat { private: //! 
Upper bound on the distance to the nearest neighbor of any point in this //! node. double maxNeighborDistance; //! Lower bound on the distance to the nearest neighbor of any point in this //! node. double minNeighborDistance; //! Total bound for pruning. double bound; //! The index of the component that all points in this node belong to. This //! is the same index returned by UnionFind for all points in this node. If //! points in this node are in different components, this value will be //! negative. int componentMembership; public: /** * A generic initializer. Sets the maximum neighbor distance to its default, * and the component membership to -1 (no component). */ DTBStat() : maxNeighborDistance(DBL_MAX), minNeighborDistance(DBL_MAX), bound(DBL_MAX), componentMembership(-1) { } /** * This is called when a node is finished initializing. We set the maximum * neighbor distance to its default, and if possible, we set the component * membership of the node (if it has only one point and no children). * * @param node Node that has been finished. */ template DTBStat(const TreeType& node) : maxNeighborDistance(DBL_MAX), minNeighborDistance(DBL_MAX), bound(DBL_MAX), componentMembership( ((node.NumPoints() == 1) && (node.NumChildren() == 0)) ? node.Point(0) : -1) { } //! Get the maximum neighbor distance. double MaxNeighborDistance() const { return maxNeighborDistance; } //! Modify the maximum neighbor distance. double& MaxNeighborDistance() { return maxNeighborDistance; } //! Get the minimum neighbor distance. double MinNeighborDistance() const { return minNeighborDistance; } //! Modify the minimum neighbor distance. double& MinNeighborDistance() { return minNeighborDistance; } //! Get the total bound for pruning. double Bound() const { return bound; } //! Modify the total bound for pruning. double& Bound() { return bound; } //! Get the component membership of this node. int ComponentMembership() const { return componentMembership; } //! Modify the component membership of this node. int& ComponentMembership() { return componentMembership; } }; // class DTBStat }; // namespace emst }; // namespace mlpack #endif // __MLPACK_METHODS_EMST_DTB_STAT_HPP RcppMLPACK/inst/include/mlpack/methods/lsh/0000755000176200001440000000000013647512751020161 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/lsh/lsh_search.hpp0000644000176200001440000002366013647512751023014 0ustar liggesusers/** * @file lsh_search.hpp * @author Parikshit Ram * * Defines the LSHSearch class, which performs an approximate * nearest neighbor search for a queries in a query set * over a given dataset using Locality-sensitive hashing * with 2-stable distributions. * * The details of this method can be found in the following paper: * * @inproceedings{datar2004locality, * title={Locality-sensitive hashing scheme based on p-stable distributions}, * author={Datar, M. and Immorlica, N. and Indyk, P. and Mirrokni, V.S.}, * booktitle= * {Proceedings of the 12th Annual Symposium on Computational Geometry}, * pages={253--262}, * year={2004}, * organization={ACM} * } * * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. 
See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_LSH_SEARCH_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_LSH_SEARCH_HPP #include #include #include #include #include namespace mlpack { namespace neighbor { /** * The LSHSearch class -- This class builds a hash on the reference set * and uses this hash to compute the distance-approximate nearest-neighbors * of the given queries. * * @tparam SortPolicy The sort policy for distances; see NearestNeighborSort. */ template class LSHSearch { public: /** * This function initializes the LSH class. It builds the hash on the * reference set with 2-stable distributions. See the individual functions * performing the hashing for details on how the hashing is done. * * @param referenceSet Set of reference points. * @param querySet Set of query points. * @param numProj Number of projections in each hash table (anything between * 10-50 might be a decent choice). * @param numTables Total number of hash tables (anything between 10-20 * should suffice). * @param hashWidth The width of hash for every table. If 0 (the default) is * provided, then the hash width is automatically obtained by computing * the average pairwise distance of 25 pairs. This should be a reasonable * upper bound on the nearest-neighbor distance in general. * @param secondHashSize The size of the second hash table. This should be a * large prime number. * @param bucketSize The size of the bucket in the second hash table. This is * the maximum number of points that can be hashed into single bucket. * Default values are already provided here. */ LSHSearch(const arma::mat& referenceSet, const arma::mat& querySet, const size_t numProj, const size_t numTables, const double hashWidth = 0.0, const size_t secondHashSize = 99901, const size_t bucketSize = 500); /** * This function initializes the LSH class. It builds the hash on the * reference set with 2-stable distributions. See the individual functions * performing the hashing for details on how the hashing is done. * * @param referenceSet Set of reference points and the set of queries. * @param numProj Number of projections in each hash table (anything between * 10-50 might be a decent choice). * @param numTables Total number of hash tables (anything between 10-20 * should suffice). * @param hashWidth The width of hash for every table. If 0 (the default) is * provided, then the hash width is automatically obtained by computing * the average pairwise distance of 25 pairs. This should be a reasonable * upper bound on the nearest-neighbor distance in general. * @param secondHashSize The size of the second hash table. This should be a * large prime number. * @param bucketSize The size of the bucket in the second hash table. This is * the maximum number of points that can be hashed into single bucket. * Default values are already provided here. */ LSHSearch(const arma::mat& referenceSet, const size_t numProj, const size_t numTables, const double hashWidth = 0.0, const size_t secondHashSize = 99901, const size_t bucketSize = 500); /** * Compute the nearest neighbors and store the output in the given matrices. * The matrices will be set to the size of n columns by k rows, where n is * the number of points in the query dataset and k is the number of neighbors * being searched for. * * @param k Number of neighbors to search for. 
   * @param resultingNeighbors Matrix storing lists of neighbors for each query
   *     point.
   * @param distances Matrix storing distances of neighbors for each query
   *     point.
   * @param numTablesToSearch This parameter allows the user to have control
   *     over the number of hash tables to be searched.  This allows
   *     the user to pick the number of tables they can afford for the time
   *     available without having to build hashing for every table size.
   *     By default, this is set to zero in which case all tables are
   *     considered.
   */
  void Search(const size_t k,
              arma::Mat<size_t>& resultingNeighbors,
              arma::mat& distances,
              const size_t numTablesToSearch = 0);

  // Returns a string representation of this object.
  std::string ToString() const;

 private:
  /**
   * This function builds a hash table with two levels of hashing as presented
   * in the paper.  This function first hashes the points with 'numProj' random
   * projections to a single hash table creating (key, point ID) pairs where the
   * key is a 'numProj'-dimensional integer vector.
   *
   * Then each key in this hash table is hashed into a second hash table using a
   * standard hash.
   *
   * This function does not have any parameters and relies on parameters which
   * are private members of this class, initialized during the class
   * initialization.
   */
  void BuildHash();

  /**
   * This function takes a query and hashes it into each of the hash tables to
   * get keys for the query and then the key is hashed to a bucket of the second
   * hash table and all the points (if any) in those buckets are collected as
   * the potential neighbor candidates.
   *
   * @param queryIndex The index of the query currently being processed.
   * @param referenceIndices The list of neighbor candidates obtained from
   *     hashing the query into all the hash tables and eventually into
   *     multiple buckets of the second hash table.
   */
  void ReturnIndicesFromTable(const size_t queryIndex,
                              arma::uvec& referenceIndices,
                              size_t numTablesToSearch);

  /**
   * This is a helper function that computes the distance of the query to the
   * neighbor candidates and appropriately stores the best 'k' candidates.
   *
   * @param queryIndex The index of the query in question.
   * @param referenceIndex The index of the neighbor candidate in question.
   */
  double BaseCase(const size_t queryIndex, const size_t referenceIndex);

  /**
   * This is a helper function that efficiently inserts better neighbor
   * candidates into an existing set of neighbor candidates.  This function is
   * only called by the 'BaseCase' function.
   *
   * @param queryIndex The index of the query being processed currently.
   * @param pos The position of the neighbor candidate in the current list of
   *     neighbor candidates.
   * @param neighbor The neighbor candidate that is being inserted into the list
   *     of the best 'k' candidates for the query in question.
   * @param distance The distance of the query to the neighbor candidate.
   */
  void InsertNeighbor(const size_t queryIndex,
                      const size_t pos,
                      const size_t neighbor,
                      const double distance);

  //! Reference dataset.
  const arma::mat& referenceSet;
  //! Query dataset (may not be given).
  const arma::mat& querySet;

  //! The number of projections.
  const size_t numProj;
  //! The number of hash tables.
  const size_t numTables;

  //! The std::vector containing the projection matrix of each table.
  std::vector<arma::mat> projections; // should be [numProj x dims] x numTables

  //! The list of the offset 'b' for each of the projection for each table.
  arma::mat offsets; // should be numProj x numTables

  //! The hash width.
  double hashWidth;

  //!
The big prime representing the size of the second hash const size_t secondHashSize; //! The weights of the second hash arma::vec secondHashWeights; //! The bucket size of the second hash const size_t bucketSize; //! Instantiation of the metric. metric::SquaredEuclideanDistance metric; //! The final hash table; should be (< secondHashSize) x bucketSize. arma::Mat secondHashTable; //! The number of elements present in each hash bucket; should be //! secondHashSize. arma::Col bucketContentSize; //! For a particular hash value, points to the row in secondHashTable //! corresponding to this value. Should be secondHashSize. arma::Col bucketRowInHashTable; //! The pointer to the nearest neighbor distances. arma::mat* distancePtr; //! The pointer to the nearest neighbor indices. arma::Mat* neighborPtr; }; // class LSHSearch }; // namespace neighbor }; // namespace mlpack // Include implementation. #include "lsh_search_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/lsh/lsh_search_impl.hpp0000644000176200001440000003567513647512751024046 0ustar liggesusers/** * @file lsh_search_impl.hpp * @author Parikshit Ram * * Implementation of the LSHSearch class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_LSH_SEARCH_IMPL_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_LSH_SEARCH_IMPL_HPP #include namespace mlpack { namespace neighbor { // Construct the object. template LSHSearch:: LSHSearch(const arma::mat& referenceSet, const arma::mat& querySet, const size_t numProj, const size_t numTables, const double hashWidthIn, const size_t secondHashSize, const size_t bucketSize) : referenceSet(referenceSet), querySet(querySet), numProj(numProj), numTables(numTables), hashWidth(hashWidthIn), secondHashSize(secondHashSize), bucketSize(bucketSize) { if (hashWidth == 0.0) // The user has not provided any value. { // Compute a heuristic hash width from the data. for (size_t i = 0; i < 25; i++) { size_t p1 = (size_t) math::RandInt(referenceSet.n_cols); size_t p2 = (size_t) math::RandInt(referenceSet.n_cols); hashWidth += std::sqrt(metric.Evaluate(referenceSet.unsafe_col(p1), referenceSet.unsafe_col(p2))); } hashWidth /= 25; } Rcpp::Rcout << "Hash width chosen as: " << hashWidth << std::endl; BuildHash(); } template LSHSearch:: LSHSearch(const arma::mat& referenceSet, const size_t numProj, const size_t numTables, const double hashWidthIn, const size_t secondHashSize, const size_t bucketSize) : referenceSet(referenceSet), querySet(referenceSet), numProj(numProj), numTables(numTables), hashWidth(hashWidthIn), secondHashSize(secondHashSize), bucketSize(bucketSize) { if (hashWidth == 0.0) // The user has not provided any value. { // Compute a heuristic hash width from the data. 
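    // Average the distances between 25 randomly chosen pairs of reference
    // points; this gives a rough upper bound on the nearest-neighbor scale.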
for (size_t i = 0; i < 25; i++) { size_t p1 = (size_t) math::RandInt(referenceSet.n_cols); size_t p2 = (size_t) math::RandInt(referenceSet.n_cols); hashWidth += std::sqrt(metric.Evaluate(referenceSet.unsafe_col(p1), referenceSet.unsafe_col(p2))); } hashWidth /= 25; } Rcpp::Rcout << "Hash width chosen as: " << hashWidth << std::endl; BuildHash(); } template void LSHSearch:: InsertNeighbor(const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance) { // We only memmove() if there is actually a need to shift something. if (pos < (distancePtr->n_rows - 1)) { int len = (distancePtr->n_rows - 1) - pos; memmove(distancePtr->colptr(queryIndex) + (pos + 1), distancePtr->colptr(queryIndex) + pos, sizeof(double) * len); memmove(neighborPtr->colptr(queryIndex) + (pos + 1), neighborPtr->colptr(queryIndex) + pos, sizeof(size_t) * len); } // Now put the new information in the right index. (*distancePtr)(pos, queryIndex) = distance; (*neighborPtr)(pos, queryIndex) = neighbor; } template inline force_inline double LSHSearch:: BaseCase(const size_t queryIndex, const size_t referenceIndex) { // If the datasets are the same, then this search is only using one dataset // and we should not return identical points. if ((&querySet == &referenceSet) && (queryIndex == referenceIndex)) return 0.0; double distance = metric.Evaluate(querySet.unsafe_col(queryIndex), referenceSet.unsafe_col(referenceIndex)); // If this distance is better than any of the current candidates, the // SortDistance() function will give us the position to insert it into. arma::vec queryDist = distancePtr->unsafe_col(queryIndex); arma::Col queryIndices = neighborPtr->unsafe_col(queryIndex); size_t insertPosition = SortPolicy::SortDistance(queryDist, queryIndices, distance); // SortDistance() returns (size_t() - 1) if we shouldn't add it. if (insertPosition != (size_t() - 1)) InsertNeighbor(queryIndex, insertPosition, referenceIndex, distance); return distance; } template void LSHSearch:: ReturnIndicesFromTable(const size_t queryIndex, arma::uvec& referenceIndices, size_t numTablesToSearch) { // Decide on the number of tables to look into. if (numTablesToSearch == 0) // If no user input is given, search all. numTablesToSearch = numTables; // Sanity check to make sure that the existing number of tables is not // exceeded. if (numTablesToSearch > numTables) numTablesToSearch = numTables; // Hash the query in each of the 'numTablesToSearch' hash tables using the // 'numProj' projections for each table. This gives us 'numTablesToSearch' // keys for the query where each key is a 'numProj' dimensional integer // vector. // Compute the projection of the query in each table. arma::mat allProjInTables(numProj, numTablesToSearch); for (size_t i = 0; i < numTablesToSearch; i++) { allProjInTables.unsafe_col(i) = projections[i].t() * querySet.unsafe_col(queryIndex); } allProjInTables += offsets.cols(0, numTablesToSearch - 1); allProjInTables /= hashWidth; // Compute the hash value of each key of the query into a bucket of the // 'secondHashTable' using the 'secondHashWeights'. arma::rowvec hashVec = secondHashWeights.t() * arma::floor(allProjInTables); for (size_t i = 0; i < hashVec.n_elem; i++) hashVec[i] = (double) ((size_t) hashVec[i] % secondHashSize); //Log::Assert(hashVec.n_elem == numTablesToSearch); // For all the buckets that the query is hashed into, sequentially // collect the indices in those buckets. 
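  // refPointsConsidered counts, for each reference point, how many probed
  // buckets contain it; every point with a nonzero count becomes a candidate.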
arma::Col refPointsConsidered; refPointsConsidered.zeros(referenceSet.n_cols); for (size_t i = 0; i < hashVec.n_elem; i++) // For all tables. { size_t hashInd = (size_t) hashVec[i]; if (bucketContentSize[hashInd] > 0) { // Pick the indices in the bucket corresponding to 'hashInd'. size_t tableRow = bucketRowInHashTable[hashInd]; assert(tableRow < secondHashSize); assert(tableRow < secondHashTable.n_rows); for (size_t j = 0; j < bucketContentSize[hashInd]; j++) refPointsConsidered[secondHashTable(tableRow, j)]++; } } referenceIndices = arma::find(refPointsConsidered > 0); } template void LSHSearch:: Search(const size_t k, arma::Mat& resultingNeighbors, arma::mat& distances, const size_t numTablesToSearch) { neighborPtr = &resultingNeighbors; distancePtr = &distances; // Set the size of the neighbor and distance matrices. neighborPtr->set_size(k, querySet.n_cols); distancePtr->set_size(k, querySet.n_cols); distancePtr->fill(SortPolicy::WorstDistance()); neighborPtr->fill(referenceSet.n_cols); size_t avgIndicesReturned = 0; //Timer::Start("computing_neighbors"); // Go through every query point sequentially. for (size_t i = 0; i < querySet.n_cols; i++) { // Hash every query into every hash table and eventually into the // 'secondHashTable' to obtain the neighbor candidates. arma::uvec refIndices; ReturnIndicesFromTable(i, refIndices, numTablesToSearch); // An informative book-keeping for the number of neighbor candidates // returned on average. avgIndicesReturned += refIndices.n_elem; // Sequentially go through all the candidates and save the best 'k' // candidates. for (size_t j = 0; j < refIndices.n_elem; j++) BaseCase(i, (size_t) refIndices[j]); } //Timer::Stop("computing_neighbors"); avgIndicesReturned /= querySet.n_cols; Rcpp::Rcout << avgIndicesReturned << " distinct indices returned on average." << std::endl; } template void LSHSearch:: BuildHash() { // The first level hash for a single table outputs a 'numProj'-dimensional // integer key for each point in the set -- (key, pointID) // The key creation details are presented below // // The second level hash is performed by hashing the key to // an integer in the range [0, 'secondHashSize'). // // This is done by creating a weight vector 'secondHashWeights' of // length 'numProj' with each entry an integer randomly chosen // between [0, 'secondHashSize'). // // Then the bucket for any key and its corresponding point is // given by % 'secondHashSize' // and the corresponding point ID is put into that bucket. // Step I: Prepare the second level hash. // Obtain the weights for the second hash. secondHashWeights = arma::floor(arma::randu(numProj) * (double) secondHashSize); // The 'secondHashTable' is initially an empty matrix of size // ('secondHashSize' x 'bucketSize'). But by only filling the buckets // as points land in them allows us to shrink the size of the // 'secondHashTable' at the end of the hashing. // Fill the second hash table n = referenceSet.n_cols. This is because no // point has index 'n' so the presence of this in the bucket denotes that // there are no more points in this bucket. secondHashTable.set_size(secondHashSize, bucketSize); secondHashTable.fill(referenceSet.n_cols); // Keep track of the size of each bucket in the hash. At the end of hashing // most buckets will be empty. bucketContentSize.zeros(secondHashSize); // Instead of putting the points in the row corresponding to the bucket, we // chose the next empty row and keep track of the row in which the bucket // lies. 
This allows us to stack together and slice out the empty buckets at // the end of the hashing. bucketRowInHashTable.set_size(secondHashSize); bucketRowInHashTable.fill(secondHashSize); // Keep track of number of non-empty rows in the 'secondHashTable'. size_t numRowsInTable = 0; // Step II: The offsets for all projections in all tables. // Since the 'offsets' are in [0, hashWidth], we obtain the 'offsets' // as randu(numProj, numTables) * hashWidth. offsets.randu(numProj, numTables); offsets *= hashWidth; // Step III: Create each hash table in the first level hash one by one and // putting them directly into the 'secondHashTable' for memory efficiency. for (size_t i = 0; i < numTables; i++) { // Step IV: Obtain the 'numProj' projections for each table. // For L2 metric, 2-stable distributions are used, and // the normal Z ~ N(0, 1) is a 2-stable distribution. arma::mat projMat; projMat.randn(referenceSet.n_rows, numProj); // Save the projection matrix for querying. projections.push_back(projMat); // Step V: create the 'numProj'-dimensional key for each point in each // table. // The following code performs the task of hashing each point to a // 'numProj'-dimensional integer key. Hence you get a ('numProj' x // 'referenceSet.n_cols') key matrix. // // For a single table, let the 'numProj' projections be denoted by 'proj_i' // and the corresponding offset be 'offset_i'. Then the key of a single // point is obtained as: // key = { floor( ( + offset_i) / 'hashWidth' ) forall i } arma::mat offsetMat = arma::repmat(offsets.unsafe_col(i), 1, referenceSet.n_cols); arma::mat hashMat = projMat.t() * referenceSet; hashMat += offsetMat; hashMat /= hashWidth; // Step VI: Putting the points in the 'secondHashTable' by hashing the key. // Now we hash every key, point ID to its corresponding bucket. arma::rowvec secondHashVec = secondHashWeights.t() * arma::floor(hashMat); // This gives us the bucket for the corresponding point ID. for (size_t j = 0; j < secondHashVec.n_elem; j++) secondHashVec[j] = (double)((size_t) secondHashVec[j] % secondHashSize); //Log::Assert(secondHashVec.n_elem == referenceSet.n_cols); // Insert the point in the corresponding row to its bucket in the // 'secondHashTable'. for (size_t j = 0; j < secondHashVec.n_elem; j++) { // This is the bucket number. size_t hashInd = (size_t) secondHashVec[j]; // The point ID is 'j'. // If this is currently an empty bucket, start a new row keep track of // which row corresponds to the bucket. if (bucketContentSize[hashInd] == 0) { // Start a new row for hash. bucketRowInHashTable[hashInd] = numRowsInTable; secondHashTable(numRowsInTable, 0) = j; numRowsInTable++; } else { // If bucket is already present in the 'secondHashTable', find the // corresponding row and insert the point ID in this row unless the // bucket is full, in which case, do nothing. if (bucketContentSize[hashInd] < bucketSize) secondHashTable(bucketRowInHashTable[hashInd], bucketContentSize[hashInd]) = j; } // Increment the count of the points in this bucket. if (bucketContentSize[hashInd] < bucketSize) bucketContentSize[hashInd]++; } // Loop over all points in the reference set. } // Loop over tables. // Step VII: Condensing the 'secondHashTable'. 
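// As a hypothetical illustration of the condensing step: if
// secondHashSize = 99901 and bucketSize = 500, but only 4000 buckets were
// ever filled and the fullest of them holds 120 points, the resize below
// shrinks 'secondHashTable' from 99901 x 500 down to 4000 x 120.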
size_t maxBucketSize = 0; for (size_t i = 0; i < bucketContentSize.n_elem; i++) if (bucketContentSize[i] > maxBucketSize) maxBucketSize = bucketContentSize[i]; Rcpp::Rcout << "Final hash table size: (" << numRowsInTable << " x " << maxBucketSize << ")" << std::endl; secondHashTable.resize(numRowsInTable, maxBucketSize); } template std::string LSHSearch::ToString() const { std::ostringstream convert; convert << "LSHSearch [" << this << "]" << std::endl; convert << " Reference Set: " << referenceSet.n_rows << "x" ; convert << referenceSet.n_cols << std::endl; if (&referenceSet != &querySet) convert << " QuerySet: " << querySet.n_rows << "x" << querySet.n_cols << std::endl; convert << " Number of Projections: " << numProj << std::endl; convert << " Number of Tables: " << numTables << std::endl; convert << " Hash Width: " << hashWidth << std::endl; convert << " Metric: " << std::endl; convert << mlpack::util::Indent(metric.ToString(),2); return convert.str(); } }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/neighbor_search/0000755000176200001440000000000013647512751022515 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/neighbor_search/neighbor_search_rules.hpp0000644000176200001440000001476613647512751027600 0ustar liggesusers/** * @file neighbor_search_rules.hpp * @author Ryan Curtin * * Defines the pruning rules and base case rules necessary to perform a * tree-based search (with an arbitrary tree) for the NeighborSearch class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_RULES_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_RULES_HPP #include "ns_traversal_info.hpp" namespace mlpack { namespace neighbor { template class NeighborSearchRules { public: NeighborSearchRules(const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, arma::Mat& neighbors, arma::mat& distances, MetricType& metric); /** * Get the distance from the query point to the reference point. * This will update the "neighbor" matrix with the new point if appropriate * and will track the number of base cases (number of points evaluated). * * @param queryIndex Index of query point. * @param referenceIndex Index of reference point. */ double BaseCase(const size_t queryIndex, const size_t referenceIndex); /** * Get the score for recursion order. A low score indicates priority for * recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). * * @param queryIndex Index of query point. * @param referenceNode Candidate node to be recursed into. */ double Score(const size_t queryIndex, TreeType& referenceNode); /** * Re-evaluate the score for recursion order. A low score indicates priority * for recursion, while DBL_MAX indicates that the node should not be recursed * into at all (it should be pruned). 
This is used when the score has already
   * been calculated, but another recursion may have modified the bounds for
   * pruning.  So the old score is checked against the new pruning bound.
   *
   * @param queryIndex Index of query point.
   * @param referenceNode Candidate node to be recursed into.
   * @param oldScore Old score produced by Score() (or Rescore()).
   */
  double Rescore(const size_t queryIndex,
                 TreeType& referenceNode,
                 const double oldScore) const;

  /**
   * Get the score for recursion order.  A low score indicates priority for
   * recursion, while DBL_MAX indicates that the node should not be recursed
   * into at all (it should be pruned).
   *
   * @param queryNode Candidate query node to recurse into.
   * @param referenceNode Candidate reference node to recurse into.
   */
  double Score(TreeType& queryNode, TreeType& referenceNode);

  /**
   * Re-evaluate the score for recursion order.  A low score indicates priority
   * for recursion, while DBL_MAX indicates that the node should not be recursed
   * into at all (it should be pruned).  This is used when the score has already
   * been calculated, but another recursion may have modified the bounds for
   * pruning.  So the old score is checked against the new pruning bound.
   *
   * @param queryNode Candidate query node to recurse into.
   * @param referenceNode Candidate reference node to recurse into.
   * @param oldScore Old score produced by Score() (or Rescore()).
   */
  double Rescore(TreeType& queryNode,
                 TreeType& referenceNode,
                 const double oldScore) const;

  //! Get the number of base cases that have been performed.
  size_t BaseCases() const { return baseCases; }
  //! Modify the number of base cases that have been performed.
  size_t& BaseCases() { return baseCases; }

  //! Get the number of scores that have been performed.
  size_t Scores() const { return scores; }
  //! Modify the number of scores that have been performed.
  size_t& Scores() { return scores; }

  //! Convenience typedef.
  typedef NeighborSearchTraversalInfo<TreeType> TraversalInfoType;

  //! Get the traversal info.
  const TraversalInfoType& TraversalInfo() const { return traversalInfo; }
  //! Modify the traversal info.
  TraversalInfoType& TraversalInfo() { return traversalInfo; }

 private:
  //! The reference set.
  const typename TreeType::Mat& referenceSet;
  //! The query set.
  const typename TreeType::Mat& querySet;

  //! The matrix the resultant neighbor indices should be stored in.
  arma::Mat<size_t>& neighbors;
  //! The matrix the resultant neighbor distances should be stored in.
  arma::mat& distances;

  //! The instantiated metric.
  MetricType& metric;

  //! The last query point BaseCase() was called with.
  size_t lastQueryIndex;
  //! The last reference point BaseCase() was called with.
  size_t lastReferenceIndex;
  //! The last base case result.
  double lastBaseCase;

  //! The number of base cases that have been performed.
  size_t baseCases;
  //! The number of scores that have been performed.
  size_t scores;

  //! Traversal info for the parent combination; this is updated by the
  //! traversal before each call to Score().
  TraversalInfoType traversalInfo;

  /**
   * Recalculate the bound for a given query node.
   */
  double CalculateBound(TreeType& queryNode) const;

  /**
   * Insert a point into the neighbors and distances matrices; this is a helper
   * function.
   *
   * @param queryIndex Index of point whose neighbors we are inserting into.
   * @param pos Position in list to insert into.
   * @param neighbor Index of reference point which is being inserted.
   * @param distance Distance from query point to reference point.
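   *
   * As a hypothetical illustration: with k = 3 candidates at distances
   * (1.0, 2.5, 4.0) for some query point, inserting a new neighbor at
   * distance 2.0 with pos = 1 shifts 2.5 down one row via memmove() (4.0
   * falls off the end of the list), leaving (1.0, 2.0, 2.5).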
*/ void InsertNeighbor(const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance); }; }; // namespace neighbor }; // namespace mlpack // Include implementation. #include "neighbor_search_rules_impl.hpp" #endif // __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_RULES_HPP RcppMLPACK/inst/include/mlpack/methods/neighbor_search/neighbor_search_impl.hpp0000644000176200001440000003115513647512751027376 0ustar liggesusers/** * @file neighbor_search_impl.hpp * @author Ryan Curtin * * Implementation of Neighbor-Search class to perform all-nearest-neighbors on * two specified data sets. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_IMPL_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_IMPL_HPP #include #include "neighbor_search_rules.hpp" namespace mlpack { namespace neighbor { //! Call the tree constructor that does mapping. template TreeType* BuildTree( typename TreeType::Mat& dataset, std::vector& oldFromNew, typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == true, TreeType* >::type = 0) { return new TreeType(dataset, oldFromNew); } //! Call the tree constructor that does not do mapping. template TreeType* BuildTree( const typename TreeType::Mat& dataset, const std::vector& /* oldFromNew */, const typename boost::enable_if_c< tree::TreeTraits::RearrangesDataset == false, TreeType* >::type = 0) { return new TreeType(dataset); } // Construct the object. template NeighborSearch:: NeighborSearch(const typename TreeType::Mat& referenceSetIn, const typename TreeType::Mat& querySetIn, const bool naive, const bool singleMode, const MetricType metric) : referenceSet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), querySet(tree::TreeTraits::RearrangesDataset ? queryCopy : querySetIn), referenceTree(NULL), queryTree(NULL), treeOwner(!naive), // False if a tree was passed. If naive, then no trees. hasQuerySet(true), naive(naive), singleMode(!naive && singleMode), // No single mode if naive. metric(metric) { // C++11 will allow us to call out to other constructors so we can avoid this // copy/paste problem. // We'll time tree building, but only if we are building trees. //Timer::Start("tree_building"); // Copy the datasets, if they will be modified during tree building. if (tree::TreeTraits::RearrangesDataset) { referenceCopy = referenceSetIn; queryCopy = querySetIn; } // If not in naive mode, then we need to build trees. if (!naive) { // The const_cast is safe; if RearrangesDataset == false, then it'll be // casted back to const anyway, and if not, referenceSet points to // referenceCopy, which isn't const. referenceTree = BuildTree( const_cast(referenceSet), oldFromNewReferences); if (!singleMode) queryTree = BuildTree( const_cast(querySet), oldFromNewQueries); } // Stop the timer we started above (if we need to). 
//Timer::Stop("tree_building"); } // Construct the object. template NeighborSearch:: NeighborSearch(const typename TreeType::Mat& referenceSetIn, const bool naive, const bool singleMode, const MetricType metric) : referenceSet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), querySet(tree::TreeTraits::RearrangesDataset ? referenceCopy : referenceSetIn), referenceTree(NULL), queryTree(NULL), treeOwner(!naive), // If naive, then we are not building any trees. hasQuerySet(false), naive(naive), singleMode(!naive && singleMode), // No single mode if naive. metric(metric) { // We'll time tree building, but only if we are building trees. //Timer::Start("tree_building"); // Copy the dataset, if it will be modified during tree building. if (tree::TreeTraits::RearrangesDataset) referenceCopy = referenceSetIn; // If not in naive mode, then we may need to construct trees. if (!naive) { // The const_cast is safe; if RearrangesDataset == false, then it'll be // casted back to const anyway, and if not, referenceSet points to // referenceCopy, which isn't const. referenceTree = BuildTree( const_cast(referenceSet), oldFromNewReferences); if (!singleMode) queryTree = new TreeType(*referenceTree); } // Stop the timer we started above. //Timer::Stop("tree_building"); } // Construct the object. template NeighborSearch::NeighborSearch( TreeType* referenceTree, TreeType* queryTree, const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool singleMode, const MetricType metric) : referenceSet(referenceSet), querySet(querySet), referenceTree(referenceTree), queryTree(queryTree), treeOwner(false), hasQuerySet(true), naive(false), singleMode(singleMode), metric(metric) { // Nothing else to initialize. } // Construct the object. template NeighborSearch::NeighborSearch( TreeType* referenceTree, const typename TreeType::Mat& referenceSet, const bool singleMode, const MetricType metric) : referenceSet(referenceSet), querySet(referenceSet), referenceTree(referenceTree), queryTree(NULL), treeOwner(false), hasQuerySet(false), // In this case we will own a tree, if singleMode. naive(false), singleMode(singleMode), metric(metric) { //Timer::Start("tree_building"); // The query tree cannot be the same as the reference tree. if (referenceTree && !singleMode) queryTree = new TreeType(*referenceTree); //Timer::Stop("tree_building"); } /** * The tree is the only member we may be responsible for deleting. The others * will take care of themselves. */ template NeighborSearch::~NeighborSearch() { if (treeOwner) { if (referenceTree) delete referenceTree; if (queryTree) delete queryTree; } else if (!treeOwner && !hasQuerySet && !(singleMode || naive)) { // We replicated the reference tree to create a query tree. delete queryTree; } } /** * Computes the best neighbors and stores them in resultingNeighbors and * distances. */ template void NeighborSearch::Search( const size_t k, arma::Mat& resultingNeighbors, arma::mat& distances) { //Timer::Start("computing_neighbors"); // If we have built the trees ourselves, then we will have to map all the // indices back to their original indices when this computation is finished. // To avoid an extra copy, we will store the neighbors and distances in a // separate matrix. arma::Mat* neighborPtr = &resultingNeighbors; arma::mat* distancePtr = &distances; // Mapping is only necessary if the tree rearranges points. 
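// Three cases are handled at the end of this function: a dual-tree search
// with a separate query set unmaps both the query columns and the neighbor
// indices; a single-tree search with a separate query set unmaps only the
// neighbor indices; and the monochromatic case uses oldFromNewReferences
// for both roles.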
if (tree::TreeTraits::RearrangesDataset) { if (treeOwner && !(singleMode && hasQuerySet)) distancePtr = new arma::mat; // Query indices need to be mapped. if (treeOwner) neighborPtr = new arma::Mat; // All indices need mapping. } // Set the size of the neighbor and distance matrices. neighborPtr->set_size(k, querySet.n_cols); neighborPtr->fill(size_t() - 1); distancePtr->set_size(k, querySet.n_cols); distancePtr->fill(SortPolicy::WorstDistance()); // Create the helper object for the tree traversal. typedef NeighborSearchRules RuleType; RuleType rules(referenceSet, querySet, *neighborPtr, *distancePtr, metric); if (naive) { // The naive brute-force traversal. for (size_t i = 0; i < querySet.n_cols; ++i) for (size_t j = 0; j < referenceSet.n_cols; ++j) rules.BaseCase(i, j); } else if (singleMode) { // The search doesn't work if the root node is also a leaf node. // if this is the case, it is suggested that you use the naive method. assert(!(referenceTree->IsLeaf())); // Create the traverser. typename TreeType::template SingleTreeTraverser traverser(rules); // Now have it traverse for each point. for (size_t i = 0; i < querySet.n_cols; ++i) traverser.Traverse(i, *referenceTree); Rcpp::Rcout << rules.Scores() << " node combinations were scored.\n"; Rcpp::Rcout << rules.BaseCases() << " base cases were calculated.\n"; } else // Dual-tree recursion. { // Create the traverser. typename TreeType::template DualTreeTraverser traverser(rules); traverser.Traverse(*queryTree, *referenceTree); Rcpp::Rcout << rules.Scores() << " node combinations were scored.\n"; Rcpp::Rcout << rules.BaseCases() << " base cases were calculated.\n"; } //Timer::Stop("computing_neighbors"); // Now, do we need to do mapping of indices? if (!treeOwner || !tree::TreeTraits::RearrangesDataset) { // No mapping needed. We are done. return; } else if (treeOwner && hasQuerySet && !singleMode) // Map both sets. { // Set size of output matrices correctly. resultingNeighbors.set_size(k, querySet.n_cols); distances.set_size(k, querySet.n_cols); for (size_t i = 0; i < distances.n_cols; i++) { // Map distances (copy a column). distances.col(oldFromNewQueries[i]) = distancePtr->col(i); // Map indices of neighbors. for (size_t j = 0; j < distances.n_rows; j++) { resultingNeighbors(j, oldFromNewQueries[i]) = oldFromNewReferences[(*neighborPtr)(j, i)]; } } // Finished with temporary matrices. delete neighborPtr; delete distancePtr; } else if (treeOwner && !hasQuerySet) { resultingNeighbors.set_size(k, querySet.n_cols); distances.set_size(k, querySet.n_cols); for (size_t i = 0; i < distances.n_cols; i++) { // Map distances (copy a column). distances.col(oldFromNewReferences[i]) = distancePtr->col(i); // Map indices of neighbors. for (size_t j = 0; j < distances.n_rows; j++) { resultingNeighbors(j, oldFromNewReferences[i]) = oldFromNewReferences[(*neighborPtr)(j, i)]; } } // Finished with temporary matrices. delete neighborPtr; delete distancePtr; } else if (treeOwner && hasQuerySet && singleMode) // Map only references. { // Set size of neighbor indices matrix correctly. resultingNeighbors.set_size(k, querySet.n_cols); // Map indices of neighbors. for (size_t i = 0; i < resultingNeighbors.n_cols; i++) { for (size_t j = 0; j < resultingNeighbors.n_rows; j++) { resultingNeighbors(j, i) = oldFromNewReferences[(*neighborPtr)(j, i)]; } } // Finished with temporary matrix. delete neighborPtr; } } // Search //Return a String of the Object. 
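// A minimal usage sketch of Search() via the AllkNN typedef from
// typedef.hpp (the data and sizes here are hypothetical, and the
// constructor's default arguments are assumed):
//
//   arma::mat data = arma::randu<arma::mat>(3, 100); // 100 3-D points.
//   mlpack::neighbor::AllkNN allknn(data);           // Monochromatic case.
//   arma::Mat<size_t> neighbors;
//   arma::mat distances;
//   allknn.Search(5, neighbors, distances);          // 5 nearest neighbors.
//
// Afterwards neighbors(j, i) holds the index of the (j + 1)-th nearest
// reference point to point i, and distances(j, i) the corresponding
// distance.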
template<typename SortPolicy, typename MetricType, typename TreeType>
std::string NeighborSearch<SortPolicy, MetricType, TreeType>::ToString() const
{
  std::ostringstream convert;
  convert << "NearestNeighborSearch [" << this << "]" << std::endl;
  convert << " Reference Set: " << referenceSet.n_rows << "x";
  convert << referenceSet.n_cols << std::endl;
  if (&referenceSet != &querySet)
    convert << " QuerySet: " << querySet.n_rows << "x" << querySet.n_cols
        << std::endl;
  convert << " Reference Tree: " << referenceTree << std::endl;
  if (referenceTree != queryTree)
    convert << " QueryTree: " << queryTree << std::endl;
  convert << " Tree Owner: " << treeOwner << std::endl;
  convert << " Naive: " << naive << std::endl;
  convert << " Metric: " << std::endl;
  convert << mlpack::util::Indent(metric.ToString(), 2);
  return convert.str();
}

}; // namespace neighbor
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/methods/neighbor_search/sort_policies/0000755000176200001440000000000013647512751025373 5ustar liggesusersRcppMLPACK/inst/include/mlpack/methods/neighbor_search/sort_policies/nearest_neighbor_sort_impl.hpp0000644000176200001440000000517113647512751033516 0ustar liggesusers/**
 * @file nearest_neighbor_sort_impl.hpp
 * @author Ryan Curtin
 *
 * Implementation of templated methods for the NearestNeighborSort SortPolicy
 * class for the NeighborSearch class.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_NEIGHBOR_NEAREST_NEIGHBOR_SORT_IMPL_HPP
#define __MLPACK_NEIGHBOR_NEAREST_NEIGHBOR_SORT_IMPL_HPP

namespace mlpack {
namespace neighbor {

template<typename TreeType>
inline double NearestNeighborSort::BestNodeToNodeDistance(
    const TreeType* queryNode,
    const TreeType* referenceNode)
{
  // This is not implemented yet for the general case because the trees do not
  // accept arbitrary distance metrics.
  return queryNode->MinDistance(referenceNode);
}

template<typename TreeType>
inline double NearestNeighborSort::BestNodeToNodeDistance(
    const TreeType* queryNode,
    const TreeType* referenceNode,
    const double centerToCenterDistance)
{
  return queryNode->MinDistance(referenceNode, centerToCenterDistance);
}

template<typename TreeType>
inline double NearestNeighborSort::BestNodeToNodeDistance(
    const TreeType* queryNode,
    const TreeType* /* referenceNode */,
    const TreeType* referenceChildNode,
    const double centerToCenterDistance)
{
  return queryNode->MinDistance(referenceChildNode, centerToCenterDistance) -
      referenceChildNode->ParentDistance();
}

template<typename VecType, typename TreeType>
inline double NearestNeighborSort::BestPointToNodeDistance(
    const VecType& point,
    const TreeType* referenceNode)
{
  // This is not implemented yet for the general case because the trees do not
  // accept arbitrary distance metrics.
return referenceNode->MinDistance(point); } template inline double NearestNeighborSort::BestPointToNodeDistance( const VecType& point, const TreeType* referenceNode, const double pointToCenterDistance) { return referenceNode->MinDistance(point, pointToCenterDistance); } }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/neighbor_search/sort_policies/nearest_neighbor_sort.hpp0000644000176200001440000001456213647512751032501 0ustar liggesusers/** * @file nearest_neighbor_sort.hpp * @author Ryan Curtin * * Implementation of the SortPolicy class for NeighborSearch; in this case, the * nearest neighbors are those that are most important. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_NEAREST_NEIGHBOR_SORT_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_NEAREST_NEIGHBOR_SORT_HPP #include namespace mlpack { namespace neighbor { /** * This class implements the necessary methods for the SortPolicy template * parameter of the NeighborSearch class. The sorting policy here is that the * minimum distance is the best (so, when used with NeighborSearch, the output * is nearest neighbors). * * This class is also meant to serve as a guide to implement a custom * SortPolicy. All of the methods implemented here must be implemented by any * other SortPolicy classes. */ class NearestNeighborSort { public: /** * Return the index in the vector where the new distance should be inserted, * or (size_t() - 1) if it should not be inserted (i.e. if it is not any * better than any of the existing points in the list). The list should be * sorted such that the best point is the first in the list. The actual * insertion is not performed. * * @param list Vector of existing distance points, sorted such that the best * point is first in the list. * @param new_distance Distance to try to insert * * @return size_t containing the position to insert into, or (size_t() - 1) * if the new distance should not be inserted. */ static size_t SortDistance(const arma::vec& list, const arma::Col& indices, double newDistance); /** * Return whether or not value is "better" than ref. In this case, that means * that the value is less than the reference. * * @param value Value to compare * @param ref Value to compare with * * @return bool indicating whether or not (value < ref). */ static inline bool IsBetter(const double value, const double ref) { return (value < ref); } /** * Return the best possible distance between two nodes. In our case, this is * the minimum distance between the two tree nodes using the given distance * function. */ template static double BestNodeToNodeDistance(const TreeType* queryNode, const TreeType* referenceNode); /** * Return the best possible distance between two nodes, given that the * distance between the centers of the two nodes has already been calculated. 
* This is used in conjunction with trees that have self-children (like cover * trees). */ template static double BestNodeToNodeDistance(const TreeType* queryNode, const TreeType* referenceNode, const double centerToCenterDistance); /** * Return the best possible distance between the query node and the reference * child node given the base case distance between the query node and the * reference node. TreeType::ParentDistance() must be implemented to use * this. * * @param queryNode Query node. * @param referenceNode Reference node. * @param referenceChildNode Child of reference node which is being inspected. * @param centerToCenterDistance Distance between centers of query node and * reference node. */ template static double BestNodeToNodeDistance(const TreeType* queryNode, const TreeType* referenceNode, const TreeType* referenceChildNode, const double centerToCenterDistance); /** * Return the best possible distance between a node and a point. In our case, * this is the minimum distance between the tree node and the point using the * given distance function. */ template static double BestPointToNodeDistance(const VecType& queryPoint, const TreeType* referenceNode); /** * Return the best possible distance between a point and a node, given that * the distance between the point and the center of the node has already been * calculated. This is used in conjunction with trees that have * self-children (like cover trees). */ template static double BestPointToNodeDistance(const VecType& queryPoint, const TreeType* referenceNode, const double pointToCenterDistance); /** * Return what should represent the worst possible distance with this * particular sort policy. In our case, this should be the maximum possible * distance, DBL_MAX. * * @return DBL_MAX */ static inline double WorstDistance() { return DBL_MAX; } /** * Return what should represent the best possible distance with this * particular sort policy. In our case, this should be the minimum possible * distance, 0.0. * * @return 0.0 */ static inline double BestDistance() { return 0.0; } /** * Return the best combination of the two distances. */ static inline double CombineBest(const double a, const double b) { return std::max(a - b, 0.0); } /** * Return the worst combination of the two distances. */ static inline double CombineWorst(const double a, const double b) { if (a == DBL_MAX || b == DBL_MAX) return DBL_MAX; return a + b; } }; }; // namespace neighbor }; // namespace mlpack // Include implementation of templated functions. #include "nearest_neighbor_sort_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/neighbor_search/sort_policies/furthest_neighbor_sort_impl.hpp0000644000176200001440000000522613647512751033722 0ustar liggesusers/*** * @file furthest_neighbor_sort_impl.hpp * @author Ryan Curtin * * Implementation of templated methods for the FurthestNeighborSort SortPolicy * class for the NeighborSearch class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). 
* * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_FURTHEST_NEIGHBOR_SORT_IMPL_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_FURTHEST_NEIGHBOR_SORT_IMPL_HPP namespace mlpack { namespace neighbor { template inline double FurthestNeighborSort::BestNodeToNodeDistance( const TreeType* queryNode, const TreeType* referenceNode) { // This is not implemented yet for the general case because the trees do not // accept arbitrary distance metrics. return queryNode->MaxDistance(referenceNode); } template inline double FurthestNeighborSort::BestNodeToNodeDistance( const TreeType* queryNode, const TreeType* referenceNode, const double centerToCenterDistance) { return queryNode->MaxDistance(referenceNode, centerToCenterDistance); } template inline double FurthestNeighborSort::BestNodeToNodeDistance( const TreeType* queryNode, const TreeType* referenceNode, const TreeType* referenceChildNode, const double centerToCenterDistance) { return queryNode->MaxDistance(referenceNode, centerToCenterDistance) + referenceChildNode->ParentDistance(); } template inline double FurthestNeighborSort::BestPointToNodeDistance( const VecType& point, const TreeType* referenceNode) { // This is not implemented yet for the general case because the trees do not // accept arbitrary distance metrics. return referenceNode->MaxDistance(point); } template inline double FurthestNeighborSort::BestPointToNodeDistance( const VecType& point, const TreeType* referenceNode, const double pointToCenterDistance) { return referenceNode->MaxDistance(point, pointToCenterDistance); } }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/neighbor_search/sort_policies/furthest_neighbor_sort.hpp0000644000176200001440000001427713647512751032707 0ustar liggesusers/** * @file furthest_neighbor_sort.hpp * @author Ryan Curtin * * Implementation of the SortPolicy class for NeighborSearch; in this case, the * furthest neighbors are those that are most important. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_FURTHEST_NEIGHBOR_SORT_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_FURTHEST_NEIGHBOR_SORT_HPP #include namespace mlpack { namespace neighbor { /** * This class implements the necessary methods for the SortPolicy template * parameter of the NeighborSearch class. The sorting policy here is that the * minimum distance is the best (so, when used with NeighborSearch, the output * is furthest neighbors). */ class FurthestNeighborSort { public: /** * Return the index in the vector where the new distance should be inserted, * or size_t() - 1 if it should not be inserted (i.e. if it is not any better * than any of the existing points in the list). The list should be sorted * such that the best point is the first in the list. The actual insertion is * not performed. 
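 *
 * As a hypothetical example for this furthest-neighbor policy, with
 * list = (9.0, 7.5, 2.1) (best, i.e. largest, first), newDistance = 8.0
 * sorts into position 1, while newDistance = 1.0 improves on no candidate
 * and yields (size_t() - 1).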
* * @param list Vector of existing distance points, sorted such that the best * point is the first in the list. * @param new_distance Distance to try to insert. * * @return size_t containing the position to insert into, or (size_t() - 1) * if the new distance should not be inserted. */ static size_t SortDistance(const arma::vec& list, const arma::Col& indices, double newDistance); /** * Return whether or not value is "better" than ref. In this case, that means * that the value is greater than the reference. * * @param value Value to compare * @param ref Value to compare with * * @return bool indicating whether or not (value > ref). */ static inline bool IsBetter(const double value, const double ref) { return (value > ref); } /** * Return the best possible distance between two nodes. In our case, this is * the maximum distance between the two tree nodes using the given distance * function. */ template static double BestNodeToNodeDistance(const TreeType* queryNode, const TreeType* referenceNode); /** * Return the best possible distance between two nodes, given that the * distance between the centers of the two nodes has already been calculated. * This is used in conjunction with trees that have self-children (like cover * trees). */ template static double BestNodeToNodeDistance(const TreeType* queryNode, const TreeType* referenceNode, const double centerToCenterDistance); /** * Return the best possible distance between the query node and the reference * child node given the base case distance between the query node and the * reference node. TreeType::ParentDistance() must be implemented to use * this. * * @param queryNode Query node. * @param referenceNode Reference node. * @param referenceChildNode Child of reference node which is being inspected. * @param centerToCenterDistance Distance between centers of query node and * reference node. */ template static double BestNodeToNodeDistance(const TreeType* queryNode, const TreeType* referenceNode, const TreeType* referenceChildNode, const double centerToCenterDistance); /** * Return the best possible distance between a node and a point. In our case, * this is the maximum distance between the tree node and the point using the * given distance function. */ template static double BestPointToNodeDistance(const VecType& queryPoint, const TreeType* referenceNode); /** * Return the best possible distance between a point and a node, given that * the distance between the point and the center of the node has already been * calculated. This is used in conjunction with trees that have * self-children (like cover trees). */ template static double BestPointToNodeDistance(const VecType& queryPoint, const TreeType* referenceNode, const double pointToCenterDistance); /** * Return what should represent the worst possible distance with this * particular sort policy. In our case, this should be the minimum possible * distance, 0. * * @return 0 */ static inline double WorstDistance() { return 0; } /** * Return what should represent the best possible distance with this * particular sort policy. In our case, this should be the maximum possible * distance, DBL_MAX. * * @return DBL_MAX */ static inline double BestDistance() { return DBL_MAX; } /** * Return the best combination of the two distances. */ static inline double CombineBest(const double a, const double b) { if (a == DBL_MAX || b == DBL_MAX) return DBL_MAX; return a + b; } /** * Return the worst combination of the two distances. 
 */
  static inline double CombineWorst(const double a, const double b)
  {
    return std::max(a - b, 0.0);
  }
};

}; // namespace neighbor
}; // namespace mlpack

// Include implementation of templated functions.
#include "furthest_neighbor_sort_impl.hpp"

#endif
RcppMLPACK/inst/include/mlpack/methods/neighbor_search/typedef.hpp0000644000176200001440000000345513647512751024675 0ustar liggesusers/**
 * @file typedef.hpp
 * @author Ryan Curtin
 *
 * Simple typedefs describing template instantiations of the NeighborSearch
 * class which are commonly used.  This is meant to be included by
 * neighbor_search.hpp but is a separate file for simplicity.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_NEIGHBOR_SEARCH_TYPEDEF_H
#define __MLPACK_NEIGHBOR_SEARCH_TYPEDEF_H

// In case someone included this directly.
#include "neighbor_search.hpp"

#include <mlpack/core.hpp>

#include "sort_policies/nearest_neighbor_sort.hpp"
#include "sort_policies/furthest_neighbor_sort.hpp"

namespace mlpack {
namespace neighbor {

/**
 * The AllkNN class is the all-k-nearest-neighbors method.  It returns L2
 * distances (Euclidean distances) for each of the k nearest neighbors.
 */
typedef NeighborSearch<NearestNeighborSort, metric::EuclideanDistance> AllkNN;

/**
 * The AllkFN class is the all-k-furthest-neighbors method.  It returns L2
 * distances (Euclidean distances) for each of the k furthest neighbors.
 */
typedef NeighborSearch<FurthestNeighborSort, metric::EuclideanDistance> AllkFN;

}; // namespace neighbor
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/methods/neighbor_search/neighbor_search_stat.hpp0000644000176200001440000000706513647512751027413 0ustar liggesusers/**
 * @file neighbor_search_stat.hpp
 * @author Ryan Curtin
 *
 * Defines the NeighborSearchStat class, which holds the per-node statistic
 * used by the NeighborSearch class during tree-based search.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_STAT_HPP
#define __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_STAT_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace neighbor {

/**
 * Extra data for each node in the tree.  For neighbor searches, each node only
 * needs to store a bound on neighbor distances.
 */
template<typename SortPolicy>
class NeighborSearchStat
{
 private:
  //! The first bound on the node's neighbor distances (B_1).  This represents
  //! the worst candidate distance of any descendants of this node.
  double firstBound;
  //!
The second bound on the node's neighbor distances (B_2). This represents //! a bound on the worst distance of any descendants of this node assembled //! using the best descendant candidate distance modified by the furthest //! descendant distance. double secondBound; //! The better of the two bounds. double bound; //! The last distance evaluation node. void* lastDistanceNode; //! The last distance evaluation. double lastDistance; public: /** * Initialize the statistic with the worst possible distance according to * our sorting policy. */ NeighborSearchStat() : firstBound(SortPolicy::WorstDistance()), secondBound(SortPolicy::WorstDistance()), bound(SortPolicy::WorstDistance()), lastDistanceNode(NULL), lastDistance(0.0) { } /** * Initialization for a fully initialized node. In this case, we don't need * to worry about the node. */ template NeighborSearchStat(TreeType& /* node */) : firstBound(SortPolicy::WorstDistance()), secondBound(SortPolicy::WorstDistance()), bound(SortPolicy::WorstDistance()), lastDistanceNode(NULL), lastDistance(0.0) { } //! Get the first bound. double FirstBound() const { return firstBound; } //! Modify the first bound. double& FirstBound() { return firstBound; } //! Get the second bound. double SecondBound() const { return secondBound; } //! Modify the second bound. double& SecondBound() { return secondBound; } //! Get the overall bound (the better of the two bounds). double Bound() const { return bound; } //! Modify the overall bound (it should be the better of the two bounds). double& Bound() { return bound; } //! Get the last distance evaluation node. void* LastDistanceNode() const { return lastDistanceNode; } //! Modify the last distance evaluation node. void*& LastDistanceNode() { return lastDistanceNode; } //! Get the last distance calculation. double LastDistance() const { return lastDistance; } //! Modify the last distance calculation. double& LastDistance() { return lastDistance; } }; }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/neighbor_search/neighbor_search_rules_impl.hpp0000644000176200001440000004272213647512751030612 0ustar liggesusers/** * @file nearest_neighbor_rules_impl.hpp * @author Ryan Curtin * * Implementation of NearestNeighborRules. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_NEAREST_NEIGHBOR_RULES_IMPL_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_NEAREST_NEIGHBOR_RULES_IMPL_HPP // In case it hasn't been included yet. 
#include "neighbor_search_rules.hpp" namespace mlpack { namespace neighbor { template NeighborSearchRules::NeighborSearchRules( const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, arma::Mat& neighbors, arma::mat& distances, MetricType& metric) : referenceSet(referenceSet), querySet(querySet), neighbors(neighbors), distances(distances), metric(metric), lastQueryIndex(querySet.n_cols), lastReferenceIndex(referenceSet.n_cols), baseCases(0), scores(0) { // We must set the traversal info last query and reference node pointers to // something that is both invalid (i.e. not a tree node) and not NULL. We'll // use the this pointer. traversalInfo.LastQueryNode() = (TreeType*) this; traversalInfo.LastReferenceNode() = (TreeType*) this; } template inline force_inline // Absolutely MUST be inline so optimizations can happen. double NeighborSearchRules:: BaseCase(const size_t queryIndex, const size_t referenceIndex) { // If the datasets are the same, then this search is only using one dataset // and we should not return identical points. if ((&querySet == &referenceSet) && (queryIndex == referenceIndex)) return 0.0; // If we have already performed this base case, then do not perform it again. if ((lastQueryIndex == queryIndex) && (lastReferenceIndex == referenceIndex)) return lastBaseCase; double distance = metric.Evaluate(querySet.col(queryIndex), referenceSet.col(referenceIndex)); ++baseCases; // If this distance is better than any of the current candidates, the // SortDistance() function will give us the position to insert it into. arma::vec queryDist = distances.unsafe_col(queryIndex); arma::Col queryIndices = neighbors.unsafe_col(queryIndex); const size_t insertPosition = SortPolicy::SortDistance(queryDist, queryIndices, distance); // SortDistance() returns (size_t() - 1) if we shouldn't add it. if (insertPosition != (size_t() - 1)) InsertNeighbor(queryIndex, insertPosition, referenceIndex, distance); // Cache this information for the next time BaseCase() is called. lastQueryIndex = queryIndex; lastReferenceIndex = referenceIndex; lastBaseCase = distance; return distance; } template inline double NeighborSearchRules::Score( const size_t queryIndex, TreeType& referenceNode) { ++scores; // Count number of Score() calls. double distance; if (tree::TreeTraits::FirstPointIsCentroid) { // The first point in the tree is the centroid. So we can then calculate // the base case between that and the query point. double baseCase = -1.0; if (tree::TreeTraits::HasSelfChildren) { // If the parent node is the same, then we have already calculated the // base case. if ((referenceNode.Parent() != NULL) && (referenceNode.Point(0) == referenceNode.Parent()->Point(0))) baseCase = referenceNode.Parent()->Stat().LastDistance(); else baseCase = BaseCase(queryIndex, referenceNode.Point(0)); // Save this evaluation. referenceNode.Stat().LastDistance() = baseCase; } distance = SortPolicy::CombineBest(baseCase, referenceNode.FurthestDescendantDistance()); } else { distance = SortPolicy::BestPointToNodeDistance(querySet.col(queryIndex), &referenceNode); } // Compare against the best k'th distance for this query point so far. const double bestDistance = distances(distances.n_rows - 1, queryIndex); return (SortPolicy::IsBetter(distance, bestDistance)) ? distance : DBL_MAX; } template inline double NeighborSearchRules::Rescore( const size_t queryIndex, TreeType& /* referenceNode */, const double oldScore) const { // If we are already pruning, still prune. 
if (oldScore == DBL_MAX) return oldScore; // Just check the score again against the distances. const double bestDistance = distances(distances.n_rows - 1, queryIndex); return (SortPolicy::IsBetter(oldScore, bestDistance)) ? oldScore : DBL_MAX; } template inline double NeighborSearchRules::Score( TreeType& queryNode, TreeType& referenceNode) { ++scores; // Count number of Score() calls. // Update our bound. const double bestDistance = CalculateBound(queryNode); // Use the traversal info to see if a parent-child or parent-parent prune is // possible. This is a looser bound than we could make, but it might be // sufficient. const double queryParentDist = queryNode.ParentDistance(); const double queryDescDist = queryNode.FurthestDescendantDistance(); const double refParentDist = referenceNode.ParentDistance(); const double refDescDist = referenceNode.FurthestDescendantDistance(); const double score = traversalInfo.LastScore(); double adjustedScore; // We want to set adjustedScore to be the distance between the centroid of the // last query node and last reference node. We will do this by adjusting the // last score. In some cases, we can just use the last base case. if (tree::TreeTraits::FirstPointIsCentroid) { adjustedScore = traversalInfo.LastBaseCase(); } else if (score == 0.0) // Nothing we can do here. { adjustedScore = 0.0; } else { // The last score is equal to the distance between the centroids minus the // radii of the query and reference bounds along the axis of the line // between the two centroids. In the best case, these radii are the // furthest descendant distances, but that is not always true. It would // take too long to calculate the exact radii, so we are forced to use // MinimumBoundDistance() as a lower-bound approximation. const double lastQueryDescDist = traversalInfo.LastQueryNode()->MinimumBoundDistance(); const double lastRefDescDist = traversalInfo.LastReferenceNode()->MinimumBoundDistance(); adjustedScore = SortPolicy::CombineWorst(score, lastQueryDescDist); adjustedScore = SortPolicy::CombineWorst(score, lastRefDescDist); } // Assemble an adjusted score. For nearest neighbor search, this adjusted // score is a lower bound on MinDistance(queryNode, referenceNode) that is // assembled without actually calculating MinDistance(). For furthest // neighbor search, it is an upper bound on // MaxDistance(queryNode, referenceNode). If the traversalInfo isn't usable // then the node should not be pruned by this. if (traversalInfo.LastQueryNode() == queryNode.Parent()) { const double queryAdjust = queryParentDist + queryDescDist; adjustedScore = SortPolicy::CombineBest(adjustedScore, queryAdjust); } else if (traversalInfo.LastQueryNode() == &queryNode) { adjustedScore = SortPolicy::CombineBest(adjustedScore, queryDescDist); } else { // The last query node wasn't this query node or its parent. So we force // the adjustedScore to be such that this combination can't be pruned here, // because we don't really know anything about it. // It would be possible to modify this section to try and make a prune based // on the query descendant distance and the distance between the query node // and last traversal query node, but this case doesn't actually happen for // kd-trees or cover trees. 
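// Using SortPolicy::BestDistance() here (0.0 for nearest-neighbor search)
// guarantees that IsBetter(bestDistance, adjustedScore) is false in the
// prune check below, so stale traversal information alone can never cause
// a prune.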
adjustedScore = SortPolicy::BestDistance(); } if (traversalInfo.LastReferenceNode() == referenceNode.Parent()) { const double refAdjust = refParentDist + refDescDist; adjustedScore = SortPolicy::CombineBest(adjustedScore, refAdjust); } else if (traversalInfo.LastReferenceNode() == &referenceNode) { adjustedScore = SortPolicy::CombineBest(adjustedScore, refDescDist); } else { // The last reference node wasn't this reference node or its parent. So we // force the adjustedScore to be such that this combination can't be pruned // here, because we don't really know anything about it. // It would be possible to modify this section to try and make a prune based // on the reference descendant distance and the distance between the // reference node and last traversal reference node, but this case doesn't // actually happen for kd-trees or cover trees. adjustedScore = SortPolicy::BestDistance(); } // Can we prune? if (SortPolicy::IsBetter(bestDistance, adjustedScore)) { if (!(tree::TreeTraits::FirstPointIsCentroid && score == 0.0)) { // There isn't any need to set the traversal information because no // descendant combinations will be visited, and those are the only // combinations that would depend on the traversal information. return DBL_MAX; } } double distance; if (tree::TreeTraits::FirstPointIsCentroid) { // The first point in the node is the centroid, so we can calculate the // distance between the two points using BaseCase() and then find the // bounds. This is potentially loose for non-ball bounds. double baseCase = -1.0; if (tree::TreeTraits::HasSelfChildren && (traversalInfo.LastQueryNode()->Point(0) == queryNode.Point(0)) && (traversalInfo.LastReferenceNode()->Point(0) == referenceNode.Point(0))) { // We already calculated it. baseCase = traversalInfo.LastBaseCase(); } else { baseCase = BaseCase(queryNode.Point(0), referenceNode.Point(0)); } distance = SortPolicy::CombineBest(baseCase, queryNode.FurthestDescendantDistance() + referenceNode.FurthestDescendantDistance()); lastQueryIndex = queryNode.Point(0); lastReferenceIndex = referenceNode.Point(0); lastBaseCase = baseCase; traversalInfo.LastBaseCase() = baseCase; } else { distance = SortPolicy::BestNodeToNodeDistance(&queryNode, &referenceNode); } if (SortPolicy::IsBetter(distance, bestDistance)) { // Set traversal information. traversalInfo.LastQueryNode() = &queryNode; traversalInfo.LastReferenceNode() = &referenceNode; traversalInfo.LastScore() = distance; return distance; } else { // There isn't any need to set the traversal information because no // descendant combinations will be visited, and those are the only // combinations that would depend on the traversal information. return DBL_MAX; } } template inline double NeighborSearchRules::Rescore( TreeType& queryNode, TreeType& /* referenceNode */, const double oldScore) const { if (oldScore == DBL_MAX) return oldScore; // Update our bound. const double bestDistance = CalculateBound(queryNode); return (SortPolicy::IsBetter(oldScore, bestDistance)) ? oldScore : DBL_MAX; } // Calculate the bound for a given query node in its current state and update // it. template inline double NeighborSearchRules:: CalculateBound(TreeType& queryNode) const { // This is an adapted form of the B(N_q) function in the paper // ``Tree-Independent Dual-Tree Algorithms'' by Curtin et. al.; the goal is to // place a bound on the worst possible distance a point combination could have // to improve any of the current neighbor estimates. 
If the best possible // distance between two nodes is greater than this bound, then the node // combination can be pruned (see Score()). // There are a couple ways we can assemble a bound. For simplicity, this is // described for nearest neighbor search (SortPolicy = NearestNeighborSort), // but the code that is written is adapted for whichever SortPolicy. // First, we can consider the current worst neighbor candidate distance of any // descendant point. This is assembled with 'worstDistance' by looping // through the points held by the query node, and then by taking the cached // worst distance from any child nodes (Stat().FirstBound()). This // corresponds roughly to B_1(N_q) in the paper. // The other way of bounding is to use the triangle inequality. To do this, // we find the current best kth-neighbor candidate distance of any descendant // query point, and use the triangle inequality to place a bound on the // distance that candidate would have to any other descendant query point. // This corresponds roughly to B_2(N_q) in the paper, and is the bounding // style for cover trees. // Then, to assemble the final bound, since both bounds are valid, we simply // take the better of the two. double worstDistance = SortPolicy::BestDistance(); double bestDistance = SortPolicy::WorstDistance(); // Loop over points held in the node. for (size_t i = 0; i < queryNode.NumPoints(); ++i) { const double distance = distances(distances.n_rows - 1, queryNode.Point(i)); if (SortPolicy::IsBetter(worstDistance, distance)) worstDistance = distance; if (SortPolicy::IsBetter(distance, bestDistance)) bestDistance = distance; } // Add triangle inequality adjustment to best distance. It is possible this // could be tighter for some certain types of trees. bestDistance = SortPolicy::CombineWorst(bestDistance, queryNode.FurthestPointDistance() + queryNode.FurthestDescendantDistance()); // Loop over children of the node, and use their cached information to // assemble bounds. for (size_t i = 0; i < queryNode.NumChildren(); ++i) { const double firstBound = queryNode.Child(i).Stat().FirstBound(); const double adjustment = std::max(0.0, queryNode.FurthestDescendantDistance() - queryNode.Child(i).FurthestDescendantDistance()); const double adjustedSecondBound = SortPolicy::CombineWorst( queryNode.Child(i).Stat().SecondBound(), 2 * adjustment); if (SortPolicy::IsBetter(worstDistance, firstBound)) worstDistance = firstBound; if (SortPolicy::IsBetter(adjustedSecondBound, bestDistance)) bestDistance = adjustedSecondBound; } // Now consider the parent bounds. if (queryNode.Parent() != NULL) { // The parent's worst distance bound implies that the bound for this node // must be at least as good. Thus, if the parent worst distance bound is // better, then take it. if (SortPolicy::IsBetter(queryNode.Parent()->Stat().FirstBound(), worstDistance)) worstDistance = queryNode.Parent()->Stat().FirstBound(); // The parent's best distance bound implies that the bound for this node // must be at least as good. Thus, if the parent best distance bound is // better, then take it. if (SortPolicy::IsBetter(queryNode.Parent()->Stat().SecondBound(), bestDistance)) bestDistance = queryNode.Parent()->Stat().SecondBound(); } // Cache bounds for later. queryNode.Stat().FirstBound() = worstDistance; queryNode.Stat().SecondBound() = bestDistance; if (SortPolicy::IsBetter(worstDistance, bestDistance)) return worstDistance; else return bestDistance; } /** * Helper function to insert a point into the neighbors and distances matrices. 
* * @param queryIndex Index of point whose neighbors we are inserting into. * @param pos Position in list to insert into. * @param neighbor Index of reference point which is being inserted. * @param distance Distance from query point to reference point. */ template<typename SortPolicy, typename MetricType, typename TreeType> void NeighborSearchRules<SortPolicy, MetricType, TreeType>::InsertNeighbor( const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance) { // We only memmove() if there is actually a need to shift something. if (pos < (distances.n_rows - 1)) { int len = (distances.n_rows - 1) - pos; memmove(distances.colptr(queryIndex) + (pos + 1), distances.colptr(queryIndex) + pos, sizeof(double) * len); memmove(neighbors.colptr(queryIndex) + (pos + 1), neighbors.colptr(queryIndex) + pos, sizeof(size_t) * len); } // Now put the new information in the right index. distances(pos, queryIndex) = distance; neighbors(pos, queryIndex) = neighbor; } }; // namespace neighbor }; // namespace mlpack #endif // __MLPACK_METHODS_NEIGHBOR_SEARCH_NEAREST_NEIGHBOR_RULES_IMPL_HPP RcppMLPACK/inst/include/mlpack/methods/neighbor_search/unmap.hpp0000644000176200001440000000604413647512751024352 0ustar liggesusers/** * @file unmap.hpp * @author Ryan Curtin * * Convenience methods to unmap results. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_UNMAP_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_UNMAP_HPP #include <mlpack/core.hpp> namespace mlpack { namespace neighbor { /** * Assuming that the datasets have been mapped using the referenceMap and the * queryMap (such as during kd-tree construction), unmap the columns of the * distances and neighbors matrices into neighborsOut and distancesOut, and also * unmap the entries in each row of neighbors. This is useful for the dual-tree * case. * * @param neighbors Matrix of neighbors resulting from neighbor search. * @param distances Matrix of distances resulting from neighbor search. * @param referenceMap Mapping of reference set to old points. * @param queryMap Mapping of query set to old points. * @param neighborsOut Matrix to store unmapped neighbors into. * @param distancesOut Matrix to store unmapped distances into. * @param squareRoot If true, take the square root of the distances. */ void Unmap(const arma::Mat<size_t>& neighbors, const arma::mat& distances, const std::vector<size_t>& referenceMap, const std::vector<size_t>& queryMap, arma::Mat<size_t>& neighborsOut, arma::mat& distancesOut, const bool squareRoot = false); /** * Assuming that the datasets have been mapped using referenceMap (such as * during kd-tree construction), unmap the columns of the distances and * neighbors matrices into neighborsOut and distancesOut, and also unmap the * entries in each row of neighbors. This is useful for the single-tree case. * * @param neighbors Matrix of neighbors resulting from neighbor search. * @param distances Matrix of distances resulting from neighbor search. * @param referenceMap Mapping of reference set to old points.
* @param neighborsOut Matrix to store unmapped neighbors into. * @param distancesOut Matrix to store unmapped distances into. * @param squareRoot If true, take the square root of the distances. */ void Unmap(const arma::Mat& neighbors, const arma::mat& distances, const std::vector& referenceMap, arma::Mat& neighborsOut, arma::mat& distancesOut, const bool squareRoot = false); }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/methods/neighbor_search/neighbor_search.hpp0000644000176200001440000002563713647512751026365 0ustar liggesusers/** * @file neighbor_search.hpp * @author Ryan Curtin * * Defines the NeighborSearch class, which performs an abstract * nearest-neighbor-like query on two datasets. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_NEIGHBOR_SEARCH_HPP #include #include #include #include #include #include "neighbor_search_stat.hpp" #include "sort_policies/nearest_neighbor_sort.hpp" namespace mlpack { namespace neighbor /** Neighbor-search routines. These include * all-nearest-neighbors and all-furthest-neighbors * searches. */ { /** * The NeighborSearch class is a template class for performing distance-based * neighbor searches. It takes a query dataset and a reference dataset (or just * a reference dataset) and, for each point in the query dataset, finds the k * neighbors in the reference dataset which have the 'best' distance according * to a given sorting policy. A constructor is given which takes only a * reference dataset, and if that constructor is used, the given reference * dataset is also used as the query dataset. * * The template parameters SortPolicy and Metric define the sort function used * and the metric (distance function) used. More information on those classes * can be found in the NearestNeighborSort class and the kernel::ExampleKernel * class. * * @tparam SortPolicy The sort policy for distances; see NearestNeighborSort. * @tparam MetricType The metric to use for computation. * @tparam TreeType The tree type to use. */ template, NeighborSearchStat > > class NeighborSearch { public: /** * Initialize the NeighborSearch object, passing both a query and reference * dataset. Optionally, perform the computation in naive mode or single-tree * mode, and set the leaf size used for tree-building. An initialized * distance metric can be given, for cases where the metric has internal data * (i.e. the distance::MahalanobisDistance class). * * This method will copy the matrices to internal copies, which are rearranged * during tree-building. You can avoid this extra copy by pre-constructing * the trees and passing them using a diferent constructor. * * @param referenceSet Set of reference points. * @param querySet Set of query points. * @param naive If true, O(n^2) naive search will be used (as opposed to * dual-tree search). 
This overrides singleMode (if it is set to true). * @param singleMode If true, single-tree search will be used (as opposed to * dual-tree search). * @param leafSize Leaf size for tree construction (ignored if tree is given). * @param metric An optional instance of the MetricType class. */ NeighborSearch(const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool naive = false, const bool singleMode = false, const MetricType metric = MetricType()); /** * Initialize the NeighborSearch object, passing only one dataset, which is * used as both the query and the reference dataset. Optionally, perform the * computation in naive mode or single-tree mode, and set the leaf size used * for tree-building. An initialized distance metric can be given, for cases * where the metric has internal data (i.e. the distance::MahalanobisDistance * class). * * If naive mode is being used and a pre-built tree is given, it may not work: * naive mode operates by building a one-node tree (the root node holds all * the points). If that condition is not satisfied with the pre-built tree, * then naive mode will not work. * * @param referenceSet Set of reference points. * @param naive If true, O(n^2) naive search will be used (as opposed to * dual-tree search). This overrides singleMode (if it is set to true). * @param singleMode If true, single-tree search will be used (as opposed to * dual-tree search). * @param leafSize Leaf size for tree construction (ignored if tree is given). * @param metric An optional instance of the MetricType class. */ NeighborSearch(const typename TreeType::Mat& referenceSet, const bool naive = false, const bool singleMode = false, const MetricType metric = MetricType()); /** * Initialize the NeighborSearch object with the given datasets and * pre-constructed trees. It is assumed that the points in referenceSet and * querySet correspond to the points in referenceTree and queryTree, * respectively. Optionally, choose to use single-tree mode. Naive mode is * not available as an option for this constructor; instead, to run naive * computation, construct a tree with all of the points in one leaf (i.e. * leafSize = number of points). Additionally, an instantiated distance * metric can be given, for cases where the distance metric holds data. * * There is no copying of the data matrices in this constructor (because * tree-building is not necessary), so this is the constructor to use when * copies absolutely must be avoided. * * @note * Because tree-building (at least with BinarySpaceTree) modifies the ordering * of a matrix, be sure you pass the modified matrix to this object! In * addition, mapping the points of the matrix back to their original indices * is not done when this constructor is used. * @endnote * * @param referenceTree Pre-built tree for reference points. * @param queryTree Pre-built tree for query points. * @param referenceSet Set of reference points corresponding to referenceTree. * @param querySet Set of query points corresponding to queryTree. * @param singleMode Whether single-tree computation should be used (as * opposed to dual-tree computation). * @param metric Instantiated distance metric. */ NeighborSearch(TreeType* referenceTree, TreeType* queryTree, const typename TreeType::Mat& referenceSet, const typename TreeType::Mat& querySet, const bool singleMode = false, const MetricType metric = MetricType()); /** * Initialize the NeighborSearch object with the given reference dataset and * pre-constructed tree. 
It is assumed that the points in referenceSet * correspond to the points in referenceTree. Optionally, choose to use * single-tree mode. Naive mode is not available as an option for this * constructor; instead, to run naive computation, construct a tree with all * the points in one leaf (i.e. leafSize = number of points). Additionally, * an instantiated distance metric can be given, for the case where the * distance metric holds data. * * There is no copying of the data matrices in this constructor (because * tree-building is not necessary), so this is the constructor to use when * copies absolutely must be avoided. * * @note * Because tree-building (at least with BinarySpaceTree) modifies the ordering * of a matrix, be sure you pass the modified matrix to this object! In * addition, mapping the points of the matrix back to their original indices * is not done when this constructor is used. * @endnote * * @param referenceTree Pre-built tree for reference points. * @param referenceSet Set of reference points corresponding to referenceTree. * @param singleMode Whether single-tree computation should be used (as * opposed to dual-tree computation). * @param metric Instantiated distance metric. */ NeighborSearch(TreeType* referenceTree, const typename TreeType::Mat& referenceSet, const bool singleMode = false, const MetricType metric = MetricType()); /** * Delete the NeighborSearch object. The tree is the only member we are * responsible for deleting. The others will take care of themselves. */ ~NeighborSearch(); /** * Compute the nearest neighbors and store the output in the given matrices. * The matrices will be set to the size of n columns by k rows, where n is the * number of points in the query dataset and k is the number of neighbors * being searched for. * * @param k Number of neighbors to search for. * @param resultingNeighbors Matrix storing lists of neighbors for each query * point. * @param distances Matrix storing distances of neighbors for each query * point. */ void Search(const size_t k, arma::Mat& resultingNeighbors, arma::mat& distances); // Returns a string representation of this object. std::string ToString() const; private: //! Copy of reference dataset (if we need it, because tree building modifies //! it). typename TreeType::Mat referenceCopy; //! Copy of query dataset (if we need it, because tree building modifies it). typename TreeType::Mat queryCopy; //! Reference dataset. const typename TreeType::Mat& referenceSet; //! Query dataset (may not be given). const typename TreeType::Mat& querySet; //! Pointer to the root of the reference tree. TreeType* referenceTree; //! Pointer to the root of the query tree (might not exist). TreeType* queryTree; //! If true, this object created the trees and is responsible for them. bool treeOwner; //! Indicates if a separate query set was passed. bool hasQuerySet; //! Indicates if O(n^2) naive search is being used. bool naive; //! Indicates if single-tree search is being used (opposed to dual-tree). bool singleMode; //! Instantiation of metric. MetricType metric; //! Permutations of reference points during tree building. std::vector oldFromNewReferences; //! Permutations of query points during tree building. std::vector oldFromNewQueries; }; // class NeighborSearch }; // namespace neighbor }; // namespace mlpack // Include implementation. #include "neighbor_search_impl.hpp" // Include convenience typedefs. 
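// Example usage (a minimal, hedged sketch; AllkNN is the nearest-neighbor
// convenience typedef expected to be provided by "typedef.hpp", included
// below):
//
//   arma::mat data(3, 100, arma::fill::randu); // columns are points
//   mlpack::neighbor::AllkNN knn(data);
//   arma::Mat<size_t> neighbors;
//   arma::mat distances;
//   knn.Search(2, neighbors, distances); // 2 nearest neighbors per point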
#include "typedef.hpp" #endif RcppMLPACK/inst/include/mlpack/methods/neighbor_search/ns_traversal_info.hpp0000644000176200001440000000554013647512751026750 0ustar liggesusers/** * @file ns_traversal_info.hpp * @author Ryan Curtin * * This class holds traversal information for dual-tree traversals that are * using the NeighborSearchRules RuleType. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_NEIGHBOR_SEARCH_TRAVERSAL_INFO_HPP #define __MLPACK_METHODS_NEIGHBOR_SEARCH_TRAVERSAL_INFO_HPP namespace mlpack { namespace neighbor { /** * Traversal information for NeighborSearch. This information is used to make * parent-child prunes or parent-parent prunes in Score() without needing to * evaluate the distance between two nodes. * * The information held by this class is the last node combination visited * before the current node combination was recursed into and the distance * between the node centroids. */ template class NeighborSearchTraversalInfo { public: /** * Create the TraversalInfo object and initialize the pointers to NULL. */ NeighborSearchTraversalInfo() : lastQueryNode(NULL), lastReferenceNode(NULL), lastScore(0.0), lastBaseCase(0.0) { /* Nothing to do. */ } //! Get the last query node. TreeType* LastQueryNode() const { return lastQueryNode; } //! Modify the last query node. TreeType*& LastQueryNode() { return lastQueryNode; } //! Get the last reference node. TreeType* LastReferenceNode() const { return lastReferenceNode; } //! Modify the last reference node. TreeType*& LastReferenceNode() { return lastReferenceNode; } //! Get the score associated with the last query and reference nodes. double LastScore() const { return lastScore; } //! Modify the score associated with the last query and reference nodes. double& LastScore() { return lastScore; } //! Get the base case associated with the last node combination. double LastBaseCase() const { return lastBaseCase; } //! Modify the base case associated with the last node combination. double& LastBaseCase() { return lastBaseCase; } private: //! The last query node. TreeType* lastQueryNode; //! The last reference node. TreeType* lastReferenceNode; //! The last distance. double lastScore; //! The last base case. double lastBaseCase; }; }; // namespace neighbor }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/prereqs.hpp0000644000176200001440000000376513647512751020135 0ustar liggesusers/** * @file prereqs.hpp * * The core includes that mlpack expects; standard C++ includes and Armadillo. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_PREREQS_HPP #define __MLPACK_PREREQS_HPP // First, check if Armadillo was included before, warning if so. #ifdef ARMA_INCLUDES #pragma message "Armadillo was included before mlpack; this can sometimes cause\ problems. It should only be necessary to include and not\ ." #endif // Next, standard includes. #include #include #include #include #include #include #include #include // Defining _USE_MATH_DEFINES should set M_PI. #define _USE_MATH_DEFINES #include // For tgamma(). #include // But if it's not defined, we'll do it. #ifndef M_PI #define M_PI 3.141592653589793238462643383279 #endif // Give ourselves a nice way to force functions to be inline if we need. #define force_inline #if defined(__GNUG__) && !defined(DEBUG) #undef force_inline #define force_inline __attribute__((always_inline)) #elif defined(_MSC_VER) && !defined(DEBUG) #undef force_inline #define force_inline __forceinline #endif // Now include Armadillo through the special mlpack extensions. #include #endif RcppMLPACK/inst/include/mlpack/core/0000755000176200001440000000000013647521523016655 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/math/0000755000176200001440000000000013647512751017611 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/math/round.hpp0000644000176200001440000000233413647512751021453 0ustar liggesusers/** * @file round.hpp * @author Ryan Curtin * * Implementation of round() for use on Visual Studio, where C99 isn't * implemented. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_MATH_ROUND_HPP #define __MLPACK_CORE_MATH_ROUND_HPP // _MSC_VER should only be defined for Visual Studio, which doesn't implement // C99. #ifdef _MSC_VER // This function ends up going into the global namespace, so it can be used in // place of C99's round(). //! Round a number to the nearest integer. inline double round(double a) { return floor(a + 0.5); } #endif #endif RcppMLPACK/inst/include/mlpack/core/math/range.hpp0000644000176200001440000001070613647512751021422 0ustar liggesusers/** * @file range.hpp * * Definition of the Range class, which represents a simple range with a lower * and upper bound. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_MATH_RANGE_HPP #define __MLPACK_CORE_MATH_RANGE_HPP namespace mlpack { namespace math { /** * Simple real-valued range. It contains an upper and lower bound. */ class Range { private: double lo; /// The lower bound. double hi; /// The upper bound. public: /** Initialize to an empty set (where lo > hi). */ inline Range(); /*** * Initialize a range to enclose only the given point (lo = point, hi = * point). * * @param point Point that this range will enclose. */ inline Range(const double point); /** * Initializes to specified range. * * @param lo Lower bound of the range. * @param hi Upper bound of the range. */ inline Range(const double lo, const double hi); //! Get the lower bound. inline double Lo() const { return lo; } //! Modify the lower bound. inline double& Lo() { return lo; } //! Get the upper bound. inline double Hi() const { return hi; } //! Modify the upper bound. inline double& Hi() { return hi; } /** * Gets the span of the range (hi - lo). */ inline double Width() const; /** * Gets the midpoint of this range. */ inline double Mid() const; /** * Expands this range to include another range. * * @param rhs Range to include. */ inline Range& operator|=(const Range& rhs); /** * Expands this range to include another range. * * @param rhs Range to include. */ inline Range operator|(const Range& rhs) const; /** * Shrinks this range to be the overlap with another range; this makes an * empty set if there is no overlap. * * @param rhs Other range. */ inline Range& operator&=(const Range& rhs); /** * Shrinks this range to be the overlap with another range; this makes an * empty set if there is no overlap. * * @param rhs Other range. */ inline Range operator&(const Range& rhs) const; /** * Scale the bounds by the given double. * * @param d Scaling factor. */ inline Range& operator*=(const double d); /** * Scale the bounds by the given double. * * @param d Scaling factor. */ inline Range operator*(const double d) const; /** * Scale the bounds by the given double. * * @param d Scaling factor. */ friend inline Range operator*(const double d, const Range& r); // Symmetric. /** * Compare with another range for strict equality. * * @param rhs Other range. */ inline bool operator==(const Range& rhs) const; /** * Compare with another range for strict equality. * * @param rhs Other range. */ inline bool operator!=(const Range& rhs) const; /** * Compare with another range. For Range objects x and y, x < y means that x * is strictly less than y and does not overlap at all. * * @param rhs Other range. */ inline bool operator<(const Range& rhs) const; /** * Compare with another range. For Range objects x and y, x < y means that x * is strictly less than y and does not overlap at all. * * @param rhs Other range. */ inline bool operator>(const Range& rhs) const; /** * Determines if a point is contained within the range. * * @param d Point to check. */ inline bool Contains(const double d) const; /** * Determines if another range overlaps with this one. * * @param r Other range. * * @return true if ranges overlap at all. */ inline bool Contains(const Range& r) const; /** * Returns a string representation of an object. 
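 * (For example, given the implementation in range_impl.hpp:
 * Range(1.0, 3.0).ToString() returns the string "[1, 3]".)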
*/ inline std::string ToString() const; }; }; // namespace math }; // namespace mlpack // Include inlined implementation. #include "range_impl.hpp" #endif // __MLPACK_CORE_MATH_RANGE_HPP RcppMLPACK/inst/include/mlpack/core/math/clamp.hpp0000644000176200001440000000416013647512751021417 0ustar liggesusers/** * @file clamp.hpp * * Miscellaneous math clamping routines. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_MATH_CLAMP_HPP #define __MLPACK_CORE_MATH_CLAMP_HPP #include #include #include namespace mlpack { namespace math /** Miscellaneous math routines. */ { /** * Forces a number to be non-negative, turning negative numbers into zero. * Avoids branching costs (this is a measurable improvement). * * @param d Double to clamp. * @return 0 if d < 0, d otherwise. */ inline double ClampNonNegative(const double d) { return (d + fabs(d)) / 2; } /** * Forces a number to be non-positive, turning positive numbers into zero. * Avoids branching costs (this is a measurable improvement). * * @param d Double to clamp. * @param 0 if d > 0, d otherwise. */ inline double ClampNonPositive(const double d) { return (d - fabs(d)) / 2; } /** * Clamp a number between a particular range. * * @param value The number to clamp. * @param rangeMin The first of the range. * @param rangeMax The last of the range. * @return max(rangeMin, min(rangeMax, d)). */ inline double ClampRange(double value, const double rangeMin, const double rangeMax) { value -= rangeMax; value = ClampNonPositive(value) + rangeMax; value -= rangeMin; value = ClampNonNegative(value) + rangeMin; return value; } }; // namespace math }; // namespace mlpack #endif // __MLPACK_CORE_MATH_CLAMP_HPP RcppMLPACK/inst/include/mlpack/core/math/lin_alg.hpp0000644000176200001440000000577013647512751021740 0ustar liggesusers/** * @file lin_alg.hpp * @author Nishant Mehta * * Linear algebra utilities. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_MATH_LIN_ALG_HPP #define __MLPACK_CORE_MATH_LIN_ALG_HPP #include /** * Linear algebra utility functions, generally performed on matrices or vectors. */ namespace mlpack { namespace math { /** * Auxiliary function to raise vector elements to a specific power. The sign * is ignored in the power operation and then re-added. Useful for * eigenvalues. 
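 *
 * For example (a sketch of the stated semantics): with power = 0.5, an element
 * -4.0 becomes -2.0, because the sign is removed before the power is applied
 * and re-added afterwards.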
*/ void VectorPower(arma::vec& vec, const double power); /** * Creates a centered matrix, where centering is done by subtracting * the sum over the columns (a column vector) from each column of the matrix. * * @param x Input matrix * @param xCentered Matrix to write centered output into */ void Center(const arma::mat& x, arma::mat& xCentered); /** * Whitens a matrix using the singular value decomposition of the covariance * matrix. Whitening means the covariance matrix of the result is the identity * matrix. */ void WhitenUsingSVD(const arma::mat& x, arma::mat& xWhitened, arma::mat& whiteningMatrix); /** * Whitens a matrix using the eigendecomposition of the covariance matrix. * Whitening means the covariance matrix of the result is the identity matrix. */ void WhitenUsingEig(const arma::mat& x, arma::mat& xWhitened, arma::mat& whiteningMatrix); /** * Overwrites a dimension-N vector to a random vector on the unit sphere in R^N. */ void RandVector(arma::vec& v); /** * Orthogonalize x and return the result in W, using eigendecomposition. * We will be using the formula \f$ W = x (x^T x)^{-0.5} \f$. */ void Orthogonalize(const arma::mat& x, arma::mat& W); /** * Orthogonalize x in-place. This could be sped up by a custom * implementation. */ void Orthogonalize(arma::mat& x); /** * Remove a certain set of rows in a matrix while copying to a second matrix. * * @param input Input matrix to copy. * @param rowsToRemove Vector containing indices of rows to be removed. * @param output Matrix to copy non-removed rows into. */ void RemoveRows(const arma::mat& input, const std::vector& rowsToRemove, arma::mat& output); }; // namespace math }; // namespace mlpack #endif // __MLPACK_CORE_MATH_LIN_ALG_HPP RcppMLPACK/inst/include/mlpack/core/math/random.hpp0000644000176200001440000001043213647512751021602 0ustar liggesusers/** * @file random.hpp * * Miscellaneous math random-related routines. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_MATH_RANDOM_HPP #define __MLPACK_CORE_MATH_RANDOM_HPP #include #include namespace mlpack { namespace math /** Miscellaneous math routines. */ { // Annoying Boost versioning issues. #include #if BOOST_VERSION >= 104700 // Global random object. extern boost::random::mt19937 randGen; // Global uniform distribution. extern boost::random::uniform_01<> randUniformDist; // Global normal distribution. extern boost::random::normal_distribution<> randNormalDist; #else // Global random object. extern boost::mt19937 randGen; #if BOOST_VERSION >= 103900 // Global uniform distribution. extern boost::uniform_01<> randUniformDist; #else // Pre-1.39 Boost.Random did not give default template parameter values. extern boost::uniform_01 randUniformDist; #endif // Global normal distribution. extern boost::normal_distribution<> randNormalDist; #endif /** * Set the random seed used by the random functions (Random() and RandInt()). 
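 * For example (a usage sketch): calling math::RandomSeed(42) at program start
 * makes subsequent Random() and RandInt() sequences reproducible across runs.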
* The seed is casted to a 32-bit integer before being given to the random * number generator, but a size_t is taken as a parameter for API consistency. * * @param seed Seed for the random number generator. */ inline void RandomSeed(const size_t seed) { randGen.seed((uint32_t) seed); srand((unsigned int) seed); } /** * Generates a uniform random number between 0 and 1. */ inline double Random() { #if BOOST_VERSION >= 103900 return randUniformDist(randGen); #else // Before Boost 1.39, we did not give the random object when we wanted a // random number; that gets given at construction time. return randUniformDist(); #endif } /** * Generates a uniform random number in the specified range. */ inline double Random(const double lo, const double hi) { #if BOOST_VERSION >= 103900 return lo + (hi - lo) * randUniformDist(randGen); #else // Before Boost 1.39, we did not give the random object when we wanted a // random number; that gets given at construction time. return lo + (hi - lo) * randUniformDist(); #endif } /** * Generates a uniform random integer. */ inline int RandInt(const int hiExclusive) { #if BOOST_VERSION >= 103900 return (int) std::floor((double) hiExclusive * randUniformDist(randGen)); #else // Before Boost 1.39, we did not give the random object when we wanted a // random number; that gets given at construction time. return (int) std::floor((double) hiExclusive * randUniformDist()); #endif } /** * Generates a uniform random integer. */ inline int RandInt(const int lo, const int hiExclusive) { #if BOOST_VERSION >= 103900 return lo + (int) std::floor((double) (hiExclusive - lo) * randUniformDist(randGen)); #else // Before Boost 1.39, we did not give the random object when we wanted a // random number; that gets given at construction time. return lo + (int) std::floor((double) (hiExclusive - lo) * randUniformDist()); #endif } /** * Generates a normally distributed random number with mean 0 and variance 1. */ inline double RandNormal() { return randNormalDist(randGen); } /** * Generates a normally distributed random number with specified mean and * variance. * * @param mean Mean of distribution. * @param variance Variance of distribution. */ inline double RandNormal(const double mean, const double variance) { return variance * randNormalDist(randGen) + mean; } }; // namespace math }; // namespace mlpack #endif // __MLPACK_CORE_MATH_MATH_LIB_HPP RcppMLPACK/inst/include/mlpack/core/math/range_impl.hpp0000644000176200001440000001024013647512751022434 0ustar liggesusers/** * @file range_impl.hpp * * Implementation of the (inlined) Range class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_MATH_RANGE_IMPL_HPP #define __MLPACK_CORE_MATH_RANGE_IMPL_HPP #include "range.hpp" #include #include namespace mlpack { namespace math { /** * Initialize the range to 0. 
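 * (More precisely, as the initializer list below shows, the default
 * constructor creates an empty range with lo = DBL_MAX and hi = -DBL_MAX, so
 * that lo > hi.)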
*/ inline Range::Range() : lo(DBL_MAX), hi(-DBL_MAX) { /* nothing else to do */ } /** * Initialize a range to enclose only the given point. */ inline Range::Range(const double point) : lo(point), hi(point) { /* nothing else to do */ } /** * Initializes the range to the specified values. */ inline Range::Range(const double lo, const double hi) : lo(lo), hi(hi) { /* nothing else to do */ } /** * Gets the span of the range, hi - lo. Returns 0 if the range is negative. */ inline double Range::Width() const { if (lo < hi) return (hi - lo); else return 0.0; } /** * Gets the midpoint of this range. */ inline double Range::Mid() const { return (hi + lo) / 2; } /** * Expands range to include the other range. */ inline Range& Range::operator|=(const Range& rhs) { if (rhs.lo < lo) lo = rhs.lo; if (rhs.hi > hi) hi = rhs.hi; return *this; } inline Range Range::operator|(const Range& rhs) const { return Range((rhs.lo < lo) ? rhs.lo : lo, (rhs.hi > hi) ? rhs.hi : hi); } /** * Shrinks range to be the overlap with another range, becoming an empty * set if there is no overlap. */ inline Range& Range::operator&=(const Range& rhs) { if (rhs.lo > lo) lo = rhs.lo; if (rhs.hi < hi) hi = rhs.hi; return *this; } inline Range Range::operator&(const Range& rhs) const { return Range((rhs.lo > lo) ? rhs.lo : lo, (rhs.hi < hi) ? rhs.hi : hi); } /** * Scale the bounds by the given double. */ inline Range& Range::operator*=(const double d) { lo *= d; hi *= d; // Now if we've negated, we need to flip things around so the bound is valid. if (lo > hi) { double tmp = hi; hi = lo; lo = tmp; } return *this; } inline Range Range::operator*(const double d) const { double nlo = lo * d; double nhi = hi * d; if (nlo <= nhi) return Range(nlo, nhi); else return Range(nhi, nlo); } // Symmetric case. inline Range operator*(const double d, const Range& r) { double nlo = r.lo * d; double nhi = r.hi * d; if (nlo <= nhi) return Range(nlo, nhi); else return Range(nhi, nlo); } /** * Compare with another range for strict equality. */ inline bool Range::operator==(const Range& rhs) const { return (lo == rhs.lo) && (hi == rhs.hi); } inline bool Range::operator!=(const Range& rhs) const { return (lo != rhs.lo) || (hi != rhs.hi); } /** * Compare with another range. For Range objects x and y, x < y means that x is * strictly less than y and does not overlap at all. */ inline bool Range::operator<(const Range& rhs) const { return hi < rhs.lo; } inline bool Range::operator>(const Range& rhs) const { return lo > rhs.hi; } /** * Determines if a point is contained within the range. */ inline bool Range::Contains(const double d) const { return d >= lo && d <= hi; } /** * Determines if this range overlaps with another range. */ inline bool Range::Contains(const Range& r) const { return lo <= r.hi && hi >= r.lo; } /** * Returns a string representation of an object. */ std::string Range::ToString() const { std::ostringstream convert; convert << "[" << lo << ", " << hi << "]"; return convert.str(); } }; // namespace math }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/dists/0000755000176200001440000000000013647512751020006 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/dists/discrete_distribution.hpp0000644000176200001440000001310713647512751025122 0ustar liggesusers/** * @file discrete_distribution.hpp * @author Ryan Curtin * * Implementation of the discrete distribution, where each discrete observation * has a given probability. * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_HMM_DISTRIBUTIONS_DISCRETE_DISTRIBUTION_HPP #define __MLPACK_METHODS_HMM_DISTRIBUTIONS_DISCRETE_DISTRIBUTION_HPP #include namespace mlpack { namespace distribution /** Probability distributions. */ { /** * A discrete distribution where the only observations are discrete * observations. This is useful (for example) with discrete Hidden Markov * Models, where observations are non-negative integers representing specific * emissions. * * No bounds checking is performed for observations, so if an invalid * observation is passed (i.e. observation > numObservations), a crash will * probably occur. * * This distribution only supports one-dimensional observations, so when passing * an arma::vec as an observation, it should only have one dimension * (vec.n_rows == 1). Any additional dimensions will simply be ignored. * * @note * This class, like every other class in MLPACK, uses arma::vec to represent * observations. While a discrete distribution only has positive integers * (size_t) as observations, these can be converted to doubles (which is what * arma::vec holds). This distribution internally converts those doubles back * into size_t before comparisons. * @endnote */ class DiscreteDistribution { public: /** * Default constructor, which creates a distribution that has no observations. */ DiscreteDistribution() { /* nothing to do */ } /** * Define the discrete distribution as having numObservations possible * observations. The probability in each state will be set to (1 / * numObservations). * * @param numObservations Number of possible observations this distribution * can have. */ DiscreteDistribution(const size_t numObservations) : probabilities(arma::ones(numObservations) / numObservations) { /* nothing to do */ } /** * Define the discrete distribution as having the given probabilities for each * observation. * * @param probabilities Probabilities of each possible observation. */ DiscreteDistribution(const arma::vec& probabilities) { // We must be sure that our distribution is normalized. double sum = accu(probabilities); if (sum > 0) this->probabilities = probabilities / sum; else { this->probabilities.set_size(probabilities.n_elem); this->probabilities.fill(1 / probabilities.n_elem); } } /** * Get the dimensionality of the distribution. */ size_t Dimensionality() const { return 1; } /** * Return the probability of the given observation. If the observation is * greater than the number of possible observations, then a crash will * probably occur -- bounds checking is not performed. * * @param observation Observation to return the probability of. * @return Probability of the given observation. */ double Probability(const arma::vec& observation) const { // Adding 0.5 helps ensure that we cast the floating point to a size_t // correctly. const size_t obs = size_t(observation[0] + 0.5); // Ensure that the observation is within the bounds. 
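// (For example, a hedged sketch: with probabilities = {0.5, 0.3, 0.2}, an
// observation vector holding 1.0 is cast to obs = 1 above, and Probability()
// returns 0.3; obs = 3 would trigger the out-of-bounds warning below.)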
if (obs >= probabilities.n_elem) { Rcpp::Rcout << "DiscreteDistribution::Probability(): received observation " << obs << "; observation must be in [0, " << probabilities.n_elem << "] for this distribution." << std::endl; } return probabilities(obs); } /** * Return a randomly generated observation (one-dimensional vector; one * observation) according to the probability distribution defined by this * object. * * @return Random observation. */ arma::vec Random() const; /** * Estimate the probability distribution directly from the given observations. * If any of the observations is greater than numObservations, a crash is * likely to occur. * * @param observations List of observations. */ void Estimate(const arma::mat& observations); /** * Estimate the probability distribution from the given observations, taking * into account the probability of each observation actually being from this * distribution. * * @param observations List of observations. * @param probabilities List of probabilities that each observation is * actually from this distribution. */ void Estimate(const arma::mat& observations, const arma::vec& probabilities); //! Return the vector of probabilities. const arma::vec& Probabilities() const { return probabilities; } //! Modify the vector of probabilities. arma::vec& Probabilities() { return probabilities; } /* * Returns a string representation of this object. */ std::string ToString() const; private: arma::vec probabilities; }; }; // namespace distribution }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/dists/gaussian_distribution.hpp0000644000176200001440000000667313647512751025144 0ustar liggesusers/** * @file gaussian_distribution.hpp * @author Ryan Curtin * * Implementation of the Gaussian distribution. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_HMM_DISTRIBUTIONS_GAUSSIAN_DISTRIBUTION_HPP #define __MLPACK_METHODS_HMM_DISTRIBUTIONS_GAUSSIAN_DISTRIBUTION_HPP #include // Should be somewhere else, maybe in core. #include namespace mlpack { namespace distribution { /** * A single multivariate Gaussian distribution. */ class GaussianDistribution { private: //! Mean of the distribution. arma::vec mean; //! Covariance of the distribution. arma::mat covariance; public: /** * Default constructor, which creates a Gaussian with zero dimension. */ GaussianDistribution() { /* nothing to do */ } /** * Create a Gaussian distribution with zero mean and identity covariance with * the given dimensionality. */ GaussianDistribution(const size_t dimension) : mean(arma::zeros(dimension)), covariance(arma::eye(dimension, dimension)) { /* Nothing to do. */ } /** * Create a Gaussian distribution with the given mean and covariance. */ GaussianDistribution(const arma::vec& mean, const arma::mat& covariance) : mean(mean), covariance(covariance) { /* Nothing to do. */ } //! Return the dimensionality of this distribution. 
size_t Dimensionality() const { return mean.n_elem; } /** * Return the probability of the given observation. */ double Probability(const arma::vec& observation) const { return mlpack::gmm::phi(observation, mean, covariance); } /** * Return a randomly generated observation according to the probability * distribution defined by this object. * * @return Random observation from this Gaussian distribution. */ arma::vec Random() const; /** * Estimate the Gaussian distribution directly from the given observations. * * @param observations List of observations. */ void Estimate(const arma::mat& observations); /** * Estimate the Gaussian distribution from the given observations, taking into * account the probability of each observation actually being from this * distribution. */ void Estimate(const arma::mat& observations, const arma::vec& probabilities); //! Return the mean. const arma::vec& Mean() const { return mean; } //! Return a modifiable copy of the mean. arma::vec& Mean() { return mean; } //! Return the covariance matrix. const arma::mat& Covariance() const { return covariance; } //! Return a modifiable copy of the covariance. arma::mat& Covariance() { return covariance; } /** * Returns a string representation of this object. */ std::string ToString() const; }; }; // namespace distribution }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/dists/laplace_distribution.hpp0000644000176200001440000001141413647512751024720 0ustar liggesusers/* * @file laplace.hpp * @author Zhihao Lou * * Laplace (double exponential) distribution used in SA. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_OPTIMIZER_SA_LAPLACE_DISTRIBUTION_HPP #define __MLPACK_CORE_OPTIMIZER_SA_LAPLACE_DISTRIBUTION_HPP namespace mlpack { namespace distribution { /** * The multivariate Laplace distribution centered at 0 has pdf * * \f[ * f(x|\theta) = \frac{1}{2 \theta}\exp\left(-\frac{\|x - \mu\|}{\theta}\right) * \f] * * given scale parameter \f$\theta\f$ and mean \f$\mu\f$. This implementation * assumes a diagonal covariance, but a rewrite to support arbitrary covariances * is possible. * * See the following paper for more information on the non-diagonal-covariance * Laplace distribution and estimation techniques: * * @code * @article{eltoft2006multivariate, * title={{On the Multivariate Laplace Distribution}}, * author={Eltoft, Torbj\orn and Kim, Taesu and Lee, Te-Won}, * journal={IEEE Signal Processing Letters}, * volume={13}, * number={5}, * pages={300--304}, * year={2006} * } * @endcode * * Note that because of the diagonal covariance restriction, much of the algebra * in the paper above becomes simplified, and the PDF takes roughly the same * form as the univariate case. */ class LaplaceDistribution { public: /** * Default constructor, which creates a Laplace distribution with zero * dimension and zero scale parameter. 
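 * (A hedged note: a distribution built this way is not usable until the mean
 * and scale are set, e.g. through the Mean() and Scale() modifiers.)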
*/ LaplaceDistribution() : scale(0) { } /** * Construct the Laplace distribution with the given scale and dimensionality. * The mean is initialized to zero. * * @param dimensionality Dimensionality of distribution. * @param scale Scale of distribution. */ LaplaceDistribution(const size_t dimensionality, const double scale) : mean(arma::zeros(dimensionality)), scale(scale) { } /** * Construct the Laplace distribution with the given mean and scale parameter. * * @param mean Mean of distribution. * @param scale Scale of distribution. */ LaplaceDistribution(const arma::vec& mean, const double scale) : mean(mean), scale(scale) { } //! Return the dimensionality of this distribution. size_t Dimensionality() const { return mean.n_elem; } /** * Return the probability of the given observation. */ double Probability(const arma::vec& observation) const; /** * Return a randomly generated observation according to the probability * distribution defined by this object. This is inlined for speed. * * @return Random observation from this Laplace distribution. */ arma::vec Random() const { arma::vec result(mean.n_elem); result.randu(); // Convert from uniform distribution to Laplace distribution. // arma::sign() does not exist in Armadillo < 3.920 so we have to do this // elementwise. for (size_t i = 0; i < result.n_elem; ++i) { if (result[i] < 0) result[i] = mean[i] + scale * result[i] * std::log(1 + 2.0 * (result[i] - 0.5)); else result[i] = mean[i] - scale * result[i] * std::log(1 - 2.0 * (result[i] - 0.5)); } return result; } /** * Estimate the Laplace distribution directly from the given observations. * * @param observations List of observations. */ void Estimate(const arma::mat& observations); /** * Estimate the Laplace distribution from the given observations, taking into * account the probability of each observation actually being from this * distribution. */ void Estimate(const arma::mat& observations, const arma::vec& probabilities); //! Return the mean. const arma::vec& Mean() const { return mean; } //! Modify the mean. arma::vec& Mean() { return mean; } //! Return the scale parameter. double Scale() const { return scale; } //! Modify the scale parameter. double& Scale() { return scale; } //! Return a string representation of the object. std::string ToString() const; private: //! Mean of the distribution. arma::vec mean; //! Scale parameter of the distribution. double scale; }; }; // namespace distribution }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/metrics/0000755000176200001440000000000013647512751020326 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/metrics/ip_metric.hpp0000644000176200001440000000361113647512751023013 0ustar liggesusers/** * @file ip_metric.hpp * @author Ryan Curtin * * Inner product induced metric. If given a kernel function, this gives the * complementary metric. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
*/ #ifndef __MLPACK_METHODS_FASTMKS_IP_METRIC_HPP #define __MLPACK_METHODS_FASTMKS_IP_METRIC_HPP namespace mlpack { namespace metric { template class IPMetric { public: //! Create the IPMetric without an instantiated kernel. IPMetric(); //! Create the IPMetric with an instantiated kernel. IPMetric(KernelType& kernel); //! Destroy the IPMetric object. ~IPMetric(); /** * Evaluate the metric. */ template double Evaluate(const Vec1Type& a, const Vec2Type& b); //! Get the kernel. const KernelType& Kernel() const { return kernel; } //! Modify the kernel. KernelType& Kernel() { return kernel; } /** * Returns a string representation of this object. */ std::string ToString() const; private: //! The locally stored kernel, if it is necessary. KernelType* localKernel; //! The reference to the kernel that is being used. KernelType& kernel; }; }; // namespace metric }; // namespace mlpack // Include implementation. #include "ip_metric_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/core/metrics/ip_metric_impl.hpp0000644000176200001440000000535113647512751024037 0ustar liggesusers/** * @file ip_metric_impl.hpp * @author Ryan Curtin * * Implementation of the IPMetric. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_METHODS_FASTMKS_IP_METRIC_IMPL_HPP #define __MLPACK_METHODS_FASTMKS_IP_METRIC_IMPL_HPP // In case it hasn't been included yet. #include "ip_metric_impl.hpp" #include #include namespace mlpack { namespace metric { // Constructor with no instantiated kernel. template IPMetric::IPMetric() : localKernel(new KernelType()), kernel(*localKernel) { // Nothing to do. } // Constructor with instantiated kernel. template IPMetric::IPMetric(KernelType& kernel) : localKernel(NULL), kernel(kernel) { // Nothing to do. } // Destructor for the IPMetric. template IPMetric::~IPMetric() { if (localKernel != NULL) delete localKernel; } template template inline double IPMetric::Evaluate(const Vec1Type& a, const Vec2Type& b) { // This is the metric induced by the kernel function. // Maybe we can do better by caching some of this? return sqrt(kernel.Evaluate(a, a) + kernel.Evaluate(b, b) - 2 * kernel.Evaluate(a, b)); } // Convert object to string. template std::string IPMetric::ToString() const { std::ostringstream convert; convert << "IPMetric [" << this << "]" << std::endl; convert << " Kernel: " << std::endl; convert << util::Indent(kernel.ToString(), 2); return convert.str(); } // A specialization for the linear kernel, which actually just turns out to be // the Euclidean distance. 
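// (A one-line derivation sketch: for the linear kernel k(a, b) = a^T b, the
// induced metric sqrt(k(a, a) + k(b, b) - 2 k(a, b)) equals sqrt(||a - b||^2),
// which is exactly the Euclidean distance computed below via LMetric<2, true>.)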
template<> template<typename Vec1Type, typename Vec2Type> inline double IPMetric<kernel::LinearKernel>::Evaluate(const Vec1Type& a, const Vec2Type& b) { return metric::LMetric<2, true>::Evaluate(a, b); } }; // namespace metric }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/metrics/lmetric.hpp0000644000176200001440000000663513647512751022506 0ustar liggesusers/** * @file lmetric.hpp * @author Ryan Curtin * * Generalized L-metric, allowing both squared distances to be returned as well * as non-squared distances. The squared distances are faster to compute. * * This also gives several convenience typedefs for commonly used L-metrics. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_METRICS_LMETRIC_HPP #define __MLPACK_CORE_METRICS_LMETRIC_HPP #include <mlpack/core.hpp> namespace mlpack { namespace metric { /** * The L_p metric for arbitrary integer p, with an option to take the root. * * This class implements the standard L_p metric for two arbitrary vectors @f$ x * @f$ and @f$ y @f$ of dimensionality @f$ n @f$: * * @f[ * d(x, y) = \left( \sum_{i = 1}^{n} | x_i - y_i |^p \right)^{\frac{1}{p}}. * @f] * * The value of p is given as a template parameter. * * In addition, the function @f$ d(x, y) @f$ can be simplified, neglecting the * p-root calculation. This is done by specifying the TakeRoot template * parameter to be false. Then, * * @f[ * d(x, y) = \sum_{i = 1}^{n} | x_i - y_i |^p * @f] * * It is faster to compute that distance, so TakeRoot is by default off. * However, when TakeRoot is false, the distance given is not actually a true * metric -- it does not satisfy the triangle inequality. Some MLPACK methods * do not require the triangle inequality to operate correctly (such as the * BinarySpaceTree), but setting TakeRoot = false in some cases will cause * incorrect results. * * A few convenience typedefs are given: * * - ManhattanDistance * - EuclideanDistance * - SquaredEuclideanDistance * * @tparam Power Power of metric; i.e. Power = 1 gives the L1-norm (Manhattan * distance). * @tparam TakeRoot If true, the Power'th root of the result is taken before it * is returned. Setting this to false causes the metric to not satisfy the * Triangle Inequality (be careful!). */ template<int Power, bool TakeRoot = false> class LMetric { public: /*** * Default constructor does nothing, but is required to satisfy the Kernel * policy. */ LMetric() { } /** * Computes the distance between two points. */ template<typename VecType1, typename VecType2> static double Evaluate(const VecType1& a, const VecType2& b); std::string ToString() const; }; // Convenience typedefs. /*** * The Manhattan (L1) distance. */ typedef LMetric<1, false> ManhattanDistance; /*** * The squared Euclidean (L2) distance. */ typedef LMetric<2, false> SquaredEuclideanDistance; /*** * The Euclidean (L2) distance. */ typedef LMetric<2, true> EuclideanDistance; /*** * The L-infinity (Chebyshev) distance. */ typedef LMetric<INT_MAX, false> ChebyshevDistance; }; // namespace metric }; // namespace mlpack // Include implementation.
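// Example usage (a minimal, hedged sketch):
//
//   arma::vec a("0 0");
//   arma::vec b("3 4");
//   const double d = mlpack::metric::EuclideanDistance::Evaluate(a, b); // 5
//   const double d2 =
//       mlpack::metric::SquaredEuclideanDistance::Evaluate(a, b); // 25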
#include "lmetric_impl.hpp"

#endif
RcppMLPACK/inst/include/mlpack/core/metrics/mahalanobis_distance.hpp0000644000176200001440000000757113647512751025175 0ustar liggesusers/***
 * @file mahalanobis_distance.hpp
 * @author Ryan Curtin
 *
 * The Mahalanobis distance.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_METRICS_MAHALANOBIS_DISTANCE_HPP
#define __MLPACK_CORE_METRICS_MAHALANOBIS_DISTANCE_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace metric {

/**
 * The Mahalanobis distance, which is essentially a stretched Euclidean
 * distance.  Given a square covariance matrix @f$ Q @f$ of size @f$ d @f$ x
 * @f$ d @f$, where @f$ d @f$ is the dimensionality of the points it will be
 * evaluating, and given two vectors @f$ x @f$ and @f$ y @f$ also of
 * dimensionality @f$ d @f$,
 *
 * @f[
 * d(x, y) = \sqrt{(x - y)^T Q (x - y)}
 * @f]
 *
 * where Q is the covariance matrix.
 *
 * Because each evaluation multiplies (x_1 - x_2) by the covariance matrix, it
 * may be much quicker to use an LMetric and simply stretch the actual dataset
 * itself before performing any evaluations.  However, this class is provided
 * for convenience.
 *
 * Similar to the LMetric class, this offers a template parameter TakeRoot
 * which, when set to false, will instead evaluate the distance
 *
 * @f[
 * d(x, y) = (x - y)^T Q (x - y)
 * @f]
 *
 * which is faster to evaluate.
 *
 * @tparam TakeRoot If true, takes the root of the output.  It is slightly
 *    faster to leave this at the default of false.
 */
template<bool TakeRoot = false>
class MahalanobisDistance
{
 public:
  /**
   * Initialize the Mahalanobis distance with the empty matrix as covariance.
   * Don't call Evaluate() until you set the covariance with Covariance()!
   */
  MahalanobisDistance() { }

  /**
   * Initialize the Mahalanobis distance with the identity matrix of the given
   * dimensionality.
   *
   * @param dimensionality Dimensionality of the covariance matrix.
   */
  MahalanobisDistance(const size_t dimensionality) :
      covariance(arma::eye(dimensionality, dimensionality)) { }

  /**
   * Initialize the Mahalanobis distance with the given covariance matrix.  The
   * given covariance matrix will be copied (this is not optimal).
   *
   * @param covariance The covariance matrix to use for this distance.
   */
  MahalanobisDistance(const arma::mat& covariance) : covariance(covariance) { }

  /**
   * Evaluate the distance between the two given points using this Mahalanobis
   * distance.  If the covariance matrix has not been set (i.e. if you used the
   * empty constructor and did not later modify the covariance matrix), calling
   * this method will probably result in a crash.
   *
   * @param a First vector.
   * @param b Second vector.
   */
  template<typename VecType1, typename VecType2>
  double Evaluate(const VecType1& a, const VecType2& b);

  //! Return a string representation of the object.
  std::string ToString() const;

  /**
   * Access the covariance matrix.
   *
   * @return Constant reference to the covariance matrix.
   */
  const arma::mat& Covariance() const { return covariance; }

  /**
   * Modify the covariance matrix.
   *
   * @return Reference to the covariance matrix.
   */
  arma::mat& Covariance() { return covariance; }

 private:
  //! The covariance matrix associated with this distance.
  arma::mat covariance;
};

}; // namespace metric
}; // namespace mlpack

#include "mahalanobis_distance_impl.hpp"

#endif
RcppMLPACK/inst/include/mlpack/core/metrics/lmetric_impl.hpp0000644000176200001440000000657613647512751023525 0ustar liggesusers/**
 * @file lmetric_impl.hpp
 * @author Ryan Curtin
 *
 * Implementation of template specializations of LMetric class.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_METRICS_LMETRIC_IMPL_HPP
#define __MLPACK_CORE_METRICS_LMETRIC_IMPL_HPP

// In case it hasn't been included.
#include "lmetric.hpp"

namespace mlpack {
namespace metric {

// Unspecialized implementation.  This should almost never be used...
template<int Power, bool TakeRoot>
template<typename VecType1, typename VecType2>
double LMetric<Power, TakeRoot>::Evaluate(const VecType1& a,
                                          const VecType2& b)
{
  double sum = 0;
  for (size_t i = 0; i < a.n_elem; i++)
    sum += pow(fabs(a[i] - b[i]), Power);

  if (!TakeRoot) // The compiler should optimize this correctly at compile-time.
    return sum;

  return pow(sum, (1.0 / Power));
}

// String conversion.
template<int Power, bool TakeRoot>
std::string LMetric<Power, TakeRoot>::ToString() const
{
  std::ostringstream convert;
  convert << "LMetric [" << this << "]" << std::endl;
  convert << "  Power: " << Power << std::endl;
  convert << "  TakeRoot: " << (TakeRoot ? "true" : "false") << std::endl;
  return convert.str();
}

// L1-metric specializations; the root doesn't matter.
template<>
template<typename VecType1, typename VecType2>
double LMetric<1, true>::Evaluate(const VecType1& a, const VecType2& b)
{
  return accu(abs(a - b));
}

template<>
template<typename VecType1, typename VecType2>
double LMetric<1, false>::Evaluate(const VecType1& a, const VecType2& b)
{
  return accu(abs(a - b));
}

// L2-metric specializations.
template<>
template<typename VecType1, typename VecType2>
double LMetric<2, true>::Evaluate(const VecType1& a, const VecType2& b)
{
  return sqrt(accu(square(a - b)));
}

template<>
template<typename VecType1, typename VecType2>
double LMetric<2, false>::Evaluate(const VecType1& a, const VecType2& b)
{
  return accu(square(a - b));
}

// L3-metric specialization (not very likely to be used, but just in case).
template<>
template<typename VecType1, typename VecType2>
double LMetric<3, true>::Evaluate(const VecType1& a, const VecType2& b)
{
  return pow(accu(pow(abs(a - b), 3.0)), 1.0 / 3.0);
}

template<>
template<typename VecType1, typename VecType2>
double LMetric<3, false>::Evaluate(const VecType1& a, const VecType2& b)
{
  return accu(pow(abs(a - b), 3.0));
}

// L-infinity (Chebyshev distance) specialization.
template<>
template<typename VecType1, typename VecType2>
double LMetric<INT_MAX, false>::Evaluate(const VecType1& a, const VecType2& b)
{
  return arma::as_scalar(max(abs(a - b)));
}

}; // namespace metric
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/core/metrics/mahalanobis_distance_impl.hpp0000644000176200001440000000515113647512751026212 0ustar liggesusers/***
 * @file mahalanobis_distance_impl.hpp
 * @author Ryan Curtin
 *
 * Implementation of the Mahalanobis distance.
* * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_METRICS_MAHALANOBIS_DISTANCE_IMPL_HPP #define __MLPACK_CORE_METRICS_MAHALANOBIS_DISTANCE_IMPL_HPP #include "mahalanobis_distance.hpp" namespace mlpack { namespace metric { /** * Specialization for non-rooted case. */ template<> template double MahalanobisDistance::Evaluate(const VecType1& a, const VecType2& b) { arma::vec m = (a - b); arma::mat out = trans(m) * covariance * m; // 1x1 return out[0]; } /** * Specialization for rooted case. This requires one extra evaluation of * sqrt(). */ template<> template double MahalanobisDistance::Evaluate(const VecType1& a, const VecType2& b) { // Check if covariance matrix has been initialized. if (covariance.n_rows == 0) covariance = arma::eye(a.n_elem, a.n_elem); arma::vec m = (a - b); arma::mat out = trans(m) * covariance * m; // 1x1; return sqrt(out[0]); } // Convert object into string. template std::string MahalanobisDistance::ToString() const { std::ostringstream convert; std::ostringstream convertb; convert << "MahalanobisDistance [" << this << "]" << std::endl; if (TakeRoot) convert << " TakeRoot: TRUE" << std::endl; if (covariance.size() < 65) { convert << " Covariance: " << std::endl; convertb << covariance << std::endl; convert << mlpack::util::Indent(convertb.str(),2); } else { convert << " Covariance matrix: " << covariance.n_rows << "x" ; convert << covariance.n_cols << std::endl << " Range: [" ; convert << covariance.min() << "," << covariance.max() << "]" << std::endl; } return convert.str(); } }; // namespace metric }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/data/0000755000176200001440000000000013647512751017571 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/data/save.hpp0000644000176200001440000000550313647512751021243 0ustar liggesusers/** * @file save.hpp * @author Ryan Curtin * * Save an Armadillo matrix to file. This is necessary because Armadillo does * not transpose matrices upon saving, and it allows us to give better error * output. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_DATA_SAVE_HPP #define __MLPACK_CORE_DATA_SAVE_HPP //#include #include // Includes Armadillo. #include namespace mlpack { namespace data /** Functions to load and save matrices. 
*/ { /** * Saves a matrix to file, guessing the filetype from the extension. This * will transpose the matrix at save time. If the filetype cannot be * determined, an error will be given. * * The supported types of files are the same as found in Armadillo: * * - CSV (csv_ascii), denoted by .csv, or optionally .txt * - ASCII (raw_ascii), denoted by .txt * - Armadillo ASCII (arma_ascii), also denoted by .txt * - PGM (pgm_binary), denoted by .pgm * - PPM (ppm_binary), denoted by .ppm * - Raw binary (raw_binary), denoted by .bin * - Armadillo binary (arma_binary), denoted by .bin * - HDF5 (hdf5_binary), denoted by .hdf5, .hdf, .h5, or .he5 * * If the file extension is not one of those types, an error will be given. If * the 'fatal' parameter is set to true, an error will cause the program to * exit. If the 'transpose' parameter is set to true, the matrix will be * transposed before saving. Generally, because MLPACK stores matrices in a * column-major format and most datasets are stored on disk as row-major, this * parameter should be left at its default value of 'true'. * * @param filename Name of file to save to. * @param matrix Matrix to save into file. * @param fatal If an error should be reported as fatal (default false). * @param transpose If true, transpose the matrix before saving. * @return Boolean value indicating success or failure of save. */ template bool Save(const std::string& filename, const arma::Mat& matrix, bool fatal = false, bool transpose = true); }; // namespace data }; // namespace mlpack // Include implementation. #include "save_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/core/data/normalize_labels_impl.hpp0000644000176200001440000000635313647512751024654 0ustar liggesusers/** * @file normalize_labels_impl.hpp * @author Ryan Curtin * * Implementation of label normalization functions; these are useful for mapping * labels to the range [0, n). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_DATA_NORMALIZE_LABELS_IMPL_HPP #define __MLPACK_CORE_DATA_NORMALIZE_LABELS_IMPL_HPP // In case it hasn't been included yet. #include "normalize_labels.hpp" namespace mlpack { namespace data { /** * Given a set of labels of a particular datatype, convert them to unsigned * labels in the range [0, n) where n is the number of different labels. Also, * a reverse mapping from the new label to the old value is stored in the * 'mapping' vector. * * @param labelsIn Input labels of arbitrary datatype. * @param labels Vector that unsigned labels will be stored in. * @param mapping Reverse mapping to convert new labels back to old labels. */ template void NormalizeLabels(const arma::Col& labelsIn, arma::Col& labels, arma::Col& mapping) { // Loop over the input labels, and develop the mapping. We'll first naively // resize the mapping to the maximum possible size, and then when we fill it, // we'll resize it back down to its actual size. 
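  // As a concrete example, the (hypothetical) input labels {-1, 1, 1, -1, 3}
  // produce labels = {0, 1, 1, 0, 2} and mapping = {-1, 1, 3}.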
mapping.set_size(labelsIn.n_elem); labels.set_size(labelsIn.n_elem); size_t curLabel = 0; for (size_t i = 0; i < labelsIn.n_elem; ++i) { bool found = false; for (size_t j = 0; j < curLabel; ++j) { // Is the label already in the list of labels we have seen? if (labelsIn[i] == mapping[j]) { labels[i] = j; found = true; break; } } // Do we need to add this new label? if (!found) { mapping[curLabel] = labelsIn[i]; labels[i] = curLabel; ++curLabel; } } // Resize mapping back down to necessary size. mapping.resize(curLabel); } /** * Given a set of labels that have been mapped to the range [0, n), map them * back to the original labels given by the 'mapping' vector. * * @param labels Set of normalized labels to convert. * @param mapping Mapping to use to convert labels. * @param labelsOut Vector to store new labels in. */ template void RevertLabels(const arma::Col& labels, const arma::Col& mapping, arma::Col& labelsOut) { // We already have the mapping, so we just need to loop over each element. labelsOut.set_size(labels.n_elem); for (size_t i = 0; i < labels.n_elem; ++i) labelsOut[i] = mapping[labels[i]]; } }; // namespace data }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/data/load.hpp0000644000176200001440000000573713647512751021235 0ustar liggesusers/** * @file load.hpp * @author Ryan Curtin * * Load an Armadillo matrix from file. This is necessary because Armadillo does * not transpose matrices on input, and it allows us to give better error * output. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_DATA_LOAD_HPP #define __MLPACK_CORE_DATA_LOAD_HPP //#include #include // Includes Armadillo. #include namespace mlpack { namespace data /** Functions to load and save matrices. */ { /** * Loads a matrix from file, guessing the filetype from the extension. This * will transpose the matrix at load time. If the filetype cannot be * determined, an error will be given. * * The supported types of files are the same as found in Armadillo: * * - CSV (csv_ascii), denoted by .csv, or optionally .txt * - ASCII (raw_ascii), denoted by .txt * - Armadillo ASCII (arma_ascii), also denoted by .txt * - PGM (pgm_binary), denoted by .pgm * - PPM (ppm_binary), denoted by .ppm * - Raw binary (raw_binary), denoted by .bin * - Armadillo binary (arma_binary), denoted by .bin * - HDF5, denoted by .hdf, .hdf5, .h5, or .he5 * * If the file extension is not one of those types, an error will be given. * This is preferable to Armadillo's default behavior of loading an unknown * filetype as raw_binary, which can have very confusing effects. * * If the parameter 'fatal' is set to true, the program will exit with an error * if the matrix does not load successfully. The parameter 'transpose' controls * whether or not the matrix is transposed after loading. 
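 * (For example, a CSV file with 100 lines of 3 comma-separated values loads as
 * a 3 x 100 matrix when 'transpose' is true: each line of the file becomes one
 * column of the matrix.)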
In most cases, * because data is generally stored in a row-major format and MLPACK requires * column-major matrices, this should be left at its default value of 'true'. * * @param filename Name of file to load. * @param matrix Matrix to load contents of file into. * @param fatal If an error should be reported as fatal (default false). * @param transpose If true, transpose the matrix after loading. * @return Boolean value indicating success or failure of load. */ template bool Load(const std::string& filename, arma::Mat& matrix, bool fatal = false, bool transpose = true); }; // namespace data }; // namespace mlpack // Include implementation. #include "load_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/core/data/load_impl.hpp0000644000176200001440000001557313647512751022255 0ustar liggesusers/** * @file load_impl.hpp * @author Ryan Curtin * * Implementation of templatized load() function defined in load.hpp. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_DATA_LOAD_IMPL_HPP #define __MLPACK_CORE_DATA_LOAD_IMPL_HPP // In case it hasn't already been included. #include "load.hpp" #include #include namespace mlpack { namespace data { template bool Load(const std::string& filename, arma::Mat& matrix, bool fatal, bool transpose) { Timer::Start("loading_data"); // First we will try to discriminate by file extension. size_t ext = filename.rfind('.'); if (ext == std::string::npos) { Timer::Stop("loading_data"); if (fatal) Log::Fatal << "Cannot determine type of file '" << filename << "'; " << "no extension is present." << std::endl; else Log::Warn << "Cannot determine type of file '" << filename << "'; " << "no extension is present. Load failed." << std::endl; return false; } // Get the extension and force it to lowercase. std::string extension = filename.substr(ext + 1); std::transform(extension.begin(), extension.end(), extension.begin(), ::tolower); // Catch nonexistent files by opening the stream ourselves. std::fstream stream; stream.open(filename.c_str(), std::fstream::in); if (!stream.is_open()) { Timer::Stop("loading_data"); if (fatal) Log::Fatal << "Cannot open file '" << filename << "'. " << std::endl; else Log::Warn << "Cannot open file '" << filename << "'; load failed." << std::endl; return false; } bool unknownType = false; arma::file_type loadType; std::string stringType; if (extension == "csv") { loadType = arma::csv_ascii; stringType = "CSV data"; } else if (extension == "txt") { // This could be raw ASCII or Armadillo ASCII (ASCII with size header). // We'll let Armadillo do its guessing (although we have to check if it is // arma_ascii ourselves) and see what we come up with. 
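    // (An assumption for illustration rather than a guarantee of Armadillo's
    // format: an arma_ascii file begins with a header line such as
    // "ARMA_MAT_TXT_FN008" followed by the matrix dimensions, which is why
    // peeking at the first ARMA_MAT_TXT.length() characters of the stream is
    // enough to recognize it.)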
// This is taken from load_auto_detect() in diskio_meat.hpp const std::string ARMA_MAT_TXT = "ARMA_MAT_TXT"; char* rawHeader = new char[ARMA_MAT_TXT.length() + 1]; std::streampos pos = stream.tellg(); stream.read(rawHeader, std::streamsize(ARMA_MAT_TXT.length())); rawHeader[ARMA_MAT_TXT.length()] = '\0'; stream.clear(); stream.seekg(pos); // Reset stream position after peeking. if (std::string(rawHeader) == ARMA_MAT_TXT) { loadType = arma::arma_ascii; stringType = "Armadillo ASCII formatted data"; } else // It's not arma_ascii. Now we let Armadillo guess. { loadType = arma::diskio::guess_file_type(stream); if (loadType == arma::raw_ascii) // Raw ASCII (space-separated). stringType = "raw ASCII formatted data"; else if (loadType == arma::csv_ascii) // CSV can be .txt too. stringType = "CSV data"; else // Unknown .txt... we will throw an error. unknownType = true; } delete[] rawHeader; } else if (extension == "bin") { // This could be raw binary or Armadillo binary (binary with header). We // will check to see if it is Armadillo binary. const std::string ARMA_MAT_BIN = "ARMA_MAT_BIN"; char *rawHeader = new char[ARMA_MAT_BIN.length() + 1]; std::streampos pos = stream.tellg(); stream.read(rawHeader, std::streamsize(ARMA_MAT_BIN.length())); rawHeader[ARMA_MAT_BIN.length()] = '\0'; stream.clear(); stream.seekg(pos); // Reset stream position after peeking. if (std::string(rawHeader) == ARMA_MAT_BIN) { stringType = "Armadillo binary formatted data"; loadType = arma::arma_binary; } else // We can only assume it's raw binary. { stringType = "raw binary formatted data"; loadType = arma::raw_binary; } delete[] rawHeader; } else if (extension == "pgm") { loadType = arma::pgm_binary; stringType = "PGM data"; } else if (extension == "h5" || extension == "hdf5" || extension == "hdf" || extension == "he5") { #ifdef ARMA_USE_HDF5 loadType = arma::hdf5_binary; stringType = "HDF5 data"; #else Timer::Stop("loading_data"); if (fatal) Log::Fatal << "Attempted to load '" << filename << "' as HDF5 data, but " << "Armadillo was compiled without HDF5 support. Load failed." << std::endl; else Log::Warn << "Attempted to load '" << filename << "' as HDF5 data, but " << "Armadillo was compiled without HDF5 support. Load failed." << std::endl; return false; #endif } else // Unknown extension... { unknownType = true; loadType = arma::raw_binary; // Won't be used; prevent a warning. stringType = ""; } // Provide error if we don't know the type. if (unknownType) { Timer::Stop("loading_data"); if (fatal) Log::Fatal << "Unable to detect type of '" << filename << "'; " << "incorrect extension?" << std::endl; else Log::Warn << "Unable to detect type of '" << filename << "'; load failed." << " Incorrect extension?" << std::endl; return false; } // Try to load the file; but if it's raw_binary, it could be a problem. if (loadType == arma::raw_binary) Log::Warn << "Loading '" << filename << "' as " << stringType << "; " << "but this may not be the actual filetype!" << std::endl; else Log::Info << "Loading '" << filename << "' as " << stringType << ". " << std::flush; const bool success = matrix.load(stream, loadType); if (!success) { Log::Info << std::endl; Timer::Stop("loading_data"); if (fatal) Log::Fatal << "Loading from '" << filename << "' failed." << std::endl; else Log::Warn << "Loading from '" << filename << "' failed." << std::endl; return false; } else Log::Info << "Size is " << (transpose ? matrix.n_cols : matrix.n_rows) << " x " << (transpose ? 
matrix.n_rows : matrix.n_cols) << ".\n"; // Now transpose the matrix, if necessary. if (transpose) matrix = trans(matrix); Timer::Stop("loading_data"); // Finally, return the success indicator. return success; } }; // namespace data }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/data/normalize_labels.hpp0000644000176200001440000000450213647512751023625 0ustar liggesusers/** * @file normalize_labels.hpp * @author Ryan Curtin * * Often labels are not given as {0, 1, 2, ...} but instead {1, 2, ...} or even * {-1, 1} or otherwise. The purpose of this function is to normalize labels to * {0, 1, 2, ...} and provide a mapping back to those labels. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_DATA_NORMALIZE_LABELS_HPP #define __MLPACK_CORE_DATA_NORMALIZE_LABELS_HPP #include namespace mlpack { namespace data { /** * Given a set of labels of a particular datatype, convert them to unsigned * labels in the range [0, n) where n is the number of different labels. Also, * a reverse mapping from the new label to the old value is stored in the * 'mapping' vector. * * @param labelsIn Input labels of arbitrary datatype. * @param labels Vector that unsigned labels will be stored in. * @param mapping Reverse mapping to convert new labels back to old labels. */ template void NormalizeLabels(const arma::Col& labelsIn, arma::Col& labels, arma::Col& mapping); /** * Given a set of labels that have been mapped to the range [0, n), map them * back to the original labels given by the 'mapping' vector. * * @param labels Set of normalized labels to convert. * @param mapping Mapping to use to convert labels. * @param labelsOut Vector to store new labels in. */ template void RevertLabels(const arma::Col& labels, const arma::Col& mapping, arma::Col& labelsOut); }; // namespace data }; // namespace mlpack // Include implementation. #include "normalize_labels_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/core/data/save_impl.hpp0000644000176200001440000001133113647512751022260 0ustar liggesusers/** * @file save_impl.hpp * @author Ryan Curtin * * Implementation of save functionality. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_DATA_SAVE_IMPL_HPP #define __MLPACK_CORE_DATA_SAVE_IMPL_HPP // In case it hasn't already been included. 
#include "save.hpp" namespace mlpack { namespace data { template bool Save(const std::string& filename, const arma::Mat& matrix, bool fatal, bool transpose) { Timer::Start("saving_data"); // First we will try to discriminate by file extension. size_t ext = filename.rfind('.'); if (ext == std::string::npos) { Timer::Stop("saving_data"); if (fatal) Log::Fatal << "No extension given with filename '" << filename << "'; " << "type unknown. Save failed." << std::endl; else Log::Warn << "No extension given with filename '" << filename << "'; " << "type unknown. Save failed." << std::endl; return false; } // Get the actual extension. std::string extension = filename.substr(ext + 1); // Catch errors opening the file. std::fstream stream; stream.open(filename.c_str(), std::fstream::out); if (!stream.is_open()) { Timer::Stop("saving_data"); if (fatal) Log::Fatal << "Cannot open file '" << filename << "' for writing. " << "Save failed." << std::endl; else Log::Warn << "Cannot open file '" << filename << "' for writing; save " << "failed." << std::endl; return false; } bool unknownType = false; arma::file_type saveType; std::string stringType; if (extension == "csv") { saveType = arma::csv_ascii; stringType = "CSV data"; } else if (extension == "txt") { saveType = arma::raw_ascii; stringType = "raw ASCII formatted data"; } else if (extension == "bin") { saveType = arma::arma_binary; stringType = "Armadillo binary formatted data"; } else if (extension == "pgm") { saveType = arma::pgm_binary; stringType = "PGM data"; } else if (extension == "h5" || extension == "hdf5" || extension == "hdf" || extension == "he5") { #ifdef ARMA_USE_HDF5 saveType = arma::hdf5_binary; stringType = "HDF5 data"; #else Timer::Stop("saving_data"); if (fatal) Log::Fatal << "Attempted to save HDF5 data to '" << filename << "', but " << "Armadillo was compiled without HDF5 support. Save failed." << std::endl; else Log::Warn << "Attempted to save HDF5 data to '" << filename << "', but " << "Armadillo was compiled without HDF5 support. Save failed." << std::endl; return false; #endif } else { unknownType = true; saveType = arma::raw_binary; // Won't be used; prevent a warning. stringType = ""; } // Provide error if we don't know the type. if (unknownType) { Timer::Stop("saving_data"); if (fatal) Log::Fatal << "Unable to determine format to save to from filename '" << filename << "'. Save failed." << std::endl; else Log::Warn << "Unable to determine format to save to from filename '" << filename << "'. Save failed." << std::endl; return false; } // Try to save the file. Log::Info << "Saving " << stringType << " to '" << filename << "'." << std::endl; // Transpose the matrix. if (transpose) { arma::Mat tmp = trans(matrix); if (!tmp.quiet_save(stream, saveType)) { Timer::Stop("saving_data"); if (fatal) Log::Fatal << "Save to '" << filename << "' failed." << std::endl; else Log::Warn << "Save to '" << filename << "' failed." << std::endl; return false; } } else { if (!matrix.quiet_save(stream, saveType)) { Timer::Stop("saving_data"); if (fatal) Log::Fatal << "Save to '" << filename << "' failed." << std::endl; else Log::Warn << "Save to '" << filename << "' failed." << std::endl; return false; } } Timer::Stop("saving_data"); // Finally return success. 
return true; } }; // namespace data }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/kernels/0000755000176200001440000000000013647512751020323 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/kernels/cosine_distance.hpp0000644000176200001440000000406013647512751024166 0ustar liggesusers/** * @file cosine_distance.hpp * @author Ryan Curtin * * This implements the cosine distance (or cosine similarity) between two * vectors, which is a measure of the angle between the two vectors. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_KERNELS_COSINE_DISTANCE_HPP #define __MLPACK_CORE_KERNELS_COSINE_DISTANCE_HPP #include namespace mlpack { namespace kernel { /** * The cosine distance (or cosine similarity). It is defined by * * @f[ * d(a, b) = \frac{a^T b}{|| a || || b ||} * @f] * * and this class assumes the standard L2 inner product. */ class CosineDistance { public: /** * Computes the cosine distance between two points. * * @param a First vector. * @param b Second vector. * @return d(a, b). */ template static double Evaluate(const VecType& a, const VecType& b); /** * Returns a string representation of this object. */ std::string ToString() const { std::ostringstream convert; convert << "CosineDistance [" << this << "]" << std::endl; return convert.str(); } }; //! Kernel traits for the cosine distance. template<> class KernelTraits { public: //! The cosine kernel is normalized: K(x, x) = 1 for all x. static const bool IsNormalized = true; }; }; // namespace kernel }; // namespace mlpack // Include implementation. #include "cosine_distance_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/core/kernels/kernel_traits.hpp0000644000176200001440000000311013647512751023675 0ustar liggesusers/** * @file kernel_traits.hpp * @author Ryan Curtin * * This provides the KernelTraits class, a template class to get information * about various kernels. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_KERNELS_KERNEL_TRAITS_HPP #define __MLPACK_CORE_KERNELS_KERNEL_TRAITS_HPP namespace mlpack { namespace kernel { /** * This is a template class that can provide information about various kernels. * By default, this class will provide the weakest possible assumptions on * kernels, and each kernel should override values as necessary. 
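 *
 * For instance, a normalized kernel can advertise that fact with a one-line
 * specialization.  This is only a sketch (MyKernel is a hypothetical kernel
 * class), but the KernelTraits<CosineDistance> specialization elsewhere in
 * this package follows exactly this pattern:
 *
 * @code
 * template<>
 * class KernelTraits<MyKernel>
 * {
 *  public:
 *   static const bool IsNormalized = true;
 * };
 * @endcode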
 * If a kernel
 * doesn't need to override a value, then there's no need to write a
 * KernelTraits specialization for that class.
 */
template<typename KernelType>
class KernelTraits
{
 public:
  /**
   * If true, then the kernel is normalized: K(x, x) = 1 for all x.
   */
  static const bool IsNormalized = false;
};

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/core/kernels/hyperbolic_tangent_kernel.hpp0000644000176200001440000000534213647512751026260 0ustar liggesusers/**
 * @file hyperbolic_tangent_kernel.hpp
 * @author Ajinkya Kale
 *
 * Implementation of the hyperbolic tangent kernel.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_HYPERBOLIC_TANGENT_KERNEL_HPP
#define __MLPACK_CORE_KERNELS_HYPERBOLIC_TANGENT_KERNEL_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace kernel {

/**
 * Hyperbolic tangent kernel.  For any two vectors @f$ x @f$, @f$ y @f$ and a
 * given scale @f$ s @f$ and offset @f$ t @f$,
 *
 * @f[
 * K(x, y) = \tanh(s \langle x, y \rangle + t)
 * @f]
 */
class HyperbolicTangentKernel
{
 public:
  /**
   * This constructor sets the default scale to 1.0 and offset to 0.0.
   */
  HyperbolicTangentKernel() : scale(1.0), offset(0.0)
  { }

  /**
   * Construct the hyperbolic tangent kernel with custom scale factor and
   * offset.
   *
   * @param scale Scaling factor for @f$ \langle x, y \rangle @f$.
   * @param offset Kernel offset.
   */
  HyperbolicTangentKernel(double scale, double offset) :
      scale(scale), offset(offset)
  { }

  /**
   * Evaluate the hyperbolic tangent kernel.  This evaluation uses Armadillo's
   * dot() function.
   *
   * @tparam VecType Type of vector (should be arma::vec or arma::spvec).
   * @param a First vector.
   * @param b Second vector.
   * @return K(a, b).
   */
  template<typename VecType>
  double Evaluate(const VecType& a, const VecType& b)
  {
    return tanh(scale * arma::dot(a, b) + offset);
  }

  //! Get scale factor.
  double Scale() const { return scale; }
  //! Modify scale factor.
  double& Scale() { return scale; }

  //! Get offset for the kernel.
  double Offset() const { return offset; }
  //! Modify offset for the kernel.
  double& Offset() { return offset; }

  //! Convert object to string.
  std::string ToString() const
  {
    std::ostringstream convert;
    convert << "HyperbolicTangentKernel [" << this << "]" << std::endl;
    convert << "  Scale: " << scale << std::endl;
    convert << "  Offset: " << offset << std::endl;
    return convert.str();
  }

 private:
  double scale;
  double offset;
};

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/core/kernels/pspectrum_string_kernel_impl.hpp0000644000176200001440000000620413647512751027027 0ustar liggesusers/**
 * @file pspectrum_string_kernel_impl.hpp
 * @author Ryan Curtin
 *
 * Implementation of the p-spectrum string kernel, created for use with FastMKS.
* Instead of passing a data matrix to FastMKS which stores the kernels, pass a * one-dimensional data matrix (data vector) to FastMKS which stores indices of * strings; then, the actual strings are given to the PSpectrumStringKernel at * construction time, and the kernel knows to map the indices to actual strings. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_KERNELS_PSPECTRUM_STRING_KERNEL_IMPL_HPP #define __MLPACK_CORE_KERNELS_PSPECTRUM_STRING_KERNEL_IMPL_HPP // In case it has not been included yet. #include "pspectrum_string_kernel.hpp" namespace mlpack { namespace kernel { /** * Evaluate the kernel for the string indices given. As mentioned in the class * documentation, a and b should be 2-element vectors, where the first element * contains the index of the dataset and the second element contains the index * of the string. Therefore, if [2 3] is passed for a, the string used will be * datasets[2][3] (datasets is of type std::vector >&). * * @param a Index of string and dataset for first string. * @param b Index of string and dataset for second string. */ template double PSpectrumStringKernel::Evaluate(const VecType& a, const VecType& b) const { // Get the map of substrings for the two strings we are interested in. const std::map& aMap = counts[a[0]][a[1]]; const std::map& bMap = counts[b[0]][b[1]]; double eval = 0; // Loop through the two maps (which, when iterated through, are sorted // alphabetically). std::map::const_iterator aIt = aMap.begin(); std::map::const_iterator bIt = bMap.begin(); while ((aIt != aMap.end()) && (bIt != bMap.end())) { // Compare alphabetically (this is how std::map is ordered). int result = (*aIt).first.compare((*bIt).first); if (result == 0) // The same substring. { eval += ((*aIt).second * (*bIt).second); // Now increment both. ++aIt; ++bIt; } else if (result > 0) { // aIt is "ahead" of bIt (alphabetically); so increment bIt to "catch up". ++bIt; } else { // bIt is "ahead" of aIt (alphabetically); so increment aIt to "catch up". ++aIt; } } return eval; } }; // namespace kernel }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/kernels/example_kernel.hpp0000644000176200001440000001376713647512751024045 0ustar liggesusers/** * @file example_kernel.hpp * @author Ryan Curtin * * This is an example kernel. If you are making your own kernel, follow the * outline specified in this file. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). 
* * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_KERNELS_EXAMPLE_KERNEL_HPP #define __MLPACK_CORE_KERNELS_EXAMPLE_KERNEL_HPP #include namespace mlpack { /** * @brief Kernel functions. * * This namespace contains kernel functions, which evaluate some kernel function * @f$ K(x, y) @f$ for some arbitrary vectors @f$ x @f$ and @f$ y @f$ of the * same dimension. The single restriction on the function @f$ K(x, y) @f$ is * that it must satisfy Mercer's condition: * * @f[ * \int \int K(x, y) g(x) g(y) dx dy \ge 0 * @f] * * for all square integrable functions @f$ g(x) @f$. * * The kernels in this namespace all implement the same methods as the * ExampleKernel class. Any additional custom kernels should implement all the * methods that class implements; in addition, any method using a kernel should * rely on any arbitrary kernel function class having a default constructor and * a function * * @code * double Evaluate(arma::vec&, arma::vec&); * @endcode */ namespace kernel { /** * An example kernel function. This is not a useful kernel, but it implements * the two functions necessary to satisfy the Kernel policy (so that a class can * be used whenever an MLPACK method calls for a `typename Kernel` template * parameter. * * All that is necessary is a constructor and an `Evaluate()` function. More * methods could be added; for instance, one useful idea is a constructor which * takes parameters for a kernel (for instance, the width of the Gaussian for a * Gaussian kernel). However, MLPACK methods cannot count on these various * constructors existing, which is why most methods allow passing an * already-instantiated kernel object (and by default the method will construct * the kernel with the default constructor). So, for instance, * * @code * GaussianKernel k(5.0); * KDE kde(dataset, k); * @endcode * * will set up KDE using a Gaussian kernel with a width of 5.0, but * * @code * KDE kde(dataset); * @endcode * * will create the kernel with the default constructor. It is important (but * not strictly mandatory) that your default constructor still gives a working * kernel. * * @note * Not all kernels require state. For instance, the regular dot product needs * no parameters. In that case, no local variables are necessary and * `Evaluate()` can (and should) be declared static. However, for greater * generalization, MLPACK methods expect all kernels to require state and hence * must store instantiated kernel functions; this is why a default constructor * is necessary. * @endnote */ class ExampleKernel { public: /** * The default constructor, which takes no parameters. Because our simple * example kernel has no internal parameters that need to be stored, the * constructor does not need to do anything. For a more complex example, see * the GaussianKernel, which stores an internal parameter. */ ExampleKernel() { } /** * Evaluates the kernel function for two given vectors. In this case, because * our simple example kernel has no internal parameters, we can declare the * function static. For a more complex example which cannot be declared * static, see the GaussianKernel, which stores an internal parameter. * * @tparam VecType Type of vector (arma::vec, arma::spvec should be expected). * @param a First vector. * @param b Second vector. * @return K(a, b). 
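   *
   * A minimal usage sketch (the vectors here are purely illustrative):
   * @code
   * arma::vec a("1.0 3.0 20.0");
   * arma::vec b("0.0 4.0 20.0");
   * const double k = ExampleKernel::Evaluate(a, b); // k is 0 for this kernel.
   * @endcode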
*/ template static double Evaluate(const VecType& a, const VecType& b) { return 0; } /** * Returns a string for the kernel object; in this case, with only the memory * address for the kernel. If your kernel has any members, your ToString() * method should include those as neccessary as well. **/ std::string ToString() const { std::ostringstream convert; convert << "ExampleKernel [" << this << "]" << std::endl; return convert.str(); } /** * Obtains the convolution integral [integral K(||x-a||)K(||b-x||)dx] * for the two vectors. In this case, because * our simple example kernel has no internal parameters, we can declare the * function static. For a more complex example which cannot be declared * static, see the GaussianKernel, which stores an internal parameter. * * @tparam VecType Type of vector (arma::vec, arma::spvec should be expected). * @param a First vector. * @param b Second vector. * @return the convolution integral value. */ template static double ConvolutionIntegral(const VecType& a, const VecType& b) { return 0; } /** * Obtains the normalizing volume for the kernel with dimension $dimension$. * In this case, because our simple example kernel has no internal parameters, * we can declare the function static. For a more complex example which * cannot be declared static, see the GaussianKernel, which stores an internal * parameter. * * @param dimension the dimension of the space. * @return the normalization constant. */ static double Normalizer() { return 0; } // Modified to remove unused variable "dimension" //static double Normalizer(size_t dimension=1) { return 0; } }; }; // namespace kernel }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/kernels/triangular_kernel.hpp0000644000176200001440000000556213647512751024554 0ustar liggesusers/** * @file triangular_kernel.hpp * @author Ryan Curtin * * Definition and implementation of the trivially simple triangular kernel. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_KERNELS_TRIANGULAR_KERNEL_HPP #define __MLPACK_CORE_KERNELS_TRIANGULAR_KERNEL_HPP #include #include namespace mlpack { namespace kernel { /** * The trivially simple triangular kernel, defined by * * @f[ * K(x, y) = \max \{ 0, 1 - \frac{|| x - y ||_2}{b} \} * @f] * * where \f$ b \f$ is the bandwidth of the kernel. */ class TriangularKernel { public: /** * Initialize the triangular kernel with the given bandwidth (default 1.0). * * @param bandwidth Bandwidth of the triangular kernel. */ TriangularKernel(const double bandwidth = 1.0) : bandwidth(bandwidth) { } /** * Evaluate the triangular kernel for the two given vectors. * * @param a First vector. * @param b Second vector. */ template double Evaluate(const Vec1Type& a, const Vec2Type& b) const { return std::max(0.0, (1 - metric::EuclideanDistance::Evaluate(a, b) / bandwidth)); } /** * Evaluate the triangular kernel given that the distance between the two * points is known. 
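   * For example, with the default bandwidth of 1.0, Evaluate(0.25) returns
   * 0.75, and any distance greater than or equal to the bandwidth yields 0.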
   *
   * @param distance The distance between the two points.
   */
  double Evaluate(const double distance) const
  {
    // As in the vector overload above, the distance is scaled by the bandwidth
    // before clamping at zero.
    return std::max(0.0, 1.0 - distance / bandwidth);
  }

  //! Get the bandwidth of the kernel.
  double Bandwidth() const { return bandwidth; }
  //! Modify the bandwidth of the kernel.
  double& Bandwidth() { return bandwidth; }

  //! Return a string representation of the kernel.
  std::string ToString() const
  {
    std::ostringstream convert;
    convert << "TriangularKernel [" << this << "]" << std::endl;
    convert << "  Bandwidth: " << bandwidth << std::endl;
    return convert.str();
  }

 private:
  //! The bandwidth of the kernel.
  double bandwidth;
};

//! Kernel traits for the triangular kernel.
template<>
class KernelTraits<TriangularKernel>
{
 public:
  //! The triangular kernel is normalized: K(x, x) = 1 for all x.
  static const bool IsNormalized = true;
};

}; // namespace kernel
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/core/kernels/polynomial_kernel.hpp0000644000176200001440000000563713647512751024560 0ustar liggesusers/**
 * @file polynomial_kernel.hpp
 * @author Ajinkya Kale
 *
 * Implementation of the polynomial kernel (just the standard dot product).
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_KERNELS_POLYNOMIAL_KERNEL_HPP
#define __MLPACK_CORE_KERNELS_POLYNOMIAL_KERNEL_HPP

#include <mlpack/core.hpp>

namespace mlpack {
namespace kernel {

/**
 * The simple polynomial kernel.  For any two vectors @f$ x @f$, @f$ y @f$,
 * @f$ degree @f$ and @f$ offset @f$,
 *
 * @f[
 * K(x, y) = (x^T y + offset)^{degree}.
 * @f]
 */
class PolynomialKernel
{
 public:
  /**
   * Construct the Polynomial Kernel with the given degree and offset.  If the
   * arguments are omitted, the default degree is 2 and the default offset is
   * 0.
   *
   * @param degree Degree of the polynomial.
   * @param offset Offset of the dot product of the arguments.
   */
  PolynomialKernel(const double degree = 2.0, const double offset = 0.0) :
      degree(degree),
      offset(offset)
  { }

  /**
   * Simple evaluation of the dot product.  This evaluation uses Armadillo's
   * dot() function.
   *
   * @tparam VecType Type of vector (should be arma::vec or arma::spvec).
   * @param a First vector.
   * @param b Second vector.
   * @return K(a, b).
   */
  template<typename VecType>
  double Evaluate(const VecType& a, const VecType& b) const
  {
    return pow((arma::dot(a, b) + offset), degree);
  }

  //! Get the degree of the polynomial.
  const double& Degree() const { return degree; }
  //! Modify the degree of the polynomial.
  double& Degree() { return degree; }

  //! Get the offset of the dot product of the arguments.
  const double& Offset() const { return offset; }
  //! Modify the offset of the dot product of the arguments.
  double& Offset() { return offset; }

  //! Return a string representation of the kernel.
std::string ToString() const { std::ostringstream convert; convert << "PolynomialKernel [" << this << "]" << std::endl; convert << " Degree: " << degree << std::endl; convert << " Offset: " << offset << std::endl; return convert.str(); } private: //! The degree of the polynomial. double degree; //! The offset of the dot product of the arguments. double offset; }; }; // namespace kernel }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/kernels/spherical_kernel.hpp0000644000176200001440000000720613647512751024353 0ustar liggesusers/** * @file spherical_kernel.hpp * @author Neil Slagle * * This is an example kernel. If you are making your own kernel, follow the * outline specified in this file. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_KERNELS_SPHERICAL_KERNEL_H #define __MLPACK_CORE_KERNELS_SPHERICAL_KERNEL_H #include #include namespace mlpack { namespace kernel { class SphericalKernel { public: SphericalKernel() : bandwidth(1.0), bandwidthSquared(1.0) {} SphericalKernel(double b) : bandwidth(b), bandwidthSquared(b*b) {} template double Evaluate(const VecType& a, const VecType& b) { return (metric::SquaredEuclideanDistance::Evaluate(a, b) <= bandwidthSquared) ? 1.0 : 0.0; } /** * Obtains the convolution integral [integral K(||x-a||)K(||b-x||)dx] * for the two vectors. In this case, because * our simple example kernel has no internal parameters, we can declare the * function static. For a more complex example which cannot be declared * static, see the GaussianKernel, which stores an internal parameter. * * @tparam VecType Type of vector (arma::vec, arma::spvec should be expected). * @param a First vector. * @param b Second vector. * @return the convolution integral value. */ template double ConvolutionIntegral(const VecType& a, const VecType& b) { double distance = sqrt(metric::SquaredEuclideanDistance::Evaluate(a, b)); if (distance >= 2.0 * bandwidth) { return 0.0; } double volumeSquared = pow(Normalizer(a.n_rows), 2.0); switch(a.n_rows) { case 1: return 1.0 / volumeSquared * (2.0 * bandwidth - distance); break; case 2: return 1.0 / volumeSquared * (2.0 * bandwidth * bandwidth * acos(distance/(2.0 * bandwidth)) - distance / 4.0 * sqrt(4.0*bandwidth*bandwidth-distance*distance)); break; default: Rcpp::Rcout << "The spherical kernel does not support convolution\ integrals above dimension two, yet..." << std::endl; return -1.0; break; } } double Normalizer(size_t dimension) { return pow(bandwidth, (double) dimension) * pow(M_PI, dimension / 2.0) / boost::math::tgamma(dimension / 2.0 + 1.0); } double Evaluate(double t) { return (t <= bandwidth) ? 1.0 : 0.0; } //! Return a string representation of the kernel. 
std::string ToString() const { std::ostringstream convert; convert << "SphericalKernel [" << this << "]" << std::endl; convert << " Bandwidth: " << bandwidth << std::endl; return convert.str(); } private: double bandwidth; double bandwidthSquared; }; //! Kernel traits for the spherical kernel. template<> class KernelTraits { public: //! The spherical kernel is normalized: K(x, x) = 1 for all x. static const bool IsNormalized = true; }; }; // namespace kernel }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/kernels/laplacian_kernel.hpp0000644000176200001440000000662613647512751024332 0ustar liggesusers/** * @file laplacian_kernel.hpp * @author Ajinkya Kale * * Implementation of the Laplacian kernel (LaplacianKernel). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_KERNELS_LAPLACIAN_KERNEL_HPP #define __MLPACK_CORE_KERNELS_LAPLACIAN_KERNEL_HPP #include namespace mlpack { namespace kernel { /** * The standard Laplacian kernel. Given two vectors @f$ x @f$, @f$ y @f$, and a * bandwidth @f$ \mu @f$ (set in the constructor), * * @f[ * K(x, y) = \exp(-\frac{|| x - y ||}{\mu}). * @f] * * The implementation is all in the header file because it is so simple. */ class LaplacianKernel { public: /** * Default constructor; sets bandwidth to 1.0. */ LaplacianKernel() : bandwidth(1.0) { } /** * Construct the Laplacian kernel with a custom bandwidth. * * @param bandwidth The bandwidth of the kernel (@f$\mu@f$). */ LaplacianKernel(double bandwidth) : bandwidth(bandwidth) { } /** * Evaluation of the Laplacian kernel. This could be generalized to use any * distance metric, not the Euclidean distance, but for now, the Euclidean * distance is used. * * @tparam VecType Type of vector (likely arma::vec or arma::spvec). * @param a First vector. * @param b Second vector. * @return K(a, b) using the bandwidth (@f$\mu@f$) specified in the * constructor. */ template double Evaluate(const VecType& a, const VecType& b) const { // The precalculation of gamma saves us a little computation time. return exp(-metric::EuclideanDistance::Evaluate(a, b) / bandwidth); } /** * Evaluation of the Laplacian kernel given the distance between two points. * * @param t The distance between the two points the kernel should be evaluated * on. * @return K(t) using the bandwidth (@f$\mu@f$) specified in the * constructor. */ double Evaluate(const double t) const { // The precalculation of gamma saves us a little computation time. return exp(-t / bandwidth); } //! Get the bandwidth. double Bandwidth() const { return bandwidth; } //! Modify the bandwidth. double& Bandwidth() { return bandwidth; } //! Return a string representation of the kernel. std::string ToString() const { std::ostringstream convert; convert << "LaplacianKernel [" << this << "]" << std::endl; convert << " Bandwidth: " << bandwidth << std::endl; return convert.str(); } private: //! Kernel bandwidth. double bandwidth; }; //! 
Kernel traits of the Laplacian kernel. template<> class KernelTraits { public: //! The Laplacian kernel is normalized: K(x, x) = 1 for all x. static const bool IsNormalized = true; }; }; // namespace kernel }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/kernels/linear_kernel.hpp0000644000176200001440000000402713647512751023651 0ustar liggesusers/** * @file linear_kernel.hpp * @author Wei Guan * @author James Cline * @author Ryan Curtin * * Implementation of the linear kernel (just the standard dot product). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_KERNELS_LINEAR_KERNEL_HPP #define __MLPACK_CORE_KERNELS_LINEAR_KERNEL_HPP #include namespace mlpack { namespace kernel { /** * The simple linear kernel (dot product). For any two vectors @f$ x @f$ and * @f$ y @f$, * * @f[ * K(x, y) = x^T y * @f] * * This kernel has no parameters and therefore the evaluation can be static. */ class LinearKernel { public: /** * This constructor does nothing; the linear kernel has no parameters to * store. */ LinearKernel() { } /** * Simple evaluation of the dot product. This evaluation uses Armadillo's * dot() function. * * @tparam VecType Type of vector (should be arma::vec or arma::spvec). * @param a First vector. * @param b Second vector. * @return K(a, b). */ template static double Evaluate(const VecType& a, const VecType& b) { return arma::dot(a, b); } //! Return a string representation of the kernel. std::string ToString() const { std::ostringstream convert; convert << "LinearKernel [" << this << "]" << std::endl; return convert.str(); } }; }; // namespace kernel }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/kernels/cosine_distance_impl.hpp0000644000176200001440000000314213647512751025207 0ustar liggesusers/** * @file cosine_distance_impl.hpp * @author Ryan Curtin * * This implements the cosine distance. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_KERNELS_COSINE_DISTANCE_IMPL_HPP #define __MLPACK_CORE_KERNELS_COSINE_DISTANCE_IMPL_HPP #include "cosine_distance.hpp" namespace mlpack { namespace kernel { template double CosineDistance::Evaluate(const VecType& a, const VecType& b) { // Since we are using the L2 inner product, this is easy. 
But we have to make
// sure we aren't dividing by zero (if we are, then the cosine similarity is
// 0: we choose this value because the cosine distance is just a normalized
// dot product; take away the normalization, and if ||a|| or ||b|| is equal to
// 0, then a^T b is zero too).
const double denominator = norm(a, 2) * norm(b, 2);
if (denominator == 0.0)
return 0;
else
return dot(a, b) / denominator;
}
}; // namespace kernel
}; // namespace mlpack
#endif
RcppMLPACK/inst/include/mlpack/core/kernels/pspectrum_string_kernel.hpp0000644000176200001440000001336613647512751026005 0ustar liggesusers/**
* @file pspectrum_string_kernel.hpp
* @author Ryan Curtin
*
* Implementation of the p-spectrum string kernel, created for use with FastMKS.
* Instead of passing a data matrix to FastMKS which stores the kernels, pass a
* one-dimensional data matrix (data vector) to FastMKS which stores indices of
* strings; then, the actual strings are given to the PSpectrumStringKernel at
* construction time, and the kernel knows to map the indices to actual strings.
*
* This file is part of MLPACK 1.0.10.
*
* MLPACK is free software: you can redistribute it and/or modify it under the
* terms of the GNU Lesser General Public License as published by the Free
* Software Foundation, either version 3 of the License, or (at your option) any
* later version.
*
* MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
* WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
* A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
* details (LICENSE.txt).
*
* You should have received a copy of the GNU General Public License along with
* MLPACK. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef __MLPACK_CORE_KERNELS_PSPECTRUM_STRING_KERNEL_HPP
#define __MLPACK_CORE_KERNELS_PSPECTRUM_STRING_KERNEL_HPP
#include <map>
#include <string>
#include <vector>
#include <mlpack/core.hpp>
namespace mlpack {
namespace kernel {
/**
* The p-spectrum string kernel. Given a length p, the p-spectrum kernel finds
* the contiguous subsequence match count between two strings. The kernel will
* take every possible substring of length p of one string and count how many
* times it appears in the other string.
*
* The string kernel, when created, must be passed a reference to a series of
* string datasets (std::vector<std::vector<std::string> >&). This is because
* MLPACK only supports datasets which are Armadillo matrices -- and a dataset
* of variable-length strings cannot be easily cast into an Armadillo matrix.
*
* Therefore, once the PSpectrumStringKernel is created with a reference to the
* string datasets, a "fake" Armadillo data matrix must be created, which simply
* holds indices to the strings they represent. This "fake" matrix has two rows
* and n columns (where n is the number of strings in the dataset). The first
* row holds the index of the dataset (remember, the kernel can have multiple
* datasets), and the second row holds the index of the string. A fake matrix
* containing only strings from dataset 0 might look like this:
*
* [[0 0 0 0 0 0 0 0 0]
* [0 1 2 3 4 5 6 7 8]]
*
* This fake matrix is then given to the machine learning method, which will
* eventually call PSpectrumStringKernel::Evaluate(a, b), where a and b are two
* columns of the fake matrix. The string kernel will then map these fake
* columns back to the strings they represent, and then correctly evaluate the
* kernel.
*
* Unfortunately, not every machine learning method will work with this kernel.
* Only machine learning methods which do not ever operate on the explicit * representation of points can use this kernel. So, for instance, one cannot * build a kd-tree on strings, because the BinarySpaceTree<> class will split * the data according to the fake data matrix -- resulting in a meaningless * tree. This kernel was originally written for the FastMKS method; so, at the * very least, it will work with that. */ class PSpectrumStringKernel { public: /** * Initialize the PSpectrumStringKernel with the given string datasets. For * more information on this, see the general class documentation. * * @param datasets Sets of string data. * @param p The length of substrings to search. */ PSpectrumStringKernel(const std::vector >& datasets, const size_t p); /** * Evaluate the kernel for the string indices given. As mentioned in the * class documentation, a and b should be 2-element vectors, where the first * element contains the index of the dataset and the second element contains * the index of the string. Therefore, if [2 3] is passed for a, the string * used will be datasets[2][3] (datasets is of type * std::vector >&). * * @param a Index of string and dataset for first string. * @param b Index of string and dataset for second string. */ template double Evaluate(const VecType& a, const VecType& b) const; //! Access the lists of substrings. const std::vector > >& Counts() const { return counts; } //! Modify the lists of substrings. std::vector > >& Counts() { return counts; } //! Access the value of p. size_t P() const { return p; } //! Modify the value of p. size_t& P() { return p; } /* * Returns a string representation of this object. */ std::string ToString() const{ std::ostringstream convert; convert << "PSpectrumStringKernel [" << this << "]" << std::endl; convert << " p used: " << p << std::endl; convert << " Dataset:" << datasets.size() << std::endl; std::ostringstream convertb; for (size_t ind=0; ind < datasets.size(); ind++) convertb << datasets[ind].size(); convert << mlpack::util::Indent(convertb.str(),2); return convert.str(); } private: //! The datasets. const std::vector >& datasets; //! Mappings of the datasets to counts of substrings. Such a huge structure //! is not wonderful... std::vector > > counts; //! The value of p to use in calculation. size_t p; }; }; // namespace kernel }; // namespace mlpack // Include implementation of templated Evaluate(). #include "pspectrum_string_kernel_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/core/kernels/epanechnikov_kernel_impl.hpp0000644000176200001440000000574313647512751026100 0ustar liggesusers/** * @file epanechnikov_kernel_impl.hpp * @author Neil Slagle * * Implementation of template-based Epanechnikov kernel functions. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_KERNELS_EPANECHNIKOV_KERNEL_IMPL_HPP #define __MLPACK_CORE_KERNELS_EPANECHNIKOV_KERNEL_IMPL_HPP // In case it hasn't already been included. 
#include "epanechnikov_kernel.hpp" #include namespace mlpack { namespace kernel { template inline double EpanechnikovKernel::Evaluate(const Vec1Type& a, const Vec2Type& b) const { return std::max(0.0, 1.0 - metric::SquaredEuclideanDistance::Evaluate(a, b) * inverseBandwidthSquared); } /** * Obtains the convolution integral [integral of K(||x-a||) K(||b-x||) dx] * for the two vectors. * * @tparam VecType Type of vector (arma::vec, arma::spvec should be expected). * @param a First vector. * @param b Second vector. * @return the convolution integral value. */ template double EpanechnikovKernel::ConvolutionIntegral(const VecType& a, const VecType& b) { double distance = sqrt(metric::SquaredEuclideanDistance::Evaluate(a, b)); if (distance >= 2.0 * bandwidth) return 0.0; double volumeSquared = std::pow(Normalizer(a.n_rows), 2.0); switch (a.n_rows) { case 1: return 1.0 / volumeSquared * (16.0 / 15.0 * bandwidth - 4.0 * distance * distance / (3.0 * bandwidth) + 2.0 * distance * distance * distance / (3.0 * bandwidth * bandwidth) - std::pow(distance, 5.0) / (30.0 * std::pow(bandwidth, 4.0))); break; case 2: return 1.0 / volumeSquared * ((2.0 / 3.0 * bandwidth * bandwidth - distance * distance) * asin(sqrt(1.0 - std::pow(distance / (2.0 * bandwidth), 2.0))) + sqrt(4.0 * bandwidth * bandwidth - distance * distance) * (distance / 6.0 + 2.0 / 9.0 * distance * std::pow(distance / bandwidth, 2.0) - distance / 72.0 * std::pow(distance / bandwidth, 4.0))); break; default: Rcpp::Rcout << "EpanechnikovKernel::ConvolutionIntegral(): dimension " << a.n_rows << " not supported."; return -1.0; // This line will not execute. break; } } }; // namespace kernel }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/kernels/epanechnikov_kernel.hpp0000644000176200001440000000610213647512751025045 0ustar liggesusers/** * @file epanechnikov_kernel.hpp * @author Neil Slagle * * Definition of the Epanechnikov kernel. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_KERNELS_EPANECHNIKOV_KERNEL_HPP #define __MLPACK_CORE_KERNELS_EPANECHNIKOV_KERNEL_HPP #include namespace mlpack { namespace kernel { /** * The Epanechnikov kernel, defined as * * @f[ * K(x, y) = \max \{0, 1 - || x - y ||^2_2 / b^2 \} * @f] * * where @f$ b @f$ is the bandwidth the of the kernel (defaults to 1.0). */ class EpanechnikovKernel { public: /** * Instantiate the Epanechnikov kernel with the given bandwidth (default 1.0). * * @param bandwidth Bandwidth of the kernel. */ EpanechnikovKernel(const double bandwidth = 1.0) : bandwidth(bandwidth), inverseBandwidthSquared(1.0 / (bandwidth * bandwidth)) { } /** * Evaluate the Epanechnikov kernel on the given two inputs. * * @param a One input vector. * @param b The other input vector. */ template double Evaluate(const Vec1Type& a, const Vec2Type& b) const; /** * Evaluate the Epanechnikov kernel given that the distance between the two * input points is known. 
*/
double Evaluate(const double distance) const;
/**
* Obtains the convolution integral [integral of K(||x-a||) K(||b-x||) dx]
* for the two vectors.
*
* @tparam VecType Type of vector (arma::vec, arma::spvec should be expected).
* @param a First vector.
* @param b Second vector.
* @return the convolution integral value.
*/
template<typename VecType>
double ConvolutionIntegral(const VecType& a, const VecType& b);
/**
* Compute the normalizer of this Epanechnikov kernel for the given dimension.
*
* @param dimension Dimension to calculate the normalizer for.
*/
double Normalizer(const size_t dimension);
//! Return a string representation of the object.
std::string ToString() const;
private:
//! Bandwidth of the kernel.
double bandwidth;
//! Cached value of the inverse bandwidth squared (to speed up computation).
double inverseBandwidthSquared;
};
//! Kernel traits for the Epanechnikov kernel.
template<>
class KernelTraits<EpanechnikovKernel>
{
public:
//! The Epanechnikov kernel is normalized: K(x, x) = 1 for all x.
static const bool IsNormalized = true;
};
}; // namespace kernel
}; // namespace mlpack
// Include implementation.
#include "epanechnikov_kernel_impl.hpp"
#endif
RcppMLPACK/inst/include/mlpack/core/kernels/gaussian_kernel.hpp0000644000176200001440000001102113647512751024201 0ustar liggesusers/**
* @file gaussian_kernel.hpp
* @author Wei Guan
* @author James Cline
* @author Ryan Curtin
*
* Implementation of the Gaussian kernel (GaussianKernel).
*
* This file is part of MLPACK 1.0.10.
*
* MLPACK is free software: you can redistribute it and/or modify it under the
* terms of the GNU Lesser General Public License as published by the Free
* Software Foundation, either version 3 of the License, or (at your option) any
* later version.
*
* MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
* WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
* A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
* details (LICENSE.txt).
*
* You should have received a copy of the GNU General Public License along with
* MLPACK. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef __MLPACK_CORE_KERNELS_GAUSSIAN_KERNEL_HPP
#define __MLPACK_CORE_KERNELS_GAUSSIAN_KERNEL_HPP
#include <mlpack/core.hpp>
#include <mlpack/core/metrics/lmetric.hpp>
namespace mlpack {
namespace kernel {
/**
* The standard Gaussian kernel. Given two vectors @f$ x @f$, @f$ y @f$, and a
* bandwidth @f$ \mu @f$ (set in the constructor),
*
* @f[
* K(x, y) = \exp(-\frac{|| x - y ||^2}{2 \mu^2}).
* @f]
*
* The implementation is all in the header file because it is so simple.
*/
class GaussianKernel
{
public:
/**
* Default constructor; sets bandwidth to 1.0.
*/
GaussianKernel() : bandwidth(1.0), gamma(-0.5) { }
/**
* Construct the Gaussian kernel with a custom bandwidth.
*
* @param bandwidth The bandwidth of the kernel (@f$\mu@f$).
*/
GaussianKernel(const double bandwidth) :
bandwidth(bandwidth),
gamma(-0.5 * pow(bandwidth, -2.0))
{ }
/**
* Evaluation of the Gaussian kernel. This could be generalized to use any
* distance metric, not the Euclidean distance, but for now, the Euclidean
* distance is used.
*
* @tparam VecType Type of vector (likely arma::vec or arma::spvec).
* @param a First vector.
* @param b Second vector.
* @return K(a, b) using the bandwidth (@f$\mu@f$) specified in the
* constructor.
*/
template<typename VecType>
double Evaluate(const VecType& a, const VecType& b) const
{
// The precalculation of gamma saves us a little computation time.
return exp(gamma * metric::SquaredEuclideanDistance::Evaluate(a, b));
}
/**
* Evaluation of the Gaussian kernel given the distance between two points.
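* For instance (a worked example of ours): with bandwidth @f$ \mu = 1 @f$ this
* reduces to K(t) = exp(-t^2 / 2), so Evaluate(0) = 1 and Evaluate(1) =
* exp(-0.5), roughly 0.607.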
* * @param t The distance between the two points the kernel is evaluated on. * @return K(t) using the bandwidth (@f$\mu@f$) specified in the * constructor. */ double Evaluate(const double t) const { // The precalculation of gamma saves us a little computation time. return exp(gamma * std::pow(t, 2.0)); } /** * Obtain the normalization constant of the Gaussian kernel. * * @param dimension * @return the normalization constant */ double Normalizer(const size_t dimension) { return pow(sqrt(2.0 * M_PI) * bandwidth, (double) dimension); } /** * Obtain a convolution integral of the Gaussian kernel. * * @param a, first vector * @param b, second vector * @return the convolution integral */ template double ConvolutionIntegral(const VecType& a, const VecType& b) { return Evaluate(sqrt(metric::SquaredEuclideanDistance::Evaluate(a, b) / 2.0)) / (Normalizer(a.n_rows) * pow(2.0, (double) a.n_rows / 2.0)); } //! Get the bandwidth. double Bandwidth() const { return bandwidth; } //! Modify the bandwidth. This takes an argument because we must update the //! precalculated constant (gamma). void Bandwidth(const double bandwidth) { this->bandwidth = bandwidth; this->gamma = -0.5 * pow(bandwidth, -2.0); } //! Get the precalculated constant. double Gamma() const { return gamma; } //! Convert object to string. std::string ToString() const { std::ostringstream convert; convert << "GaussianKernel [" << this << "]" << std::endl; convert << " Bandwidth: " << bandwidth << std::endl; return convert.str(); } private: //! Kernel bandwidth. double bandwidth; //! Precalculated constant depending on the bandwidth; //! @f$ \gamma = -\frac{1}{2 \mu^2} @f$. double gamma; }; //! Kernel traits for the Gaussian kernel. template<> class KernelTraits { public: //! The Gaussian kernel is normalized: K(x, x) = 1 for all x. static const bool IsNormalized = true; }; }; // namespace kernel }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/optimizers/0000755000176200001440000000000013647512751021065 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/optimizers/aug_lagrangian/0000755000176200001440000000000013647512751024024 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_test_functions.hpp0000644000176200001440000001161513647512751032627 0ustar liggesusers/** * @file aug_lagrangian_test_functions.hpp * @author Ryan Curtin * * Define test functions for the augmented Lagrangian method. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_TEST_FUNCTIONS_HPP #define __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_TEST_FUNCTIONS_HPP #include namespace mlpack { namespace optimization { /** * This function is taken from "Practical Mathematical Optimization" (Snyman), * section 5.3.8 ("Application of the Augmented Lagrangian Method"). It has * only one constraint. 
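*
* A minimal optimization sketch (ours, pairing this class with the
* AugLagrangian optimizer declared in aug_lagrangian.hpp):
*
* @code
* AugLagrangianTestFunction f;
* AugLagrangian<AugLagrangianTestFunction> aug(f);
* arma::mat coordinates = f.GetInitialPoint();
* aug.Optimize(coordinates); // coordinates should approach [1, 4].
* @endcode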
*
* The minimum that satisfies the constraint is x = [1, 4], with an objective
* value of 70.
*/
class AugLagrangianTestFunction
{
public:
AugLagrangianTestFunction();
AugLagrangianTestFunction(const arma::mat& initial_point);
double Evaluate(const arma::mat& coordinates);
void Gradient(const arma::mat& coordinates, arma::mat& gradient);
size_t NumConstraints() const { return 1; }
double EvaluateConstraint(const size_t index, const arma::mat& coordinates);
void GradientConstraint(const size_t index, const arma::mat& coordinates, arma::mat& gradient);
const arma::mat& GetInitialPoint() const { return initialPoint; }
// Convert the object into a string.
std::string ToString() const;
private:
arma::mat initialPoint;
};
/**
* This function is taken from M. Gockenbach's lectures on general nonlinear
* programs, found at:
* http://www.math.mtu.edu/~msgocken/ma5630spring2003/lectures/nlp/nlp.pdf
*
* The program we are using is example 2.5 from this document.
* I have arbitrarily decided that this will be called the Gockenbach function.
*
* The minimum that satisfies the two constraints is given as
* x = [0.12288, -1.1078, 0.015100], with an objective value of about 29.634.
*/
class GockenbachFunction
{
public:
GockenbachFunction();
GockenbachFunction(const arma::mat& initial_point);
double Evaluate(const arma::mat& coordinates);
void Gradient(const arma::mat& coordinates, arma::mat& gradient);
size_t NumConstraints() const { return 2; }
double EvaluateConstraint(const size_t index, const arma::mat& coordinates);
void GradientConstraint(const size_t index, const arma::mat& coordinates, arma::mat& gradient);
const arma::mat& GetInitialPoint() const { return initialPoint; }
private:
arma::mat initialPoint;
};
/**
* This function is the Lovasz-Theta semidefinite program, as implemented in the
* following paper:
*
* S. Burer, R. Monteiro
* "A nonlinear programming algorithm for solving semidefinite programs via
* low-rank factorization."
* Journal of Mathematical Programming, 2004
*
* Given a simple, undirected graph G = (V, E), the Lovasz-Theta SDP is defined
* by:
*
* min_X{Tr(-(e e^T)^T X) : Tr(X) = 1, X_ij = 0 for all (i, j) in E, X >= 0}
*
* where e is the vector of all ones and X has dimension |V| x |V|.
*
* In the Monteiro-Burer formulation, we take X = R * R^T, where R is the
* coordinates given to the Evaluate(), Gradient(), EvaluateConstraint(), and
* GradientConstraint() functions.
*/
class LovaszThetaSDP
{
public:
LovaszThetaSDP();
/**
* Initialize the Lovasz-Theta SDP with the given set of edges. The edge
* matrix should consist of rows of two dimensions, where dimension 0 is the
* first vertex of the edge and dimension 1 is the second vertex (or vice
* versa, as it doesn't make a difference).
*
* @param edges Matrix of edges.
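*
* For example (a sketch of ours -- the 2 x m, one-column-per-edge layout is
* our assumption about how the matrix is read), a triangle on vertices
* {0, 1, 2} could be encoded as:
*
* @code
* arma::mat edges(2, 3);
* edges(0, 0) = 0; edges(1, 0) = 1; // edge (0, 1)
* edges(0, 1) = 0; edges(1, 1) = 2; // edge (0, 2)
* edges(0, 2) = 1; edges(1, 2) = 2; // edge (1, 2)
* LovaszThetaSDP sdp(edges);
* @endcode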
*/ LovaszThetaSDP(const arma::mat& edges); double Evaluate(const arma::mat& coordinates); void Gradient(const arma::mat& coordinates, arma::mat& gradient); size_t NumConstraints() const; double EvaluateConstraint(const size_t index, const arma::mat& coordinates); void GradientConstraint(const size_t index, const arma::mat& coordinates, arma::mat& gradient); const arma::mat& GetInitialPoint(); const arma::mat& Edges() const { return edges; } arma::mat& Edges() { return edges; } private: arma::mat edges; size_t vertices; arma::mat initialPoint; }; }; // namespace optimization }; // namespace mlpack #endif // __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_TEST_FUNCTIONS_HPP RcppMLPACK/inst/include/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian.hpp0000644000176200001440000001376213647512751027505 0ustar liggesusers/** * @file aug_lagrangian.hpp * @author Ryan Curtin * * Definition of AugLagrangian class, which implements the Augmented Lagrangian * optimization method (also called the 'method of multipliers'. This class * uses the L-BFGS optimizer. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_HPP #define __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_HPP #include #include #include "aug_lagrangian_function.hpp" namespace mlpack { namespace optimization { /** * The AugLagrangian class implements the Augmented Lagrangian method of * optimization. In this scheme, a penalty term is added to the Lagrangian. * This method is also called the "method of multipliers". * * The template class LagrangianFunction must implement the following five * methods: * * - double Evaluate(const arma::mat& coordinates); * - void Gradient(const arma::mat& coordinates, arma::mat& gradient); * - size_t NumConstraints(); * - double EvaluateConstraint(size_t index, const arma::mat& coordinates); * - double GradientConstraint(size_t index, const arma::mat& coordinates, * arma::mat& gradient); * * The number of constraints must be greater than or equal to 0, and * EvaluateConstraint() should evaluate the constraint at the given index for * the given coordinates. Evaluate() should provide the objective function * value for the given coordinates. * * @tparam LagrangianFunction Function which can be optimized by this class. */ template class AugLagrangian { public: //! Shorthand for the type of the L-BFGS optimizer we'll be using. typedef L_BFGS > L_BFGSType; /** * Initialize the Augmented Lagrangian with the default L-BFGS optimizer. We * limit the number of L-BFGS iterations to 1000, rather than the unlimited * default L-BFGS. * * @param function The function to be optimized. */ AugLagrangian(LagrangianFunction& function); /** * Initialize the Augmented Lagrangian with a custom L-BFGS optimizer. * * @param function The function to be optimized. This must be a pre-created * utility AugLagrangianFunction. * @param lbfgs The custom L-BFGS optimizer to be used. 
This should have
* already been initialized with the given AugLagrangianFunction.
*/
AugLagrangian(AugLagrangianFunction<LagrangianFunction>& augfunc, L_BFGSType& lbfgs);
/**
* Optimize the function. The value '1' is used for the initial value of each
* Lagrange multiplier. To set the Lagrange multipliers yourself, use the
* other overload of Optimize().
*
* @param coordinates Output matrix to store the optimized coordinates in.
* @param maxIterations Maximum number of iterations of the Augmented
* Lagrangian algorithm. 0 indicates no maximum.
* @param sigma Initial penalty parameter.
*/
bool Optimize(arma::mat& coordinates, const size_t maxIterations = 1000);
/**
* Optimize the function, giving initial estimates for the Lagrange
* multipliers. The vector of Lagrange multipliers will be modified to
* contain the Lagrange multipliers of the final solution (if one is found).
*
* @param coordinates Output matrix to store the optimized coordinates in.
* @param initLambda Vector of initial Lagrange multipliers. Should have
* length equal to the number of constraints.
* @param initSigma Initial penalty parameter.
* @param maxIterations Maximum number of iterations of the Augmented
* Lagrangian algorithm. 0 indicates no maximum.
*/
bool Optimize(arma::mat& coordinates, const arma::vec& initLambda, const double initSigma, const size_t maxIterations = 1000);
//! Get the LagrangianFunction.
const LagrangianFunction& Function() const { return function; }
//! Modify the LagrangianFunction.
LagrangianFunction& Function() { return function; }
//! Get the L-BFGS object used for the actual optimization.
const L_BFGSType& LBFGS() const { return lbfgs; }
//! Modify the L-BFGS object used for the actual optimization.
L_BFGSType& LBFGS() { return lbfgs; }
//! Get the Lagrange multipliers.
const arma::vec& Lambda() const { return augfunc.Lambda(); }
//! Modify the Lagrange multipliers (i.e. set them before optimization).
arma::vec& Lambda() { return augfunc.Lambda(); }
//! Get the penalty parameter.
double Sigma() const { return augfunc.Sigma(); }
//! Modify the penalty parameter.
double& Sigma() { return augfunc.Sigma(); }
// Convert the object into a string.
std::string ToString() const;
private:
//! Function to be optimized.
LagrangianFunction& function;
//! Internally used AugLagrangianFunction which holds the function we are
//! optimizing. This isn't publicly accessible, but we provide ways to get
//! to the Lagrange multipliers and the penalty parameter sigma.
AugLagrangianFunction<LagrangianFunction> augfunc;
//! If the user did not pass an L_BFGS object, we'll use our own internal one.
L_BFGSType lbfgsInternal;
//! The L-BFGS optimizer that we will use.
L_BFGSType& lbfgs;
};
}; // namespace optimization
}; // namespace mlpack
#include "aug_lagrangian_impl.hpp"
#endif // __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_HPP
RcppMLPACK/inst/include/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_function.hpp0000644000176200001440000001074713647512751031402 0ustar liggesusers/**
* @file aug_lagrangian_function.hpp
* @author Ryan Curtin
*
* Contains a utility class for AugLagrangian.
*
* This file is part of MLPACK 1.0.10.
*
* MLPACK is free software: you can redistribute it and/or modify it under the
* terms of the GNU Lesser General Public License as published by the Free
* Software Foundation, either version 3 of the License, or (at your option) any
* later version.
*
* MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
* WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
* A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
* details (LICENSE.txt).
*
* You should have received a copy of the GNU General Public License along with
* MLPACK. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_FUNCTION_HPP
#define __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_FUNCTION_HPP
#include <mlpack/core.hpp>
namespace mlpack {
namespace optimization {
/**
* This is a utility class used by AugLagrangian, meant to wrap a
* LagrangianFunction into a function usable by a simple optimizer like L-BFGS.
* Given a LagrangianFunction which follows the format outlined in the
* documentation for AugLagrangian, this class provides Evaluate(), Gradient(),
* and GetInitialPoint() functions which allow this class to be used with a
* simple optimizer like L-BFGS.
*
* This class can be specialized for your particular implementation -- commonly,
* a faster method for computing the overall objective and gradient of the
* augmented Lagrangian function can be implemented than the naive, default
* implementation given. Use class template specialization and re-implement all
* of the methods (unfortunately, C++ specialization rules mean you have to
* re-implement everything).
*
* @tparam LagrangianFunction Lagrangian function to be used.
*/
template<typename LagrangianFunction>
class AugLagrangianFunction
{
public:
/**
* Initialize the AugLagrangianFunction, but don't set the Lagrange
* multipliers or penalty parameters yet. Make sure you set the Lagrange
* multipliers before you use this...
*
* @param function Lagrangian function.
*/
AugLagrangianFunction(LagrangianFunction& function);
/**
* Initialize the AugLagrangianFunction with the given LagrangianFunction,
* Lagrange multipliers, and initial penalty parameter.
*
* @param function Lagrangian function.
* @param lambda Initial Lagrange multipliers.
* @param sigma Initial penalty parameter.
*/
AugLagrangianFunction(LagrangianFunction& function, const arma::vec& lambda, const double sigma);
/**
* Evaluate the objective function of the Augmented Lagrangian function, which
* is the standard Lagrangian function evaluation plus a penalty term, which
* penalizes unsatisfied constraints.
*
* @param coordinates Coordinates to evaluate function at.
* @return Objective function.
*/
double Evaluate(const arma::mat& coordinates) const;
/**
* Evaluate the gradient of the Augmented Lagrangian function.
*
* @param coordinates Coordinates to evaluate gradient at.
* @param gradient Matrix to store gradient into.
*/
void Gradient(const arma::mat& coordinates, arma::mat& gradient) const;
/**
* Get the initial point of the optimization (supplied by the
* LagrangianFunction).
*
* @return Initial point.
*/
const arma::mat& GetInitialPoint() const;
//! Get the Lagrange multipliers.
const arma::vec& Lambda() const { return lambda; }
//! Modify the Lagrange multipliers.
arma::vec& Lambda() { return lambda; }
//! Get sigma (the penalty parameter).
double Sigma() const { return sigma; }
//! Modify sigma (the penalty parameter).
double& Sigma() { return sigma; }
//! Get the Lagrangian function.
const LagrangianFunction& Function() const { return function; }
//! Modify the Lagrangian function.
LagrangianFunction& Function() { return function; }
// Convert the object into a string.
std::string ToString() const;
private:
//! Instantiation of the function to be optimized.
LagrangianFunction& function;
//!
The Lagrange multipliers. arma::vec lambda; //! The penalty parameter. double sigma; }; }; // namespace optimization }; // namespace mlpack // Include basic implementation. #include "aug_lagrangian_function_impl.hpp" #endif // __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_FUNCTION_HPP RcppMLPACK/inst/include/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_function_impl.hpp0000644000176200001440000001004413647512751032421 0ustar liggesusers/** * @file aug_lagrangian_function_impl.hpp * @author Ryan Curtin * * Simple, naive implementation of AugLagrangianFunction. Better * specializations can probably be given in many cases, but this is the most * general case. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_FUNCTION_IMPL_HPP #define __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_FUNCTION_IMPL_HPP // In case it hasn't been included. #include "aug_lagrangian_function.hpp" namespace mlpack { namespace optimization { // Initialize the AugLagrangianFunction. template AugLagrangianFunction::AugLagrangianFunction( LagrangianFunction& function) : function(function), lambda(function.NumConstraints()), sigma(10) { // Initialize lambda vector to all zeroes. lambda.zeros(); } // Initialize the AugLagrangianFunction. template AugLagrangianFunction::AugLagrangianFunction( LagrangianFunction& function, const arma::vec& lambda, const double sigma) : lambda(lambda), sigma(sigma), function(function) { // Nothing else to do. } // Evaluate the AugLagrangianFunction at the given coordinates. template double AugLagrangianFunction::Evaluate( const arma::mat& coordinates) const { // The augmented Lagrangian is evaluated as // f(x) + {-lambda_i * c_i(x) + (sigma / 2) c_i(x)^2} for all constraints // First get the function's objective value. double objective = function.Evaluate(coordinates); // Now loop for each constraint. for (size_t i = 0; i < function.NumConstraints(); ++i) { double constraint = function.EvaluateConstraint(i, coordinates); objective += (-lambda[i] * constraint) + sigma * std::pow(constraint, 2) / 2; } return objective; } // Evaluate the gradient of the AugLagrangianFunction at the given coordinates. template void AugLagrangianFunction::Gradient( const arma::mat& coordinates, arma::mat& gradient) const { // The augmented Lagrangian's gradient is evaluted as // f'(x) + {(-lambda_i + sigma * c_i(x)) * c'_i(x)} for all constraints gradient.zeros(); function.Gradient(coordinates, gradient); arma::mat constraintGradient; // Temporary for constraint gradients. for (size_t i = 0; i < function.NumConstraints(); i++) { function.GradientConstraint(i, coordinates, constraintGradient); // Now calculate scaling factor and add to existing gradient. arma::mat tmpGradient; tmpGradient = (-lambda[i] + sigma * function.EvaluateConstraint(i, coordinates)) * constraintGradient; gradient += tmpGradient; } } // Get the initial point. 
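// Worked check of Evaluate() above (our example, not from MLPACK): with
// lambda = [2], sigma = 10, an objective value f(x) = 3, and a single
// constraint with value c_0(x) = 0.5, the augmented objective is
// 3 + (-2 * 0.5) + (10 / 2) * 0.5^2 = 3.25.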
template const arma::mat& AugLagrangianFunction::GetInitialPoint() const { return function.GetInitialPoint(); } // Convert the object to a string. template std::string AugLagrangianFunction::ToString() const { std::ostringstream convert; convert << "AugLagrangianFunction [" << this << "]" << std::endl; convert << " Lagrange multipliers:" << std::endl; convert << lambda; convert << " Penalty parameter: " << sigma << std::endl; return convert.str(); } }; // namespace optimization }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/optimizers/aug_lagrangian/aug_lagrangian_impl.hpp0000644000176200001440000001435313647512751030523 0ustar liggesusers/** * @file aug_lagrangian_impl.hpp * @author Ryan Curtin * * Implementation of AugLagrangian class (Augmented Lagrangian optimization * method). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_IMPL_HPP #define __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_IMPL_HPP #include #include "aug_lagrangian_function.hpp" namespace mlpack { namespace optimization { template AugLagrangian::AugLagrangian(LagrangianFunction& function) : function(function), augfunc(function), lbfgsInternal(augfunc), lbfgs(lbfgsInternal) { lbfgs.MaxIterations() = 1000; } template AugLagrangian::AugLagrangian( AugLagrangianFunction& augfunc, L_BFGSType& lbfgs) : function(augfunc.Function()), augfunc(augfunc), lbfgs(lbfgs) { // Nothing to do. lbfgsInternal isn't used in this case. } // This overload just sets the lambda and sigma and calls the other overload. template bool AugLagrangian::Optimize(arma::mat& coordinates, const arma::vec& initLambda, const double initSigma, const size_t maxIterations) { augfunc.Lambda() = initLambda; augfunc.Sigma() = initSigma; return Optimize(coordinates, maxIterations); } // Convert the object to a string. template std::string AugLagrangian::ToString() const { std::ostringstream convert; convert << "AugLagrangian [" << this << "]" << std::endl; convert << " Function:" << std::endl; convert << mlpack::util::Indent(function.ToString(), 2); convert << " L-BFGS optimizer:" << std::endl; convert << mlpack::util::Indent(lbfgs.ToString(), 2); return convert.str(); } template bool AugLagrangian::Optimize(arma::mat& coordinates, const size_t maxIterations) { // Ensure that we update lambda immediately. double penaltyThreshold = DBL_MAX; // Track the last objective to compare for convergence. double lastObjective = function.Evaluate(coordinates); // Then, calculate the current penalty. double penalty = 0; for (size_t i = 0; i < function.NumConstraints(); i++) penalty += std::pow(function.EvaluateConstraint(i, coordinates), 2); Rcpp::Rcout << "Penalty is " << penalty << " (threshold " << penaltyThreshold << ")." << std::endl; // The odd comparison allows user to pass maxIterations = 0 (i.e. no limit on // number of iterations). 
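// (Specifically: maxIterations is a size_t, so when it is 0 the expression
// maxIterations - 1 wraps around to the largest representable size_t, and
// the loop condition below is effectively never false on its own.)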
size_t it; for (it = 0; it != (maxIterations - 1); it++) { Rcpp::Rcout << "AugLagrangian on iteration " << it << ", starting with objective " << lastObjective << "." << std::endl; // Rcpp::Rcout << coordinates << std::endl; // Rcpp::Rcout << trans(coordinates) * coordinates << std::endl; if (!lbfgs.Optimize(coordinates)) Rcpp::Rcout << "L-BFGS reported an error during optimization." << std::endl; // Check if we are done with the entire optimization (the threshold we are // comparing with is arbitrary). if (std::abs(lastObjective - function.Evaluate(coordinates)) < 1e-10 && augfunc.Sigma() > 500000) return true; lastObjective = function.Evaluate(coordinates); // Assuming that the optimization has converged to a new set of coordinates, // we now update either lambda or sigma. We update sigma if the penalty // term is too high, and we update lambda otherwise. // First, calculate the current penalty. double penalty = 0; for (size_t i = 0; i < function.NumConstraints(); i++) { penalty += std::pow(function.EvaluateConstraint(i, coordinates), 2); // Rcpp::Rcout << "Constraint " << i << " is " << // function.EvaluateConstraint(i, coordinates) << std::endl; } Rcpp::Rcout << "Penalty is " << penalty << " (threshold " << penaltyThreshold << ")." << std::endl; for (size_t i = 0; i < function.NumConstraints(); ++i) { // arma::mat tmpgrad; // function.GradientConstraint(i, coordinates, tmpgrad); // Rcpp::Rcout << "Gradient of constraint " << i << " is " << std::endl; // Rcpp::Rcout << tmpgrad << std::endl; } if (penalty < penaltyThreshold) // We update lambda. { // We use the update: lambda_{k + 1} = lambda_k - sigma * c(coordinates), // but we have to write a loop to do this for each constraint. for (size_t i = 0; i < function.NumConstraints(); i++) augfunc.Lambda()[i] -= augfunc.Sigma() * function.EvaluateConstraint(i, coordinates); // We also update the penalty threshold to be a factor of the current // penalty. TODO: this factor should be a parameter (from CLI). The // value of 0.25 is taken from Burer and Monteiro (2002). penaltyThreshold = 0.25 * penalty; Rcpp::Rcout << "Lagrange multiplier estimates updated." << std::endl; } else { // We multiply sigma by a constant value. TODO: this factor should be a // parameter (from CLI). The value of 10 is taken from Burer and Monteiro // (2002). augfunc.Sigma() *= 10; Rcpp::Rcout << "Updated sigma to " << augfunc.Sigma() << "." << std::endl; } } return false; } }; // namespace optimization }; // namespace mlpack #endif // __MLPACK_CORE_OPTIMIZERS_AUG_LAGRANGIAN_AUG_LAGRANGIAN_IMPL_HPP RcppMLPACK/inst/include/mlpack/core/optimizers/lbfgs/0000755000176200001440000000000013647512751022162 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/optimizers/lbfgs/test_functions.hpp0000644000176200001440000001245713647512751025753 0ustar liggesusers/** * @file test_functions.hpp * @author Ryan Curtin * * A collection of functions to test optimizers (in this case, L-BFGS). These * come from the following paper: * * "Testing Unconstrained Optimization Software" * Jorge J. Moré, Burton S. Garbow, and Kenneth E. Hillstrom. 1981. * ACM Trans. Math. Softw. 7, 1 (March 1981), 17-41. * http://portal.acm.org/citation.cfm?id=355934.355936 * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
*
* MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
* WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
* A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
* details (LICENSE.txt).
*
* You should have received a copy of the GNU General Public License along with
* MLPACK. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef __MLPACK_CORE_OPTIMIZERS_LBFGS_TEST_FUNCTIONS_HPP
#define __MLPACK_CORE_OPTIMIZERS_LBFGS_TEST_FUNCTIONS_HPP
#include <mlpack/core.hpp>
// To fulfill the template policy class 'FunctionType', we must implement
// the following:
//
// FunctionType(); // constructor
// void Gradient(const arma::mat& coordinates, arma::mat& gradient);
// double Evaluate(const arma::mat& coordinates);
// const arma::mat& GetInitialPoint();
//
// Note that we are using an arma::mat instead of the more intuitive and
// expected arma::vec. This is because L-BFGS will also optimize matrices.
// However, remember that an arma::vec is simply an (n x 1) arma::mat. You can
// use either internally but the L-BFGS method requires arma::mat& to be passed
// (C++ does not allow implicit reference casting to subclasses).
namespace mlpack {
namespace optimization {
namespace test {
/**
* The Rosenbrock function, defined by
* f(x) = f1(x) + f2(x)
* f1(x) = 100 (x2 - x1^2)^2
* f2(x) = (1 - x1)^2
* x_0 = [-1.2, 1]
*
* This should optimize to f(x) = 0, at x = [1, 1].
*
* "An automatic method for finding the greatest or least value of a function."
* H.H. Rosenbrock. 1960. Comput. J. 3., 175-184.
*/
class RosenbrockFunction
{
public:
RosenbrockFunction(); // initialize initial point
double Evaluate(const arma::mat& coordinates);
void Gradient(const arma::mat& coordinates, arma::mat& gradient);
const arma::mat& GetInitialPoint() const;
private:
arma::mat initialPoint;
};
/**
* The Wood function, defined by
* f(x) = f1(x) + f2(x) + f3(x) + f4(x) + f5(x) + f6(x)
* f1(x) = 100 (x2 - x1^2)^2
* f2(x) = (1 - x1)^2
* f3(x) = 90 (x4 - x3^2)^2
* f4(x) = (1 - x3)^2
* f5(x) = 10 (x2 + x4 - 2)^2
* f6(x) = (1 / 10) (x2 - x4)^2
* x_0 = [-3, -1, -3, -1]
*
* This should optimize to f(x) = 0, at x = [1, 1, 1, 1].
*
* "A comparative study of nonlinear programming codes."
* A.R. Colville. 1968. Rep. 320-2949, IBM N.Y. Scientific Center.
*/
class WoodFunction
{
public:
WoodFunction(); // initialize initial point
double Evaluate(const arma::mat& coordinates);
void Gradient(const arma::mat& coordinates, arma::mat& gradient);
const arma::mat& GetInitialPoint() const;
private:
arma::mat initialPoint;
};
/**
* The Generalized Rosenbrock function in n dimensions, defined by
* f(x) = sum_i^{n - 1} f_i(x)
* f_i(x) = 100 * (x_i^2 - x_{i + 1})^2 + (1 - x_i)^2
* x_0 = [-1.2, 1, -1.2, 1, ...]
*
* This should optimize to f(x) = 0, at x = [1, 1, 1, 1, ...].
*
* This function can also be used for stochastic gradient descent (SGD) as a
* decomposable function (DecomposableFunctionType), so there are other
* overloads of Evaluate() and Gradient() implemented, as well as
* NumFunctions().
*
* "An analysis of the behavior of a class of genetic adaptive systems."
* K.A. De Jong. Ph.D. thesis, University of Michigan, 1975.
*/
class GeneralizedRosenbrockFunction
{
public:
/**
* Set the dimensionality of the extended Rosenbrock function.
*
* @param n Number of dimensions for the function.
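*
* For example, GeneralizedRosenbrockFunction(4) starts at the initial point
* [-1.2, 1, -1.2, 1] and attains its minimum f(x) = 0 at x = [1, 1, 1, 1].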
*/ GeneralizedRosenbrockFunction(int n); double Evaluate(const arma::mat& coordinates) const; void Gradient(const arma::mat& coordinates, arma::mat& gradient) const; size_t NumFunctions() const { return n - 1; } double Evaluate(const arma::mat& coordinates, const size_t i) const; void Gradient(const arma::mat& coordinates, const size_t i, arma::mat& gradient) const; const arma::mat& GetInitialPoint() const; private: arma::mat initialPoint; int n; // Dimensionality }; /** * The Generalized Rosenbrock function in 4 dimensions with the Wood Function in * four dimensions. In this function we are actually optimizing a 2x4 matrix of * coordinates, not a vector. */ class RosenbrockWoodFunction { public: RosenbrockWoodFunction(); // initialize initial point double Evaluate(const arma::mat& coordinates); void Gradient(const arma::mat& coordinates, arma::mat& gradient); const arma::mat& GetInitialPoint() const; private: arma::mat initialPoint; GeneralizedRosenbrockFunction rf; WoodFunction wf; }; }; // namespace test }; // namespace optimization }; // namespace mlpack #endif // __MLPACK_CORE_OPTIMIZERS_LBFGS_TEST_FUNCTIONS_HPP RcppMLPACK/inst/include/mlpack/core/optimizers/lbfgs/lbfgs.hpp0000644000176200001440000002377213647512751024003 0ustar liggesusers/** * @file lbfgs.hpp * @author Dongryeol Lee * @author Ryan Curtin * * The generic L-BFGS optimizer. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_OPTIMIZERS_LBFGS_LBFGS_HPP #define __MLPACK_CORE_OPTIMIZERS_LBFGS_LBFGS_HPP #include namespace mlpack { namespace optimization { /** * The generic L-BFGS optimizer, which uses a back-tracking line search * algorithm to minimize a function. The parameters for the algorithm (number * of memory points, maximum step size, and so forth) are all configurable via * either the constructor or standalone modifier functions. A function which * can be optimized by this class must implement the following methods: * * - a default constructor * - double Evaluate(const arma::mat& coordinates); * - void Gradient(const arma::mat& coordinates, arma::mat& gradient); * - arma::mat& GetInitialPoint(); */ template class L_BFGS { public: /** * Initialize the L-BFGS object. Store a reference to the function we will be * optimizing and set the size of the memory for the algorithm. There are * many parameters that can be set for the optimization, but default values * are given for each of them. * * @param function Instance of function to be optimized. * @param numBasis Number of memory points to be stored (default 5). * @param maxIterations Maximum number of iterations for the optimization * (default 0 -- may run indefinitely). * @param armijoConstant Controls the accuracy of the line search routine for * determining the Armijo condition. * @param wolfe Parameter for detecting the Wolfe condition. * @param minGradientNorm Minimum gradient norm required to continue the * optimization. 
* @param maxLineSearchTrials The maximum number of trials for the line search * (before giving up). * @param minStep The minimum step of the line search. * @param maxStep The maximum step of the line search. */ L_BFGS(FunctionType& function, const size_t numBasis = 5, /* entirely arbitrary */ const size_t maxIterations = 0, /* run forever */ const double armijoConstant = 1e-4, const double wolfe = 0.9, const double minGradientNorm = 1e-10, const size_t maxLineSearchTrials = 50, const double minStep = 1e-20, const double maxStep = 1e20); /** * Return the point where the lowest function value has been found. * * @return arma::vec representing the point and a double with the function * value at that point. */ const std::pair& MinPointIterate() const; /** * Use L-BFGS to optimize the given function, starting at the given iterate * point and finding the minimum. The maximum number of iterations is set in * the constructor (or with MaxIterations()). Alternately, another overload * is provided which takes a maximum number of iterations as a parameter. The * given starting point will be modified to store the finishing point of the * algorithm, and the final objective value is returned. * * @param iterate Starting point (will be modified). * @return Objective value of the final point. */ double Optimize(arma::mat& iterate); /** * Use L-BFGS to optimize (minimize) the given function, starting at the given * iterate point, and performing no more than the given maximum number of * iterations (the class variable maxIterations is ignored for this run, but * not modified). The given starting point will be modified to store the * finishing point of the algorithm, and the final objective value is * returned. * * @param iterate Starting point (will be modified). * @param maxIterations Maximum number of iterations (0 specifies no limit). * @return Objective value of the final point. */ double Optimize(arma::mat& iterate, const size_t maxIterations); //! Return the function that is being optimized. const FunctionType& Function() const { return function; } //! Modify the function that is being optimized. FunctionType& Function() { return function; } //! Get the memory size. size_t NumBasis() const { return numBasis; } //! Modify the memory size. size_t& NumBasis() { return numBasis; } //! Get the maximum number of iterations. size_t MaxIterations() const { return maxIterations; } //! Modify the maximum number of iterations. size_t& MaxIterations() { return maxIterations; } //! Get the Armijo condition constant. double ArmijoConstant() const { return armijoConstant; } //! Modify the Armijo condition constant. double& ArmijoConstant() { return armijoConstant; } //! Get the Wolfe parameter. double Wolfe() const { return wolfe; } //! Modify the Wolfe parameter. double& Wolfe() { return wolfe; } //! Get the minimum gradient norm. double MinGradientNorm() const { return minGradientNorm; } //! Modify the minimum gradient norm. double& MinGradientNorm() { return minGradientNorm; } //! Get the maximum number of line search trials. size_t MaxLineSearchTrials() const { return maxLineSearchTrials; } //! Modify the maximum number of line search trials. size_t& MaxLineSearchTrials() { return maxLineSearchTrials; } //! Return the minimum line search step size. double MinStep() const { return minStep; } //! Modify the minimum line search step size. double& MinStep() { return minStep; } //! Return the maximum line search step size. double MaxStep() const { return maxStep; } //! Modify the maximum line search step size. 
double& MaxStep() { return maxStep; } // convert the obkect into a string std::string ToString() const; private: //! Internal reference to the function we are optimizing. FunctionType& function; //! Position of the new iterate. arma::mat newIterateTmp; //! Stores all the s matrices in memory. arma::cube s; //! Stores all the y matrices in memory. arma::cube y; //! Size of memory for this L-BFGS optimizer. size_t numBasis; //! Maximum number of iterations. size_t maxIterations; //! Parameter for determining the Armijo condition. double armijoConstant; //! Parameter for detecting the Wolfe condition. double wolfe; //! Minimum gradient norm required to continue the optimization. double minGradientNorm; //! Maximum number of trials for the line search. size_t maxLineSearchTrials; //! Minimum step of the line search. double minStep; //! Maximum step of the line search. double maxStep; //! Best point found so far. std::pair minPointIterate; /** * Evaluate the function at the given iterate point and store the result if it * is a new minimum. * * @return The value of the function. */ double Evaluate(const arma::mat& iterate); /** * Calculate the scaling factor, gamma, which is used to scale the Hessian * approximation matrix. See method M3 in Section 4 of Liu and Nocedal * (1989). * * @return The calculated scaling factor. */ double ChooseScalingFactor(const size_t iterationNum, const arma::mat& gradient); /** * Check to make sure that the norm of the gradient is not smaller than 1e-5. * Currently that value is not configurable. * * @return (norm < minGradientNorm). */ bool GradientNormTooSmall(const arma::mat& gradient); /** * Perform a back-tracking line search along the search direction to * calculate a step size satisfying the Wolfe conditions. The parameter * iterate will be modified if the method is successful. * * @param functionValue Value of the function at the initial point * @param iterate The initial point to begin the line search from * @param gradient The gradient at the initial point * @param searchDirection A vector specifying the search direction * @param stepSize Variable the calculated step size will be stored in * * @return false if no step size is suitable, true otherwise. */ bool LineSearch(double& functionValue, arma::mat& iterate, arma::mat& gradient, const arma::mat& searchDirection); /** * Find the L-BFGS search direction. * * @param gradient The gradient at the current point * @param iteration_num The iteration number * @param scaling_factor Scaling factor to use (see ChooseScalingFactor_()) * @param search_direction Vector to store search direction in */ void SearchDirection(const arma::mat& gradient, const size_t iterationNum, const double scalingFactor, arma::mat& searchDirection); /** * Update the y and s matrices, which store the differences * between the iterate and old iterate and the differences between the * gradient and the old gradient, respectively. 
* * @param iterationNum Iteration number * @param iterate Current point * @param oldIterate Point at last iteration * @param gradient Gradient at current point (iterate) * @param oldGradient Gradient at last iteration point (oldIterate) */ void UpdateBasisSet(const size_t iterationNum, const arma::mat& iterate, const arma::mat& oldIterate, const arma::mat& gradient, const arma::mat& oldGradient); }; }; // namespace optimization }; // namespace mlpack #include "lbfgs_impl.hpp" #endif // __MLPACK_CORE_OPTIMIZERS_LBFGS_LBFGS_HPP RcppMLPACK/inst/include/mlpack/core/optimizers/lbfgs/lbfgs_impl.hpp0000644000176200001440000003632613647512751025023 0ustar liggesusers/** * @file lbfgs_impl.hpp * @author Dongryeol Lee (dongryel@cc.gatech.edu) * @author Ryan Curtin * * The implementation of the L_BFGS optimizer. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_OPTIMIZERS_LBFGS_LBFGS_IMPL_HPP #define __MLPACK_CORE_OPTIMIZERS_LBFGS_LBFGS_IMPL_HPP namespace mlpack { namespace optimization { /** * Initialize the L_BFGS object. Copy the function we will be optimizing and * set the size of the memory for the algorithm. * * @param function Instance of function to be optimized * @param numBasis Number of memory points to be stored * @param armijoConstant Controls the accuracy of the line search routine for * determining the Armijo condition. * @param wolfe Parameter for detecting the Wolfe condition. * @param minGradientNorm Minimum gradient norm required to continue the * optimization. * @param maxLineSearchTrials The maximum number of trials for the line search * (before giving up). * @param minStep The minimum step of the line search. * @param maxStep The maximum step of the line search. */ template L_BFGS::L_BFGS(FunctionType& function, const size_t numBasis, const size_t maxIterations, const double armijoConstant, const double wolfe, const double minGradientNorm, const size_t maxLineSearchTrials, const double minStep, const double maxStep) : function(function), numBasis(numBasis), maxIterations(maxIterations), armijoConstant(armijoConstant), wolfe(wolfe), minGradientNorm(minGradientNorm), maxLineSearchTrials(maxLineSearchTrials), minStep(minStep), maxStep(maxStep) { // Get the dimensions of the coordinates of the function; GetInitialPoint() // might return an arma::vec, but that's okay because then n_cols will simply // be 1. const size_t rows = function.GetInitialPoint().n_rows; const size_t cols = function.GetInitialPoint().n_cols; newIterateTmp.set_size(rows, cols); s.set_size(rows, cols, numBasis); y.set_size(rows, cols, numBasis); // Allocate the pair holding the min iterate information. minPointIterate.first.zeros(rows, cols); minPointIterate.second = std::numeric_limits::max(); } /** * Evaluate the function at the given iterate point and store the result if * it is a new minimum. 
 *
 * @return The value of the function
 */
template<typename FunctionType>
double L_BFGS<FunctionType>::Evaluate(const arma::mat& iterate)
{
  // Evaluate the function and keep track of the minimum function
  // value encountered during the optimization.
  double functionValue = function.Evaluate(iterate);

  if (functionValue < minPointIterate.second)
  {
    minPointIterate.first = iterate;
    minPointIterate.second = functionValue;
  }

  return functionValue;
}

/**
 * Calculate the scaling factor gamma which is used to scale the Hessian
 * approximation matrix.  See method M3 in Section 4 of Liu and Nocedal (1989).
 *
 * @return The calculated scaling factor
 */
template<typename FunctionType>
double L_BFGS<FunctionType>::ChooseScalingFactor(const size_t iterationNum,
                                                 const arma::mat& gradient)
{
  double scalingFactor = 1.0;
  if (iterationNum > 0)
  {
    int previousPos = (iterationNum - 1) % numBasis;
    // Get s and y matrices once instead of multiple times.
    arma::mat& sMat = s.slice(previousPos);
    arma::mat& yMat = y.slice(previousPos);

    scalingFactor = dot(sMat, yMat) / dot(yMat, yMat);
  }
  else
  {
    scalingFactor = 1.0 / sqrt(dot(gradient, gradient));
  }

  return scalingFactor;
}

/**
 * Check to make sure that the norm of the gradient is not smaller than
 * minGradientNorm, which defaults to 1e-10 and can be changed through the
 * constructor or MinGradientNorm().
 *
 * @return (norm < minGradientNorm)
 */
template<typename FunctionType>
bool L_BFGS<FunctionType>::GradientNormTooSmall(const arma::mat& gradient)
{
  double norm = arma::norm(gradient, 2);

  return (norm < minGradientNorm);
}

/**
 * Perform a back-tracking line search along the search direction to calculate
 * a step size satisfying the Wolfe conditions.
 *
 * @param functionValue Value of the function at the initial point
 * @param iterate The initial point to begin the line search from
 * @param gradient The gradient at the initial point
 * @param searchDirection A vector specifying the search direction
 *
 * @return false if no step size is suitable, true otherwise.
 */
template<typename FunctionType>
bool L_BFGS<FunctionType>::LineSearch(double& functionValue,
                                      arma::mat& iterate,
                                      arma::mat& gradient,
                                      const arma::mat& searchDirection)
{
  // Default first step size of 1.0.
  double stepSize = 1.0;

  // The initial linear term approximation in the direction of the
  // search direction.
  double initialSearchDirectionDotGradient =
      arma::dot(gradient, searchDirection);

  // If it is not a descent direction, just report failure.
  if (initialSearchDirectionDotGradient > 0.0)
  {
    Rcpp::Rcerr << "L-BFGS line search direction is not a descent direction "
        << "(terminating)!" << std::endl;
    return false;
  }

  // Save the initial function value.
  double initialFunctionValue = functionValue;

  // Unit linear approximation to the decrease in function value.
  double linearApproxFunctionValueDecrease = armijoConstant *
      initialSearchDirectionDotGradient;

  // The number of iterations in the search.
  size_t numIterations = 0;

  // Armijo step size scaling factor for increase and decrease.
  const double inc = 2.1;
  const double dec = 0.5;
  double width = 0;

  while (true)
  {
    // Perform a step and evaluate the gradient and the function values at that
    // point.
    newIterateTmp = iterate;
    newIterateTmp += stepSize * searchDirection;

    functionValue = Evaluate(newIterateTmp);
    function.Gradient(newIterateTmp, gradient);

    numIterations++;

    if (functionValue > initialFunctionValue + stepSize *
        linearApproxFunctionValueDecrease)
    {
      width = dec;
    }
    else
    {
      // Check Wolfe's condition.
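      // Both dot products below are negative for a descent direction.  The
      // first test (the new directional derivative is still more negative
      // than wolfe times the initial one) means the curvature condition
      // failed, so the step should grow; the second test enforces the strong
      // Wolfe condition |dot(g_new, d)| <= -wolfe * dot(g_0, d), shrinking
      // the step when the new slope overshoots past zero.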
double searchDirectionDotGradient = arma::dot(gradient, searchDirection); if (searchDirectionDotGradient < wolfe * initialSearchDirectionDotGradient) { width = inc; } else { if (searchDirectionDotGradient > -wolfe * initialSearchDirectionDotGradient) { width = dec; } else { break; } } } // Terminate when the step size gets too small or too big or it // exceeds the max number of iterations. if ((stepSize < minStep) || (stepSize > maxStep) || (numIterations >= maxLineSearchTrials)) { return false; } // Scale the step size. stepSize *= width; } // Move to the new iterate. iterate = newIterateTmp; return true; } /** * Find the L_BFGS search direction. * * @param gradient The gradient at the current point * @param iterationNum The iteration number * @param scalingFactor Scaling factor to use (see ChooseScalingFactor_()) * @param searchDirection Vector to store search direction in */ template void L_BFGS::SearchDirection(const arma::mat& gradient, const size_t iterationNum, const double scalingFactor, arma::mat& searchDirection) { // Start from this point. searchDirection = gradient; // See "A Recursive Formula to Compute H * g" in "Updating quasi-Newton // matrices with limited storage" (Nocedal, 1980). // Temporary variables. arma::vec rho(numBasis); arma::vec alpha(numBasis); size_t limit = (numBasis > iterationNum) ? 0 : (iterationNum - numBasis); for (size_t i = iterationNum; i != limit; i--) { int translatedPosition = (i + (numBasis - 1)) % numBasis; rho[iterationNum - i] = 1.0 / arma::dot(y.slice(translatedPosition), s.slice(translatedPosition)); alpha[iterationNum - i] = rho[iterationNum - i] * arma::dot(s.slice(translatedPosition), searchDirection); searchDirection -= alpha[iterationNum - i] * y.slice(translatedPosition); } searchDirection *= scalingFactor; for (size_t i = limit; i < iterationNum; i++) { int translatedPosition = i % numBasis; double beta = rho[iterationNum - i - 1] * arma::dot(y.slice(translatedPosition), searchDirection); searchDirection += (alpha[iterationNum - i - 1] - beta) * s.slice(translatedPosition); } // Negate the search direction so that it is a descent direction. searchDirection *= -1; } /** * Update the y and s matrices, which store the differences between * the iterate and old iterate and the differences between the gradient and the * old gradient, respectively. * * @param iterationNum Iteration number * @param iterate Current point * @param oldIterate Point at last iteration * @param gradient Gradient at current point (iterate) * @param oldGradient Gradient at last iteration point (oldIterate) */ template void L_BFGS::UpdateBasisSet(const size_t iterationNum, const arma::mat& iterate, const arma::mat& oldIterate, const arma::mat& gradient, const arma::mat& oldGradient) { // Overwrite a certain position instead of pushing everything in the vector // back one position. int overwritePos = iterationNum % numBasis; s.slice(overwritePos) = iterate - oldIterate; y.slice(overwritePos) = gradient - oldGradient; } /** * Return the point where the lowest function value has been found. * * @return arma::vec representing the point and a double with the function * value at that point. */ template inline const std::pair& L_BFGS::MinPointIterate() const { return minPointIterate; } template inline double L_BFGS::Optimize(arma::mat& iterate) { return Optimize(iterate, maxIterations); } /** * Use L_BFGS to optimize the given function, starting at the given iterate * point and performing no more than the specified number of maximum iterations. 
* The given starting point will be modified to store the finishing point of the * algorithm. * * @param numIterations Maximum number of iterations to perform * @param iterate Starting point (will be modified) */ template double L_BFGS::Optimize(arma::mat& iterate, const size_t maxIterations) { // Ensure that the cubes holding past iterations' information are the right // size. Also set the current best point value to the maximum. const size_t rows = function.GetInitialPoint().n_rows; const size_t cols = function.GetInitialPoint().n_cols; s.set_size(rows, cols, numBasis); y.set_size(rows, cols, numBasis); minPointIterate.second = std::numeric_limits::max(); // The old iterate to be saved. arma::mat oldIterate; oldIterate.zeros(iterate.n_rows, iterate.n_cols); // Whether to optimize until convergence. bool optimizeUntilConvergence = (maxIterations == 0); // The initial function value. double functionValue = Evaluate(iterate); // The gradient: the current and the old. arma::mat gradient; arma::mat oldGradient; gradient.zeros(iterate.n_rows, iterate.n_cols); oldGradient.zeros(iterate.n_rows, iterate.n_cols); // The search direction. arma::mat searchDirection; searchDirection.zeros(iterate.n_rows, iterate.n_cols); // The initial gradient value. function.Gradient(iterate, gradient); // The main optimization loop. for (size_t itNum = 0; optimizeUntilConvergence || (itNum != maxIterations); ++itNum) { Rcpp::Rcout << "L-BFGS iteration " << itNum << "; objective " << function.Evaluate(iterate) << "." << std::endl; // Break when the norm of the gradient becomes too small. if (GradientNormTooSmall(gradient)) { Rcpp::Rcout << "L-BFGS gradient norm too small (terminating successfully)." << std::endl; break; } // Choose the scaling factor. double scalingFactor = ChooseScalingFactor(itNum, gradient); // Build an approximation to the Hessian and choose the search // direction for the current iteration. SearchDirection(gradient, itNum, scalingFactor, searchDirection); // Save the old iterate and the gradient before stepping. oldIterate = iterate; oldGradient = gradient; // Do a line search and take a step. if (!LineSearch(functionValue, iterate, gradient, searchDirection)) { Rcpp::Rcout << "Line search failed. Stopping optimization." << std::endl; break; // The line search failed; nothing else to try. } // It is possible that the difference between the two coordinates is zero. // In this case we terminate successfully. if (accu(iterate != oldIterate) == 0) { Rcpp::Rcout << "L-BFGS step size of 0 (terminating successfully)." << std::endl; break; } // Overwrite an old basis set. UpdateBasisSet(itNum, iterate, oldIterate, gradient, oldGradient); } // End of the optimization loop. return function.Evaluate(iterate); } // Convert the object to a string. 
template<typename FunctionType>
std::string L_BFGS<FunctionType>::ToString() const
{
  std::ostringstream convert;
  convert << "L_BFGS [" << this << "]" << std::endl;
  convert << "  Function:" << std::endl;
  convert << util::Indent(function.ToString(), 2);
  convert << "  Memory size: " << numBasis << std::endl;
  convert << "  Cube size: " << s.n_rows << "x" << s.n_cols << "x"
      << s.n_slices << std::endl;
  convert << "  Maximum iterations: " << maxIterations << std::endl;
  convert << "  Armijo condition constant: " << armijoConstant << std::endl;
  convert << "  Wolfe parameter: " << wolfe << std::endl;
  convert << "  Minimum gradient norm: " << minGradientNorm << std::endl;
  convert << "  Minimum step for line search: " << minStep << std::endl;
  convert << "  Maximum step for line search: " << maxStep << std::endl;
  return convert.str();
}

}; // namespace optimization
}; // namespace mlpack

#endif // __MLPACK_CORE_OPTIMIZERS_LBFGS_LBFGS_IMPL_HPP
RcppMLPACK/inst/include/mlpack/core/optimizers/sa/0000755000176200001440000000000013647512751021470 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/optimizers/sa/sa.hpp0000644000176200001440000002163513647512751022613 0ustar liggesusers/**
 * @file sa.hpp
 * @author Zhihao Lou
 *
 * Simulated Annealing (SA).
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see .
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_SA_SA_HPP
#define __MLPACK_CORE_OPTIMIZERS_SA_SA_HPP

#include

#include "exponential_schedule.hpp"

namespace mlpack {
namespace optimization {

/**
 * Simulated Annealing is a stochastic optimization algorithm which is able to
 * deliver near-optimal results quickly without knowing the gradient of the
 * function being optimized.  Its hill-climbing capability makes it less
 * vulnerable to local minima.  This implementation uses an exponential cooling
 * schedule and feedback move control by default, but the cooling schedule can
 * be changed via a template parameter.
 *
 * The algorithm keeps the temperature at the initial temperature for initMoves
 * steps to get rid of the dependency on the initial condition.  After that, it
 * cools every step until the system is considered frozen or maxIterations is
 * reached.
 *
 * At each step, SA only perturbs one parameter at a time.  When SA has
 * perturbed all parameters in a problem, a sweep has been completed.  Every
 * moveCtrlSweep sweeps, the algorithm does feedback move control to change the
 * average move size depending on the responsiveness of each parameter.
 * Parameter gain controls the proportion of the feedback control.
 *
 * The system is considered "frozen" when its score fails to change more than
 * tolerance for maxToleranceSweep consecutive sweeps.
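 *
 * A minimal usage sketch (MyFunction is a hypothetical class satisfying the
 * FunctionType interface described below):
 *
 * @code
 * MyFunction f;
 * ExponentialSchedule schedule(0.001);
 * SA<MyFunction> sa(f, schedule);
 * arma::mat coordinates = f.GetInitialPoint();
 * const double finalValue = sa.Optimize(coordinates);
 * @endcode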
* * For SA to work, the FunctionType parameter must implement the following * two methods: * * double Evaluate(const arma::mat& coordinates); * arma::mat& GetInitialPoint(); * * and the CoolingScheduleType parameter must implement the following method: * * double NextTemperature(const double currentTemperature, * const double currentValue); * * which returns the next temperature given current temperature and the value * of the function being optimized. * * @tparam FunctionType objective function type to be minimized. * @tparam CoolingScheduleType type for cooling schedule */ template< typename FunctionType, typename CoolingScheduleType = ExponentialSchedule > class SA { public: /** * Construct the SA optimizer with the given function and parameters. * * @param function Function to be minimized. * @param coolingSchedule Instantiated cooling schedule. * @param maxIterations Maximum number of iterations allowed (0 indicates no limit). * @param initT Initial temperature. * @param initMoves Number of initial iterations without changing temperature. * @param moveCtrlSweep Sweeps per feedback move control. * @param tolerance Tolerance to consider system frozen. * @param maxToleranceSweep Maximum sweeps below tolerance to consider system * frozen. * @param maxMoveCoef Maximum move size. * @param initMoveCoef Initial move size. * @param gain Proportional control in feedback move control. */ SA(FunctionType& function, CoolingScheduleType& coolingSchedule, const size_t maxIterations = 1000000, const double initT = 10000., const size_t initMoves = 1000, const size_t moveCtrlSweep = 100, const double tolerance = 1e-5, const size_t maxToleranceSweep = 3, const double maxMoveCoef = 20, const double initMoveCoef = 0.3, const double gain = 0.3); /** * Optimize the given function using simulated annealing. The given starting * point will be modified to store the finishing point of the algorithm, and * the final objective value is returned. * * @param iterate Starting point (will be modified). * @return Objective value of the final point. */ double Optimize(arma::mat& iterate); //! Get the instantiated function to be optimized. const FunctionType& Function() const { return function; } //! Modify the instantiated function. FunctionType& Function() { return function; } //! Get the temperature. double Temperature() const { return temperature; } //! Modify the temperature. double& Temperature() { return temperature; } //! Get the initial moves. size_t InitMoves() const { return initMoves; } //! Modify the initial moves. size_t& InitMoves() { return initMoves; } //! Get sweeps per move control. size_t MoveCtrlSweep() const { return moveCtrlSweep; } //! Modify sweeps per move control. size_t& MoveCtrlSweep() { return moveCtrlSweep; } //! Get the tolerance. double Tolerance() const { return tolerance; } //! Modify the tolerance. double& Tolerance() { return tolerance; } //! Get the maxToleranceSweep. size_t MaxToleranceSweep() const { return maxToleranceSweep; } //! Modify the maxToleranceSweep. size_t& MaxToleranceSweep() { return maxToleranceSweep; } //! Get the gain. double Gain() const { return gain; } //! Modify the gain. double& Gain() { return gain; } //! Get the maximum number of iterations. size_t MaxIterations() const { return maxIterations; } //! Modify the maximum number of iterations. size_t& MaxIterations() { return maxIterations; } //! Get the maximum move size of each parameter. arma::mat MaxMove() const { return maxMove; } //! Modify the maximum move size of each parameter. 
arma::mat& MaxMove() { return maxMove; } //! Get move size of each parameter. arma::mat MoveSize() const { return moveSize; } //! Modify move size of each parameter. arma::mat& MoveSize() { return moveSize; } //! Return a string representation of this object. std::string ToString() const; private: //! The function to be optimized. FunctionType& function; //! The cooling schedule being used. CoolingScheduleType& coolingSchedule; //! The maximum number of iterations. size_t maxIterations; //! The current temperature. double temperature; //! The number of initial moves before reducing the temperature. size_t initMoves; //! The number of sweeps before a MoveControl() call. size_t moveCtrlSweep; //! Tolerance for convergence. double tolerance; //! Number of sweeps in tolerance before system is considered frozen. size_t maxToleranceSweep; //! Proportional control in feedback move control. double gain; //! Maximum move size of each parameter. arma::mat maxMove; //! Move size of each parameter. arma::mat moveSize; /** * GenerateMove proposes a move on element iterate(idx), and determines if * that move is acceptable or not according to the Metropolis criterion. * After that it increments idx so the next call will make a move on next * parameters. When all elements of the state have been moved (a sweep), it * resets idx and increments sweepCounter. When sweepCounter reaches * moveCtrlSweep, it performs MoveControl() and resets sweepCounter. * * @param iterate Current optimization position. * @param accept Matrix representing which parameters have had accepted moves. * @param energy Current energy of the system. * @param idx Current parameter to modify. * @param sweepCounter Current counter representing how many sweeps have been * completed. */ void GenerateMove(arma::mat& iterate, arma::mat& accept, double& energy, size_t& idx, size_t& sweepCounter); /** * MoveControl() uses a proportional feedback control to determine the size * parameter to pass to the move generation distribution. The target of such * move control is to make the acceptance ratio, accept/nMoves, be as close to * 0.44 as possible. Generally speaking, the larger the move size is, the * larger the function value change of the move will be, and less likely such * move will be accepted by the Metropolis criterion. Thus, the move size is * controlled by * * log(moveSize) = log(moveSize) + gain * (accept/nMoves - target) * * For more theory and the mysterious 0.44 value, see Jimmy K.-C. Lam and * Jean-Marc Delosme. `An efficient simulated annealing schedule: derivation'. * Technical Report 8816, Yale University, 1988. * * @param nMoves Number of moves since last call. * @param accept Matrix representing which parameters have had accepted moves. */ void MoveControl(const size_t nMoves, arma::mat& accept); }; }; // namespace optimization }; // namespace mlpack #include "sa_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/core/optimizers/sa/exponential_schedule.hpp0000644000176200001440000000456313647512751026413 0ustar liggesusers/** * @file exponential_schedule.hpp * @author Zhihao Lou * * Exponential (geometric) cooling schedule used in SA. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see .
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_SA_EXPONENTIAL_SCHEDULE_HPP
#define __MLPACK_CORE_OPTIMIZERS_SA_EXPONENTIAL_SCHEDULE_HPP

namespace mlpack {
namespace optimization {

/**
 * The exponential cooling schedule cools the temperature T at every step
 * according to the equation
 *
 * \f[
 * T_{n+1} = (1-\lambda) T_{n}
 * \f]
 *
 * where \f$ 0<\lambda<1 \f$ is the cooling speed.  The smaller \f$ \lambda \f$
 * is, the slower the cooling speed, and the better the final result will be.
 * Some literature uses \f$ \alpha = (1 - \lambda) \f$ instead.  In practice,
 * \f$ \alpha \f$ is very close to 1 and will be awkward to input (e.g.
 * alpha = 0.999999 vs lambda = 1e-6).
 */
class ExponentialSchedule
{
 public:
  /**
   * Construct the ExponentialSchedule with the given parameter.
   *
   * @param lambda Cooling speed.
   */
  ExponentialSchedule(const double lambda = 0.001) : lambda(lambda) { }

  /**
   * Returns the next temperature given current status.  The current system's
   * energy is not used in this calculation.
   *
   * @param currentTemperature Current temperature of system.
   * @param currentEnergy Current energy of system (not used).
   */
  double NextTemperature(
      const double currentTemperature,
      const double /* currentEnergy */)
  {
    return (1 - lambda) * currentTemperature;
  }

  //! Get the cooling speed, lambda.
  double Lambda() const { return lambda; }
  //! Modify the cooling speed, lambda.
  double& Lambda() { return lambda; }

 private:
  //! The cooling speed.
  double lambda;
};

}; // namespace optimization
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/core/optimizers/sa/sa_impl.hpp0000644000176200001440000001747313647512751023635 0ustar liggesusers/**
 * @file sa_impl.hpp
 * @author Zhihao Lou
 *
 * The implementation of the SA optimizer.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see .
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_SA_SA_IMPL_HPP
#define __MLPACK_CORE_OPTIMIZERS_SA_SA_IMPL_HPP

#include

namespace mlpack {
namespace optimization {

template<
    typename FunctionType,
    typename CoolingScheduleType
>
SA<FunctionType, CoolingScheduleType>::SA(
    FunctionType& function,
    CoolingScheduleType& coolingSchedule,
    const size_t maxIterations,
    const double initT,
    const size_t initMoves,
    const size_t moveCtrlSweep,
    const double tolerance,
    const size_t maxToleranceSweep,
    const double maxMoveCoef,
    const double initMoveCoef,
    const double gain) :
    function(function),
    coolingSchedule(coolingSchedule),
    maxIterations(maxIterations),
    temperature(initT),
    initMoves(initMoves),
    moveCtrlSweep(moveCtrlSweep),
    tolerance(tolerance),
    maxToleranceSweep(maxToleranceSweep),
    gain(gain)
{
  const size_t rows = function.GetInitialPoint().n_rows;
  const size_t cols = function.GetInitialPoint().n_cols;

  maxMove.set_size(rows, cols);
  maxMove.fill(maxMoveCoef);
  moveSize.set_size(rows, cols);
  moveSize.fill(initMoveCoef);
}

//! Optimize the function (minimize).
template<
    typename FunctionType,
    typename CoolingScheduleType
>
double SA<FunctionType, CoolingScheduleType>::Optimize(arma::mat &iterate)
{
  const size_t rows = function.GetInitialPoint().n_rows;
  const size_t cols = function.GetInitialPoint().n_cols;

  size_t frozenCount = 0;
  double energy = function.Evaluate(iterate);
  double oldEnergy = energy;
  math::RandomSeed(std::time(NULL));

  size_t idx = 0;
  size_t sweepCounter = 0;

  arma::mat accept(rows, cols);
  accept.zeros();

  // Initial moves to get rid of the dependency on the initial state.
  for (size_t i = 0; i < initMoves; ++i)
    GenerateMove(iterate, accept, energy, idx, sweepCounter);

  // Iterating and cooling.
  for (size_t i = 0; i != maxIterations; ++i)
  {
    oldEnergy = energy;
    GenerateMove(iterate, accept, energy, idx, sweepCounter);
    temperature = coolingSchedule.NextTemperature(temperature, energy);

    // Determine if the optimization has entered (or continues to be in) a
    // frozen state.
    if (std::abs(energy - oldEnergy) < tolerance)
      ++frozenCount;
    else
      frozenCount = 0;

    // Terminate, if possible.
    if (frozenCount >= maxToleranceSweep * moveCtrlSweep * iterate.n_elem)
    {
      Rcpp::Rcout << "SA: minimized within tolerance " << tolerance << " for "
          << maxToleranceSweep << " sweeps after " << i << " iterations; "
          << "terminating optimization." << std::endl;
      return energy;
    }
  }

  Rcpp::Rcout << "SA: maximum iterations (" << maxIterations << ") reached; "
      << "terminating optimization." << std::endl;
  return energy;
}

/**
 * GenerateMove proposes a move on element iterate(idx), and determines
 * if that move is acceptable or not according to the Metropolis criterion.
 * After that it increments idx so the next call will make a move on the next
 * parameter.  When all elements of the state have been moved (a sweep), it
 * resets idx and increments sweepCounter.  When sweepCounter reaches
 * moveCtrlSweep, it performs MoveControl() and resets sweepCounter.
 */
template<
    typename FunctionType,
    typename CoolingScheduleType
>
void SA<FunctionType, CoolingScheduleType>::GenerateMove(
    arma::mat& iterate,
    arma::mat& accept,
    double& energy,
    size_t& idx,
    size_t& sweepCounter)
{
  const double prevEnergy = energy;
  const double prevValue = iterate(idx);

  // It is possible to use a non-Laplace distribution here, but it is difficult
  // because the acceptance ratio should be as close to 0.44 as possible, and
  // MoveControl() is derived for the Laplace distribution.

  // Sample from a Laplace distribution with scale parameter moveSize(idx).
  const double unif = 2.0 * math::Random() - 1.0;
  const double move = (unif < 0) ?
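      // (This is inverse-CDF sampling: with unif uniform on (-1, 1), the two
      // branches compute -sgn(unif) * moveSize(idx) * log(1 - |unif|), which
      // is a draw from a zero-mean Laplace distribution with scale
      // moveSize(idx).)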
(moveSize(idx) * std::log(1 + unif)) : (-moveSize(idx) * std::log(1 - unif)); iterate(idx) += move; energy = function.Evaluate(iterate); // According to the Metropolis criterion, accept the move with probability // min{1, exp(-(E_new - E_old) / T)}. const double xi = math::Random(); const double delta = energy - prevEnergy; const double criterion = std::exp(-delta / temperature); if (delta <= 0. || criterion > xi) { accept(idx) += 1.; } else // Reject the move; restore previous state. { iterate(idx) = prevValue; energy = prevEnergy; } ++idx; if (idx == iterate.n_elem) // Finished with a sweep. { idx = 0; ++sweepCounter; } if (sweepCounter == moveCtrlSweep) // Do MoveControl(). { MoveControl(moveCtrlSweep, accept); sweepCounter = 0; } } /** * MoveControl() uses a proportional feedback control to determine the size * parameter to pass to the move generation distribution. The target of such * move control is to make the acceptance ratio, accept/nMoves, be as close to * 0.44 as possible. Generally speaking, the larger the move size is, the larger * the function value change of the move will be, and less likely such move will * be accepted by the Metropolis criterion. Thus, the move size is controlled by * * log(moveSize) = log(moveSize) + gain * (accept/nMoves - target) * * For more theory and the mysterious 0.44 value, see Jimmy K.-C. Lam and * Jean-Marc Delosme. `An efficient simulated annealing schedule: derivation'. * Technical Report 8816, Yale University, 1988. */ template< typename FunctionType, typename CoolingScheduleType > void SA::MoveControl(const size_t nMoves, arma::mat& accept) { arma::mat target; target.copy_size(accept); target.fill(0.44); moveSize = arma::log(moveSize); moveSize += gain * (accept / (double) nMoves - target); moveSize = arma::exp(moveSize); // To avoid the use of element-wise arma::min(), which is only available in // Armadillo after v3.930, we use a for loop here instead. for (size_t i = 0; i < accept.n_elem; ++i) moveSize(i) = (moveSize(i) > maxMove(i)) ? maxMove(i) : moveSize(i); accept.zeros(); } template< typename FunctionType, typename CoolingScheduleType > std::string SA:: ToString() const { std::ostringstream convert; convert << "SA [" << this << "]" << std::endl; convert << " Function:" << std::endl; convert << util::Indent(function.ToString(), 2); convert << " Cooling Schedule:" << std::endl; convert << util::Indent(coolingSchedule.ToString(), 2); convert << " Temperature: " << temperature << std::endl; convert << " Initial moves: " << initMoves << std::endl; convert << " Sweeps per move control: " << moveCtrlSweep << std::endl; convert << " Tolerance: " << tolerance << std::endl; convert << " Maximum sweeps below tolerance: " << maxToleranceSweep << std::endl; convert << " Move control gain: " << gain << std::endl; convert << " Maximum iterations: " << maxIterations << std::endl; return convert.str(); } }; // namespace optimization }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/optimizers/sgd/0000755000176200001440000000000013647512751021642 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/optimizers/sgd/test_function.hpp0000644000176200001440000000360513647512751025243 0ustar liggesusers/** * @file test_function.hpp * @author Ryan Curtin * * Very simple test function for SGD. * * This file is part of MLPACK 1.0.10. 
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see .
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_SGD_TEST_FUNCTION_HPP
#define __MLPACK_CORE_OPTIMIZERS_SGD_TEST_FUNCTION_HPP

#include

namespace mlpack {
namespace optimization {
namespace test {

//! Very, very simple test function which is the composite of three other
//! functions.  It turns out that although this function is very simple,
//! optimizing it fully can take a very long time.  It seems to take in excess
//! of 10 million iterations with a step size of 0.0005.
class SGDTestFunction
{
 public:
  //! Nothing to do for the constructor.
  SGDTestFunction() { }

  //! Return 3 (the number of functions).
  size_t NumFunctions() const { return 3; }

  //! Get the starting point.
  arma::mat GetInitialPoint() const { return arma::mat("6; -45.6; 6.2"); }

  //! Evaluate a function.
  double Evaluate(const arma::mat& coordinates, const size_t i) const;

  //! Evaluate the gradient of a function.
  void Gradient(const arma::mat& coordinates,
                const size_t i,
                arma::mat& gradient) const;
};

}; // namespace test
}; // namespace optimization
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/core/optimizers/sgd/sgd.hpp0000644000176200001440000001374213647512751023137 0ustar liggesusers/**
 * @file sgd.hpp
 * @author Ryan Curtin
 *
 * Stochastic Gradient Descent (SGD).
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see .
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_SGD_SGD_HPP
#define __MLPACK_CORE_OPTIMIZERS_SGD_SGD_HPP

#include

namespace mlpack {
namespace optimization {

/**
 * Stochastic Gradient Descent is a technique for minimizing a function which
 * can be expressed as a sum of other functions.  That is, suppose we have
 *
 * \f[
 * f(A) = \sum_{i = 0}^{n} f_i(A)
 * \f]
 *
 * and our task is to minimize \f$ f(A) \f$.  Stochastic gradient descent
 * iterates over each function \f$ f_i(A) \f$, producing the following update
 * scheme:
 *
 * \f[
 * A_{j + 1} = A_j - \alpha \nabla f_i(A)
 * \f]
 *
 * where \f$ \alpha \f$ is a parameter which specifies the step size.  \f$ i \f$
 * is chosen according to \f$ j \f$ (the iteration number).  The SGD class
 * supports either scanning through each of the \f$ n \f$ functions \f$ f_i(A)
 * \f$ linearly, or in a random sequence.
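 *
 * For example (SGDTestFunction is the simple decomposable test function
 * defined in sgd/test_function.hpp above; the step size and iteration count
 * here are illustrative only):
 *
 * @code
 * test::SGDTestFunction f;
 * SGD<test::SGDTestFunction> optimizer(f, 0.0005, 5000000);
 * arma::mat coordinates = f.GetInitialPoint();
 * const double objective = optimizer.Optimize(coordinates);
 * @endcode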
 *
 * The algorithm continues until \f$ j \f$ reaches the maximum number of
 * iterations -- or when a full sequence of updates through each of the
 * \f$ n \f$ functions \f$ f_i(A) \f$ produces an improvement within a certain
 * tolerance \f$ \epsilon \f$.  That is,
 *
 * \f[
 * | f(A_{j + n}) - f(A_j) | < \epsilon.
 * \f]
 *
 * The parameter \f$\epsilon\f$ is specified by the tolerance parameter to the
 * constructor; \f$n\f$ is the number of functions, as reported by
 * NumFunctions(); and the bound on \f$ j \f$ is specified by the maxIterations
 * parameter.
 *
 * This class is useful for data-dependent functions whose objective function
 * can be expressed as a sum of objective functions operating on an individual
 * point.  Then, SGD considers the gradient of the objective function operating
 * on an individual point in its update of \f$ A \f$.
 *
 * For SGD to work, a DecomposableFunctionType template parameter is required.
 * This class must implement the following function:
 *
 *   size_t NumFunctions();
 *   double Evaluate(const arma::mat& coordinates, const size_t i);
 *   void Gradient(const arma::mat& coordinates,
 *                 const size_t i,
 *                 arma::mat& gradient);
 *
 * NumFunctions() should return the number of functions (\f$n\f$), and in the
 * other two functions, the parameter i refers to which individual function (or
 * gradient) is being evaluated.  So, for the case of a data-dependent function,
 * such as NCA (see mlpack::nca::NCA), NumFunctions() should return the number
 * of points in the dataset, and Evaluate(coordinates, 0) will evaluate the
 * objective function on the first point in the dataset (presumably, the
 * dataset is held internally in the DecomposableFunctionType).
 *
 * @tparam DecomposableFunctionType Decomposable objective function type to be
 *     minimized.
 */
template<typename DecomposableFunctionType>
class SGD
{
 public:
  /**
   * Construct the SGD optimizer with the given function and parameters.
   *
   * @param function Function to be optimized (minimized).
   * @param stepSize Step size for each iteration.
   * @param maxIterations Maximum number of iterations allowed (0 means no
   *     limit).
   * @param tolerance Maximum absolute tolerance to terminate algorithm.
   * @param shuffle If true, the function order is shuffled; otherwise, each
   *     function is visited in linear order.
   */
  SGD(DecomposableFunctionType& function,
      const double stepSize = 0.01,
      const size_t maxIterations = 100000,
      const double tolerance = 1e-5,
      const bool shuffle = true);

  /**
   * Optimize the given function using stochastic gradient descent.  The given
   * starting point will be modified to store the finishing point of the
   * algorithm, and the final objective value is returned.
   *
   * @param iterate Starting point (will be modified).
   * @return Objective value of the final point.
   */
  double Optimize(arma::mat& iterate);

  //! Get the instantiated function to be optimized.
  const DecomposableFunctionType& Function() const { return function; }
  //! Modify the instantiated function.
  DecomposableFunctionType& Function() { return function; }

  //! Get the step size.
  double StepSize() const { return stepSize; }
  //! Modify the step size.
  double& StepSize() { return stepSize; }

  //! Get the maximum number of iterations (0 indicates no limit).
  size_t MaxIterations() const { return maxIterations; }
  //! Modify the maximum number of iterations (0 indicates no limit).
  size_t& MaxIterations() { return maxIterations; }

  //! Get the tolerance for termination.
  double Tolerance() const { return tolerance; }
  //! Modify the tolerance for termination.
  double& Tolerance() { return tolerance; }

  //! Get whether or not the individual functions are shuffled.
  bool Shuffle() const { return shuffle; }
  //! Modify whether or not the individual functions are shuffled.
  bool& Shuffle() { return shuffle; }

  // Convert the object into a string.
  std::string ToString() const;

 private:
  //! The instantiated function.
  DecomposableFunctionType& function;

  //! The step size for each example.
  double stepSize;

  //! The maximum number of allowed iterations.
  size_t maxIterations;

  //! The tolerance for termination.
  double tolerance;

  //! Controls whether or not the individual functions are shuffled when
  //! iterating.
  bool shuffle;
};

}; // namespace optimization
}; // namespace mlpack

// Include implementation.
#include "sgd_impl.hpp"

#endif
RcppMLPACK/inst/include/mlpack/core/optimizers/sgd/sgd_impl.hpp0000644000176200001440000001100513647512751024146 0ustar liggesusers/**
 * @file sgd_impl.hpp
 * @author Ryan Curtin
 *
 * Implementation of stochastic gradient descent.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see .
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_SGD_SGD_IMPL_HPP
#define __MLPACK_CORE_OPTIMIZERS_SGD_SGD_IMPL_HPP

#include

// In case it hasn't been included yet.
#include "sgd.hpp"

namespace mlpack {
namespace optimization {

template<typename DecomposableFunctionType>
SGD<DecomposableFunctionType>::SGD(DecomposableFunctionType& function,
                                   const double stepSize,
                                   const size_t maxIterations,
                                   const double tolerance,
                                   const bool shuffle) :
    function(function),
    stepSize(stepSize),
    maxIterations(maxIterations),
    tolerance(tolerance),
    shuffle(shuffle)
{ /* Nothing to do. */ }

//! Optimize the function (minimize).
template<typename DecomposableFunctionType>
double SGD<DecomposableFunctionType>::Optimize(arma::mat& iterate)
{
  // Find the number of functions to use.
  const size_t numFunctions = function.NumFunctions();

  // This is used only if shuffle is true.
  arma::vec visitationOrder;
  if (shuffle)
    visitationOrder = arma::shuffle(arma::linspace(0, (numFunctions - 1),
        numFunctions));

  // To keep track of where we are and how things are going.
  size_t currentFunction = 0;
  double overallObjective = 0;
  double lastObjective = DBL_MAX;

  // Calculate the first objective function.
  for (size_t i = 0; i < numFunctions; ++i)
    overallObjective += function.Evaluate(iterate, i);

  // Now iterate!
  arma::mat gradient(iterate.n_rows, iterate.n_cols);
  for (size_t i = 1; i != maxIterations; ++i, ++currentFunction)
  {
    // Is this iteration the start of a sequence?
    if ((currentFunction % numFunctions) == 0)
    {
      // Output current objective function.
      Rcpp::Rcout << "SGD: iteration " << i << ", objective "
          << overallObjective << "." << std::endl;

      // The objective has become NaN; the optimization diverged.
      if (overallObjective != overallObjective)
      {
        Rcpp::Rcout << "SGD: converged to " << overallObjective
            << "; terminating with failure.  Try a smaller step size?"
            << std::endl;
        return overallObjective;
      }

      if (std::abs(lastObjective - overallObjective) < tolerance)
      {
        Rcpp::Rcout << "SGD: minimized within tolerance " << tolerance << "; "
            << "terminating optimization." << std::endl;
        return overallObjective;
      }

      // Reset the counter variables.
      lastObjective = overallObjective;
      overallObjective = 0;
      currentFunction = 0;

      if (shuffle) // Determine order of visitation.
        visitationOrder = arma::shuffle(visitationOrder);
    }

    // Evaluate the gradient for this iteration.
    function.Gradient(iterate, currentFunction, gradient);

    // And update the iterate.
    iterate -= stepSize * gradient;

    // Now add that to the overall objective function.
    overallObjective += function.Evaluate(iterate, currentFunction);
  }

  Rcpp::Rcout << "SGD: maximum iterations (" << maxIterations << ") reached; "
      << "terminating optimization." << std::endl;
  return overallObjective;
}

// Convert the object to a string.
template<typename DecomposableFunctionType>
std::string SGD<DecomposableFunctionType>::ToString() const
{
  std::ostringstream convert;
  convert << "SGD [" << this << "]" << std::endl;
  convert << "  Function:" << std::endl;
  convert << util::Indent(function.ToString(), 2);
  convert << "  Step size: " << stepSize << std::endl;
  convert << "  Maximum iterations: " << maxIterations << std::endl;
  convert << "  Tolerance: " << tolerance << std::endl;
  convert << "  Shuffle points: " << (shuffle ? "true" : "false") << std::endl;
  return convert.str();
}

}; // namespace optimization
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/core/optimizers/lrsdp/0000755000176200001440000000000013647512751022211 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/optimizers/lrsdp/lrsdp.hpp0000644000176200001440000001035013647512751024045 0ustar liggesusers/**
 * @file lrsdp.hpp
 * @author Ryan Curtin
 *
 * An implementation of Monteiro and Burer's formulation of low-rank
 * semidefinite programs (LR-SDP).
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see .
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_LRSDP_LRSDP_HPP
#define __MLPACK_CORE_OPTIMIZERS_LRSDP_LRSDP_HPP

#include
#include

#include "lrsdp_function.hpp"

namespace mlpack {
namespace optimization {

/**
 * LRSDP is the implementation of Monteiro and Burer's formulation of low-rank
 * semidefinite programs (LR-SDP).  This solver uses the augmented Lagrangian
 * optimizer to solve low-rank semidefinite programs.
 */
class LRSDP
{
 public:
  /**
   * Create an LRSDP to be optimized.  The solution will end up being a matrix
   * of size (rows) x (rank).  To construct each constraint and the objective
   * function, use the functions A(), B(), and C() to set them correctly.
   *
   * @param numConstraints Number of constraints in the problem.
   * @param initialPoint Initial point of the optimization, of size (rows) x
   *     (rank), where (rank) is the reduced rank of the problem (<= rows).
   */
  LRSDP(const size_t numConstraints,
        const arma::mat& initialPoint);

  /**
   * Create an LRSDP to be optimized, passing in an already-created
   * AugLagrangian object.  The given initial point should be set to the size
   * (rows) x (rank), where (rank) is the reduced rank of the problem.
   *
   * @param numConstraints Number of constraints in the problem.
   * @param initialPoint Initial point of the optimization.
   * @param augLagrangian Pre-initialized AugLagrangian object.
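   *
   * A construction sketch under hypothetical sizes (the objective and
   * constraint matrices must still be filled in through C(), A(), and B()):
   *
   * @code
   * const size_t numConstraints = 10;
   * arma::mat initialPoint(100, 5, arma::fill::randu); // (rows) x (rank)
   * LRSDP lrsdp(numConstraints, initialPoint);
   * // ... fill lrsdp.C(), lrsdp.A(), lrsdp.B() ...
   * const double objective = lrsdp.Optimize(initialPoint);
   * @endcode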
   */
  LRSDP(const size_t numConstraints,
        const arma::mat& initialPoint,
        AugLagrangian& augLagrangian);

  /**
   * Optimize the LRSDP and return the final objective value.  The given
   * coordinates will be modified to contain the final solution.
   *
   * @param coordinates Starting coordinates for the optimization.
   */
  double Optimize(arma::mat& coordinates);

  //! Return the objective function matrix (C).
  const arma::mat& C() const { return function.C(); }
  //! Modify the objective function matrix (C).
  arma::mat& C() { return function.C(); }

  //! Return the vector of A matrices (which correspond to the constraints).
  const std::vector<arma::mat>& A() const { return function.A(); }
  //! Modify the vector of A matrices (which correspond to the constraints).
  std::vector<arma::mat>& A() { return function.A(); }

  //! Return the vector of modes for the A matrices.
  const arma::uvec& AModes() const { return function.AModes(); }
  //! Modify the vector of modes for the A matrices.
  arma::uvec& AModes() { return function.AModes(); }

  //! Return the vector of B values.
  const arma::vec& B() const { return function.B(); }
  //! Modify the vector of B values.
  arma::vec& B() { return function.B(); }

  //! Return the function to be optimized.
  const LRSDPFunction& Function() const { return function; }
  //! Modify the function to be optimized.
  LRSDPFunction& Function() { return function; }

  //! Return the augmented Lagrangian object.
  const AugLagrangian& AugLag() const { return augLag; }
  //! Modify the augmented Lagrangian object.
  AugLagrangian& AugLag() { return augLag; }

  //! Return a string representation of the object.
  std::string ToString() const;

 private:
  //! Function to optimize, which the AugLagrangian object holds.
  LRSDPFunction function;

  //! The AugLagrangian object which will be used for optimization.
  AugLagrangian augLag;
};

}; // namespace optimization
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/core/optimizers/lrsdp/lrsdp_function.hpp0000644000176200001440000000724213647512751025760 0ustar liggesusers/**
 * @file lrsdp_function.hpp
 * @author Ryan Curtin
 * @author Abhishek Laddha
 *
 * A class that represents the objective function which LRSDP optimizes.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK.  If not, see .
 */
#ifndef __MLPACK_CORE_OPTIMIZERS_LRSDP_LRSDP_FUNCTION_HPP
#define __MLPACK_CORE_OPTIMIZERS_LRSDP_LRSDP_FUNCTION_HPP

#include
#include

namespace mlpack {
namespace optimization {

/**
 * The objective function that LRSDP is trying to optimize.
 */
class LRSDPFunction
{
 public:
  /**
   * Construct the LRSDPFunction with the given initial point and number of
   * constraints.  Set the A, B, and C matrices for each constraint using the
   * A(), B(), and C() functions.
   */
  LRSDPFunction(const size_t numConstraints, const arma::mat& initialPoint);

  /**
   * Evaluate the objective function of the LRSDP (no constraints) at the given
   * coordinates.
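   * In the low-rank (Burer-Monteiro) formulation this is the SDP objective
   * \f$ \mathrm{tr}(C R R^T) \f$ evaluated with \f$ X = R R^T \f$, where
   * \f$ R \f$ denotes the coordinates matrix.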
   */
  double Evaluate(const arma::mat& coordinates) const;

  /**
   * Evaluate the gradient of the LRSDP (no constraints) at the given
   * coordinates.
   */
  void Gradient(const arma::mat& coordinates, arma::mat& gradient) const;

  /**
   * Evaluate a particular constraint of the LRSDP at the given coordinates.
   */
  double EvaluateConstraint(const size_t index,
                            const arma::mat& coordinates) const;

  /**
   * Evaluate the gradient of a particular constraint of the LRSDP at the given
   * coordinates.
   */
  void GradientConstraint(const size_t index,
                          const arma::mat& coordinates,
                          arma::mat& gradient) const;

  //! Get the number of constraints in the LRSDP.
  size_t NumConstraints() const { return b.n_elem; }

  //! Get the initial point of the LRSDP.
  const arma::mat& GetInitialPoint() const { return initialPoint; }

  //! Return the objective function matrix (C).
  const arma::mat& C() const { return c; }
  //! Modify the objective function matrix (C).
  arma::mat& C() { return c; }

  //! Return the vector of A matrices (which correspond to the constraints).
  const std::vector<arma::mat>& A() const { return a; }
  //! Modify the vector of A matrices (which correspond to the constraints).
  std::vector<arma::mat>& A() { return a; }

  //! Return the vector of modes for the A matrices.
  const arma::uvec& AModes() const { return aModes; }
  //! Modify the vector of modes for the A matrices.
  arma::uvec& AModes() { return aModes; }

  //! Return the vector of B values.
  const arma::vec& B() const { return b; }
  //! Modify the vector of B values.
  arma::vec& B() { return b; }

  //! Return string representation of object.
  std::string ToString() const;

 private:
  //! Objective function matrix c.
  arma::mat c;

  //! A_i for each constraint.
  std::vector<arma::mat> a;
  //! b_i for each constraint.
  arma::vec b;

  //! Initial point.
  arma::mat initialPoint;

  //! 1 if entries in matrix, 0 for normal.
  arma::uvec aModes;
};

};
};

#endif // __MLPACK_CORE_OPTIMIZERS_LRSDP_LRSDP_FUNCTION_HPP
RcppMLPACK/inst/include/mlpack/core/arma_extend/0000755000176200001440000000000013647512751021147 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/arma_extend/restrictors.hpp0000644000176200001440000000157113647512751024247 0ustar liggesusers// Modifications to allow u64/s64 in Armadillo when ARMA_64BIT_WORD is not
// defined.  Only required on Armadillo < 3.6.2.
#if (ARMA_VERSION_MAJOR < 3) || \
    ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR < 6)) || \
    ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR == 6) && \
     (ARMA_VERSION_PATCH < 2))
  #ifndef ARMA_64BIT_WORD
    template<> struct arma_scalar_only<u64> { typedef u64 result; };
    template<> struct arma_scalar_only<s64> { typedef s64 result; };

    template<> struct arma_integral_only<u64> { typedef u64 result; };
    template<> struct arma_integral_only<s64> { typedef s64 result; };

    template<> struct arma_unsigned_integral_only<u64> { typedef u64 result; };

    template<> struct arma_signed_integral_only<s64> { typedef s64 result; };

    template<> struct arma_signed_only<s64> { typedef s64 result; };
  #endif
#endif
RcppMLPACK/inst/include/mlpack/core/arma_extend/glue_ccov_proto.hpp0000644000176200001440000000233113647512751025050 0ustar liggesusers// Copyright (C) 2010 NICTA and the authors listed below
// http://nicta.com.au
//
// Authors:
// - Conrad Sanderson (conradsand at ieee dot org)
// - Dimitrios Bouzas (dimitris dot mpouzas at gmail dot com)
// - Ryan Curtin (ryan at igglybob dot com)
//
// This file is part of the Armadillo C++ library.
// It is provided without any warranty of fitness
// for any purpose.
You can redistribute this file // and/or modify it under the terms of the GNU // Lesser General Public License (LGPL) as published // by the Free Software Foundation, either version 3 // of the License or (at your option) any later version. // (see http://www.opensource.org/licenses for more info) //! \addtogroup glue_ccov //! @{ class glue_ccov { public: template inline static void direct_ccov(Mat& out, const Mat& A, const Mat& B, const uword norm_type); template inline static void direct_ccov(Mat< std::complex >& out, const Mat< std::complex >& A, const Mat< std::complex >& B, const uword norm_type); template inline static void apply(Mat& out, const Glue& X); }; //! @} RcppMLPACK/inst/include/mlpack/core/arma_extend/SpMat_extra_bones.hpp0000644000176200001440000000107113647512751025274 0ustar liggesusers/** * @file SpMat_extra_bones.hpp * @author Ryan Curtin * * Add a batch constructor for SpMat, if the version is older than 3.810.0. */ #if ARMA_VERSION_MAJOR == 3 && ARMA_VERSION_MINOR < 810 template inline SpMat( const Base& locations, const Base& values, const bool sort_locations = true); template inline SpMat( const Base& locations, const Base& values, const uword n_rows, const uword n_cols, const bool sort_locations = true); #endif RcppMLPACK/inst/include/mlpack/core/arma_extend/glue_ccov_meat.hpp0000644000176200001440000000710713647512751024641 0ustar liggesusers// Copyright (C) 2010 NICTA and the authors listed below // http://nicta.com.au // // Authors: // - Conrad Sanderson (conradsand at ieee dot org) // - Dimitrios Bouzas (dimitris dot mpouzas at gmail dot com) // - Ryan Curtin (ryan at igglybob dot com) // // This file is part of the Armadillo C++ library. // It is provided without any warranty of fitness // for any purpose. You can redistribute this file // and/or modify it under the terms of the GNU // Lesser General Public License (LGPL) as published // by the Free Software Foundation, either version 3 // of the License or (at your option) any later version. // (see http://www.opensource.org/licenses for more info) //! \addtogroup glue_cov //! @{ template inline void glue_ccov::direct_ccov(Mat& out, const Mat& A, const Mat& B, const uword norm_type) { arma_extra_debug_sigprint(); if(A.is_vec() && B.is_vec()) { arma_debug_check( (A.n_elem != B.n_elem), "ccov(): the number of elements in A and B must match" ); const eT* A_ptr = A.memptr(); const eT* B_ptr = B.memptr(); eT A_acc = eT(0); eT B_acc = eT(0); eT out_acc = eT(0); const uword N = A.n_elem; for(uword i=0; i 1) ? eT(N-1) : eT(1) ) : eT(N); out.set_size(1,1); out[0] = out_acc/norm_val; } else { arma_debug_assert_same_size(A, B, "ccov()"); const uword N = A.n_cols; const eT norm_val = (norm_type == 0) ? ( (N > 1) ? eT(N-1) : eT(1) ) : eT(N); out = A * trans(B); out -= (sum(A) * trans(sum(B))) / eT(N); out /= norm_val; } } template inline void glue_ccov::direct_ccov(Mat< std::complex >& out, const Mat< std::complex >& A, const Mat< std::complex >& B, const uword norm_type) { arma_extra_debug_sigprint(); typedef typename std::complex eT; if(A.is_vec() && B.is_vec()) { arma_debug_check( (A.n_elem != B.n_elem), "cov(): the number of elements in A and B must match" ); const eT* A_ptr = A.memptr(); const eT* B_ptr = B.memptr(); eT A_acc = eT(0); eT B_acc = eT(0); eT out_acc = eT(0); const uword N = A.n_elem; for(uword i=0; i 1) ? eT(N-1) : eT(1) ) : eT(N); out.set_size(1,1); out[0] = out_acc/norm_val; } else { arma_debug_assert_same_size(A, B, "ccov()"); const uword N = A.n_cols; const eT norm_val = (norm_type == 0) ? 
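    // (norm_type 0 normalises by N - 1, the unbiased estimator; any other
    // norm_type normalises by N.)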
( (N > 1) ? eT(N-1) : eT(1) ) : eT(N); out = A * trans(conj(B)); out -= (sum(A) * trans(conj(sum(B)))) / eT(N); out /= norm_val; } } template inline void glue_ccov::apply(Mat& out, const Glue& X) { arma_extra_debug_sigprint(); typedef typename T1::elem_type eT; const unwrap_check A_tmp(X.A, out); const unwrap_check B_tmp(X.B, out); const Mat& A = A_tmp.M; const Mat& B = B_tmp.M; const uword norm_type = X.aux_uword; if(&A != &B) { glue_ccov::direct_ccov(out, A, B, norm_type); } else { op_ccov::direct_ccov(out, A, norm_type); } } //! @} RcppMLPACK/inst/include/mlpack/core/arma_extend/SpMat_extra_meat.hpp0000644000176200001440000002120313647512751025113 0ustar liggesusers/** * @file SpMat_extra_meat.hpp * @author Ryan Curtin * * Take the Armadillo batch sparse matrix constructor function from newer * Armadillo versions and port it to versions earlier than 3.810.0. */ #if ARMA_VERSION_MAJOR == 3 && ARMA_VERSION_MINOR < 810 //! Insert a large number of values at once. //! locations.row[0] should be row indices, locations.row[1] should be column indices, //! and values should be the corresponding values. //! If sort_locations is false, then it is assumed that the locations and values //! are already sorted in column-major ordering. template template inline SpMat::SpMat(const Base& locations_expr, const Base& vals_expr, const bool sort_locations) : n_rows(0) , n_cols(0) , n_elem(0) , n_nonzero(0) , vec_state(0) , values(NULL) , row_indices(NULL) , col_ptrs(NULL) { arma_extra_debug_sigprint_this(this); const unwrap locs_tmp( locations_expr.get_ref() ); const Mat& locs = locs_tmp.M; const unwrap vals_tmp( vals_expr.get_ref() ); const Mat& vals = vals_tmp.M; arma_debug_check( (vals.is_vec() == false), "SpMat::SpMat(): given 'values' object is not a vector" ); arma_debug_check((locs.n_cols != vals.n_elem), "SpMat::SpMat(): number of locations is different than number of values"); // If there are no elements in the list, max() will fail. if (locs.n_cols == 0) { init(0, 0); return; } arma_debug_check((locs.n_rows != 2), "SpMat::SpMat(): locations matrix must have two rows"); // Automatically determine size (and check if it's sorted). uvec bounds = arma::max(locs, 1); init(bounds[0] + 1, bounds[1] + 1); // Resize to correct number of elements. mem_resize(vals.n_elem); // Reset column pointers to zero. arrayops::inplace_set(access::rwp(col_ptrs), uword(0), n_cols + 1); bool actually_sorted = true; if(sort_locations == true) { // sort_index() uses std::sort() which may use quicksort... so we better // make sure it's not already sorted before taking an O(N^2) sort penalty. for (uword i = 1; i < locs.n_cols; ++i) { if ((locs.at(1, i) < locs.at(1, i - 1)) || (locs.at(1, i) == locs.at(1, i - 1) && locs.at(0, i) <= locs.at(0, i - 1))) { actually_sorted = false; break; } } if(actually_sorted == false) { // This may not be the fastest possible implementation but it maximizes code reuse. Col abslocs(locs.n_cols); for (uword i = 0; i < locs.n_cols; ++i) { abslocs[i] = locs.at(1, i) * n_rows + locs.at(0, i); } // Now we will sort with sort_index(). uvec sorted_indices = sort_index(abslocs); // Ascending sort. // Now we add the elements in this sorted order. 
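// (Added note, not in the original source: each iteration below writes the // i'th nonzero value and its row index, and bumps the count stored in // col_ptrs[col + 1]; the prefix-sum loop at the end of this constructor then // turns these per-column counts into the usual CSC column offsets.)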
for (uword i = 0; i < sorted_indices.n_elem; ++i) { arma_debug_check((locs.at(0, sorted_indices[i]) >= n_rows), "SpMat::SpMat(): invalid row index"); arma_debug_check((locs.at(1, sorted_indices[i]) >= n_cols), "SpMat::SpMat(): invalid column index"); access::rw(values[i]) = vals[sorted_indices[i]]; access::rw(row_indices[i]) = locs.at(0, sorted_indices[i]); access::rw(col_ptrs[locs.at(1, sorted_indices[i]) + 1])++; } } } if( (sort_locations == false) || (actually_sorted == true) ) { // Now set the values and row indices correctly. // Increment the column pointers in each column (so they are column "counts"). for (uword i = 0; i < vals.n_elem; ++i) { arma_debug_check((locs.at(0, i) >= n_rows), "SpMat::SpMat(): invalid row index"); arma_debug_check((locs.at(1, i) >= n_cols), "SpMat::SpMat(): invalid column index"); // Check ordering in debug mode. if(i > 0) { arma_debug_check ( ( (locs.at(1, i) < locs.at(1, i - 1)) || (locs.at(1, i) == locs.at(1, i - 1) && locs.at(0, i) < locs.at(0, i - 1)) ), "SpMat::SpMat(): out of order points; either pass sort_locations = true, or sort points in column-major ordering" ); arma_debug_check((locs.at(1, i) == locs.at(1, i - 1) && locs.at(0, i) == locs.at(0, i - 1)), "SpMat::SpMat(): two identical point locations in list"); } access::rw(values[i]) = vals[i]; access::rw(row_indices[i]) = locs.at(0, i); access::rw(col_ptrs[locs.at(1, i) + 1])++; } } // Now fix the column pointers. for (uword i = 0; i <= n_cols; ++i) { access::rw(col_ptrs[i + 1]) += col_ptrs[i]; } } //! Insert a large number of values at once. //! locations.row[0] should be row indices, locations.row[1] should be column indices, //! and values should be the corresponding values. //! If sort_locations is false, then it is assumed that the locations and values //! are already sorted in column-major ordering. //! In this constructor the size is explicitly given. template template inline SpMat::SpMat(const Base& locations_expr, const Base& vals_expr, const uword in_n_rows, const uword in_n_cols, const bool sort_locations) : n_rows(0) , n_cols(0) , n_elem(0) , n_nonzero(0) , vec_state(0) , values(NULL) , row_indices(NULL) , col_ptrs(NULL) { arma_extra_debug_sigprint_this(this); init(in_n_rows, in_n_cols); const unwrap locs_tmp( locations_expr.get_ref() ); const Mat& locs = locs_tmp.M; const unwrap vals_tmp( vals_expr.get_ref() ); const Mat& vals = vals_tmp.M; arma_debug_check( (vals.is_vec() == false), "SpMat::SpMat(): given 'values' object is not a vector" ); arma_debug_check((locs.n_rows != 2), "SpMat::SpMat(): locations matrix must have two rows"); arma_debug_check((locs.n_cols != vals.n_elem), "SpMat::SpMat(): number of locations is different than number of values"); // Resize to correct number of elements. mem_resize(vals.n_elem); // Reset column pointers to zero. arrayops::inplace_set(access::rwp(col_ptrs), uword(0), n_cols + 1); bool actually_sorted = true; if(sort_locations == true) { // sort_index() uses std::sort() which may use quicksort... so we better // make sure it's not already sorted before taking an O(N^2) sort penalty. for (uword i = 1; i < locs.n_cols; ++i) { if ((locs.at(1, i) < locs.at(1, i - 1)) || (locs.at(1, i) == locs.at(1, i - 1) && locs.at(0, i) <= locs.at(0, i - 1))) { actually_sorted = false; break; } } if(actually_sorted == false) { // This may not be the fastest possible implementation but it maximizes code reuse. 
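// (Added note: the loop below flattens each (row, column) location into a // single column-major linear index, column * n_rows + row, so that a single // sort_index() call yields the ordering required by the CSC layout.)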
Col abslocs(locs.n_cols); for (uword i = 0; i < locs.n_cols; ++i) { abslocs[i] = locs.at(1, i) * n_rows + locs.at(0, i); } // Now we will sort with sort_index(). uvec sorted_indices = sort_index(abslocs); // Ascending sort. // Now we add the elements in this sorted order. for (uword i = 0; i < sorted_indices.n_elem; ++i) { arma_debug_check((locs.at(0, sorted_indices[i]) >= n_rows), "SpMat::SpMat(): invalid row index"); arma_debug_check((locs.at(1, sorted_indices[i]) >= n_cols), "SpMat::SpMat(): invalid column index"); access::rw(values[i]) = vals[sorted_indices[i]]; access::rw(row_indices[i]) = locs.at(0, sorted_indices[i]); access::rw(col_ptrs[locs.at(1, sorted_indices[i]) + 1])++; } } } if( (sort_locations == false) || (actually_sorted == true) ) { // Now set the values and row indices correctly. // Increment the column pointers in each column (so they are column "counts"). for (uword i = 0; i < vals.n_elem; ++i) { arma_debug_check((locs.at(0, i) >= n_rows), "SpMat::SpMat(): invalid row index"); arma_debug_check((locs.at(1, i) >= n_cols), "SpMat::SpMat(): invalid column index"); // Check ordering in debug mode. if(i > 0) { arma_debug_check ( ( (locs.at(1, i) < locs.at(1, i - 1)) || (locs.at(1, i) == locs.at(1, i - 1) && locs.at(0, i) < locs.at(0, i - 1)) ), "SpMat::SpMat(): out of order points; either pass sort_locations = true or sort points in column-major ordering" ); arma_debug_check((locs.at(1, i) == locs.at(1, i - 1) && locs.at(0, i) == locs.at(0, i - 1)), "SpMat::SpMat(): two identical point locations in list"); } access::rw(values[i]) = vals[i]; access::rw(row_indices[i]) = locs.at(0, i); access::rw(col_ptrs[locs.at(1, i) + 1])++; } } // Now fix the column pointers. for (uword i = 0; i <= n_cols; ++i) { access::rw(col_ptrs[i + 1]) += col_ptrs[i]; } } #endif RcppMLPACK/inst/include/mlpack/core/arma_extend/promote_type.hpp0000644000176200001440000000702413647512751024411 0ustar liggesusers// Extra promote_type definitions until 64-bit index support is added to // Armadillo. These aren't necessary on Armadillo > 3.6.1. #if (ARMA_VERSION_MAJOR < 3) || \ ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR < 6)) || \ ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR == 6) && \ (ARMA_VERSION_PATCH < 2)) #ifndef ARMA_64BIT_WORD template struct is_promotable, s64> : public is_promotable_ok { typedef std::complex result; }; template struct is_promotable, u64> : public is_promotable_ok { typedef std::complex result; }; template<> struct is_promotable : public is_promotable_ok { typedef double result; }; template<> struct is_promotable : public is_promotable_ok { typedef double result; }; template<> struct is_promotable : public is_promotable_ok { typedef float result; }; template<> struct is_promotable : public is_promotable_ok { typedef float result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; // float ? 
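// (Added note: the '// float ?' marks in this table are the original // authors' reminders that promoting these integer/float mixes to s64 // discards any fractional part, and that a floating-point result type might // arguably be more appropriate.)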
template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef u64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef u64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef u64 result; }; template struct is_promotable > : public is_promotable_ok { typedef std::complex result; }; template struct is_promotable > : public is_promotable_ok { typedef std::complex result; }; template<> struct is_promotable : public is_promotable_ok { typedef double result; }; template<> struct is_promotable : public is_promotable_ok { typedef double result; }; template<> struct is_promotable : public is_promotable_ok { typedef float result; }; template<> struct is_promotable : public is_promotable_ok { typedef float result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; // float ? template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; // float ? template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef s64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef u64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef u64 result; }; template<> struct is_promotable : public is_promotable_ok { typedef u64 result; }; #endif #endif RcppMLPACK/inst/include/mlpack/core/arma_extend/hdf5_misc.hpp0000644000176200001440000000107613647512751023525 0ustar liggesusers// To hack in u64/s64 support to Armadillo when it is not compiled with // ARMA_64BIT_WORD. namespace hdf5_misc { #if defined(ARMA_USE_HDF5) #if !(defined(ARMA_64BIT_WORD) || defined(ARMA_USE_U64S64)) #if defined(ULLONG_MAX) template<> inline hid_t get_hdf5_type< long long >() { return H5Tcopy(H5T_NATIVE_LLONG); } template<> inline hid_t get_hdf5_type< unsigned long long >() { return H5Tcopy(H5T_NATIVE_ULLONG); } #endif #endif #endif } // namespace hdf5_misc RcppMLPACK/inst/include/mlpack/core/arma_extend/typedef.hpp0000644000176200001440000000202513647512751023317 0ustar liggesusers// Extensions to typedef u64 and s64 until that support is added into // Armadillo. We only need to typedef s64 on Armadillo > 1.2.0. This is not // necessary for Armadillo > 3.6.1. #if (ARMA_VERSION_MAJOR < 3) || \ ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR < 6)) || \ ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR == 6) && \ (ARMA_VERSION_PATCH < 2)) #ifndef ARMA_64BIT_WORD // We must typedef both u64 and s64. 
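// (Added note: the check below prefers a plain 64-bit 'long' on LP64 // platforms, and falls back to 'long long' on LLP64 platforms such as // 64-bit Windows, where 'long' is only 32 bits wide.)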
#if ULONG_MAX >= 0xffffffffffffffff typedef unsigned long u64; typedef long s64; #elif ULLONG_MAX >= 0xffffffffffffffff typedef unsigned long long u64; typedef long long s64; #else #error "don't know how to typedef 'u64' on this system" #endif namespace junk { struct arma_64_elem_size_test { arma_static_check( (sizeof(u64) != 8), ERROR___TYPE_U64_HAS_UNSUPPORTED_SIZE ); arma_static_check( (sizeof(s64) != 8), ERROR___TYPE_S64_HAS_UNSUPPORTED_SIZE ); }; } #endif #endif RcppMLPACK/inst/include/mlpack/core/arma_extend/fn_ccov.hpp0000644000176200001440000000256513647512751023305 0ustar liggesusers// Copyright (C) 2010 NICTA and the authors listed below // http://nicta.com.au // // Authors: // - Conrad Sanderson (conradsand at ieee dot org) // - Dimitrios Bouzas (dimitris dot mpouzas at gmail dot com) // - Ryan Curtin (ryan at igglybob dot com) // // This file is part of the Armadillo C++ library. // It is provided without any warranty of fitness // for any purpose. You can redistribute this file // and/or modify it under the terms of the GNU // Lesser General Public License (LGPL) as published // by the Free Software Foundation, either version 3 // of the License or (at your option) any later version. // (see http://www.opensource.org/licenses for more info) //! \addtogroup fn_ccov //! @{ template<typename T1> inline const Op<T1, op_ccov> ccov(const Base<typename T1::elem_type, T1>& X, const uword norm_type = 0) { arma_extra_debug_sigprint(); arma_debug_check( (norm_type > 1), "ccov(): norm_type must be 0 or 1"); return Op<T1, op_ccov>(X.get_ref(), norm_type, 0); } template<typename T1, typename T2> inline const Glue<T1, T2, glue_ccov> ccov(const Base<typename T1::elem_type, T1>& A, const Base<typename T1::elem_type, T2>& B, const uword norm_type = 0) { arma_extra_debug_sigprint(); arma_debug_check( (norm_type > 1), "ccov(): norm_type must be 0 or 1"); return Glue<T1, T2, glue_ccov>(A.get_ref(), B.get_ref(), norm_type); } //! @} RcppMLPACK/inst/include/mlpack/core/arma_extend/fn_inplace_reshape.hpp0000644000176200001440000000251413647512751025467 0ustar liggesusers// Copyright (C) 2010 NICTA and the authors listed below // http://nicta.com.au // // Authors: // - Ryan Curtin (ryan at igglybob dot com) // // This file is part of the Armadillo C++ library. // It is provided without any warranty of fitness // for any purpose. You can redistribute this file // and/or modify it under the terms of the GNU // Lesser General Public License (LGPL) as published // by the Free Software Foundation, either version 3 // of the License or (at your option) any later version. // (see http://www.opensource.org/licenses for more info) //! \addtogroup fn_inplace_reshape //! @{ /** * This does not handle column vectors or row vectors entirely correctly. You * should be able to do multiplication or other basic operations with the * resulting matrix, but it may have other problems. So if you are using this * on vectors (arma::Col<> or arma::Row<>), be careful, and be warned that * bizarre behavior may occur. */ template<typename eT> inline Mat<eT>& inplace_reshape(Mat<eT>& X, const uword new_n_rows, const uword new_n_cols) { arma_extra_debug_sigprint(); arma_debug_check((new_n_rows * new_n_cols) != X.n_elem, "inplace_reshape(): cannot add or remove elements"); access::rw(X.n_rows) = new_n_rows; access::rw(X.n_cols) = new_n_cols; return X; } //!
@} RcppMLPACK/inst/include/mlpack/core/arma_extend/op_ccov_proto.hpp0000644000176200001440000000212613647512751024534 0ustar liggesusers// Copyright (C) 2010 NICTA and the authors listed below // http://nicta.com.au // // Authors: // - Conrad Sanderson (conradsand at ieee dot org) // - Dimitrios Bouzas (dimitris dot mpouzas at gmail dot com) // // This file is part of the Armadillo C++ library. // It is provided without any warranty of fitness // for any purpose. You can redistribute this file // and/or modify it under the terms of the GNU // Lesser General Public License (LGPL) as published // by the Free Software Foundation, either version 3 // of the License or (at your option) any later version. // (see http://www.opensource.org/licenses for more info) //! \addtogroup op_cov //! @{ class op_ccov { public: template inline static void direct_ccov(Mat& out, const Mat& X, const uword norm_type); template inline static void direct_ccov(Mat< std::complex >& out, const Mat< std::complex >& X, const uword norm_type); template inline static void apply(Mat& out, const Op& in); }; //! @} RcppMLPACK/inst/include/mlpack/core/arma_extend/arma_extend.hpp0000644000176200001440000000216313647512751024151 0ustar liggesusers/*** * @file arma_extend.hpp * @author Ryan Curtin * * Include Armadillo extensions which currently are not part of the main * Armadillo codebase. * * This will allow the use of the ccov() function (which performs the same * function as cov(trans(X)) but without the cost of computing trans(X)). This * also gives sparse matrix support, if it is necessary. */ #ifndef __MLPACK_CORE_ARMA_EXTEND_ARMA_EXTEND_HPP #define __MLPACK_CORE_ARMA_EXTEND_ARMA_EXTEND_HPP // Add batch constructor for sparse matrix (if version <= 3.810.0). #define ARMA_EXTRA_SPMAT_PROTO mlpack/core/arma_extend/SpMat_extra_bones.hpp #define ARMA_EXTRA_SPMAT_MEAT mlpack/core/arma_extend/SpMat_extra_meat.hpp #include #define NDEBUG 1 namespace arma { // u64/s64 #include "typedef.hpp" #include "traits.hpp" #include "promote_type.hpp" #include "restrictors.hpp" #include "hdf5_misc.hpp" // ccov() #include "op_ccov_proto.hpp" #include "op_ccov_meat.hpp" #include "glue_ccov_proto.hpp" #include "glue_ccov_meat.hpp" #include "fn_ccov.hpp" // inplace_reshape() #include "fn_inplace_reshape.hpp" }; #endif RcppMLPACK/inst/include/mlpack/core/arma_extend/traits.hpp0000644000176200001440000000236013647512751023167 0ustar liggesusers// Extra traits to support u64 and s64 (or, specifically, unsigned long and // long) until that is applied to the Armadillo sources. // This isn't necessary if Armadillo was compiled with 64-bit support, or if // ARMA_USE_U64S64 is enabled, or if Armadillo >= 3.6.2 is used (by default // Armadillo 3.6.2 allows long types). 
#if (ARMA_VERSION_MAJOR < 3) || \ ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR < 6)) || \ ((ARMA_VERSION_MAJOR == 3) && (ARMA_VERSION_MINOR == 6) && \ (ARMA_VERSION_PATCH < 2)) #if !(defined(ARMA_64BIT_WORD) || defined(ARMA_USE_U64S64)) template struct is_u64 { static const bool value = false; }; template<> struct is_u64 { static const bool value = true; }; template struct is_s64 { static const bool value = false; }; template<> struct is_s64 { static const bool value = true; }; template<> struct is_supported_elem_type { static const bool value = true; }; template<> struct is_supported_elem_type { static const bool value = true; }; template<> struct is_signed { static const bool value = false; }; #endif #endif RcppMLPACK/inst/include/mlpack/core/arma_extend/op_ccov_meat.hpp0000644000176200001440000000453513647512751024325 0ustar liggesusers// Copyright (C) 2010 NICTA and the authors listed below // http://nicta.com.au // // Authors: // - Conrad Sanderson (conradsand at ieee dot org) // - Dimitrios Bouzas (dimitris dot mpouzas at gmail dot com) // // This file is part of the Armadillo C++ library. // It is provided without any warranty of fitness // for any purpose. You can redistribute this file // and/or modify it under the terms of the GNU // Lesser General Public License (LGPL) as published // by the Free Software Foundation, either version 3 // of the License or (at your option) any later version. // (see http://www.opensource.org/licenses for more info) //! \addtogroup op_cov //! @{ template inline void op_ccov::direct_ccov(Mat& out, const Mat& A, const uword norm_type) { arma_extra_debug_sigprint(); if(A.is_vec()) { if(A.n_rows == 1) { out = var(trans(A), norm_type); } else { out = var(A, norm_type); } } else { const uword N = A.n_cols; const eT norm_val = (norm_type == 0) ? ( (N > 1) ? eT(N-1) : eT(1) ) : eT(N); const Col acc = sum(A, 1); out = A * trans(A); out -= (acc * trans(acc)) / eT(N); out /= norm_val; } } template inline void op_ccov::direct_ccov(Mat< std::complex >& out, const Mat< std::complex >& A, const uword norm_type) { arma_extra_debug_sigprint(); typedef typename std::complex eT; if(A.is_vec()) { if(A.n_rows == 1) { const Mat tmp_mat = var(trans(A), norm_type); out.set_size(1,1); out[0] = tmp_mat[0]; } else { const Mat tmp_mat = var(A, norm_type); out.set_size(1,1); out[0] = tmp_mat[0]; } } else { const uword N = A.n_cols; const eT norm_val = (norm_type == 0) ? ( (N > 1) ? eT(N-1) : eT(1) ) : eT(N); const Col acc = sum(A, 1); out = A * trans(conj(A)); out -= (acc * trans(conj(acc))) / eT(N); out /= norm_val; } } template inline void op_ccov::apply(Mat& out, const Op& in) { arma_extra_debug_sigprint(); typedef typename T1::elem_type eT; const unwrap_check tmp(in.m, out); const Mat& A = tmp.M; const uword norm_type = in.aux_uword_a; op_ccov::direct_ccov(out, A, norm_type); } //! @} RcppMLPACK/inst/include/mlpack/core/util/0000755000176200001440000000000013647512751017635 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/util/timers.hpp0000644000176200001440000001124713647512751021656 0ustar liggesusers/** * @file timers.hpp * @author Matthew Amidon * @author Marcus Edel * * Timers for MLPACK. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_UTILITIES_TIMERS_HPP #define __MLPACK_CORE_UTILITIES_TIMERS_HPP #include <map> #include <string> #if defined(__unix__) || defined(__unix) #include <time.h> // clock_gettime() #include <sys/time.h> // timeval, gettimeofday() #include <unistd.h> // flags like _POSIX_VERSION #elif defined(__MACH__) && defined(__APPLE__) #include <mach/mach_time.h> // mach_timebase_info, // mach_absolute_time() // TEMPORARY #include <time.h> // clock_gettime() #include <sys/time.h> // timeval, gettimeofday() #include <unistd.h> // flags like _POSIX_VERSION #elif defined(_WIN32) #include <windows.h> //GetSystemTimeAsFileTime(), // QueryPerformanceFrequency(), // QueryPerformanceCounter() #include <winsock.h> //timeval on windows // uint64_t isn't defined on every windows. #if !defined(HAVE_UINT64_T) #if SIZEOF_UNSIGNED_LONG == 8 typedef unsigned long uint64_t; #else typedef unsigned long long uint64_t; #endif // SIZEOF_UNSIGNED_LONG #endif // HAVE_UINT64_T //gettimeofday has no equivalent; we will need to write extra code for that. #if defined(_MSC_VER) || defined(_MSC_EXTENSIONS) #define DELTA_EPOCH_IN_MICROSECS 11644473600000000Ui64 #else #define DELTA_EPOCH_IN_MICROSECS 11644473600000000ULL #endif // _MSC_VER, _MSC_EXTENSIONS #else #error "unknown OS" #endif namespace mlpack { /** * The timer class provides a way for MLPACK methods to be timed. The three * methods contained in this class allow a named timer to be started and * stopped, and its value to be obtained. */ class Timer { public: /** * Start the given timer. If a timer is started, then stopped, then * re-started, then re-stopped, the final value of the timer is the length of * both runs -- that is, MLPACK timers are additive for each time they are * run, and do not reset. * * @note Undefined behavior will occur if a timer is started twice. * * @param name Name of timer to be started. */ static void Start(const std::string& name); /** * Stop the given timer. * * @note Undefined behavior will occur if a timer is stopped twice. * * @param name Name of timer to be stopped. */ static void Stop(const std::string& name); /** * Get the value of the given timer. * * @param name Name of timer to return value of. */ static timeval Get(const std::string& name); }; class Timers { public: //! Nothing to do for the constructor. Timers() { } /** * Returns a copy of all the timers used via this interface. */ std::map<std::string, timeval>& GetAllTimers(); /** * Returns a copy of the timer specified. * * @param timerName The name of the timer in question. */ timeval GetTimer(const std::string& timerName); /** * Prints the specified timer. If it took longer than a minute to complete, * the timer will be displayed in days, hours, and minutes as well. * * @param timerName The name of the timer in question. */ void PrintTimer(const std::string& timerName); /** * Initializes a timer, available like a normal value specified on * the command line. Timers are of type timeval. If a timer is started, then * stopped, then re-started, then stopped, the final timer value will be the * length of both runs of the timer. * * @param timerName The name of the timer in question.
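* * A minimal usage sketch (added for illustration only; the timer name is * hypothetical, and in mlpack proper a global Timers instance is managed * through CLI): * @code * Timers timers; * timers.StartTimer("tree_building"); * // ... do some work ... * timers.StopTimer("tree_building"); * timeval elapsed = timers.GetTimer("tree_building"); * @endcode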
*/ void StartTimer(const std::string& timerName); /** * Halts the timer, and replaces its value with * the delta time from its start. * * @param timerName The name of the timer in question. */ void StopTimer(const std::string& timerName); private: std::map<std::string, timeval> timers; void FileTimeToTimeVal(timeval* tv); void GetTime(timeval* tv); }; }; // namespace mlpack #endif // __MLPACK_CORE_UTILITIES_TIMERS_HPP RcppMLPACK/inst/include/mlpack/core/util/arma_traits.hpp0000644000176200001440000000527713647512751022663 0ustar liggesusers/** * @file arma_traits.hpp * @author Ryan Curtin * * Some traits used for template metaprogramming (SFINAE) with Armadillo types. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version.
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_UTIL_PREFIXEDOUTSTREAM_IMPL_HPP #define __MLPACK_CORE_UTIL_PREFIXEDOUTSTREAM_IMPL_HPP // Just in case it hasn't been included. #include "prefixedoutstream.hpp" #include namespace mlpack { namespace util { template PrefixedOutStream& PrefixedOutStream::operator<<(const T& s) { CallBaseLogic(s); return *this; } //! This handles forwarding all primitive types transparently template void PrefixedOutStream::CallBaseLogic(const T& s, typename boost::disable_if< boost::is_class >::type*) { BaseLogic(s); } // Forward all objects that do not implement a ToString() method transparently template void PrefixedOutStream::CallBaseLogic(const T& s, typename boost::enable_if< boost::is_class >::type*, typename boost::disable_if< HasToString >::type*) { BaseLogic(s); } // Call ToString() on all objects that implement ToString() before forwarding template void PrefixedOutStream::CallBaseLogic(const T& s, typename boost::enable_if< boost::is_class >::type*, typename boost::enable_if< HasToString >::type*) { std::string result = s.ToString(); BaseLogic(result); } template void PrefixedOutStream::BaseLogic(const T& val) { // We will use this to track whether or not we need to terminate at the end of // this call (only for streams which terminate after a newline). bool newlined = false; std::string line; // If we need to, output the prefix. PrefixIfNeeded(); std::ostringstream convert; convert << val; if(convert.fail()) { PrefixIfNeeded(); if (!ignoreInput) { destination << "Failed lexical_cast(T) for output; output" " not shown." << std::endl; newlined = true; } } else { line = convert.str(); // If the length of the casted thing was 0, it may have been a stream // manipulator, so send it directly to the stream and don't ask questions. if (line.length() == 0) { // The prefix cannot be necessary at this point. if (!ignoreInput) // Only if the user wants it. destination << val; return; } // Now, we need to check for newlines in this line. If we find one, output // up until the newline, then output the newline and the prefix and continue // looking. size_t nl; size_t pos = 0; while ((nl = line.find('\n', pos)) != std::string::npos) { PrefixIfNeeded(); // Only output if the user wants it. if (!ignoreInput) { destination << line.substr(pos, nl - pos); destination << std::endl; newlined = true; } carriageReturned = true; // Regardless of whether or not we display it. pos = nl + 1; } if (pos != line.length()) // We need to display the rest. { PrefixIfNeeded(); if (!ignoreInput) destination << line.substr(pos); } } // If we displayed a newline and we need to terminate afterwards, do that. if (fatal && newlined) exit(1); } // This is an inline function (that is why it is here and not in .cc). void PrefixedOutStream::PrefixIfNeeded() { // If we need to, output a prefix. if (carriageReturned) { if (!ignoreInput) // But only if we are allowed to. destination << prefix; carriageReturned = false; // Denote that the prefix has been displayed. 
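// (Added note: nothing further is needed here -- the next operator<< call // that follows a newline will re-enter this method and print the prefix // before any subsequent output.)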
} } }; // namespace util }; // namespace mlpack #endif // MLPACK_CLI_PREFIXEDOUTSTREAM_IMPL_H RcppMLPACK/inst/include/mlpack/core/util/prefixedoutstream.hpp0000644000176200001440000001423713647512751024127 0ustar liggesusers/** * @file prefixedoutstream.hpp * @author Ryan Curtin * @author Matthew Amidon * * Declaration of the PrefixedOutStream class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_UTIL_PREFIXEDOUTSTREAM_HPP #define __MLPACK_CORE_UTIL_PREFIXEDOUTSTREAM_HPP #include #include #include #include #include #include #include #include #include namespace mlpack { namespace util { /** * Allows us to output to an ostream with a prefix at the beginning of each * line, in the same way we would output to cout or cerr. The prefix is * specified in the constructor (as well as the destination ostream). A newline * must be passed to the stream, and then the prefix will be prepended to the * next line. For example, * * @code * PrefixedOutStream outstr(std::cout, "[TEST] "); * outstr << "Hello world I like " << 7.5; * outstr << "...Continue" << std::endl; * outstr << "After the CR\n" << std::endl; * @endcode * * would give, on std::cout, * * @code * [TEST] Hello world I like 7.5...Continue * [TEST] After the CR * [TEST] * @endcode * * These objects are used for the mlpack::Log levels (DEBUG, INFO, WARN, and * FATAL). */ class PrefixedOutStream { public: /** * Set up the PrefixedOutStream. * * @param destination ostream which receives output from this object. * @param prefix The prefix to prepend to each line. */ PrefixedOutStream(std::ostream& destination, const char* prefix, bool ignoreInput = false, bool fatal = false) : destination(destination), ignoreInput(ignoreInput), prefix(prefix), // We want the first call to operator<< to prefix the prefix so we set // carriageReturned to true. carriageReturned(true), fatal(fatal) { /* nothing to do */ } //! Write a bool to the stream. PrefixedOutStream& operator<<(bool val); //! Write a short to the stream. PrefixedOutStream& operator<<(short val); //! Write an unsigned short to the stream. PrefixedOutStream& operator<<(unsigned short val); //! Write an int to the stream. PrefixedOutStream& operator<<(int val); //! Write an unsigned int to the stream. PrefixedOutStream& operator<<(unsigned int val); //! Write a long to the stream. PrefixedOutStream& operator<<(long val); //! Write an unsigned long to the stream. PrefixedOutStream& operator<<(unsigned long val); //! Write a float to the stream. PrefixedOutStream& operator<<(float val); //! Write a double to the stream. PrefixedOutStream& operator<<(double val); //! Write a long double to the stream. PrefixedOutStream& operator<<(long double val); //! Write a void pointer to the stream. PrefixedOutStream& operator<<(void* val); //! Write a character array to the stream. PrefixedOutStream& operator<<(const char* str); //! Write a string to the stream. 
PrefixedOutStream& operator<<(std::string& str); //! Write a streambuf to the stream. PrefixedOutStream& operator<<(std::streambuf* sb); //! Write an ostream manipulator function to the stream. PrefixedOutStream& operator<<(std::ostream& (*pf)(std::ostream&)); //! Write an ios manipulator function to the stream. PrefixedOutStream& operator<<(std::ios& (*pf)(std::ios&)); //! Write an ios_base manipulator function to the stream. PrefixedOutStream& operator<<(std::ios_base& (*pf)(std::ios_base&)); //! Write anything else to the stream. template PrefixedOutStream& operator<<(const T& s); //! The output stream that all data is to be sent too; example: std::cout. std::ostream& destination; //! Discards input, prints nothing if true. bool ignoreInput; private: HAS_MEM_FUNC(ToString, HasToString) //! This handles forwarding all primitive types transparently template void CallBaseLogic(const T& s, typename boost::disable_if< boost::is_class >::type* = 0); //! Forward all objects that do not implement a ToString() method template void CallBaseLogic(const T& s, typename boost::enable_if< boost::is_class >::type* = 0, typename boost::disable_if< HasToString >::type* = 0); //! Call ToString() on all objects that implement ToString() before forwarding template void CallBaseLogic(const T& s, typename boost::enable_if< boost::is_class >::type* = 0, typename boost::enable_if< HasToString >::type* = 0); /** * @brief Conducts the base logic required in all the operator << overloads. * Mostly just a good idea to reduce copy-pasta. * * @tparam T The type of the data to output. * @param val The The data to be output. */ template void BaseLogic(const T& val); /** * Output the prefix, but only if we need to and if we are allowed to. */ inline void PrefixIfNeeded(); //! Contains the prefix we must prepend to each line. std::string prefix; //! If true, the previous call to operator<< encountered a CR, and a prefix //! will be necessary. bool carriageReturned; //! If true, the application will terminate with an error code when a CR is //! encountered. bool fatal; }; }; // namespace util }; // namespace mlpack // Template definitions. #include "prefixedoutstream_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/core/util/option.hpp0000644000176200001440000000774713647512751021675 0ustar liggesusers/** * @file option.hpp * @author Matthew Amidon * * Definition of the Option class, which is used to define parameters which are * used by CLI. The ProgramDoc class also resides here. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_UTIL_OPTION_HPP #define __MLPACK_CORE_UTIL_OPTION_HPP #include #include "cli.hpp" namespace mlpack { namespace util { /** * A static object whose constructor registers a parameter with the CLI class. 
* This should not be used outside of CLI itself, and you should use the * PARAM_FLAG(), PARAM_DOUBLE(), PARAM_INT(), PARAM_STRING(), or other similar * macros to declare these objects instead of declaring them directly. * * @see core/io/cli.hpp, mlpack::CLI */ template class Option { public: /** * Construct an Option object. When constructed, it will register * itself with CLI. * * @param ignoreTemplate Whether or not the template type matters for this * option. Essentially differs options with no value (flags) from those * that do, and thus require a type. * @param defaultValue Default value this parameter will be initialized to. * @param identifier The name of the option (no dashes in front; for --help, * we would pass "help"). * @param description A short string describing the option. * @param parent Full pathname of the parent module that "owns" this option. * The default is the root node (an empty string). * @param required Whether or not the option is required at runtime. */ Option(bool ignoreTemplate, N defaultValue, const std::string& identifier, const std::string& description, const std::string& parent = std::string(""), bool required = false); /** * Constructs an Option object. When constructed, it will register a flag * with CLI. * * @param identifier The name of the option (no dashes in front); for --help * we would pass "help". * @param description A short string describing the option. * @param parent Full pathname of the parent module that "owns" this option. * The default is the root node (an empty string). */ Option(const std::string& identifier, const std::string& description, const std::string& parent = std::string("")); }; /** * A static object whose constructor registers program documentation with the * CLI class. This should not be used outside of CLI itself, and you should use * the PROGRAM_INFO() macro to declare these objects. Only one ProgramDoc * object should ever exist. * * @see core/io/cli.hpp, mlpack::CLI */ class ProgramDoc { public: /** * Construct a ProgramDoc object. When constructed, it will register itself * with CLI. * * @param programName Short string representing the name of the program. * @param documentation Long string containing documentation on how to use the * program and what it is. No newline characters are necessary; this is * taken care of by CLI later. */ ProgramDoc(const std::string& programName, const std::string& documentation); //! The name of the program. std::string programName; //! Documentation for what the program does. std::string documentation; }; }; // namespace util }; // namespace mlpack // For implementations of templated functions #include "option_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/core/util/version.hpp0000644000176200001440000000270513647512751022037 0ustar liggesusers/** * @file version.hpp * @author Ryan Curtin * * The current version of mlpack, available as macros and as a string. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. 
If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_UTIL_VERSION_HPP #define __MLPACK_CORE_UTIL_VERSION_HPP #include <string> // The version of mlpack. If this is svn trunk, this will be a version with a // higher number than the most recent release. #define __MLPACK_VERSION_MAJOR 1 #define __MLPACK_VERSION_MINOR 0 #define __MLPACK_VERSION_PATCH 9 // The name of the version (for use by --version). namespace mlpack { namespace util { /** * This will return either "mlpack x.y.z" or "mlpack trunk-rXXXXX", depending on * whether or not this is a stable version of mlpack or an svn revision. */ std::string GetVersion(); }; // namespace util }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/util/sfinae_utility.hpp0000644000176200001440000000523513647512751023403 0ustar liggesusers/** * @file sfinae_utility.hpp * @author Trironk Kiatkungwanglai * * This file contains macro utilities for the SFINAE paradigm. These utilities * determine if classes passed in as template parameters contain members at * compile time, which is useful for changing functionality depending on what * operations an object is capable of performing. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_SFINAE_UTILITY #define __MLPACK_CORE_SFINAE_UTILITY #include #include /* * Constructs a template supporting the SFINAE pattern. * * This macro generates a template struct that is useful for enabling/disabling * a method if the template class passed in contains a member function matching * a given signature with a specified name. * * The generated struct should be used in conjunction with boost::disable_if and * boost::enable_if. Here is an example usage: * * For general references, see: * http://stackoverflow.com/a/264088/391618 * * For an MLPACK specific use case, see /mlpack/core/util/prefixedoutstream.hpp * and /mlpack/core/util/prefixedoutstream_impl.hpp * * @param NAME the name of the struct to construct. For example: HasToString * @param FUNC the name of the function to check for. For example: ToString */ #define HAS_MEM_FUNC(FUNC, NAME) \ template<typename T, typename sig> \ struct NAME { \ typedef char yes[1]; \ typedef char no [2]; \ template<typename U, U> struct type_check; \ template<typename _1> static yes &chk(type_check<sig, &_1::FUNC> *); \ template<typename> static no &chk(...); \ static bool const value = sizeof(chk<T>(0)) == sizeof(yes); \ }; #endif RcppMLPACK/inst/include/mlpack/core/util/nulloutstream.hpp0000644000176200001440000000546513647512751023272 0ustar liggesusers/** * @file nulloutstream.hpp * @author Ryan Curtin * @author Matthew Amidon * * Definition of the NullOutStream class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version.
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_UTIL_NULLOUTSTREAM_HPP #define __MLPACK_CORE_UTIL_NULLOUTSTREAM_HPP #include #include #include namespace mlpack { namespace util { /** * Used for Log::Debug when not compiled with debugging symbols. This class * does nothing and should be optimized out entirely by the compiler. */ class NullOutStream { public: /** * Does nothing. */ NullOutStream() { } /** * Does nothing. */ NullOutStream(const NullOutStream& /* other */) { } //! Does nothing. NullOutStream& operator<<(bool) { return *this; } //! Does nothing. NullOutStream& operator<<(short) { return *this; } //! Does nothing. NullOutStream& operator<<(unsigned short) { return *this; } //! Does nothing. NullOutStream& operator<<(int) { return *this; } //! Does nothing. NullOutStream& operator<<(unsigned int) { return *this; } //! Does nothing. NullOutStream& operator<<(long) { return *this; } //! Does nothing. NullOutStream& operator<<(unsigned long) { return *this; } //! Does nothing. NullOutStream& operator<<(float) { return *this; } //! Does nothing. NullOutStream& operator<<(double) { return *this; } //! Does nothing. NullOutStream& operator<<(long double) { return *this; } //! Does nothing. NullOutStream& operator<<(void*) { return *this; } //! Does nothing. NullOutStream& operator<<(const char*) { return *this; } //! Does nothing. NullOutStream& operator<<(std::string&) { return *this; } //! Does nothing. NullOutStream& operator<<(std::streambuf*) { return *this; } //! Does nothing. NullOutStream& operator<<(std::ostream& (*) (std::ostream&)) { return *this; } //! Does nothing. NullOutStream& operator<<(std::ios& (*) (std::ios&)) { return *this; } //! Does nothing. NullOutStream& operator<<(std::ios_base& (*) (std::ios_base&)) { return *this; } //! Does nothing. template NullOutStream& operator<<(const T&) { return *this; } }; } // namespace util } // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/util/option_impl.hpp0000644000176200001440000000350313647512751022700 0ustar liggesusers/** * @file option_impl.hpp * @author Matthew Amidon * * Implementation of template functions for the Option class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_UTIL_OPTION_IMPL_HPP #define __MLPACK_CORE_UTIL_OPTION_IMPL_HPP // Just in case it has not been included. #include "option.hpp" namespace mlpack { namespace util { /** * Registers a parameter with CLI. 
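* * A sketch of direct usage (added for illustration; in practice the PARAM_* * convenience macros generate these objects, and the "max_iterations" name * here is hypothetical): * @code * static mlpack::util::Option<int> maxIterations(false, 1000, * "max_iterations", "Maximum number of iterations.", "m"); * @endcode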
*/ template<typename N> Option<N>::Option(bool ignoreTemplate, N defaultValue, const std::string& identifier, const std::string& description, const std::string& alias, bool required) { if (ignoreTemplate) { CLI::Add(identifier, description, alias, required); } else { CLI::Add<N>(identifier, description, alias, required); CLI::GetParam<N>(identifier) = defaultValue; } } /** * Registers a flag parameter with CLI. */ template<typename N> Option<N>::Option(const std::string& identifier, const std::string& description, const std::string& alias) { CLI::AddFlag(identifier, description, alias); } }; // namespace util }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/util/string_util.hpp0000644000176200001440000000232713647512751022715 0ustar liggesusers/** * @file string_util.hpp * @author Trironk Kiatkungwanglai * @author Ryan Birmingham * * Declares methods that are useful for formatting output. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_STRING_UTIL_HPP #define __MLPACK_CORE_STRING_UTIL_HPP #include <string> namespace mlpack { namespace util { //! A utility function that replaces all newlines with a number of spaces //! depending on the indentation level. std::string Indent(std::string input, const size_t howManyTabs = 1); }; // namespace util }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/tree/0000755000176200001440000000000013647512751017617 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/tree/cover_tree.hpp0000644000176200001440000000226613647512751022473 0ustar liggesusers/** * @file cover_tree.hpp * @author Ryan Curtin * * Includes all the necessary files to use the CoverTree class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_TREE_COVER_TREE_HPP #define __MLPACK_CORE_TREE_COVER_TREE_HPP #include "bounds.hpp" #include "cover_tree/cover_tree.hpp" #include "cover_tree/single_tree_traverser.hpp" #include "cover_tree/single_tree_traverser_impl.hpp" #include "cover_tree/dual_tree_traverser.hpp" #include "cover_tree/dual_tree_traverser_impl.hpp" #include "cover_tree/traits.hpp" #endif RcppMLPACK/inst/include/mlpack/core/tree/mrkd_statistic_impl.hpp0000644000176200001440000000641113647512751024377 0ustar liggesusers/** * @file mrkd_statistic_impl.hpp * @author James Cline * * Definition of the statistic for multi-resolution kd-trees.
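* (Added note: the statistic caches each node's center of mass and the sum * of squared norms of its points; the commented-out constructors below show * how these would be accumulated bottom-up from child statistics.)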
* * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_MRKD_STATISTIC_IMPL_HPP #define __MLPACK_CORE_TREE_MRKD_STATISTIC_IMPL_HPP // In case it hasn't already been included. #include "mrkd_statistic.hpp" namespace mlpack { namespace tree { template MRKDStatistic::MRKDStatistic(const TreeType& /* node */) : dataset(NULL), begin(0), count(0), leftStat(NULL), rightStat(NULL), parentStat(NULL) { } /** * This constructor is called when a leaf is created. * * @param dataset Matrix that the tree is being built on. * @param begin Starting index corresponding to this leaf. * @param count Number of points held in this leaf. * template MRKDStatistic::MRKDStatistic(const TreeType& node) : dataset(&dataset), begin(begin), count(count), leftStat(NULL), rightStat(NULL), parentStat(NULL) { centerOfMass = dataset.col(begin); for (size_t i = begin + 1; i < begin + count; ++i) centerOfMass += dataset.col(i); sumOfSquaredNorms = 0.0; for (size_t i = begin; i < begin + count; ++i) sumOfSquaredNorms += arma::norm(dataset.col(i), 2); } ** * This constructor is called when a non-leaf node is created. * This lets you build fast bottom-up statistics when building trees. * * @param dataset Matrix that the tree is being built on. * @param begin Starting index corresponding to this leaf. * @param count Number of points held in this leaf. * @param leftStat MRKDStatistic object of the left child node. * @param rightStat MRKDStatistic object of the right child node. * template MRKDStatistic::MRKDStatistic(const MatType& dataset, const size_t begin, const size_t count, MRKDStatistic& leftStat, MRKDStatistic& rightStat) : dataset(&dataset), begin(begin), count(count), leftStat(&leftStat), rightStat(&rightStat), parentStat(NULL) { sumOfSquaredNorms = leftStat.sumOfSquaredNorms + rightStat.sumOfSquaredNorms; * centerOfMass = ((leftStat.centerOfMass * leftStat.count) + (rightStat.centerOfMass * rightStat.count)) / (leftStat.count + rightStat.count); * centerOfMass = leftStat.centerOfMass + rightStat.centerOfMass; isWhitelistValid = false; leftStat.parentStat = this; rightStat.parentStat = this; } */ }; // namespace tree }; // namespace mlpack #endif // __MLPACK_CORE_TREE_MRKD_STATISTIC_IMPL_HPP RcppMLPACK/inst/include/mlpack/core/tree/binary_space_tree/0000755000176200001440000000000013647521523023272 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/tree/binary_space_tree/mean_split.hpp0000644000176200001440000001202013647512751026134 0ustar liggesusers/** * @file mean_split.hpp * @author Yash Vadalia * @author Ryan Curtin * * Definition of MeanSplit, a class that splits a binary space partitioning tree * node into two parts using the mean of the values in a certain dimension. * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_MEAN_SPLIT_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_MEAN_SPLIT_HPP #include namespace mlpack { namespace tree /** Trees and tree-building procedures. */ { /** * A binary space partitioning tree node is split into its left and right child. * The split is done in the dimension that has the maximum width. The points are * divided into two parts based on the mean in this dimension. */ template class MeanSplit { public: /** * Split the node according to the mean value in the dimension with maximum * width. * * @param bound The bound used for this node. * @param data The dataset used by the binary space tree. * @param begin Index of the starting point in the dataset that belongs to * this node. * @param count Number of points in this node. * @param splitDimension This will be filled with the dimension the node is to * be split on. * @param splitCol The index at which the dataset is divided into two parts * after the rearrangement. */ static bool SplitNode(const BoundType& bound, MatType& data, const size_t begin, const size_t count, size_t& splitDimension, size_t& splitCol); /** * Split the node according to the mean value in the dimension with maximum * width and return a list of changed indices. * * @param bound The bound used for this node. * @param data The dataset used by the binary space tree. * @param begin Index of the starting point in the dataset that belongs to * this node. * @param count Number of points in this node. * @param splitDimension This will be filled with the dimension the node is * to be split on. * @param splitCol The index at which the dataset is divided into two parts * after the rearrangement. * @param oldFromNew Vector which will be filled with the old positions for * each new point. */ static bool SplitNode(const BoundType& bound, MatType& data, const size_t begin, const size_t count, size_t& splitDimension, size_t& splitCol, std::vector& oldFromNew); private: /** * Reorder the dataset into two parts such that they lie on either side of * splitCol. * * @param data The dataset used by the binary space tree. * @param begin Index of the starting point in the dataset that belongs to * this node. * @param count Number of points in this node. * @param splitDimension The dimension to split the node on. * @param splitVal The split in dimension splitDimension is based on this * value. */ static size_t PerformSplit(MatType& data, const size_t begin, const size_t count, const size_t splitDimension, const double splitVal); /** * Reorder the dataset into two parts such that they lie on either side of * splitCol. Also returns a list of changed indices. * * @param data The dataset used by the binary space tree. * @param begin Index of the starting point in the dataset that belongs to * this node. * @param count Number of points in this node. * @param splitDimension The dimension to split the node on. 
* @param splitVal The split in dimension splitDimension is based on this * value. * @param oldFromNew Vector which will be filled with the old positions for * each new point. */ static size_t PerformSplit(MatType& data, const size_t begin, const size_t count, const size_t splitDimension, const double splitVal, std::vector& oldFromNew); }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "mean_split_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/core/tree/binary_space_tree/dual_tree_traverser.hpp0000644000176200001440000000674413647512751030062 0ustar liggesusers/** * @file dual_tree_traverser.hpp * @author Ryan Curtin * * Defines the DualTreeTraverser for the BinarySpaceTree tree type. This is a * nested class of BinarySpaceTree which traverses two trees in a depth-first * manner with a given set of rules which indicate the branches which can be * pruned and the order in which to recurse. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_DUAL_TREE_TRAVERSER_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_DUAL_TREE_TRAVERSER_HPP #include #include "binary_space_tree.hpp" namespace mlpack { namespace tree { template template class BinarySpaceTree:: DualTreeTraverser { public: /** * Instantiate the dual-tree traverser with the given rule set. */ DualTreeTraverser(RuleType& rule); /** * Traverse the two trees. This does not reset the number of prunes. * * @param queryNode The query node to be traversed. * @param referenceNode The reference node to be traversed. * @param score The score of the current node combination. */ void Traverse(BinarySpaceTree& queryNode, BinarySpaceTree& referenceNode); //! Get the number of prunes. size_t NumPrunes() const { return numPrunes; } //! Modify the number of prunes. size_t& NumPrunes() { return numPrunes; } //! Get the number of visited combinations. size_t NumVisited() const { return numVisited; } //! Modify the number of visited combinations. size_t& NumVisited() { return numVisited; } //! Get the number of times a node combination was scored. size_t NumScores() const { return numScores; } //! Modify the number of times a node combination was scored. size_t& NumScores() { return numScores; } //! Get the number of times a base case was calculated. size_t NumBaseCases() const { return numBaseCases; } //! Modify the number of times a base case was calculated. size_t& NumBaseCases() { return numBaseCases; } private: //! Reference to the rules with which the trees will be traversed. RuleType& rule; //! The number of prunes. size_t numPrunes; //! The number of node combinations that have been visited during traversal. size_t numVisited; //! The number of times a node combination was scored. size_t numScores; //! The number of times a base case was calculated. size_t numBaseCases; //! Traversal information, held in the class so that it isn't continually //! being reallocated. 
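  //! (Caching it as a member is a small optimization: Traverse() saves and
  //! restores this state around every child recursion instead of constructing
  //! a fresh TraversalInfoType each time.  The pattern, as it appears in
  //! dual_tree_traverser_impl.hpp below, is roughly:
  //!   traversalInfo = rule.TraversalInfo();   // save before scoring a child
  //!   ...score and recurse...
  //!   rule.TraversalInfo() = traversalInfo;   // restore before the sibling
  //! )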
typename RuleType::TraversalInfoType traversalInfo; }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "dual_tree_traverser_impl.hpp" #endif // __MLPACK_CORE_TREE_BINARY_SPACE_TREE_DUAL_TREE_TRAVERSER_HPP RcppMLPACK/inst/include/mlpack/core/tree/binary_space_tree/mean_split_impl.hpp0000644000176200001440000001626013647512751027167 0ustar liggesusers/** * @file mean_split_impl.hpp * @author Yash Vadalia * @author Ryan Curtin * * Implementation of class(MeanSplit) to split a binary space partition tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_MEAN_SPLIT_IMPL_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_MEAN_SPLIT_IMPL_HPP #include "mean_split.hpp" namespace mlpack { namespace tree { template bool MeanSplit::SplitNode(const BoundType& bound, MatType& data, const size_t begin, const size_t count, size_t& splitDimension, size_t& splitCol) { splitDimension = data.n_rows; // Indicate invalid. double maxWidth = -1; // Find the split dimension. for (size_t d = 0; d < data.n_rows; d++) { double width = bound[d].Width(); if (width > maxWidth) { maxWidth = width; splitDimension = d; } } if (maxWidth == 0) // All these points are the same. We can't split. return false; // Split in the middle of that dimension. double splitVal = bound[splitDimension].Mid(); // Perform the actual splitting. This will order the dataset such that points // with value in dimension splitDimension less than or equal to splitVal are // on the left of splitCol, and points with value in dimension splitDimension // greater than splitVal are on the right side of splitCol. splitCol = PerformSplit(data, begin, count, splitDimension, splitVal); return true; } template bool MeanSplit::SplitNode(const BoundType& bound, MatType& data, const size_t begin, const size_t count, size_t& splitDimension, size_t& splitCol, std::vector& oldFromNew) { splitDimension = data.n_rows; // Indicate invalid. double maxWidth = -1; // Find the split dimension. for (size_t d = 0; d < data.n_rows; d++) { double width = bound[d].Width(); if (width > maxWidth) { maxWidth = width; splitDimension = d; } } if (maxWidth == 0) // All these points are the same. We can't split. return false; // Split in the middle of that dimension. double splitVal = bound[splitDimension].Mid(); // Perform the actual splitting. This will order the dataset such that points // with value in dimension splitDimension less than or equal to splitVal are // on the left of splitCol, and points with value in dimension splitDimension // greater than splitVal are on the right side of splitCol. splitCol = PerformSplit(data, begin, count, splitDimension, splitVal, oldFromNew); return true; } template size_t MeanSplit:: PerformSplit(MatType& data, const size_t begin, const size_t count, const size_t splitDimension, const double splitVal) { // This method modifies the input dataset. 
We loop both from the left and // right sides of the points contained in this node. The points less than // splitVal should be on the left side of the matrix, and the points greater // than splitVal should be on the right side of the matrix. size_t left = begin; size_t right = begin + count - 1; // First half-iteration of the loop is out here because the termination // condition is in the middle. while ((data(splitDimension, left) < splitVal) && (left <= right)) left++; while ((data(splitDimension, right) >= splitVal) && (left <= right)) right--; while (left <= right) { // Swap columns. data.swap_cols(left, right); // See how many points on the left are correct. When they are correct, // increase the left counter accordingly. When we encounter one that isn't // correct, stop. We will switch it later. while ((data(splitDimension, left) < splitVal) && (left <= right)) left++; // Now see how many points on the right are correct. When they are correct, // decrease the right counter accordingly. When we encounter one that isn't // correct, stop. We will switch it with the wrong point we found in the // previous loop. while ((data(splitDimension, right) >= splitVal) && (left <= right)) right--; } return left; } template size_t MeanSplit:: PerformSplit(MatType& data, const size_t begin, const size_t count, const size_t splitDimension, const double splitVal, std::vector& oldFromNew) { // This method modifies the input dataset. We loop both from the left and // right sides of the points contained in this node. The points less than // splitVal should be on the left side of the matrix, and the points greater // than splitVal should be on the right side of the matrix. size_t left = begin; size_t right = begin + count - 1; // First half-iteration of the loop is out here because the termination // condition is in the middle. while ((data(splitDimension, left) < splitVal) && (left <= right)) left++; while ((data(splitDimension, right) >= splitVal) && (left <= right)) right--; while (left <= right) { // Swap columns. data.swap_cols(left, right); // Update the indices for what we changed. size_t t = oldFromNew[left]; oldFromNew[left] = oldFromNew[right]; oldFromNew[right] = t; // See how many points on the left are correct. When they are correct, // increase the left counter accordingly. When we encounter one that isn't // correct, stop. We will switch it later. while ((data(splitDimension, left) < splitVal) && (left <= right)) left++; // Now see how many points on the right are correct. When they are correct, // decrease the right counter accordingly. When we encounter one that isn't // correct, stop. We will switch it with the wrong point we found in the // previous loop. while ((data(splitDimension, right) >= splitVal) && (left <= right)) right--; } return left; } }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/tree/binary_space_tree/binary_space_tree.hpp0000644000176200001440000004371613647512751027477 0ustar liggesusers/** * @file binary_space_tree.hpp * * Definition of generalized binary space partitioning tree (BinarySpaceTree). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_BINARY_SPACE_TREE_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_BINARY_SPACE_TREE_HPP #include #include "mean_split.hpp" #include "../statistic.hpp" namespace mlpack { namespace tree /** Trees and tree-building procedures. */ { /** * A binary space partitioning tree, such as a KD-tree or a ball tree. Once the * bound and type of dataset is defined, the tree will construct itself. Call * the constructor with the dataset to build the tree on, and the entire tree * will be built. * * This particular tree does not allow growth, so you cannot add or delete nodes * from it. If you need to add or delete a node, the better procedure is to * rebuild the tree entirely. * * This tree does take one runtime parameter in the constructor, which is the * max leaf size to be used. * * @tparam BoundType The bound used for each node. The valid types of bounds * and the necessary skeleton interface for this class can be found in * bounds/. * @tparam StatisticType Extra data contained in the node. See statistic.hpp * for the necessary skeleton interface. * @tparam MatType The dataset class. * @tparam SplitType The class that partitions the dataset/points at a * particular node into two parts. Its definition decides the way this split * is done. */ template > class BinarySpaceTree { private: //! The left child node. BinarySpaceTree* left; //! The right child node. BinarySpaceTree* right; //! The parent node (NULL if this is the root of the tree). BinarySpaceTree* parent; //! The index of the first point in the dataset contained in this node (and //! its children). size_t begin; //! The number of points of the dataset contained in this node (and its //! children). size_t count; //! The max leaf size. size_t maxLeafSize; //! The bound object for this node. BoundType bound; //! Any extra data contained in the node. StatisticType stat; //! The dimension this node split on if it is a parent. size_t splitDimension; //! The distance from the centroid of this node to the centroid of the parent. double parentDistance; //! The worst possible distance to the furthest descendant, cached to speed //! things up. double furthestDescendantDistance; //! The minimum distance from the center to any edge of the bound. double minimumBoundDistance; //! The dataset. MatType& dataset; public: //! So other classes can use TreeType::Mat. typedef MatType Mat; //! A single-tree traverser for binary space trees; see //! single_tree_traverser.hpp for implementation. template class SingleTreeTraverser; //! A dual-tree traverser for binary space trees; see dual_tree_traverser.hpp. template class DualTreeTraverser; /** * Construct this as the root node of a binary space tree using the given * dataset. This will modify the ordering of the points in the dataset! * * @param data Dataset to create tree from. This will be modified! * @param maxLeafSize Size of each leaf in the tree. */ BinarySpaceTree(MatType& data, const size_t maxLeafSize = 20); /** * Construct this as the root node of a binary space tree using the given * dataset. This will modify the ordering of points in the dataset! 
A * mapping of the old point indices to the new point indices is filled. * * @param data Dataset to create tree from. This will be modified! * @param oldFromNew Vector which will be filled with the old positions for * each new point. * @param maxLeafSize Size of each leaf in the tree. */ BinarySpaceTree(MatType& data, std::vector& oldFromNew, const size_t maxLeafSize = 20); /** * Construct this as the root node of a binary space tree using the given * dataset. This will modify the ordering of points in the dataset! A * mapping of the old point indices to the new point indices is filled, as * well as a mapping of the new point indices to the old point indices. * * @param data Dataset to create tree from. This will be modified! * @param oldFromNew Vector which will be filled with the old positions for * each new point. * @param newFromOld Vector which will be filled with the new positions for * each old point. * @param maxLeafSize Size of each leaf in the tree. */ BinarySpaceTree(MatType& data, std::vector& oldFromNew, std::vector& newFromOld, const size_t maxLeafSize = 20); /** * Construct this node on a subset of the given matrix, starting at column * begin and using count points. The ordering of that subset of points * will be modified! This is used for recursive tree-building by the other * constructors which don't specify point indices. * * @param data Dataset to create tree from. This will be modified! * @param begin Index of point to start tree construction with. * @param count Number of points to use to construct tree. * @param maxLeafSize Size of each leaf in the tree. */ BinarySpaceTree(MatType& data, const size_t begin, const size_t count, BinarySpaceTree* parent = NULL, const size_t maxLeafSize = 20); /** * Construct this node on a subset of the given matrix, starting at column * begin_in and using count_in points. The ordering of that subset of points * will be modified! This is used for recursive tree-building by the other * constructors which don't specify point indices. * * A mapping of the old point indices to the new point indices is filled, but * it is expected that the vector is already allocated with size greater than * or equal to (begin_in + count_in), and if that is not true, invalid memory * reads (and writes) will occur. * * @param data Dataset to create tree from. This will be modified! * @param begin Index of point to start tree construction with. * @param count Number of points to use to construct tree. * @param oldFromNew Vector which will be filled with the old positions for * each new point. * @param maxLeafSize Size of each leaf in the tree. */ BinarySpaceTree(MatType& data, const size_t begin, const size_t count, std::vector& oldFromNew, BinarySpaceTree* parent = NULL, const size_t maxLeafSize = 20); /** * Construct this node on a subset of the given matrix, starting at column * begin_in and using count_in points. The ordering of that subset of points * will be modified! This is used for recursive tree-building by the other * constructors which don't specify point indices. * * A mapping of the old point indices to the new point indices is filled, as * well as a mapping of the new point indices to the old point indices. It is * expected that the vector is already allocated with size greater than or * equal to (begin_in + count_in), and if that is not true, invalid memory * reads (and writes) will occur. * * @param data Dataset to create tree from. This will be modified! * @param begin Index of point to start tree construction with. 
* @param count Number of points to use to construct tree. * @param oldFromNew Vector which will be filled with the old positions for * each new point. * @param newFromOld Vector which will be filled with the new positions for * each old point. * @param maxLeafSize Size of each leaf in the tree. */ BinarySpaceTree(MatType& data, const size_t begin, const size_t count, std::vector& oldFromNew, std::vector& newFromOld, BinarySpaceTree* parent = NULL, const size_t maxLeafSize = 20); /** * Create a binary space tree by copying the other tree. Be careful! This * can take a long time and use a lot of memory. * * @param other Tree to be replicated. */ BinarySpaceTree(const BinarySpaceTree& other); /** * Deletes this node, deallocating the memory for the children and calling * their destructors in turn. This will invalidate any pointers or references * to any nodes which are children of this one. */ ~BinarySpaceTree(); /** * Find a node in this tree by its begin and count (const). * * Every node is uniquely identified by these two numbers. * This is useful for communicating position over the network, * when pointers would be invalid. * * @param begin The begin() of the node to find. * @param count The count() of the node to find. * @return The found node, or NULL if not found. */ const BinarySpaceTree* FindByBeginCount(size_t begin, size_t count) const; /** * Find a node in this tree by its begin and count. * * Every node is uniquely identified by these two numbers. * This is useful for communicating position over the network, * when pointers would be invalid. * * @param begin The begin() of the node to find. * @param count The count() of the node to find. * @return The found node, or NULL if not found. */ BinarySpaceTree* FindByBeginCount(size_t begin, size_t count); //! Return the bound object for this node. const BoundType& Bound() const { return bound; } //! Return the bound object for this node. BoundType& Bound() { return bound; } //! Return the statistic object for this node. const StatisticType& Stat() const { return stat; } //! Return the statistic object for this node. StatisticType& Stat() { return stat; } //! Return whether or not this node is a leaf (true if it has no children). bool IsLeaf() const; //! Return the max leaf size. size_t MaxLeafSize() const { return maxLeafSize; } //! Modify the max leaf size. size_t& MaxLeafSize() { return maxLeafSize; } //! Fills the tree to the specified level. size_t ExtendTree(const size_t level); //! Gets the left child of this node. BinarySpaceTree* Left() const { return left; } //! Modify the left child of this node. BinarySpaceTree*& Left() { return left; } //! Gets the right child of this node. BinarySpaceTree* Right() const { return right; } //! Modify the right child of this node. BinarySpaceTree*& Right() { return right; } //! Gets the parent of this node. BinarySpaceTree* Parent() const { return parent; } //! Modify the parent of this node. BinarySpaceTree*& Parent() { return parent; } //! Get the split dimension for this node. size_t SplitDimension() const { return splitDimension; } //! Modify the split dimension for this node. size_t& SplitDimension() { return splitDimension; } //! Get the dataset which the tree is built on. const MatType& Dataset() const { return dataset; } //! Modify the dataset which the tree is built on. Be careful! MatType& Dataset() { return dataset; } //! Get the metric which the tree uses. typename BoundType::MetricType Metric() const { return bound.Metric(); } //! 
Get the centroid of the node and store it in the given vector. void Centroid(arma::vec& centroid) { bound.Centroid(centroid); } //! Return the number of children in this node. size_t NumChildren() const; /** * Return the furthest distance to a point held in this node. If this is not * a leaf node, then the distance is 0 because the node holds no points. */ double FurthestPointDistance() const; /** * Return the furthest possible descendant distance. This returns the maximum * distance from the centroid to the edge of the bound and not the empirical * quantity which is the actual furthest descendant distance. So the actual * furthest descendant distance may be less than what this method returns (but * it will never be greater than this). */ double FurthestDescendantDistance() const; //! Return the minimum distance from the center of the node to any bound edge. double MinimumBoundDistance() const; //! Return the distance from the center of this node to the center of the //! parent node. double ParentDistance() const { return parentDistance; } //! Modify the distance from the center of this node to the center of the //! parent node. double& ParentDistance() { return parentDistance; } /** * Return the specified child (0 will be left, 1 will be right). If the index * is greater than 1, this will return the right child. * * @param child Index of child to return. */ BinarySpaceTree& Child(const size_t child) const; //! Return the number of points in this node (0 if not a leaf). size_t NumPoints() const; /** * Return the number of descendants of this node. For a non-leaf in a binary * space tree, this is the number of points at the descendant leaves. For a * leaf, this is the number of points in the leaf. */ size_t NumDescendants() const; /** * Return the index (with reference to the dataset) of a particular descendant * of this node. The index should be greater than zero but less than the * number of descendants. * * @param index Index of the descendant. */ size_t Descendant(const size_t index) const; /** * Return the index (with reference to the dataset) of a particular point in * this node. This will happily return invalid indices if the given index is * greater than the number of points in this node (obtained with NumPoints()) * -- be careful. * * @param index Index of point for which a dataset index is wanted. */ size_t Point(const size_t index) const; //! Return the minimum distance to another node. double MinDistance(const BinarySpaceTree* other) const { return bound.MinDistance(other->Bound()); } //! Return the maximum distance to another node. double MaxDistance(const BinarySpaceTree* other) const { return bound.MaxDistance(other->Bound()); } //! Return the minimum and maximum distance to another node. math::Range RangeDistance(const BinarySpaceTree* other) const { return bound.RangeDistance(other->Bound()); } //! Return the minimum distance to another point. template double MinDistance(const VecType& point, typename boost::enable_if >::type* = 0) const { return bound.MinDistance(point); } //! Return the maximum distance to another point. template double MaxDistance(const VecType& point, typename boost::enable_if >::type* = 0) const { return bound.MaxDistance(point); } //! Return the minimum and maximum distance to another point. template math::Range RangeDistance(const VecType& point, typename boost::enable_if >::type* = 0) const { return bound.RangeDistance(point); } /** * Returns the dimension this parent's children are split on. 
*/ size_t GetSplitDimension() const; /** * Obtains the number of nodes in the tree, starting with this. */ size_t TreeSize() const; /** * Obtains the number of levels below this node in the tree, starting with * this. */ size_t TreeDepth() const; //! Return the index of the beginning point of this subset. size_t Begin() const { return begin; } //! Modify the index of the beginning point of this subset. size_t& Begin() { return begin; } /** * Gets the index one beyond the last index in the subset. */ size_t End() const; //! Return the number of points in this subset. size_t Count() const { return count; } //! Modify the number of points in this subset. size_t& Count() { return count; } //! Returns false: this tree type does not have self children. static bool HasSelfChildren() { return false; } private: /** * Private copy constructor, available only to fill (pad) the tree to a * specified level. */ BinarySpaceTree(const size_t begin, const size_t count, BoundType bound, StatisticType stat, const int maxLeafSize = 20) : left(NULL), right(NULL), begin(begin), count(count), bound(bound), stat(stat), maxLeafSize(maxLeafSize) { } BinarySpaceTree* CopyMe() { return new BinarySpaceTree(begin, count, bound, stat, maxLeafSize); } /** * Splits the current node, assigning its left and right children recursively. * * @param data Dataset which we are using. */ void SplitNode(MatType& data); /** * Splits the current node, assigning its left and right children recursively. * Also returns a list of the changed indices. * * @param data Dataset which we are using. * @param oldFromNew Vector holding permuted indices. */ void SplitNode(MatType& data, std::vector& oldFromNew); public: /** * Returns a string representation of this object. */ std::string ToString() const; }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "binary_space_tree_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/core/tree/binary_space_tree/single_tree_traverser.hpp0000644000176200001440000000452413647512751030410 0ustar liggesusers/** * @file single_tree_traverser.hpp * @author Ryan Curtin * * A nested class of BinarySpaceTree which traverses the entire tree with a * given set of rules which indicate the branches which can be pruned and the * order in which to recurse. This traverser is a depth-first traverser. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_SINGLE_TREE_TRAVERSER_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_SINGLE_TREE_TRAVERSER_HPP #include #include "binary_space_tree.hpp" namespace mlpack { namespace tree { template template class BinarySpaceTree:: SingleTreeTraverser { public: /** * Instantiate the single tree traverser with the given rule set. */ SingleTreeTraverser(RuleType& rule); /** * Traverse the tree with the given point. * * @param queryIndex The index of the point in the query set which is being * used as the query point. 
* @param referenceNode The tree node to be traversed. */ void Traverse(const size_t queryIndex, BinarySpaceTree& referenceNode); //! Get the number of prunes. size_t NumPrunes() const { return numPrunes; } //! Modify the number of prunes. size_t& NumPrunes() { return numPrunes; } private: //! Reference to the rules with which the tree will be traversed. RuleType& rule; //! The number of nodes which have been pruned during traversal. size_t numPrunes; }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "single_tree_traverser_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/core/tree/binary_space_tree/binary_space_tree_impl.hpp0000644000176200001440000005240313647512751030511 0ustar liggesusers/** * @file binary_space_tree_impl.hpp * * Implementation of generalized space partitioning tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_BINARY_SPACE_TREE_IMPL_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_BINARY_SPACE_TREE_IMPL_HPP // In case it wasn't included already for some reason. #include "binary_space_tree.hpp" //#include //#include #include namespace mlpack { namespace tree { // Each of these overloads is kept as a separate function to keep the overhead // from the two std::vectors out, if possible. template BinarySpaceTree::BinarySpaceTree( MatType& data, const size_t maxLeafSize) : left(NULL), right(NULL), parent(NULL), begin(0), /* This root node starts at index 0, */ count(data.n_cols), /* and spans all of the dataset. */ maxLeafSize(maxLeafSize), bound(data.n_rows), parentDistance(0), // Parent distance for the root is 0: it has no parent. dataset(data) { // Do the actual splitting of this node. SplitNode(data); // Create the statistic depending on if we are a leaf or not. stat = StatisticType(*this); } template BinarySpaceTree::BinarySpaceTree( MatType& data, std::vector& oldFromNew, const size_t maxLeafSize) : left(NULL), right(NULL), parent(NULL), begin(0), count(data.n_cols), maxLeafSize(maxLeafSize), bound(data.n_rows), parentDistance(0), // Parent distance for the root is 0: it has no parent. dataset(data) { // Initialize oldFromNew correctly. oldFromNew.resize(data.n_cols); for (size_t i = 0; i < data.n_cols; i++) oldFromNew[i] = i; // Fill with unharmed indices. // Now do the actual splitting. SplitNode(data, oldFromNew); // Create the statistic depending on if we are a leaf or not. stat = StatisticType(*this); } template BinarySpaceTree::BinarySpaceTree( MatType& data, std::vector& oldFromNew, std::vector& newFromOld, const size_t maxLeafSize) : left(NULL), right(NULL), parent(NULL), begin(0), count(data.n_cols), maxLeafSize(maxLeafSize), bound(data.n_rows), parentDistance(0), // Parent distance for the root is 0: it has no parent. dataset(data) { // Initialize the oldFromNew vector correctly. 
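  // (After construction, the two vectors map between the permuted and the
  // original column orderings.  A minimal usage sketch, assuming the mlpack
  // 1.0.x default template arguments; variable names are illustrative only:
  //
  //   arma::mat data(3, 100, arma::fill::randu);
  //   std::vector<size_t> oldFromNew, newFromOld;
  //   mlpack::tree::BinarySpaceTree<mlpack::bound::HRectBound<2> >
  //       tree(data, oldFromNew, newFromOld);
  //   // Column 0 of the permuted 'data' was column oldFromNew[0] before the
  //   // build; column j of the original data now lives at newFromOld[j].
  // )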
oldFromNew.resize(data.n_cols); for (size_t i = 0; i < data.n_cols; i++) oldFromNew[i] = i; // Fill with unharmed indices. // Now do the actual splitting. SplitNode(data, oldFromNew); // Create the statistic depending on if we are a leaf or not. stat = StatisticType(*this); // Map the newFromOld indices correctly. newFromOld.resize(data.n_cols); for (size_t i = 0; i < data.n_cols; i++) newFromOld[oldFromNew[i]] = i; } template BinarySpaceTree::BinarySpaceTree( MatType& data, const size_t begin, const size_t count, BinarySpaceTree* parent, const size_t maxLeafSize) : left(NULL), right(NULL), parent(parent), begin(begin), count(count), maxLeafSize(maxLeafSize), bound(data.n_rows), dataset(data) { // Perform the actual splitting. SplitNode(data); // Create the statistic depending on if we are a leaf or not. stat = StatisticType(*this); } template BinarySpaceTree::BinarySpaceTree( MatType& data, const size_t begin, const size_t count, std::vector& oldFromNew, BinarySpaceTree* parent, const size_t maxLeafSize) : left(NULL), right(NULL), parent(parent), begin(begin), count(count), maxLeafSize(maxLeafSize), bound(data.n_rows), dataset(data) { // Hopefully the vector is initialized correctly! We can't check that // entirely but we can do a minor sanity check. assert(oldFromNew.size() == data.n_cols); // Perform the actual splitting. SplitNode(data, oldFromNew); // Create the statistic depending on if we are a leaf or not. stat = StatisticType(*this); } template BinarySpaceTree::BinarySpaceTree( MatType& data, const size_t begin, const size_t count, std::vector& oldFromNew, std::vector& newFromOld, BinarySpaceTree* parent, const size_t maxLeafSize) : left(NULL), right(NULL), parent(parent), begin(begin), count(count), maxLeafSize(maxLeafSize), bound(data.n_rows), dataset(data) { // Hopefully the vector is initialized correctly! We can't check that // entirely but we can do a minor sanity check. //Log::Assert(oldFromNew.size() == data.n_cols); // Perform the actual splitting. SplitNode(data, oldFromNew); // Create the statistic depending on if we are a leaf or not. stat = StatisticType(*this); // Map the newFromOld indices correctly. newFromOld.resize(data.n_cols); for (size_t i = 0; i < data.n_cols; i++) newFromOld[oldFromNew[i]] = i; } /* template BinarySpaceTree::BinarySpaceTree() : left(NULL), right(NULL), parent(NULL), begin(0), count(0), bound(), stat(), maxLeafSize(20) // Default max leaf size is 20. { // Nothing to do. }*/ /** * Create a binary space tree by copying the other tree. Be careful! This can * take a long time and use a lot of memory. */ template BinarySpaceTree::BinarySpaceTree( const BinarySpaceTree& other) : left(NULL), right(NULL), parent(other.parent), begin(other.begin), count(other.count), maxLeafSize(other.maxLeafSize), bound(other.bound), stat(other.stat), splitDimension(other.splitDimension), parentDistance(other.parentDistance), furthestDescendantDistance(other.furthestDescendantDistance), dataset(other.dataset) { // Create left and right children (if any). if (other.Left()) { left = new BinarySpaceTree(*other.Left()); left->Parent() = this; // Set parent to this, not other tree. } if (other.Right()) { right = new BinarySpaceTree(*other.Right()); right->Parent() = this; // Set parent to this, not other tree. } } /** * Deletes this node, deallocating the memory for the children and calling their * destructors in turn. This will invalidate any pointers or references to any * nodes which are children of this one. 
*/ template BinarySpaceTree:: ~BinarySpaceTree() { if (left) delete left; if (right) delete right; } /** * Find a node in this tree by its begin and count. * * Every node is uniquely identified by these two numbers. * This is useful for communicating position over the network, * when pointers would be invalid. * * @param queryBegin The Begin() of the node to find. * @param queryCount The Count() of the node to find. * @return The found node, or NULL if nothing is found. */ template const BinarySpaceTree* BinarySpaceTree::FindByBeginCount( size_t queryBegin, size_t queryCount) const { //Log::Assert(queryBegin >= begin); //Log::Assert(queryCount <= count); if (begin == queryBegin && count == queryCount) return this; else if (IsLeaf()) return NULL; else if (queryBegin < right->Begin()) return left->FindByBeginCount(queryBegin, queryCount); else return right->FindByBeginCount(queryBegin, queryCount); } /** * Find a node in this tree by its begin and count (const). * * Every node is uniquely identified by these two numbers. * This is useful for communicating position over the network, * when pointers would be invalid. * * @param queryBegin the Begin() of the node to find * @param queryCount the Count() of the node to find * @return the found node, or NULL */ template BinarySpaceTree* BinarySpaceTree::FindByBeginCount( const size_t queryBegin, const size_t queryCount) { //mlpack::Log::Assert(begin >= queryBegin); //mlpack::Log::Assert(count <= queryCount); if (begin == queryBegin && count == queryCount) return this; else if (IsLeaf()) return NULL; else if (queryBegin < left->End()) return left->FindByBeginCount(queryBegin, queryCount); else if (right) return right->FindByBeginCount(queryBegin, queryCount); else return NULL; } template size_t BinarySpaceTree::ExtendTree( size_t level) { --level; // Return the number of nodes duplicated. size_t nodesDuplicated = 0; if (level > 0) { if (!left) { left = CopyMe(); ++nodesDuplicated; } nodesDuplicated += left->ExtendTree(level); if (right) { nodesDuplicated += right->ExtendTree(level); } } return nodesDuplicated; } /* TODO: we can likely calculate this earlier, then store the * result in a private member variable; for now, we can * just calculate as needed... * * Also, perhaps we should rewrite these recursive functions * to avoid exceeding the stack limit */ template size_t BinarySpaceTree:: TreeSize() const { // Recursively count the nodes on each side of the tree. The plus one is // because we have to count this node, too. return 1 + (left ? left->TreeSize() : 0) + (right ? right->TreeSize() : 0); } template size_t BinarySpaceTree:: TreeDepth() const { // Recursively count the depth on each side of the tree. The plus one is // because we have to count this node, too. return 1 + std::max((left ? left->TreeDepth() : 0), (right ? right->TreeDepth() : 0)); } template inline bool BinarySpaceTree:: IsLeaf() const { return !left; } /** * Returns the number of children in this node. */ template inline size_t BinarySpaceTree:: NumChildren() const { if (left && right) return 2; if (left) return 1; return 0; } /** * Return a bound on the furthest point in the node from the centroid. This * returns 0 unless the node is a leaf. */ template inline double BinarySpaceTree:: FurthestPointDistance() const { if (!IsLeaf()) return 0.0; // Otherwise return the distance from the centroid to a corner of the bound. return 0.5 * bound.Diameter(); } /** * Return the furthest possible descendant distance. 
This returns the maximum * distance from the centroid to the edge of the bound and not the empirical * quantity which is the actual furthest descendant distance. So the actual * furthest descendant distance may be less than what this method returns (but * it will never be greater than this). */ template inline double BinarySpaceTree:: FurthestDescendantDistance() const { return furthestDescendantDistance; } //! Return the minimum distance from the center to any bound edge. template inline double BinarySpaceTree:: MinimumBoundDistance() const { return bound.MinWidth() / 2.0; } /** * Return the specified child. */ template inline BinarySpaceTree& BinarySpaceTree:: Child(const size_t child) const { if (child == 0) return *left; else return *right; } /** * Return the number of points contained in this node. */ template inline size_t BinarySpaceTree:: NumPoints() const { if (left) return 0; return count; } /** * Return the number of descendants contained in the node. */ template inline size_t BinarySpaceTree:: NumDescendants() const { return count; } /** * Return the index of a particular descendant contained in this node. */ template inline size_t BinarySpaceTree:: Descendant(const size_t index) const { return (begin + index); } /** * Return the index of a particular point contained in this node. */ template inline size_t BinarySpaceTree:: Point(const size_t index) const { return (begin + index); } /** * Gets the index one beyond the last index in the series. */ template inline size_t BinarySpaceTree:: End() const { return begin + count; } template void BinarySpaceTree::SplitNode( MatType& data) { // We need to expand the bounds of this node properly. bound |= data.cols(begin, begin + count - 1); // Calculate the furthest descendant distance. furthestDescendantDistance = 0.5 * bound.Diameter(); // Now, check if we need to split at all. if (count <= maxLeafSize) return; // We can't split this. // splitCol denotes the two partitions of the dataset after the split. The // points on its left go to the left child and the others go to the right // child. size_t splitCol; // Split the node. The elements of 'data' are reordered by the splitting // algorithm. This function call updates splitDimension and splitCol. const bool split = SplitType::SplitNode(bound, data, begin, count, splitDimension, splitCol); // The node may not be always split. For instance, if all the points are the // same, we can't split them. if (!split) return; // Now that we know the split column, we will recursively split the children // by calling their constructors (which perform this splitting process). left = new BinarySpaceTree(data, begin, splitCol - begin, this, maxLeafSize); right = new BinarySpaceTree(data, splitCol, begin + count - splitCol, this, maxLeafSize); // Calculate parent distances for those two nodes. arma::vec centroid, leftCentroid, rightCentroid; Centroid(centroid); left->Centroid(leftCentroid); right->Centroid(rightCentroid); const double leftParentDistance = bound.Metric().Evaluate(centroid, leftCentroid); const double rightParentDistance = bound.Metric().Evaluate(centroid, rightCentroid); left->ParentDistance() = leftParentDistance; right->ParentDistance() = rightParentDistance; } template void BinarySpaceTree::SplitNode( MatType& data, std::vector& oldFromNew) { // This should be a single function for Bound. // We need to expand the bounds of this node properly. bound |= data.cols(begin, begin + count - 1); // Calculate the furthest descendant distance. 
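  // (Half the bound's diameter is the distance from the bound's geometric
  // center to its farthest corner.  Every descendant point lies inside the
  // bound, so no descendant can be farther from the center than that; hence
  // this is a valid, if possibly loose, upper bound on the true
  // furthest-descendant distance.)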
furthestDescendantDistance = 0.5 * bound.Diameter(); // First, check if we need to split at all. if (count <= maxLeafSize) return; // We can't split this. // splitCol denotes the two partitions of the dataset after the split. The // points on its left go to the left child and the others go to the right // child. size_t splitCol; // Split the node. The elements of 'data' are reordered by the splitting // algorithm. This function call updates splitDimension, splitCol and // oldFromNew. const bool split = SplitType::SplitNode(bound, data, begin, count, splitDimension, splitCol, oldFromNew); // The node may not be always split. For instance, if all the points are the // same, we can't split them. if (!split) return; // Now that we know the split column, we will recursively split the children // by calling their constructors (which perform this splitting process). left = new BinarySpaceTree(data, begin, splitCol - begin, oldFromNew, this, maxLeafSize); right = new BinarySpaceTree(data, splitCol, begin + count - splitCol, oldFromNew, this, maxLeafSize); // Calculate parent distances for those two nodes. arma::vec centroid, leftCentroid, rightCentroid; Centroid(centroid); left->Centroid(leftCentroid); right->Centroid(rightCentroid); const double leftParentDistance = bound.Metric().Evaluate(centroid, leftCentroid); const double rightParentDistance = bound.Metric().Evaluate(centroid, rightCentroid); left->ParentDistance() = leftParentDistance; right->ParentDistance() = rightParentDistance; } /** * Returns a string representation of this object. */ template std::string BinarySpaceTree:: ToString() const { std::ostringstream convert; convert << "BinarySpaceTree [" << this << "]" << std::endl; convert << " First point: " << begin << std::endl; convert << " Number of descendants: " << count << std::endl; convert << " Bound: " << std::endl; convert << mlpack::util::Indent(bound.ToString(), 2); convert << " Statistic: " << std::endl; convert << mlpack::util::Indent(stat.ToString(), 2); convert << " Max leaf size: " << maxLeafSize << std::endl; convert << " Split dimension: " << splitDimension << std::endl; // How many levels should we print? This will print the top two tree levels. if (left != NULL && parent == NULL) { convert << " Left child:" << std::endl; convert << mlpack::util::Indent(left->ToString(), 2); } if (right != NULL && parent == NULL) { convert << " Right child:" << std::endl; convert << mlpack::util::Indent(right->ToString(), 2); } return convert.str(); } }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/tree/binary_space_tree/single_tree_traverser_impl.hpp0000644000176200001440000001012613647512751031424 0ustar liggesusers/** * @file single_tree_traverser_impl.hpp * @author Ryan Curtin * * A nested class of BinarySpaceTree which traverses the entire tree with a * given set of rules which indicate the branches which can be pruned and the * order in which to recurse. This traverser is a depth-first traverser. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). 
* * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_SINGLE_TREE_TRAVERSER_IMPL_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_SINGLE_TREE_TRAVERSER_IMPL_HPP // In case it hasn't been included yet. #include "single_tree_traverser.hpp" #include namespace mlpack { namespace tree { template template BinarySpaceTree:: SingleTreeTraverser::SingleTreeTraverser(RuleType& rule) : rule(rule), numPrunes(0) { /* Nothing to do. */ } template template void BinarySpaceTree:: SingleTreeTraverser::Traverse( const size_t queryIndex, BinarySpaceTree& referenceNode) { // If this is a leaf, the base cases have already been run. if (referenceNode.IsLeaf()) return; // If either score is DBL_MAX, we do not recurse into that node. double leftScore = rule.Score(queryIndex, *referenceNode.Left()); // Immediately run the base case if it's not pruned. if ((leftScore != DBL_MAX) && (referenceNode.Left()->IsLeaf())) { for (size_t i = referenceNode.Left()->Begin(); i < referenceNode.Left()->End(); ++i) rule.BaseCase(queryIndex, i); } double rightScore = rule.Score(queryIndex, *referenceNode.Right()); // Immediately run the base case if it's not pruned. if ((rightScore != DBL_MAX) && (referenceNode.Right()->IsLeaf())) { for (size_t i = referenceNode.Right()->Begin(); i < referenceNode.Right()->End(); ++i) rule.BaseCase(queryIndex, i); } if (leftScore < rightScore) { // Recurse to the left. Traverse(queryIndex, *referenceNode.Left()); // Is it still valid to recurse to the right? rightScore = rule.Rescore(queryIndex, *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) Traverse(queryIndex, *referenceNode.Right()); // Recurse to the right. else ++numPrunes; } else if (rightScore < leftScore) { // Recurse to the right. Traverse(queryIndex, *referenceNode.Right()); // Is it still valid to recurse to the left? leftScore = rule.Rescore(queryIndex, *referenceNode.Left(), leftScore); if (leftScore != DBL_MAX) Traverse(queryIndex, *referenceNode.Left()); // Recurse to the left. else ++numPrunes; } else // leftScore is equal to rightScore. { if (leftScore == DBL_MAX) { numPrunes += 2; // Pruned both left and right. } else { // Choose the left first. Traverse(queryIndex, *referenceNode.Left()); // Is it still valid to recurse to the right? rightScore = rule.Rescore(queryIndex, *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) Traverse(queryIndex, *referenceNode.Right()); else ++numPrunes; } } } }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/tree/binary_space_tree/traits.hpp0000644000176200001440000000421713647512751025320 0ustar liggesusers/** * @file traits.hpp * @author Ryan Curtin * * Specialization of the TreeTraits class for the BinarySpaceTree type of tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
*/ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_TRAITS_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_TRAITS_HPP #include namespace mlpack { namespace tree { /** * This is a specialization of the TreeType class to the BinarySpaceTree tree * type. It defines characteristics of the binary space tree, and is used to * help write tree-independent (but still optimized) tree-based algorithms. See * mlpack/core/tree/tree_traits.hpp for more information. */ template class TreeTraits > { public: /** * Each binary space tree node has two children which represent * non-overlapping subsets of the space which the node represents. Therefore, * children are not overlapping. */ static const bool HasOverlappingChildren = false; /** * There is no guarantee that the first point in a node is its centroid. */ static const bool FirstPointIsCentroid = false; /** * Points are not contained at multiple levels of the binary space tree. */ static const bool HasSelfChildren = false; /** * Points are rearranged during building of the tree. */ static const bool RearrangesDataset = true; }; }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/tree/binary_space_tree/dual_tree_traverser_impl.hpp0000644000176200001440000002554013647512751031076 0ustar liggesusers/** * @file dual_tree_traverser_impl.hpp * @author Ryan Curtin * * Implementation of the DualTreeTraverser for BinarySpaceTree. This is a way * to perform a dual-tree traversal of two trees. The trees must be the same * type. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_DUAL_TREE_TRAVERSER_IMPL_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_DUAL_TREE_TRAVERSER_IMPL_HPP // In case it hasn't been included yet. #include "dual_tree_traverser.hpp" namespace mlpack { namespace tree { template template BinarySpaceTree:: DualTreeTraverser::DualTreeTraverser(RuleType& rule) : rule(rule), numPrunes(0), numVisited(0), numScores(0), numBaseCases(0) { /* Nothing to do. */ } template template void BinarySpaceTree:: DualTreeTraverser::Traverse( BinarySpaceTree& queryNode, BinarySpaceTree& referenceNode) { // Increment the visit counter. ++numVisited; // Store the current traversal info. traversalInfo = rule.TraversalInfo(); // If both are leaves, we must evaluate the base case. if (queryNode.IsLeaf() && referenceNode.IsLeaf()) { // Loop through each of the points in each node. for (size_t query = queryNode.Begin(); query < queryNode.End(); ++query) { // See if we need to investigate this point (this function should be // implemented for the single-tree recursion too). Restore the traversal // information first. rule.TraversalInfo() = traversalInfo; const double childScore = rule.Score(query, referenceNode); if (childScore == DBL_MAX) continue; // We can't improve this particular point. 
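      // (A Score() of DBL_MAX is the pruning signal used throughout this
      // traverser.  For orientation, the interface a RuleType must supply is
      // roughly the following sketch; 'ExampleRule' is hypothetical, and the
      // real rule classes live elsewhere in mlpack:
      //
      //   class ExampleRule
      //   {
      //    public:
      //     typedef SomeTraversalInfo TraversalInfoType; // bookkeeping state.
      //     double BaseCase(const size_t query, const size_t ref);
      //     double Score(const size_t query, TreeType& node);
      //     double Score(TreeType& queryNode, TreeType& refNode);
      //     // Both Score() overloads return DBL_MAX to request a prune.
      //     double Rescore(TreeType& queryNode, TreeType& refNode,
      //                    const double oldScore); // re-check after recursion.
      //     TraversalInfoType& TraversalInfo();
      //   };
      // )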
for (size_t ref = referenceNode.Begin(); ref < referenceNode.End(); ++ref) rule.BaseCase(query, ref); numBaseCases += referenceNode.Count(); } } else if ((!queryNode.IsLeaf()) && referenceNode.IsLeaf()) { // We have to recurse down the query node. In this case the recursion order // does not matter. const double leftScore = rule.Score(*queryNode.Left(), referenceNode); ++numScores; if (leftScore != DBL_MAX) Traverse(*queryNode.Left(), referenceNode); else ++numPrunes; // Before recursing, we have to set the traversal information correctly. rule.TraversalInfo() = traversalInfo; const double rightScore = rule.Score(*queryNode.Right(), referenceNode); ++numScores; if (rightScore != DBL_MAX) Traverse(*queryNode.Right(), referenceNode); else ++numPrunes; } else if (queryNode.IsLeaf() && (!referenceNode.IsLeaf())) { // We have to recurse down the reference node. In this case the recursion // order does matter. Before recursing, though, we have to set the // traversal information correctly. double leftScore = rule.Score(queryNode, *referenceNode.Left()); typename RuleType::TraversalInfoType leftInfo = rule.TraversalInfo(); rule.TraversalInfo() = traversalInfo; double rightScore = rule.Score(queryNode, *referenceNode.Right()); numScores += 2; if (leftScore < rightScore) { // Recurse to the left. Restore the left traversal info. Store the right // traversal info. traversalInfo = rule.TraversalInfo(); rule.TraversalInfo() = leftInfo; Traverse(queryNode, *referenceNode.Left()); // Is it still valid to recurse to the right? rightScore = rule.Rescore(queryNode, *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) { // Restore the right traversal info. rule.TraversalInfo() = traversalInfo; Traverse(queryNode, *referenceNode.Right()); } else ++numPrunes; } else if (rightScore < leftScore) { // Recurse to the right. Traverse(queryNode, *referenceNode.Right()); // Is it still valid to recurse to the left? leftScore = rule.Rescore(queryNode, *referenceNode.Left(), leftScore); if (leftScore != DBL_MAX) { // Restore the left traversal info. rule.TraversalInfo() = leftInfo; Traverse(queryNode, *referenceNode.Left()); } else ++numPrunes; } else // leftScore is equal to rightScore. { if (leftScore == DBL_MAX) { numPrunes += 2; } else { // Choose the left first. Restore the left traversal info. Store the // right traversal info. traversalInfo = rule.TraversalInfo(); rule.TraversalInfo() = leftInfo; Traverse(queryNode, *referenceNode.Left()); rightScore = rule.Rescore(queryNode, *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) { // Restore the right traversal info. rule.TraversalInfo() = traversalInfo; Traverse(queryNode, *referenceNode.Right()); } else ++numPrunes; } } } else { // We have to recurse down both query and reference nodes. Because the // query descent order does not matter, we will go to the left query child // first. Before recursing, we have to set the traversal information // correctly. double leftScore = rule.Score(*queryNode.Left(), *referenceNode.Left()); typename RuleType::TraversalInfoType leftInfo = rule.TraversalInfo(); rule.TraversalInfo() = traversalInfo; double rightScore = rule.Score(*queryNode.Left(), *referenceNode.Right()); typename RuleType::TraversalInfoType rightInfo; numScores += 2; if (leftScore < rightScore) { // Recurse to the left. Restore the left traversal info. Store the right // traversal info. 
rightInfo = rule.TraversalInfo(); rule.TraversalInfo() = leftInfo; Traverse(*queryNode.Left(), *referenceNode.Left()); // Is it still valid to recurse to the right? rightScore = rule.Rescore(*queryNode.Left(), *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) { // Restore the right traversal info. rule.TraversalInfo() = rightInfo; Traverse(*queryNode.Left(), *referenceNode.Right()); } else ++numPrunes; } else if (rightScore < leftScore) { // Recurse to the right. Traverse(*queryNode.Left(), *referenceNode.Right()); // Is it still valid to recurse to the left? leftScore = rule.Rescore(*queryNode.Left(), *referenceNode.Left(), leftScore); if (leftScore != DBL_MAX) { // Restore the left traversal info. rule.TraversalInfo() = leftInfo; Traverse(*queryNode.Left(), *referenceNode.Left()); } else ++numPrunes; } else { if (leftScore == DBL_MAX) { numPrunes += 2; } else { // Choose the left first. Restore the left traversal info and store the // right traversal info. rightInfo = rule.TraversalInfo(); rule.TraversalInfo() = leftInfo; Traverse(*queryNode.Left(), *referenceNode.Left()); // Is it still valid to recurse to the right? rightScore = rule.Rescore(*queryNode.Left(), *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) { // Restore the right traversal information. rule.TraversalInfo() = rightInfo; Traverse(*queryNode.Left(), *referenceNode.Right()); } else ++numPrunes; } } // Restore the main traversal information. rule.TraversalInfo() = traversalInfo; // Now recurse down the right query node. leftScore = rule.Score(*queryNode.Right(), *referenceNode.Left()); leftInfo = rule.TraversalInfo(); rule.TraversalInfo() = traversalInfo; rightScore = rule.Score(*queryNode.Right(), *referenceNode.Right()); numScores += 2; if (leftScore < rightScore) { // Recurse to the left. Restore the left traversal info. Store the right // traversal info. rightInfo = rule.TraversalInfo(); rule.TraversalInfo() = leftInfo; Traverse(*queryNode.Right(), *referenceNode.Left()); // Is it still valid to recurse to the right? rightScore = rule.Rescore(*queryNode.Right(), *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) { // Restore the right traversal info. rule.TraversalInfo() = rightInfo; Traverse(*queryNode.Right(), *referenceNode.Right()); } else ++numPrunes; } else if (rightScore < leftScore) { // Recurse to the right. Traverse(*queryNode.Right(), *referenceNode.Right()); // Is it still valid to recurse to the left? leftScore = rule.Rescore(*queryNode.Right(), *referenceNode.Left(), leftScore); if (leftScore != DBL_MAX) { // Restore the left traversal info. rule.TraversalInfo() = leftInfo; Traverse(*queryNode.Right(), *referenceNode.Left()); } else ++numPrunes; } else { if (leftScore == DBL_MAX) { numPrunes += 2; } else { // Choose the left first. Restore the left traversal info. Store the // right traversal info. rightInfo = rule.TraversalInfo(); rule.TraversalInfo() = leftInfo; Traverse(*queryNode.Right(), *referenceNode.Left()); // Is it still valid to recurse to the right? rightScore = rule.Rescore(*queryNode.Right(), *referenceNode.Right(), rightScore); if (rightScore != DBL_MAX) { // Restore the right traversal info. 
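        // (The restore matters: the Score()/Rescore() calls above overwrote
        // rule.TraversalInfo() with state for other node pairs, so the
        // snapshot held in rightInfo must be reinstated here; otherwise the
        // recursion below would prune against bounds belonging to the wrong
        // node combination.)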
rule.TraversalInfo() = rightInfo; Traverse(*queryNode.Right(), *referenceNode.Right()); } else ++numPrunes; } } } } }; // namespace tree }; // namespace mlpack #endif // __MLPACK_CORE_TREE_BINARY_SPACE_TREE_DUAL_TREE_TRAVERSER_IMPL_HPP RcppMLPACK/inst/include/mlpack/core/tree/rectangle_tree.hpp0000644000176200001440000000313313647512751023313 0ustar liggesusers/** * @file rectangle_tree.hpp * @author Andrew Wells * * Include all the necessary files to use the Rectangle Type Trees (RTree, RStarTree, XTree, * and HilbertRTree). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_TREE_RECTANGLE_TREE_HPP #define __MLPACK_CORE_TREE_RECTANGLE_TREE_HPP /* we include bounds.hpp since it gives us the necessary files. * However, we will not use the "ballbounds" option. */ #include "bounds.hpp" #include "rectangle_tree/rectangle_tree.hpp" #include "rectangle_tree/single_tree_traverser.hpp" #include "rectangle_tree/single_tree_traverser_impl.hpp" #include "rectangle_tree/dual_tree_traverser.hpp" #include "rectangle_tree/dual_tree_traverser_impl.hpp" #include "rectangle_tree/r_tree_split.hpp" #include "rectangle_tree/r_star_tree_split.hpp" #include "rectangle_tree/r_tree_descent_heuristic.hpp" #include "rectangle_tree/r_star_tree_descent_heuristic.hpp" #include "rectangle_tree/traits.hpp" #endif RcppMLPACK/inst/include/mlpack/core/tree/statistic.hpp0000644000176200001440000000361013647512751022337 0ustar liggesusers/** * @file statistic.hpp * * Definition of the policy type for the statistic class. * * You should define your own statistic that looks like EmptyStatistic. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_TREE_STATISTIC_HPP #define __MLPACK_CORE_TREE_STATISTIC_HPP namespace mlpack { namespace tree { /** * Empty statistic if you are not interested in storing statistics in your * tree. Use this as a template for your own. */ class EmptyStatistic { public: EmptyStatistic() { } ~EmptyStatistic() { } /** * This constructor is called when a node is finished being created. The * node is finished, and its children are finished, but it is not * necessarily true that the statistics of other nodes are initialized yet. * * @param node Node which this corresponds to.
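 *
 * As a rough sketch of a custom statistic (hypothetical class and member
 * names, not part of MLPACK; it assumes the tree type provides a
 * FurthestDescendantDistance() accessor, which the trees in this package
 * generally do), caching a value at construction time might look like:
 *
 * @code
 * class MaxDistStatistic
 * {
 *  public:
 *   MaxDistStatistic() : maxDist(0.0) { }
 *
 *   // Called once this node and its children are fully built.
 *   template<typename TreeType>
 *   MaxDistStatistic(TreeType& node) :
 *       maxDist(node.FurthestDescendantDistance()) { }
 *
 *   //! Return the cached distance to the node's furthest descendant.
 *   double MaxDist() const { return maxDist; }
 *
 *  private:
 *   double maxDist;
 * };
 * @endcode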
*/ template<typename TreeType> EmptyStatistic(TreeType& /* node */) { } public: /** * Returns a string representation of this object. */ std::string ToString() const { std::stringstream convert; convert << "EmptyStatistic [" << this << "]" << std::endl; return convert.str(); } }; }; // namespace tree }; // namespace mlpack #endif // __MLPACK_CORE_TREE_STATISTIC_HPP RcppMLPACK/inst/include/mlpack/core/tree/bounds.hpp0000644000176200001440000000202713647512751021623 0ustar liggesusers/** * @file bounds.hpp * * Bounds that are useful for binary space partitioning trees. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_TREE_BOUNDS_HPP #define __MLPACK_CORE_TREE_BOUNDS_HPP #include <mlpack/core/math/range.hpp> #include <mlpack/core/metrics/lmetric.hpp> #include "hrectbound.hpp" #include "ballbound.hpp" #endif // __MLPACK_CORE_TREE_BOUNDS_HPP RcppMLPACK/inst/include/mlpack/core/tree/ballbound_impl.hpp0000644000176200001440000001761513647512751023315 0ustar liggesusers/** * @file ballbound_impl.hpp * * Bounds that are useful for binary space partitioning trees. * Implementation of BallBound ball bound metric policy class. * * @experimental * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_TREE_BALLBOUND_IMPL_HPP #define __MLPACK_CORE_TREE_BALLBOUND_IMPL_HPP // In case it hasn't been included already. #include "ballbound.hpp" #include <string> namespace mlpack { namespace bound { //! Empty Constructor. template<typename VecType, typename TMetricType> BallBound<VecType, TMetricType>::BallBound() : radius(-DBL_MAX), metric(new TMetricType()), ownsMetric(true) { /* Nothing to do. */ } /** * Create the ball bound with the specified dimensionality. * * @param dimension Dimensionality of ball bound. */ template<typename VecType, typename TMetricType> BallBound<VecType, TMetricType>::BallBound(const size_t dimension) : radius(-DBL_MAX), center(dimension), metric(new TMetricType()), ownsMetric(true) { /* Nothing to do. */ } /** * Create the ball bound with the specified radius and center. * * @param radius Radius of ball bound. * @param center Center of ball bound. */ template<typename VecType, typename TMetricType> BallBound<VecType, TMetricType>::BallBound(const double radius, const VecType& center) : radius(radius), center(center), metric(new TMetricType()), ownsMetric(true) { /* Nothing to do. */ } //! Copy Constructor. To prevent memory leaks. template<typename VecType, typename TMetricType> BallBound<VecType, TMetricType>::BallBound(const BallBound& other) : radius(other.radius), center(other.center), metric(other.metric), ownsMetric(false) { /* Nothing to do. */ } //!
For the same reason as the Copy Constructor. To prevent memory leaks. template<typename VecType, typename TMetricType> BallBound<VecType, TMetricType>& BallBound<VecType, TMetricType>::operator=( const BallBound& other) { radius = other.radius; center = other.center; metric = other.metric; ownsMetric = false; return *this; } //! Destructor to release allocated memory. template<typename VecType, typename TMetricType> BallBound<VecType, TMetricType>::~BallBound() { if (ownsMetric) delete metric; } //! Get the range in a certain dimension. template<typename VecType, typename TMetricType> math::Range BallBound<VecType, TMetricType>::operator[](const size_t i) const { if (radius < 0) return math::Range(); else return math::Range(center[i] - radius, center[i] + radius); } /** * Determines if a point is within the bound. */ template<typename VecType, typename TMetricType> bool BallBound<VecType, TMetricType>::Contains(const VecType& point) const { if (radius < 0) return false; else return metric->Evaluate(center, point) <= radius; } /** * Calculates minimum bound-to-point squared distance. */ template<typename VecType, typename TMetricType> template<typename OtherVecType> double BallBound<VecType, TMetricType>::MinDistance( const OtherVecType& point, typename boost::enable_if<IsVector<OtherVecType> >* /* junk */) const { if (radius < 0) return DBL_MAX; else return math::ClampNonNegative(metric->Evaluate(point, center) - radius); } /** * Calculates minimum bound-to-bound squared distance. */ template<typename VecType, typename TMetricType> double BallBound<VecType, TMetricType>::MinDistance(const BallBound& other) const { if (radius < 0) return DBL_MAX; else { const double delta = metric->Evaluate(center, other.center) - radius - other.radius; return math::ClampNonNegative(delta); } } /** * Computes maximum distance. */ template<typename VecType, typename TMetricType> template<typename OtherVecType> double BallBound<VecType, TMetricType>::MaxDistance( const OtherVecType& point, typename boost::enable_if<IsVector<OtherVecType> >* /* junk */) const { if (radius < 0) return DBL_MAX; else return metric->Evaluate(point, center) + radius; } /** * Computes maximum distance. */ template<typename VecType, typename TMetricType> double BallBound<VecType, TMetricType>::MaxDistance(const BallBound& other) const { if (radius < 0) return DBL_MAX; else return metric->Evaluate(other.center, center) + radius + other.radius; } /** * Calculates minimum and maximum bound-to-point squared distance. * * Example: bound1.MinDistanceSq(other) for minimum squared distance. */ template<typename VecType, typename TMetricType> template<typename OtherVecType> math::Range BallBound<VecType, TMetricType>::RangeDistance( const OtherVecType& point, typename boost::enable_if<IsVector<OtherVecType> >* /* junk */) const { if (radius < 0) return math::Range(DBL_MAX, DBL_MAX); else { const double dist = metric->Evaluate(center, point); return math::Range(math::ClampNonNegative(dist - radius), dist + radius); } } template<typename VecType, typename TMetricType> math::Range BallBound<VecType, TMetricType>::RangeDistance( const BallBound& other) const { if (radius < 0) return math::Range(DBL_MAX, DBL_MAX); else { const double dist = metric->Evaluate(center, other.center); const double sumradius = radius + other.radius; return math::Range(math::ClampNonNegative(dist - sumradius), dist + sumradius); } } /** * Expand the bound to include the given bound. * template<typename VecType, typename TMetricType> const BallBound<VecType, TMetricType>& BallBound<VecType, TMetricType>::operator|=( const BallBound& other) { double dist = metric->Evaluate(center, other); // Now expand the radius as necessary. if (dist > radius) radius = dist; return *this; }*/ /** * Expand the bound to include the given point. Algorithm adapted from * Jack Ritter, "An Efficient Bounding Sphere" in Graphics Gems (1990). * The difference lies in the way we initialize the ball bound. The way we * expand the bound is the same. */ template<typename VecType, typename TMetricType> template<typename MatType> const BallBound<VecType, TMetricType>& BallBound<VecType, TMetricType>::operator|=(const MatType& data) { if (radius < 0) { center = data.col(0); radius = 0; } // Now iteratively add points. for (size_t i = 0; i < data.n_cols; ++i) { const double dist = metric->Evaluate(center, (VecType) data.col(i)); // See if the new point lies outside the bound.
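// If it does lie outside, the smallest ball enclosing both the old ball and the new point has radius (dist + radius) / 2, with the center shifted toward the new point by (dist - radius) / 2 along the joining line; since diff below has length dist, the update computes exactly that.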
if (dist > radius) { // Move towards the new point and increase the radius just enough to // accommodate the new point. arma::vec diff = data.col(i) - center; center += ((dist - radius) / (2 * dist)) * diff; radius = 0.5 * (dist + radius); } } return *this; } /** * Returns a string representation of this object. */ template<typename VecType, typename TMetricType> std::string BallBound<VecType, TMetricType>::ToString() const { std::ostringstream convert; convert << "BallBound [" << this << "]" << std::endl; convert << " Radius: " << radius << std::endl; convert << " Center:" << std::endl << center; convert << " ownsMetric: " << ownsMetric << std::endl; convert << " Metric:" << std::endl << metric->ToString(); return convert.str(); } }; // namespace bound }; // namespace mlpack #endif // __MLPACK_CORE_TREE_BALLBOUND_IMPL_HPP RcppMLPACK/inst/include/mlpack/core/tree/example_tree.hpp0000644000176200001440000002503613647512751023010 0ustar liggesusers/** * @file example_tree.hpp * @author Ryan Curtin * * An example tree. This contains all the functions that mlpack trees must * implement (although the actual implementations here don't make any sense * because this is just an example). * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_TREE_EXAMPLE_TREE_HPP #define __MLPACK_CORE_TREE_EXAMPLE_TREE_HPP namespace mlpack { namespace tree { /** * This is not an actual space tree but instead an example tree that exists to * show and document all the functions that mlpack trees must implement. For a * better overview of trees, see @ref trees. Also be aware that the * implementations of each of the methods in this example tree are entirely fake * and do not work; this example tree exists for its API, not its * implementation. * * Note that trees often have different properties. These properties are known * at compile-time through the mlpack::tree::TreeTraits class, and some * properties may imply the existence (or non-existence) of certain functions. * Refer to the TreeTraits for more documentation on that. * * The three template parameters below must be template parameters to the tree, * in the order given below. More template parameters are fine, but they must * come after the first three. * * @tparam MetricType This defines the space in which the tree will be built. * For some trees, arbitrary metrics cannot be used, and a template * metaprogramming approach should be used to issue a compile-time error if * a metric cannot be used with a specific tree type. One example is the * tree::BinarySpaceTree tree type, which cannot work with the * metric::IPMetric class. * @tparam StatisticType A tree node can hold a statistic, which is sometimes * useful for various dual-tree algorithms. The tree itself does not need * to know anything about how the statistic works, but it needs to hold a * StatisticType in each node. It can be assumed that the StatisticType * class has a constructor StatisticType(const ExampleTree&).
* @tparam MatType A tree could be built on a dense matrix or a sparse matrix. * All mlpack trees should be able to support any Armadillo-compatible * matrix type. When the tree is written it should be assumed that MatType * has the same functionality as arma::mat. */ template<typename MetricType = metric::LMetric<2, true>, typename StatisticType = EmptyStatistic, typename MatType = arma::mat> class ExampleTree { public: /** * This constructor will build the tree given a dataset and an instantiated * metric. Note that the parameter is a MatType& and not an arma::mat&. The * dataset is not modified by the tree-building process (if it is, see the * documentation for mlpack::tree::TreeTraits::RearrangesDataset for how to * deal with that situation). The MetricType parameter is necessary even * though some metrics do not hold any state. This is so that the tree does * not have to worry about instantiating the metric (if the tree had to worry * about this, this would almost certainly incur additional runtime complexity * and a larger runtime size of the tree node objects, which is to be * avoided). The metric can't be const, in case MetricType::Evaluate() is * non-const. * * When this constructor is finished, the entire tree will be built and ready * to use. The constructor should call the constructor of the statistic for * each node that is built (see tree::EmptyStatistic for more information). * * @param dataset The dataset that the tree will be built on. * @param metric The instantiated metric to use to build the dataset. */ ExampleTree(const MatType& dataset, MetricType& metric); //! Return the number of children of this node. size_t NumChildren() const; //! Return a particular child of this node. const ExampleTree& Child(const size_t i) const; //! Modify a particular child of this node. ExampleTree& Child(const size_t i); //! Return the parent node (NULL if this is the root of the tree). ExampleTree* Parent() const; //! Return the number of points held in this node. size_t NumPoints() const; /** * Return the index of a particular point of this node. mlpack trees do not, * in general, hold the actual dataset, and instead just hold the indices of * the points they contain. Thus, you might use this function in code like * this: * * @code * arma::vec thirdPoint = dataset.col(treeNode.Point(2)); * @endcode */ size_t Point(const size_t i) const; /** * Get the number of descendant points. This is the number of unique points * held in this node plus the number of points held in all descendant nodes. * This could be calculated at build-time and cached, or could be calculated * at run-time. This may be harder to calculate for trees that may hold * points in multiple nodes (like cover trees and spill trees, for instance). */ size_t NumDescendants() const; /** * Get the index of a particular descendant point. The ordering of the * descendants does not matter, as long as calling Descendant(0) through * Descendant(NumDescendants() - 1) will return the indices of every * unique descendant point of the node. */ size_t Descendant(const size_t i) const; //! Get the statistic for this node. const StatisticType& Stat() const; //! Modify the statistic for this node. StatisticType& Stat(); //! Get the instantiated metric for this node. const MetricType& Metric() const; //! Modify the instantiated metric for this node. MetricType& Metric(); /** * Return the minimum distance between this node and a point.
It is not * required that the exact minimum distance between the node and the point is * returned but instead a lower bound on the minimum distance will suffice. * See the definitions in @ref trees for more information. * * @param point Point to return [lower bound on] minimum distance to. */ double MinDistance(const MatType& point) const; /** * Return the minimum distance between this node and another node. It is not * required that the exact minimum distance between the two nodes be returned * but instead a lower bound on the minimum distance will suffice. See the * definitions in @ref trees for more information. * * @param node Node to return [lower bound on] minimum distance to. */ double MinDistance(const ExampleTree& other) const; /** * Return the maximum distance between this node and a point. It is not * required that the exact maximum distance between the node and the point is * returned but instead an upper bound on the maximum distance will suffice. * See the definitions in @ref trees for more information. * * @param point Point to return [upper bound on] maximum distance to. */ double MaxDistance(const MatType& point) const; /** * Return the maximum distance between this node and another node. It is not * required that the exact maximum distance between the two nodes be returned * but instead an upper bound on the maximum distance will suffice. See the * definitions in @ref trees for more information. * * @param node Node to return [upper bound on] maximum distance to. */ double MaxDistance(const ExampleTree& other) const; /** * Return both the minimum and maximum distances between this node and a point * as a math::Range object. This overload is given because it is possible * that, for some tree types, calculation of both at once is faster than a * call to MinDistance() then MaxDistance(). It is not necessary that the * minimum and maximum distances be exact; it is sufficient to return a lower * bound on the minimum distance and an upper bound on the maximum distance. * See the definitions in @ref trees for more information. * * @param point Point to return [bounds on] minimum and maximum distances to. */ math::Range RangeDistance(const MatType& point) const; /** * Return both the minimum and maximum distances between this node and another * node as a math::Range object. This overload is given because it is * possible that, for some tree types, calculation of both at once is faster * than a call to MinDistance() then MaxDistance(). It is not necessary that * the minimum and maximum distances be exact; it is sufficient to return a * lower bound on the minimum distance and an upper bound on the maximum * distance. See the definitions in @ref trees for more information. * * @param node Node to return [bounds on] minimum and maximum distances to. */ math::Range RangeDistance(const ExampleTree& other) const; /** * Fill the given vector with the center of the node. * * @param centroid Vector to be filled with the center of the node. */ void Centroid(arma::vec& centroid) const; /** * Get the distance from the center of the node to the furthest descendant * point of this node. This does not necessarily need to be the exact * furthest descendant distance but instead can be an upper bound. See the * definitions in @ref trees for more information. */ double FurthestDescendantDistance() const; /** * Get the distance from the center of this node to the center of the parent * node. */ double ParentDistance() const; private: //! 
This member is just here so the ExampleTree compiles without warnings. It //! is not required to be a member in every type of tree. StatisticType stat; /** * This member is just here so the ExampleTree compiles without warnings. It * is not required to be a member in every type of tree. Be aware that * storing the metric as a member and not a reference may mean that some * metrics (such as metric::MahalanobisDistance in high dimensionality) incur * lots of unnecessary matrix copying. */ MetricType& metric; }; }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/tree/mrkd_statistic.hpp0000644000176200001440000000670213647512751023361 0ustar liggesusers/** * @file mrkd_statistic.hpp * @author James Cline * * Definition of the statistic for multi-resolution kd-trees. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_TREE_MRKD_STATISTIC_HPP #define __MLPACK_CORE_TREE_MRKD_STATISTIC_HPP #include <mlpack/core.hpp> namespace mlpack { namespace tree { /** * Statistic for multi-resolution kd-trees. */ class MRKDStatistic { public: //! Initialize an empty statistic. MRKDStatistic(); /** * This constructor is called when a node is finished initializing. * * @param node The node that has been finished. */ template<typename TreeType> MRKDStatistic(const TreeType& /* node */); /** * Returns a string representation of this object. */ std::string ToString() const; //! Get the index of the initial item in the dataset. size_t Begin() const { return begin; } //! Modify the index of the initial item in the dataset. size_t& Begin() { return begin; } //! Get the number of items in the dataset. size_t Count() const { return count; } //! Modify the number of items in the dataset. size_t& Count() { return count; } //! Get the center of mass. const arma::colvec& CenterOfMass() const { return centerOfMass; } //! Modify the center of mass. arma::colvec& CenterOfMass() { return centerOfMass; } //! Get the index of the dominating centroid. size_t DominatingCentroid() const { return dominatingCentroid; } //! Modify the index of the dominating centroid. size_t& DominatingCentroid() { return dominatingCentroid; } //! Access the whitelist. const std::vector<size_t>& Whitelist() const { return whitelist; } //! Modify the whitelist. std::vector<size_t>& Whitelist() { return whitelist; } private: //! The data points this object contains. const arma::mat* dataset; //! The initial item in the dataset, so we don't have to make a copy. size_t begin; //! The number of items in the dataset. size_t count; //! The left child. const MRKDStatistic* leftStat; //! The right child. const MRKDStatistic* rightStat; //! A link to the parent node; NULL if this is the root. const MRKDStatistic* parentStat; // Computed statistics. //! The center of mass for this dataset. arma::colvec centerOfMass; //! The sum of the squared Euclidean norms for this dataset.
double sumOfSquaredNorms; // There may be a better place to store this -- HRectBound? //! The index of the dominating centroid of the associated hyperrectangle. size_t dominatingCentroid; //! The list of centroids that cannot own this hyperrectangle. std::vector<size_t> whitelist; //! Whether or not the whitelist is valid. bool isWhitelistValid; }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "mrkd_statistic_impl.hpp" #endif // __MLPACK_CORE_TREE_MRKD_STATISTIC_HPP RcppMLPACK/inst/include/mlpack/core/tree/cosine_tree/0000755000176200001440000000000013647512751022116 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/tree/cosine_tree/cosine_tree.hpp0000644000176200001440000002337313647512751025136 0ustar liggesusers/** * @file cosine_tree.hpp * @author Siddharth Agrawal * * Definition of Cosine Tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_TREE_COSINE_TREE_COSINE_TREE_HPP #define __MLPACK_CORE_TREE_COSINE_TREE_COSINE_TREE_HPP #include <mlpack/core.hpp> #include <boost/heap/priority_queue.hpp> namespace mlpack { namespace tree { // Predeclare classes for CosineNodeQueue typedef. class CompareCosineNode; class CosineTree; // CosineNodeQueue typedef. typedef boost::heap::priority_queue<CosineTree*, boost::heap::compare<CompareCosineNode> > CosineNodeQueue; class CosineTree { public: /** * CosineTree constructor for the root node of the tree. It initializes the * necessary variables required for splitting of the node, and building the * tree further. It takes a reference to the input matrix and calculates the * relevant variables using it. * * @param dataset Matrix for which cosine tree is constructed. */ CosineTree(const arma::mat& dataset); /** * CosineTree constructor for nodes other than the root node of the tree. It * takes in a reference to the parent node and a list of column indices * specifying the columns to be included in the node. The function calculates * the relevant variables just like the constructor above. * * @param parentNode Pointer to the parent cosine node. * @param subIndices Pointer to vector of column indices to be included. */ CosineTree(CosineTree& parentNode, const std::vector<size_t>& subIndices); /** * Construct the CosineTree and the basis for the given matrix, and passed * 'epsilon' and 'delta' parameters. The CosineTree is constructed by * splitting nodes in the direction of maximum error, stored using a priority * queue. Basis vectors are added from the left and right children of the * split node. The basis vector from a node is the orthonormalized centroid of * its columns. The splitting continues till the Monte Carlo estimate of the * input matrix's projection on the obtained subspace is less than a fraction * of the norm of the input matrix. * * @param dataset Matrix for which the CosineTree is constructed. * @param epsilon Error tolerance fraction for calculated subspace. * @param delta Cumulative probability for Monte Carlo error lower bound.
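 *
 * A minimal usage sketch (the random matrix and the 0.1 values below are
 * illustrative assumptions, not library defaults):
 *
 * @code
 * arma::mat data = arma::randu<arma::mat>(50, 1000);
 *
 * // Build a subspace whose estimated projection error is below a 0.1
 * // fraction of the matrix norm, with cumulative probability 0.1 for the
 * // Monte Carlo lower bound.
 * mlpack::tree::CosineTree ctree(data, 0.1, 0.1);
 *
 * // Retrieve the orthonormal basis that was constructed.
 * arma::mat basis;
 * ctree.GetFinalBasis(basis);
 * @endcode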
*/ CosineTree(const arma::mat& dataset, const double epsilon, const double delta); /** * Calculates the orthonormalization of the passed centroid, with respect to * the current vector subspace. * * @param treeQueue Priority queue of cosine nodes. * @param centroid Centroid of the node being added to the basis. * @param newBasisVector Orthonormalized centroid of the node. * @param addBasisVector Address of additional basis vector. */ void ModifiedGramSchmidt(CosineNodeQueue& treeQueue, arma::vec& centroid, arma::vec& newBasisVector, arma::vec* addBasisVector = NULL); /** * Estimates the squared error of the projection of the input node's matrix * onto the current vector subspace. A normal distribution is fit using * weighted norms of projections of samples drawn from the input node's matrix * columns. The error is calculated as the difference between the Frobenius * norm of the input node's matrix and lower bound of the normal distribution. * * @param node Node for which Monte Carlo estimate is calculated. * @param treeQueue Priority queue of cosine nodes. * @param addBasisVector1 Address of first additional basis vector. * @param addBasisVector2 Address of second additional basis vector. */ double MonteCarloError(CosineTree* node, CosineNodeQueue& treeQueue, arma::vec* addBasisVector1 = NULL, arma::vec* addBasisVector2 = NULL); /** * Constructs the final basis matrix, after the cosine tree construction. * * @param treeQueue Priority queue of cosine nodes. */ void ConstructBasis(CosineNodeQueue& treeQueue); /** * This function splits the cosine node into two children based on the cosines * of the columns contained in the node, with respect to the sampled splitting * point. The function also calls the CosineTree constructor for the children. */ void CosineNodeSplit(); /** * Sample 'numSamples' points from the Length-Squared distribution of the * cosine node. The function uses 'l2NormsSquared' to calculate the cumulative * probability distribution of the column vectors. The sampling is based on * randomly generated values in the range [0, 1]. */ void ColumnSamplesLS(std::vector<size_t>& sampledIndices, arma::vec& probabilities, size_t numSamples); /** * Sample a point from the Length-Squared distribution of the cosine node. The * function uses 'l2NormsSquared' to calculate the cumulative probability * distribution of the column vectors. The sampling is based on a randomly * generated value in the range [0, 1]. */ size_t ColumnSampleLS(); /** * Sample a column based on the cumulative Length-Squared distribution of the * cosine node, and a randomly generated value in the range [0, 1]. Binary * search is more efficient than searching linearly for the same. This leads * to a significant speedup when there is a large number of columns to choose * from and when a number of samples are to be drawn from the distribution. * * @param cDistribution Cumulative LS distribution of columns in the node. * @param value Randomly generated value in the range [0, 1]. * @param start Starting index of the distribution interval to search in. * @param end Ending index of the distribution interval to search in. */ size_t BinarySearch(arma::vec& cDistribution, double value, size_t start, size_t end); /** * Calculate cosines of the columns present in the node, with respect to the * sampled splitting point. The calculated cosine values are useful for * splitting the node into its children. * * @param cosines Vector to store the cosine values in.
*/ void CalculateCosines(arma::vec& cosines); /** * Calculate centroid of the columns present in the node. The calculated * centroid is used as a basis vector for the cosine tree being constructed. */ void CalculateCentroid(); //! Returns the basis of the constructed subspace. void GetFinalBasis(arma::mat& finalBasis) { finalBasis = basis; } //! Get a reference to the dataset matrix. const arma::mat& GetDataset() const { return dataset; } //! Get the indices of columns in the node. std::vector<size_t>& VectorIndices() { return indices; } //! Set the Monte Carlo error. void L2Error(const double error) { this->l2Error = error; } //! Get the Monte Carlo error. double L2Error() const { return l2Error; } //! Get a reference to the centroid vector. arma::vec& Centroid() { return centroid; } //! Set the basis vector of the node. void BasisVector(arma::vec& bVector) { this->basisVector = bVector; } //! Get the basis vector of the node. arma::vec& BasisVector() { return basisVector; } //! Get pointer to the left child of the node. CosineTree* Left() { return left; } //! Get pointer to the right child of the node. CosineTree* Right() { return right; } //! Get number of columns of input matrix in the node. size_t NumColumns() const { return numColumns; } //! Get the Frobenius norm squared of columns in the node. double FrobNormSquared() const { return frobNormSquared; } //! Get the column index of split point of the node. size_t SplitPointIndex() const { return indices[splitPointIndex]; } private: //! Matrix for which cosine tree is constructed. const arma::mat& dataset; //! Error tolerance fraction for calculated subspace. double epsilon; //! Cumulative probability for Monte Carlo error lower bound. double delta; //! Subspace basis of the input dataset. arma::mat basis; //! Parent of the node. CosineTree* parent; //! Right child of the node. CosineTree* right; //! Left child of the node. CosineTree* left; //! Indices of columns of input matrix in the node. std::vector<size_t> indices; //! L2-norm squared of columns in the node. arma::vec l2NormsSquared; //! Centroid of columns of input matrix in the node. arma::vec centroid; //! Orthonormalized basis vector of the node. arma::vec basisVector; //! Index of split point of cosine node. size_t splitPointIndex; //! Number of columns of input matrix in the node. size_t numColumns; //! Monte Carlo error for this node. double l2Error; //! Frobenius norm squared of columns in the node. double frobNormSquared; }; class CompareCosineNode { public: // Comparison function for construction of priority queue. bool operator() (const CosineTree* a, const CosineTree* b) const { return a->L2Error() < b->L2Error(); } }; }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/tree/hrectbound_impl.hpp0000644000176200001440000002577413647512751023515 0ustar liggesusers/** * @file hrectbound_impl.hpp * * Implementation of hyper-rectangle bound policy class. * Template parameter Power is the metric to use; use 2 for Euclidean (L2). * * @experimental * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_TREE_HRECTBOUND_IMPL_HPP #define __MLPACK_CORE_TREE_HRECTBOUND_IMPL_HPP #include <math.h> // In case it has not been included yet. #include "hrectbound.hpp" namespace mlpack { namespace bound { /** * Empty constructor. */ template<int Power, bool TakeRoot> HRectBound<Power, TakeRoot>::HRectBound() : dim(0), bounds(NULL), minWidth(0) { /* Nothing to do. */ } /** * Initializes to specified dimensionality with each dimension the empty * set. */ template<int Power, bool TakeRoot> HRectBound<Power, TakeRoot>::HRectBound(const size_t dimension) : dim(dimension), bounds(new math::Range[dim]), minWidth(0) { /* Nothing to do. */ } /*** * Copy constructor necessary to prevent memory leaks. */ template<int Power, bool TakeRoot> HRectBound<Power, TakeRoot>::HRectBound(const HRectBound& other) : dim(other.Dim()), bounds(new math::Range[dim]), minWidth(other.MinWidth()) { // Copy other bounds over. for (size_t i = 0; i < dim; i++) bounds[i] = other[i]; } /*** * Same as the copy constructor. */ template<int Power, bool TakeRoot> HRectBound<Power, TakeRoot>& HRectBound<Power, TakeRoot>::operator=( const HRectBound& other) { if (dim != other.Dim()) { // Reallocation is necessary. if (bounds) delete[] bounds; dim = other.Dim(); bounds = new math::Range[dim]; } // Now copy each of the bound values. for (size_t i = 0; i < dim; i++) bounds[i] = other[i]; minWidth = other.MinWidth(); return *this; } /** * Destructor: clean up memory. */ template<int Power, bool TakeRoot> HRectBound<Power, TakeRoot>::~HRectBound() { if (bounds) delete[] bounds; } /** * Resets all dimensions to the empty set. */ template<int Power, bool TakeRoot> void HRectBound<Power, TakeRoot>::Clear() { for (size_t i = 0; i < dim; i++) bounds[i] = math::Range(); minWidth = 0; } /*** * Calculates the centroid of the range, placing it into the given vector. * * @param centroid Vector which the centroid will be written to. */ template<int Power, bool TakeRoot> void HRectBound<Power, TakeRoot>::Centroid(arma::vec& centroid) const { // Set size correctly if necessary. if (!(centroid.n_elem == dim)) centroid.set_size(dim); for (size_t i = 0; i < dim; i++) centroid(i) = bounds[i].Mid(); } /** * Calculates minimum bound-to-point squared distance. */ template<int Power, bool TakeRoot> template<typename VecType> double HRectBound<Power, TakeRoot>::MinDistance( const VecType& point, typename boost::enable_if<IsVector<VecType> >* /* junk */) const { //Log::Assert(point.n_elem == dim); double sum = 0; double lower, higher; for (size_t d = 0; d < dim; d++) { lower = bounds[d].Lo() - point[d]; higher = point[d] - bounds[d].Hi(); // Since only one of 'lower' or 'higher' is negative, if we add each's // absolute value to itself and then sum those two, our result is the // nonnegative half of the equation times two; then we raise to power Power. sum += pow((lower + fabs(lower)) + (higher + fabs(higher)), (double) Power); } // Now take the Power'th root (but make sure our result is squared if it needs // to be); then cancel out the constant of 2 (which may have been squared now) // that was introduced earlier. The compiler should optimize out the if // statement entirely. if (TakeRoot) return pow(sum, 1.0 / (double) Power) / 2.0; else return sum / pow(2.0, Power); } /** * Calculates minimum bound-to-bound squared distance.
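 *
 * Per dimension, only one of (other.lo - this.hi) and (this.lo - other.hi)
 * can be positive, so the contribution is
 * max(other.lo - this.hi, this.lo - other.hi, 0)^Power. The loop below
 * obtains that maximum branch-free via the identity x + |x| = 2 max(x, 0),
 * then divides the resulting factor of 2 (or 2^Power) back out after the sum.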
*/ template<int Power, bool TakeRoot> double HRectBound<Power, TakeRoot>::MinDistance(const HRectBound& other) const { //Log::Assert(dim == other.dim); double sum = 0; const math::Range* mbound = bounds; const math::Range* obound = other.bounds; double lower, higher; for (size_t d = 0; d < dim; d++) { lower = obound->Lo() - mbound->Hi(); higher = mbound->Lo() - obound->Hi(); // We invoke the following: // x + fabs(x) = max(x * 2, 0) // (x * 2)^2 / 4 = x^2 sum += pow((lower + fabs(lower)) + (higher + fabs(higher)), (double) Power); // Move bound pointers. mbound++; obound++; } // The compiler should optimize out this if statement entirely. if (TakeRoot) return pow(sum, 1.0 / (double) Power) / 2.0; else return sum / pow(2.0, Power); } /** * Calculates maximum bound-to-point squared distance. */ template<int Power, bool TakeRoot> template<typename VecType> double HRectBound<Power, TakeRoot>::MaxDistance( const VecType& point, typename boost::enable_if<IsVector<VecType> >* /* junk */) const { double sum = 0; //Log::Assert(point.n_elem == dim); for (size_t d = 0; d < dim; d++) { double v = std::max(fabs(point[d] - bounds[d].Lo()), fabs(bounds[d].Hi() - point[d])); sum += pow(v, (double) Power); } // The compiler should optimize out this if statement entirely. if (TakeRoot) return pow(sum, 1.0 / (double) Power); else return sum; } /** * Computes maximum distance. */ template<int Power, bool TakeRoot> double HRectBound<Power, TakeRoot>::MaxDistance(const HRectBound& other) const { double sum = 0; //Log::Assert(dim == other.dim); double v; for (size_t d = 0; d < dim; d++) { v = std::max(fabs(other.bounds[d].Hi() - bounds[d].Lo()), fabs(bounds[d].Hi() - other.bounds[d].Lo())); sum += pow(v, (double) Power); // v is non-negative. } // The compiler should optimize out this if statement entirely. if (TakeRoot) return pow(sum, 1.0 / (double) Power); else return sum; } /** * Calculates minimum and maximum bound-to-bound squared distance. */ template<int Power, bool TakeRoot> math::Range HRectBound<Power, TakeRoot>::RangeDistance(const HRectBound& other) const { double loSum = 0; double hiSum = 0; //Log::Assert(dim == other.dim); double v1, v2, vLo, vHi; for (size_t d = 0; d < dim; d++) { v1 = other.bounds[d].Lo() - bounds[d].Hi(); v2 = bounds[d].Lo() - other.bounds[d].Hi(); // One of v1 or v2 is negative. if (v1 >= v2) { vHi = -v2; // Make it nonnegative. vLo = (v1 > 0) ? v1 : 0; // Force to be 0 if negative. } else { vHi = -v1; // Make it nonnegative. vLo = (v2 > 0) ? v2 : 0; // Force to be 0 if negative. } loSum += pow(vLo, (double) Power); hiSum += pow(vHi, (double) Power); } if (TakeRoot) return math::Range(pow(loSum, 1.0 / (double) Power), pow(hiSum, 1.0 / (double) Power)); else return math::Range(loSum, hiSum); } /** * Calculates minimum and maximum bound-to-point squared distance. */ template<int Power, bool TakeRoot> template<typename VecType> math::Range HRectBound<Power, TakeRoot>::RangeDistance( const VecType& point, typename boost::enable_if<IsVector<VecType> >* /* junk */) const { double loSum = 0; double hiSum = 0; //Log::Assert(point.n_elem == dim); double v1, v2, vLo, vHi; for (size_t d = 0; d < dim; d++) { v1 = bounds[d].Lo() - point[d]; // Negative if point[d] > lo. v2 = point[d] - bounds[d].Hi(); // Negative if point[d] < hi. // One of v1 or v2 (or both) is negative. if (v1 >= 0) // point[d] <= bounds_[d].Lo(). { vHi = -v2; // v2 will be larger but must be negated. vLo = v1; } else // point[d] is between lo and hi, or greater than hi. { if (v2 >= 0) { vHi = -v1; // v1 will be larger, but must be negated. vLo = v2; } else { vHi = -std::min(v1, v2); // Both are negative, but we need the larger.
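// In this case the point lies between lo and hi in this dimension, so it contributes nothing to the minimum distance.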
vLo = 0; } } loSum += pow(vLo, (double) Power); hiSum += pow(vHi, (double) Power); } if (TakeRoot) return math::Range(pow(loSum, 1.0 / (double) Power), pow(hiSum, 1.0 / (double) Power)); else return math::Range(loSum, hiSum); } /** * Expands this region to include new points. */ template<int Power, bool TakeRoot> template<typename MatType> HRectBound<Power, TakeRoot>& HRectBound<Power, TakeRoot>::operator|=( const MatType& data) { //Log::Assert(data.n_rows == dim); arma::vec mins(min(data, 1)); arma::vec maxs(max(data, 1)); minWidth = DBL_MAX; for (size_t i = 0; i < dim; i++) { bounds[i] |= math::Range(mins[i], maxs[i]); const double width = bounds[i].Width(); if (width < minWidth) minWidth = width; } return *this; } /** * Expands this region to encompass another bound. */ template<int Power, bool TakeRoot> HRectBound<Power, TakeRoot>& HRectBound<Power, TakeRoot>::operator|=( const HRectBound& other) { assert(other.dim == dim); minWidth = DBL_MAX; for (size_t i = 0; i < dim; i++) { bounds[i] |= other.bounds[i]; const double width = bounds[i].Width(); if (width < minWidth) minWidth = width; } return *this; } /** * Determines if a point is within this bound. */ template<int Power, bool TakeRoot> template<typename VecType> bool HRectBound<Power, TakeRoot>::Contains(const VecType& point) const { for (size_t i = 0; i < point.n_elem; i++) { if (!bounds[i].Contains(point(i))) return false; } return true; } /** * Returns the diameter of the hyperrectangle (that is, the longest diagonal). */ template<int Power, bool TakeRoot> double HRectBound<Power, TakeRoot>::Diameter() const { double d = 0; for (size_t i = 0; i < dim; ++i) d += std::pow(bounds[i].Hi() - bounds[i].Lo(), (double) Power); if (TakeRoot) return std::pow(d, 1.0 / (double) Power); else return d; } /** * Returns a string representation of this object. */ template<int Power, bool TakeRoot> std::string HRectBound<Power, TakeRoot>::ToString() const { std::ostringstream convert; convert << "HRectBound [" << this << "]" << std::endl; convert << " Power: " << Power << std::endl; convert << " TakeRoot: " << (TakeRoot ? "true" : "false") << std::endl; convert << " Dimensionality: " << dim << std::endl; convert << " Bounds: " << std::endl; for (size_t i = 0; i < dim; ++i) convert << util::Indent(bounds[i].ToString()) << std::endl; convert << " Minimum width: " << minWidth << std::endl; return convert.str(); } }; // namespace bound }; // namespace mlpack #endif // __MLPACK_CORE_TREE_HRECTBOUND_IMPL_HPP RcppMLPACK/inst/include/mlpack/core/tree/hrectbound.hpp0000644000176200001440000001360713647512751022474 0ustar liggesusers/** * @file hrectbound.hpp * * Bounds that are useful for binary space partitioning trees. * * This file describes the interface for the HRectBound class, which implements * a hyperrectangle bound. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_TREE_HRECTBOUND_HPP #define __MLPACK_CORE_TREE_HRECTBOUND_HPP #include <mlpack/core.hpp> #include <mlpack/core/math/range.hpp> #include <mlpack/core/metrics/lmetric.hpp> namespace mlpack { namespace bound { /** * Hyper-rectangle bound for an L-metric. This should be used in conjunction * with the LMetric class.
Be sure to use the same template parameters for * LMetric as you do for HRectBound -- otherwise odd results may occur. * * @tparam Power The metric to use; use 2 for Euclidean (L2). * @tparam TakeRoot Whether or not the root should be taken (see LMetric * documentation). */ template<int Power, bool TakeRoot = true> class HRectBound { public: //! This is the metric type that this bound is using. typedef metric::LMetric<Power, TakeRoot> MetricType; /** * Empty constructor; creates a bound of dimensionality 0. */ HRectBound(); /** * Initializes to specified dimensionality with each dimension the empty * set. */ HRectBound(const size_t dimension); //! Copy constructor; necessary to prevent memory leaks. HRectBound(const HRectBound& other); //! Same as copy constructor; necessary to prevent memory leaks. HRectBound& operator=(const HRectBound& other); //! Destructor: clean up memory. ~HRectBound(); /** * Resets all dimensions to the empty set (so that this bound contains * nothing). */ void Clear(); //! Gets the dimensionality. size_t Dim() const { return dim; } //! Modify the range for a particular dimension. No bounds checking. Be //! careful: this may make MinWidth() invalid. math::Range& operator[](const size_t i) { return bounds[i]; } //! Get the range for a particular dimension. No bounds checking. const math::Range& operator[](const size_t i) const { return bounds[i]; } //! Get the minimum width of the bound. double MinWidth() const { return minWidth; } //! Modify the minimum width of the bound. double& MinWidth() { return minWidth; } /** * Calculates the centroid of the range, placing it into the given vector. * * @param centroid Vector which the centroid will be written to. */ void Centroid(arma::vec& centroid) const; /** * Calculates minimum bound-to-point distance. * * @param point Point to which the minimum distance is requested. */ template<typename VecType> double MinDistance(const VecType& point, typename boost::enable_if<IsVector<VecType> >* = 0) const; /** * Calculates minimum bound-to-bound distance. * * @param other Bound to which the minimum distance is requested. */ double MinDistance(const HRectBound& other) const; /** * Calculates maximum bound-to-point squared distance. * * @param point Point to which the maximum distance is requested. */ template<typename VecType> double MaxDistance(const VecType& point, typename boost::enable_if<IsVector<VecType> >* = 0) const; /** * Computes maximum distance. * * @param other Bound to which the maximum distance is requested. */ double MaxDistance(const HRectBound& other) const; /** * Calculates minimum and maximum bound-to-bound distance. * * @param other Bound to which the minimum and maximum distances are * requested. */ math::Range RangeDistance(const HRectBound& other) const; /** * Calculates minimum and maximum bound-to-point distance. * * @param point Point to which the minimum and maximum distances are * requested. */ template<typename VecType> math::Range RangeDistance(const VecType& point, typename boost::enable_if<IsVector<VecType> >* = 0) const; /** * Expands this region to include new points. * * @tparam MatType Type of matrix; could be Mat, SpMat, a subview, or just a * vector. * @param data Data points to expand this region to include. */ template<typename MatType> HRectBound& operator|=(const MatType& data); /** * Expands this region to encompass another bound. */ HRectBound& operator|=(const HRectBound& other); /** * Determines if a point is within this bound. */ template<typename VecType> bool Contains(const VecType& point) const; /** * Returns the diameter of the hyperrectangle (that is, the longest diagonal). */ double Diameter() const; /** * Returns a string representation of this object.
*/ std::string ToString() const; /** * Return the metric associated with this bound. Because it is an LMetric, it * cannot store state, so we can make it on the fly. It is also static * because the metric is only dependent on the template arguments. */ static MetricType Metric() { return metric::LMetric<Power, TakeRoot>(); } private: //! The dimensionality of the bound. size_t dim; //! The bounds for each dimension. math::Range* bounds; //! Cached minimum width of bound. double minWidth; }; }; // namespace bound }; // namespace mlpack #include "hrectbound_impl.hpp" #endif // __MLPACK_CORE_TREE_HRECTBOUND_HPP RcppMLPACK/inst/include/mlpack/core/tree/ballbound.hpp0000644000176200001440000001340513647512751022275 0ustar liggesusers/** * @file ballbound.hpp * * Bounds that are useful for binary space partitioning trees. * Interface to a ball bound that works in arbitrary metric spaces. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_TREE_BALLBOUND_HPP #define __MLPACK_CORE_TREE_BALLBOUND_HPP #include <mlpack/core.hpp> #include <mlpack/core/metrics/lmetric.hpp> namespace mlpack { namespace bound { /** * Ball bound encloses a set of points at a specific distance (radius) from a * specific point (center). TMetricType is the custom metric type that defaults * to the Euclidean (L2) distance. * * @tparam VecType Type of vector (arma::vec or arma::sp_vec). * @tparam TMetricType metric type used in the distance measure. */ template<typename VecType = arma::vec, typename TMetricType = metric::LMetric<2, true> > class BallBound { public: typedef VecType Vec; //! Need this for Binary Space Partition Tree typedef TMetricType MetricType; private: //! The radius of the ball bound. double radius; //! The center of the ball bound. VecType center; //! The metric used in this bound. TMetricType* metric; /** * To know whether this object allocated memory to the metric member * variable. This will be true except in the copy constructor and the * overloaded assignment operator. We need this to know whether we should * delete the metric member variable in the destructor. */ bool ownsMetric; public: //! Empty Constructor. BallBound(); /** * Create the ball bound with the specified dimensionality. * * @param dimension Dimensionality of ball bound. */ BallBound(const size_t dimension); /** * Create the ball bound with the specified radius and center. * * @param radius Radius of ball bound. * @param center Center of ball bound. */ BallBound(const double radius, const VecType& center); //! Copy constructor. To prevent memory leaks. BallBound(const BallBound& other); //! For the same reason as the Copy Constructor. To prevent memory leaks. BallBound& operator=(const BallBound& other); //! Destructor to release allocated memory. ~BallBound(); //! Get the radius of the ball. double Radius() const { return radius; } //! Modify the radius of the ball. double& Radius() { return radius; } //! Get the center point of the ball. const VecType& Center() const { return center; } //! Modify the center point of the ball. VecType& Center() { return center; } //!
Get the dimensionality of the ball. double Dim() const { return center.n_elem; } /** * Get the minimum width of the bound (this is the same as the diameter). * For ball bounds, the width along all dimensions remains the same. */ double MinWidth() const { return radius * 2.0; } //! Get the range in a certain dimension. math::Range operator[](const size_t i) const; /** * Determines if a point is within this bound. */ bool Contains(const VecType& point) const; /** * Place the centroid of BallBound into the given vector. * * @param centroid Vector which the centroid will be written to. */ void Centroid(VecType& centroid) const { centroid = center; } /** * Calculates minimum bound-to-point squared distance. */ template<typename OtherVecType> double MinDistance(const OtherVecType& point, typename boost::enable_if<IsVector<OtherVecType> >* = 0) const; /** * Calculates minimum bound-to-bound squared distance. */ double MinDistance(const BallBound& other) const; /** * Computes maximum distance. */ template<typename OtherVecType> double MaxDistance(const OtherVecType& point, typename boost::enable_if<IsVector<OtherVecType> >* = 0) const; /** * Computes maximum distance. */ double MaxDistance(const BallBound& other) const; /** * Calculates minimum and maximum bound-to-point distance. */ template<typename OtherVecType> math::Range RangeDistance( const OtherVecType& other, typename boost::enable_if<IsVector<OtherVecType> >* = 0) const; /** * Calculates minimum and maximum bound-to-bound distance. * * Example: bound1.MinDistanceSq(other) for minimum distance. */ math::Range RangeDistance(const BallBound& other) const; /** * Expand the bound to include the given bound. */ const BallBound& operator|=(const BallBound& other); /** * Expand the bound to include the given point. The centroid is recalculated * to be the center of all of the given points. * * @tparam MatType Type of matrix; could be arma::mat, arma::spmat, or a * vector. * @param data Data points to add. */ template<typename MatType> const BallBound& operator|=(const MatType& data); /** * Returns the diameter of the ballbound. */ double Diameter() const { return 2 * radius; } /** * Returns the distance metric used in this bound. */ TMetricType Metric() const { return *metric; } /** * Returns a string representation of this object. */ std::string ToString() const; }; }; // namespace bound }; // namespace mlpack #include "ballbound_impl.hpp" #endif // __MLPACK_CORE_TREE_BALLBOUND_HPP RcppMLPACK/inst/include/mlpack/core/tree/binary_space_tree.hpp0000644000176200001440000000240113647512751024003 0ustar liggesusers/** * @file binary_space_tree.hpp * @author Ryan Curtin * * Include all the necessary files to use the BinarySpaceTree class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>.
*/ #ifndef __MLPACK_CORE_TREE_BINARY_SPACE_TREE_HPP #define __MLPACK_CORE_TREE_BINARY_SPACE_TREE_HPP #include "bounds.hpp" #include "binary_space_tree/binary_space_tree.hpp" #include "binary_space_tree/single_tree_traverser.hpp" #include "binary_space_tree/single_tree_traverser_impl.hpp" #include "binary_space_tree/dual_tree_traverser.hpp" #include "binary_space_tree/dual_tree_traverser_impl.hpp" #include "binary_space_tree/traits.hpp" #endif RcppMLPACK/inst/include/mlpack/core/tree/cover_tree/0000755000176200001440000000000013647521523021751 5ustar liggesusersRcppMLPACK/inst/include/mlpack/core/tree/cover_tree/cover_tree.hpp0000644000176200001440000004577213647512751024631 0ustar liggesusers/** * @file cover_tree.hpp * @author Ryan Curtin * * Definition of CoverTree, which can be used in place of the BinarySpaceTree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see <http://www.gnu.org/licenses/>. */ #ifndef __MLPACK_CORE_TREE_COVER_TREE_COVER_TREE_HPP #define __MLPACK_CORE_TREE_COVER_TREE_COVER_TREE_HPP #include <mlpack/core.hpp> #include <mlpack/core/metrics/lmetric.hpp> #include "first_point_is_root.hpp" #include "../statistic.hpp" namespace mlpack { namespace tree { /** * A cover tree is a tree specifically designed to speed up nearest-neighbor * computation in high-dimensional spaces. Each non-leaf node references a * point and has a nonzero number of children, including a "self-child" which * references the same point. A leaf node represents only one point. * * The tree can be thought of as a hierarchy with the root node at the top level * and the leaf nodes at the bottom level. Each level in the tree has an * assigned 'scale' i. The tree follows these three conditions: * * - nesting: the level C_i is a subset of the level C_{i - 1}. * - covering: all nodes in level C_{i - 1} have at least one node in the * level C_i with distance less than or equal to b^i (exactly one of these * is a parent of the point in level C_{i - 1}). * - separation: all nodes in level C_i have distance greater than b^i to all * other nodes in level C_i. * * The value 'b' refers to the base, which is a parameter of the tree. These * three properties make the cover tree very good for fast, high-dimensional * nearest-neighbor search. * * The theoretical structure of the tree contains many 'implicit' nodes which * only have a "self-child" (a child referencing the same point, but at a lower * scale level). This practical implementation only constructs explicit nodes * -- non-leaf nodes with more than one child. A leaf node has no children, and * its scale level is INT_MIN.
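 *
 * As a loose, hedged sketch (not part of MLPACK; it assumes the tree type
 * exposes Scale(), Dataset(), Point(), NumChildren(), and Child() accessors
 * and that an instantiated metric is supplied), the covering condition could
 * be verified recursively like this:
 *
 * @code
 * template<typename TreeType, typename MetricType>
 * bool CheckCovering(const TreeType& node, MetricType& metric, const double base)
 * {
 *   for (size_t i = 0; i < node.NumChildren(); ++i)
 *   {
 *     // Each child must lie within base^scale of its parent's point.
 *     const double d = metric.Evaluate(node.Dataset().col(node.Point()),
 *         node.Dataset().col(node.Child(i).Point()));
 *     if (d > std::pow(base, node.Scale()) ||
 *         !CheckCovering(node.Child(i), metric, base))
 *       return false;
 *   }
 *   return true;
 * }
 * @endcode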
 *
 * For more information on cover trees, see
 *
 * @code
 * @inproceedings{
 *   author = {Beygelzimer, Alina and Kakade, Sham and Langford, John},
 *   title = {Cover trees for nearest neighbor},
 *   booktitle = {Proceedings of the 23rd International Conference on Machine
 *       Learning},
 *   series = {ICML '06},
 *   year = {2006},
 *   pages = {97--104}
 * }
 * @endcode
 *
 * For information on runtime bounds of the nearest-neighbor computation using
 * cover trees, see the following paper, presented at NIPS 2009:
 *
 * @code
 * @inproceedings{
 *   author = {Ram, P., and Lee, D., and March, W.B., and Gray, A.G.},
 *   title = {Linear-time Algorithms for Pairwise Statistical Problems},
 *   booktitle = {Advances in Neural Information Processing Systems 22},
 *   editor = {Y. Bengio and D. Schuurmans and J. Lafferty and C.K.I. Williams
 *       and A. Culotta},
 *   pages = {1527--1535},
 *   year = {2009}
 * }
 * @endcode
 *
 * The CoverTree class offers three template parameters; a custom metric type
 * can be used with MetricType (this class defaults to the L2-squared metric).
 * The root node's point can be chosen with the RootPointPolicy; by default,
 * the FirstPointIsRoot policy is used, meaning the first point in the dataset
 * is used.  The StatisticType policy allows you to define statistics which
 * can be gathered during the creation of the tree.
 *
 * @tparam MetricType Metric type to use during tree construction.
 * @tparam RootPointPolicy Determines which point to use as the root node.
 * @tparam StatisticType Statistic to be used during tree creation.
 */
template<typename MetricType = metric::LMetric<2, true>,
         typename RootPointPolicy = FirstPointIsRoot,
         typename StatisticType = EmptyStatistic>
class CoverTree
{
 public:
  typedef arma::mat Mat;

  /**
   * Create the cover tree with the given dataset and given base.
   * The dataset will not be modified during the building procedure (unlike
   * BinarySpaceTree).
   *
   * The last argument will be removed in mlpack 1.1.0 (see #274 and #273).
   *
   * @param dataset Reference to the dataset to build a tree on.
   * @param base Base to use during tree building (default 2.0).
   */
  CoverTree(const arma::mat& dataset,
            const double base = 2.0,
            MetricType* metric = NULL);

  /**
   * Create the cover tree with the given dataset and the given instantiated
   * metric.  Optionally, set the base.  The dataset will not be modified
   * during the building procedure (unlike BinarySpaceTree).
   *
   * @param dataset Reference to the dataset to build a tree on.
   * @param metric Instantiated metric to use during tree building.
   * @param base Base to use during tree building (default 2.0).
   */
  CoverTree(const arma::mat& dataset,
            MetricType& metric,
            const double base = 2.0);

  /**
   * Construct a child cover tree node.  This constructor is not meant to be
   * used externally, but it could be used to insert another node into a tree.
   * This procedure uses only one vector for the near set, the far set, and
   * the used set (this is to prevent unnecessary memory allocation in
   * recursive calls to this constructor).  Therefore, the size of the near
   * set, far set, and used set must be passed in.  The near set will be
   * entirely used up, and some of the far set may be used.  The value of
   * usedSetSize will be set to the number of points used in the construction
   * of this node, and the value of farSetSize will be modified to reflect the
   * number of points in the far set _after_ the construction of this node.
   *
   * If you are calling this manually, be careful that the given scale is
   * as small as possible, or you may be creating an implicit node in your
   * tree.
   *
   * @param dataset Reference to the dataset to build a tree on.
* @param base Base to use during tree building. * @param pointIndex Index of the point this node references. * @param scale Scale of this level in the tree. * @param parent Parent of this node (NULL indicates no parent). * @param parentDistance Distance to the parent node. * @param indices Array of indices, ordered [ nearSet | farSet | usedSet ]; * will be modified to [ farSet | usedSet ]. * @param distances Array of distances, ordered the same way as the indices. * These represent the distances between the point specified by pointIndex * and each point in the indices array. * @param nearSetSize Size of the near set; if 0, this will be a leaf. * @param farSetSize Size of the far set; may be modified (if this node uses * any points in the far set). * @param usedSetSize The number of points used will be added to this number. */ CoverTree(const arma::mat& dataset, const double base, const size_t pointIndex, const int scale, CoverTree* parent, const double parentDistance, arma::Col& indices, arma::vec& distances, size_t nearSetSize, size_t& farSetSize, size_t& usedSetSize, MetricType& metric = NULL); /** * Manually construct a cover tree node; no tree assembly is done in this * constructor, and children must be added manually (use Children()). This * constructor is useful when the tree is being "imported" into the CoverTree * class after being created in some other manner. * * @param dataset Reference to the dataset this node is a part of. * @param base Base that was used for tree building. * @param pointIndex Index of the point in the dataset which this node refers * to. * @param scale Scale of this node's level in the tree. * @param parent Parent node (NULL indicates no parent). * @param parentDistance Distance to parent node point. * @param furthestDescendantDistance Distance to furthest descendant point. * @param metric Instantiated metric (optional). */ CoverTree(const arma::mat& dataset, const double base, const size_t pointIndex, const int scale, CoverTree* parent, const double parentDistance, const double furthestDescendantDistance, MetricType* metric = NULL); /** * Create a cover tree from another tree. Be careful! This may use a lot of * memory and take a lot of time. * * @param other Cover tree to copy from. */ CoverTree(const CoverTree& other); /** * Delete this cover tree node and its children. */ ~CoverTree(); //! A single-tree cover tree traverser; see single_tree_traverser.hpp for //! implementation. template class SingleTreeTraverser; //! A dual-tree cover tree traverser; see dual_tree_traverser.hpp. template class DualTreeTraverser; //! Get a reference to the dataset. const arma::mat& Dataset() const { return dataset; } //! Get the index of the point which this node represents. size_t Point() const { return point; } //! For compatibility with other trees; the argument is ignored. size_t Point(const size_t) const { return point; } bool IsLeaf() const { return (children.size() == 0); } size_t NumPoints() const { return 1; } //! Get a particular child node. const CoverTree& Child(const size_t index) const { return *children[index]; } //! Modify a particular child node. CoverTree& Child(const size_t index) { return *children[index]; } //! Get the number of children. size_t NumChildren() const { return children.size(); } //! Get the children. const std::vector& Children() const { return children; } //! Modify the children manually (maybe not a great idea). std::vector& Children() { return children; } //! Get the number of descendant points. size_t NumDescendants() const; //! 
Get the index of a particular descendant point. size_t Descendant(const size_t index) const; //! Get the scale of this node. int Scale() const { return scale; } //! Modify the scale of this node. Be careful... int& Scale() { return scale; } //! Get the base. double Base() const { return base; } //! Modify the base; don't do this, you'll break everything. double& Base() { return base; } //! Get the statistic for this node. const StatisticType& Stat() const { return stat; } //! Modify the statistic for this node. StatisticType& Stat() { return stat; } //! Return the minimum distance to another node. double MinDistance(const CoverTree* other) const; //! Return the minimum distance to another node given that the point-to-point //! distance has already been calculated. double MinDistance(const CoverTree* other, const double distance) const; //! Return the minimum distance to another point. double MinDistance(const arma::vec& other) const; //! Return the minimum distance to another point given that the distance from //! the center to the point has already been calculated. double MinDistance(const arma::vec& other, const double distance) const; //! Return the maximum distance to another node. double MaxDistance(const CoverTree* other) const; //! Return the maximum distance to another node given that the point-to-point //! distance has already been calculated. double MaxDistance(const CoverTree* other, const double distance) const; //! Return the maximum distance to another point. double MaxDistance(const arma::vec& other) const; //! Return the maximum distance to another point given that the distance from //! the center to the point has already been calculated. double MaxDistance(const arma::vec& other, const double distance) const; //! Return the minimum and maximum distance to another node. math::Range RangeDistance(const CoverTree* other) const; //! Return the minimum and maximum distance to another node given that the //! point-to-point distance has already been calculated. math::Range RangeDistance(const CoverTree* other, const double distance) const; //! Return the minimum and maximum distance to another point. math::Range RangeDistance(const arma::vec& other) const; //! Return the minimum and maximum distance to another point given that the //! point-to-point distance has already been calculated. math::Range RangeDistance(const arma::vec& other, const double distance) const; //! Returns true: this tree does have self-children. static bool HasSelfChildren() { return true; } //! Get the parent node. CoverTree* Parent() const { return parent; } //! Modify the parent node. CoverTree*& Parent() { return parent; } //! Get the distance to the parent. double ParentDistance() const { return parentDistance; } //! Modify the distance to the parent. double& ParentDistance() { return parentDistance; } //! Get the distance to the furthest point. This is always 0 for cover trees. double FurthestPointDistance() const { return 0.0; } //! Get the distance from the center of the node to the furthest descendant. double FurthestDescendantDistance() const { return furthestDescendantDistance; } //! Modify the distance from the center of the node to the furthest //! descendant. double& FurthestDescendantDistance() { return furthestDescendantDistance; } //! Get the minimum distance from the center to any bound edge (this is the //! same as furthestDescendantDistance). double MinimumBoundDistance() const { return furthestDescendantDistance; } //! Get the centroid of the node and store it in the given vector. 
void Centroid(arma::vec& centroid) const { centroid = dataset.col(point); } //! Get the instantiated metric. MetricType& Metric() const { return *metric; } private: //! Reference to the matrix which this tree is built on. const arma::mat& dataset; //! Index of the point in the matrix which this node represents. size_t point; //! The list of children; the first is the self-child. std::vector children; //! Scale level of the node. int scale; //! The base used to construct the tree. double base; //! The instantiated statistic. StatisticType stat; //! The number of descendant points. size_t numDescendants; //! The parent node (NULL if this is the root of the tree). CoverTree* parent; //! Distance to the parent. double parentDistance; //! Distance to the furthest descendant. double furthestDescendantDistance; //! Whether or not we need to destroy the metric in the destructor. bool localMetric; //! The metric used for this tree. MetricType* metric; /** * Create the children for this node. */ void CreateChildren(arma::Col& indices, arma::vec& distances, size_t nearSetSize, size_t& farSetSize, size_t& usedSetSize); /** * Fill the vector of distances with the distances between the point specified * by pointIndex and each point in the indices array. The distances of the * first pointSetSize points in indices are calculated (so, this does not * necessarily need to use all of the points in the arrays). * * @param pointIndex Point to build the distances for. * @param indices List of indices to compute distances for. * @param distances Vector to store calculated distances in. * @param pointSetSize Number of points in arrays to calculate distances for. */ void ComputeDistances(const size_t pointIndex, const arma::Col& indices, arma::vec& distances, const size_t pointSetSize); /** * Split the given indices and distances into a near and a far set, returning * the number of points in the near set. The distances must already be * initialized. This will order the indices and distances such that the * points in the near set make up the first part of the array and the far set * makes up the rest: [ nearSet | farSet ]. * * @param indices List of indices; will be reordered. * @param distances List of distances; will be reordered. * @param bound If the distance is less than or equal to this bound, the point * is placed into the near set. * @param pointSetSize Size of point set (because we may be sorting a smaller * list than the indices vector will hold). */ size_t SplitNearFar(arma::Col& indices, arma::vec& distances, const double bound, const size_t pointSetSize); /** * Assuming that the list of indices and distances is sorted as * [ childFarSet | childUsedSet | farSet | usedSet ], * resort the sets so the organization is * [ childFarSet | farSet | childUsedSet | usedSet ]. * * The size_t parameters specify the sizes of each set in the array. Only the * ordering of the indices and distances arrays will be modified (not their * actual contents). * * The size of any of the four sets can be zero and this method will handle * that case accordingly. * * @param indices List of indices to sort. * @param distances List of distances to sort. * @param childFarSetSize Number of points in child far set (childFarSet). * @param childUsedSetSize Number of points in child used set (childUsedSet). * @param farSetSize Number of points in far set (farSet). 
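 *
 * As an illustration of the reordering (a hypothetical six-element layout,
 * not taken from this header), with childFarSetSize = 2, childUsedSetSize = 2,
 * and farSetSize = 2:
 *
 * @code
 * // before: [ cf0 cf1 | cu0 cu1 | f0 f1 | usedSet ]
 * // after:  [ cf0 cf1 | f0 f1 | cu0 cu1 | usedSet ]
 * @endcode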
*/ size_t SortPointSet(arma::Col& indices, arma::vec& distances, const size_t childFarSetSize, const size_t childUsedSetSize, const size_t farSetSize); void MoveToUsedSet(arma::Col& indices, arma::vec& distances, size_t& nearSetSize, size_t& farSetSize, size_t& usedSetSize, arma::Col& childIndices, const size_t childFarSetSize, const size_t childUsedSetSize); size_t PruneFarSet(arma::Col& indices, arma::vec& distances, const double bound, const size_t nearSetSize, const size_t pointSetSize); /** * Take a look at the last child (the most recently created one) and remove * any implicit nodes that have been created. */ void RemoveNewImplicitNodes(); public: /** * Returns a string representation of this object. */ std::string ToString() const; size_t DistanceComps() const { return distanceComps; } size_t& DistanceComps() { return distanceComps; } private: size_t distanceComps; }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "cover_tree_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/core/tree/cover_tree/first_point_is_root.hpp0000644000176200001440000000343313647512751026566 0ustar liggesusers/** * @file first_point_is_root.hpp * @author Ryan Curtin * * A very simple policy for the cover tree; the first point in the dataset is * chosen as the root of the cover tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_FIRST_POINT_IS_ROOT_HPP #define __MLPACK_CORE_TREE_FIRST_POINT_IS_ROOT_HPP #include namespace mlpack { namespace tree { /** * This class is meant to be used as a choice for the policy class * RootPointPolicy of the CoverTree class. This policy determines which point * is used for the root node of the cover tree. This particular implementation * simply chooses the first point in the dataset as the root. A more complex * implementation might choose, for instance, the point with least maximum * distance to other points (the closest to the "middle"). */ class FirstPointIsRoot { public: /** * Return the point to be used as the root point of the cover tree. This just * returns 0. */ static size_t ChooseRoot(const arma::mat& /* dataset */) { return 0; } }; }; // namespace tree }; // namespace mlpack #endif // __MLPACK_CORE_TREE_FIRST_POINT_IS_ROOT_HPP RcppMLPACK/inst/include/mlpack/core/tree/cover_tree/dual_tree_traverser.hpp0000644000176200001440000000672613647512751026541 0ustar liggesusers/** * @file dual_tree_traverser.hpp * @author Ryan Curtin * * A dual-tree traverser for the cover tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. 
* * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_COVER_TREE_DUAL_TREE_TRAVERSER_HPP #define __MLPACK_CORE_TREE_COVER_TREE_DUAL_TREE_TRAVERSER_HPP #include #include namespace mlpack { namespace tree { template template class CoverTree::DualTreeTraverser { public: /** * Initialize the dual tree traverser with the given rule type. */ DualTreeTraverser(RuleType& rule); /** * Traverse the two specified trees. * * @param queryNode Root of query tree. * @param referenceNode Root of reference tree. */ void Traverse(CoverTree& queryNode, CoverTree& referenceNode); //! Get the number of pruned nodes. size_t NumPrunes() const { return numPrunes; } //! Modify the number of pruned nodes. size_t& NumPrunes() { return numPrunes; } ///// These are all fake because this is a patch for kd-trees only and I still ///// want it to compile! size_t NumVisited() const { return 0; } size_t NumScores() const { return 0; } size_t NumBaseCases() const { return 0; } private: //! The instantiated rule set for pruning branches. RuleType& rule; //! The number of pruned nodes. size_t numPrunes; //! Struct used for traversal. struct DualCoverTreeMapEntry { //! The node this entry refers to. CoverTree* referenceNode; //! The score of the node. double score; //! The base case. double baseCase; //! The traversal info associated with the call to Score() for this entry. typename RuleType::TraversalInfoType traversalInfo; //! Comparison operator, for sorting within the map. bool operator<(const DualCoverTreeMapEntry& other) const { if (score == other.score) return (baseCase < other.baseCase); else return (score < other.score); } }; /** * Helper function for traversal of the two trees. */ void Traverse(CoverTree& queryNode, std::map >& referenceMap); //! Prepare map for recursion. void PruneMap(CoverTree& queryNode, std::map >& referenceMap, std::map >& childMap); void ReferenceRecursion(CoverTree& queryNode, std::map >& referenceMap); }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "dual_tree_traverser_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/core/tree/cover_tree/cover_tree_impl.hpp0000644000176200001440000010625613647512751025655 0ustar liggesusers/** * @file cover_tree_impl.hpp * @author Ryan Curtin * * Implementation of CoverTree class. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_COVER_TREE_COVER_TREE_IMPL_HPP #define __MLPACK_CORE_TREE_COVER_TREE_COVER_TREE_IMPL_HPP // In case it hasn't already been included. #include "cover_tree.hpp" #include #include namespace mlpack { namespace tree { // Create the cover tree. 
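// Worked illustration of the root-scale computation performed at the end of
// the two constructors below (the numbers are hypothetical): with base b = 2
// and furthestDescendantDistance = 9.3,
//
//   scale = (int) ceil(log(9.3) / log(2.0)) = ceil(3.217...) = 4,
//
// so every descendant of the root lies within b^scale = 16 of the root point.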
template CoverTree::CoverTree( const arma::mat& dataset, const double base, MetricType* metric) : dataset(dataset), point(RootPointPolicy::ChooseRoot(dataset)), scale(INT_MAX), base(base), numDescendants(0), parent(NULL), parentDistance(0), furthestDescendantDistance(0), localMetric(metric == NULL), metric(metric), distanceComps(0) { // If we need to create a metric, do that. We'll just do it on the heap. if (localMetric) this->metric = new MetricType(); // If there is only one point in the dataset... uh, we're done. if (dataset.n_cols == 1) return; // Kick off the building. Create the indices array and the distances array. arma::Col indices = arma::linspace >(1, dataset.n_cols - 1, dataset.n_cols - 1); // This is now [1 2 3 4 ... n]. We must be sure that our point does not // occur. if (point != 0) indices[point - 1] = 0; // Put 0 back into the set; remove what was there. arma::vec distances(dataset.n_cols - 1); // Build the initial distances. ComputeDistances(point, indices, distances, dataset.n_cols - 1); // Create the children. size_t farSetSize = 0; size_t usedSetSize = 0; CreateChildren(indices, distances, dataset.n_cols - 1, farSetSize, usedSetSize); // If we ended up creating only one child, remove the implicit node. while (children.size() == 1) { // Prepare to delete the implicit child node. CoverTree* old = children[0]; // Now take its children and set their parent correctly. children.erase(children.begin()); for (size_t i = 0; i < old->NumChildren(); ++i) { children.push_back(&(old->Child(i))); // Set its parent correctly. old->Child(i).Parent() = this; } // Remove all the children so they don't get erased. old->Children().clear(); // Reduce our own scale. scale = old->Scale(); // Now delete it. delete old; } // Use the furthest descendant distance to determine the scale of the root // node. scale = (int) ceil(log(furthestDescendantDistance) / log(base)); // Initialize statistic. stat = StatisticType(*this); Rcpp::Rcout << distanceComps << " distance computations during tree " << "construction." << std::endl; } template CoverTree::CoverTree( const arma::mat& dataset, MetricType& metric, const double base) : dataset(dataset), point(RootPointPolicy::ChooseRoot(dataset)), scale(INT_MAX), base(base), numDescendants(0), parent(NULL), parentDistance(0), furthestDescendantDistance(0), localMetric(false), metric(&metric), distanceComps(0) { // If there is only one point in the dataset, uh, we're done. if (dataset.n_cols == 1) return; // Kick off the building. Create the indices array and the distances array. arma::Col indices = arma::linspace >(1, dataset.n_cols - 1, dataset.n_cols - 1); // This is now [1 2 3 4 ... n]. We must be sure that our point does not // occur. if (point != 0) indices[point - 1] = 0; // Put 0 back into the set; remove what was there. arma::vec distances(dataset.n_cols - 1); // Build the initial distances. ComputeDistances(point, indices, distances, dataset.n_cols - 1); // Create the children. size_t farSetSize = 0; size_t usedSetSize = 0; CreateChildren(indices, distances, dataset.n_cols - 1, farSetSize, usedSetSize); // If we ended up creating only one child, remove the implicit node. while (children.size() == 1) { // Prepare to delete the implicit child node. CoverTree* old = children[0]; // Now take its children and set their parent correctly. children.erase(children.begin()); for (size_t i = 0; i < old->NumChildren(); ++i) { children.push_back(&(old->Child(i))); // Set its parent correctly. 
old->Child(i).Parent() = this; } // Remove all the children so they don't get erased. old->Children().clear(); // Reduce our own scale. scale = old->Scale(); // Now delete it. delete old; } // Use the furthest descendant distance to determine the scale of the root // node. scale = (int) ceil(log(furthestDescendantDistance) / log(base)); // Initialize statistic. stat = StatisticType(*this); Rcpp::Rcout << distanceComps << " distance computations during tree " << "construction." << std::endl; } template CoverTree::CoverTree( const arma::mat& dataset, const double base, const size_t pointIndex, const int scale, CoverTree* parent, const double parentDistance, arma::Col& indices, arma::vec& distances, size_t nearSetSize, size_t& farSetSize, size_t& usedSetSize, MetricType& metric) : dataset(dataset), point(pointIndex), scale(scale), base(base), numDescendants(0), parent(parent), parentDistance(parentDistance), furthestDescendantDistance(0), localMetric(false), metric(&metric), distanceComps(0) { // If the size of the near set is 0, this is a leaf. if (nearSetSize == 0) { this->scale = INT_MIN; numDescendants = 1; stat = StatisticType(*this); return; } // Otherwise, create the children. CreateChildren(indices, distances, nearSetSize, farSetSize, usedSetSize); // Initialize statistic. stat = StatisticType(*this); } // Manually create a cover tree node. template CoverTree::CoverTree( const arma::mat& dataset, const double base, const size_t pointIndex, const int scale, CoverTree* parent, const double parentDistance, const double furthestDescendantDistance, MetricType* metric) : dataset(dataset), point(pointIndex), scale(scale), base(base), numDescendants(0), parent(parent), parentDistance(parentDistance), furthestDescendantDistance(furthestDescendantDistance), localMetric(metric == NULL), metric(metric), distanceComps(0) { // If necessary, create a local metric. if (localMetric) this->metric = new MetricType(); // Initialize the statistic. stat = StatisticType(*this); } template CoverTree::CoverTree( const CoverTree& other) : dataset(other.dataset), point(other.point), scale(other.scale), base(other.base), stat(other.stat), numDescendants(other.numDescendants), parent(other.parent), parentDistance(other.parentDistance), furthestDescendantDistance(other.furthestDescendantDistance), localMetric(false), metric(other.metric), distanceComps(0) { // Copy each child by hand. for (size_t i = 0; i < other.NumChildren(); ++i) { children.push_back(new CoverTree(other.Child(i))); children[i]->Parent() = this; } } template CoverTree::~CoverTree() { // Delete each child. for (size_t i = 0; i < children.size(); ++i) delete children[i]; // Delete the local metric, if necessary. if (localMetric) delete metric; } //! Return the number of descendant points. template inline size_t CoverTree::NumDescendants() const { return numDescendants; } //! Return the index of a particular descendant point. template inline size_t CoverTree::Descendant( const size_t index) const { // The first descendant is the point contained within this node. if (index == 0) return point; // Is it in the self-child? if (index < children[0]->NumDescendants()) return children[0]->Descendant(index); // Now check the other children. size_t sum = children[0]->NumDescendants(); for (size_t i = 1; i < children.size(); ++i) { if (index - sum < children[i]->NumDescendants()) return children[i]->Descendant(index - sum); sum += children[i]->NumDescendants(); } // This should never happen. 
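// ((size_t() - 1) wraps around to the largest representable size_t value,
// i.e. an out-of-range sentinel index.)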
return (size_t() - 1); } template double CoverTree::MinDistance( const CoverTree* other) const { // Every cover tree node will contain points up to base^(scale + 1) away. return std::max(metric->Evaluate(dataset.unsafe_col(point), other->Dataset().unsafe_col(other->Point())) - furthestDescendantDistance - other->FurthestDescendantDistance(), 0.0); } template double CoverTree::MinDistance( const CoverTree* other, const double distance) const { // We already have the distance as evaluated by the metric. return std::max(distance - furthestDescendantDistance - other->FurthestDescendantDistance(), 0.0); } template double CoverTree::MinDistance( const arma::vec& other) const { return std::max(metric->Evaluate(dataset.unsafe_col(point), other) - furthestDescendantDistance, 0.0); } template double CoverTree::MinDistance( const arma::vec& /* other */, const double distance) const { return std::max(distance - furthestDescendantDistance, 0.0); } template double CoverTree::MaxDistance( const CoverTree* other) const { return metric->Evaluate(dataset.unsafe_col(point), other->Dataset().unsafe_col(other->Point())) + furthestDescendantDistance + other->FurthestDescendantDistance(); } template double CoverTree::MaxDistance( const CoverTree* other, const double distance) const { // We already have the distance as evaluated by the metric. return distance + furthestDescendantDistance + other->FurthestDescendantDistance(); } template double CoverTree::MaxDistance( const arma::vec& other) const { return metric->Evaluate(dataset.unsafe_col(point), other) + furthestDescendantDistance; } template double CoverTree::MaxDistance( const arma::vec& /* other */, const double distance) const { return distance + furthestDescendantDistance; } //! Return the minimum and maximum distance to another node. template math::Range CoverTree:: RangeDistance(const CoverTree* other) const { const double distance = metric->Evaluate(dataset.unsafe_col(point), other->Dataset().unsafe_col(other->Point())); math::Range result; result.Lo() = distance - furthestDescendantDistance - other->FurthestDescendantDistance(); result.Hi() = distance + furthestDescendantDistance + other->FurthestDescendantDistance(); return result; } //! Return the minimum and maximum distance to another node given that the //! point-to-point distance has already been calculated. template math::Range CoverTree:: RangeDistance(const CoverTree* other, const double distance) const { math::Range result; result.Lo() = distance - furthestDescendantDistance - other->FurthestDescendantDistance(); result.Hi() = distance + furthestDescendantDistance + other->FurthestDescendantDistance(); return result; } //! Return the minimum and maximum distance to another point. template math::Range CoverTree:: RangeDistance(const arma::vec& other) const { const double distance = metric->Evaluate(dataset.unsafe_col(point), other); return math::Range(distance - furthestDescendantDistance, distance + furthestDescendantDistance); } //! Return the minimum and maximum distance to another point given that the //! point-to-point distance has already been calculated. template math::Range CoverTree:: RangeDistance(const arma::vec& /* other */, const double distance) const { return math::Range(distance - furthestDescendantDistance, distance + furthestDescendantDistance); } //! For a newly initialized node, create children using the near and far set. 
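//
// Sketch of the split that CreateChildren() performs (the numbers are
// hypothetical): with base b = 2, scale = 4, and maxDistance = 9.3 over the
// current point set,
//
//   nextScale = min(scale, (int) ceil(log(9.3) / log(2.0))) - 1
//             = min(4, 4) - 1 = 3,
//   bound     = pow(b, nextScale) = 8,
//
// so points within distance 8 of this node's point form the near set (they
// become descendants of this node) and the rest stay in the far set for
// sibling nodes to claim.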
template inline void CoverTree::CreateChildren( arma::Col& indices, arma::vec& distances, size_t nearSetSize, size_t& farSetSize, size_t& usedSetSize) { // Determine the next scale level. This should be the first level where there // are any points in the far set. So, if we know the maximum distance in the // distances array, this will be the largest i such that // maxDistance > pow(base, i) // and using this for the scale factor should guarantee we are not creating an // implicit node. If the maximum distance is 0, every point in the near set // will be created as a leaf, and a child to this node. We also do not need // to change the furthestChildDistance or furthestDescendantDistance. const double maxDistance = max(distances.rows(0, nearSetSize + farSetSize - 1)); if (maxDistance == 0) { // Make the self child at the lowest possible level. // This should not modify farSetSize or usedSetSize. size_t tempSize = 0; children.push_back(new CoverTree(dataset, base, point, INT_MIN, this, 0, indices, distances, 0, tempSize, usedSetSize, *metric)); distanceComps += children.back()->DistanceComps(); // Every point in the near set should be a leaf. for (size_t i = 0; i < nearSetSize; ++i) { // farSetSize and usedSetSize will not be modified. children.push_back(new CoverTree(dataset, base, indices[i], INT_MIN, this, distances[i], indices, distances, 0, tempSize, usedSetSize, *metric)); distanceComps += children.back()->DistanceComps(); usedSetSize++; } // The number of descendants is just the number of children, because each of // them are leaves and contain one point. numDescendants = children.size(); // Re-sort the dataset. We have // [ used | far | other used ] // and we want // [ far | all used ]. SortPointSet(indices, distances, 0, usedSetSize, farSetSize); return; } const int nextScale = std::min(scale, (int) ceil(log(maxDistance) / log(base))) - 1; const double bound = pow(base, nextScale); // First, make the self child. We must split the given near set into the near // set and far set for the self child. size_t childNearSetSize = SplitNearFar(indices, distances, bound, nearSetSize); // Build the self child (recursively). size_t childFarSetSize = nearSetSize - childNearSetSize; size_t childUsedSetSize = 0; children.push_back(new CoverTree(dataset, base, point, nextScale, this, 0, indices, distances, childNearSetSize, childFarSetSize, childUsedSetSize, *metric)); // Don't double-count the self-child (so, subtract one). numDescendants += children[0]->NumDescendants(); // The self-child can't modify the furthestChildDistance away from 0, but it // can modify the furthestDescendantDistance. furthestDescendantDistance = children[0]->FurthestDescendantDistance(); // Remove any implicit nodes we may have created. RemoveNewImplicitNodes(); distanceComps += children[0]->DistanceComps(); // Now the arrays, in memory, look like this: // [ childFar | childUsed | far | used ] // but we need to move the used points past our far set: // [ childFar | far | childUsed + used ] // and keeping in mind that childFar = our near set, // [ near | far | childUsed + used ] // is what we are trying to make. SortPointSet(indices, distances, childFarSetSize, childUsedSetSize, farSetSize); // Update size of near set and used set. nearSetSize -= childUsedSetSize; usedSetSize += childUsedSetSize; // Now for each point in the near set, we need to make children. 
To save // computation later, we'll create an array holding the points in the near // set, and then after each run we'll check which of those (if any) were used // and we will remove them. ...if that's faster. I think it is. while (nearSetSize > 0) { size_t newPointIndex = nearSetSize - 1; // Swap to front if necessary. if (newPointIndex != 0) { const size_t tempIndex = indices[newPointIndex]; const double tempDist = distances[newPointIndex]; indices[newPointIndex] = indices[0]; distances[newPointIndex] = distances[0]; indices[0] = tempIndex; distances[0] = tempDist; } // Will this be a new furthest child? if (distances[0] > furthestDescendantDistance) furthestDescendantDistance = distances[0]; // If there's only one point left, we don't need this crap. if ((nearSetSize == 1) && (farSetSize == 0)) { size_t childNearSetSize = 0; children.push_back(new CoverTree(dataset, base, indices[0], nextScale, this, distances[0], indices, distances, childNearSetSize, farSetSize, usedSetSize, *metric)); distanceComps += children.back()->DistanceComps(); numDescendants += children.back()->NumDescendants(); // Because the far set size is 0, we don't have to do any swapping to // move the point into the used set. ++usedSetSize; --nearSetSize; // And we're done. break; } // Create the near and far set indices and distance vectors. We don't fill // in the self-point, yet. arma::Col childIndices(nearSetSize + farSetSize); childIndices.rows(0, (nearSetSize + farSetSize - 2)) = indices.rows(1, nearSetSize + farSetSize - 1); arma::vec childDistances(nearSetSize + farSetSize); // Build distances for the child. ComputeDistances(indices[0], childIndices, childDistances, nearSetSize + farSetSize - 1); // Split into near and far sets for this point. childNearSetSize = SplitNearFar(childIndices, childDistances, bound, nearSetSize + farSetSize - 1); childFarSetSize = PruneFarSet(childIndices, childDistances, base * bound, childNearSetSize, (nearSetSize + farSetSize - 1)); // Now that we know the near and far set sizes, we can put the used point // (the self point) in the correct place; now, when we call // MoveToUsedSet(), it will move the self-point correctly. The distance // does not matter. childIndices(childNearSetSize + childFarSetSize) = indices[0]; childDistances(childNearSetSize + childFarSetSize) = 0; // Build this child (recursively). childUsedSetSize = 1; // Mark self point as used. children.push_back(new CoverTree(dataset, base, indices[0], nextScale, this, distances[0], childIndices, childDistances, childNearSetSize, childFarSetSize, childUsedSetSize, *metric)); numDescendants += children.back()->NumDescendants(); // Remove any implicit nodes. RemoveNewImplicitNodes(); distanceComps += children.back()->DistanceComps(); // Now with the child created, it returns the childIndices and // childDistances vectors in this form: // [ childFar | childUsed ] // For each point in the childUsed set, we must move that point to the used // set in our own vector. MoveToUsedSet(indices, distances, nearSetSize, farSetSize, usedSetSize, childIndices, childFarSetSize, childUsedSetSize); } // Calculate furthest descendant. for (size_t i = (nearSetSize + farSetSize); i < (nearSetSize + farSetSize + usedSetSize); ++i) if (distances[i] > furthestDescendantDistance) furthestDescendantDistance = distances[i]; } template size_t CoverTree::SplitNearFar( arma::Col& indices, arma::vec& distances, const double bound, const size_t pointSetSize) { // Sanity check; there is no guarantee that this condition will not be true. 
// ...or is there? if (pointSetSize <= 1) return 0; // We'll traverse from both left and right. size_t left = 0; size_t right = pointSetSize - 1; // A modification of quicksort, with the pivot value set to the bound. // Everything on the left of the pivot will be less than or equal to the // bound; everything on the right will be greater than the bound. while ((distances[left] <= bound) && (left != right)) ++left; while ((distances[right] > bound) && (left != right)) --right; while (left != right) { // Now swap the values and indices. const size_t tempPoint = indices[left]; const double tempDist = distances[left]; indices[left] = indices[right]; distances[left] = distances[right]; indices[right] = tempPoint; distances[right] = tempDist; // Traverse the left, seeing how many points are correctly on that side. // When we encounter an incorrect point, stop. We will switch it later. while ((distances[left] <= bound) && (left != right)) ++left; // Traverse the right, seeing how many points are correctly on that side. // When we encounter an incorrect point, stop. We will switch it with the // wrong point from the left side. while ((distances[right] > bound) && (left != right)) --right; } // The final left value is the index of the first far value. return left; } // Returns the maximum distance between points. template void CoverTree::ComputeDistances( const size_t pointIndex, const arma::Col& indices, arma::vec& distances, const size_t pointSetSize) { // For each point, rebuild the distances. The indices do not need to be // modified. distanceComps += pointSetSize; for (size_t i = 0; i < pointSetSize; ++i) { distances[i] = metric->Evaluate(dataset.unsafe_col(pointIndex), dataset.unsafe_col(indices[i])); } } template size_t CoverTree::SortPointSet( arma::Col& indices, arma::vec& distances, const size_t childFarSetSize, const size_t childUsedSetSize, const size_t farSetSize) { // We'll use low-level memcpy calls ourselves, just to ensure it's done // quickly and the way we want it to be. Unfortunately this takes up more // memory than one-element swaps, but there's not a great way around that. const size_t bufferSize = std::min(farSetSize, childUsedSetSize); const size_t bigCopySize = std::max(farSetSize, childUsedSetSize); // Sanity check: there is no need to sort if the buffer size is going to be // zero. if (bufferSize == 0) return (childFarSetSize + farSetSize); size_t* indicesBuffer = new size_t[bufferSize]; double* distancesBuffer = new double[bufferSize]; // The start of the memory region to copy to the buffer. const size_t bufferFromLocation = ((bufferSize == farSetSize) ? (childFarSetSize + childUsedSetSize) : childFarSetSize); // The start of the memory region to move directly to the new place. const size_t directFromLocation = ((bufferSize == farSetSize) ? childFarSetSize : (childFarSetSize + childUsedSetSize)); // The destination to copy the buffer back to. const size_t bufferToLocation = ((bufferSize == farSetSize) ? childFarSetSize : (childFarSetSize + farSetSize)); // The destination of the directly moved memory region. const size_t directToLocation = ((bufferSize == farSetSize) ? (childFarSetSize + farSetSize) : childFarSetSize); // Copy the smaller piece to the buffer. memcpy(indicesBuffer, indices.memptr() + bufferFromLocation, sizeof(size_t) * bufferSize); memcpy(distancesBuffer, distances.memptr() + bufferFromLocation, sizeof(double) * bufferSize); // Now move the other memory. 
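// (memmove, unlike the memcpy calls above, is safe here because the source
// and destination regions may overlap.)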
memmove(indices.memptr() + directToLocation, indices.memptr() + directFromLocation, sizeof(size_t) * bigCopySize); memmove(distances.memptr() + directToLocation, distances.memptr() + directFromLocation, sizeof(double) * bigCopySize); // Now copy the temporary memory to the right place. memcpy(indices.memptr() + bufferToLocation, indicesBuffer, sizeof(size_t) * bufferSize); memcpy(distances.memptr() + bufferToLocation, distancesBuffer, sizeof(double) * bufferSize); delete[] indicesBuffer; delete[] distancesBuffer; // This returns the complete size of the far set. return (childFarSetSize + farSetSize); } template void CoverTree::MoveToUsedSet( arma::Col& indices, arma::vec& distances, size_t& nearSetSize, size_t& farSetSize, size_t& usedSetSize, arma::Col& childIndices, const size_t childFarSetSize, // childNearSetSize is 0 in this case. const size_t childUsedSetSize) { const size_t originalSum = nearSetSize + farSetSize + usedSetSize; // Loop across the set. We will swap points as we need. It should be noted // that farSetSize and nearSetSize may change with each iteration of this loop // (depending on if we make a swap or not). size_t startChildUsedSet = 0; // Where to start in the child set. for (size_t i = 0; i < nearSetSize; ++i) { // Discover if this point was in the child's used set. for (size_t j = startChildUsedSet; j < childUsedSetSize; ++j) { if (childIndices[childFarSetSize + j] == indices[i]) { // We have found a point; a swap is necessary. // Since this point is from the near set, to preserve the near set, we // must do a swap. if (farSetSize > 0) { if ((nearSetSize - 1) != i) { // In this case it must be a three-way swap. size_t tempIndex = indices[nearSetSize + farSetSize - 1]; double tempDist = distances[nearSetSize + farSetSize - 1]; size_t tempNearIndex = indices[nearSetSize - 1]; double tempNearDist = distances[nearSetSize - 1]; indices[nearSetSize + farSetSize - 1] = indices[i]; distances[nearSetSize + farSetSize - 1] = distances[i]; indices[nearSetSize - 1] = tempIndex; distances[nearSetSize - 1] = tempDist; indices[i] = tempNearIndex; distances[i] = tempNearDist; } else { // We can do a two-way swap. size_t tempIndex = indices[nearSetSize + farSetSize - 1]; double tempDist = distances[nearSetSize + farSetSize - 1]; indices[nearSetSize + farSetSize - 1] = indices[i]; distances[nearSetSize + farSetSize - 1] = distances[i]; indices[i] = tempIndex; distances[i] = tempDist; } } else if ((nearSetSize - 1) != i) { // A two-way swap is possible. size_t tempIndex = indices[nearSetSize + farSetSize - 1]; double tempDist = distances[nearSetSize + farSetSize - 1]; indices[nearSetSize + farSetSize - 1] = indices[i]; distances[nearSetSize + farSetSize - 1] = distances[i]; indices[i] = tempIndex; distances[i] = tempDist; } else { // No swap is necessary. } // We don't need to do a complete preservation of the child index set, // but we want to make sure we only loop over points we haven't seen. // So increment the child counter by 1 and move a point if we need. if (j != startChildUsedSet) { childIndices[childFarSetSize + j] = childIndices[childFarSetSize + startChildUsedSet]; } // Update all counters from the swaps we have done. ++startChildUsedSet; --nearSetSize; --i; // Since we moved a point out of the near set we must step back. break; // Break out of this for loop; back to the first one. } } } // Now loop over the far set. This loop is different because we only require // a normal two-way swap instead of the three-way swap to preserve the near // set / far set ordering. 
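//
// Shape of the two swaps, on hypothetical positions: the far-set case below
// only needs
//
//   swap(indices[nearSetSize + i], indices[nearSetSize + farSetSize - 1]);
//
// whereas the near-set loop above needed a three-way rotation so that the
// element sitting at the near/far boundary also ended up on its own side.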
for (size_t i = 0; i < farSetSize; ++i) { // Discover if this point was in the child's used set. for (size_t j = startChildUsedSet; j < childUsedSetSize; ++j) { if (childIndices[childFarSetSize + j] == indices[i + nearSetSize]) { // We have found a point to swap. // Perform the swap. size_t tempIndex = indices[nearSetSize + farSetSize - 1]; double tempDist = distances[nearSetSize + farSetSize - 1]; indices[nearSetSize + farSetSize - 1] = indices[nearSetSize + i]; distances[nearSetSize + farSetSize - 1] = distances[nearSetSize + i]; indices[nearSetSize + i] = tempIndex; distances[nearSetSize + i] = tempDist; if (j != startChildUsedSet) { childIndices[childFarSetSize + j] = childIndices[childFarSetSize + startChildUsedSet]; } // Update all counters from the swaps we have done. ++startChildUsedSet; --farSetSize; --i; break; // Break out of this for loop; back to the first one. } } } // Update used set size. usedSetSize += childUsedSetSize; //Log::Assert(originalSum == (nearSetSize + farSetSize + usedSetSize)); } template size_t CoverTree::PruneFarSet( arma::Col& indices, arma::vec& distances, const double bound, const size_t nearSetSize, const size_t pointSetSize) { // What we are trying to do is remove any points greater than the bound from // the far set. We don't care what happens to those indices and distances... // so, we don't need to properly swap points -- just drop new ones in place. size_t left = nearSetSize; size_t right = pointSetSize - 1; while ((distances[left] <= bound) && (left != right)) ++left; while ((distances[right] > bound) && (left != right)) --right; while (left != right) { // We don't care what happens to the point which should be on the right. indices[left] = indices[right]; distances[left] = distances[right]; --right; // Since we aren't changing the right. // Advance to next location which needs to switch. while ((distances[left] <= bound) && (left != right)) ++left; while ((distances[right] > bound) && (left != right)) --right; } // The far set size is the left pointer, with the near set size subtracted // from it. return (left - nearSetSize); } /** * Take a look at the last child (the most recently created one) and remove any * implicit nodes that have been created. */ template inline void CoverTree:: RemoveNewImplicitNodes() { // If we created an implicit node, take its self-child instead (this could // happen multiple times). while (children[children.size() - 1]->NumChildren() == 1) { CoverTree* old = children[children.size() - 1]; children.erase(children.begin() + children.size() - 1); // Now take its child. children.push_back(&(old->Child(0))); // Set its parent correctly. old->Child(0).Parent() = this; old->Child(0).ParentDistance() = old->ParentDistance(); old->Child(0).DistanceComps() = old->DistanceComps(); // Remove its child (so it doesn't delete it). old->Children().erase(old->Children().begin() + old->Children().size() - 1); // Now delete it. delete old; } } /** * Returns a string representation of this object. 
*/ template std::string CoverTree::ToString() const { std::ostringstream convert; convert << "CoverTree [" << this << "]" << std::endl; convert << "dataset: " << &dataset << std::endl; convert << "point: " << point << std::endl; convert << "scale: " << scale << std::endl; convert << "base: " << base << std::endl; // convert << "StatisticType: " << stat << std::endl; convert << "parent distance : " << parentDistance << std::endl; convert << "furthest child distance: " << furthestDescendantDistance; convert << std::endl; convert << "children:"; // How many levels should we print? This will print the top two tree levels. if (IsLeaf() == false && parent == NULL) { for (size_t i = 0; i < children.size(); i++) { convert << std::endl << mlpack::util::Indent(children.at(i)->ToString()); } } return convert.str(); } }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/tree/cover_tree/single_tree_traverser.hpp0000644000176200001440000000433613647512751027070 0ustar liggesusers/** * @file single_tree_traverser.hpp * @author Ryan Curtin * * Defines the SingleTreeTraverser for the cover tree. This implements a * single-tree breadth-first recursion with a pruning rule and a base case (two * point) rule. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_COVER_TREE_SINGLE_TREE_TRAVERSER_HPP #define __MLPACK_CORE_TREE_COVER_TREE_SINGLE_TREE_TRAVERSER_HPP #include #include "cover_tree.hpp" namespace mlpack { namespace tree { template template class CoverTree::SingleTreeTraverser { public: /** * Initialize the single tree traverser with the given rule. */ SingleTreeTraverser(RuleType& rule); /** * Traverse the tree with the given point. * * @param queryIndex The index of the point in the query set which is used as * the query point. * @param referenceNode The tree node to be traversed. */ void Traverse(const size_t queryIndex, CoverTree& referenceNode); //! Get the number of prunes so far. size_t NumPrunes() const { return numPrunes; } //! Set the number of prunes (good for a reset to 0). size_t& NumPrunes() { return numPrunes; } private: //! Reference to the rules with which the tree will be traversed. RuleType& rule; //! The number of nodes which have been pruned during traversal. size_t numPrunes; }; }; // namespace tree }; // namespace mlpack // Include implementation. #include "single_tree_traverser_impl.hpp" #endif RcppMLPACK/inst/include/mlpack/core/tree/cover_tree/single_tree_traverser_impl.hpp0000644000176200001440000001716313647512751030113 0ustar liggesusers/** * @file single_tree_traverser_impl.hpp * @author Ryan Curtin * * Implementation of the single tree traverser for cover trees, which implements * a breadth-first traversal. * * This file is part of MLPACK 1.0.10. 
* * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_TREE_COVER_TREE_SINGLE_TREE_TRAVERSER_IMPL_HPP #define __MLPACK_CORE_TREE_COVER_TREE_SINGLE_TREE_TRAVERSER_IMPL_HPP // In case it hasn't been included yet. #include "single_tree_traverser.hpp" #include namespace mlpack { namespace tree { //! This is the structure the cover tree map will use for traversal. template struct CoverTreeMapEntry { //! The node this entry refers to. CoverTree* node; //! The score of the node. double score; //! The index of the parent node. size_t parent; //! The base case evaluation. double baseCase; //! Comparison operator. bool operator<(const CoverTreeMapEntry& other) const { return (score < other.score); } }; template template CoverTree:: SingleTreeTraverser::SingleTreeTraverser(RuleType& rule) : rule(rule), numPrunes(0) { /* Nothing to do. */ } template template void CoverTree:: SingleTreeTraverser::Traverse( const size_t queryIndex, CoverTree& referenceNode) { // This is a non-recursive implementation (which should be faster than a // recursive implementation). typedef CoverTreeMapEntry MapEntryType; // We will use this map as a priority queue. Each key represents the scale, // and then the vector is all the nodes in that scale which need to be // investigated. Because no point in a scale can add a point in its own // scale, we know that the vector for each scale is final when we get to it. // In addition, map is organized in such a way that rbegin() will return the // largest scale. std::map > mapQueue; // Create the score for the children. double rootChildScore = rule.Score(queryIndex, referenceNode); if (rootChildScore == DBL_MAX) { numPrunes += referenceNode.NumChildren(); } else { // Manually add the children of the first node. // Often, a ruleset will return without doing any computation on cover trees // using TreeTraits::FirstPointIsCentroid; this is an optimization that // (theoretically) the compiler should get right. double rootBaseCase = rule.BaseCase(queryIndex, referenceNode.Point()); // Don't add the self-leaf. size_t i = 0; if (referenceNode.Child(0).NumChildren() == 0) { ++numPrunes; i = 1; } for (/* i was set above. */; i < referenceNode.NumChildren(); ++i) { MapEntryType newFrame; newFrame.node = &referenceNode.Child(i); newFrame.score = rootChildScore; newFrame.baseCase = rootBaseCase; newFrame.parent = referenceNode.Point(); // Put it into the map. mapQueue[newFrame.node->Scale()].push_back(newFrame); } } // Now begin the iteration through the map, but only if it has anything in it. if (mapQueue.empty()) return; typename std::map >::reverse_iterator rit = mapQueue.rbegin(); // We will treat the leaves differently (below). while ((*rit).first != INT_MIN) { // Get a reference to the current scale. std::vector& scaleVector = (*rit).second; // Before traversing all the points in this scale, sort by score. std::sort(scaleVector.begin(), scaleVector.end()); // Now loop over each element. 
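// (The sort above uses CoverTreeMapEntry::operator<, which compares scores,
// so the entries of this scale are visited in ascending-score order: most
// promising nodes first.)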
for (size_t i = 0; i < scaleVector.size(); ++i) { // Get a reference to the current element. const MapEntryType& frame = scaleVector.at(i); CoverTree* node = frame.node; const double score = frame.score; const size_t parent = frame.parent; const size_t point = node->Point(); double baseCase = frame.baseCase; // First we recalculate the score of this node to find if we can prune it. if (rule.Rescore(queryIndex, *node, score) == DBL_MAX) { ++numPrunes; continue; } // Create the score for the children. const double childScore = rule.Score(queryIndex, *node); // Now if this childScore is DBL_MAX we can prune all children. In this // recursion setup pruning is all or nothing for children. if (childScore == DBL_MAX) { numPrunes += node->NumChildren(); continue; } // If we are a self-child, the base case has already been evaluated. // Often, a ruleset will return without doing any computation on cover // trees using TreeTraits::FirstPointIsCentroid; this is an optimization // that (theoretically) the compiler should get right. if (point != parent) baseCase = rule.BaseCase(queryIndex, point); // Don't add the self-leaf. size_t j = 0; if (node->Child(0).NumChildren() == 0) { ++numPrunes; j = 1; } for (/* j is already set. */; j < node->NumChildren(); ++j) { MapEntryType newFrame; newFrame.node = &node->Child(j); newFrame.score = childScore; newFrame.baseCase = baseCase; newFrame.parent = point; mapQueue[newFrame.node->Scale()].push_back(newFrame); } } // Now clear the memory for this scale; it isn't needed anymore. mapQueue.erase((*rit).first); } // Now deal with the leaves. for (size_t i = 0; i < mapQueue[INT_MIN].size(); ++i) { const MapEntryType& frame = mapQueue[INT_MIN].at(i); CoverTree* node = frame.node; const double score = frame.score; const size_t point = node->Point(); // First, recalculate the score of this node to find if we can prune it. double rescore = rule.Rescore(queryIndex, *node, score); if (rescore == DBL_MAX) { ++numPrunes; continue; } // For this to be a valid dual-tree algorithm, we *must* evaluate the // combination, even if pruning it will make no difference. It's the // definition. const double actualScore = rule.Score(queryIndex, *node); if (actualScore == DBL_MAX) { ++numPrunes; continue; } else { // Evaluate the base case, since the combination was not pruned. // Often, a ruleset will return without doing any computation on cover // trees using TreeTraits::FirstPointIsCentroid; this is an optimization // that (theoretically) the compiler should get right. rule.BaseCase(queryIndex, point); } } } }; // namespace tree }; // namespace mlpack #endif RcppMLPACK/inst/include/mlpack/core/tree/cover_tree/traits.hpp0000644000176200001440000000412013647512751023770 0ustar liggesusers/** * @file traits.hpp * @author Ryan Curtin * * This file contains the specialization of the TreeTraits class for the * CoverTree type of tree. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . 
 */
#ifndef __MLPACK_CORE_TREE_COVER_TREE_TRAITS_HPP
#define __MLPACK_CORE_TREE_COVER_TREE_TRAITS_HPP

#include <mlpack/core/tree/tree_traits.hpp>

namespace mlpack {
namespace tree {

/**
 * The specialization of the TreeTraits class for the CoverTree tree type. It
 * defines characteristics of the cover tree, and is used to help write
 * tree-independent (but still optimized) tree-based algorithms. See
 * mlpack/core/tree/tree_traits.hpp for more information.
 */
template<typename MetricType, typename RootPointPolicy, typename StatisticType>
class TreeTraits<CoverTree<MetricType, RootPointPolicy, StatisticType> >
{
 public:
  /**
   * The cover tree (or, this implementation of it) does not require that
   * children represent non-overlapping subsets of the parent node.
   */
  static const bool HasOverlappingChildren = true;

  /**
   * Each cover tree node contains only one point, and that point is its
   * centroid.
   */
  static const bool FirstPointIsCentroid = true;

  /**
   * Cover trees do have self-children.
   */
  static const bool HasSelfChildren = true;

  /**
   * Points are not rearranged when the tree is built.
   */
  static const bool RearrangesDataset = false;
};

}; // namespace tree
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/core/tree/cover_tree/dual_tree_traverser_impl.hpp0000644000176200001440000002764513647512751027555 0ustar liggesusers/**
 * @file dual_tree_traverser_impl.hpp
 * @author Ryan Curtin
 *
 * A dual-tree traverser for the cover tree.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_TREE_COVER_TREE_DUAL_TREE_TRAVERSER_IMPL_HPP
#define __MLPACK_CORE_TREE_COVER_TREE_DUAL_TREE_TRAVERSER_IMPL_HPP

#include
#include

namespace mlpack {
namespace tree {

template<typename MetricType, typename RootPointPolicy, typename StatisticType>
template<typename RuleType>
CoverTree<MetricType, RootPointPolicy, StatisticType>::
DualTreeTraverser<RuleType>::DualTreeTraverser(RuleType& rule) :
    rule(rule),
    numPrunes(0)
{ /* Nothing to do. */ }

template<typename MetricType, typename RootPointPolicy, typename StatisticType>
template<typename RuleType>
void CoverTree<MetricType, RootPointPolicy, StatisticType>::
DualTreeTraverser<RuleType>::Traverse(
    CoverTree<MetricType, RootPointPolicy, StatisticType>& queryNode,
    CoverTree<MetricType, RootPointPolicy, StatisticType>& referenceNode)
{
  // Start by creating a map and adding the reference root node to it.
  std::map<int, std::vector<DualCoverTreeMapEntry> > refMap;

  DualCoverTreeMapEntry rootRefEntry;

  rootRefEntry.referenceNode = &referenceNode;

  // Perform the evaluation between the roots of either tree.
  rootRefEntry.score = rule.Score(queryNode, referenceNode);
  rootRefEntry.baseCase = rule.BaseCase(queryNode.Point(),
      referenceNode.Point());
  rootRefEntry.traversalInfo = rule.TraversalInfo();

  refMap[referenceNode.Scale()].push_back(rootRefEntry);

  Traverse(queryNode, refMap);
}

template<typename MetricType, typename RootPointPolicy, typename StatisticType>
template<typename RuleType>
void CoverTree<MetricType, RootPointPolicy, StatisticType>::
DualTreeTraverser<RuleType>::Traverse(
    CoverTree<MetricType, RootPointPolicy, StatisticType>& queryNode,
    std::map<int, std::vector<DualCoverTreeMapEntry> >& referenceMap)
{
  if (referenceMap.size() == 0)
    return; // Nothing to do!

  // First recurse down the reference nodes as necessary.
  ReferenceRecursion(queryNode, referenceMap);

  // Did the map get emptied?
  if (referenceMap.size() == 0)
    return; // Nothing to do!

  // Now, reduce the scale of the query node by recursing. But we can't
  // recurse if the query node is a leaf node.
  if ((queryNode.Scale() != INT_MIN) &&
      (queryNode.Scale() >= (*referenceMap.rbegin()).first))
  {
    // Recurse into the non-self-children first.
    // The recursion order cannot affect the runtime of the algorithm, because
    // each query child recursion's results are separate and independent. I
    // don't think this is true in every case, and we may have to modify this
    // section to consider scores in the future.
    for (size_t i = 1; i < queryNode.NumChildren(); ++i)
    {
      // We need a copy of the map for this child.
      std::map<int, std::vector<DualCoverTreeMapEntry> > childMap;

      PruneMap(queryNode.Child(i), referenceMap, childMap);
      Traverse(queryNode.Child(i), childMap);
    }

    std::map<int, std::vector<DualCoverTreeMapEntry> > selfChildMap;

    PruneMap(queryNode.Child(0), referenceMap, selfChildMap);
    Traverse(queryNode.Child(0), selfChildMap);
  }

  if (queryNode.Scale() != INT_MIN)
    return; // No need to evaluate base cases at this level. It's all done.

  // If we have made it this far, all we have is a bunch of base case
  // evaluations to do.
  //Log::Assert((*referenceMap.begin()).first == INT_MIN);
  //Log::Assert(queryNode.Scale() == INT_MIN);
  std::vector<DualCoverTreeMapEntry>& pointVector =
      (*referenceMap.begin()).second;

  for (size_t i = 0; i < pointVector.size(); ++i)
  {
    // Get a reference to the frame.
    const DualCoverTreeMapEntry& frame = pointVector[i];

    CoverTree<MetricType, RootPointPolicy, StatisticType>* refNode =
        frame.referenceNode;

    // If the point is the same as both parents, then we have already done this
    // base case.
    if ((refNode->Point() == refNode->Parent()->Point()) &&
        (queryNode.Point() == queryNode.Parent()->Point()))
    {
      ++numPrunes;
      continue;
    }

    // Score the node, to see if we can prune it, after restoring the traversal
    // info.
    rule.TraversalInfo() = frame.traversalInfo;
    double score = rule.Score(queryNode, *refNode);

    if (score == DBL_MAX)
    {
      ++numPrunes;
      continue;
    }

    // If not, compute the base case.
    rule.BaseCase(queryNode.Point(), pointVector[i].referenceNode->Point());
  }
}

template<typename MetricType, typename RootPointPolicy, typename StatisticType>
template<typename RuleType>
void CoverTree<MetricType, RootPointPolicy, StatisticType>::
DualTreeTraverser<RuleType>::PruneMap(
    CoverTree<MetricType, RootPointPolicy, StatisticType>& queryNode,
    std::map<int, std::vector<DualCoverTreeMapEntry> >& referenceMap,
    std::map<int, std::vector<DualCoverTreeMapEntry> >& childMap)
{
  if (referenceMap.empty())
    return; // Nothing to do.

  // Copy the zero set first.
  if ((*referenceMap.begin()).first == INT_MIN)
  {
    // Get a reference to the vector representing the entries at this scale.
    std::vector<DualCoverTreeMapEntry>& scaleVector =
        (*referenceMap.begin()).second;

    // Before traversing all the points in this scale, sort by score.
    std::sort(scaleVector.begin(), scaleVector.end());

    const int thisScale = (*referenceMap.begin()).first;
    childMap[thisScale].reserve(scaleVector.size());
    std::vector<DualCoverTreeMapEntry>& newScaleVector = childMap[thisScale];

    // Loop over each entry in the vector.
    for (size_t j = 0; j < scaleVector.size(); ++j)
    {
      const DualCoverTreeMapEntry& frame = scaleVector[j];

      // First evaluate if we can prune without performing the base case.
      CoverTree<MetricType, RootPointPolicy, StatisticType>* refNode =
          frame.referenceNode;

      // Perform the actual scoring, after restoring the traversal info.
      rule.TraversalInfo() = frame.traversalInfo;
      double score = rule.Score(queryNode, *refNode);

      if (score == DBL_MAX)
      {
        // Pruned. Move on.
        ++numPrunes;
        continue;
      }

      // If it isn't pruned, we must evaluate the base case.
      const double baseCase = rule.BaseCase(queryNode.Point(),
          refNode->Point());

      // Add to child map.
      newScaleVector.push_back(frame);
      newScaleVector.back().score = score;
      newScaleVector.back().baseCase = baseCase;
      newScaleVector.back().traversalInfo = rule.TraversalInfo();
    }

    // If we didn't add anything, then strike this vector from the map.
    if (newScaleVector.size() == 0)
      childMap.erase((*referenceMap.begin()).first);
  }

  typename std::map<int, std::vector<DualCoverTreeMapEntry> >::reverse_iterator
      it = referenceMap.rbegin();

  while ((it != referenceMap.rend()))
  {
    const int thisScale = (*it).first;
    if (thisScale == INT_MIN) // We already did it.
      break;

    // Get a reference to the vector representing the entries at this scale.
    std::vector<DualCoverTreeMapEntry>& scaleVector = (*it).second;

    // Before traversing all the points in this scale, sort by score.
    std::sort(scaleVector.begin(), scaleVector.end());

    childMap[thisScale].reserve(scaleVector.size());
    std::vector<DualCoverTreeMapEntry>& newScaleVector = childMap[thisScale];

    // Loop over each entry in the vector.
    for (size_t j = 0; j < scaleVector.size(); ++j)
    {
      const DualCoverTreeMapEntry& frame = scaleVector[j];

      // First evaluate if we can prune without performing the base case.
      CoverTree<MetricType, RootPointPolicy, StatisticType>* refNode =
          frame.referenceNode;

      // Perform the actual scoring, after restoring the traversal info.
      rule.TraversalInfo() = frame.traversalInfo;
      double score = rule.Score(queryNode, *refNode);

      if (score == DBL_MAX)
      {
        // Pruned. Move on.
        ++numPrunes;
        continue;
      }

      // If it isn't pruned, we must evaluate the base case.
      const double baseCase = rule.BaseCase(queryNode.Point(),
          refNode->Point());

      // Add to child map.
      newScaleVector.push_back(frame);
      newScaleVector.back().score = score;
      newScaleVector.back().baseCase = baseCase;
      newScaleVector.back().traversalInfo = rule.TraversalInfo();
    }

    // If we didn't add anything, then strike this vector from the map.
    if (newScaleVector.size() == 0)
      childMap.erase((*it).first);

    ++it; // Advance to next scale.
  }
}

template<typename MetricType, typename RootPointPolicy, typename StatisticType>
template<typename RuleType>
void CoverTree<MetricType, RootPointPolicy, StatisticType>::
DualTreeTraverser<RuleType>::ReferenceRecursion(
    CoverTree<MetricType, RootPointPolicy, StatisticType>& queryNode,
    std::map<int, std::vector<DualCoverTreeMapEntry> >& referenceMap)
{
  // First, reduce the maximum scale in the reference map down to the scale of
  // the query node.
  while (!referenceMap.empty())
  {
    // Hacky workaround to imitate the jl cover tree implementation.
    if (queryNode.Parent() == NULL &&
        (*referenceMap.rbegin()).first < queryNode.Scale())
      break;
    if (queryNode.Parent() != NULL &&
        (*referenceMap.rbegin()).first <= queryNode.Scale())
      break;

    // If the query node's scale is INT_MIN and the reference map's maximum
    // scale is INT_MIN, don't try to recurse...
    if ((queryNode.Scale() == INT_MIN) &&
        ((*referenceMap.rbegin()).first == INT_MIN))
      break;

    // Get a reference to the current largest scale.
    std::vector<DualCoverTreeMapEntry>& scaleVector =
        (*referenceMap.rbegin()).second;

    // Before traversing all the points in this scale, sort by score.
    std::sort(scaleVector.begin(), scaleVector.end());

    // Now loop over each element.
    for (size_t i = 0; i < scaleVector.size(); ++i)
    {
      // Get a reference to the current element.
      const DualCoverTreeMapEntry& frame = scaleVector.at(i);

      CoverTree<MetricType, RootPointPolicy, StatisticType>* refNode =
          frame.referenceNode;

      // Create the score for the children.
      double score = rule.Rescore(queryNode, *refNode, frame.score);

      // Now if this score is DBL_MAX we can prune all children. In this
      // recursion setup pruning is all or nothing for children.
      if (score == DBL_MAX)
      {
        ++numPrunes;
        continue;
      }

      // If it is not pruned, we must evaluate the base case.

      // Add the children.
      for (size_t j = 0; j < refNode->NumChildren(); ++j)
      {
        rule.TraversalInfo() = frame.traversalInfo;
        double childScore = rule.Score(queryNode, refNode->Child(j));

        if (childScore == DBL_MAX)
        {
          ++numPrunes;
          continue;
        }

        // It wasn't pruned; evaluate the base case.
        const double baseCase = rule.BaseCase(queryNode.Point(),
            refNode->Child(j).Point());

        DualCoverTreeMapEntry newFrame;
        newFrame.referenceNode = &refNode->Child(j);
        newFrame.score = childScore; // Use the score of the child combination.
        newFrame.baseCase = baseCase;
        newFrame.traversalInfo = rule.TraversalInfo();

        referenceMap[newFrame.referenceNode->Scale()].push_back(newFrame);
      }
    }

    // Now clear the memory for this scale; it isn't needed anymore.
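    // (The erase below removes the just-processed largest scale; any iterator
    // or reference to that entry is invalidated, which is why the enclosing
    // while loop calls referenceMap.rbegin() afresh on every pass.)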
    referenceMap.erase((*referenceMap.rbegin()).first);
  }
}

}; // namespace tree
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/core/tree/tree_traits.hpp0000644000176200001440000000754213647512751022665 0ustar liggesusers/**
 * @file tree_traits.hpp
 * @author Ryan Curtin
 *
 * This file implements the basic, unspecialized TreeTraits class, which
 * provides information about tree types. If you create a tree class, you
 * should specialize this class with the characteristics of your tree.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_TREE_TREE_TRAITS_HPP
#define __MLPACK_CORE_TREE_TREE_TRAITS_HPP

namespace mlpack {
namespace tree {

/**
 * The TreeTraits class provides compile-time information on the
 * characteristics of a given tree type. These include traits such as whether
 * or not a node knows the distance to its parent node, or whether or not the
 * subspaces represented by children can overlap.
 *
 * These traits can be used for static compile-time optimization:
 *
 * @code
 * // This if statement will be optimized out at compile time!
 * if (TreeTraits<TreeType>::HasOverlappingChildren == false)
 * {
 *   // Do a simpler computation because no children overlap.
 * }
 * else
 * {
 *   // Do the full, complex calculation.
 * }
 * @endcode
 *
 * The traits can also be used in conjunction with SFINAE to write specialized
 * versions of functions:
 *
 * @code
 * template<typename TreeType>
 * void Compute(TreeType& node,
 *              boost::enable_if<
 *                  TreeTraits<TreeType>::RearrangesDataset>::type*)
 * {
 *   // Computation where special dataset-rearranging tree constructor is
 *   // called.
 * }
 *
 * template<typename TreeType>
 * void Compute(TreeType& node,
 *              boost::enable_if<
 *                  !TreeTraits<TreeType>::RearrangesDataset>::type*)
 * {
 *   // Computation where normal tree constructor is called.
 * }
 * @endcode
 *
 * In those two examples, the boost::enable_if<> class takes a boolean template
 * parameter which allows that function to be called when the boolean is true.
 *
 * Each trait must be a static const value and not a function; only const
 * values can be used as template parameters (with the exception of constexprs,
 * which are a C++11 feature; but MLPACK is not using C++11). By default (the
 * unspecialized implementation of TreeTraits), each parameter is set to make
 * as few assumptions about the tree as possible; so, even if TreeTraits is
 * not specialized for a particular tree type, tree-based algorithms should
 * still work.
 *
 * When you write your own tree, you must specialize the TreeTraits class to
 * your tree type and set the corresponding values appropriately. See
 * mlpack/core/tree/binary_space_tree/traits.hpp for an example.
 */
template<typename TreeType>
class TreeTraits
{
 public:
  /**
   * This is true if the subspaces represented by the children of a node can
   * overlap.
   */
  static const bool HasOverlappingChildren = true;

  /**
   * This is true if Point(0) is the centroid of the node.
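   * (The cover tree, for example, specializes this trait to true: each cover
   * tree node holds exactly one point, and that point serves as its centroid;
   * see the CoverTree specialization in cover_tree/traits.hpp above.)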
   */
  static const bool FirstPointIsCentroid = false;

  /**
   * This is true if the points contained in the first child of a node
   * (Child(0)) are also contained in that node.
   */
  static const bool HasSelfChildren = false;

  /**
   * This is true if the tree rearranges points in the dataset when it is
   * built.
   */
  static const bool RearrangesDataset = false;
};

}; // namespace tree
}; // namespace mlpack

#endif
RcppMLPACK/inst/include/mlpack/core/tree/traversal_info.hpp0000644000176200001440000000764213647512751023357 0ustar liggesusers/**
 * @file traversal_info.hpp
 * @author Ryan Curtin
 *
 * This class will hold the traversal information for dual-tree traversals. A
 * dual-tree traversal should be updating the members of this class before
 * Score() is called.
 *
 * This file is part of MLPACK 1.0.10.
 *
 * MLPACK is free software: you can redistribute it and/or modify it under the
 * terms of the GNU Lesser General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option) any
 * later version.
 *
 * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 * details (LICENSE.txt).
 *
 * You should have received a copy of the GNU General Public License along with
 * MLPACK. If not, see <http://www.gnu.org/licenses/>.
 */
#ifndef __MLPACK_CORE_TREE_TRAVERSAL_INFO_HPP
#define __MLPACK_CORE_TREE_TRAVERSAL_INFO_HPP

/**
 * The TraversalInfo class holds traversal information which is used in
 * dual-tree (and single-tree) traversals. A traversal should be updating the
 * members of this class before Score() is called. This class should be held
 * as a member of the RuleType class and the interface to it should be through
 * a TraversalInfo() method.
 *
 * The information held by this class is the last node combination visited
 * before the current node combination was recursed into, and the score
 * resulting from when Score() was called on that combination. However, this
 * information is identical for a query node and a reference node in a
 * particular node combination, so traversals only need to update the
 * TraversalInfo object in a query node (and the algorithms should only use the
 * TraversalInfo object from a query node).
 *
 * In general, this auxiliary traversal information is used to try and make a
 * prune without needing to call BaseCase() or calculate the distance between
 * nodes. Using this information you can place bounds on the distance between
 * the two nodes quickly.
 *
 * If the traversal is not updating the members of this class correctly, a
 * likely result is a null pointer dereference. Dual-tree algorithms should
 * assume that the members are set properly and should not need to check for
 * null pointers.
 *
 * There is one exception, which is the root node combination; the score can be
 * set to 0 and the query and reference nodes can just be set to the root
 * nodes; no algorithm should be able to prune the root combination anyway.
 */
template<typename TreeType>
class TraversalInfo
{
 public:
  /**
   * Create the TraversalInfo object and initialize the pointers to NULL.
   */
  TraversalInfo() :
      lastQueryNode(NULL),
      lastReferenceNode(NULL),
      lastScore(0.0),
      lastBaseCase(0.0)
  { /* Nothing to do. */ }

  //! Get the last query node.
  TreeType* LastQueryNode() const { return lastQueryNode; }
  //! Modify the last query node.
  TreeType*& LastQueryNode() { return lastQueryNode; }

  //! Get the last reference node.
TreeType* LastReferenceNode() const { return lastReferenceNode; } //! Modify the last reference node. TreeType*& LastReferenceNode() { return lastReferenceNode; } //! Get the score associated with the last query and reference nodes. double LastScore() const { return lastScore; } //! Modify the score associated with the last query and reference nodes. double& LastScore() { return lastScore; } //! Get the base case associated with the last node combination. double LastBaseCase() const { return lastBaseCase; } //! Modify the base case associated with the last node combination. double& LastBaseCase() { return lastBaseCase; } private: //! The last query node. TreeType* lastQueryNode; //! The last reference node. TreeType* lastReferenceNode; //! The last score. double lastScore; //! The last base case. double lastBaseCase; }; #endif RcppMLPACK/inst/include/mlpack/core.hpp0000644000176200001440000001753513647512751017404 0ustar liggesusers/*** * @file core.hpp * * Include all of the base components required to write MLPACK methods, and the * main MLPACK Doxygen documentation. * * This file is part of MLPACK 1.0.10. * * MLPACK is free software: you can redistribute it and/or modify it under the * terms of the GNU Lesser General Public License as published by the Free * Software Foundation, either version 3 of the License, or (at your option) any * later version. * * MLPACK is distributed in the hope that it will be useful, but WITHOUT ANY * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR * A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more * details (LICENSE.txt). * * You should have received a copy of the GNU General Public License along with * MLPACK. If not, see . */ #ifndef __MLPACK_CORE_HPP #define __MLPACK_CORE_HPP #if _WIN64 #ifndef ARMA_64BIT_WORD #define ARMA_64BIT_WORD #endif #endif /** * @mainpage MLPACK Documentation * * @section intro_sec Introduction * * MLPACK is an intuitive, fast, scalable C++ machine learning library, meant to * be a machine learning analog to LAPACK. It aims to implement a wide array of * machine learning methods and function as a "swiss army knife" for machine * learning researchers. The MLPACK development website can be found at * http://mlpack.org. * * MLPACK uses the Armadillo C++ matrix library (http://arma.sourceforge.net) * for general matrix, vector, and linear algebra support. MLPACK also uses the * program_options, math_c99, and unit_test_framework components of the Boost * library; in addition, LibXml2 is used. * * @section howto How To Use This Documentation * * This documentation is API documentation similar to Javadoc. It isn't * necessarily a tutorial, but it does provide detailed documentation on every * namespace, method, and class. * * Each MLPACK namespace generally refers to one machine learning method, so * browsing the list of namespaces provides some insight as to the breadth of * the methods contained in the library. * * To generate this documentation in your own local copy of MLPACK, you can * simply use Doxygen, from the root directory (@c /mlpack/trunk/ ): * * @code * $ doxygen * @endcode * * @section executables Executables * * MLPACK provides several executables so that MLPACK methods can be used * without any need for knowledge of C++. These executables are all * self-documented, and that documentation can be accessed by running the * executables with the '-h' or '--help' flag. 
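 *
 * For example, to print the options accepted by the all-k-nearest-neighbors
 * executable (one of the executables listed below):
 *
 * @code
 * $ allknn --help
 * @endcode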
* * A full list of executables is given below: * * allkfn, allknn, emst, gmm, hmm_train, hmm_loglik, hmm_viterbi, hmm_generate, * kernel_pca, kmeans, lars, linear_regression, local_coordinate_coding, mvu, * nbc, nca, pca, radical, sparse_coding * * @section tutorial Tutorials * * A few short tutorials on how to use MLPACK are given below. * * - @ref build * - @ref matrices * - @ref iodoc * - @ref timer * - @ref sample * - @ref verinfo * * Tutorials on specific methods are also available. * * - @ref nstutorial * - @ref lrtutorial * - @ref rstutorial * - @ref dettutorial * - @ref emst_tutorial * - @ref kmtutorial * - @ref fmkstutorial * * @section methods Methods in MLPACK * * The following methods are included in MLPACK: * * - Decision Stump - mlpack::decision_stump::DecisionStump * - Density Estimation Trees - mlpack::det::DTree * - Euclidean Minimum Spanning Trees - mlpack::emst::DualTreeBoruvka * - Gaussian Mixture Models (GMMs) - mlpack::gmm::GMM * - Hidden Markov Models (HMMs) - mlpack::hmm::HMM * - Kernel PCA - mlpack::kpca::KernelPCA * - K-Means Clustering - mlpack::kmeans::KMeans * - Least-Angle Regression (LARS/LASSO) - mlpack::regression::LARS * - Local Coordinate Coding - mlpack::lcc::LocalCoordinateCoding * - Locality-Sensitive Hashing - mlpack::neighbor::LSHSearch * - Naive Bayes Classifier - mlpack::naive_bayes::NaiveBayesClassifier * - Neighborhood Components Analysis (NCA) - mlpack::nca::NCA * - Nonnegative Matrix Factorization (NMF) - mlpack::amf::AMF<> * - Nystroem Method - mlpack::kernel::NystroemMethod * - Perceptron - mlpack::perceptron::Perceptron * - Principal Components Analysis (PCA) - mlpack::pca::PCA * - QUIC-SVD - mlpack::svd::QUIC_SVD * - RADICAL (ICA) - mlpack::radical::Radical * - Regularized SVD - mlpack::svd::RegularizedSVD * - Simple Least-Squares Linear Regression - * mlpack::regression::LinearRegression * - Sparse Autoencoding - mlpack::nn::SparseAutoencoder * - Sparse Coding - mlpack::sparse_coding::SparseCoding * - Tree-based neighbor search (AllkNN, AllkFN) - * mlpack::neighbor::NeighborSearch * - Tree-based range search - mlpack::range::RangeSearch * * @section remarks Final Remarks * * MLPACK contributors include: * * - Ryan Curtin * - James Cline * - Neil Slagle * - Matthew Amidon * - Vlad Grantcharov * - Ajinkya Kale * - Bill March * - Dongryeol Lee * - Nishant Mehta * - Parikshit Ram * - Rajendran Mohan * - Trironk Kiatkungwanglai * - Patrick Mason * - Chip Mappus * - Hua Ouyang * - Long Quoc Tran * - Noah Kauffman * - Guillermo Colon * - Wei Guan * - Ryan Riegel * - Nikolaos Vasiloglou * - Garry Boyer * - Andreas Löf * - Marcus Edel * - Mudit Raj Gupta * - Sumedh Ghaisas * - Michael Fox * - Ryan Birmingham * - Siddharth Agrawal * - Saheb Motiani * - Yash Vadalia * - Abhishek Laddha * - Vahab Akbarzadeh * - Andrew Wells * - Zhihao Lou */ // First, include all of the prerequisites. #include // Now the core mlpack classes. #include #include //#include //#include //#include #include #include #include #include #include #include //#include #include #include // Include kernel traits. #include #include #include #include #include #include #include #include #include #include #include #endif // Clean up unfortunate Windows preprocessor definitions, even if this file was // already included. Use std::min and std::max! 
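// (On Windows, windows.h defines min and max as function-like macros unless
// NOMINMAX is defined first; those macros would otherwise mangle calls such
// as std::min(a, b) and std::max(a, b), so they are undefined here.)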
#ifdef _WIN32 #ifdef min #undef min #endif #ifdef max #undef max #endif #endif RcppMLPACK/inst/include/RcppMLPACK.h0000644000176200001440000000733513647512751016436 0ustar liggesusers#ifndef RcppMLPACK__RcppMLPACK__h #define RcppMLPACK__RcppMLPACK__h #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #endif RcppMLPACK/inst/skeleton/0000755000176200001440000000000013647512751014662 5ustar liggesusersRcppMLPACK/inst/skeleton/Makevars0000644000176200001440000000014313647512751016354 0ustar liggesusersPKG_LIBS= `$(R_HOME)/bin/Rscript -e "RcppMLPACK:::LdFlags()"` $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) RcppMLPACK/inst/skeleton/Makevars.win0000644000176200001440000000021013647512751017143 0ustar liggesusersPKG_LIBS= PKG_LIBS = $(shell "${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe" -e "RcppMLPACK:::LdFlags()") $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) RcppMLPACK/inst/skeleton/kmeans.cpp0000644000176200001440000000062613647512751016650 0ustar liggesusers#include "RcppMLPACK.h" using namespace mlpack::kmeans; using namespace Rcpp; // [[Rcpp::export]] List mlkmeans(const arma::mat& data, const int& clusters) { arma::Col assignments; // Initialize with the default arguments. KMeans<> k; k.Cluster(data, clusters, assignments); return List::create(_["clusters"] = clusters, _["result"] = assignments); } RcppMLPACK/cleanup0000755000176200001440000000023013647514343013430 0ustar liggesusersfind . -name \*~ -exec rm {} \; find . -name \*.o -exec rm {} \; find . -name \*.so -exec rm {} \; find . -name "#*#" -exec rm {} \; rm -rf ./inst/lib RcppMLPACK/configure0000755000176200001440000023704613647514343014003 0ustar liggesusers#! /bin/sh # Guess values for system-dependent variables and create Makefiles. # Generated by GNU Autoconf 2.69 for RcppMLPACK 1.0.10-4. # # # Copyright (C) 1992-1996, 1998-2012 Free Software Foundation, Inc. # # # This configure script is free software; the Free Software Foundation # gives unlimited permission to copy, distribute and modify it. ## -------------------- ## ## M4sh Initialization. ## ## -------------------- ## # Be more Bourne compatible DUALCASE=1; export DUALCASE # for MKS sh if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then : emulate sh NULLCMD=: # Pre-4.2 versions of Zsh do word splitting on ${1+"$@"}, which # is contrary to our usage. Disable this feature. alias -g '${1+"$@"}'='"$@"' setopt NO_GLOB_SUBST else case `(set -o) 2>/dev/null` in #( *posix*) : set -o posix ;; #( *) : ;; esac fi as_nl=' ' export as_nl # Printing a long string crashes Solaris 7 /usr/bin/printf. as_echo='\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\' as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo$as_echo # Prefer a ksh shell builtin over an external printf program on Solaris, # but without wasting forks for bash or zsh. 
if test -z "$BASH_VERSION$ZSH_VERSION" \ && (test "X`print -r -- $as_echo`" = "X$as_echo") 2>/dev/null; then as_echo='print -r --' as_echo_n='print -rn --' elif (test "X`printf %s $as_echo`" = "X$as_echo") 2>/dev/null; then as_echo='printf %s\n' as_echo_n='printf %s' else if test "X`(/usr/ucb/echo -n -n $as_echo) 2>/dev/null`" = "X-n $as_echo"; then as_echo_body='eval /usr/ucb/echo -n "$1$as_nl"' as_echo_n='/usr/ucb/echo -n' else as_echo_body='eval expr "X$1" : "X\\(.*\\)"' as_echo_n_body='eval arg=$1; case $arg in #( *"$as_nl"*) expr "X$arg" : "X\\(.*\\)$as_nl"; arg=`expr "X$arg" : ".*$as_nl\\(.*\\)"`;; esac; expr "X$arg" : "X\\(.*\\)" | tr -d "$as_nl" ' export as_echo_n_body as_echo_n='sh -c $as_echo_n_body as_echo' fi export as_echo_body as_echo='sh -c $as_echo_body as_echo' fi # The user is always right. if test "${PATH_SEPARATOR+set}" != set; then PATH_SEPARATOR=: (PATH='/bin;/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 && { (PATH='/bin:/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 || PATH_SEPARATOR=';' } fi # IFS # We need space, tab and new line, in precisely that order. Quoting is # there to prevent editors from complaining about space-tab. # (If _AS_PATH_WALK were called with IFS unset, it would disable word # splitting by setting IFS to empty value.) IFS=" "" $as_nl" # Find who we are. Look in the path if we contain no directory separator. as_myself= case $0 in #(( *[\\/]* ) as_myself=$0 ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. test -r "$as_dir/$0" && as_myself=$as_dir/$0 && break done IFS=$as_save_IFS ;; esac # We did not find ourselves, most probably we were run as `sh COMMAND' # in which case we are not to be found in the path. if test "x$as_myself" = x; then as_myself=$0 fi if test ! -f "$as_myself"; then $as_echo "$as_myself: error: cannot find myself; rerun with an absolute file name" >&2 exit 1 fi # Unset variables that we do not need and which cause bugs (e.g. in # pre-3.0 UWIN ksh). But do not cause bugs in bash 2.01; the "|| exit 1" # suppresses any "Segmentation fault" message there. '((' could # trigger a bug in pdksh 5.2.14. for as_var in BASH_ENV ENV MAIL MAILPATH do eval test x\${$as_var+set} = xset \ && ( (unset $as_var) || exit 1) >/dev/null 2>&1 && unset $as_var || : done PS1='$ ' PS2='> ' PS4='+ ' # NLS nuisances. LC_ALL=C export LC_ALL LANGUAGE=C export LANGUAGE # CDPATH. (unset CDPATH) >/dev/null 2>&1 && unset CDPATH # Use a proper internal environment variable to ensure we don't fall # into an infinite loop, continuously re-executing ourselves. if test x"${_as_can_reexec}" != xno && test "x$CONFIG_SHELL" != x; then _as_can_reexec=no; export _as_can_reexec; # We cannot yet assume a decent shell, so we have to provide a # neutralization value for shells without unset; and this also # works around shells that cannot unset nonexistent variables. # Preserve -v and -x to the replacement shell. BASH_ENV=/dev/null ENV=/dev/null (unset BASH_ENV) >/dev/null 2>&1 && unset BASH_ENV ENV case $- in # (((( *v*x* | *x*v* ) as_opts=-vx ;; *v* ) as_opts=-v ;; *x* ) as_opts=-x ;; * ) as_opts= ;; esac exec $CONFIG_SHELL $as_opts "$as_myself" ${1+"$@"} # Admittedly, this is quite paranoid, since all the known shells bail # out after a failed `exec'. $as_echo "$0: could not re-execute with $CONFIG_SHELL" >&2 as_fn_exit 255 fi # We don't want this to propagate to other subprocesses. 
{ _as_can_reexec=; unset _as_can_reexec;} if test "x$CONFIG_SHELL" = x; then as_bourne_compatible="if test -n \"\${ZSH_VERSION+set}\" && (emulate sh) >/dev/null 2>&1; then : emulate sh NULLCMD=: # Pre-4.2 versions of Zsh do word splitting on \${1+\"\$@\"}, which # is contrary to our usage. Disable this feature. alias -g '\${1+\"\$@\"}'='\"\$@\"' setopt NO_GLOB_SUBST else case \`(set -o) 2>/dev/null\` in #( *posix*) : set -o posix ;; #( *) : ;; esac fi " as_required="as_fn_return () { (exit \$1); } as_fn_success () { as_fn_return 0; } as_fn_failure () { as_fn_return 1; } as_fn_ret_success () { return 0; } as_fn_ret_failure () { return 1; } exitcode=0 as_fn_success || { exitcode=1; echo as_fn_success failed.; } as_fn_failure && { exitcode=1; echo as_fn_failure succeeded.; } as_fn_ret_success || { exitcode=1; echo as_fn_ret_success failed.; } as_fn_ret_failure && { exitcode=1; echo as_fn_ret_failure succeeded.; } if ( set x; as_fn_ret_success y && test x = \"\$1\" ); then : else exitcode=1; echo positional parameters were not saved. fi test x\$exitcode = x0 || exit 1 test -x / || exit 1" as_suggested=" as_lineno_1=";as_suggested=$as_suggested$LINENO;as_suggested=$as_suggested" as_lineno_1a=\$LINENO as_lineno_2=";as_suggested=$as_suggested$LINENO;as_suggested=$as_suggested" as_lineno_2a=\$LINENO eval 'test \"x\$as_lineno_1'\$as_run'\" != \"x\$as_lineno_2'\$as_run'\" && test \"x\`expr \$as_lineno_1'\$as_run' + 1\`\" = \"x\$as_lineno_2'\$as_run'\"' || exit 1" if (eval "$as_required") 2>/dev/null; then : as_have_required=yes else as_have_required=no fi if test x$as_have_required = xyes && (eval "$as_suggested") 2>/dev/null; then : else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR as_found=false for as_dir in /bin$PATH_SEPARATOR/usr/bin$PATH_SEPARATOR$PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. as_found=: case $as_dir in #( /*) for as_base in sh bash ksh sh5; do # Try only shells that exist, to save several forks. as_shell=$as_dir/$as_base if { test -f "$as_shell" || test -f "$as_shell.exe"; } && { $as_echo "$as_bourne_compatible""$as_required" | as_run=a "$as_shell"; } 2>/dev/null; then : CONFIG_SHELL=$as_shell as_have_required=yes if { $as_echo "$as_bourne_compatible""$as_suggested" | as_run=a "$as_shell"; } 2>/dev/null; then : break 2 fi fi done;; esac as_found=false done $as_found || { if { test -f "$SHELL" || test -f "$SHELL.exe"; } && { $as_echo "$as_bourne_compatible""$as_required" | as_run=a "$SHELL"; } 2>/dev/null; then : CONFIG_SHELL=$SHELL as_have_required=yes fi; } IFS=$as_save_IFS if test "x$CONFIG_SHELL" != x; then : export CONFIG_SHELL # We cannot yet assume a decent shell, so we have to provide a # neutralization value for shells without unset; and this also # works around shells that cannot unset nonexistent variables. # Preserve -v and -x to the replacement shell. BASH_ENV=/dev/null ENV=/dev/null (unset BASH_ENV) >/dev/null 2>&1 && unset BASH_ENV ENV case $- in # (((( *v*x* | *x*v* ) as_opts=-vx ;; *v* ) as_opts=-v ;; *x* ) as_opts=-x ;; * ) as_opts= ;; esac exec $CONFIG_SHELL $as_opts "$as_myself" ${1+"$@"} # Admittedly, this is quite paranoid, since all the known shells bail # out after a failed `exec'. $as_echo "$0: could not re-execute with $CONFIG_SHELL" >&2 exit 255 fi if test x$as_have_required = xno; then : $as_echo "$0: This script requires a shell more modern than all" $as_echo "$0: the shells that I found on your system." 
if test x${ZSH_VERSION+set} = xset ; then $as_echo "$0: In particular, zsh $ZSH_VERSION has bugs and should" $as_echo "$0: be upgraded to zsh 4.3.4 or later." else $as_echo "$0: Please tell bug-autoconf@gnu.org about your system, $0: including any error possibly output before this $0: message. Then install a modern shell, or manually run $0: the script under such a shell if you do have one." fi exit 1 fi fi fi SHELL=${CONFIG_SHELL-/bin/sh} export SHELL # Unset more variables known to interfere with behavior of common tools. CLICOLOR_FORCE= GREP_OPTIONS= unset CLICOLOR_FORCE GREP_OPTIONS ## --------------------- ## ## M4sh Shell Functions. ## ## --------------------- ## # as_fn_unset VAR # --------------- # Portably unset VAR. as_fn_unset () { { eval $1=; unset $1;} } as_unset=as_fn_unset # as_fn_set_status STATUS # ----------------------- # Set $? to STATUS, without forking. as_fn_set_status () { return $1 } # as_fn_set_status # as_fn_exit STATUS # ----------------- # Exit the shell with STATUS, even in a "trap 0" or "set -e" context. as_fn_exit () { set +e as_fn_set_status $1 exit $1 } # as_fn_exit # as_fn_mkdir_p # ------------- # Create "$as_dir" as a directory, including parents if necessary. as_fn_mkdir_p () { case $as_dir in #( -*) as_dir=./$as_dir;; esac test -d "$as_dir" || eval $as_mkdir_p || { as_dirs= while :; do case $as_dir in #( *\'*) as_qdir=`$as_echo "$as_dir" | sed "s/'/'\\\\\\\\''/g"`;; #'( *) as_qdir=$as_dir;; esac as_dirs="'$as_qdir' $as_dirs" as_dir=`$as_dirname -- "$as_dir" || $as_expr X"$as_dir" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_dir" : 'X\(//\)[^/]' \| \ X"$as_dir" : 'X\(//\)$' \| \ X"$as_dir" : 'X\(/\)' \| . 2>/dev/null || $as_echo X"$as_dir" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q } /^X\(\/\/\)[^/].*/{ s//\1/ q } /^X\(\/\/\)$/{ s//\1/ q } /^X\(\/\).*/{ s//\1/ q } s/.*/./; q'` test -d "$as_dir" && break done test -z "$as_dirs" || eval "mkdir $as_dirs" } || test -d "$as_dir" || as_fn_error $? "cannot create directory $as_dir" } # as_fn_mkdir_p # as_fn_executable_p FILE # ----------------------- # Test if FILE is an executable regular file. as_fn_executable_p () { test -f "$1" && test -x "$1" } # as_fn_executable_p # as_fn_append VAR VALUE # ---------------------- # Append the text in VALUE to the end of the definition contained in VAR. Take # advantage of any shell optimizations that allow amortized linear growth over # repeated appends, instead of the typical quadratic growth present in naive # implementations. if (eval "as_var=1; as_var+=2; test x\$as_var = x12") 2>/dev/null; then : eval 'as_fn_append () { eval $1+=\$2 }' else as_fn_append () { eval $1=\$$1\$2 } fi # as_fn_append # as_fn_arith ARG... # ------------------ # Perform arithmetic evaluation on the ARGs, and store the result in the # global $as_val. Take advantage of shells that can avoid forks. The arguments # must be portable across $(()) and expr. if (eval "test \$(( 1 + 1 )) = 2") 2>/dev/null; then : eval 'as_fn_arith () { as_val=$(( $* )) }' else as_fn_arith () { as_val=`expr "$@" || test $? -eq 1` } fi # as_fn_arith # as_fn_error STATUS ERROR [LINENO LOG_FD] # ---------------------------------------- # Output "`basename $0`: error: ERROR" to stderr. If LINENO and LOG_FD are # provided, also output the error to LOG_FD, referencing LINENO. Then exit the # script with STATUS, using 1 if that was 0. 
as_fn_error () { as_status=$1; test $as_status -eq 0 && as_status=1 if test "$4"; then as_lineno=${as_lineno-"$3"} as_lineno_stack=as_lineno_stack=$as_lineno_stack $as_echo "$as_me:${as_lineno-$LINENO}: error: $2" >&$4 fi $as_echo "$as_me: error: $2" >&2 as_fn_exit $as_status } # as_fn_error if expr a : '\(a\)' >/dev/null 2>&1 && test "X`expr 00001 : '.*\(...\)'`" = X001; then as_expr=expr else as_expr=false fi if (basename -- /) >/dev/null 2>&1 && test "X`basename -- / 2>&1`" = "X/"; then as_basename=basename else as_basename=false fi if (as_dir=`dirname -- /` && test "X$as_dir" = X/) >/dev/null 2>&1; then as_dirname=dirname else as_dirname=false fi as_me=`$as_basename -- "$0" || $as_expr X/"$0" : '.*/\([^/][^/]*\)/*$' \| \ X"$0" : 'X\(//\)$' \| \ X"$0" : 'X\(/\)' \| . 2>/dev/null || $as_echo X/"$0" | sed '/^.*\/\([^/][^/]*\)\/*$/{ s//\1/ q } /^X\/\(\/\/\)$/{ s//\1/ q } /^X\/\(\/\).*/{ s//\1/ q } s/.*/./; q'` # Avoid depending upon Character Ranges. as_cr_letters='abcdefghijklmnopqrstuvwxyz' as_cr_LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ' as_cr_Letters=$as_cr_letters$as_cr_LETTERS as_cr_digits='0123456789' as_cr_alnum=$as_cr_Letters$as_cr_digits as_lineno_1=$LINENO as_lineno_1a=$LINENO as_lineno_2=$LINENO as_lineno_2a=$LINENO eval 'test "x$as_lineno_1'$as_run'" != "x$as_lineno_2'$as_run'" && test "x`expr $as_lineno_1'$as_run' + 1`" = "x$as_lineno_2'$as_run'"' || { # Blame Lee E. McMahon (1931-1989) for sed's syntax. :-) sed -n ' p /[$]LINENO/= ' <$as_myself | sed ' s/[$]LINENO.*/&-/ t lineno b :lineno N :loop s/[$]LINENO\([^'$as_cr_alnum'_].*\n\)\(.*\)/\2\1\2/ t loop s/-\n.*// ' >$as_me.lineno && chmod +x "$as_me.lineno" || { $as_echo "$as_me: error: cannot create $as_me.lineno; rerun with a POSIX shell" >&2; as_fn_exit 1; } # If we had to re-execute with $CONFIG_SHELL, we're ensured to have # already done that, so ensure we don't try to do so again and fall # in an infinite loop. This has already happened in practice. _as_can_reexec=no; export _as_can_reexec # Don't try to exec as it changes $[0], causing all sort of problems # (the dirname of $[0] is not the place where we might find the # original and so on. Autoconf is especially sensitive to this). . "./$as_me.lineno" # Exit status is that of the last command. exit } ECHO_C= ECHO_N= ECHO_T= case `echo -n x` in #((((( -n*) case `echo 'xy\c'` in *c*) ECHO_T=' ';; # ECHO_T is single tab character. xy) ECHO_C='\c';; *) echo `echo ksh88 bug on AIX 6.1` > /dev/null ECHO_T=' ';; esac;; *) ECHO_N='-n';; esac rm -f conf$$ conf$$.exe conf$$.file if test -d conf$$.dir; then rm -f conf$$.dir/conf$$.file else rm -f conf$$.dir mkdir conf$$.dir 2>/dev/null fi if (echo >conf$$.file) 2>/dev/null; then if ln -s conf$$.file conf$$ 2>/dev/null; then as_ln_s='ln -s' # ... but there are two gotchas: # 1) On MSYS, both `ln -s file dir' and `ln file dir' fail. # 2) DJGPP < 2.04 has no symlinks; `ln -s' creates a wrapper executable. # In both cases, we have to default to `cp -pR'. ln -s conf$$.file conf$$.dir 2>/dev/null && test ! -f conf$$.exe || as_ln_s='cp -pR' elif ln conf$$.file conf$$ 2>/dev/null; then as_ln_s=ln else as_ln_s='cp -pR' fi else as_ln_s='cp -pR' fi rm -f conf$$ conf$$.exe conf$$.dir/conf$$.file conf$$.file rmdir conf$$.dir 2>/dev/null if mkdir -p . 2>/dev/null; then as_mkdir_p='mkdir -p "$as_dir"' else test -d ./-p && rmdir ./-p as_mkdir_p=false fi as_test_x='test -x' as_executable_p=as_fn_executable_p # Sed expression to map a string onto a valid CPP name. 
as_tr_cpp="eval sed 'y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g'" # Sed expression to map a string onto a valid variable name. as_tr_sh="eval sed 'y%*+%pp%;s%[^_$as_cr_alnum]%_%g'" test -n "$DJDIR" || exec 7<&0 &1 # Name of the host. # hostname on some systems (SVR3.2, old GNU/Linux) returns a bogus exit status, # so uname gets run too. ac_hostname=`(hostname || uname -n) 2>/dev/null | sed 1q` # # Initializations. # ac_default_prefix=/usr/local ac_clean_files= ac_config_libobj_dir=. LIBOBJS= cross_compiling=no subdirs= MFLAGS= MAKEFLAGS= # Identity of this package. PACKAGE_NAME='RcppMLPACK' PACKAGE_TARNAME='rcppmlpack' PACKAGE_VERSION='1.0.10-4' PACKAGE_STRING='RcppMLPACK 1.0.10-4' PACKAGE_BUGREPORT='' PACKAGE_URL='' ac_subst_vars='CXXCPP OBJEXT EXEEXT ac_ct_CXX CPPFLAGS LDFLAGS CXXFLAGS CXX target_alias host_alias build_alias LIBS ECHO_T ECHO_N ECHO_C DEFS mandir localedir libdir psdir pdfdir dvidir htmldir infodir docdir oldincludedir includedir localstatedir sharedstatedir sysconfdir datadir datarootdir libexecdir sbindir bindir program_transform_name prefix exec_prefix PACKAGE_URL PACKAGE_BUGREPORT PACKAGE_STRING PACKAGE_VERSION PACKAGE_TARNAME PACKAGE_NAME PATH_SEPARATOR SHELL' ac_subst_files='' ac_user_opts=' enable_option_checking ' ac_precious_vars='build_alias host_alias target_alias CXX CXXFLAGS LDFLAGS LIBS CPPFLAGS CCC CXXCPP' # Initialize some variables set by options. ac_init_help= ac_init_version=false ac_unrecognized_opts= ac_unrecognized_sep= # The variables have the same names as the options, with # dashes changed to underlines. cache_file=/dev/null exec_prefix=NONE no_create= no_recursion= prefix=NONE program_prefix=NONE program_suffix=NONE program_transform_name=s,x,x, silent= site= srcdir= verbose= x_includes=NONE x_libraries=NONE # Installation directory options. # These are left unexpanded so users can "make install exec_prefix=/foo" # and all the variables that are supposed to be based on exec_prefix # by default will actually change. # Use braces instead of parens because sh, perl, etc. also accept them. # (The list follows the same order as the GNU Coding Standards.) bindir='${exec_prefix}/bin' sbindir='${exec_prefix}/sbin' libexecdir='${exec_prefix}/libexec' datarootdir='${prefix}/share' datadir='${datarootdir}' sysconfdir='${prefix}/etc' sharedstatedir='${prefix}/com' localstatedir='${prefix}/var' includedir='${prefix}/include' oldincludedir='/usr/include' docdir='${datarootdir}/doc/${PACKAGE_TARNAME}' infodir='${datarootdir}/info' htmldir='${docdir}' dvidir='${docdir}' pdfdir='${docdir}' psdir='${docdir}' libdir='${exec_prefix}/lib' localedir='${datarootdir}/locale' mandir='${datarootdir}/man' ac_prev= ac_dashdash= for ac_option do # If the previous option needs an argument, assign it. if test -n "$ac_prev"; then eval $ac_prev=\$ac_option ac_prev= continue fi case $ac_option in *=?*) ac_optarg=`expr "X$ac_option" : '[^=]*=\(.*\)'` ;; *=) ac_optarg= ;; *) ac_optarg=yes ;; esac # Accept the important Cygnus configure options, so we can diagnose typos. 
case $ac_dashdash$ac_option in --) ac_dashdash=yes ;; -bindir | --bindir | --bindi | --bind | --bin | --bi) ac_prev=bindir ;; -bindir=* | --bindir=* | --bindi=* | --bind=* | --bin=* | --bi=*) bindir=$ac_optarg ;; -build | --build | --buil | --bui | --bu) ac_prev=build_alias ;; -build=* | --build=* | --buil=* | --bui=* | --bu=*) build_alias=$ac_optarg ;; -cache-file | --cache-file | --cache-fil | --cache-fi \ | --cache-f | --cache- | --cache | --cach | --cac | --ca | --c) ac_prev=cache_file ;; -cache-file=* | --cache-file=* | --cache-fil=* | --cache-fi=* \ | --cache-f=* | --cache-=* | --cache=* | --cach=* | --cac=* | --ca=* | --c=*) cache_file=$ac_optarg ;; --config-cache | -C) cache_file=config.cache ;; -datadir | --datadir | --datadi | --datad) ac_prev=datadir ;; -datadir=* | --datadir=* | --datadi=* | --datad=*) datadir=$ac_optarg ;; -datarootdir | --datarootdir | --datarootdi | --datarootd | --dataroot \ | --dataroo | --dataro | --datar) ac_prev=datarootdir ;; -datarootdir=* | --datarootdir=* | --datarootdi=* | --datarootd=* \ | --dataroot=* | --dataroo=* | --dataro=* | --datar=*) datarootdir=$ac_optarg ;; -disable-* | --disable-*) ac_useropt=`expr "x$ac_option" : 'x-*disable-\(.*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && as_fn_error $? "invalid feature name: $ac_useropt" ac_useropt_orig=$ac_useropt ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "enable_$ac_useropt" "*) ;; *) ac_unrecognized_opts="$ac_unrecognized_opts$ac_unrecognized_sep--disable-$ac_useropt_orig" ac_unrecognized_sep=', ';; esac eval enable_$ac_useropt=no ;; -docdir | --docdir | --docdi | --doc | --do) ac_prev=docdir ;; -docdir=* | --docdir=* | --docdi=* | --doc=* | --do=*) docdir=$ac_optarg ;; -dvidir | --dvidir | --dvidi | --dvid | --dvi | --dv) ac_prev=dvidir ;; -dvidir=* | --dvidir=* | --dvidi=* | --dvid=* | --dvi=* | --dv=*) dvidir=$ac_optarg ;; -enable-* | --enable-*) ac_useropt=`expr "x$ac_option" : 'x-*enable-\([^=]*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && as_fn_error $? "invalid feature name: $ac_useropt" ac_useropt_orig=$ac_useropt ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "enable_$ac_useropt" "*) ;; *) ac_unrecognized_opts="$ac_unrecognized_opts$ac_unrecognized_sep--enable-$ac_useropt_orig" ac_unrecognized_sep=', ';; esac eval enable_$ac_useropt=\$ac_optarg ;; -exec-prefix | --exec_prefix | --exec-prefix | --exec-prefi \ | --exec-pref | --exec-pre | --exec-pr | --exec-p | --exec- \ | --exec | --exe | --ex) ac_prev=exec_prefix ;; -exec-prefix=* | --exec_prefix=* | --exec-prefix=* | --exec-prefi=* \ | --exec-pref=* | --exec-pre=* | --exec-pr=* | --exec-p=* | --exec-=* \ | --exec=* | --exe=* | --ex=*) exec_prefix=$ac_optarg ;; -gas | --gas | --ga | --g) # Obsolete; use --with-gas. 
with_gas=yes ;; -help | --help | --hel | --he | -h) ac_init_help=long ;; -help=r* | --help=r* | --hel=r* | --he=r* | -hr*) ac_init_help=recursive ;; -help=s* | --help=s* | --hel=s* | --he=s* | -hs*) ac_init_help=short ;; -host | --host | --hos | --ho) ac_prev=host_alias ;; -host=* | --host=* | --hos=* | --ho=*) host_alias=$ac_optarg ;; -htmldir | --htmldir | --htmldi | --htmld | --html | --htm | --ht) ac_prev=htmldir ;; -htmldir=* | --htmldir=* | --htmldi=* | --htmld=* | --html=* | --htm=* \ | --ht=*) htmldir=$ac_optarg ;; -includedir | --includedir | --includedi | --included | --include \ | --includ | --inclu | --incl | --inc) ac_prev=includedir ;; -includedir=* | --includedir=* | --includedi=* | --included=* | --include=* \ | --includ=* | --inclu=* | --incl=* | --inc=*) includedir=$ac_optarg ;; -infodir | --infodir | --infodi | --infod | --info | --inf) ac_prev=infodir ;; -infodir=* | --infodir=* | --infodi=* | --infod=* | --info=* | --inf=*) infodir=$ac_optarg ;; -libdir | --libdir | --libdi | --libd) ac_prev=libdir ;; -libdir=* | --libdir=* | --libdi=* | --libd=*) libdir=$ac_optarg ;; -libexecdir | --libexecdir | --libexecdi | --libexecd | --libexec \ | --libexe | --libex | --libe) ac_prev=libexecdir ;; -libexecdir=* | --libexecdir=* | --libexecdi=* | --libexecd=* | --libexec=* \ | --libexe=* | --libex=* | --libe=*) libexecdir=$ac_optarg ;; -localedir | --localedir | --localedi | --localed | --locale) ac_prev=localedir ;; -localedir=* | --localedir=* | --localedi=* | --localed=* | --locale=*) localedir=$ac_optarg ;; -localstatedir | --localstatedir | --localstatedi | --localstated \ | --localstate | --localstat | --localsta | --localst | --locals) ac_prev=localstatedir ;; -localstatedir=* | --localstatedir=* | --localstatedi=* | --localstated=* \ | --localstate=* | --localstat=* | --localsta=* | --localst=* | --locals=*) localstatedir=$ac_optarg ;; -mandir | --mandir | --mandi | --mand | --man | --ma | --m) ac_prev=mandir ;; -mandir=* | --mandir=* | --mandi=* | --mand=* | --man=* | --ma=* | --m=*) mandir=$ac_optarg ;; -nfp | --nfp | --nf) # Obsolete; use --without-fp. 
with_fp=no ;; -no-create | --no-create | --no-creat | --no-crea | --no-cre \ | --no-cr | --no-c | -n) no_create=yes ;; -no-recursion | --no-recursion | --no-recursio | --no-recursi \ | --no-recurs | --no-recur | --no-recu | --no-rec | --no-re | --no-r) no_recursion=yes ;; -oldincludedir | --oldincludedir | --oldincludedi | --oldincluded \ | --oldinclude | --oldinclud | --oldinclu | --oldincl | --oldinc \ | --oldin | --oldi | --old | --ol | --o) ac_prev=oldincludedir ;; -oldincludedir=* | --oldincludedir=* | --oldincludedi=* | --oldincluded=* \ | --oldinclude=* | --oldinclud=* | --oldinclu=* | --oldincl=* | --oldinc=* \ | --oldin=* | --oldi=* | --old=* | --ol=* | --o=*) oldincludedir=$ac_optarg ;; -prefix | --prefix | --prefi | --pref | --pre | --pr | --p) ac_prev=prefix ;; -prefix=* | --prefix=* | --prefi=* | --pref=* | --pre=* | --pr=* | --p=*) prefix=$ac_optarg ;; -program-prefix | --program-prefix | --program-prefi | --program-pref \ | --program-pre | --program-pr | --program-p) ac_prev=program_prefix ;; -program-prefix=* | --program-prefix=* | --program-prefi=* \ | --program-pref=* | --program-pre=* | --program-pr=* | --program-p=*) program_prefix=$ac_optarg ;; -program-suffix | --program-suffix | --program-suffi | --program-suff \ | --program-suf | --program-su | --program-s) ac_prev=program_suffix ;; -program-suffix=* | --program-suffix=* | --program-suffi=* \ | --program-suff=* | --program-suf=* | --program-su=* | --program-s=*) program_suffix=$ac_optarg ;; -program-transform-name | --program-transform-name \ | --program-transform-nam | --program-transform-na \ | --program-transform-n | --program-transform- \ | --program-transform | --program-transfor \ | --program-transfo | --program-transf \ | --program-trans | --program-tran \ | --progr-tra | --program-tr | --program-t) ac_prev=program_transform_name ;; -program-transform-name=* | --program-transform-name=* \ | --program-transform-nam=* | --program-transform-na=* \ | --program-transform-n=* | --program-transform-=* \ | --program-transform=* | --program-transfor=* \ | --program-transfo=* | --program-transf=* \ | --program-trans=* | --program-tran=* \ | --progr-tra=* | --program-tr=* | --program-t=*) program_transform_name=$ac_optarg ;; -pdfdir | --pdfdir | --pdfdi | --pdfd | --pdf | --pd) ac_prev=pdfdir ;; -pdfdir=* | --pdfdir=* | --pdfdi=* | --pdfd=* | --pdf=* | --pd=*) pdfdir=$ac_optarg ;; -psdir | --psdir | --psdi | --psd | --ps) ac_prev=psdir ;; -psdir=* | --psdir=* | --psdi=* | --psd=* | --ps=*) psdir=$ac_optarg ;; -q | -quiet | --quiet | --quie | --qui | --qu | --q \ | -silent | --silent | --silen | --sile | --sil) silent=yes ;; -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb) ac_prev=sbindir ;; -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \ | --sbi=* | --sb=*) sbindir=$ac_optarg ;; -sharedstatedir | --sharedstatedir | --sharedstatedi \ | --sharedstated | --sharedstate | --sharedstat | --sharedsta \ | --sharedst | --shareds | --shared | --share | --shar \ | --sha | --sh) ac_prev=sharedstatedir ;; -sharedstatedir=* | --sharedstatedir=* | --sharedstatedi=* \ | --sharedstated=* | --sharedstate=* | --sharedstat=* | --sharedsta=* \ | --sharedst=* | --shareds=* | --shared=* | --share=* | --shar=* \ | --sha=* | --sh=*) sharedstatedir=$ac_optarg ;; -site | --site | --sit) ac_prev=site ;; -site=* | --site=* | --sit=*) site=$ac_optarg ;; -srcdir | --srcdir | --srcdi | --srcd | --src | --sr) ac_prev=srcdir ;; -srcdir=* | --srcdir=* | --srcdi=* | --srcd=* | --src=* | --sr=*) srcdir=$ac_optarg ;; 
-sysconfdir | --sysconfdir | --sysconfdi | --sysconfd | --sysconf \ | --syscon | --sysco | --sysc | --sys | --sy) ac_prev=sysconfdir ;; -sysconfdir=* | --sysconfdir=* | --sysconfdi=* | --sysconfd=* | --sysconf=* \ | --syscon=* | --sysco=* | --sysc=* | --sys=* | --sy=*) sysconfdir=$ac_optarg ;; -target | --target | --targe | --targ | --tar | --ta | --t) ac_prev=target_alias ;; -target=* | --target=* | --targe=* | --targ=* | --tar=* | --ta=* | --t=*) target_alias=$ac_optarg ;; -v | -verbose | --verbose | --verbos | --verbo | --verb) verbose=yes ;; -version | --version | --versio | --versi | --vers | -V) ac_init_version=: ;; -with-* | --with-*) ac_useropt=`expr "x$ac_option" : 'x-*with-\([^=]*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && as_fn_error $? "invalid package name: $ac_useropt" ac_useropt_orig=$ac_useropt ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "with_$ac_useropt" "*) ;; *) ac_unrecognized_opts="$ac_unrecognized_opts$ac_unrecognized_sep--with-$ac_useropt_orig" ac_unrecognized_sep=', ';; esac eval with_$ac_useropt=\$ac_optarg ;; -without-* | --without-*) ac_useropt=`expr "x$ac_option" : 'x-*without-\(.*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && as_fn_error $? "invalid package name: $ac_useropt" ac_useropt_orig=$ac_useropt ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "with_$ac_useropt" "*) ;; *) ac_unrecognized_opts="$ac_unrecognized_opts$ac_unrecognized_sep--without-$ac_useropt_orig" ac_unrecognized_sep=', ';; esac eval with_$ac_useropt=no ;; --x) # Obsolete; use --with-x. with_x=yes ;; -x-includes | --x-includes | --x-include | --x-includ | --x-inclu \ | --x-incl | --x-inc | --x-in | --x-i) ac_prev=x_includes ;; -x-includes=* | --x-includes=* | --x-include=* | --x-includ=* | --x-inclu=* \ | --x-incl=* | --x-inc=* | --x-in=* | --x-i=*) x_includes=$ac_optarg ;; -x-libraries | --x-libraries | --x-librarie | --x-librari \ | --x-librar | --x-libra | --x-libr | --x-lib | --x-li | --x-l) ac_prev=x_libraries ;; -x-libraries=* | --x-libraries=* | --x-librarie=* | --x-librari=* \ | --x-librar=* | --x-libra=* | --x-libr=* | --x-lib=* | --x-li=* | --x-l=*) x_libraries=$ac_optarg ;; -*) as_fn_error $? "unrecognized option: \`$ac_option' Try \`$0 --help' for more information" ;; *=*) ac_envvar=`expr "x$ac_option" : 'x\([^=]*\)='` # Reject names that are not valid shell variable names. case $ac_envvar in #( '' | [0-9]* | *[!_$as_cr_alnum]* ) as_fn_error $? "invalid variable name: \`$ac_envvar'" ;; esac eval $ac_envvar=\$ac_optarg export $ac_envvar ;; *) # FIXME: should be removed in autoconf 3.0. $as_echo "$as_me: WARNING: you should use --build, --host, --target" >&2 expr "x$ac_option" : ".*[^-._$as_cr_alnum]" >/dev/null && $as_echo "$as_me: WARNING: invalid host type: $ac_option" >&2 : "${build_alias=$ac_option} ${host_alias=$ac_option} ${target_alias=$ac_option}" ;; esac done if test -n "$ac_prev"; then ac_option=--`echo $ac_prev | sed 's/_/-/g'` as_fn_error $? "missing argument to $ac_option" fi if test -n "$ac_unrecognized_opts"; then case $enable_option_checking in no) ;; fatal) as_fn_error $? "unrecognized options: $ac_unrecognized_opts" ;; *) $as_echo "$as_me: WARNING: unrecognized options: $ac_unrecognized_opts" >&2 ;; esac fi # Check all directory arguments for consistency. 
for ac_var in exec_prefix prefix bindir sbindir libexecdir datarootdir \ datadir sysconfdir sharedstatedir localstatedir includedir \ oldincludedir docdir infodir htmldir dvidir pdfdir psdir \ libdir localedir mandir do eval ac_val=\$$ac_var # Remove trailing slashes. case $ac_val in */ ) ac_val=`expr "X$ac_val" : 'X\(.*[^/]\)' \| "X$ac_val" : 'X\(.*\)'` eval $ac_var=\$ac_val;; esac # Be sure to have absolute directory names. case $ac_val in [\\/$]* | ?:[\\/]* ) continue;; NONE | '' ) case $ac_var in *prefix ) continue;; esac;; esac as_fn_error $? "expected an absolute directory name for --$ac_var: $ac_val" done # There might be people who depend on the old broken behavior: `$host' # used to hold the argument of --host etc. # FIXME: To remove some day. build=$build_alias host=$host_alias target=$target_alias # FIXME: To remove some day. if test "x$host_alias" != x; then if test "x$build_alias" = x; then cross_compiling=maybe elif test "x$build_alias" != "x$host_alias"; then cross_compiling=yes fi fi ac_tool_prefix= test -n "$host_alias" && ac_tool_prefix=$host_alias- test "$silent" = yes && exec 6>/dev/null ac_pwd=`pwd` && test -n "$ac_pwd" && ac_ls_di=`ls -di .` && ac_pwd_ls_di=`cd "$ac_pwd" && ls -di .` || as_fn_error $? "working directory cannot be determined" test "X$ac_ls_di" = "X$ac_pwd_ls_di" || as_fn_error $? "pwd does not report name of working directory" # Find the source files, if location was not specified. if test -z "$srcdir"; then ac_srcdir_defaulted=yes # Try the directory containing this script, then the parent directory. ac_confdir=`$as_dirname -- "$as_myself" || $as_expr X"$as_myself" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_myself" : 'X\(//\)[^/]' \| \ X"$as_myself" : 'X\(//\)$' \| \ X"$as_myself" : 'X\(/\)' \| . 2>/dev/null || $as_echo X"$as_myself" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q } /^X\(\/\/\)[^/].*/{ s//\1/ q } /^X\(\/\/\)$/{ s//\1/ q } /^X\(\/\).*/{ s//\1/ q } s/.*/./; q'` srcdir=$ac_confdir if test ! -r "$srcdir/$ac_unique_file"; then srcdir=.. fi else ac_srcdir_defaulted=no fi if test ! -r "$srcdir/$ac_unique_file"; then test "$ac_srcdir_defaulted" = yes && srcdir="$ac_confdir or .." as_fn_error $? "cannot find sources ($ac_unique_file) in $srcdir" fi ac_msg="sources are in $srcdir, but \`cd $srcdir' does not work" ac_abs_confdir=`( cd "$srcdir" && test -r "./$ac_unique_file" || as_fn_error $? "$ac_msg" pwd)` # When building in place, set srcdir=. if test "$ac_abs_confdir" = "$ac_pwd"; then srcdir=. fi # Remove unnecessary trailing slashes from srcdir. # Double slashes in file names in object file debugging info # mess up M-x gdb in Emacs. case $srcdir in */) srcdir=`expr "X$srcdir" : 'X\(.*[^/]\)' \| "X$srcdir" : 'X\(.*\)'`;; esac for ac_var in $ac_precious_vars; do eval ac_env_${ac_var}_set=\${${ac_var}+set} eval ac_env_${ac_var}_value=\$${ac_var} eval ac_cv_env_${ac_var}_set=\${${ac_var}+set} eval ac_cv_env_${ac_var}_value=\$${ac_var} done # # Report the --help message. # if test "$ac_init_help" = "long"; then # Omit some internal or obsolete options to make the list less imposing. # This message is too long to be a string in the A/UX 3.1 sh. cat <<_ACEOF \`configure' configures RcppMLPACK 1.0.10-4 to adapt to many kinds of systems. Usage: $0 [OPTION]... [VAR=VALUE]... To assign environment variables (e.g., CC, CFLAGS...), specify them as VAR=VALUE. See below for descriptions of some of the useful variables. Defaults for the options are specified in brackets. 
Configuration: -h, --help display this help and exit --help=short display options specific to this package --help=recursive display the short help of all the included packages -V, --version display version information and exit -q, --quiet, --silent do not print \`checking ...' messages --cache-file=FILE cache test results in FILE [disabled] -C, --config-cache alias for \`--cache-file=config.cache' -n, --no-create do not create output files --srcdir=DIR find the sources in DIR [configure dir or \`..'] Installation directories: --prefix=PREFIX install architecture-independent files in PREFIX [$ac_default_prefix] --exec-prefix=EPREFIX install architecture-dependent files in EPREFIX [PREFIX] By default, \`make install' will install all the files in \`$ac_default_prefix/bin', \`$ac_default_prefix/lib' etc. You can specify an installation prefix other than \`$ac_default_prefix' using \`--prefix', for instance \`--prefix=\$HOME'. For better control, use the options below. Fine tuning of the installation directories: --bindir=DIR user executables [EPREFIX/bin] --sbindir=DIR system admin executables [EPREFIX/sbin] --libexecdir=DIR program executables [EPREFIX/libexec] --sysconfdir=DIR read-only single-machine data [PREFIX/etc] --sharedstatedir=DIR modifiable architecture-independent data [PREFIX/com] --localstatedir=DIR modifiable single-machine data [PREFIX/var] --libdir=DIR object code libraries [EPREFIX/lib] --includedir=DIR C header files [PREFIX/include] --oldincludedir=DIR C header files for non-gcc [/usr/include] --datarootdir=DIR read-only arch.-independent data root [PREFIX/share] --datadir=DIR read-only architecture-independent data [DATAROOTDIR] --infodir=DIR info documentation [DATAROOTDIR/info] --localedir=DIR locale-dependent data [DATAROOTDIR/locale] --mandir=DIR man documentation [DATAROOTDIR/man] --docdir=DIR documentation root [DATAROOTDIR/doc/rcppmlpack] --htmldir=DIR html documentation [DOCDIR] --dvidir=DIR dvi documentation [DOCDIR] --pdfdir=DIR pdf documentation [DOCDIR] --psdir=DIR ps documentation [DOCDIR] _ACEOF cat <<\_ACEOF _ACEOF fi if test -n "$ac_init_help"; then case $ac_init_help in short | recursive ) echo "Configuration of RcppMLPACK 1.0.10-4:";; esac cat <<\_ACEOF Some influential environment variables: CXX C++ compiler command CXXFLAGS C++ compiler flags LDFLAGS linker flags, e.g. -L<lib dir> if you have libraries in a nonstandard directory <lib dir> LIBS libraries to pass to the linker, e.g. -l<library> CPPFLAGS (Objective) C/C++ preprocessor flags, e.g. -I<include dir> if you have headers in a nonstandard directory <include dir> CXXCPP C++ preprocessor Use these variables to override the choices made by `configure' or to help it to find libraries and programs with nonstandard names/locations. Report bugs to the package provider. _ACEOF ac_status=$? fi if test "$ac_init_help" = "recursive"; then # If there are subdirs, report their specific --help. for ac_dir in : $ac_subdirs_all; do test "x$ac_dir" = x: && continue test -d "$ac_dir" || { cd "$srcdir" && ac_pwd=`pwd` && srcdir=. && test -d "$ac_dir"; } || continue ac_builddir=. case "$ac_dir" in .) ac_dir_suffix= ac_top_builddir_sub=. ac_top_build_prefix= ;; *) ac_dir_suffix=/`$as_echo "$ac_dir" | sed 's|^\.[\\/]||'` # A ".." for each directory in $ac_dir_suffix. ac_top_builddir_sub=`$as_echo "$ac_dir_suffix" | sed 's|/[^\\/]*|/..|g;s|/||'` case $ac_top_builddir_sub in "") ac_top_builddir_sub=.
ac_top_build_prefix= ;; *) ac_top_build_prefix=$ac_top_builddir_sub/ ;; esac ;; esac ac_abs_top_builddir=$ac_pwd ac_abs_builddir=$ac_pwd$ac_dir_suffix # for backward compatibility: ac_top_builddir=$ac_top_build_prefix case $srcdir in .) # We are building in place. ac_srcdir=. ac_top_srcdir=$ac_top_builddir_sub ac_abs_top_srcdir=$ac_pwd ;; [\\/]* | ?:[\\/]* ) # Absolute name. ac_srcdir=$srcdir$ac_dir_suffix; ac_top_srcdir=$srcdir ac_abs_top_srcdir=$srcdir ;; *) # Relative name. ac_srcdir=$ac_top_build_prefix$srcdir$ac_dir_suffix ac_top_srcdir=$ac_top_build_prefix$srcdir ac_abs_top_srcdir=$ac_pwd/$srcdir ;; esac ac_abs_srcdir=$ac_abs_top_srcdir$ac_dir_suffix cd "$ac_dir" || { ac_status=$?; continue; } # Check for guested configure. if test -f "$ac_srcdir/configure.gnu"; then echo && $SHELL "$ac_srcdir/configure.gnu" --help=recursive elif test -f "$ac_srcdir/configure"; then echo && $SHELL "$ac_srcdir/configure" --help=recursive else $as_echo "$as_me: WARNING: no configuration information is in $ac_dir" >&2 fi || ac_status=$? cd "$ac_pwd" || { ac_status=$?; break; } done fi test -n "$ac_init_help" && exit $ac_status if $ac_init_version; then cat <<\_ACEOF RcppMLPACK configure 1.0.10-4 generated by GNU Autoconf 2.69 Copyright (C) 2012 Free Software Foundation, Inc. This configure script is free software; the Free Software Foundation gives unlimited permission to copy, distribute and modify it. _ACEOF exit fi ## ------------------------ ## ## Autoconf initialization. ## ## ------------------------ ## # ac_fn_cxx_try_compile LINENO # ---------------------------- # Try to compile conftest.$ac_ext, and return whether this succeeded. ac_fn_cxx_try_compile () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack rm -f conftest.$ac_objext if { { ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" $as_echo "$ac_try_echo"; } >&5 (eval "$ac_compile") 2>conftest.err ac_status=$? if test -s conftest.err; then grep -v '^ *+' conftest.err >conftest.er1 cat conftest.er1 >&5 mv -f conftest.er1 conftest.err fi $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } && { test -z "$ac_cxx_werror_flag" || test ! -s conftest.err } && test -s conftest.$ac_objext; then : ac_retval=0 else $as_echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_retval=1 fi eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno as_fn_set_status $ac_retval } # ac_fn_cxx_try_compile # ac_fn_cxx_try_cpp LINENO # ------------------------ # Try to preprocess conftest.$ac_ext, and return whether this succeeded. ac_fn_cxx_try_cpp () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack if { { ac_try="$ac_cpp conftest.$ac_ext" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" $as_echo "$ac_try_echo"; } >&5 (eval "$ac_cpp conftest.$ac_ext") 2>conftest.err ac_status=$? if test -s conftest.err; then grep -v '^ *+' conftest.err >conftest.er1 cat conftest.er1 >&5 mv -f conftest.er1 conftest.err fi $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } > conftest.i && { test -z "$ac_cxx_preproc_warn_flag$ac_cxx_werror_flag" || test ! 
-s conftest.err }; then : ac_retval=0 else $as_echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_retval=1 fi eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno as_fn_set_status $ac_retval } # ac_fn_cxx_try_cpp cat >config.log <<_ACEOF This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. It was created by RcppMLPACK $as_me 1.0.10-4, which was generated by GNU Autoconf 2.69. Invocation command line was $ $0 $@ _ACEOF exec 5>>config.log { cat <<_ASUNAME ## --------- ## ## Platform. ## ## --------- ## hostname = `(hostname || uname -n) 2>/dev/null | sed 1q` uname -m = `(uname -m) 2>/dev/null || echo unknown` uname -r = `(uname -r) 2>/dev/null || echo unknown` uname -s = `(uname -s) 2>/dev/null || echo unknown` uname -v = `(uname -v) 2>/dev/null || echo unknown` /usr/bin/uname -p = `(/usr/bin/uname -p) 2>/dev/null || echo unknown` /bin/uname -X = `(/bin/uname -X) 2>/dev/null || echo unknown` /bin/arch = `(/bin/arch) 2>/dev/null || echo unknown` /usr/bin/arch -k = `(/usr/bin/arch -k) 2>/dev/null || echo unknown` /usr/convex/getsysinfo = `(/usr/convex/getsysinfo) 2>/dev/null || echo unknown` /usr/bin/hostinfo = `(/usr/bin/hostinfo) 2>/dev/null || echo unknown` /bin/machine = `(/bin/machine) 2>/dev/null || echo unknown` /usr/bin/oslevel = `(/usr/bin/oslevel) 2>/dev/null || echo unknown` /bin/universe = `(/bin/universe) 2>/dev/null || echo unknown` _ASUNAME as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. $as_echo "PATH: $as_dir" done IFS=$as_save_IFS } >&5 cat >&5 <<_ACEOF ## ----------- ## ## Core tests. ## ## ----------- ## _ACEOF # Keep a trace of the command line. # Strip out --no-create and --no-recursion so they do not pile up. # Strip out --silent because we don't want to record it for future runs. # Also quote any args containing shell meta-characters. # Make two passes to allow for proper duplicate-argument suppression. ac_configure_args= ac_configure_args0= ac_configure_args1= ac_must_keep_next=false for ac_pass in 1 2 do for ac_arg do case $ac_arg in -no-create | --no-c* | -n | -no-recursion | --no-r*) continue ;; -q | -quiet | --quiet | --quie | --qui | --qu | --q \ | -silent | --silent | --silen | --sile | --sil) continue ;; *\'*) ac_arg=`$as_echo "$ac_arg" | sed "s/'/'\\\\\\\\''/g"` ;; esac case $ac_pass in 1) as_fn_append ac_configure_args0 " '$ac_arg'" ;; 2) as_fn_append ac_configure_args1 " '$ac_arg'" if test $ac_must_keep_next = true; then ac_must_keep_next=false # Got value, back to normal. else case $ac_arg in *=* | --config-cache | -C | -disable-* | --disable-* \ | -enable-* | --enable-* | -gas | --g* | -nfp | --nf* \ | -q | -quiet | --q* | -silent | --sil* | -v | -verb* \ | -with-* | --with-* | -without-* | --without-* | --x) case "$ac_configure_args0 " in "$ac_configure_args1"*" '$ac_arg' "* ) continue ;; esac ;; -* ) ac_must_keep_next=true ;; esac fi as_fn_append ac_configure_args " '$ac_arg'" ;; esac done done { ac_configure_args0=; unset ac_configure_args0;} { ac_configure_args1=; unset ac_configure_args1;} # When interrupted or exit'd, cleanup temporary files, and complete # config.log. We remove comments because anyway the quotes in there # would cause problems or look ugly. # WARNING: Use '\'' to represent an apostrophe within the trap. # WARNING: Do not start the trap code with a newline, due to a FreeBSD 4.0 bug. trap 'exit_status=$? 
# Save into config.log some information that might help in debugging. { echo $as_echo "## ---------------- ## ## Cache variables. ## ## ---------------- ##" echo # The following way of writing the cache mishandles newlines in values, ( for ac_var in `(set) 2>&1 | sed -n '\''s/^\([a-zA-Z_][a-zA-Z0-9_]*\)=.*/\1/p'\''`; do eval ac_val=\$$ac_var case $ac_val in #( *${as_nl}*) case $ac_var in #( *_cv_*) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; esac case $ac_var in #( _ | IFS | as_nl) ;; #( BASH_ARGV | BASH_SOURCE) eval $ac_var= ;; #( *) { eval $ac_var=; unset $ac_var;} ;; esac ;; esac done (set) 2>&1 | case $as_nl`(ac_space='\'' '\''; set) 2>&1` in #( *${as_nl}ac_space=\ *) sed -n \ "s/'\''/'\''\\\\'\'''\''/g; s/^\\([_$as_cr_alnum]*_cv_[_$as_cr_alnum]*\\)=\\(.*\\)/\\1='\''\\2'\''/p" ;; #( *) sed -n "/^[_$as_cr_alnum]*_cv_[_$as_cr_alnum]*=/p" ;; esac | sort ) echo $as_echo "## ----------------- ## ## Output variables. ## ## ----------------- ##" echo for ac_var in $ac_subst_vars do eval ac_val=\$$ac_var case $ac_val in *\'\''*) ac_val=`$as_echo "$ac_val" | sed "s/'\''/'\''\\\\\\\\'\'''\''/g"`;; esac $as_echo "$ac_var='\''$ac_val'\''" done | sort echo if test -n "$ac_subst_files"; then $as_echo "## ------------------- ## ## File substitutions. ## ## ------------------- ##" echo for ac_var in $ac_subst_files do eval ac_val=\$$ac_var case $ac_val in *\'\''*) ac_val=`$as_echo "$ac_val" | sed "s/'\''/'\''\\\\\\\\'\'''\''/g"`;; esac $as_echo "$ac_var='\''$ac_val'\''" done | sort echo fi if test -s confdefs.h; then $as_echo "## ----------- ## ## confdefs.h. ## ## ----------- ##" echo cat confdefs.h echo fi test "$ac_signal" != 0 && $as_echo "$as_me: caught signal $ac_signal" $as_echo "$as_me: exit $exit_status" } >&5 rm -f core *.core core.conftest.* && rm -f -r conftest* confdefs* conf$$* $ac_clean_files && exit $exit_status ' 0 for ac_signal in 1 2 13 15; do trap 'ac_signal='$ac_signal'; as_fn_exit 1' $ac_signal done ac_signal=0 # confdefs.h avoids OS command line length limits that DEFS can exceed. rm -f -r conftest* confdefs.h $as_echo "/* confdefs.h */" > confdefs.h # Predefined preprocessor variables. cat >>confdefs.h <<_ACEOF #define PACKAGE_NAME "$PACKAGE_NAME" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_TARNAME "$PACKAGE_TARNAME" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_VERSION "$PACKAGE_VERSION" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_STRING "$PACKAGE_STRING" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_BUGREPORT "$PACKAGE_BUGREPORT" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_URL "$PACKAGE_URL" _ACEOF # Let the site file select an alternate cache file if it wants to. # Prefer an explicitly selected file to automatically selected ones. ac_site_file1=NONE ac_site_file2=NONE if test -n "$CONFIG_SITE"; then # We do not want a PATH search for config.site. 
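# A minimal sketch of the site-file lookup below (hypothetical values):
#   CONFIG_SITE=mysite.conf       -> ./mysite.conf  (no slash: forced local)
#   CONFIG_SITE=/etc/R/site.conf  -> used as given  (contains a slash)
#   CONFIG_SITE unset, --prefix=/opt/R
#     -> tries /opt/R/share/config.site, then /opt/R/etc/config.site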
case $CONFIG_SITE in #(( -*) ac_site_file1=./$CONFIG_SITE;; */*) ac_site_file1=$CONFIG_SITE;; *) ac_site_file1=./$CONFIG_SITE;; esac elif test "x$prefix" != xNONE; then ac_site_file1=$prefix/share/config.site ac_site_file2=$prefix/etc/config.site else ac_site_file1=$ac_default_prefix/share/config.site ac_site_file2=$ac_default_prefix/etc/config.site fi for ac_site_file in "$ac_site_file1" "$ac_site_file2" do test "x$ac_site_file" = xNONE && continue if test /dev/null != "$ac_site_file" && test -r "$ac_site_file"; then { $as_echo "$as_me:${as_lineno-$LINENO}: loading site script $ac_site_file" >&5 $as_echo "$as_me: loading site script $ac_site_file" >&6;} sed 's/^/| /' "$ac_site_file" >&5 . "$ac_site_file" \ || { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 $as_echo "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "failed to load site script $ac_site_file See \`config.log' for more details" "$LINENO" 5; } fi done if test -r "$cache_file"; then # Some versions of bash will fail to source /dev/null (special files # actually), so we avoid doing that. DJGPP emulates it as a regular file. if test /dev/null != "$cache_file" && test -f "$cache_file"; then { $as_echo "$as_me:${as_lineno-$LINENO}: loading cache $cache_file" >&5 $as_echo "$as_me: loading cache $cache_file" >&6;} case $cache_file in [\\/]* | ?:[\\/]* ) . "$cache_file";; *) . "./$cache_file";; esac fi else { $as_echo "$as_me:${as_lineno-$LINENO}: creating cache $cache_file" >&5 $as_echo "$as_me: creating cache $cache_file" >&6;} >$cache_file fi # Check that the precious variables saved in the cache have kept the same # value. ac_cache_corrupted=false for ac_var in $ac_precious_vars; do eval ac_old_set=\$ac_cv_env_${ac_var}_set eval ac_new_set=\$ac_env_${ac_var}_set eval ac_old_val=\$ac_cv_env_${ac_var}_value eval ac_new_val=\$ac_env_${ac_var}_value case $ac_old_set,$ac_new_set in set,) { $as_echo "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&5 $as_echo "$as_me: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&2;} ac_cache_corrupted=: ;; ,set) { $as_echo "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' was not set in the previous run" >&5 $as_echo "$as_me: error: \`$ac_var' was not set in the previous run" >&2;} ac_cache_corrupted=: ;; ,);; *) if test "x$ac_old_val" != "x$ac_new_val"; then # differences in whitespace do not lead to failure. ac_old_val_w=`echo x $ac_old_val` ac_new_val_w=`echo x $ac_new_val` if test "$ac_old_val_w" != "$ac_new_val_w"; then { $as_echo "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' has changed since the previous run:" >&5 $as_echo "$as_me: error: \`$ac_var' has changed since the previous run:" >&2;} ac_cache_corrupted=: else { $as_echo "$as_me:${as_lineno-$LINENO}: warning: ignoring whitespace changes in \`$ac_var' since the previous run:" >&5 $as_echo "$as_me: warning: ignoring whitespace changes in \`$ac_var' since the previous run:" >&2;} eval $ac_var=\$ac_old_val fi { $as_echo "$as_me:${as_lineno-$LINENO}: former value: \`$ac_old_val'" >&5 $as_echo "$as_me: former value: \`$ac_old_val'" >&2;} { $as_echo "$as_me:${as_lineno-$LINENO}: current value: \`$ac_new_val'" >&5 $as_echo "$as_me: current value: \`$ac_new_val'" >&2;} fi;; esac # Pass precious variables to config.status. if test "$ac_new_set" = set; then case $ac_new_val in *\'*) ac_arg=$ac_var=`$as_echo "$ac_new_val" | sed "s/'/'\\\\\\\\''/g"` ;; *) ac_arg=$ac_var=$ac_new_val ;; esac case " $ac_configure_args " in *" '$ac_arg' "*) ;; # Avoid dups. 
Use of quotes ensures accuracy. *) as_fn_append ac_configure_args " '$ac_arg'" ;; esac fi done if $ac_cache_corrupted; then { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 $as_echo "$as_me: error: in \`$ac_pwd':" >&2;} { $as_echo "$as_me:${as_lineno-$LINENO}: error: changes in the environment can compromise the build" >&5 $as_echo "$as_me: error: changes in the environment can compromise the build" >&2;} as_fn_error $? "run \`make distclean' and/or \`rm $cache_file' and start over" "$LINENO" 5 fi ## -------------------- ## ## Main body of script. ## ## -------------------- ## ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu ## Set R_HOME, respecting an environment variable if one is set : ${R_HOME=$(R RHOME)} if test -z "${R_HOME}"; then as_fn_error $? "Could not determine R_HOME." "$LINENO" 5 fi ## Use R to set CXX and CXXFLAGS CXX=$(${R_HOME}/bin/R CMD config CXX) CXXFLAGS=$("${R_HOME}/bin/R" CMD config CXXFLAGS) ## We are using C++ ac_ext=cpp ac_cpp='$CXXCPP $CPPFLAGS' ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_cxx_compiler_gnu ac_ext=cpp ac_cpp='$CXXCPP $CPPFLAGS' ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_cxx_compiler_gnu if test -z "$CXX"; then if test -n "$CCC"; then CXX=$CCC else if test -n "$ac_tool_prefix"; then for ac_prog in g++ c++ gpp aCC CC cxx cc++ cl.exe FCC KCC RCC xlC_r xlC do # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. set dummy $ac_tool_prefix$ac_prog; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_prog_CXX+:} false; then : $as_echo_n "(cached) " >&6 else if test -n "$CXX"; then ac_cv_prog_CXX="$CXX" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_CXX="$ac_tool_prefix$ac_prog" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS fi fi CXX=$ac_cv_prog_CXX if test -n "$CXX"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CXX" >&5 $as_echo "$CXX" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi test -n "$CXX" && break done fi if test -z "$CXX"; then ac_ct_CXX=$CXX for ac_prog in g++ c++ gpp aCC CC cxx cc++ cl.exe FCC KCC RCC xlC_r xlC do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_prog_ac_ct_CXX+:} false; then : $as_echo_n "(cached) " >&6 else if test -n "$ac_ct_CXX"; then ac_cv_prog_ac_ct_CXX="$ac_ct_CXX" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
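# (Illustration with a hypothetical PATH: given PATH=/usr/bin:/opt/bin, the
#  inner loop below probes /usr/bin/g++, then /opt/bin/g++, also trying any
#  $ac_executable_extensions suffixes such as .exe, and caches the name of
#  the first executable hit in ac_cv_prog_ac_ct_CXX.)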
for ac_exec_ext in '' $ac_executable_extensions; do if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CXX="$ac_prog" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS fi fi ac_ct_CXX=$ac_cv_prog_ac_ct_CXX if test -n "$ac_ct_CXX"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CXX" >&5 $as_echo "$ac_ct_CXX" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi test -n "$ac_ct_CXX" && break done if test "x$ac_ct_CXX" = x; then CXX="g++" else case $cross_compiling:$ac_tool_warned in yes:) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 $as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} ac_tool_warned=yes ;; esac CXX=$ac_ct_CXX fi fi fi fi # Provide some information about the compiler. $as_echo "$as_me:${as_lineno-$LINENO}: checking for C++ compiler version" >&5 set X $ac_compile ac_compiler=$2 for ac_option in --version -v -V -qversion; do { { ac_try="$ac_compiler $ac_option >&5" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" $as_echo "$ac_try_echo"; } >&5 (eval "$ac_compiler $ac_option >&5") 2>conftest.err ac_status=$? if test -s conftest.err; then sed '10a\ ... rest of stderr output deleted ... 10q' conftest.err >conftest.er1 cat conftest.er1 >&5 fi rm -f conftest.er1 conftest.err $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } done cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int main () { ; return 0; } _ACEOF ac_clean_files_save=$ac_clean_files ac_clean_files="$ac_clean_files a.out a.out.dSYM a.exe b.out" # Try to create an executable without -o first, disregard a.out. # It will help us diagnose broken compilers, and finding out an intuition # of exeext. { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the C++ compiler works" >&5 $as_echo_n "checking whether the C++ compiler works... " >&6; } ac_link_default=`$as_echo "$ac_link" | sed 's/ -o *conftest[^ ]*//'` # The possible output files: ac_files="a.out conftest.exe conftest a.exe a_out.exe b.out conftest.*" ac_rmfiles= for ac_file in $ac_files do case $ac_file in *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.map | *.inf | *.dSYM | *.o | *.obj ) ;; * ) ac_rmfiles="$ac_rmfiles $ac_file";; esac done rm -f $ac_rmfiles if { { ac_try="$ac_link_default" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" $as_echo "$ac_try_echo"; } >&5 (eval "$ac_link_default") 2>&5 ac_status=$? $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; }; then : # Autoconf-2.13 could set the ac_cv_exeext variable to `no'. # So ignore a value of `no', otherwise this would lead to `EXEEXT = no' # in a Makefile. We should not override ac_cv_exeext if it was cached, # so that the user can short-circuit this test for compilers unknown to # Autoconf. for ac_file in $ac_files '' do test -f "$ac_file" || continue case $ac_file in *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.map | *.inf | *.dSYM | *.o | *.obj ) ;; [ab].out ) # We found the default executable, but exeext='' is most # certainly right. 
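# (Hedged example: a MinGW host would instead match conftest.exe in the
#  *.* branch below, yielding ac_cv_exeext=.exe rather than the empty
#  suffix assumed here.)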
break;; *.* ) if test "${ac_cv_exeext+set}" = set && test "$ac_cv_exeext" != no; then :; else ac_cv_exeext=`expr "$ac_file" : '[^.]*\(\..*\)'` fi # We set ac_cv_exeext here because the later test for it is not # safe: cross compilers may not add the suffix if given an `-o' # argument, so we may need to know it at that point already. # Even if this section looks crufty: it has the advantage of # actually working. break;; * ) break;; esac done test "$ac_cv_exeext" = no && ac_cv_exeext= else ac_file='' fi if test -z "$ac_file"; then : { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } $as_echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 $as_echo "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error 77 "C++ compiler cannot create executables See \`config.log' for more details" "$LINENO" 5; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 $as_echo "yes" >&6; } fi { $as_echo "$as_me:${as_lineno-$LINENO}: checking for C++ compiler default output file name" >&5 $as_echo_n "checking for C++ compiler default output file name... " >&6; } { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_file" >&5 $as_echo "$ac_file" >&6; } ac_exeext=$ac_cv_exeext rm -f -r a.out a.out.dSYM a.exe conftest$ac_cv_exeext b.out ac_clean_files=$ac_clean_files_save { $as_echo "$as_me:${as_lineno-$LINENO}: checking for suffix of executables" >&5 $as_echo_n "checking for suffix of executables... " >&6; } if { { ac_try="$ac_link" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" $as_echo "$ac_try_echo"; } >&5 (eval "$ac_link") 2>&5 ac_status=$? $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; }; then : # If both `conftest.exe' and `conftest' are `present' (well, observable) # catch `conftest.exe'. For instance with Cygwin, `ls conftest' will # work properly (i.e., refer to `conftest.exe'), while it won't with # `rm'. for ac_file in conftest.exe conftest conftest.*; do test -f "$ac_file" || continue case $ac_file in *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.map | *.inf | *.dSYM | *.o | *.obj ) ;; *.* ) ac_cv_exeext=`expr "$ac_file" : '[^.]*\(\..*\)'` break;; * ) break;; esac done else { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 $as_echo "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "cannot compute suffix of executables: cannot compile and link See \`config.log' for more details" "$LINENO" 5; } fi rm -f conftest conftest$ac_cv_exeext { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_exeext" >&5 $as_echo "$ac_cv_exeext" >&6; } rm -f conftest.$ac_ext EXEEXT=$ac_cv_exeext ac_exeext=$EXEEXT cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ #include <stdio.h> int main () { FILE *f = fopen ("conftest.out", "w"); return ferror (f) || fclose (f) != 0; ; return 0; } _ACEOF ac_clean_files="$ac_clean_files conftest.out" # Check that the compiler produces executables we can run. If not, either # the compiler is broken, or we cross compile. { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether we are cross compiling" >&5 $as_echo_n "checking whether we are cross compiling... 
" >&6; } if test "$cross_compiling" != yes; then { { ac_try="$ac_link" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" $as_echo "$ac_try_echo"; } >&5 (eval "$ac_link") 2>&5 ac_status=$? $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } if { ac_try='./conftest$ac_cv_exeext' { { case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" $as_echo "$ac_try_echo"; } >&5 (eval "$ac_try") 2>&5 ac_status=$? $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; }; }; then cross_compiling=no else if test "$cross_compiling" = maybe; then cross_compiling=yes else { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 $as_echo "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "cannot run C++ compiled programs. If you meant to cross compile, use \`--host'. See \`config.log' for more details" "$LINENO" 5; } fi fi fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $cross_compiling" >&5 $as_echo "$cross_compiling" >&6; } rm -f conftest.$ac_ext conftest$ac_cv_exeext conftest.out ac_clean_files=$ac_clean_files_save { $as_echo "$as_me:${as_lineno-$LINENO}: checking for suffix of object files" >&5 $as_echo_n "checking for suffix of object files... " >&6; } if ${ac_cv_objext+:} false; then : $as_echo_n "(cached) " >&6 else cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.o conftest.obj if { { ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" $as_echo "$ac_try_echo"; } >&5 (eval "$ac_compile") 2>&5 ac_status=$? $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; }; then : for ac_file in conftest.o conftest.obj conftest.*; do test -f "$ac_file" || continue; case $ac_file in *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.map | *.inf | *.dSYM ) ;; *) ac_cv_objext=`expr "$ac_file" : '.*\.\(.*\)'` break;; esac done else $as_echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 $as_echo "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "cannot compute suffix of object files: cannot compile See \`config.log' for more details" "$LINENO" 5; } fi rm -f conftest.$ac_cv_objext conftest.$ac_ext fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_objext" >&5 $as_echo "$ac_cv_objext" >&6; } OBJEXT=$ac_cv_objext ac_objext=$OBJEXT { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether we are using the GNU C++ compiler" >&5 $as_echo_n "checking whether we are using the GNU C++ compiler... " >&6; } if ${ac_cv_cxx_compiler_gnu+:} false; then : $as_echo_n "(cached) " >&6 else cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ int main () { #ifndef __GNUC__ choke me #endif ; return 0; } _ACEOF if ac_fn_cxx_try_compile "$LINENO"; then : ac_compiler_gnu=yes else ac_compiler_gnu=no fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext ac_cv_cxx_compiler_gnu=$ac_compiler_gnu fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_cxx_compiler_gnu" >&5 $as_echo "$ac_cv_cxx_compiler_gnu" >&6; } if test $ac_compiler_gnu = yes; then GXX=yes else GXX= fi ac_test_CXXFLAGS=${CXXFLAGS+set} ac_save_CXXFLAGS=$CXXFLAGS { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CXX accepts -g" >&5 $as_echo_n "checking whether $CXX accepts -g... " >&6; } if ${ac_cv_prog_cxx_g+:} false; then : $as_echo_n "(cached) " >&6 else ac_save_cxx_werror_flag=$ac_cxx_werror_flag ac_cxx_werror_flag=yes ac_cv_prog_cxx_g=no CXXFLAGS="-g" cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int main () { ; return 0; } _ACEOF if ac_fn_cxx_try_compile "$LINENO"; then : ac_cv_prog_cxx_g=yes else CXXFLAGS="" cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int main () { ; return 0; } _ACEOF if ac_fn_cxx_try_compile "$LINENO"; then : else ac_cxx_werror_flag=$ac_save_cxx_werror_flag CXXFLAGS="-g" cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int main () { ; return 0; } _ACEOF if ac_fn_cxx_try_compile "$LINENO"; then : ac_cv_prog_cxx_g=yes fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext ac_cxx_werror_flag=$ac_save_cxx_werror_flag fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cxx_g" >&5 $as_echo "$ac_cv_prog_cxx_g" >&6; } if test "$ac_test_CXXFLAGS" = set; then CXXFLAGS=$ac_save_CXXFLAGS elif test $ac_cv_prog_cxx_g = yes; then if test "$GXX" = yes; then CXXFLAGS="-g -O2" else CXXFLAGS="-g" fi else if test "$GXX" = yes; then CXXFLAGS="-O2" else CXXFLAGS= fi fi ac_ext=cpp ac_cpp='$CXXCPP $CPPFLAGS' ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_cxx_compiler_gnu ac_ext=cpp ac_cpp='$CXXCPP $CPPFLAGS' ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_cxx_compiler_gnu { $as_echo "$as_me:${as_lineno-$LINENO}: checking how to run the C++ preprocessor" >&5 $as_echo_n "checking how to run the C++ preprocessor... " >&6; } if test -z "$CXXCPP"; then if ${ac_cv_prog_CXXCPP+:} false; then : $as_echo_n "(cached) " >&6 else # Double quotes because CXXCPP needs to be expanded for CXXCPP in "$CXX -E" "/lib/cpp" do ac_preproc_ok=false for ac_cxx_preproc_warn_flag in '' yes do # Use a header file that comes with gcc, so configuring glibc # with a fresh cross-compiler works. # Prefer <limits.h> to <assert.h> if __STDC__ is defined, since # <limits.h> exists even on freestanding compilers. # On the NeXT, cc -E runs the code through the compiler's parser, # not just through cpp. "Syntax error" is here to catch this case. cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif Syntax error _ACEOF if ac_fn_cxx_try_cpp "$LINENO"; then : else # Broken: fails on valid input. continue fi rm -f conftest.err conftest.i conftest.$ac_ext # OK, works on sane cases. Now check whether nonexistent headers # can be detected and how. 
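# (Sketch of the two-step sanity check, assuming a sane preprocessor:
#    step 1, above: valid includes plus a stray "Syntax error" token must
#            be accepted, since cpp alone does not parse C++;
#    step 2, below: #include <ac_nonexistent.h> must be rejected.
#  Only a CXXCPP candidate passing both steps is kept.)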
cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ #include <ac_nonexistent.h> _ACEOF if ac_fn_cxx_try_cpp "$LINENO"; then : # Broken: success on invalid input. continue else # Passes both tests. ac_preproc_ok=: break fi rm -f conftest.err conftest.i conftest.$ac_ext done # Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped. rm -f conftest.i conftest.err conftest.$ac_ext if $ac_preproc_ok; then : break fi done ac_cv_prog_CXXCPP=$CXXCPP fi CXXCPP=$ac_cv_prog_CXXCPP else ac_cv_prog_CXXCPP=$CXXCPP fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CXXCPP" >&5 $as_echo "$CXXCPP" >&6; } ac_preproc_ok=false for ac_cxx_preproc_warn_flag in '' yes do # Use a header file that comes with gcc, so configuring glibc # with a fresh cross-compiler works. # Prefer <limits.h> to <assert.h> if __STDC__ is defined, since # <limits.h> exists even on freestanding compilers. # On the NeXT, cc -E runs the code through the compiler's parser, # not just through cpp. "Syntax error" is here to catch this case. cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif Syntax error _ACEOF if ac_fn_cxx_try_cpp "$LINENO"; then : else # Broken: fails on valid input. continue fi rm -f conftest.err conftest.i conftest.$ac_ext # OK, works on sane cases. Now check whether nonexistent headers # can be detected and how. cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ #include <ac_nonexistent.h> _ACEOF if ac_fn_cxx_try_cpp "$LINENO"; then : # Broken: success on invalid input. continue else # Passes both tests. ac_preproc_ok=: break fi rm -f conftest.err conftest.i conftest.$ac_ext done # Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped. rm -f conftest.i conftest.err conftest.$ac_ext if $ac_preproc_ok; then : else { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 $as_echo "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "C++ preprocessor \"$CXXCPP\" fails sanity check See \`config.log' for more details" "$LINENO" 5; } fi ac_ext=cpp ac_cpp='$CXXCPP $CPPFLAGS' ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_cxx_compiler_gnu ## Check the C++ compiler using the CXX value set ac_ext=cpp ac_cpp='$CXXCPP $CPPFLAGS' ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_cxx_compiler_gnu if test -z "$CXX"; then if test -n "$CCC"; then CXX=$CCC else if test -n "$ac_tool_prefix"; then for ac_prog in g++ c++ gpp aCC CC cxx cc++ cl.exe FCC KCC RCC xlC_r xlC do # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. set dummy $ac_tool_prefix$ac_prog; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_prog_CXX+:} false; then : $as_echo_n "(cached) " >&6 else if test -n "$CXX"; then ac_cv_prog_CXX="$CXX" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_CXX="$ac_tool_prefix$ac_prog" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS fi fi CXX=$ac_cv_prog_CXX if test -n "$CXX"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CXX" >&5 $as_echo "$CXX" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi test -n "$CXX" && break done fi if test -z "$CXX"; then ac_ct_CXX=$CXX for ac_prog in g++ c++ gpp aCC CC cxx cc++ cl.exe FCC KCC RCC xlC_r xlC do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 $as_echo_n "checking for $ac_word... " >&6; } if ${ac_cv_prog_ac_ct_CXX+:} false; then : $as_echo_n "(cached) " >&6 else if test -n "$ac_ct_CXX"; then ac_cv_prog_ac_ct_CXX="$ac_ct_CXX" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CXX="$ac_prog" $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS fi fi ac_ct_CXX=$ac_cv_prog_ac_ct_CXX if test -n "$ac_ct_CXX"; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CXX" >&5 $as_echo "$ac_ct_CXX" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } fi test -n "$ac_ct_CXX" && break done if test "x$ac_ct_CXX" = x; then CXX="g++" else case $cross_compiling:$ac_tool_warned in yes:) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 $as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} ac_tool_warned=yes ;; esac CXX=$ac_ct_CXX fi fi fi fi # Provide some information about the compiler. $as_echo "$as_me:${as_lineno-$LINENO}: checking for C++ compiler version" >&5 set X $ac_compile ac_compiler=$2 for ac_option in --version -v -V -qversion; do { { ac_try="$ac_compiler $ac_option >&5" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" $as_echo "$ac_try_echo"; } >&5 (eval "$ac_compiler $ac_option >&5") 2>conftest.err ac_status=$? if test -s conftest.err; then sed '10a\ ... rest of stderr output deleted ... 10q' conftest.err >conftest.er1 cat conftest.er1 >&5 fi rm -f conftest.er1 conftest.err $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } done { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether we are using the GNU C++ compiler" >&5 $as_echo_n "checking whether we are using the GNU C++ compiler... " >&6; } if ${ac_cv_cxx_compiler_gnu+:} false; then : $as_echo_n "(cached) " >&6 else cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ int main () { #ifndef __GNUC__ choke me #endif ; return 0; } _ACEOF if ac_fn_cxx_try_compile "$LINENO"; then : ac_compiler_gnu=yes else ac_compiler_gnu=no fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext ac_cv_cxx_compiler_gnu=$ac_compiler_gnu fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_cxx_compiler_gnu" >&5 $as_echo "$ac_cv_cxx_compiler_gnu" >&6; } if test $ac_compiler_gnu = yes; then GXX=yes else GXX= fi ac_test_CXXFLAGS=${CXXFLAGS+set} ac_save_CXXFLAGS=$CXXFLAGS { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CXX accepts -g" >&5 $as_echo_n "checking whether $CXX accepts -g... " >&6; } if ${ac_cv_prog_cxx_g+:} false; then : $as_echo_n "(cached) " >&6 else ac_save_cxx_werror_flag=$ac_cxx_werror_flag ac_cxx_werror_flag=yes ac_cv_prog_cxx_g=no CXXFLAGS="-g" cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int main () { ; return 0; } _ACEOF if ac_fn_cxx_try_compile "$LINENO"; then : ac_cv_prog_cxx_g=yes else CXXFLAGS="" cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int main () { ; return 0; } _ACEOF if ac_fn_cxx_try_compile "$LINENO"; then : else ac_cxx_werror_flag=$ac_save_cxx_werror_flag CXXFLAGS="-g" cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int main () { ; return 0; } _ACEOF if ac_fn_cxx_try_compile "$LINENO"; then : ac_cv_prog_cxx_g=yes fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext ac_cxx_werror_flag=$ac_save_cxx_werror_flag fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cxx_g" >&5 $as_echo "$ac_cv_prog_cxx_g" >&6; } if test "$ac_test_CXXFLAGS" = set; then CXXFLAGS=$ac_save_CXXFLAGS elif test $ac_cv_prog_cxx_g = yes; then if test "$GXX" = yes; then CXXFLAGS="-g -O2" else CXXFLAGS="-g" fi else if test "$GXX" = yes; then CXXFLAGS="-O2" else CXXFLAGS= fi fi ac_ext=cpp ac_cpp='$CXXCPP $CPPFLAGS' ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_cxx_compiler_gnu ## If it is g++, we have GXX set so let's examine it if test "${GXX}" = yes; then { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether g++ version is sufficient" >&5 $as_echo_n "checking whether g++ version is sufficient... " >&6; } gxx_version=$(${CXX} -v 2>&1 | awk '/^.*g.. version/ {print $3}') case ${gxx_version} in 1.*|2.*|3.*|4.0.*|4.1.*|4.2.*|4.3.*|4.4.*|4.5.*|4.6.*) { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; } { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: Only g++ version 4.7 or greater can be used with RcppMLPACK." >&5 $as_echo "$as_me: WARNING: Only g++ version 4.7 or greater can be used with RcppMLPACK." >&2;} as_fn_error $? "Please use a different compiler." "$LINENO" 5 ;; 4.7.*|4.8.*|4.9.*|[5-9].*|[1-9][0-9].*) ## gcc 4.6.* removed from this pattern; g++ 5 and later (including 7+, previously unmatched) also land here gxx_newer_than_45="-fpermissive" { $as_echo "$as_me:${as_lineno-$LINENO}: result: (${gxx_version}) yes" >&5 $as_echo "(${gxx_version}) yes" >&6; } ;; esac fi
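## A minimal sketch of the version probe above (hypothetical output):
##   $ g++ -v 2>&1 | awk '/^.*g.. version/ {print $3}'
##   4.8.5
## so "gcc version 4.6.3" would take the fatal branch, while 4.8.5 sets
## gxx_newer_than_45="-fpermissive". The R queries near the start of the
## main body can be reproduced the same way (illustration only):
##   $ "${R_HOME}/bin/R" CMD config CXX        # e.g. g++ -std=gnu++11
##   $ "${R_HOME}/bin/R" CMD config CXXFLAGS   # e.g. -g -O2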