fastcluster/0000755000176200001440000000000014550336553012622 5ustar liggesusersfastcluster/NAMESPACE0000644000176200001440000000011711727523223014033 0ustar liggesusersuseDynLib(fastcluster, .registration=TRUE) export('hclust', 'hclust.vector') fastcluster/README0000644000176200001440000001451314541365270013504 0ustar liggesusersfastcluster: Fast hierarchical clustering routines for R and Python Copyright: * Until package version 1.1.23: © 2011 Daniel Müllner * All changes from version 1.1.24 on: © Google Inc. The fastcluster package is a C++ library for hierarchical, agglomerative clustering. It efficiently implements the seven most widely used clustering schemes: single, complete, average, weighted/McQuitty, Ward, centroid and median linkage. The library currently has interfaces to two languages: R and Python/NumPy. Part of the functionality is designed as drop-in replacement for existing routines: “linkage” in the SciPy package “scipy.cluster.hierarchy”, “hclust” in R's “stats” package, and the “flashClust” package. Once the fastcluster library is loaded at the beginning of the code, every program that uses hierarchical clustering can benefit immediately and effortlessly from the performance gain. Moreover, there are memory-saving routines for clustering of vector data, which go beyond what the existing packages provide. See the author's home page for more information, in particular a performance comparison with other clustering packages. The User's manual is the file inst/doc/fastcluster.pdf in the source distribution. The fastcluster package is distributed under the BSD license. See the file LICENSE in the source distribution or . Christoph Dalitz wrote a pure C++ interface to fastcluster: . Installation ‾‾‾‾‾‾‾‾‾‾‾‾ See the file INSTALL in the source distribution. Usage ‾‾‾‾‾ 1. 
R ‾‾‾‾ In R, load the package with the following command: library('fastcluster') The package overwrites the function hclust from the “stats” package (in the same way as the flashClust package does). Please remove any references to the flashClust package in your R files to not accidentally overwrite the hclust function with the flashClust version. The new hclust function has exactly the same calling conventions as the old one. You may just load the package and immediately and effortlessly enjoy the performance improvements. The function is also an improvement to the flashClust function from the “flashClust” package. Just replace every call to flashClust by hclust and expect your code to work as before, only faster. (If you are using flashClust prior to version 1.01, update it! See the change log for flashClust: http://cran.r-project.org/web/packages/flashClust/ChangeLog ) If you need to access the old function or make sure that the right function is called, specify the package as follows: fastcluster::hclust(…) flashClust::hclust(…) stats::hclust(…) Vector data can be clustered with a memory-saving algorithm with the command hclust.vector(…) See the User's manual inst/doc/fastcluster.pdf for further details. WARNING ‾‾‾‾‾‾‾ R and Matlab/SciPy use different conventions for the “Ward”, “centroid” and “median” methods. R assumes that the dissimilarity matrix consists of squared Euclidean distances, while Matlab and SciPy expect non-squared Euclidean distances. The fastcluster package respects these conventions and uses different formulas in the two interfaces. If you want the same results in both interfaces, then feed the hclust function in R with the entry-wise square of the distance matrix, D^2, for the “Ward”, “centroid” and “median” methods and later take the square root of the height field in the dendrogram. For the “average” and “weighted” alias “mcquitty” methods, you must still take the same distance matrix D as in the Python interface for the same results. 
The “single” and “complete” methods only depend on the relative order of the distances, hence it does not make a difference whether the method operates on the distances or the squared distances. The code example in the R documentation (enter ?hclust or example(hclust) in R) contains an instance where the squared distance matrix is generated from Euclidean data. 2. Python ‾‾‾‾‾‾‾‾‾ The fastcluster package is imported as usual by import fastcluster It provides the following functions: linkage(X, method='single', metric='euclidean', preserve_input=True) single(X) complete(X) average(X) weighted(X) ward(X) centroid(X) median(X) linkage_vector(X, method='single', metric='euclidean', extraarg=None) The argument X is either a compressed distance matrix or a collection of n observation vectors in d dimensions as an (n×d) array. Apart from the argument preserve_input, the methods have the same input and output as the functions of the same name in the package scipy.cluster.hierarchy. The additional, optional argument preserve_input specifies whether the fastcluster package first copies the distance matrix or writes into the existing array. If the dissimilarities are generated for the clustering step only and are not needed afterward, approximately half the memory can be saved by specifying preserve_input=False. Note that the input array X contains unspecified values after this procedure. You may want to write linkage(X, method='…', preserve_input=False) del X to make sure that the matrix X is not accessed accidentally after it has been used as scratch memory. The method linkage_vector(X, method='single', metric='euclidean', extraarg=None) provides memory-saving clustering for vector data. It also accepts a collection of n observation vectors in d dimensions as an (n×d) array as the first parameter. The parameter 'method' is either 'single', 'ward', 'centroid' or 'median'. The 'ward', 'centroid' and 'median' methods require the Euclidean metric. 
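As noted above, the "single" and "complete" methods depend only on the relative order of the dissimilarities, and squaring is a monotone transformation of nonnegative distances, so a single-linkage procedure produces the same merge sequence on D and on D². A pure-Python sketch of this (entirely illustrative — the naive helper below is not fastcluster's algorithm, and the toy distances are made up):

```python
from itertools import combinations

def single_linkage_merges(dists):
    """Naive O(n^3) single linkage; dists maps frozenset({i, j}) -> distance.
    Returns the sequence of merged cluster pairs (as sorted member lists)."""
    n = max(max(pair) for pair in dists) + 1
    clusters = [frozenset([i]) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        # Single linkage: the inter-cluster distance is the minimum pairwise
        # distance between members of the two clusters.
        a, b = min(combinations(clusters, 2),
                   key=lambda ab: min(dists[frozenset((i, j))]
                                      for i in ab[0] for j in ab[1]))
        merges.append((sorted(a), sorted(b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return merges

# A toy dissimilarity on 4 points with pairwise-distinct values.
d = {frozenset(p): v for p, v in
     [((0, 1), 2.0), ((0, 2), 5.0), ((1, 2), 3.0),
      ((0, 3), 9.0), ((1, 3), 8.0), ((2, 3), 4.0)]}
d_squared = {k: v * v for k, v in d.items()}

# Same merge sequence on the distances and on their squares.
assert single_linkage_merges(d) == single_linkage_merges(d_squared)
```

The same invariance argument fails for the other methods, which do arithmetic on the distance values rather than only comparing them — hence the squared/non-squared conventions discussed in the WARNING above.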
For single linkage, the 'metric' parameter can be chosen from all metrics which are implemented in scipy.spatial.distance.pdist. There may be differences between linkage(scipy.spatial.distance.pdist(X, metric='…')) and linkage_vector(X, metric='…') since a few corrections have been made compared to the pdist function. Please consult the User's manual inst/doc/fastcluster.pdf for comprehensive details.
fastcluster/LICENSE0000644000176200001440000000263214541365152013627 0ustar liggesusersCopyright: * Until package version 1.1.23: © 2011 Daniel Müllner * All changes from version 1.1.24 on: © Google Inc. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
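The README notes above that linkage_vector may differ from running linkage on SciPy's pdist output because a few corrections were made to the metric implementations. One metric whose definition has historically varied between implementations is Canberra. The sketch below (pure Python; the function name is made up for illustration) follows the definition documented for current versions of R's dist() and modern SciPy; it is not claimed to match fastcluster's internals exactly:

```python
def canberra(x, y):
    """Canberra distance: sum over coordinates of |x_i - y_i| / (|x_i| + |y_i|).
    Terms where numerator and denominator are both zero are omitted,
    matching the documented behaviour of R's dist()."""
    total = 0.0
    for xi, yi in zip(x, y):
        num = abs(xi - yi)
        den = abs(xi) + abs(yi)
        if den == 0.0:
            continue  # xi == yi == 0: omit the 0/0 term
        total += num / den
    return total

assert canberra([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 0.0
assert canberra([0.0, 1.0], [0.0, 3.0]) == 0.5  # |1-3| / (1+3)
```

Older implementations used |x_i + y_i| as the denominator instead, which changes the result whenever coordinates have opposite signs — one example of why pdist-based and vector-based clustering runs may disagree.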
fastcluster/INSTALL0000644000176200001440000000777714541365166013667 0ustar liggesusersfastcluster: Fast hierarchical clustering routines for R and Python Copyright: * Until package version 1.1.23: © 2011 Daniel Müllner * All changes from version 1.1.24 on: © Google Inc. Installation ‾‾‾‾‾‾‾‾‾‾‾‾ Installation procedures were tested under 64-bit Ubuntu. CRAN also hosts precompiled binaries (of the R library, not the Python module) for Windows and OS X. In principle, it should be possible to install the fastcluster package on any system that has a C++ compiler and either R or Python with NumPy. There are no unusual libraries needed to compile the package, only the C++ standard library, which every C++ compiler provides by default. Please send me feedback if you manage to install the fastcluster package on a platform where you needed to tweak the configuration! I will update the installation instructions and modify the package if needed (e.g. include the right compiler flags for various operating systems). Installation for R ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ Enter the command install.packages("fastcluster") in R, and R will download the package automatically, then install it. That's it! If this does not work, please consult R's help function by typing ?INSTALL from within R or read the “R installation and administration” manual: http://cran.r-project.org/doc/manuals/R-admin.html#Installing-packages For manual download, you can get the fastcluster package from the download page at CRAN: http://cran.r-project.org/web/packages/fastcluster/ You may need to start R with administrator rights to be able to install packages. There are ways to install R packages without administrator privileges in your user directories. See this help page for example: http://csg.sph.umich.edu/docs/R/localpackages.html Installation for Python ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ Make sure that you have both Python and NumPy installed. 1. 
On all platforms ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ If pip is installed, type pip install --upgrade --user fastcluster in a terminal, which automatically downloads the latest version from PyPI, compiles the C++ library and installs the package for a single user without administrator rights. If this works, there is no need to follow the alternative steps below. 2. Microsoft Windows ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ Installation files for Windows are stored on PyPI: https://pypi.python.org/pypi/fastcluster Christoph Gohlke also provides installation files for Windows on his web page: http://www.lfd.uci.edu/~gohlke/pythonlibs/#fastcluster 3. With setuptools ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ If pip is not available but setuptools, type easy_install --upgrade --user fastcluster in a terminal. 4. From the source package ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ If you have not done so already, download the fastcluster package from PyPI here: http://pypi.python.org/pypi/fastcluster/ Open a terminal, go to the directory with the downloaded file and extract the contents of the archive with: tar -xvf fastcluster-(version).tar.gz Alternatively, use your favorite archive manager for unpacking, eg. on Windows. This will generate a new directory “fastcluster-(version)”. Switch to this subdirectory: cd fastcluster-(...) The source distribution on CRAN also contains the complete source files. See the directory src/python there. 
Now compile and install the Python module by: python setup.py install You may need to precede this command with sudo or install the package in your home directory, like this: python setup.py install --user See the chapter “Installing Python modules” in the Python documentation for further help: http://docs.python.org/install/index.html fastcluster/man/0000755000176200001440000000000014541365667013375 5ustar liggesusersfastcluster/man/fastcluster.Rd0000644000176200001440000000570014541365411016220 0ustar liggesusers\name{fastcluster} \alias{fastcluster} \alias{fastcluster-package} \docType{package} \title{Fast hierarchical, agglomerative clustering routines for R and Python} \description{The \pkg{fastcluster} package provides efficient algorithms for hierarchical, agglomerative clustering. In addition to the R interface, there is also a Python interface to the underlying C++ library, to be found in the source distribution. } \details{The function \code{\link{hclust}} provides clustering when the input is a dissimilarity matrix. A dissimilarity matrix can be computed from vector data by \code{\link{dist}}. The \code{\link{hclust}} function can be used as a drop-in replacement for existing routines: \code{\link[stats:hclust]{stats::hclust}} and \code{\link[flashClust:hclust]{flashClust::hclust}} alias \code{\link[flashClust:flashClust]{flashClust::flashClust}}. Once the fastcluster library is loaded at the beginning of the code, every program that uses hierarchical clustering can benefit immediately and effortlessly from the performance gain. When the package is loaded, it overwrites the function \code{\link{hclust}} with the new code. The function \code{\link{hclust.vector}} provides memory-saving routines when the input is vector data. 
Further information: \itemize{ \item R documentation pages: \code{\link{hclust}}, \code{\link{hclust.vector}} \item A comprehensive User's manual: \href{https://CRAN.R-project.org/package=fastcluster/vignettes/fastcluster.pdf}{fastcluster.pdf}. Get this from the R command line with \code{vignette('fastcluster')}. \item JSS paper: \doi{10.18637/jss.v053.i09}. \item See the author's home page for a performance comparison: \url{https://danifold.net/fastcluster.html}. } } \references{\url{https://danifold.net/fastcluster.html}} \author{Daniel Müllner} \seealso{\code{\link{hclust}}, \code{\link{hclust.vector}}} \examples{# Taken and modified from stats::hclust # # hclust(...) # new method # hclust.vector(...) # new method # stats::hclust(...) # old method require(fastcluster) require(graphics) hc <- hclust(dist(USArrests), "ave") plot(hc) plot(hc, hang = -1) ## Do the same with centroid clustering and squared Euclidean distance, ## cut the tree into ten clusters and reconstruct the upper part of the ## tree from the cluster centers. hc <- hclust.vector(USArrests, "cen") # squared Euclidean distances hc$height <- hc$height^2 memb <- cutree(hc, k = 10) cent <- NULL for(k in 1:10){ cent <- rbind(cent, colMeans(USArrests[memb == k, , drop = FALSE])) } hc1 <- hclust.vector(cent, method = "cen", members = table(memb)) # squared Euclidean distances hc1$height <- hc1$height^2 opar <- par(mfrow = c(1, 2)) plot(hc, labels = FALSE, hang = -1, main = "Original Tree") plot(hc1, labels = FALSE, hang = -1, main = "Re-start from 10 clusters") par(opar) } \keyword{multivariate} \keyword{cluster} fastcluster/man/hclust.Rd0000644000176200001440000000440514541365447015175 0ustar liggesusers\name{hclust} \alias{hclust} \title{Fast hierarchical, agglomerative clustering of dissimilarity data} \description{ This function implements hierarchical clustering with the same interface as \code{\link[stats:hclust]{hclust}} from the \pkg{\link{stats}} package but with much faster algorithms. 
} \usage{hclust(d, method="complete", members=NULL)} \arguments{ \item{d}{a dissimilarity structure as produced by \code{dist}.} \item{method}{the agglomeration method to be used. This must be (an unambiguous abbreviation of) one of \code{"single"}, \code{"complete"}, \code{"average"}, \code{"mcquitty"}, \code{"ward.D"}, \code{"ward.D2"}, \code{"centroid"} or \code{"median"}.} \item{members}{\code{NULL} or a vector with length the number of observations.} } \value{An object of class \code{'hclust'}. It encodes a stepwise dendrogram.} \details{See the documentation of the original function \code{\link[stats:hclust]{hclust}} in the \pkg{\link{stats}} package. A comprehensive User's manual \href{https://CRAN.R-project.org/package=fastcluster/vignettes/fastcluster.pdf}{fastcluster.pdf} is available as a vignette. Get this from the R command line with \code{vignette('fastcluster')}. } \references{\url{https://danifold.net/fastcluster.html}} \author{Daniel Müllner} \seealso{\code{\link{fastcluster}}, \code{\link{hclust.vector}}, \code{\link[stats:hclust]{stats::hclust}}} \examples{# Taken and modified from stats::hclust # # hclust(...) # new method # stats::hclust(...) # old method require(fastcluster) require(graphics) hc <- hclust(dist(USArrests), "ave") plot(hc) plot(hc, hang = -1) ## Do the same with centroid clustering and squared Euclidean distance, ## cut the tree into ten clusters and reconstruct the upper part of the ## tree from the cluster centers. 
hc <- hclust(dist(USArrests)^2, "cen") memb <- cutree(hc, k = 10) cent <- NULL for(k in 1:10){ cent <- rbind(cent, colMeans(USArrests[memb == k, , drop = FALSE])) } hc1 <- hclust(dist(cent)^2, method = "cen", members = table(memb)) opar <- par(mfrow = c(1, 2)) plot(hc, labels = FALSE, hang = -1, main = "Original Tree") plot(hc1, labels = FALSE, hang = -1, main = "Re-start from 10 clusters") par(opar) } \keyword{multivariate} \keyword{cluster} fastcluster/man/hclust.vector.Rd0000644000176200001440000000701114541365667016476 0ustar liggesusers\name{hclust.vector} \alias{hclust.vector} \title{Fast hierarchical, agglomerative clustering of vector data} \description{ This function implements hierarchical, agglomerative clustering with memory-saving algorithms.} \usage{hclust.vector(X, method="single", members=NULL, metric='euclidean', p=NULL)} \arguments{ \item{X}{an \eqn{(N\times D)}{(N×D)} matrix of '\link{double}' values: \eqn{N}{N} observations in \eqn{D}{D} variables.} \item{method}{the agglomeration method to be used. This must be (an unambiguous abbreviation of) one of \code{"single"}, \code{"ward"}, \code{"centroid"} or \code{"median"}.} \item{members}{\code{NULL} or a vector with length the number of observations.} \item{metric}{the distance measure to be used. This must be one of \code{"euclidean"}, \code{"maximum"}, \code{"manhattan"}, \code{"canberra"}, \code{"binary"} or \code{"minkowski"}. Any unambiguous substring can be given.} \item{p}{parameter for the Minkowski metric.} } \details{The function \code{\link{hclust.vector}} provides clustering when the input is vector data. It uses memory-saving algorithms which allow processing of larger data sets than \code{\link{hclust}} does. The \code{"ward"}, \code{"centroid"} and \code{"median"} methods require \code{metric="euclidean"} and cluster the data set with respect to Euclidean distances. For \code{"single"} linkage clustering, any dissimilarity measure may be chosen. 
Currently, the same metrics are implemented as the \code{\link[stats:dist]{dist}} function provides. The call\preformatted{ hclust.vector(X, method='single', metric=[...])} gives the same result as\preformatted{ hclust(dist(X, metric=[...]), method='single')} but uses less memory and is equally fast. For the Euclidean methods, care must be taken since \code{\link{hclust}} expects \bold{squared} Euclidean distances. Hence, the call\preformatted{ hclust.vector(X, method='centroid')} is, aside from the lesser memory requirements, equivalent to\preformatted{ d = dist(X) hc = hclust(d^2, method='centroid') hc$height = sqrt(hc$height)} The same applies to the \code{"median"} method. The \code{"ward"} method in \code{\link{hclust.vector}} is equivalent to \code{\link{hclust}} with method \code{"ward.D2"}, but to method \code{"ward.D"} only after squaring as above. More details are in the User's manual \href{https://CRAN.R-project.org/package=fastcluster/vignettes/fastcluster.pdf}{fastcluster.pdf}, which is available as a vignette. Get this from the R command line with \code{vignette('fastcluster')}. } \references{\url{https://danifold.net/fastcluster.html}} \author{Daniel Müllner} \seealso{\code{\link{fastcluster}}, \code{\link{hclust}}} \examples{# Taken and modified from stats::hclust ## Perform centroid clustering with squared Euclidean distances, ## cut the tree into ten clusters and reconstruct the upper part of the ## tree from the cluster centers. 
hc <- hclust.vector(USArrests, "cen") # squared Euclidean distances hc$height <- hc$height^2 memb <- cutree(hc, k = 10) cent <- NULL for(k in 1:10){ cent <- rbind(cent, colMeans(USArrests[memb == k, , drop = FALSE])) } hc1 <- hclust.vector(cent, method = "cen", members = table(memb)) # squared Euclidean distances hc1$height <- hc1$height^2 opar <- par(mfrow = c(1, 2)) plot(hc, labels = FALSE, hang = -1, main = "Original Tree") plot(hc1, labels = FALSE, hang = -1, main = "Re-start from 10 clusters") par(opar) } \keyword{multivariate} \keyword{cluster} fastcluster/DESCRIPTION0000644000176200001440000000337314550336552014335 0ustar liggesusersPackage: fastcluster Encoding: UTF-8 Type: Package Version: 1.2.6 Date: 2023-12-22 Title: Fast Hierarchical Clustering Routines for R and 'Python' Authors@R: c(person("Daniel", "Müllner", email = "daniel@danifold.net", role = c("aut", "cph", "cre")), person("Google Inc.", role="cph")) Copyright: Until package version 1.1.23: © 2011 Daniel Müllner . All changes from version 1.1.24 on: © Google Inc. . Enhances: stats, flashClust Depends: R (>= 3.0.0) Description: This is a two-in-one package which provides interfaces to both R and 'Python'. It implements fast hierarchical, agglomerative clustering routines. Part of the functionality is designed as drop-in replacement for existing routines: linkage() in the 'SciPy' package 'scipy.cluster.hierarchy', hclust() in R's 'stats' package, and the 'flashClust' package. It provides the same functionality with the benefit of a much faster implementation. Moreover, there are memory-saving routines for clustering of vector data, which go beyond what the existing packages provide. For information on how to install the 'Python' files, see the file INSTALL in the source distribution. Based on the present package, Christoph Dalitz also wrote a pure 'C++' interface to 'fastcluster': . 
License: FreeBSD | GPL-2 | file LICENSE URL: https://danifold.net/fastcluster.html NeedsCompilation: yes Packaged: 2023-12-22 20:04:59 UTC; muellner Author: Daniel Müllner [aut, cph, cre], Google Inc. [cph] Maintainer: Daniel Müllner Repository: CRAN Date/Publication: 2024-01-12 22:30:02 UTC
fastcluster/build/0000755000176200001440000000000014541365753013725 5ustar liggesusersfastcluster/build/vignette.rds0000644000176200001440000000031614541365753016264 0ustar liggesusersfastcluster/build/partial.rdb0000644000176200001440000000007514541365743016053 0ustar liggesusersfastcluster/tests/0000755000176200001440000000000014541365732013765 5ustar liggesusersfastcluster/tests/test_fastcluster.R0000644000176200001440000001644214541365732017515 0ustar liggesusers# fastcluster: Fast hierarchical clustering routines for R and Python # # Copyright: # * Until package version 1.1.23: © 2011 Daniel Müllner # * All changes from version 1.1.24 on: © Google Inc. # # Test script for the R interface seed = as.integer(runif(1, 0, 1e9)) set.seed(seed) cat(sprintf("Random seed: %d\n",seed)) print_seed <- function() { return(sprintf(' Please send a report to the author of the \'fastcluster\' package, Daniel Müllner. For contact details, see . To make the error reproducible, you must include the following number (the random seed value) in your error report: %d.\n\n', seed)) } hasWardD2 = getRversion() >= '3.1.0' # Compare two dendrograms and check whether they are equal, except that # ties may be resolved differently. compare <- function(dg1, dg2) { h1 <- dg1$height h2 <- dg2$height # "height" vectors may have small numerical errors. rdiffs <- abs(h1-h2)/pmax(abs(h1),abs(h2)) rdiffs = rdiffs[complete.cases(rdiffs)] rel_error <- max(rdiffs) # We allow a relative error of 1e-13. if (rel_error>1e-13) { print(h1) print(h2) cat(sprintf('Height vectors differ! 
The maximum relative error is %e.\n', rel_error)) return(FALSE) } # Filter the indices where consecutive merging distances are distinct. d = diff(dg1$height) b = (c(d,1)!=0 & c(1,d)!=0) #cat(sprintf("Percentage of indices where we can test: %g.\n",100.0*length(b[b])/length(b))) if (any(b)) { m1 = dg1$merge[b,] m2 = dg2$merge[b,] r = function(i) { if (i<0) { return(1) } else { return(b[i]) } } f = sapply(m1,r) fm1 = m1*f fm2 = m2*f # The "merge" matrices must be identical whereever indices are not ambiguous # due to ties. if (!identical(fm1,fm2)) { cat('Merge matrices differ!\n') return(FALSE) } # Compare the "order" vectors only if all merging distances were distinct. if (all(b) && !identical(dg1$order,dg2$order)) { cat('Order vectors differ!\n') return(FALSE) } } return(TRUE) } # Generate uniformly distributed random data generate.uniform <- function() { n = sample(10:1000,1) range_exp = runif(1,min=-10, max=10) cat(sprintf("Number of sample points: %d\n",n)) cat(sprintf("Dissimilarity range: [0,%g]\n",10^range_exp)) d = runif(n*(n-1)/2, min=0, max=10^range_exp) # Fake a compressed distance matrix attributes(d) <- NULL attr(d,"Size") <- n attr(d, "call") <- 'N/A' class(d) <- "dist" return(d) } # Generate normally distributed random data generate.normal <- function() { n = sample(10:1000,1) dim = sample(2:20,1) cat (sprintf("Number of sample points: %d\n",n)) cat (sprintf("Dimension: %d\n",dim)) pcd = matrix(rnorm(n*dim), c(n,dim)) d = dist(pcd) return(d) } # Test the clustering functions when a distance matrix is given. 
test.dm <- function(d) {
  d2 = d
  if (hasWardD2) {
    methods = c('single','complete','average','mcquitty','ward.D','ward.D2','centroid','median')
  } else {
    methods = c('single','complete','average','mcquitty','ward','centroid','median')
  }
  for (method in methods) {
    cat(paste('Method :', method, '\n'))
    dg_stats = stats::hclust(d, method=method)
    if (method == 'ward') {
      method = 'ward.D'
    }
    dg_fastcluster = fastcluster::hclust(d, method=method)
    if (!identical(d,d2)) {
      cat('Input array was corrupted!\n')
      stop(print_seed())
    }
    if (!compare(dg_stats, dg_fastcluster)) {
      stop(print_seed())
    }
  }
  cat('Passed.\n')
}

# Test the clustering functions for vector input in Euclidean space.
test.vector <- function() {
  # generate test data
  n = sample(10:1000,1)
  dim = sample(2:20,1)
  cat (sprintf("Number of sample points: %d\n",n))
  cat (sprintf("Dimension: %d\n",dim))
  range_exp = runif(1,min=-10, max=10)
  pcd = matrix(rnorm(n*dim, sd=10^range_exp), c(n,dim))
  pcd2 = pcd
  # test method='single'
  method = 'single'
  cat(paste('Method:', method, '\n'))
  for (metric in c('euclidean', 'maximum', 'manhattan', 'canberra', 'minkowski')) {
    cat(paste(' Metric:', metric, '\n'))
    if (metric=='minkowski') {
      p = runif(1, min=1.0, max=10.0)
      cat (sprintf(" p: %g\n",p));
      dg_fastcluster = fastcluster::hclust.vector(pcd, method=method, metric=metric, p=p)
      d = dist(pcd, method=metric, p=p)
    } else {
      dg_fastcluster = fastcluster::hclust.vector(pcd, method=method, metric=metric)
      d = dist(pcd, method=metric)
    }
    d2 = d
    dg_fastcluster_dist = fastcluster::hclust(d, method=method)
    if (!identical(d,d2) || !identical(pcd,pcd2)) {
      cat('Input array was corrupted!\n')
      stop(print_seed())
    }
    if (!compare(dg_fastcluster_dist, dg_fastcluster)) {
      stop(print_seed())
    }
  }
  for (method in c('ward','centroid','median') ) {
    cat(paste('Method:', method, '\n'))
    dg_fastcluster = fastcluster::hclust.vector(pcd, method=method)
    if (!identical(pcd,pcd2)) {
      cat('Input array was corrupted!\n')
      stop(print_seed())
    }
    d = dist(pcd)
    if(method == "ward" && hasWardD2) {
      method = "ward.D2"
    } else {
      # Workaround: fastcluster::hclust expects _squared_ euclidean distances.
      d = d^2
    }
    d2 = d
    dg_fastcluster_dist = fastcluster::hclust(d, method=method)
    if (!identical(d,d2)) {
      cat('Input array was corrupted!\n')
      stop(print_seed())
    }
    if(method != "ward.D2") {
      dg_fastcluster_dist$height = sqrt(dg_fastcluster_dist$height)
    }
    # The Euclidean methods may have small numerical errors due to squaring/
    # taking the root in the Euclidean distances.
    if (!compare(dg_fastcluster_dist, dg_fastcluster)) {
      stop(print_seed())
    }
  }
  cat('Passed.\n')
}

# Test the single linkage function with the "binary" metric
test.vector.binary <- function() {
  # generate test data
  cat (sprintf("Uniform sampling for the 'binary' metric:\n"))
  n = sample(10:400,1)
  dim = sample(n:(2*n),1)
  cat (sprintf("Number of sample points: %d\n",n))
  cat (sprintf("Dimension: %d\n",dim))
  pcd = matrix(sample(-1:2, n*dim, replace=T), c(n,dim))
  pcd2 = pcd
  # test method='single', metric='binary'
  method = 'single'
  metric = 'binary'
  cat(paste('Method:', method, '\n'))
  cat(paste(' Metric:', metric, '\n'))
  dg_fastcluster = fastcluster::hclust.vector(pcd, method=method, metric=metric)
  d = dist(pcd, method=metric)
  d2 = d
  dg_fastcluster_dist = fastcluster::hclust(d, method=method)
  if (!identical(d,d2) || !identical(pcd,pcd2)) {
    cat('Input array was corrupted!\n')
    stop(print_seed())
  }
  if (!compare(dg_fastcluster_dist, dg_fastcluster)) {
    stop(print_seed())
  }
  cat('Passed.\n')
}

N = 15
for (i in (1:N)) {
  if (i%%2==1) {
    cat(sprintf('Random test %d of %d (uniform distribution of distances):\n',i,2*N))
    d = generate.uniform()
  } else {
    cat(sprintf('Random test %d of %d (Gaussian density):\n',i,2*N))
    d = generate.normal()
  }
  test.dm(d)
}
for (i in (N+1:N)) {
  cat(sprintf('Random test %d of %d (Gaussian density):\n',i,2*N))
  test.vector()
  test.vector.binary()
}
cat('Done.\n')
fastcluster/src/0000755000176200001440000000000014541365753013415 5ustar liggesusersfastcluster/src/fastcluster_R.cpp0000644000176200001440000006410014541365003016730 0ustar liggesusers/* 
fastcluster: Fast hierarchical clustering routines for R and Python Copyright: * Until package version 1.1.23: © 2011 Daniel Müllner * All changes from version 1.1.24 on: © Google Inc. */ #include #include #include // for R_pow #define fc_isnan(X) ((X)!=(X)) // There is ISNAN but it is so much slower on my x86_64 system with GCC! #include // for std::abs #include // for std::ptrdiff_t #include // for std::numeric_limits<...>::infinity() #include // for std::stable_sort #include // for std::runtime_error #include // for std::string #include // for std::bad_alloc #include // for std::exception #include "fastcluster.cpp" /* Since the public interface is given by the Python respectively R interface, * we do not want other symbols than the interface initalization routines to be * visible in the shared object file. The "visibility" switch is a GCC concept. * Hiding symbols keeps the relocation table small and decreases startup time. * See http://gcc.gnu.org/wiki/Visibility */ #if HAVE_VISIBILITY #pragma GCC visibility push(hidden) #endif /* Helper function: order the nodes so that they can be displayed nicely in a dendrogram. This is used for the 'order' field in the R output. */ struct pos_node { t_index pos; int node; }; void order_nodes(const int N, const int * const merge, const t_index * const node_size, int * const order) { /* Parameters: N : number of data points merge : (N-1)×2 array which specifies the node indices which are merged in each step of the clustering procedure. Negative entries -1...-N point to singleton nodes, while positive entries 1...(N-1) point to nodes which are themselves parents of other nodes. 
node_size : array of node sizes - makes it easier order : output array of size N Runtime: Θ(N) */ auto_array_ptr queue(N/2); int parent; int child; t_index pos = 0; queue[0].pos = 0; queue[0].node = N-2; t_index idx = 1; do { --idx; pos = queue[idx].pos; parent = queue[idx].node; // First child child = merge[parent]; if (child<0) { // singleton node, write this into the 'order' array. order[pos] = -child; ++pos; } else { /* compound node: put it on top of the queue and decompose it in a later iteration. */ queue[idx].pos = pos; queue[idx].node = child-1; // convert index-1 based to index-0 based ++idx; pos += node_size[child-1]; } // Second child child = merge[parent+N-1]; if (child<0) { order[pos] = -child; } else { queue[idx].pos = pos; queue[idx].node = child-1; ++idx; } } while (idx>0); } #define size_(r_) ( ((r_ void generate_R_dendrogram(int * const merge, double * const height, int * const order, cluster_result & Z2, const int N) { // The array "nodes" is a union-find data structure for the cluster // identites (only needed for unsorted cluster_result input). union_find nodes(sorted ? 0 : N); if (!sorted) { std::stable_sort(Z2[0], Z2[N-1]); } t_index node1, node2; auto_array_ptr node_size(N-1); for (t_index i=0; inode1; node2 = Z2[i]->node2; } else { node1 = nodes.Find(Z2[i]->node1); node2 = nodes.Find(Z2[i]->node2); // Merge the nodes in the union-find data structure by making them // children of a new node. nodes.Union(node1, node2); } // Sort the nodes in the output array. if (node1>node2) { t_index tmp = node1; node1 = node2; node2 = tmp; } /* Conversion between labeling conventions. 
       Input:  singleton nodes 0,...,N-1
               compound nodes  N,...,2N-2
       Output: singleton nodes -1,...,-N
               compound nodes  1,...,N-1
    */
    merge[i]     = (node1<N) ? -static_cast<int>(node1)-1
                             :  static_cast<int>(node1)-N+1;
    merge[i+N-1] = (node2<N) ? -static_cast<int>(node2)-1
                             :  static_cast<int>(node2)-N+1;
    height[i] = Z2[i]->dist;
    node_size[i] = size_(node1) + size_(node2);
  }

  order_nodes(N, merge, node_size, order);
}

/*
  R interface code
*/

enum {
  METRIC_R_EUCLIDEAN = 0,
  METRIC_R_MAXIMUM   = 1,
  METRIC_R_MANHATTAN = 2,
  METRIC_R_CANBERRA  = 3,
  METRIC_R_BINARY    = 4,
  METRIC_R_MINKOWSKI = 5,
  METRIC_R_CANBERRA_OLD = 6
};

class R_dissimilarity {
private:
  t_float * Xa;
  std::ptrdiff_t dim; // std::ptrdiff_t saves many static_cast<> in products
  t_float * members;
  void (cluster_result::*postprocessfn) (const t_float) const;
  t_float postprocessarg;

  t_float (R_dissimilarity::*distfn) (const t_index, const t_index) const;
  auto_array_ptr<t_index> row_repr;
  int N;

  // no default constructor
  R_dissimilarity();
  // noncopyable
  R_dissimilarity(R_dissimilarity const &);
  R_dissimilarity & operator=(R_dissimilarity const &);

public:
  // Ignore warning about uninitialized member variables. I know what I am
  // doing here, and some member variables are only used for certain metrics.
  R_dissimilarity (t_float * const X_,
                   const int N_,
                   const int dim_,
                   t_float * const members_,
                   const unsigned char method,
                   const unsigned char metric,
                   const t_float p,
                   bool make_row_repr)
    : Xa(X_),
      dim(dim_),
      members(members_),
      postprocessfn(NULL),
      postprocessarg(p),
      N(N_)
  {
    switch (method) {
    case METHOD_VECTOR_SINGLE:
      switch (metric) {
      case METRIC_R_EUCLIDEAN:
        distfn = &R_dissimilarity::sqeuclidean<false>;
        postprocessfn = &cluster_result::sqrt;
        break;
      case METRIC_R_MAXIMUM:
        distfn = &R_dissimilarity::maximum;
        break;
      case METRIC_R_MANHATTAN:
        distfn = &R_dissimilarity::manhattan;
        break;
      case METRIC_R_CANBERRA:
        distfn = &R_dissimilarity::canberra;
        break;
      case METRIC_R_BINARY:
        distfn = &R_dissimilarity::dist_binary;
        break;
      case METRIC_R_MINKOWSKI:
        distfn = &R_dissimilarity::minkowski;
        postprocessfn = &cluster_result::power;
        break;
      case METRIC_R_CANBERRA_OLD:
        distfn = &R_dissimilarity::canberra_old;
        break;
      default:
        throw std::runtime_error(std::string("Invalid method."));
      }
      break;

    case METHOD_VECTOR_WARD:
      postprocessfn = &cluster_result::sqrtdouble;
      break;

    default:
      postprocessfn = &cluster_result::sqrt;
    }

    if (make_row_repr) {
      row_repr.init(2*N-1);
      for (t_index i=0; i<2*N-1; ++i) {
        row_repr[i] = i;
      }
    }
  }

  inline t_float operator () (const t_index i, const t_index j) const {
    return (this->*distfn)(i,j);
  }

  inline t_float X (const t_index i, const t_index j) const {
    // "C-style" array alignment
    return Xa[i*dim+j];
  }

  inline t_float * Xptr(const t_index i, const t_index j) const {
    // "C-style" array alignment
    return Xa+i*dim+j;
  }

  void merge(const t_index i, const t_index j, const t_index newnode) const {
    merge_inplace(row_repr[i], row_repr[j]);
    row_repr[newnode] = row_repr[j];
  }

  void merge_inplace(const t_index i, const t_index j) const {
    // Replace cluster j's representative by the weighted mean (centroid)
    // of the representatives of i and j.
    for(t_index k=0; k<dim; ++k) {
      *Xptr(j,k) = (X(i,k)*members[i] + X(j,k)*members[j]) /
        (members[i]+members[j]);
    }
    members[j] += members[i];
  }

  inline double ward(t_index const i1, t_index const i2) const {
    return sqeuclidean<true>(i1,i2)*members[i1]*members[i2]/ \
      (members[i1]+members[i2]);
  }

  inline double ward_initial(t_index const i1, t_index const i2) const {
    /* In the R interface, ward_initial is the same as ward. Only the Python
       interface has two different functions here. */
    return ward(i1,i2);
  }

  // This method must not produce NaN if the input is non-NaN.
inline static t_float ward_initial_conversion(const t_float min) { // identity return min; } double ward_extended(t_index i1, t_index i2) const { return ward(row_repr[i1], row_repr[i2]); } /* The following definitions and methods have been taken directly from the R source file /src/library/stats/src/distance.c in the R release 2.13.0. The code has only been adapted very slightly. (Unfortunately, the methods cannot be called directly in the R libraries since the functions are declared "static" in the above file.) Note to maintainers: If the code in distance.c changes in future R releases compared to 2.13.0, please update the definitions here, if necessary. */ // translation of variable names #define nc dim #define nr N #define x Xa #define p postprocessarg // The code from distance.c starts here #define both_FINITE(a,b) (R_FINITE(a) && R_FINITE(b)) #ifdef R_160_and_older #define both_non_NA both_FINITE #else #define both_non_NA(a,b) (!ISNAN(a) && !ISNAN(b)) #endif /* We need two variants of the Euclidean metric: one that does not check for a NaN result, which is used for the initial distances, and one which does, for the updated distances during the clustering procedure. 
  */
  // still public
  template <const bool check_NaN>
  double sqeuclidean(t_index const i1, t_index const i2) const {
    double dev, dist;
    int count, j;

    count = 0;
    dist = 0;
    double * p1 = x+i1*nc;
    double * p2 = x+i2*nc;
    for(j = 0 ; j < nc ; ++j) {
      if(both_non_NA(*p1, *p2)) {
        dev = (*p1 - *p2);
        if(!ISNAN(dev)) {
          dist += dev * dev;
          ++count;
        }
      }
      ++p1;
      ++p2;
    }
    if(count == 0) return NA_REAL;
    if(count != nc) dist /= (static_cast<double>(count)/static_cast<double>(nc));
    //return sqrt(dist); // we take the square root later
    if (check_NaN) {
      if (fc_isnan(dist))
        throw(nan_error());
    }
    return dist;
  }

  inline double sqeuclidean_extended(t_index const i1, t_index const i2) const {
    return sqeuclidean<true>(row_repr[i1], row_repr[i2]);
  }

private:
  double maximum(t_index i1, t_index i2) const {
    double dev, dist;
    int count, j;

    count = 0;
    dist = -DBL_MAX;
    double * p1 = x+i1*nc;
    double * p2 = x+i2*nc;
    for(j = 0 ; j < nc ; ++j) {
      if(both_non_NA(*p1, *p2)) {
        dev = std::abs(*p1 - *p2);
        if(!ISNAN(dev)) {
          if(dev > dist)
            dist = dev;
          ++count;
        }
      }
      ++p1;
      ++p2;
    }
    if(count == 0) return NA_REAL;
    return dist;
  }

  double manhattan(t_index i1, t_index i2) const {
    double dev, dist;
    int count, j;

    count = 0;
    dist = 0;
    double * p1 = x+i1*nc;
    double * p2 = x+i2*nc;
    for(j = 0 ; j < nc ; ++j) {
      if(both_non_NA(*p1, *p2)) {
        dev = std::abs(*p1 - *p2);
        if(!ISNAN(dev)) {
          dist += dev;
          ++count;
        }
      }
      ++p1;
      ++p2;
    }
    if(count == 0) return NA_REAL;
    if(count != nc) dist /= (static_cast<double>(count)/static_cast<double>(nc));
    return dist;
  }

  double canberra(t_index i1, t_index i2) const {
    double dev, dist, sum, diff;
    int count, j;

    count = 0;
    dist = 0;
    double * p1 = x+i1*nc;
    double * p2 = x+i2*nc;
    for(j = 0 ; j < nc ; ++j) {
      if(both_non_NA(*p1, *p2)) {
        sum = std::abs(*p1) + std::abs(*p2);
        diff = std::abs(*p1 - *p2);
        if (sum > DBL_MIN || diff > DBL_MIN) {
          dev = diff/sum;
          if(!ISNAN(dev) ||
             (!R_FINITE(diff) && diff == sum &&
              /* use Inf = lim x -> oo */ (dev = 1., true))) {
            dist += dev;
            ++count;
          }
        }
      }
      ++p1;
      ++p2;
    }
    if(count == 0) return NA_REAL;
    if(count != nc) dist /=
      (static_cast<double>(count)/static_cast<double>(nc));
    return dist;
  }

  double canberra_old(t_index i1, t_index i2) const {
    double dev, dist, sum, diff;
    int count, j;

    count = 0;
    dist = 0;
    double * p1 = x+i1*nc;
    double * p2 = x+i2*nc;
    for(j = 0 ; j < nc ; ++j) {
      if(both_non_NA(*p1, *p2)) {
        sum = std::abs(*p1 + *p2);
        diff = std::abs(*p1 - *p2);
        if (sum > DBL_MIN || diff > DBL_MIN) {
          dev = diff/sum;
          if(!ISNAN(dev) ||
             (!R_FINITE(diff) && diff == sum &&
              /* use Inf = lim x -> oo */ (dev = 1., true))) {
            dist += dev;
            ++count;
          }
        }
      }
      ++p1;
      ++p2;
    }
    if(count == 0) return NA_REAL;
    if(count != nc) dist /= (static_cast<double>(count)/static_cast<double>(nc));
    return dist;
  }

  double dist_binary(t_index i1, t_index i2) const {
    int total, count, dist;
    int j;

    total = 0;
    count = 0;
    dist = 0;
    double * p1 = x+i1*nc;
    double * p2 = x+i2*nc;
    for(j = 0 ; j < nc ; ++j) {
      if(both_non_NA(*p1, *p2)) {
        if(!both_FINITE(*p1, *p2)) {
          // warning(_("treating non-finite values as NA"));
        }
        else {
          if(*p1 || *p2) {
            ++count;
            if( ! (*p1 && *p2) ) {
              ++dist;
            }
          }
          ++total;
        }
      }
      ++p1;
      ++p2;
    }
    if(total == 0) return NA_REAL;
    if(count == 0) return 0;
    return static_cast<double>(dist) / static_cast<double>(count);
  }

  double minkowski(t_index i1, t_index i2) const {
    double dev, dist;
    int count, j;

    count= 0;
    dist = 0;
    double * p1 = x+i1*nc;
    double * p2 = x+i2*nc;
    for(j = 0 ; j < nc ; ++j) {
      if(both_non_NA(*p1, *p2)) {
        dev = (*p1 - *p2);
        if(!ISNAN(dev)) {
          dist += R_pow(std::abs(dev), p);
          ++count;
        }
      }
      ++p1;
      ++p2;
    }
    if(count == 0) return NA_REAL;
    if(count != nc) dist /= (static_cast<double>(count)/static_cast<double>(nc));
    //return R_pow(dist, 1.0/p); // raise to the (1/p)-th power later
    return dist;
  }
};

extern "C" {
  SEXP fastcluster(SEXP const N_, SEXP const method_, SEXP D_, SEXP members_) {
    SEXP r = NULL; // return value
    try{
      /* Input checks */
      // Parameter N: number of data points
      if (!IS_INTEGER(N_) || LENGTH(N_)!=1)
        Rf_error("'N' must be a single integer.");
      const int N = INTEGER_VALUE(N_);
      if (N<2)
        Rf_error("N must be at least 2.");
      const R_xlen_t NN = static_cast<R_xlen_t>(N)*(N-1)/2;
      //
Parameter method: dissimilarity index update method if (!IS_INTEGER(method_) || LENGTH(method_)!=1) Rf_error("'method' must be a single integer."); const int method = INTEGER_VALUE(method_) - 1; // index-0 based; if (methodMAX_METHOD_CODE) { Rf_error("Invalid method index."); } // Parameter members: number of members in each node auto_array_ptr members; if (method==METHOD_METR_AVERAGE || method==METHOD_METR_WARD_D || method==METHOD_METR_WARD_D2 || method==METHOD_METR_CENTROID) { members.init(N); if (Rf_isNull(members_)) { for (t_index i=0; i D__; if (method!=METHOD_METR_SINGLE) { D__.init(NN); for (R_xlen_t i=0; i(N)*(N-1)/2; ++DD) *DD *= *DD; } /* Clustering step */ cluster_result Z2(N-1); switch (method) { case METHOD_METR_SINGLE: MST_linkage_core(N, D, Z2); break; case METHOD_METR_COMPLETE: NN_chain_core(N, D__, NULL, Z2); break; case METHOD_METR_AVERAGE: NN_chain_core(N, D__, members, Z2); break; case METHOD_METR_WEIGHTED: NN_chain_core(N, D__, NULL, Z2); break; case METHOD_METR_WARD_D: case METHOD_METR_WARD_D2: NN_chain_core(N, D__, members, Z2); break; case METHOD_METR_CENTROID: generic_linkage(N, D__, members, Z2); break; case METHOD_METR_MEDIAN: generic_linkage(N, D__, NULL, Z2); break; default: throw std::runtime_error(std::string("Invalid method.")); } D__.free(); // Free the memory now members.free(); // (not strictly necessary). 
      SEXP m; // return field "merge"
      PROTECT(m = NEW_INTEGER(2*(N-1)));
      int * const merge = INTEGER_POINTER(m);

      SEXP dim_m; // Specify that m is an (N-1)×2 matrix
      PROTECT(dim_m = NEW_INTEGER(2));
      INTEGER(dim_m)[0] = N-1;
      INTEGER(dim_m)[1] = 2;
      SET_DIM(m, dim_m);

      SEXP h; // return field "height"
      PROTECT(h = NEW_NUMERIC(N-1));
      double * const height = NUMERIC_POINTER(h);

      SEXP o; // return field "order"
      PROTECT(o = NEW_INTEGER(N));
      int * const order = INTEGER_POINTER(o);

      if (method==METHOD_METR_WARD_D2) {
        Z2.sqrt();
      }
      if (method==METHOD_METR_CENTROID ||
          method==METHOD_METR_MEDIAN)
        generate_R_dendrogram<true>(merge, height, order, Z2, N);
      else
        generate_R_dendrogram<false>(merge, height, order, Z2, N);

      SEXP n; // names
      PROTECT(n = NEW_CHARACTER(3));
      SET_STRING_ELT(n, 0, COPY_TO_USER_STRING("merge"));
      SET_STRING_ELT(n, 1, COPY_TO_USER_STRING("height"));
      SET_STRING_ELT(n, 2, COPY_TO_USER_STRING("order"));

      PROTECT(r = NEW_LIST(3)); // field names in the output list
      SET_ELEMENT(r, 0, m);
      SET_ELEMENT(r, 1, h);
      SET_ELEMENT(r, 2, o);
      SET_NAMES(r, n);

      UNPROTECT(6); // m, dim_m, h, o, r, n
    } // try
    catch (const std::bad_alloc&) {
      Rf_error( "Memory overflow.");
    }
    catch(const std::exception& e){
      Rf_error( "%s", e.what() );
    }
    catch(const nan_error&){
      Rf_error("NaN dissimilarity value.");
    }
    #ifdef FE_INVALID
    catch(const fenv_error&){
      Rf_error( "NaN dissimilarity value in intermediate results.");
    }
    #endif
    catch(...){
      Rf_error( "C++ exception (unknown reason)."
);
    }
    return r;
  }

  SEXP fastcluster_vector(SEXP const method_,
                          SEXP const metric_,
                          SEXP X_,
                          SEXP members_,
                          SEXP p_) {
    SEXP r = NULL; // return value
    try{
      /* Input checks */
      // Parameter method: dissimilarity index update method
      if (!IS_INTEGER(method_) || LENGTH(method_)!=1)
        Rf_error("'method' must be a single integer.");
      int method = INTEGER_VALUE(method_) - 1; // index-0 based;
      if (method<0 || method>MAX_METHOD_VECTOR_CODE) {
        Rf_error("Invalid method index.");
      }

      // Parameter metric
      if (!IS_INTEGER(metric_) || LENGTH(metric_)!=1)
        Rf_error("'metric' must be a single integer.");
      int metric = INTEGER_VALUE(metric_) - 1; // index-0 based;
      if (metric<0 || metric>6 ||
          (method!=METHOD_VECTOR_SINGLE && metric!=0) ) {
        Rf_error("Invalid metric index.");
      }

      // data array
      PROTECT(X_ = AS_NUMERIC(X_));
      SEXP dims_ = PROTECT( Rf_getAttrib( X_, R_DimSymbol ) ) ;
      if( dims_ == R_NilValue || LENGTH(dims_) != 2 ) {
        Rf_error( "Argument is not a matrix.");
      }
      const int * const dims = INTEGER(dims_);
      const int N = dims[0];
      const int dim = dims[1];
      if (N<2)
        Rf_error("There must be at least two data points.");
      // Make a working copy of the dissimilarity array
      // for all methods except "single".
      double * X__ = NUMERIC_POINTER(X_);
      // Copy the input array and change it from Fortran-contiguous style
      // to C-contiguous style.
auto_array_ptr X(LENGTH(X_)); for (std::ptrdiff_t i=0; i members; if (method==METHOD_VECTOR_WARD || method==METHOD_VECTOR_CENTROID) { members.init(N); if (Rf_isNull(members_)) { for (t_index i=0; i(method), static_cast(metric), p, make_row_repr); cluster_result Z2(N-1); /* Clustering step */ switch (method) { case METHOD_VECTOR_SINGLE: MST_linkage_core_vector(N, dist, Z2); break; case METHOD_VECTOR_WARD: generic_linkage_vector(N, dist, Z2); break; case METHOD_VECTOR_CENTROID: generic_linkage_vector_alternative(N, dist, Z2); break; case METHOD_VECTOR_MEDIAN: generic_linkage_vector_alternative(N, dist, Z2); break; default: throw std::runtime_error(std::string("Invalid method.")); } X.free(); // Free the memory now members.free(); // (not strictly necessary). dist.postprocess(Z2); SEXP m; // return field "merge" PROTECT(m = NEW_INTEGER(2*(N-1))); int * const merge = INTEGER_POINTER(m); SEXP dim_m; // Specify that m is an (N-1)×2 matrix PROTECT(dim_m = NEW_INTEGER(2)); INTEGER(dim_m)[0] = N-1; INTEGER(dim_m)[1] = 2; SET_DIM(m, dim_m); SEXP h; // return field "height" PROTECT(h = NEW_NUMERIC(N-1)); double * const height = NUMERIC_POINTER(h); SEXP o; // return fiels "order' PROTECT(o = NEW_INTEGER(N)); int * const order = INTEGER_POINTER(o); if (method==METHOD_VECTOR_SINGLE) generate_R_dendrogram(merge, height, order, Z2, N); else generate_R_dendrogram(merge, height, order, Z2, N); SEXP n; // names PROTECT(n = NEW_CHARACTER(3)); SET_STRING_ELT(n, 0, COPY_TO_USER_STRING("merge")); SET_STRING_ELT(n, 1, COPY_TO_USER_STRING("height")); SET_STRING_ELT(n, 2, COPY_TO_USER_STRING("order")); PROTECT(r = NEW_LIST(3)); // field names in the output list SET_ELEMENT(r, 0, m); SET_ELEMENT(r, 1, h); SET_ELEMENT(r, 2, o); SET_NAMES(r, n); UNPROTECT(6); // m, dim_m, h, o, r, n } // try catch (const std::bad_alloc&) { Rf_error( "Memory overflow."); } catch(const std::exception& e){ Rf_error( "%s", e.what() ); } catch(const nan_error&){ Rf_error("NaN dissimilarity value."); } catch(...){ 
Rf_error( "C++ exception (unknown reason)." ); } return r; } #if HAVE_VISIBILITY #pragma GCC visibility push(default) #endif void R_init_fastcluster(DllInfo * const dll) { R_CallMethodDef callMethods[] = { {"fastcluster", (DL_FUNC) &fastcluster, 4}, {"fastcluster_vector", (DL_FUNC) &fastcluster_vector, 5}, {NULL, NULL, 0} }; R_registerRoutines(dll, NULL, callMethods, NULL, NULL); R_useDynamicSymbols(dll, FALSE); R_forceSymbols(dll, TRUE); } #if HAVE_VISIBILITY #pragma GCC visibility pop #endif } // extern "C" #if HAVE_VISIBILITY #pragma GCC visibility pop #endif fastcluster/src/Makevars0000644000176200001440000000214013303334513015067 0ustar liggesusersOBJECTS = fastcluster_R.o # Optionally: Optimize, and warn about everything #PKG_CPPFLAGS = -O2 -g0 -march=native -mtune=native -fno-math-errno -Wl,--strip-all -Wall -Weffc++ -Wextra -Wcast-align -Wchar-subscripts -Wcomment -Wconversion -Wsign-conversion -Wdisabled-optimization -Wfloat-equal -Wformat -Wformat=2 -Wformat-nonliteral -Wformat-security -Wformat-y2k -Wimport -Winit-self -Winline -Winvalid-pch -Wunsafe-loop-optimizations -Wmissing-braces -Wmissing-field-initializers -Wmissing-format-attribute -Wmissing-include-dirs -Wmissing-noreturn -Wpacked -Wparentheses -Wpointer-arith -Wredundant-decls -Wreturn-type -Wsequence-point -Wshadow -Wsign-compare -Wstack-protector -Wstrict-aliasing -Wstrict-aliasing=2 -Wswitch -Wswitch-enum -Wtrigraphs -Wuninitialized -Wunknown-pragmas -Wunreachable-code -Wunused -Wunused-function -Wunused-label -Wunused-parameter -Wunused-value -Wunused-variable -Wvariadic-macros -Wvolatile-register-var -Wwrite-strings -Wlong-long -Wpadded -Wcast-qual -Wswitch-default -Wnon-virtual-dtor -Wold-style-cast -Woverloaded-virtual -Waggregate-return -Werror -pedantic -pedantic-errors fastcluster/src/python/0000755000176200001440000000000014541365350014727 5ustar liggesusersfastcluster/src/python/fastcluster.py0000644000176200001440000004463614541365036017656 0ustar liggesusers# -*- coding: utf-8 
-*- __doc__ = """Fast hierarchical clustering routines for R and Python Copyright: Until package version 1.1.23: © 2011 Daniel Müllner All changes from version 1.1.24 on: © Google Inc. This module provides fast hierarchical clustering routines. The "linkage" method is designed to provide a replacement for the “linkage” function and its siblings in the scipy.cluster.hierarchy module. You may use the methods in this module with the same syntax as the corresponding SciPy functions but with the benefit of much faster performance. The method "linkage_vector" performs clustering of vector data with memory- saving algorithms. Refer to the User's manual "fastcluster.pdf" for comprehensive details. It is located in the directory inst/doc/ in the source distribution and may also be obtained at . """ __all__ = ['single', 'complete', 'average', 'weighted', 'ward', 'centroid', 'median', 'linkage', 'linkage_vector'] __version_info__ = ('1', '2', '6') __version__ = '.'.join(__version_info__) from numpy import double, empty, array, ndarray, var, cov, dot, expand_dims, \ ceil, sqrt from numpy.linalg import inv try: from scipy.spatial.distance import pdist except ImportError: def pdist(*args, **kwargs): raise ImportError('The fastcluster.linkage function cannot process ' 'vector data since the function ' 'scipy.spatial.distance.pdist could not be ' 'imported.') from _fastcluster import linkage_wrap, linkage_vector_wrap def single(D): '''Single linkage clustering (alias). See the help on the “linkage” function for further information.''' return linkage(D, method='single') def complete(D): '''Complete linkage clustering (alias). See the help on the “linkage” function for further information.''' return linkage(D, method='complete') def average(D): '''Hierarchical clustering with the “average” distance update formula (alias). 
See the help on the “linkage” function for further information.''' return linkage(D, method='average') def weighted(D): '''Hierarchical clustering with the “weighted” distance update formula (alias). See the help on the “linkage” function for further information.''' return linkage(D, method='weighted') def ward(D): '''Hierarchical clustering with the “Ward” distance update formula (alias). See the help on the “linkage” function for further information.''' return linkage(D, method='ward') def centroid(D): '''Hierarchical clustering with the “centroid” distance update formula (alias). See the help on the “linkage” function for further information.''' return linkage(D, method='centroid') def median(D): '''Hierarchical clustering with the “median” distance update formula (alias). See the help on the “linkage” function for further information.''' return linkage(D, method='median') # This dictionary must agree with the enum method_codes in fastcluster.cpp. mthidx = {'single' : 0, 'complete' : 1, 'average' : 2, 'weighted' : 3, 'ward' : 4, 'centroid' : 5, 'median' : 6 } def linkage(X, method='single', metric='euclidean', preserve_input=True): r'''Hierarchical, agglomerative clustering on a dissimilarity matrix or on Euclidean data. Apart from the argument 'preserve_input', the method has the same input parameters and output format as the functions of the same name in the module scipy.cluster.hierarchy. The argument X is preferably a NumPy array with floating point entries (X.dtype==numpy.double). Any other data format will be converted before it is processed. If X is a one-dimensional array, it is considered a condensed matrix of pairwise dissimilarities in the format which is returned by scipy.spatial.distance.pdist. It contains the flattened, upper- triangular part of a pairwise dissimilarity matrix. 
That is, if there are N data points and the matrix d contains the dissimilarity between the i-th and j-th observation at position d(i,j), the vector X has length N(N-1)/2 and is ordered as follows: [ d(0,1), d(0,2), ..., d(0,n-1), d(1,2), ..., d(1,n-1), ..., d(n-2,n-1) ] The 'metric' argument is ignored in case of dissimilarity input. The optional argument 'preserve_input' specifies whether the method makes a working copy of the dissimilarity vector or writes temporary data into the existing array. If the dissimilarities are generated for the clustering step only and are not needed afterward, approximately half the memory can be saved by specifying 'preserve_input=False'. Note that the input array X contains unspecified values after this procedure. It is therefore safer to write linkage(X, method="...", preserve_input=False) del X to make sure that the matrix X is not accessed accidentally after it has been used as scratch memory. (The single linkage algorithm does not write to the distance matrix or its copy anyway, so the 'preserve_input' flag has no effect in this case.) If X contains vector data, it must be a two-dimensional array with N observations in D dimensions as an (N×D) array. The preserve_input argument is ignored in this case. The specified metric is used to generate pairwise distances from the input. The following two function calls yield the same output: linkage(pdist(X, metric), method="...", preserve_input=False) linkage(X, metric=metric, method="...") The general scheme of the agglomerative clustering procedure is as follows: 1. Start with N singleton clusters (nodes) labeled 0,...,N−1, which represent the input points. 2. Find a pair of nodes with minimal distance among all pairwise distances. 3. Join the two nodes into a new node and remove the two old nodes. The new nodes are labeled consecutively N, N+1, ... 4. The distances from the new node to all other nodes is determined by the method parameter (see below). 5. 
Repeat N−1 times from step 2, until there is one big node, which contains all original input points. The output of linkage is stepwise dendrogram, which is represented as an (N−1)×4 NumPy array with floating point entries (dtype=numpy.double). The first two columns contain the node indices which are joined in each step. The input nodes are labeled 0,...,N−1, and the newly generated nodes have the labels N,...,2N−2. The third column contains the distance between the two nodes at each step, ie. the current minimal distance at the time of the merge. The fourth column counts the number of points which comprise each new node. The parameter method specifies which clustering scheme to use. The clustering scheme determines the distance from a new node to the other nodes. Denote the dissimilarities by d, the nodes to be joined by I, J, the new node by K and any other node by L. The symbol |I| denotes the size of the cluster I. method='single': d(K,L) = min(d(I,L), d(J,L)) The distance between two clusters A, B is the closest distance between any two points in each cluster: d(A,B) = min{ d(a,b) | a∈A, b∈B } method='complete': d(K,L) = max(d(I,L), d(J,L)) The distance between two clusters A, B is the maximal distance between any two points in each cluster: d(A,B) = max{ d(a,b) | a∈A, b∈B } method='average': d(K,L) = ( |I|·d(I,L) + |J|·d(J,L) ) / (|I|+|J|) The distance between two clusters A, B is the average distance between the points in the two clusters: d(A,B) = (|A|·|B|)^(-1) · \sum { d(a,b) | a∈A, b∈B } method='weighted': d(K,L) = (d(I,L)+d(J,L))/2 There is no global description for the distance between clusters since the distance depends on the order of the merging steps. The following three methods are intended for Euclidean data only, ie. when X contains the pairwise (non-squared!) distances between vectors in Euclidean space. The algorithm will work on any input, however, and it is up to the user to make sure that applying the methods makes sense. 
method='centroid': d(K,L) = ( (|I|·d(I,L) + |J|·d(J,L)) / (|I|+|J|) − |I|·|J|·d(I,J)/(|I|+|J|)^2 )^(1/2) There is a geometric interpretation: d(A,B) is the distance between the centroids (ie. barycenters) of the clusters in Euclidean space: d(A,B) = ‖c_A−c_B∥, where c_A denotes the centroid of the points in cluster A. method='median': d(K,L) = ( d(I,L)/2 + d(J,L)/2 − d(I,J)/4 )^(1/2) Define the midpoint w_K of a cluster K iteratively as w_K=k if K={k} is a singleton and as the midpoint (w_I+w_J)/2 if K is formed by joining I and J. Then we have d(A,B) = ∥w_A−w_B∥ in Euclidean space for all nodes A,B. Notice however that this distance depends on the order of the merging steps. method='ward': d(K,L) = ( ((|I|+|L)d(I,L) + (|J|+|L|)d(J,L) − |L|d(I,J)) / (|I|+|J|+|L|) )^(1/2) The global cluster dissimilarity can be expressed as d(A,B) = ( 2|A|·|B|/(|A|+|B|) )^(1/2) · ‖c_A−c_B∥, where c_A again denotes the centroid of the points in cluster A. The clustering algorithm handles infinite values correctly, as long as the chosen distance update formula makes sense. If a NaN value occurs, either in the original dissimilarities or as an updated dissimilarity, an error is raised. The linkage method does not treat NumPy's masked arrays as special and simply ignores the mask.''' X = array(X, copy=False, subok=True) if X.ndim==1: if method=='single': preserve_input = False X = array(X, dtype=double, copy=preserve_input, order='C', subok=True) NN = len(X) N = int(ceil(sqrt(NN*2))) if (N*(N-1)//2) != NN: raise ValueError(r'The length of the condensed distance matrix ' r'must be (k \choose 2) for k data points!') else: assert X.ndim==2 N = len(X) X = pdist(X, metric=metric) X = array(X, dtype=double, copy=False, order='C', subok=True) Z = empty((N-1,4)) if N > 1: linkage_wrap(N, X, Z, mthidx[method]) return Z # This dictionary must agree with the enum metric_codes in fastcluster_python.cpp. 
mtridx = {'euclidean' : 0, 'minkowski' : 1, 'cityblock' : 2, 'seuclidean' : 3, 'sqeuclidean' : 4, 'cosine' : 5, 'hamming' : 6, 'jaccard' : 7, 'chebychev' : 8, 'canberra' : 9, 'braycurtis' : 10, 'mahalanobis' : 11, 'yule' : 12, 'matching' : 13, 'sokalmichener' : 13, # an alias for 'matching' 'dice' : 14, 'rogerstanimoto' : 15, 'russellrao' : 16, 'sokalsneath' : 17, 'kulsinski' : 18, 'USER' : 19, } booleanmetrics = ('yule', 'matching', 'dice', 'kulsinski', 'rogerstanimoto', 'sokalmichener', 'russellrao', 'sokalsneath', 'kulsinski') def linkage_vector(X, method='single', metric='euclidean', extraarg=None): r'''Hierarchical (agglomerative) clustering on Euclidean data. Compared to the 'linkage' method, 'linkage_vector' uses a memory-saving algorithm. While the linkage method requires Θ(N^2) memory for clustering of N points, this method needs Θ(ND) for N points in R^D, which is usually much smaller. The argument X has the same format as before, when X describes vector data, ie. it is an (N×D) array. Also the output array has the same format. The parameter method must be one of 'single', 'centroid', 'median', 'ward', ie. only for these methods there exist memory-saving algorithms currently. If 'method', is one of 'centroid', 'median', 'ward', the 'metric' must be 'euclidean'. For single linkage clustering, any dissimilarity function may be chosen. Basically, every metric which is implemented in the method scipy.spatial.distance.pdist is reimplemented here. However, the metrics differ in some instances since a number of mistakes and typos (both in the code and in the documentation) were corrected in the fastcluster package. Therefore, the available metrics with their definitions are listed below as a reference. The symbols u and v mostly denote vectors in R^D with coordinates u_j and v_j respectively. See below for additional metrics for Boolean vectors. 
Unless otherwise stated, the input array X is converted to a floating point array (X.dtype==numpy.double) if it does not have already the required data type. Some metrics accept Boolean input; in this case this is stated explicitly below. If a NaN value occurs, either in the original dissimilarities or as an updated dissimilarity, an error is raised. In principle, the clustering algorithm handles infinite values correctly, but the user is advised to carefully check the behavior of the metric and distance update formulas under these circumstances. The distance formulas combined with the clustering in the 'linkage_vector' method do not have specified behavior if the data X contains infinite or NaN values. Also, the masks in NumPy’s masked arrays are simply ignored. metric='euclidean': Euclidean metric, L_2 norm d(u,v) = ∥u−v∥ = ( \sum_j { (u_j−v_j)^2 } )^(1/2) metric='sqeuclidean': squared Euclidean metric d(u,v) = ∥u−v∥^2 = \sum_j { (u_j−v_j)^2 } metric='seuclidean': standardized Euclidean metric d(u,v) = ( \sum_j { (u_j−v_j)^2 / V_j } )^(1/2) The vector V=(V_0,...,V_{D−1}) is given as the 'extraarg' argument. If no 'extraarg' is given, V_j is by default the unbiased sample variance of all observations in the j-th coordinate: V_j = Var_i (X(i,j) ) = 1/(N−1) · \sum_i ( X(i,j)^2 − μ(X_j)^2 ) (Here, μ(X_j) denotes as usual the mean of X(i,j) over all rows i.) metric='mahalanobis': Mahalanobis distance d(u,v) = ( transpose(u−v) V (u−v) )^(1/2) Here, V=extraarg, a (D×D)-matrix. If V is not specified, the inverse of the covariance matrix numpy.linalg.inv(numpy.cov(X, rowvar=False)) is used. metric='cityblock': the Manhattan distance, L_1 norm d(u,v) = \sum_j |u_j−v_j| metric='chebychev': the supremum norm, L_∞ norm d(u,v) = max_j { |u_j−v_j| } metric='minkowski': the L_p norm d(u,v) = ( \sum_j |u_j−v_j|^p ) ^(1/p) This metric coincides with the cityblock, euclidean and chebychev metrics for p=1, p=2 and p=∞ (numpy.inf), respectively. 
The parameter p is given as the 'extraarg' argument. metric='cosine' d(u,v) = 1 − ⟨u,v⟩ / (∥u∥·∥v∥) = 1 − (\sum_j u_j·v_j) / ( (\sum u_j^2)(\sum v_j^2) )^(1/2) metric='correlation': This method first mean-centers the rows of X and then applies the 'cosine' distance. Equivalently, the correlation distance measures 1 − (Pearson’s correlation coefficient). d(u,v) = 1 − ⟨u−μ(u),v−μ(v)⟩ / (∥u−μ(u)∥·∥v−μ(v)∥) metric='canberra' d(u,v) = \sum_j ( |u_j−v_j| / (|u_j|+|v_j|) ) Summands with u_j=v_j=0 contribute 0 to the sum. metric='braycurtis' d(u,v) = (\sum_j |u_j-v_j|) / (\sum_j |u_j+v_j|) metric=(user function): The parameter metric may also be a function which accepts two NumPy floating point vectors and returns a number. Eg. the Euclidean distance could be emulated with fn = lambda u, v: numpy.sqrt(((u-v)*(u-v)).sum()) linkage_vector(X, method='single', metric=fn) This method, however, is much slower than the build-in function. metric='hamming': The Hamming distance accepts a Boolean array (X.dtype==bool) for efficient storage. Any other data type is converted to numpy.double. d(u,v) = |{j | u_j≠v_j }| metric='jaccard': The Jaccard distance accepts a Boolean array (X.dtype==bool) for efficient storage. Any other data type is converted to numpy.double. d(u,v) = |{j | u_j≠v_j }| / |{j | u_j≠0 or v_j≠0 }| d(0,0) = 0 Python represents True by 1 and False by 0. In the Boolean case, the Jaccard distance is therefore: d(u,v) = |{j | u_j≠v_j }| / |{j | u_j ∨ v_j }| The following metrics are designed for Boolean vectors. The input array is converted to the 'bool' data type if it is not Boolean already. Use the following abbreviations to count the number of True/False combinations: a = |{j | u_j ∧ v_j }| b = |{j | u_j ∧ (¬v_j) }| c = |{j | (¬u_j) ∧ v_j }| d = |{j | (¬u_j) ∧ (¬v_j) }| Recall that D denotes the number of dimensions, hence D=a+b+c+d. 
metric='yule' d(u,v) = 2bc / (ad+bc) if bc≠0 d(u,v) = 0 if bc=0 metric='dice': d(u,v) = (b+c) / (2a+b+c) d(0,0) = 0 metric='rogerstanimoto': d(u,v) = 2(b+c) / (b+c+D) metric='russellrao': d(u,v) = (b+c+d) / D metric='sokalsneath': d(u,v) = 2(b+c)/ ( a+2(b+c)) d(0,0) = 0 metric='kulsinski' d(u,v) = (b/(a+b) + c/(a+c)) / 2 metric='matching': d(u,v) = (b+c)/D Notice that when given a Boolean array, the 'matching' and 'hamming' distance are the same. The 'matching' distance formula, however, converts every input to Boolean first. Hence, the vectors (0,1) and (0,2) have zero 'matching' distance since they are both converted to (False, True) but the Hamming distance is 0.5. metric='sokalmichener' is an alias for 'matching'.''' if method=='single': assert metric!='USER' if metric in ('hamming', 'jaccard'): X = array(X, copy=False, subok=True) dtype = bool if X.dtype==bool else double else: dtype = bool if metric in booleanmetrics else double X = array(X, dtype=dtype, copy=False, order='C', subok=True) else: assert metric=='euclidean' X = array(X, dtype=double, copy=(method=='ward'), order='C', subok=True) assert X.ndim==2 N = len(X) Z = empty((N-1,4)) if metric=='seuclidean': if extraarg is None: extraarg = var(X, axis=0, ddof=1) elif metric=='mahalanobis': if extraarg is None: extraarg = inv(cov(X, rowvar=False)) # instead of the inverse covariance matrix, pass the matrix product # with the data matrix! 
        extraarg = array(dot(X,extraarg), dtype=double, copy=False,
                         order='C', subok=True)
    elif metric=='correlation':
        X = X-expand_dims(X.mean(axis=1),1)
        metric = 'cosine'
    elif not isinstance(metric, str):
        assert extraarg is None
        metric, extraarg = 'USER', metric
    elif metric!='minkowski':
        assert extraarg is None
    if N > 1:
        linkage_vector_wrap(X, Z, mthidx[method], mtridx[metric], extraarg)
    return Z
fastcluster/src/python/setup.py0000644000176200001440000001542514541365350016450 0ustar liggesusers#!/usr/bin/env python
# -*- coding: utf-8 -*-
u'''
fastcluster: Fast hierarchical clustering routines for R and Python

Copyright:
  * Until package version 1.1.23: © 2011 Daniel Müllner
  * All changes from version 1.1.24 on: © Google Inc.
'''
import os
import sys
import numpy
from setuptools import setup, Extension
from io import open

with open('fastcluster.py', encoding='utf_8') as f:
    for line in f:
        if line.find('__version_info__ =') == 0:
            version = '.'.join(line.split("'")[1:-1:2])
            break

print('Fastcluster version: ' + version)
print('Python version: ' + sys.version)

setup(name='fastcluster',
      version=version,
      py_modules=['fastcluster'],
      description='Fast hierarchical clustering routines for R and Python.',
      long_description=u"""
This library provides Python functions for hierarchical clustering. It
generates hierarchical clusters from distance matrices or from vector data.

This module is intended to replace the functions

```
linkage, single, complete, average, weighted, centroid, median, ward
```

in the module [`scipy.cluster.hierarchy`](
https://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html) with the
same functionality but much faster algorithms. Moreover, the function
`linkage_vector` provides memory-efficient clustering for vector data.

The interface is very similar to MATLAB's Statistics Toolbox API to make code
easier to port from MATLAB to Python/NumPy. The core implementation of this
library is in C++ for efficiency.
**User manual:** [fastcluster.pdf](
https://github.com/dmuellner/fastcluster/raw/master/docs/fastcluster.pdf).

The “Yule” distance function changed in fastcluster version 1.2.0. This
follows a [change in SciPy 1.6.3](
https://github.com/scipy/scipy/commit/3b22d1da98dc1b5f64bc944c21f398d4ba782bce).
It is recommended to use fastcluster version 1.1.x together with SciPy versions
before 1.6.3 and fastcluster 1.2.x with SciPy ≥1.6.3.

The fastcluster package is considered stable and will undergo few changes from
now on. If there have been no updates for some years, this does not necessarily
mean that the package is unmaintained; it may simply mean that nothing needed
correcting. Of course, please still report potential bugs and incompatibilities
to daniel@danifold.net. You may also use
[my GitHub repository](https://github.com/dmuellner/fastcluster/) for bug
reports, pull requests etc.

Note that [PyPI](https://pypi.org/project/fastcluster/) and [my GitHub
repository](https://github.com/dmuellner/fastcluster/) host the source code for
the Python interface only. The archive with both the R and the Python interface
is available on [CRAN](https://CRAN.R-project.org/package=fastcluster) and the
GitHub repository [“cran/fastcluster”](https://github.com/cran/fastcluster).
Even though I also appear as the author of this second GitHub repository, it is
just an automatic, read-only mirror of the CRAN archive, so please do not
attempt to report bugs or contact me via this repository.

Installation files for Windows are provided on [PyPI](
https://pypi.org/project/fastcluster/#files) and on [Christoph Gohlke's web
page](http://www.lfd.uci.edu/~gohlke/pythonlibs/#fastcluster).

Christoph Dalitz wrote a pure [C++ interface to fastcluster](
https://lionel.kr.hs-niederrhein.de/~dalitz/data/hclust/).

Reference: Daniel Müllner, *fastcluster: Fast Hierarchical, Agglomerative
Clustering Routines for R and Python*, Journal of Statistical Software, **53**
(2013), no.
9, 1–18, https://doi.org/10.18637/jss.v053.i09.
""",
      long_description_content_type='text/markdown',
      python_requires='>=3',
      requires=['numpy'],
      install_requires=["numpy>=1.9"],
      extras_require={'test': ['scipy>=1.6.3']},
      provides=['fastcluster'],
      ext_modules=[Extension('_fastcluster',
                             ['fastcluster_python.cpp'],
                             extra_compile_args=['/EHsc'] if os.name == 'nt' else [],
                             include_dirs=[numpy.get_include()],
# Feel free to uncomment the line below if you use GCC. This switches to
# more aggressive optimization and turns more warning switches on. No
# warning should appear in the compilation process.
#
# Also, the author's Python distribution generates debug symbols by
# default. This can be turned off, resulting in a much smaller compiled
# library.
#
# Optimization
#extra_compile_args=['-O2', '-g0', '-march=native', '-mtune=native', '-fno-math-errno'],
#
# List of all warning switches, somewhere from stackoverflow.com
#extra_compile_args=['-Wall', '-Weffc++', '-Wextra', '-Wall', '-Wcast-align', '-Wchar-subscripts', '-Wcomment', '-Wconversion', '-Wsign-conversion', '-Wdisabled-optimization', '-Wfloat-equal', '-Wformat', '-Wformat=2', '-Wformat-nonliteral', '-Wformat-security', '-Wformat-y2k', '-Wimport', '-Winit-self', '-Winline', '-Winvalid-pch', '-Wunsafe-loop-optimizations', '-Wmissing-braces', '-Wmissing-field-initializers', '-Wmissing-format-attribute', '-Wmissing-include-dirs', '-Wmissing-noreturn', '-Wpacked', '-Wparentheses', '-Wpointer-arith', '-Wredundant-decls', '-Wreturn-type', '-Wsequence-point', '-Wshadow', '-Wsign-compare', '-Wstack-protector', '-Wstrict-aliasing', '-Wstrict-aliasing=2', '-Wswitch', '-Wswitch-enum', '-Wtrigraphs', '-Wuninitialized', '-Wunknown-pragmas', '-Wunreachable-code', '-Wunused', '-Wunused-function', '-Wunused-label', '-Wunused-parameter', '-Wunused-value', '-Wunused-variable', '-Wvariadic-macros', '-Wvolatile-register-var', '-Wwrite-strings', '-Wlong-long', '-Wpadded', '-Wcast-qual', '-Wswitch-default', '-Wnon-virtual-dtor',
#                    '-Wold-style-cast', '-Woverloaded-virtual', '-Waggregate-return', '-Werror'],
#
# Linker optimization
#extra_link_args=['-Wl,--strip-all'],
                             )],
      keywords=['dendrogram', 'linkage', 'cluster', 'agglomerative',
                'hierarchical', 'hierarchy', 'ward'],
      author=u"Daniel Müllner",
      author_email="daniel@danifold.net",
      license="BSD ",
      classifiers=[
          "Topic :: Scientific/Engineering :: Information Analysis",
          "Topic :: Scientific/Engineering :: Artificial Intelligence",
          "Topic :: Scientific/Engineering :: Bio-Informatics",
          "Topic :: Scientific/Engineering :: Mathematics",
          "Programming Language :: Python",
          "Programming Language :: Python :: 3",
          "Programming Language :: C++",
          "Operating System :: OS Independent",
          "License :: OSI Approved :: BSD License",
          "License :: OSI Approved :: GNU General Public License v2 (GPLv2)",
          "Intended Audience :: Science/Research",
          "Development Status :: 5 - Production/Stable"],
      url='https://danifold.net',
      test_suite='tests.fastcluster_test',
      )
fastcluster/src/python/tests/0000755000176200001440000000000014541365107016071 5ustar liggesusersfastcluster/src/python/tests/vectortest.py0000644000176200001440000002122514541365057020653 0ustar liggesusers#!/usr/bin/env python
# -*- coding: utf-8 -*-

# TBD test single on integer matrices for hamming/jaccard

print('''
Test program for the 'fastcluster' package.

Copyright:
  * Until package version 1.1.23: (c) 2011 Daniel Müllner
  * All changes from version 1.1.24 on: (c) Google Inc.
''')

import sys
import fastcluster as fc
import numpy as np
from scipy.spatial.distance import pdist, squareform
import math

version = '1.2.6'
if fc.__version__ != version:
    raise ValueError('Wrong module version: {} instead of {}.'.format(fc.__version__, version))

import atexit
def print_seed():
    print("Seed: {0}".format(seed))
atexit.register(print_seed)

seed = np.random.randint(0,1e9)
print_seed()
np.random.seed(seed)

abstol = 1e-14 # absolute tolerance
rtol = 1e-13 # relative tolerance

# NaN values are used in computations.
# Do not warn about them.
np.seterr(invalid='ignore')

def correct_for_zero_vectors(D, pcd, metric):
    # Correct some metrics: we want the distance from the zero vector
    # to itself to be 0, not NaN.
    if metric in ('jaccard', 'dice', 'sokalsneath'):
        z = np.flatnonzero(np.all(pcd==0, axis=1))
        if len(z):
            DD = squareform(D)
            DD[np.ix_(z, z)] = 0
            D = squareform(DD)
    return D

def test_all(n,dim):
    method = 'single'

    # metrics for boolean vectors
    pcd = np.random.randint(0, 2, size=(n,dim), dtype=bool)
    pcd2 = pcd.copy()
    for metric in ('hamming', 'jaccard', 'yule', 'matching', 'dice',
                   'rogerstanimoto',
                   #'sokalmichener', # exclude, bug in Scipy
                   # http://projects.scipy.org/scipy/ticket/1486
                   'russellrao', 'sokalsneath',
                   #'kulsinski' # exclude, bug in Scipy
                   # http://projects.scipy.org/scipy/ticket/1484
                   ):
        sys.stdout.write("Metric: " + metric + "...")
        D = pdist(pcd, metric=metric)
        D = correct_for_zero_vectors(D, pcd, metric)
        try:
            Z2 = fc.linkage_vector(pcd, method, metric)
        except FloatingPointError:
            # If linkage_vector reported a NaN dissimilarity value,
            # check whether the distance matrix really contains NaN.
            if np.any(np.isnan(D)):
                print("Skip this test: NaN dissimilarity value.")
                continue
            else:
                raise AssertionError('"linkage_vector" erroneously reported NaN.')
        if np.any(pcd2!=pcd):
            raise AssertionError('Input array was corrupted.', pcd)
        check(Z2, method, D)

    # metrics for real vectors
    bound = math.sqrt(n)
    pcd = np.random.randint(-bound, bound + 1, (n,dim))
    for metric in ['euclidean', 'sqeuclidean', 'cityblock', 'chebychev',
                   'minkowski', 'cosine', 'correlation', 'hamming', 'jaccard',
                   'canberra', # canberra: see bug in older Scipy versions
                   # http://projects.scipy.org/scipy/ticket/1430
                   'braycurtis', 'seuclidean', 'mahalanobis', 'user']:
        sys.stdout.write("Metric: " + metric + "...")
        if metric=='minkowski':
            p = np.random.uniform(1.,10.)
            sys.stdout.write("p: " + str(p) + "...")
            D = pdist(pcd, metric=metric, p=p)
            Z2 = fc.linkage_vector(pcd, method, metric, p)
        elif metric=='user':
            # Euclidean metric as a user function
            fn = (lambda u, v: np.sqrt(((u-v)*(u-v).T).sum()))
            D = pdist(pcd, metric=fn)
            Z2 = fc.linkage_vector(pcd, method, fn)
        else:
            D = pdist(pcd, metric=metric)
            D = correct_for_zero_vectors(D, pcd, metric)
            try:
                Z2 = fc.linkage_vector(pcd, method, metric)
            except FloatingPointError:
                if np.any(np.isnan(D)):
                    print("Skip this test: NaN dissimilarity value.")
                    continue
                else:
                    raise AssertionError(
                        '"linkage_vector" erroneously reported NaN.')
        check(Z2, method, D)

    D = pdist(pcd)
    for method in ['ward', 'centroid', 'median']:
        Z2 = fc.linkage_vector(pcd, method)
        check(Z2, method, D)

def check(Z2, method, D):
    sys.stdout.write("Method: " + method + "...")
    I = np.array(Z2[:,:2], dtype=int)
    Ds = squareform(D)
    n = len(Ds)
    row_repr = np.arange(2*n-1)
    row_repr[n:] = -1
    size = np.ones(n, dtype=int)
    np.fill_diagonal(Ds, np.nan)
    mins = np.empty(n-1)
    for i in range(n-1):
        for j in range(n-1):
            mins[j] = np.nanmin(Ds[j,j+1:])
        gmin = np.nanmin(mins)
        if abs(Z2[i,2]-gmin) > max(abs(Z2[i,2]),abs(gmin))*rtol and \
           abs(Z2[i,2]-gmin)>abstol:
            raise AssertionError(
                'Not the global minimum in step {2}: {0}, {1}'.
                format(Z2[i,2], gmin,i), squareform(D))
        i1, i2 = row_repr[I[i,:]]
        if (i1<0):
            raise AssertionError('Negative index i1.', squareform(D))
        if (i2<0):
            raise AssertionError('Negative index i2.', squareform(D))
        if I[i,0]>=I[i,1]:
            raise AssertionError('Convention violated.', squareform(D))
        if i1>i2:
            i1, i2 = i2, i1
        if abs(Ds[i1,i2]-gmin) > max(abs(Ds[i1,i2]),abs(gmin))*rtol and \
           abs(Ds[i1,i2]-gmin)>abstol:
            raise AssertionError(
                'The global minimum is not at the right place in step {5}: '
                '({0}, {1}): {2} != {3}. '
                'Difference: {4}'
                .format(i1, i2, Ds[i1, i2], gmin, Ds[i1, i2]-gmin, i),
                squareform(D))
        s1 = size[i1]
        s2 = size[i2]
        S = float(s1+s2)
        if method=='single':
            if i1>0:
                # mostly unnecessary; workaround for a bug/feature in NumPy
                # 1.7.0.dev, see http://projects.scipy.org/numpy/ticket/2078
                Ds[:i1,i2] = np.min( Ds[:i1,(i1,i2)],axis=1)
            Ds[i1:i2,i2] = np.minimum(Ds[i1,i1:i2],Ds[i1:i2,i2])
            Ds[i2,i2:] = np.min( Ds[(i1,i2),i2:],axis=0)
        elif method=='complete':
            if i1>0:
                Ds[:i1,i2] = np.max( Ds[:i1,(i1,i2)],axis=1)
            Ds[i1:i2,i2] = np.maximum(Ds[i1,i1:i2],Ds[i1:i2,i2])
            Ds[i2,i2:] = np.max( Ds[(i1,i2),i2:],axis=0)
        elif method=='average':
            Ds[:i1,i2] = ( Ds[:i1,i1]*s1 + Ds[:i1,i2]*s2 ) / S
            Ds[i1:i2,i2] = ( Ds[i1,i1:i2]*s1 + Ds[i1:i2,i2]*s2 ) / S
            Ds[i2,i2:] = ( Ds[i1,i2:]*s1 + Ds[i2,i2:]*s2 ) / S
        elif method=='weighted':
            if i1>0:
                Ds[:i1,i2] = np.mean( Ds[:i1,(i1,i2)],axis=1)
            Ds[i1:i2,i2] = ( Ds[i1,i1:i2] + Ds[i1:i2,i2] )*.5
            Ds[i2,i2:] = np.mean( Ds[(i1,i2),i2:],axis=0)
        elif method=='ward':
            Ds[:i1,i2] = np.sqrt((np.square(Ds[:i1,i1])*(s1+size[:i1])
                                  -gmin*gmin*size[:i1]
                                  +np.square(Ds[:i1,i2])*(s2+size[:i1]))/(S+size[:i1]))
            Ds[i1:i2,i2] = np.sqrt((np.square(Ds[i1,i1:i2])*(s1+size[i1:i2])
                                    -gmin*gmin*size[i1:i2]
                                    +np.square(Ds[i1:i2,i2])*(s2+size[i1:i2]))
                                   /(S+size[i1:i2]))
            Ds[i2,i2:] = np.sqrt((np.square(Ds[i1,i2:])*(s1+size[i2:])
                                  -gmin*gmin*size[i2:]
                                  +np.square(Ds[i2,i2:])*(s2+size[i2:]))/(S+size[i2:]))
        elif method=='centroid':
            Ds[:i1,i2] = np.sqrt((np.square(Ds[:i1,i1])*s1
                                  +np.square(Ds[:i1,i2])*s2)*S-gmin*gmin*s1*s2) / S
            Ds[i1:i2,i2] = np.sqrt((np.square(Ds[i1,i1:i2])*s1
                                    +np.square(Ds[i1:i2,i2])*s2)*S-gmin*gmin*s1*s2) / S
            Ds[i2,i2:] = np.sqrt((np.square(Ds[i1,i2:])*s1
                                  +np.square(Ds[i2,i2:])*s2)*S-gmin*gmin*s1*s2) / S
        elif method=='median':
            Ds[:i1,i2] = np.sqrt((np.square(Ds[:i1,i1])
                                  +np.square(Ds[:i1,i2]))*2-gmin*gmin)*.5
            Ds[i1:i2,i2] = np.sqrt((np.square(Ds[i1,i1:i2])
                                    +np.square(Ds[i1:i2,i2]))*2-gmin*gmin)*.5
            Ds[i2,i2:] = np.sqrt((np.square(Ds[i1,i2:])
                                  +np.square(Ds[i2,i2:]))*2-gmin*gmin)*.5
        else:
            raise ValueError('Unknown method.')
        Ds[i1, i1:n] = np.inf
        Ds[:i1, i1] = np.inf
        row_repr[n+i] = i2
        size[i2] = S
    print('OK.')

def test(repeats):
    if repeats:
        iterator = range(repeats)
    else:
        import itertools
        iterator = itertools.repeat(None)
    print('''
If everything is OK, the test program will run forever, without an error
message.
''')
    for _ in iterator:
        dim = np.random.randint(2, 13)
        n = np.random.randint(max(2*dim,5),200)
        print('Dimension: {0}'.format(dim))
        print('Number of points: {0}'.format(n))
        test_all(n,dim)

if __name__ == "__main__":
    test(None)
fastcluster/src/python/tests/__init__.py0000644000176200001440000000045314052433261020176 0ustar liggesusersimport unittest

class fastcluster_test(unittest.TestCase):
    def test(self):
        from tests.test import test
        test(10)
    def test_nan(self):
        from tests.nantest import test
        test()
    def test_vector(self):
        from tests.vectortest import test
        test(10)
fastcluster/src/python/tests/test.py0000644000176200001440000001352014541365073017425 0ustar liggesusers#!/usr/bin/env python
# -*- coding: utf-8 -*-

print('''
Test program for the 'fastcluster' package.

Copyright:
  * Until package version 1.1.23: (c) 2011 Daniel Müllner
  * All changes from version 1.1.24 on: (c) Google Inc.
''')

import sys
import fastcluster as fc
import numpy as np
from scipy.spatial.distance import pdist, squareform
import math

version = '1.2.6'
if fc.__version__ != version:
    raise ValueError('Wrong module version: {} instead of {}.'.format(fc.__version__, version))

import atexit
def print_seed():
    print("Seed: {0}".format(seed))
atexit.register(print_seed)

seed = np.random.randint(0,1e9)
np.random.seed(seed)

#abstol = 1e-14 # absolute tolerance
rtol = 1e-14 # relative tolerance

# NaN values are used in computations. Do not warn about them.
np.seterr(invalid='ignore')

def test_all(D):
    D2 = D.copy()
    for method in ['single', 'complete', 'average', 'weighted', 'ward',
                   'centroid', 'median']:
        Z2 = fc.linkage(D, method)
        if np.any(D2!=D):
            raise AssertionError('Input array was corrupted.')
        check(Z2, D, method)

def check(Z2, D, method):
    sys.stdout.write("Method: " + method + "...")
    I = np.array(Z2[:,:2], dtype=int)
    Ds = squareform(D)
    n = len(Ds)
    row_repr = np.arange(2*n-1)
    row_repr[n:] = -1
    size = np.ones(n, dtype=int)
    np.fill_diagonal(Ds, np.nan)
    mins = np.empty(n-1)
    for i in range(n-1):
        for j in range(n-1):
            # Suppress warning if all distances are NaN.
            if np.all(np.isnan(Ds[j,j+1:])):
                mins[j] = np.nan
            else:
                mins[j] = np.nanmin(Ds[j,j+1:])
        gmin = np.nanmin(mins)
        if (Z2[i,2]-gmin) > max(abs(Z2[i,2]),abs(gmin))*rtol:
            raise AssertionError('Not the global minimum in step {2}: {0}, {1}'.\
                format(Z2[i,2], gmin, i))
        i1, i2 = row_repr[I[i,:]]
        if (i1<0):
            raise AssertionError('Negative index i1.')
        if (i2<0):
            raise AssertionError('Negative index i2.')
        if I[i,0]>=I[i,1]:
            raise AssertionError('Convention violated.')
        if i1>i2:
            i1, i2 = i2, i1
        if (Ds[i1,i2]-gmin) > max(abs(Ds[i1,i2]),abs(gmin))*rtol:
            raise AssertionError('The global minimum is not at the right place: '
                                 '({0}, {1}): {2} != {3}. '
                                 'Difference: {4}'.\
                format(i1, i2, Ds[i1, i2], gmin, Ds[i1, i2]-gmin))
        s1 = size[i1]
        s2 = size[i2]
        S = float(s1+s2)
        if method=='single':
            if i1>0:
                # mostly unnecessary; workaround for a bug/feature in NumPy 1.7.0.dev
                # see http://projects.scipy.org/numpy/ticket/2078
                Ds[:i1,i2] = np.min( Ds[:i1,(i1,i2)],axis=1)
            Ds[i1:i2,i2] = np.minimum(Ds[i1,i1:i2],Ds[i1:i2,i2])
            Ds[i2,i2:] = np.min( Ds[(i1,i2),i2:],axis=0)
        elif method=='complete':
            if i1>0:
                Ds[:i1,i2] = np.max( Ds[:i1,(i1,i2)],axis=1)
            Ds[i1:i2,i2] = np.maximum(Ds[i1,i1:i2],Ds[i1:i2,i2])
            Ds[i2,i2:] = np.max( Ds[(i1,i2),i2:],axis=0)
        elif method=='average':
            Ds[:i1,i2] = ( Ds[:i1,i1]*s1 + Ds[:i1,i2]*s2 ) / S
            Ds[i1:i2,i2] = ( Ds[i1,i1:i2]*s1 + Ds[i1:i2,i2]*s2 ) / S
            Ds[i2,i2:] = ( Ds[i1,i2:]*s1 + Ds[i2,i2:]*s2 ) / S
        elif method=='weighted':
            if i1>0:
                Ds[:i1,i2] = np.mean( Ds[:i1,(i1,i2)],axis=1)
            Ds[i1:i2,i2] = ( Ds[i1,i1:i2] + Ds[i1:i2,i2] ) *.5
            Ds[i2,i2:] = np.mean( Ds[(i1,i2),i2:],axis=0)
        elif method=='ward':
            Ds[:i1,i2] = np.sqrt((np.square(Ds[:i1,i1])*(s1+size[:i1])
                                  -gmin*gmin*size[:i1]+np.square(Ds[:i1,i2])
                                  *(s2+size[:i1]))/(S+size[:i1]))
            Ds[i1:i2,i2] = np.sqrt((np.square(Ds[i1,i1:i2])*(s1+size[i1:i2])
                                    -gmin*gmin*size[i1:i2]+np.square(Ds[i1:i2,i2])
                                    *(s2+size[i1:i2]))/(S+size[i1:i2]))
            Ds[i2,i2:] = np.sqrt((np.square(Ds[i1,i2:])*(s1+size[i2:])
                                  -gmin*gmin*size[i2:]+np.square(Ds[i2,i2:])
                                  *(s2+size[i2:]))/(S+size[i2:]))
        elif method=='centroid':
            Ds[:i1,i2] = np.sqrt((np.square(Ds[:i1,i1])*s1
                                  +np.square(Ds[:i1,i2])*s2)*S-gmin*gmin*s1*s2) / S
            Ds[i1:i2,i2] = np.sqrt((np.square(Ds[i1,i1:i2])*s1
                                    +np.square(Ds[i1:i2,i2])*s2)*S-gmin*gmin*s1*s2) / S
            Ds[i2,i2:] = np.sqrt((np.square(Ds[i1,i2:])*s1
                                  +np.square(Ds[i2,i2:])*s2)*S-gmin*gmin*s1*s2) / S
        elif method=='median':
            Ds[:i1,i2] = np.sqrt((np.square(Ds[:i1,i1])
                                  +np.square(Ds[:i1,i2]))*2-gmin*gmin)*.5
            Ds[i1:i2,i2] = np.sqrt((np.square(Ds[i1,i1:i2])
                                    +np.square(Ds[i1:i2,i2]))*2-gmin*gmin)*.5
            Ds[i2,i2:] = np.sqrt((np.square(Ds[i1,i2:])
                                  +np.square(Ds[i2,i2:]))*2-gmin*gmin)*.5
        else:
            raise \
ValueError('Unknown method.')
        Ds[i1, i1:n] = np.nan
        Ds[:i1, i1] = np.nan
        row_repr[n+i] = i2
        size[i2] = S
    print('OK.')

def test(repeats):
    if repeats:
        iterator = range(repeats)
    else:
        import itertools
        iterator = itertools.repeat(None)
    print('''
If everything is OK, the test program will run forever, without an error
message.
''')
    for _ in iterator:
        dim = np.random.randint(2,20)
        n = np.random.randint(2,100)
        print('Dimension: {0}'.format(dim))
        print('Number of points: {0}'.format(n))
        D = pdist(np.random.randn(n,dim))
        print('Real distance values:')
        test_all(D)
        D = np.round(D*n/4)
        print('Integer distance values:')
        test_all(D)

if __name__ == "__main__":
    test(None)
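The check() routine above verifies each merge against a direct recomputation
with the Lance-Williams update formulas. As a self-contained illustration
(plain NumPy, independent of the package, on made-up clusters), the 'average'
update rule d(A∪B, C) = (|A|·d(A,C) + |B|·d(B,C)) / (|A|+|B|) can be confirmed
against a brute-force recomputation:

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.random((6, 2))  # six made-up points in the plane

A = [0, 1]     # cluster A
B = [2, 3, 4]  # cluster B
C = [5]        # cluster C

def avg_dist(I, J):
    # average-linkage distance: mean pairwise Euclidean distance
    return np.mean([np.linalg.norm(pts[i] - pts[j]) for i in I for j in J])

lhs = avg_dist(A + B, C)  # distance after merging A and B, recomputed directly
rhs = (len(A) * avg_dist(A, C) + len(B) * avg_dist(B, C)) / (len(A) + len(B))
assert np.isclose(lhs, rhs)
```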
fastcluster/src/python/tests/nantest.py0000644000176200001440000000416714541365107020127 0ustar liggesusers#!/usr/bin/env python
# -*- coding: utf-8 -*-

'''Test whether the fastcluster package correctly recognizes NaN values and
raises a FloatingPointError.'''

print('''
Test program for the 'fastcluster' package.

Copyright:
  * Until package version 1.1.23: (c) 2011 Daniel Müllner
  * All changes from version 1.1.24 on: (c) Google Inc.
''')

import numpy as np
import fastcluster

version = '1.2.6'
if fastcluster.__version__ != version:
    raise ValueError('Wrong module version: {} instead of {}.'.format(fastcluster.__version__, version))

import atexit
def print_seed():
    print("Seed: {0}".format(seed))
atexit.register(print_seed)

seed = np.random.randint(0,1e9)
np.random.seed(seed)

def test():
    n = np.random.randint(2,100)

    # Part 1: distance matrix input
    N = n*(n-1)//2
    D = np.random.rand(N)
    # Insert a single NaN value
    pos = np.random.randint(N)
    D[pos] = np.nan
    for method in ['single', 'complete', 'average', 'weighted', 'ward',
                   'centroid', 'median']:
        try:
            fastcluster.linkage(D, method=method)
            raise AssertionError('fastcluster did not detect a NaN value!')
        except FloatingPointError:
            pass

    # Next: the original array does not contain a NaN, but a NaN occurs
    # as an updated distance.
    for method in ['average', 'weighted', 'ward', 'centroid', 'median']:
        try:
            fastcluster.linkage([np.inf,-np.inf,-np.inf], method=method)
            raise AssertionError('fastcluster did not detect a NaN value!')
        except FloatingPointError:
            pass

    # Part 2: vector input
    dim = np.random.randint(2,13)
    X = np.random.rand(n,dim)
    pos = (np.random.randint(n), np.random.randint(dim))
    # Insert a single NaN coordinate
    X[pos] = np.nan
    for method in ['single', 'ward', 'centroid', 'median']:
        try:
            fastcluster.linkage_vector(X, method=method)
            raise AssertionError('fastcluster did not detect a NaN value!')
        except FloatingPointError:
            pass

if __name__ == "__main__":
    test()
    print('OK.')
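The linkage([np.inf, -np.inf, -np.inf]) case above works because ∞ − ∞ yields
NaN during the distance updates. A short sketch of the underlying NumPy
mechanism (independent of fastcluster; np.errstate is the context-manager form
of the np.seterr call used in the other test scripts):

```python
import numpy as np

# inf - inf is an IEEE 754 "invalid" operation and produces NaN. With the
# floating-point error state set to 'raise', NumPy reports it as a
# FloatingPointError instead of silently returning NaN.
with np.errstate(invalid='raise'):
    try:
        np.float64(np.inf) - np.float64(np.inf)
        outcome = 'nan returned'
    except FloatingPointError:
        outcome = 'raised'
assert outcome == 'raised'
```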
N)pdist squareformz1.1.26z'Wrong module version: {} instead of {}.cCstdtdS)Nz Seed: {0})printformatseedrr;/home/muellner/cluster/fastcluster/src/python/tests/test.py print_seedsr geAg+=ignore)invalidcCsD|}dD]2}t||}t||kr2tdt|||q dS)N)singlecompleteaverageweightedwardcentroidmedianzInput array was corrupted.)copyfclinkagenpanyAssertionErrorcheck)DZD2methodZ2rrrtest_all s  rc Cstjd|dtj|ddddftd}t|}t|}td|d}d||d<tj |tjd}t |tj t |d}t |dD]} t |dD]N} tt|| | ddfrtj || <qt|| | ddf|| <qt|} || df| tt|| dft| tkrNtd|| df| | ||| ddf\} } | dkrxtd | dkrtd || df|| dfkrtd | | kr| | } } || | f| tt|| | ft| tkrtd | | || | f| || | f| || }|| }t||}|d kr| dkrttj|d| | | ffdd|d| | f<t|| | | f|| | | f|| | | f<tj|| | f| dfdd|| | df<n|dkrn| dkrtj|d| | | ffdd|d| | f<t|| | | f|| | | f|| | | f<tj|| | f| dfdd|| | df<n |dkr$|d| | f||d| | f|||d| | f<|| | | f||| | | f|||| | | f<|| | df||| | df|||| | df<nT|dkr| dkrbtj|d| | | ffdd|d| | f<|| | | f|| | | fd|| | | f<tj|| | f| dfdd|| | df<n|dkrTtt|d| | f||d| | | |d| t|d| | f||d| ||d| |d| | f<tt|| | | f||| | | | || | t|| | | f||| | ||| | || | | f<tt|| | df||| d| | || dt|| | df||| d||| d|| | df<n$|dkrztt|d| | f|t|d| | f||| | ||||d| | f<tt|| | | f|t|| | | f||| | ||||| | | f<tt|| | df|t|| | df||| | ||||| | df<n|dkrptt|d| | ft|d| | fd| | d|d| | f<tt|| | | ft|| | | fd| | d|| | | f<tt|| | dft|| | dfd| | d|| | df<ntdtj || | |f<tj |d| | f<| ||| <||| <qtddS)NzMethod: z...)dtypez,Not the global minimum in step {2}: {0}, {1}rzNegative index i1.zNegative index i2.zConvention violated.zUThe global minimum is not at the right place: ({0}, {1}): {2} != {3}. Difference: {4}r )axisr rrg?rrrzUnknown method.zOK.)sysstdoutwriterarrayintrlenarangeones fill_diagonalnanemptyrangeallisnannanminmaxabsrtolrrfloatminminimummaximummeansqrtsquare ValueErrorr)rrrIZDsnZrow_reprsizeZminsijZgmini1i2s1s2Srrrr)s,     0     0    *0.  *0. 88<  *0. 
(((    rc Cs|rt|}nddl}|d}td|D]}tjdd}tjdd}td|td|ttj ||}z6tdt |t ||d }td t |Wq,t k r}z t|tt |WYd Sd}~XYq,Xq,d S) NrzS If everything is OK, the test program will run forever, without an error message. rdzDimension: {0}zNumber of points: {0}zReal distance values:zInteger distance values:FT)r. itertoolsrepeatrrrandomrandintrrrandnrroundrr)repeatsiteratorrJ_dimr>rErrrtests,    rU__main__)rr# fastclusterrnumpyrZscipy.spatial.distancerrmathversion __version__r<ratexitr registerrLrMrr4seterrrrrU__name__rrrrs(     `fastcluster/src/python/tests/__pycache__/vectortest.cpython-38.pyc0000644000176200001440000001522613602651655025144 0ustar liggesusersU Q ^.#@sedddlZddlZddlZddlmZmZddl Z dZ ej e krXe d ej e ddlZddZeeejddZeejed Zd Zejd d d dZddZddZddZedkreddS)u Test program for the 'fastcluster' package. Copyright: * Until package version 1.1.23: (c) 2011 Daniel Müllner * All changes from version 1.1.24 on: (c) Google Inc. N)pdist squareformz1.1.26z'Wrong module version: {} instead of {}.cCstdtdS)Nz Seed: {0})printformatseedrrA/home/muellner/cluster/fastcluster/src/python/tests/vectortest.py print_seedsr geAg+=gvIh%<=ignore)invalidcCsL|dkrHttj|dkdd}t|rHt|}d|t||<t|}|S)N)jaccarddice sokalsneathraxis)np flatnonzeroalllenrix_)DpcdmetriczZDDrrrcorrect_for_zero_vectors#src Cs6d}tjjdd||ftjd}|}dD]}tjd|dt||d}t |||}zt |||}Wn:t k rt t|rtd Yq*ntd YnXt ||krtd |t|||q*t|}tj| |d ||f}d D] }tjd|d|dkr`tjdd} tjdt| dt||| d}t |||| }n|dkrdd} t|| d}t ||| }njt||d}t |||}zt |||}Wn>t k rt t|rtd Yqntd YnXt|||qt|}dD]}t ||}t|||qdS)Nsingler)sizedtype)hammingr yulematchingr rogerstanimoto russellraorzMetric: ...)rz(Skip this test: NaN dissimilarity value.z*"linkage_vector" erroneously reported NaN.zInput array was corrupted.r) euclidean sqeuclidean cityblock chebychev minkowskicosine correlationr r canberra braycurtis seuclidean mahalanobisuserr*g?g$@zp: 
)rpr1cSst||||jS)N)rsqrtTsum)uvrrrcztest_all..)wardcentroidmedian)rrandomrandintboolcopysysstdoutwriterrfclinkage_vectorFloatingPointErroranyisnanrAssertionErrorcheckmathr3uniformstr) ndimmethodrZpcd2rrZ2boundr2fnrrrtest_all.s\             rTc Cstjd|dtj|ddddftd}t|}t|}td|d}d||d<tj |tjd}t |tj t |d}t |dD]F} t |dD]"} t|| | ddf|| <qt|} t|| df| tt|| dft| tkrFt|| df| tkrFtd|| df| | t|||| ddf\} } | dkrvtd t|| dkrtd t||| df|| dfkrtd t|| | kr| | } } t|| | f| tt|| | ft| tkrJt|| | f| tkrJtd | | || | f| || | f| | t||| }|| }t||}|d kr| dkrtj|d| | | ffdd|d| | f<t|| | | f|| | | f|| | | f<tj|| | f| dfdd|| | df<n|dkr| dkr@tj|d| | | ffdd|d| | f<t|| | | f|| | | f|| | | f<tj|| | f| dfdd|| | df<n |dkrT|d| | f||d| | f|||d| | f<|| | | f||| | | f|||| | | f<|| | df||| | df|||| | df<nT|dkr| dkrtj|d| | | ffdd|d| | f<|| | | f|| | | fd|| | | f<tj|| | f| dfdd|| | df<n|dkrtt|d| | f||d| | | |d| t|d| | f||d| ||d| |d| | f<tt|| | | f||| | | | || | t|| | | f||| | ||| | || | | f<tt|| | df||| d| | || dt|| | df||| d||| d|| | df<n$|dkrtt|d| | f|t|d| | f||| | ||||d| | f<tt|| | | f|t|| | | f||| | ||||| | | f<tt|| | df|t|| | df||| | ||||| | df<n|dkrtt|d| | ft|d| | fd| | d|d| | f<tt|| | | ft|| | | fd| | d|| | | f<tt|| | dft|| | dfd| | d|| | df<ntdtj|| | |f<tj|d| | f<| ||| <||| <qtddS)NzMethod: r%r)rrz,Not the global minimum in step {2}: {0}, {1}rzNegative index i1.zNegative index i2.zConvention violated.zaThe global minimum is not at the right place in step {5}: ({0}, {1}): {2} != {3}. Difference: {4}rrcompleteaverageweightedg?r:r;r<zUnknown method.zOK.)rArBrCrarrayintrrarangeones fill_diagonalnanemptyrangenanminabsmaxrtolabstolrIrfloatminminimummaximummeanr3square ValueErrorinfr)rQrPrIDsrNrow_reprrminsijgmini1i2s1s2SrrrrJzs2   4     4    *0.  *0. 88<  *0. ($($($    rJc Cs|rt|}nddl}|d}td|D]}tjdd}tjtd|dd}td|td|zt ||Wq,t k r}z(t|j dt|j d WYd Sd}~XYq,Xq,d S) NrzS If everything is OK, the test program will run forever, without an error message. 
r zDimension: {0}zNumber of points: {0}rFT) r` itertoolsrepeatrrr=r>rcrrTrIargs)repeatsiteratorr}_rOrNErrrtests"  r__main__)rrA fastclusterrDnumpyrscipy.spatial.distancerrrKversion __version__rlratexitr registerr=r>rrerdseterrrrTrJr__name__rrrrs.     Lafastcluster/src/python/tests/__pycache__/__init__.cpython-38.pyc0000644000176200001440000000162213602651652024471 0ustar liggesusersU  V^@sddlZGdddejZdS)Nc@s$eZdZddZddZddZdS)fastcluster_testcCsddlm}||ddSNrtest )Z tests.testr assertTrueselfrr ?/home/muellner/cluster/fastcluster/src/python/tests/__init__.pyrs zfastcluster_test.testcCsddlm}||dS)Nrr)Z tests.nantestrrrr r r test_nans zfastcluster_test.test_nancCsddlm}||ddSr)Ztests.vectortestrrrr r r test_vector s zfastcluster_test.test_vectorN)__name__ __module__ __qualname__rr r r r r r rsr)unittestTestCaserr r r r sfastcluster/src/python/tests/__pycache__/nantest.cpython-38.pyc0000644000176200001440000000332613602651655024414 0ustar liggesusersU 'Q ^@sdZedddlZddlZdZejekr * All changes from version 1.1.24 on: (c) Google Inc. 
Nz1.1.26z'Wrong module version: {} instead of {}.cCstdtdS)Nz Seed: {0})printformatseedrr>/home/muellner/cluster/fastcluster/src/python/tests/nantest.py print_seedsrgeAc CsDtjdd}||dd}tj|}tj|}tj||<dD]4}ztj||dtdWqDtk rvYqDXqDdD]D}z*tjtj tj tj g|dtdWq~tk rYq~Xq~tjdd}tj||}tj|tj|f}tj||<d D]8}ztj ||dtdWntk r:YnXqd S) Nd)singlecompleteaverageweightedwardcentroidmedian)methodz'fastcluster did not detect a NaN value!)r rrrr )r rrrT) nprandomrandintrandnan fastclusterlinkageAssertionErrorFloatingPointErrorinflinkage_vector)nNDposrdimXrrrtests8        r%__main__zOK.)__doc__rnumpyrrversion __version__ ValueErrorratexitrregisterrrrr%__name__rrrrs   -fastcluster/src/python/fastcluster.pyc0000644000176200001440000004604313602651751020012 0ustar liggesusers P ^c @sdZddddddddd g ZdFZd jeZd dlmZmZmZm Z m Z m Z m Z m Z mZmZmZd dlmZyd dlmZWnek rdZnXd dlmZmZdZdZdZdZdZdZdZidd6dd6dd6dd6dd6dd6d d6Z dd!e!d"Z"idd!6dd#6dd$6dd%6dd&6dd'6d d(6d)d*6d+d,6d-d.6d/d06d1d26d3d46d5d66d5d76d8d96d:d;6d<d=6d>d?6d@dA6dBdC6Z#dGZ$dd!dEdDZ&dES(HsgFast hierarchical clustering routines for R and Python Copyright: Until package version 1.1.23: © 2011 Daniel Müllner All changes from version 1.1.24 on: © Google Inc. This module provides fast hierarchical clustering routines. The "linkage" method is designed to provide a replacement for the “linkage” function and its siblings in the scipy.cluster.hierarchy module. You may use the methods in this module with the same syntax as the corresponding SciPy functions but with the benefit of much faster performance. The method "linkage_vector" performs clustering of vector data with memory- saving algorithms. Refer to the User's manual "fastcluster.pdf" for comprehensive details. It is located in the directory inst/doc/ in the source distribution and may also be obtained at . 
[import section of fastcluster.py, reconstructed from the bytecode string table:]

from numpy import double, empty, array, ndarray, var, cov, dot, bool, \
    expand_dims, ceil, sqrt
from numpy.linalg import inv
try:
    from scipy.spatial.distance import pdist
except ImportError:
    def pdist(*args, **kwargs):
        raise ImportError('The fastcluster.linkage function cannot process '
                          'vector data since the function '
                          'scipy.spatial.distance.pdist could not be '
                          'imported.')
from _fastcluster import linkage_wrap, linkage_vector_wrap

def single(D):
    '''Single linkage clustering (alias). See the help on the “linkage”
    function for further information.'''
    return linkage(D, method='single')

def complete(D):
    '''Complete linkage clustering (alias). See the help on the “linkage”
    function for further information.'''
    return linkage(D, method='complete')

def average(D):
    '''Hierarchical clustering with the “average” distance update formula
    (alias). See the help on the “linkage” function for further
    information.'''
    return linkage(D, method='average')

def weighted(D):
    '''Hierarchical clustering with the “weighted” distance update formula
    (alias). See the help on the “linkage” function for further
    information.'''
    return linkage(D, method='weighted')

def ward(D):
    '''Hierarchical clustering with the “Ward” distance update formula
    (alias). See the help on the “linkage” function for further
    information.'''
    return linkage(D, method='ward')

def centroid(D):
    '''Hierarchical clustering with the “centroid” distance update formula
    (alias). See the help on the “linkage” function for further
    information.'''
    return linkage(D, method='centroid')

def median(D):
    '''Hierarchical clustering with the “median” distance update formula
    (alias). See the help on the “linkage” function for further
    information.'''
    return linkage(D, method='median')

[the method-index table mthidx omitted; it maps the seven method names to the integer codes expected by linkage_wrap]

def linkage(X, method='single', metric='euclidean', preserve_input=True):
[docstring:]
Hierarchical, agglomerative clustering on a dissimilarity matrix or on Euclidean data.

Apart from the argument 'preserve_input', the method has the same input parameters and output format as the functions of the same name in the module scipy.cluster.hierarchy.

The argument X is preferably a NumPy array with floating point entries (X.dtype==numpy.double). Any other data format will be converted before it is processed.

If X is a one-dimensional array, it is considered a condensed matrix of pairwise dissimilarities in the format which is returned by scipy.spatial.distance.pdist. It contains the flattened, upper-triangular part of a pairwise dissimilarity matrix. That is, if there are N data points and the matrix d contains the dissimilarity between the i-th and j-th observation at position d(i,j), the vector X has length N(N-1)/2 and is ordered as follows:

  [ d(0,1), d(0,2), ..., d(0,n-1), d(1,2), ..., d(1,n-1), ..., d(n-2,n-1) ]

The 'metric' argument is ignored in case of dissimilarity input.

The optional argument 'preserve_input' specifies whether the method makes a working copy of the dissimilarity vector or writes temporary data into the existing array. If the dissimilarities are generated for the clustering step only and are not needed afterward, approximately half the memory can be saved by specifying 'preserve_input=False'. Note that the input array X contains unspecified values after this procedure. It is therefore safer to write

  linkage(X, method="...", preserve_input=False)
  del X

to make sure that the matrix X is not accessed accidentally after it has been used as scratch memory.
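The condensed ordering just described maps a pair (i, j) with i < j to a flat index in the length-N(N−1)/2 vector. A stdlib-only sketch of that index arithmetic (the helper name `condensed_index` is ours, not part of the package):

```python
from itertools import combinations


def condensed_index(n, i, j):
    """Position of d(i, j), i < j, in the condensed vector of length
    n*(n-1)//2: all pairs with first index 0 come first, then first
    index 1, and so on."""
    return n * i - i * (i + 1) // 2 + (j - i - 1)


# The mapping enumerates pairs in exactly the pdist order:
n = 5
pairs = list(combinations(range(n), 2))   # (0,1), (0,2), ..., (3,4)
assert [condensed_index(n, i, j) for i, j in pairs] == list(range(len(pairs)))
```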
(The single linkage algorithm does not write to the distance matrix or its copy anyway, so the 'preserve_input' flag has no effect in this case.) If X contains vector data, it must be a two-dimensional array with N observations in D dimensions as an (N×D) array. The preserve_input argument is ignored in this case. The specified metric is used to generate pairwise distances from the input. The following two function calls yield the same output: linkage(pdist(X, metric), method="...", preserve_input=False) linkage(X, metric=metric, method="...") The general scheme of the agglomerative clustering procedure is as follows: 1. Start with N singleton clusters (nodes) labeled 0,...,N−1, which represent the input points. 2. Find a pair of nodes with minimal distance among all pairwise distances. 3. Join the two nodes into a new node and remove the two old nodes. The new nodes are labeled consecutively N, N+1, ... 4. The distances from the new node to all other nodes is determined by the method parameter (see below). 5. Repeat N−1 times from step 2, until there is one big node, which contains all original input points. The output of linkage is stepwise dendrogram, which is represented as an (N−1)×4 NumPy array with floating point entries (dtype=numpy.double). The first two columns contain the node indices which are joined in each step. The input nodes are labeled 0,...,N−1, and the newly generated nodes have the labels N,...,2N−2. The third column contains the distance between the two nodes at each step, ie. the current minimal distance at the time of the merge. The fourth column counts the number of points which comprise each new node. The parameter method specifies which clustering scheme to use. The clustering scheme determines the distance from a new node to the other nodes. Denote the dissimilarities by d, the nodes to be joined by I, J, the new node by K and any other node by L. The symbol |I| denotes the size of the cluster I. 
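The general scheme in steps 1–5 can be sketched naively in pure Python — O(N³) and for illustration only, nothing like the package's optimized C++ core. This sketch does single linkage on a dict of pairwise distances and emits the stepwise dendrogram rows (node1, node2, dist, size) with new nodes labeled N, N+1, …:

```python
def single_linkage(d, n):
    """Naive single-linkage sketch on a dict {(i, j): distance} with i < j.

    Returns the stepwise dendrogram as (n-1) tuples (node1, node2, dist,
    size), with input points labeled 0..n-1 and merged nodes n..2n-2.
    """
    clusters = {i: {i} for i in range(n)}        # cluster label -> member points

    def dist(a, b):
        # Single linkage: closest pair of points across the two clusters.
        return min(d[min(p, q), max(p, q)]
                   for p in clusters[a] for q in clusters[b])

    Z = []
    for new_label in range(n, 2 * n - 1):        # steps 2-5, repeated n-1 times
        a, b = min(((x, y) for x in clusters for y in clusters if x < y),
                   key=lambda pair: dist(*pair))
        Z.append((a, b, dist(a, b), len(clusters[a]) + len(clusters[b])))
        clusters[new_label] = clusters.pop(a) | clusters.pop(b)
    return Z


# Three points on a line at 0, 1, 3: first 0 and 1 merge at distance 1,
# then the new node 3 merges with point 2 at distance 2.
d = {(0, 1): 1.0, (0, 2): 3.0, (1, 2): 2.0}
assert single_linkage(d, 3) == [(0, 1, 1.0, 2), (2, 3, 2.0, 3)]
```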
method='single': d(K,L) = min(d(I,L), d(J,L))

  The distance between two clusters A, B is the closest distance between any two points in each cluster: d(A,B) = min{ d(a,b) | a∈A, b∈B }

method='complete': d(K,L) = max(d(I,L), d(J,L))

  The distance between two clusters A, B is the maximal distance between any two points in each cluster: d(A,B) = max{ d(a,b) | a∈A, b∈B }

method='average': d(K,L) = ( |I|·d(I,L) + |J|·d(J,L) ) / (|I|+|J|)

  The distance between two clusters A, B is the average distance between the points in the two clusters: d(A,B) = (|A|·|B|)^(−1) · \sum { d(a,b) | a∈A, b∈B }

method='weighted': d(K,L) = (d(I,L)+d(J,L))/2

  There is no global description for the distance between clusters since the distance depends on the order of the merging steps.

The following three methods are intended for Euclidean data only, ie. when X contains the pairwise (non-squared!) distances between vectors in Euclidean space. The algorithm will work on any input, however, and it is up to the user to make sure that applying the methods makes sense.

method='centroid': d(K,L) = ( (|I|·d(I,L) + |J|·d(J,L)) / (|I|+|J|) − |I|·|J|·d(I,J)/(|I|+|J|)^2 )^(1/2)

  There is a geometric interpretation: d(A,B) is the distance between the centroids (ie. barycenters) of the clusters in Euclidean space: d(A,B) = ∥c_A−c_B∥, where c_A denotes the centroid of the points in cluster A.

method='median': d(K,L) = ( d(I,L)/2 + d(J,L)/2 − d(I,J)/4 )^(1/2)

  Define the midpoint w_K of a cluster K iteratively as w_K=k if K={k} is a singleton and as the midpoint (w_I+w_J)/2 if K is formed by joining I and J. Then we have d(A,B) = ∥w_A−w_B∥ in Euclidean space for all nodes A,B. Notice however that this distance depends on the order of the merging steps.

method='ward': d(K,L) = ( ((|I|+|L|)·d(I,L) + (|J|+|L|)·d(J,L) − |L|·d(I,J)) / (|I|+|J|+|L|) )^(1/2)

  The global cluster dissimilarity can be expressed as d(A,B) = ( 2·|A|·|B|/(|A|+|B|) )^(1/2) · ∥c_A−c_B∥, where c_A again denotes the centroid of the points in cluster A.
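For the Euclidean methods the update formula really does track the centroids. A stdlib-only check (the helper names `sq` and `centroid_update` are ours): feed the centroid update *squared* distances — as the C++ wrapper does, squaring the dissimilarities before clustering and taking the square root at the end — and it reproduces the true centroid-to-point distance:

```python
import math


def sq(u, v):
    """Squared Euclidean distance between two point tuples."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid_update(d_il, d_jl, d_ij, ni, nj):
    """Centroid update on squared dissimilarities; returns a distance."""
    return math.sqrt((ni * d_il + nj * d_jl) / (ni + nj)
                     - ni * nj * d_ij / (ni + nj) ** 2)


p0, p1, p2 = (0.0, 0.0), (2.0, 0.0), (1.0, 3.0)
c = ((p0[0] + p1[0]) / 2, (p0[1] + p1[1]) / 2)   # centroid of {p0, p1}

# The formula gives exactly the distance from that centroid to p2:
assert abs(centroid_update(sq(p0, p2), sq(p1, p2), sq(p0, p1), 1, 1)
           - math.sqrt(sq(c, p2))) < 1e-12
```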
The clustering algorithm handles infinite values correctly, as long as the chosen distance update formula makes sense. If a NaN value occurs, either in the original dissimilarities or as an updated dissimilarity, an error is raised.

The linkage method does not treat NumPy's masked arrays as special and simply ignores the mask.

[bytecode of the linkage function body omitted; it validates that a one-dimensional input has length (k choose 2) — "The length of the condensed distance matrix must be (k \choose 2) for k data points!" — converts X to a C-contiguous double array (calling pdist first for two-dimensional input), allocates the (N−1)×4 output array Z and calls linkage_wrap]
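Part of linkage's input validation is visible in the bytecode above: a condensed vector's length m must be a binomial coefficient m = k(k−1)/2, and k is recovered from m with a ceil/sqrt computation. A stdlib-only sketch of that check (the helper name `n_points` is ours; the error string is the one embedded in the dump):

```python
import math


def n_points(m):
    """Recover the number of points k from a condensed vector of length
    m = k*(k-1)//2, or raise ValueError if m is not of that form."""
    k = math.ceil(math.sqrt(2 * m))       # k-1 < sqrt(2m) <= k for valid m
    if k * (k - 1) // 2 != m:
        raise ValueError('The length of the condensed distance matrix '
                         'must be (k \\choose 2) for k data points!')
    return k


assert n_points(10) == 5    # 5 points give 10 pairwise dissimilarities
```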
However, the metrics differ in some instances since a number of mistakes and typos (both in the code and in the documentation) were corrected in the fastcluster package. Therefore, the available metrics with their definitions are listed below as a reference. The symbols u and v mostly denote vectors in R^D with coordinates u_j and v_j respectively. See below for additional metrics for Boolean vectors. Unless otherwise stated, the input array X is converted to a floating point array (X.dtype==numpy.double) if it does not have already the required data type. Some metrics accept Boolean input; in this case this is stated explicitly below. If a NaN value occurs, either in the original dissimilarities or as an updated dissimilarity, an error is raised. In principle, the clustering algorithm handles infinite values correctly, but the user is advised to carefully check the behavior of the metric and distance update formulas under these circumstances. The distance formulas combined with the clustering in the 'linkage_vector' method do not have specified behavior if the data X contains infinite or NaN values. Also, the masks in NumPy’s masked arrays are simply ignored. metric='euclidean': Euclidean metric, L_2 norm d(u,v) = ∥u−v∥ = ( \sum_j { (u_j−v_j)^2 } )^(1/2) metric='sqeuclidean': squared Euclidean metric d(u,v) = ∥u−v∥^2 = \sum_j { (u_j−v_j)^2 } metric='seuclidean': standardized Euclidean metric d(u,v) = ( \sum_j { (u_j−v_j)^2 / V_j } )^(1/2) The vector V=(V_0,...,V_{D−1}) is given as the 'extraarg' argument. If no 'extraarg' is given, V_j is by default the unbiased sample variance of all observations in the j-th coordinate: V_j = Var_i (X(i,j) ) = 1/(N−1) · \sum_i ( X(i,j)^2 − μ(X_j)^2 ) (Here, μ(X_j) denotes as usual the mean of X(i,j) over all rows i.) metric='mahalanobis': Mahalanobis distance d(u,v) = ( transpose(u−v) V (u−v) )^(1/2) Here, V=extraarg, a (D×D)-matrix. 
If V is not specified, the inverse of the covariance matrix numpy.linalg.inv(numpy.cov(X, rowvar=False)) is used. metric='cityblock': the Manhattan distance, L_1 norm d(u,v) = \sum_j |u_j−v_j| metric='chebychev': the supremum norm, L_∞ norm d(u,v) = max_j { |u_j−v_j| } metric='minkowski': the L_p norm d(u,v) = ( \sum_j |u_j−v_j|^p ) ^(1/p) This metric coincides with the cityblock, euclidean and chebychev metrics for p=1, p=2 and p=∞ (numpy.inf), respectively. The parameter p is given as the 'extraarg' argument. metric='cosine' d(u,v) = 1 − ⟨u,v⟩ / (∥u∥·∥v∥) = 1 − (\sum_j u_j·v_j) / ( (\sum u_j^2)(\sum v_j^2) )^(1/2) metric='correlation': This method first mean-centers the rows of X and then applies the 'cosine' distance. Equivalently, the correlation distance measures 1 − (Pearson’s correlation coefficient). d(u,v) = 1 − ⟨u−μ(u),v−μ(v)⟩ / (∥u−μ(u)∥·∥v−μ(v)∥) metric='canberra' d(u,v) = \sum_j ( |u_j−v_j| / (|u_j|+|v_j|) ) Summands with u_j=v_j=0 contribute 0 to the sum. metric='braycurtis' d(u,v) = (\sum_j |u_j-v_j|) / (\sum_j |u_j+v_j|) metric=(user function): The parameter metric may also be a function which accepts two NumPy floating point vectors and returns a number. Eg. the Euclidean distance could be emulated with fn = lambda u, v: numpy.sqrt(((u-v)*(u-v)).sum()) linkage_vector(X, method='single', metric=fn) This method, however, is much slower than the build-in function. metric='hamming': The Hamming distance accepts a Boolean array (X.dtype==bool) for efficient storage. Any other data type is converted to numpy.double. d(u,v) = |{j | u_j≠v_j }| metric='jaccard': The Jaccard distance accepts a Boolean array (X.dtype==bool) for efficient storage. Any other data type is converted to numpy.double. d(u,v) = |{j | u_j≠v_j }| / |{j | u_j≠0 or v_j≠0 }| d(0,0) = 0 Python represents True by 1 and False by 0. In the Boolean case, the Jaccard distance is therefore: d(u,v) = |{j | u_j≠v_j }| / |{j | u_j ∨ v_j }| The following metrics are designed for Boolean vectors. 
The input array is converted to the 'bool' data type if it is not Boolean already. Use the following abbreviations to count the number of True/False combinations: a = |{j | u_j ∧ v_j }| b = |{j | u_j ∧ (¬v_j) }| c = |{j | (¬u_j) ∧ v_j }| d = |{j | (¬u_j) ∧ (¬v_j) }| Recall that D denotes the number of dimensions, hence D=a+b+c+d. metric='yule' d(u,v) = 2bc / (ad+bc) metric='dice': d(u,v) = (b+c) / (2a+b+c) d(0,0) = 0 metric='rogerstanimoto': d(u,v) = 2(b+c) / (b+c+D) metric='russellrao': d(u,v) = (b+c+d) / D metric='sokalsneath': d(u,v) = 2(b+c)/ ( a+2(b+c)) d(0,0) = 0 metric='kulsinski' d(u,v) = (b/(a+b) + c/(a+c)) / 2 metric='matching': d(u,v) = (b+c)/D Notice that when given a Boolean array, the 'matching' and 'hamming' distance are the same. The 'matching' distance formula, however, converts every input to Boolean first. Hence, the vectors (0,1) and (0,2) have zero 'matching' distance since they are both converted to (False, True) but the Hamming distance is 0.5. metric='sokalmichener' is an alias for 'matching'.RRGR9R:R!R"R#R$R%R RiiiR6taxisitddofR>trowvart correlationR8R4(R9R:N(R-RR'R(R#RR tbooleanmetricsR)R*R tNoneRRRRRtmeant isinstancetstrRR.tmtridx(R/RR&textraargR#R2R3((s</home/muellner/cluster/fastcluster/src/python/fastcluster.pyRs<  '*     0    !N(R R R ( R?R@RBRFRCRARDRERF('t__doc__t__all__t__version_info__tjoint __version__tnumpyR R RRRRRRRRRt numpy.linalgRtscipy.spatial.distanceRRt _fastclusterRRRRRRRRRR.R(RRQRLRMR(((s</home/muellner/cluster/fastcluster/src/python/fastcluster.pytsb!L             fastcluster/src/python/fastcluster_python.cpp0000644000176200001440000010046414541365020021372 0ustar liggesusers/* fastcluster: Fast hierarchical clustering routines for R and Python Copyright: * Until package version 1.1.23: © 2011 Daniel Müllner * All changes from version 1.1.24 on: © Google Inc. */ // for INT32_MAX in fastcluster.cpp // This must be defined here since Python.h loads the header file pyport.h, // and from this stdint.h. 
// INT32_MAX is defined in stdint.h, but only if
// __STDC_LIMIT_MACROS is defined.
#define __STDC_LIMIT_MACROS

#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#include <Python.h>
#include <numpy/arrayobject.h>

/* It's complicated, but if I do not include the C++ math headers, GCC will
   complain about conversions from 'double' to 'float', whenever 'isnan'
   is called in a templated function (but not outside templates).

   The '#include <cmath>' seems to cure the problem.
*/
//#include <math.h>
#define fc_isnan(X) ((X)!=(X))
// There is Py_IS_NAN but it is so much slower on my x86_64 system with GCC!

#include <cmath>     // for std::abs, std::pow, std::sqrt
#include <cstddef>   // for std::ptrdiff_t
#include <limits>    // for std::numeric_limits<...>::infinity()
#include <algorithm> // for std::stable_sort
#include <new>       // for std::bad_alloc
#include <exception> // for std::exception

#include "../fastcluster.cpp"

// backwards compatibility
#ifndef NPY_ARRAY_CARRAY_RO
#define NPY_ARRAY_CARRAY_RO NPY_CARRAY_RO
#endif

/* Since the public interface is given by the Python respectively R interface,
 * we do not want other symbols than the interface initialization routines to be
 * visible in the shared object file. The "visibility" switch is a GCC concept.
 * Hiding symbols keeps the relocation table small and decreases startup time.
 * See http://gcc.gnu.org/wiki/Visibility
 */
#if HAVE_VISIBILITY
#pragma GCC visibility push(hidden)
#endif

/* Convenience class for the output array: automatic counter. */
class linkage_output {
private:
  t_float * Z;

public:
  linkage_output(t_float * const Z_)
    : Z(Z_)
  {}

  void append(const t_index node1, const t_index node2, const t_float dist,
              const t_float size) {
    // Emit the smaller node label first.
    if (node1 < node2) {
      *(Z++) = static_cast<t_float>(node1);
      *(Z++) = static_cast<t_float>(node2);
    }
    else {
      *(Z++) = static_cast<t_float>(node2);
      *(Z++) = static_cast<t_float>(node1);
    }
    *(Z++) = dist;
    *(Z++) = size;
  }
};

/*
  Generate the SciPy-specific output format for a dendrogram from the
  clustering output.

  The list of merging steps can be sorted or unsorted.
*/

// The size of a node is either 1 (a single point) or is looked up from
// one of the clusters.
#define size_(r_) ( ((r_ static void generate_SciPy_dendrogram(t_float * const Z, cluster_result & Z2, const t_index N) { // The array "nodes" is a union-find data structure for the cluster // identities (only needed for unsorted cluster_result input). union_find nodes(sorted ? 0 : N); if (!sorted) { std::stable_sort(Z2[0], Z2[N-1]); } linkage_output output(Z); t_index node1, node2; for (node const * NN=Z2[0]; NN!=Z2[N-1]; ++NN) { // Get two data points whose clusters are merged in step i. if (sorted) { node1 = NN->node1; node2 = NN->node2; } else { // Find the cluster identifiers for these points. node1 = nodes.Find(NN->node1); node2 = nodes.Find(NN->node2); // Merge the nodes in the union-find data structure by making them // children of a new node. nodes.Union(node1, node2); } output.append(node1, node2, NN->dist, size_(node1)+size_(node2)); } } /* Python interface code */ static PyObject * linkage_wrap(PyObject * const self, PyObject * const args); static PyObject * linkage_vector_wrap(PyObject * const self, PyObject * const args); // List the C++ methods that this extension provides. static PyMethodDef _fastclusterWrapMethods[] = { {"linkage_wrap", linkage_wrap, METH_VARARGS, NULL}, {"linkage_vector_wrap", linkage_vector_wrap, METH_VARARGS, NULL}, {NULL, NULL, 0, NULL} /* Sentinel - marks the end of this structure */ }; /* Tell Python about these methods. Python 2.x and 3.x differ in their C APIs for this part. */ #if PY_VERSION_HEX >= 0x03000000 static struct PyModuleDef fastclustermodule = { PyModuleDef_HEAD_INIT, "_fastcluster", NULL, // no module documentation -1, /* size of per-interpreter state of the module, or -1 if the module keeps state in global variables. */ _fastclusterWrapMethods, NULL, NULL, NULL, NULL }; /* Make the interface initalization routines visible in the shared object * file. 
*/ #if HAVE_VISIBILITY #pragma GCC visibility push(default) #endif PyMODINIT_FUNC PyInit__fastcluster(void) { PyObject * m; m = PyModule_Create(&fastclustermodule); if (!m) { return NULL; } import_array(); // Must be present for NumPy. Called first after above line. return m; } #if HAVE_VISIBILITY #pragma GCC visibility pop #endif # else // Python 2.x #if HAVE_VISIBILITY #pragma GCC visibility push(default) #endif PyMODINIT_FUNC init_fastcluster(void) { (void) Py_InitModule("_fastcluster", _fastclusterWrapMethods); import_array(); // Must be present for NumPy. Called first after above line. } #if HAVE_VISIBILITY #pragma GCC visibility pop #endif #endif // PY_VERSION class GIL_release { private: // noncopyable GIL_release(GIL_release const &); GIL_release & operator=(GIL_release const &); public: inline GIL_release(bool really = true) : _save(really ? PyEval_SaveThread() : NULL) { } inline ~GIL_release() { if (_save) PyEval_RestoreThread(_save); } private: PyThreadState * _save; }; /* Interface to Python, part 1: The input is a dissimilarity matrix. */ static PyObject *linkage_wrap(PyObject * const, PyObject * const args) { PyArrayObject * D, * Z; long int N_ = 0; unsigned char method; try{ // Parse the input arguments if (!PyArg_ParseTuple(args, "lO!O!b", &N_, // signed long integer &PyArray_Type, &D, // NumPy array &PyArray_Type, &Z, // NumPy array &method)) { // unsigned char return NULL; // Error if the arguments have the wrong type. } if (N_ < 1 ) { // N must be at least 1. PyErr_SetString(PyExc_ValueError, "At least one element is needed for clustering."); return NULL; } /* (1) The biggest index used below is 4*(N-2)+3, as an index to Z. This must fit into the data type used for indices. (2) The largest representable integer, without loss of precision, by a floating point number of type t_float is 2^T_FLOAT_MANT_DIG. Here, we make sure that all cluster labels from 0 to 2N-2 in the output can be accurately represented by a floating point number. 
Conversion of N to 64 bits below is not really necessary but it prevents a warning ("shift count >= width of type") on systems where "long int" is 32 bits wide. */ if (N_ > MAX_INDEX/4 || static_cast(N_-1)>>(T_FLOAT_MANT_DIG-1) > 0) { PyErr_SetString(PyExc_ValueError, "Data is too big, index overflow."); return NULL; } t_index N = static_cast(N_); // Allow threads! GIL_release G; t_float * const D_ = reinterpret_cast(PyArray_DATA(D)); cluster_result Z2(N-1); auto_array_ptr members; // For these methods, the distance update formula needs the number of // data points in a cluster. if (method==METHOD_METR_AVERAGE || method==METHOD_METR_WARD || method==METHOD_METR_CENTROID) { members.init(N, 1); } // Operate on squared distances for these methods. if (method==METHOD_METR_WARD || method==METHOD_METR_CENTROID || method==METHOD_METR_MEDIAN) { for (t_float * DD = D_; DD!=D_+static_cast(N)*(N-1)/2; ++DD) *DD *= *DD; } switch (method) { case METHOD_METR_SINGLE: MST_linkage_core(N, D_, Z2); break; case METHOD_METR_COMPLETE: NN_chain_core(N, D_, NULL, Z2); break; case METHOD_METR_AVERAGE: NN_chain_core(N, D_, members, Z2); break; case METHOD_METR_WEIGHTED: NN_chain_core(N, D_, NULL, Z2); break; case METHOD_METR_WARD: NN_chain_core(N, D_, members, Z2); break; case METHOD_METR_CENTROID: generic_linkage(N, D_, members, Z2); break; case METHOD_METR_MEDIAN: generic_linkage(N, D_, NULL, Z2); break; default: throw std::runtime_error(std::string("Invalid method index.")); } if (method==METHOD_METR_WARD || method==METHOD_METR_CENTROID || method==METHOD_METR_MEDIAN) { Z2.sqrt(); } t_float * const Z_ = reinterpret_cast(PyArray_DATA(Z)); if (method==METHOD_METR_CENTROID || method==METHOD_METR_MEDIAN) { generate_SciPy_dendrogram(Z_, Z2, N); } else { generate_SciPy_dendrogram(Z_, Z2, N); } } // try catch (const std::bad_alloc&) { return PyErr_NoMemory(); } catch(const std::exception& e){ PyErr_SetString(PyExc_EnvironmentError, e.what()); return NULL; } catch(const nan_error&){ 
PyErr_SetString(PyExc_FloatingPointError, "NaN dissimilarity value."); return NULL; } #ifdef FE_INVALID catch(const fenv_error&){ PyErr_SetString(PyExc_FloatingPointError, "NaN dissimilarity value in intermediate results."); return NULL; } #endif catch(...){ PyErr_SetString(PyExc_EnvironmentError, "C++ exception (unknown reason). Please send a bug report."); return NULL; } Py_RETURN_NONE; } /* Part 2: Clustering on vector data */ /* Metric codes. These codes must agree with the dictionary mtridx in fastcluster.py. */ enum metric_codes { // metrics METRIC_EUCLIDEAN = 0, METRIC_MINKOWSKI = 1, METRIC_CITYBLOCK = 2, METRIC_SEUCLIDEAN = 3, METRIC_SQEUCLIDEAN = 4, METRIC_COSINE = 5, METRIC_HAMMING = 6, METRIC_JACCARD = 7, METRIC_CHEBYCHEV = 8, METRIC_CANBERRA = 9, METRIC_BRAYCURTIS = 10, METRIC_MAHALANOBIS = 11, METRIC_YULE = 12, METRIC_MATCHING = 13, METRIC_DICE = 14, METRIC_ROGERSTANIMOTO = 15, METRIC_RUSSELLRAO = 16, METRIC_SOKALSNEATH = 17, METRIC_KULSINSKI = 18, METRIC_USER = 19, METRIC_INVALID = 20, // sentinel METRIC_JACCARD_BOOL = 21, // separate function for Jaccard metric on }; // Boolean input data /* Helper class: Throw this if calling the Python interpreter from within C returned an error. */ class pythonerror {}; /* This class handles all the information about the dissimilarity computation. 
*/ class python_dissimilarity { private: t_float * Xa; std::ptrdiff_t dim; // size_t saves many statis_cast<> in products t_index N; auto_array_ptr Xnew; t_index * members; void (cluster_result::*postprocessfn) (const t_float) const; t_float postprocessarg; t_float (python_dissimilarity::*distfn) (const t_index, const t_index) const; // for user-defined metrics PyObject * X_Python; PyObject * userfn; auto_array_ptr precomputed; t_float * precomputed2; PyArrayObject * V; const t_float * V_data; // noncopyable python_dissimilarity(); python_dissimilarity(python_dissimilarity const &); python_dissimilarity & operator=(python_dissimilarity const &); public: // Ignore warning about uninitialized member variables. I know what I am // doing here, and some member variables are only used for certain metrics. python_dissimilarity (PyArrayObject * const Xarg, t_index * const members_, const method_codes method, const metric_codes metric, PyObject * const extraarg, bool temp_point_array) : Xa(reinterpret_cast(PyArray_DATA(Xarg))), dim(PyArray_DIM(Xarg, 1)), N(static_cast(PyArray_DIM(Xarg, 0))), Xnew(temp_point_array ? 
(N-1)*dim : 0), members(members_), postprocessfn(NULL), V(NULL) { switch (method) { case METHOD_METR_SINGLE: postprocessfn = NULL; // default switch (metric) { case METRIC_EUCLIDEAN: set_euclidean(); break; case METRIC_SEUCLIDEAN: if (extraarg==NULL) { PyErr_SetString(PyExc_TypeError, "The 'seuclidean' metric needs a variance parameter."); throw pythonerror(); } V = reinterpret_cast(PyArray_FromAny(extraarg, PyArray_DescrFromType(NPY_DOUBLE), 1, 1, NPY_ARRAY_CARRAY_RO, NULL)); if (PyErr_Occurred()) { throw pythonerror(); } if (PyArray_DIM(V, 0)!=dim) { PyErr_SetString(PyExc_ValueError, "The variance vector must have the same dimensionality as the data."); throw pythonerror(); } V_data = reinterpret_cast(PyArray_DATA(V)); distfn = &python_dissimilarity::seuclidean; postprocessfn = &cluster_result::sqrt; break; case METRIC_SQEUCLIDEAN: distfn = &python_dissimilarity::sqeuclidean; break; case METRIC_CITYBLOCK: set_cityblock(); break; case METRIC_CHEBYCHEV: set_chebychev(); break; case METRIC_MINKOWSKI: set_minkowski(extraarg); break; case METRIC_COSINE: distfn = &python_dissimilarity::cosine; postprocessfn = &cluster_result::plusone; // precompute norms precomputed.init(N); for (t_index i=0; i(dim); break; case METRIC_JACCARD: distfn = &python_dissimilarity::jaccard; break; case METRIC_CANBERRA: distfn = &python_dissimilarity::canberra; break; case METRIC_BRAYCURTIS: distfn = &python_dissimilarity::braycurtis; break; case METRIC_MAHALANOBIS: if (extraarg==NULL) { PyErr_SetString(PyExc_TypeError, "The 'mahalanobis' metric needs a parameter for the inverse covariance."); throw pythonerror(); } V = reinterpret_cast(PyArray_FromAny(extraarg, PyArray_DescrFromType(NPY_DOUBLE), 2, 2, NPY_ARRAY_CARRAY_RO, NULL)); if (PyErr_Occurred()) { throw pythonerror(); } if (PyArray_DIM(V, 0)!=N || PyArray_DIM(V, 1)!=dim) { PyErr_SetString(PyExc_ValueError, "The inverse covariance matrix has the wrong size."); throw pythonerror(); } V_data = reinterpret_cast(PyArray_DATA(V)); distfn = 
&python_dissimilarity::mahalanobis; postprocessfn = &cluster_result::sqrt; break; case METRIC_YULE: distfn = &python_dissimilarity::yule; break; case METRIC_MATCHING: distfn = &python_dissimilarity::matching; postprocessfn = &cluster_result::divide; postprocessarg = static_cast(dim); break; case METRIC_DICE: distfn = &python_dissimilarity::dice; break; case METRIC_ROGERSTANIMOTO: distfn = &python_dissimilarity::rogerstanimoto; break; case METRIC_RUSSELLRAO: distfn = &python_dissimilarity::russellrao; postprocessfn = &cluster_result::divide; postprocessarg = static_cast(dim); break; case METRIC_SOKALSNEATH: distfn = &python_dissimilarity::sokalsneath; break; case METRIC_KULSINSKI: distfn = &python_dissimilarity::kulsinski; postprocessfn = &cluster_result::plusone; precomputed.init(N); for (t_index i=0; i(sum); } break; case METRIC_USER: X_Python = reinterpret_cast(Xarg); this->userfn = extraarg; distfn = &python_dissimilarity::user; break; default: // case METRIC_JACCARD_BOOL: distfn = &python_dissimilarity::jaccard_bool; } break; case METHOD_METR_WARD: postprocessfn = &cluster_result::sqrtdouble; break; default: postprocessfn = &cluster_result::sqrt; } } ~python_dissimilarity() { Py_XDECREF(V); } inline t_float operator () (const t_index i, const t_index j) const { return (this->*distfn)(i,j); } inline t_float X (const t_index i, const t_index j) const { return Xa[i*dim+j]; } inline bool Xb (const t_index i, const t_index j) const { return reinterpret_cast(Xa)[i*dim+j]; } inline t_float * Xptr(const t_index i, const t_index j) const { return Xa+i*dim+j; } void merge(const t_index i, const t_index j, const t_index newnode) const { t_float const * const Pi = i(members[i]) + Pj[k]*static_cast(members[j])) / static_cast(members[i]+members[j]); } members[newnode] = members[i]+members[j]; } void merge_weighted(const t_index i, const t_index j, const t_index newnode) const { t_float const * const Pi = i(members[i]) + Pj[k]*static_cast(members[j])) / 
static_cast(members[i]+members[j]); } members[j] += members[i]; } void merge_inplace_weighted(const t_index i, const t_index j) const { t_float const * const Pi = Xa+i*dim; t_float * const Pj = Xa+j*dim; for(t_index k=0; k(members[i]); t_float mj = static_cast(members[j]); return sqeuclidean(i,j)*mi*mj/(mi+mj); } inline t_float ward_initial(const t_index i, const t_index j) const { // alias for sqeuclidean // Factor 2!!! return sqeuclidean(i,j); } // This method must not produce NaN if the input is non-NaN. inline static t_float ward_initial_conversion(const t_float min) { return min*.5; } inline t_float ward_extended(const t_index i, const t_index j) const { t_float mi = static_cast(members[i]); t_float mj = static_cast(members[j]); return sqeuclidean_extended(i,j)*mi*mj/(mi+mj); } /* We need two variants of the Euclidean metric: one that does not check for a NaN result, which is used for the initial distances, and one which does, for the updated distances during the clustering procedure. 
*/ template t_float sqeuclidean(const t_index i, const t_index j) const { t_float sum = 0; /* for (t_index k=0; k::infinity()) { set_chebychev(); } else if (postprocessarg==1.0){ set_cityblock(); } else if (postprocessarg==2.0){ set_euclidean(); } else { distfn = &python_dissimilarity::minkowski; postprocessfn = &cluster_result::power; } } void set_euclidean() { distfn = &python_dissimilarity::sqeuclidean; postprocessfn = &cluster_result::sqrt; } void set_cityblock() { distfn = &python_dissimilarity::cityblock; } void set_chebychev() { distfn = &python_dissimilarity::chebychev; } t_float seuclidean(const t_index i, const t_index j) const { t_float sum = 0; for (t_index k=0; kmax) { max = diff; } } return max; } t_float cosine(const t_index i, const t_index j) const { t_float sum = 0; for (t_index k=0; k(sum1) / static_cast(sum2); } t_float canberra(const t_index i, const t_index j) const { t_float sum = 0; for (t_index k=0; k(dim)-NTT-NXO); // NFFTT } void nbool_correspond_xo(const t_index i, const t_index j) const { NXO = 0; for (t_index k=0; k(2*NTFFT) / static_cast(NTFFT + NFFTT); } // Prevent a zero denominator for equal vectors. t_float dice(const t_index i, const t_index j) const { nbool_correspond(i, j); return (NXO==0) ? 0 : static_cast(NXO) / static_cast(NXO+2*NTT); } t_float rogerstanimoto(const t_index i, const t_index j) const { nbool_correspond_xo(i, j); return static_cast(2*NXO) / static_cast(NXO+dim); } t_float russellrao(const t_index i, const t_index j) const { nbool_correspond_tt(i, j); return static_cast(dim-NTT); } // Prevent a zero denominator for equal vectors. t_float sokalsneath(const t_index i, const t_index j) const { nbool_correspond(i, j); return (NXO==0) ? 
0 : static_cast(2*NXO) / static_cast(NTT+2*NXO); } t_float kulsinski(const t_index i, const t_index j) const { nbool_correspond_tt(i, j); return static_cast(NTT) * (precomputed[i] + precomputed[j]); } // 'matching' distance = Hamming distance t_float matching(const t_index i, const t_index j) const { nbool_correspond_xo(i, j); return static_cast(NXO); } // Prevent a zero denominator for equal vectors. t_float jaccard_bool(const t_index i, const t_index j) const { nbool_correspond(i, j); return (NXO==0) ? 0 : static_cast(NXO) / static_cast(NXO+NTT); } }; static PyObject *linkage_vector_wrap(PyObject * const, PyObject * const args) { PyArrayObject * X, * Z; unsigned char method, metric; PyObject * extraarg; try{ // Parse the input arguments if (!PyArg_ParseTuple(args, "O!O!bbO", &PyArray_Type, &X, // NumPy array &PyArray_Type, &Z, // NumPy array &method, // unsigned char &metric, // unsigned char &extraarg )) { // Python object return NULL; } if (PyArray_NDIM(X) != 2) { PyErr_SetString(PyExc_ValueError, "The input array must be two-dimensional."); } npy_intp const N_ = PyArray_DIM(X, 0); if (N_ < 1 ) { // N must be at least 1. PyErr_SetString(PyExc_ValueError, "At least one element is needed for clustering."); return NULL; } npy_intp const dim = PyArray_DIM(X, 1); if (dim < 1 ) { PyErr_SetString(PyExc_ValueError, "Invalid dimension of the data set."); return NULL; } /* (1) The biggest index used below is 4*(N-2)+3, as an index to Z. This must fit into the data type used for indices. (2) The largest representable integer, without loss of precision, by a floating point number of type t_float is 2^T_FLOAT_MANT_DIG. Here, we make sure that all cluster labels from 0 to 2N-2 in the output can be accurately represented by a floating point number. Conversion of N to 64 bits below is not really necessary but it prevents a warning ("shift count >= width of type") on systems where "int" is 32 bits wide. 
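All the boolean metrics above are expressed through two counts: NTT (coordinates where both vectors are true) and NXO (coordinates where exactly one is true). A Python sketch of two of them, mirroring the formulas in the code (function names are mine, not the package API):

```python
def bool_counts(u, v):
    # NTT: positions where both are true; NXO: positions where exactly one is.
    ntt = sum(1 for a, b in zip(u, v) if a and b)
    nxo = sum(1 for a, b in zip(u, v) if bool(a) != bool(b))
    return ntt, nxo

def rogerstanimoto(u, v):
    ntt, nxo = bool_counts(u, v)
    return 2 * nxo / (nxo + len(u))

def dice(u, v):
    # Returns zero for equal vectors, guarding the zero denominator
    # exactly as the C++ code does.
    ntt, nxo = bool_counts(u, v)
    return 0.0 if nxo == 0 else nxo / (nxo + 2 * ntt)
```

Because every metric reduces to these counts, one pass over the coordinates suffices per pair.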
*/ if (N_ > MAX_INDEX/4 || dim > MAX_INDEX || static_cast(N_-1)>>(T_FLOAT_MANT_DIG-1) > 0) { PyErr_SetString(PyExc_ValueError, "Data is too big, index overflow."); return NULL; } t_index N = static_cast(N_); cluster_result Z2(N-1); auto_array_ptr members; if (method==METHOD_METR_WARD || method==METHOD_METR_CENTROID) { members.init(2*N-1, 1); } if ((method!=METHOD_METR_SINGLE && metric!=METRIC_EUCLIDEAN) || metric>=METRIC_INVALID) { PyErr_SetString(PyExc_IndexError, "Invalid metric index."); return NULL; } if (PyArray_ISBOOL(X)) { if (metric==METRIC_HAMMING) { metric = METRIC_MATCHING; // Alias } if (metric==METRIC_JACCARD) { metric = METRIC_JACCARD_BOOL; } } if (extraarg!=Py_None && metric!=METRIC_MINKOWSKI && metric!=METRIC_SEUCLIDEAN && metric!=METRIC_MAHALANOBIS && metric!=METRIC_USER) { PyErr_SetString(PyExc_TypeError, "No extra parameter is allowed for this metric."); return NULL; } /* temp_point_array must be true if the alternative algorithm is used below (currently for the centroid and median methods). */ bool temp_point_array = (method==METHOD_METR_CENTROID || method==METHOD_METR_MEDIAN); python_dissimilarity dist(X, members, static_cast(method), static_cast(metric), extraarg, temp_point_array); if (method!=METHOD_METR_SINGLE && method!=METHOD_METR_WARD && method!=METHOD_METR_CENTROID && method!=METHOD_METR_MEDIAN) { PyErr_SetString(PyExc_IndexError, "Invalid method index."); return NULL; } // Allow threads if the metric is not "user"! 
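The guard `(N_-1)>>(T_FLOAT_MANT_DIG-1) > 0` above rejects inputs whose cluster labels 0 … 2N−2 would not be exactly representable as doubles: an IEEE-754 double has a 53-bit mantissa (`DBL_MANT_DIG`), so integer exactness ends at 2^53. A small Python illustration (the helper name is mine):

```python
import sys

MANT_DIG = sys.float_info.mant_dig  # 53 for IEEE-754 double, like DBL_MANT_DIG
exact_limit = 2 ** MANT_DIG         # every integer up to here is exact

def labels_fit_in_double(n):
    # Mirrors the check (N-1) >> (T_FLOAT_MANT_DIG-1) == 0: the largest
    # cluster label, 2N-2, must still be exactly representable, which the
    # guard enforces (slightly conservatively) via N-1 < 2**(MANT_DIG-1).
    return (n - 1) >> (MANT_DIG - 1) == 0
```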
GIL_release G(metric!=METRIC_USER); switch (method) { case METHOD_METR_SINGLE: MST_linkage_core_vector(N, dist, Z2); break; case METHOD_METR_WARD: generic_linkage_vector(N, dist, Z2); break; case METHOD_METR_CENTROID: generic_linkage_vector_alternative(N, dist, Z2); break; default: // case METHOD_METR_MEDIAN: generic_linkage_vector_alternative(N, dist, Z2); } if (method==METHOD_METR_WARD || method==METHOD_METR_CENTROID) { members.free(); } dist.postprocess(Z2); t_float * const Z_ = reinterpret_cast(PyArray_DATA(Z)); if (method!=METHOD_METR_SINGLE) { generate_SciPy_dendrogram(Z_, Z2, N); } else { generate_SciPy_dendrogram(Z_, Z2, N); } } // try catch (const std::bad_alloc&) { return PyErr_NoMemory(); } catch(const std::exception& e){ PyErr_SetString(PyExc_EnvironmentError, e.what()); return NULL; } catch(const nan_error&){ PyErr_SetString(PyExc_FloatingPointError, "NaN dissimilarity value."); return NULL; } catch(const pythonerror){ return NULL; } catch(...){ PyErr_SetString(PyExc_EnvironmentError, "C++ exception (unknown reason). Please send a bug report."); return NULL; } Py_RETURN_NONE; } #if HAVE_VISIBILITY #pragma GCC visibility pop #endif fastcluster/src/Makevars.win0000644000176200001440000000003212551722447015674 0ustar liggesusersOBJECTS = fastcluster_R.o fastcluster/src/fastcluster.cpp0000644000176200001440000014410114541364767016465 0ustar liggesusers/* fastcluster: Fast hierarchical clustering routines for R and Python Copyright: * Until package version 1.1.23: © 2011 Daniel Müllner * All changes from version 1.1.24 on: © Google Inc. This library implements various fast algorithms for hierarchical, agglomerative clustering methods: (1) Algorithms for the "stored matrix approach": the input is the array of pairwise dissimilarities. 
MST_linkage_core: single linkage clustering with the
   "minimum spanning tree" algorithm (Rohlf)

   NN_chain_core: nearest-neighbor-chain algorithm, suitable for single,
   complete, average, weighted and Ward linkage (Murtagh)

   generic_linkage: generic algorithm, suitable for all distance update
   formulas (Müllner)

   (2) Algorithms for the "stored data approach": the input consists of
   points in a vector space.

   MST_linkage_core_vector: single linkage clustering for vector data

   generic_linkage_vector: generic algorithm for vector data, suitable for
   the Ward, centroid and median methods.

   generic_linkage_vector_alternative: alternative scheme for updating the
   nearest neighbors. This method seems faster than
   "generic_linkage_vector" for the centroid and median methods but slower
   for the Ward method.

   All these implementations treat infinity values correctly. They throw an
   exception if a NaN distance value occurs.
*/

// Older versions of Microsoft Visual Studio do not have the fenv header.
#ifdef _MSC_VER
#if (_MSC_VER == 1500 || _MSC_VER == 1600)
#define NO_INCLUDE_FENV
#endif
#endif

// NaN detection via fenv might not work on systems with software
// floating-point emulation (bug report for Debian armel).
#ifdef __SOFTFP__
#define NO_INCLUDE_FENV
#endif

#ifdef NO_INCLUDE_FENV
#pragma message("Do not use fenv header.")
#else
#pragma message("Use fenv header.")
/* The following #pragma is necessary even if it generates a warning in many
   compilers. Quoting https://en.cppreference.com/w/cpp/numeric/fenv:
   "The floating-point environment access and modification is only meaningful
   when #pragma STDC FENV_ACCESS is supported and is set to ON. [...] In
   practice, few current compilers, such as HP aCC, Oracle Studio, or IBM XL,
   support the #pragma explicitly, but most compilers allow meaningful access
   to the floating-point environment anyway."
*/
#pragma STDC FENV_ACCESS ON
#pragma message("If there is a warning about unknown #pragma STDC FENV_ACCESS, this can be ignored.")
#include <fenv.h>
#endif

#include <cmath>     // for std::pow, std::sqrt
#include <cstddef>   // for std::ptrdiff_t
#include <limits>    // for std::numeric_limits<...>::infinity()
#include <algorithm> // for std::fill_n
#include <stdexcept> // for std::runtime_error
#include <string>    // for std::string

#include <cfloat> // also for DBL_MAX, DBL_MIN
#ifndef DBL_MANT_DIG
#error The constant DBL_MANT_DIG could not be defined.
#endif
#define T_FLOAT_MANT_DIG DBL_MANT_DIG

#ifndef LONG_MAX
#include <climits>
#endif
#ifndef LONG_MAX
#error The constant LONG_MAX could not be defined.
#endif
#ifndef INT_MAX
#error The constant INT_MAX could not be defined.
#endif

#ifndef INT32_MAX
#ifdef _MSC_VER
#if _MSC_VER >= 1600
#define __STDC_LIMIT_MACROS
#include <stdint.h>
#else
typedef __int32 int_fast32_t;
typedef __int64 int64_t;
#endif
#else
#define __STDC_LIMIT_MACROS
#include <stdint.h>
#endif
#endif

#define FILL_N std::fill_n
#ifdef _MSC_VER
#if _MSC_VER < 1600
#undef FILL_N
#define FILL_N stdext::unchecked_fill_n
#endif
#endif

#ifndef HAVE_VISIBILITY
#if __GNUC__ >= 4
#define HAVE_VISIBILITY 1
#endif
#endif

/* Since the public interface is given by the Python and R interfaces,
 * we do not want other symbols than the interface initialization routines to
 * be visible in the shared object file. The "visibility" switch is a GCC
 * concept. Hiding symbols keeps the relocation table small and decreases
 * startup time. See http://gcc.gnu.org/wiki/Visibility
 */
#if HAVE_VISIBILITY
#pragma GCC visibility push(hidden)
#endif

typedef int_fast32_t t_index;
#ifndef INT32_MAX
#define MAX_INDEX 0x7fffffffL
#else
#define MAX_INDEX INT32_MAX
#endif
#if (LONG_MAX < MAX_INDEX)
#error The integer format "t_index" must not have a greater range than "long int".
#endif
#if (INT_MAX > MAX_INDEX)
#error The integer format "int" must not have a greater range than "t_index".
#endif
typedef double t_float;

/* Method codes.
These codes must agree with the METHODS array in fastcluster.R and the dictionary mthidx in fastcluster.py. */ enum method_codes { // non-Euclidean methods METHOD_METR_SINGLE = 0, METHOD_METR_COMPLETE = 1, METHOD_METR_AVERAGE = 2, METHOD_METR_WEIGHTED = 3, METHOD_METR_WARD = 4, METHOD_METR_WARD_D = METHOD_METR_WARD, METHOD_METR_CENTROID = 5, METHOD_METR_MEDIAN = 6, METHOD_METR_WARD_D2 = 7, MIN_METHOD_CODE = 0, MAX_METHOD_CODE = 7 }; enum method_codes_vector { // Euclidean methods METHOD_VECTOR_SINGLE = 0, METHOD_VECTOR_WARD = 1, METHOD_VECTOR_CENTROID = 2, METHOD_VECTOR_MEDIAN = 3, MIN_METHOD_VECTOR_CODE = 0, MAX_METHOD_VECTOR_CODE = 3 }; // self-destructing array pointer template class auto_array_ptr{ private: type * ptr; auto_array_ptr(auto_array_ptr const &); // non construction-copyable auto_array_ptr& operator=(auto_array_ptr const &); // non copyable public: auto_array_ptr() : ptr(NULL) { } template auto_array_ptr(index const size) : ptr(new type[size]) { } template auto_array_ptr(index const size, value const val) : ptr(new type[size]) { FILL_N(ptr, size, val); } ~auto_array_ptr() { delete [] ptr; } void free() { delete [] ptr; ptr = NULL; } template void init(index const size) { ptr = new type [size]; } template void init(index const size, value const val) { init(size); FILL_N(ptr, size, val); } inline operator type *() const { return ptr; } }; struct node { t_index node1, node2; t_float dist; }; inline bool operator< (const node a, const node b) { return (a.dist < b.dist); } class cluster_result { private: auto_array_ptr Z; t_index pos; public: cluster_result(const t_index size) : Z(size) , pos(0) {} void append(const t_index node1, const t_index node2, const t_float dist) { Z[pos].node1 = node1; Z[pos].node2 = node2; Z[pos].dist = dist; ++pos; } node * operator[] (const t_index idx) const { return Z + idx; } /* Define several methods to postprocess the distances. All these functions are monotone, so they do not change the sorted order of distances. 
*/ void sqrt() const { for (node * ZZ=Z; ZZ!=Z+pos; ++ZZ) { ZZ->dist = std::sqrt(ZZ->dist); } } void sqrt(const t_float) const { // ignore the argument sqrt(); } void sqrtdouble(const t_float) const { // ignore the argument for (node * ZZ=Z; ZZ!=Z+pos; ++ZZ) { ZZ->dist = std::sqrt(2*ZZ->dist); } } #ifdef R_pow #define my_pow R_pow #else #define my_pow std::pow #endif void power(const t_float p) const { t_float const q = 1/p; for (node * ZZ=Z; ZZ!=Z+pos; ++ZZ) { ZZ->dist = my_pow(ZZ->dist,q); } } void plusone(const t_float) const { // ignore the argument for (node * ZZ=Z; ZZ!=Z+pos; ++ZZ) { ZZ->dist += 1; } } void divide(const t_float denom) const { for (node * ZZ=Z; ZZ!=Z+pos; ++ZZ) { ZZ->dist /= denom; } } }; class doubly_linked_list { /* Class for a doubly linked list. Initially, the list is the integer range [0, size]. We provide a forward iterator and a method to delete an index from the list. Typical use: for (i=L.start; L succ; private: auto_array_ptr pred; // Not necessarily private, we just do not need it in this instance. public: doubly_linked_list(const t_index size) // Initialize to the given size. : start(0) , succ(size+1) , pred(size+1) { for (t_index i=0; i(2*N-3-(r_))*(r_)>>1)+(c_)-1] ) // Z is an ((N-1)x4)-array #define Z_(_r, _c) (Z[(_r)*4 + (_c)]) /* Lookup function for a union-find data structure. The function finds the root of idx by going iteratively through all parent elements until a root is found. An element i is a root if nodes[i] is zero. To make subsequent searches faster, the entry for idx and all its parents is updated with the root element. */ class union_find { private: auto_array_ptr parent; t_index nextparent; public: union_find(const t_index size) : parent(size>0 ? 
2*size-1 : 0, 0) , nextparent(size) { } t_index Find (t_index idx) const { if (parent[idx] != 0 ) { // a → b t_index p = idx; idx = parent[idx]; if (parent[idx] != 0 ) { // a → b → c do { idx = parent[idx]; } while (parent[idx] != 0); do { t_index tmp = parent[p]; parent[p] = idx; p = tmp; } while (parent[p] != idx); } } return idx; } void Union (const t_index node1, const t_index node2) { parent[node1] = parent[node2] = nextparent++; } }; class nan_error{}; #ifdef FE_INVALID class fenv_error{}; #endif static void MST_linkage_core(const t_index N, const t_float * const D, cluster_result & Z2) { /* N: integer, number of data points D: condensed distance matrix N*(N-1)/2 Z2: output data structure The basis of this algorithm is an algorithm by Rohlf: F. James Rohlf, Hierarchical clustering using the minimum spanning tree, The Computer Journal, vol. 16, 1973, p. 93–95. */ t_index i; t_index idx2; doubly_linked_list active_nodes(N); auto_array_ptr d(N); t_index prev_node; t_float min; // first iteration idx2 = 1; min = std::numeric_limits::infinity(); for (i=1; i tmp) d[i] = tmp; else if (fc_isnan(tmp)) throw (nan_error()); if (d[i] < min) { min = d[i]; idx2 = i; } } Z2.append(prev_node, idx2, min); } } /* Functions for the update of the dissimilarity array */ inline static void f_single( t_float * const b, const t_float a ) { if (*b > a) *b = a; } inline static void f_complete( t_float * const b, const t_float a ) { if (*b < a) *b = a; } inline static void f_average( t_float * const b, const t_float a, const t_float s, const t_float t) { *b = s*a + t*(*b); #ifndef FE_INVALID if (fc_isnan(*b)) { throw(nan_error()); } #endif } inline static void f_weighted( t_float * const b, const t_float a) { *b = (a+*b)*.5; #ifndef FE_INVALID if (fc_isnan(*b)) { throw(nan_error()); } #endif } inline static void f_ward( t_float * const b, const t_float a, const t_float c, const t_float s, const t_float t, const t_float v) { *b = ( (v+s)*a - v*c + (v+t)*(*b) ) / (s+t+v); //*b = 
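The `union_find` class above finds roots with path compression, and `Union` gives every merged pair a fresh parent label starting at `size` — exactly the labeling scheme of SciPy-format linkage output, where the cluster created in step i is labeled N+i. A Python sketch of the same idea (the class name is mine):

```python
class UnionFind:
    # Nodes 0 .. n-1 are singletons; every union creates a brand-new label
    # n, n+1, ... (the labeling used in SciPy-format linkage matrices).
    # parent[i] == 0 means "i is a root"; label 0 can never be anyone's
    # parent, since new parent labels start at n.
    def __init__(self, n):
        self.parent = [0] * (2 * n - 1)
        self.next_label = n

    def find(self, i):
        root = i
        while self.parent[root] != 0:
            root = self.parent[root]
        while self.parent[i] != 0:   # second pass: path compression
            nxt = self.parent[i]
            self.parent[i] = root
            i = nxt
        return root

    def union(self, root_i, root_j):
        # Both arguments must already be roots, as in the C++ class.
        self.parent[root_i] = self.parent[root_j] = self.next_label
        self.next_label += 1
```

Path compression makes repeated `find` calls effectively constant-time, which matters when relabeling the stepwise dendrogram into the output format.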
a+(*b)-(t*a+s*(*b)+v*c)/(s+t+v); #ifndef FE_INVALID if (fc_isnan(*b)) { throw(nan_error()); } #endif } inline static void f_centroid( t_float * const b, const t_float a, const t_float stc, const t_float s, const t_float t) { *b = s*a - stc + t*(*b); #ifndef FE_INVALID if (fc_isnan(*b)) { throw(nan_error()); } #endif } inline static void f_median( t_float * const b, const t_float a, const t_float c_4) { *b = (a+(*b))*.5 - c_4; #ifndef FE_INVALID if (fc_isnan(*b)) { throw(nan_error()); } #endif } template static void NN_chain_core(const t_index N, t_float * const D, t_members * const members, cluster_result & Z2) { /* N: integer D: condensed distance matrix N*(N-1)/2 Z2: output data structure This is the NN-chain algorithm, described on page 86 in the following book: Fionn Murtagh, Multidimensional Clustering Algorithms, Vienna, Würzburg: Physica-Verlag, 1985. */ t_index i; auto_array_ptr NN_chain(N); t_index NN_chain_tip = 0; t_index idx1, idx2; t_float size1, size2; doubly_linked_list active_nodes(N); t_float min; for (t_float const * DD=D; DD!=D+(static_cast(N)*(N-1)>>1); ++DD) { if (fc_isnan(*DD)) { throw(nan_error()); } } #ifdef FE_INVALID if (feclearexcept(FE_INVALID)) throw fenv_error(); #endif for (t_index j=0; jidx2) { t_index tmp = idx1; idx1 = idx2; idx2 = tmp; } if (method==METHOD_METR_AVERAGE || method==METHOD_METR_WARD) { size1 = static_cast(members[idx1]); size2 = static_cast(members[idx2]); members[idx2] += members[idx1]; } // Remove the smaller index from the valid indices (active_nodes). active_nodes.remove(idx1); switch (method) { case METHOD_METR_SINGLE: /* Single linkage. Characteristic: new distances are never longer than the old distances. */ // Update the distance matrix in the range [start, idx1). for (i=active_nodes.start; i(members[i]); for (i=active_nodes.start; i(members[i]) ); // Update the distance matrix in the range (idx1, idx2). for (; i(members[i]) ); // Update the distance matrix in the range (idx2, N). 
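The `f_single` … `f_median` functions above are in-place Lance-Williams distance updates: the distance from a freshly merged cluster to any third cluster is computed from the two old distances alone, without revisiting the data points. A Python sketch for the average-linkage case, checked against brute-force recomputation (all names are mine, for illustration only):

```python
def average_update(d_aj, d_bj, size_a, size_b):
    # Lance-Williams update for average linkage:
    #   d(a ∪ b, j) = s*d(a,j) + t*d(b,j),
    #   with s = |a|/(|a|+|b|) and t = |b|/(|a|+|b|).
    s = size_a / (size_a + size_b)
    t = size_b / (size_a + size_b)
    return s * d_aj + t * d_bj

def avg_dist(c1, c2):
    # Brute-force average pairwise distance between two 1-D clusters.
    return sum(abs(p - q) for p in c1 for q in c2) / (len(c1) * len(c2))

cluster_a, cluster_b, cluster_j = [0.0], [1.0, 3.0], [10.0]
updated = average_update(avg_dist(cluster_a, cluster_j),
                         avg_dist(cluster_b, cluster_j),
                         len(cluster_a), len(cluster_b))
# `updated` agrees with recomputing avg_dist(cluster_a + cluster_b, cluster_j)
```

This is the property that lets the NN-chain algorithm work on the distance matrix alone, updating one row per merge.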
for (i=active_nodes.succ[idx2]; i(members[i]) ); break; default: throw std::runtime_error(std::string("Invalid method.")); } } #ifdef FE_INVALID if (fetestexcept(FE_INVALID)) throw fenv_error(); #endif } class binary_min_heap { /* Class for a binary min-heap. The data resides in an array A. The elements of A are not changed but two lists I and R of indices are generated which point to elements of A and backwards. The heap tree structure is H[2*i+1] H[2*i+2] \ / \ / ≤ ≤ \ / \ / H[i] where the children must be less or equal than their parent. Thus, H[0] contains the minimum. The lists I and R are made such that H[i] = A[I[i]] and R[I[i]] = i. This implementation is not designed to handle NaN values. */ private: t_float * const A; t_index size; auto_array_ptr I; auto_array_ptr R; // no default constructor binary_min_heap(); // noncopyable binary_min_heap(binary_min_heap const &); binary_min_heap & operator=(binary_min_heap const &); public: binary_min_heap(t_float * const A_, const t_index size_) : A(A_), size(size_), I(size), R(size) { // Allocate memory and initialize the lists I and R to the identity. This // does not make it a heap. Call heapify afterwards! for (t_index i=0; i>1); idx>0; ) { --idx; update_geq_(idx); } } inline t_index argmin() const { // Return the minimal element. return I[0]; } void heap_pop() { // Remove the minimal element from the heap. --size; I[0] = I[size]; R[I[0]] = 0; update_geq_(0); } void remove(t_index idx) { // Remove an element from the heap. --size; R[I[size]] = R[idx]; I[R[idx]] = I[size]; if ( H(size)<=A[idx] ) { update_leq_(R[idx]); } else { update_geq_(R[idx]); } } void replace ( const t_index idxold, const t_index idxnew, const t_float val) { R[idxnew] = R[idxold]; I[R[idxnew]] = idxnew; if (val<=A[idxold]) update_leq(idxnew, val); else update_geq(idxnew, val); } void update ( const t_index idx, const t_float val ) const { // Update the element A[i] with val and re-arrange the indices to preserve // the heap condition. 
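The `binary_min_heap` above never reorders the value array A; it maintains two index lists I and R with H[i] = A[I[i]] and R[I[i]] = i, so `argmin` and decrease-key (`update_leq`) work while external code keeps addressing values by their original index. A compact Python sketch of such an indexed heap (not the package's code; names are mine):

```python
class IndexedMinHeap:
    # Min-heap over an external array a; a itself is never reordered.
    # i[h] = index into a stored at heap slot h; r[i[h]] = h (reverse map).
    def __init__(self, a):
        self.a = a
        self.i = list(range(len(a)))
        self.r = list(range(len(a)))
        for h in range(len(a) // 2 - 1, -1, -1):  # bottom-up heapify
            self._sift_down(h)

    def _key(self, h):
        return self.a[self.i[h]]

    def _swap(self, h1, h2):
        self.i[h1], self.i[h2] = self.i[h2], self.i[h1]
        self.r[self.i[h1]] = h1
        self.r[self.i[h2]] = h2

    def _sift_up(self, h):
        while h > 0 and self._key(h) < self._key((h - 1) // 2):
            self._swap(h, (h - 1) // 2)
            h = (h - 1) // 2

    def _sift_down(self, h):
        n = len(self.i)
        while True:
            c = 2 * h + 1
            if c >= n:
                return
            if c + 1 < n and self._key(c + 1) < self._key(c):
                c += 1
            if self._key(c) >= self._key(h):
                return
            self._swap(h, c)
            h = c

    def argmin(self):
        # External index of the smallest value.
        return self.i[0]

    def update_leq(self, idx, val):
        # Decrease-key: the new value must not exceed the old one.
        self.a[idx] = val
        self._sift_up(self.r[idx])
```

Keeping the reverse map r is what makes decrease-key O(log n) by external index, which the generic linkage algorithm relies on.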
if (val<=A[idx]) update_leq(idx, val); else update_geq(idx, val); } void update_leq ( const t_index idx, const t_float val ) const { // Use this when the new value is not more than the old value. A[idx] = val; update_leq_(R[idx]); } void update_geq ( const t_index idx, const t_float val ) const { // Use this when the new value is not less than the old value. A[idx] = val; update_geq_(R[idx]); } private: void update_leq_ (t_index i) const { t_index j; for ( ; (i>0) && ( H(i)>1) ); i=j) heap_swap(i,j); } void update_geq_ (t_index i) const { t_index j; for ( ; (j=2*i+1)=H(i) ) { ++j; if ( j>=size || H(j)>=H(i) ) break; } else if ( j+1 static void generic_linkage(const t_index N, t_float * const D, t_members * const members, cluster_result & Z2) { /* N: integer, number of data points D: condensed distance matrix N*(N-1)/2 Z2: output data structure */ const t_index N_1 = N-1; t_index i, j; // loop variables t_index idx1, idx2; // row and column indices auto_array_ptr n_nghbr(N_1); // array of nearest neighbors auto_array_ptr mindist(N_1); // distances to the nearest neighbors auto_array_ptr row_repr(N); // row_repr[i]: node number that the // i-th row represents doubly_linked_list active_nodes(N); binary_min_heap nn_distances(&*mindist, N_1); // minimum heap structure for // the distance to the nearest neighbor of each point t_index node1, node2; // node numbers in the output t_float size1, size2; // and their cardinalities t_float min; // minimum and row index for nearest-neighbor search t_index idx; for (i=0; ii} D(i,j) for i in range(N-1) t_float const * DD = D; for (i=0; i::infinity(); for (idx=j=i+1; ji} D(i,j) Normally, we have equality. However, this minimum may become invalid due to the updates in the distance matrix. The rules are: 1) If mindist[i] is equal to D(i, n_nghbr[i]), this is the correct minimum and n_nghbr[i] is a nearest neighbor. 2) If mindist[i] is smaller than D(i, n_nghbr[i]), this might not be the correct minimum. 
The minimum needs to be recomputed. 3) mindist[i] is never bigger than the true minimum. Hence, we never miss the true minimum if we take the smallest mindist entry, re-compute the value if necessary (thus maybe increasing it) and looking for the now smallest mindist entry until a valid minimal entry is found. This step is done in the lines below. The update process for D below takes care that these rules are fulfilled. This makes sure that the minima in the rows D(i,i+1:)of D are re-calculated when necessary but re-calculation is avoided whenever possible. The re-calculation of the minima makes the worst-case runtime of this algorithm cubic in N. We avoid this whenever possible, and in most cases the runtime appears to be quadratic. */ idx1 = nn_distances.argmin(); if (method != METHOD_METR_SINGLE) { while ( mindist[idx1] < D_(idx1, n_nghbr[idx1]) ) { // Recompute the minimum mindist[idx1] and n_nghbr[idx1]. n_nghbr[idx1] = j = active_nodes.succ[idx1]; // exists, maximally N-1 min = D_(idx1,j); for (j=active_nodes.succ[j]; j(members[idx1]); size2 = static_cast(members[idx2]); members[idx2] += members[idx1]; } Z2.append(node1, node2, mindist[idx1]); // Remove idx1 from the list of active indices (active_nodes). active_nodes.remove(idx1); // Index idx2 now represents the new (merged) node with label N+i. row_repr[idx2] = N+i; // Update the distance matrix switch (method) { case METHOD_METR_SINGLE: /* Single linkage. Characteristic: new distances are never longer than the old distances. */ // Update the distance matrix in the range [start, idx1). for (j=active_nodes.start; j(members[j]) ); if (n_nghbr[j] == idx1) n_nghbr[j] = idx2; } // Update the distance matrix in the range (idx1, idx2). for (; j(members[j]) ); if (D_(j, idx2) < mindist[j]) { nn_distances.update_leq(j, D_(j, idx2)); n_nghbr[j] = idx2; } } // Update the distance matrix in the range (idx2, N). 
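The three rules above describe a lazy strategy: `mindist[i]` is only a lower bound that may go stale after updates, so the candidate with the smallest cached value is re-checked and, if stale, recomputed and re-queued until a tight bound surfaces. A Python sketch of this pattern in isolation (names are mine); its correctness relies on rule 3, the cached values never exceeding the true ones:

```python
import heapq

def lazy_argmin(lower_bound, current):
    # lower_bound: cached per-index values, possibly outdated but (rule 3)
    # never larger than the true values in `current`.
    # Pop candidates by cached bound; if stale (rule 2), refresh and retry.
    # Rule 1 certifies the answer once the cached value matches the true one.
    heap = [(v, i) for i, v in lower_bound.items()]
    heapq.heapify(heap)
    while True:
        val, i = heapq.heappop(heap)
        if val == current[i]:
            return i                           # tight bound: true minimum
        heapq.heappush(heap, (current[i], i))  # recompute and re-queue
```

Recomputation happens only when a stale entry actually reaches the top, which is why the observed runtime stays close to quadratic even though the worst case is cubic.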
if (idx2(members[j]) ); min = D_(idx2,j); for (j=active_nodes.succ[j]; j(members[j]) ); if (D_(idx2,j) < min) { min = D_(idx2,j); n_nghbr[idx2] = j; } } nn_distances.update(idx2, min); } break; case METHOD_METR_CENTROID: { /* Centroid linkage. Shorter and longer distances can occur, not bigger than max(d1,d2) but maybe smaller than min(d1,d2). */ // Update the distance matrix in the range [start, idx1). t_float s = size1/(size1+size2); t_float t = size2/(size1+size2); t_float stc = s*t*mindist[idx1]; for (j=active_nodes.start; j static void MST_linkage_core_vector(const t_index N, t_dissimilarity & dist, cluster_result & Z2) { /* N: integer, number of data points dist: function pointer to the metric Z2: output data structure The basis of this algorithm is an algorithm by Rohlf: F. James Rohlf, Hierarchical clustering using the minimum spanning tree, The Computer Journal, vol. 16, 1973, p. 93–95. */ t_index i; t_index idx2; doubly_linked_list active_nodes(N); auto_array_ptr d(N); t_index prev_node; t_float min; // first iteration idx2 = 1; min = std::numeric_limits::infinity(); for (i=1; i tmp) d[i] = tmp; else if (fc_isnan(tmp)) throw (nan_error()); if (d[i] < min) { min = d[i]; idx2 = i; } } Z2.append(prev_node, idx2, min); } } template static void generic_linkage_vector(const t_index N, t_dissimilarity & dist, cluster_result & Z2) { /* N: integer, number of data points dist: function pointer to the metric Z2: output data structure This algorithm is valid for the distance update methods "Ward", "centroid" and "median" only! 
*/ const t_index N_1 = N-1; t_index i, j; // loop variables t_index idx1, idx2; // row and column indices auto_array_ptr n_nghbr(N_1); // array of nearest neighbors auto_array_ptr mindist(N_1); // distances to the nearest neighbors auto_array_ptr row_repr(N); // row_repr[i]: node number that the // i-th row represents doubly_linked_list active_nodes(N); binary_min_heap nn_distances(&*mindist, N_1); // minimum heap structure for // the distance to the nearest neighbor of each point t_index node1, node2; // node numbers in the output t_float min; // minimum and row index for nearest-neighbor search for (i=0; ii} D(i,j) for i in range(N-1) for (i=0; i::infinity(); t_index idx; for (idx=j=i+1; j(i,j); } if (tmp(idx1,j); for (j=active_nodes.succ[j]; j(idx1,j); if (tmp(j, idx2); if (tmp < mindist[j]) { nn_distances.update_leq(j, tmp); n_nghbr[j] = idx2; } else if (n_nghbr[j] == idx2) n_nghbr[j] = idx1; // invalidate } // Find the nearest neighbor for idx2. if (idx2(idx2,j); for (j=active_nodes.succ[j]; j(idx2, j); if (tmp < min) { min = tmp; n_nghbr[idx2] = j; } } nn_distances.update(idx2, min); } } } } template static void generic_linkage_vector_alternative(const t_index N, t_dissimilarity & dist, cluster_result & Z2) { /* N: integer, number of data points dist: function pointer to the metric Z2: output data structure This algorithm is valid for the distance update methods "Ward", "centroid" and "median" only! */ const t_index N_1 = N-1; t_index i, j=0; // loop variables t_index idx1, idx2; // row and column indices auto_array_ptr n_nghbr(2*N-2); // array of nearest neighbors auto_array_ptr mindist(2*N-2); // distances to the nearest neighbors doubly_linked_list active_nodes(N+N_1); binary_min_heap nn_distances(&*mindist, N_1, 2*N-2, 1); // minimum heap // structure for the distance to the nearest neighbor of each point t_float min; // minimum for nearest-neighbor searches // Initialize the minimal distances: // Find the nearest neighbor of each point. 
// n_nghbr[i] = argmin_{j>i} D(i,j) for i in range(N-1) for (i=1; i::infinity(); t_index idx; for (idx=j=0; j(i,j); } if (tmp # * All changes from version 1.1.24 on: © Google Inc. hclust <- function(d, method="complete", members=NULL) { # Hierarchical clustering, on raw input data. if(method == "ward") { message("The \"ward\" method has been renamed to \"ward.D\"; note new \"ward.D2\"") method <- "ward.D" } # This array must agree with the enum method_codes in fastcluster.cpp. METHODS <- c("single", "complete", "average", "mcquitty", "ward.D", "centroid", "median", "ward.D2") method <- pmatch(method, METHODS) if (is.na(method)) stop("Invalid clustering method.") if (method == -1) stop("Ambiguous clustering method.") dendrogram <- c( .Call(fastcluster, attr(d, "Size"), method, d, members), list( labels = attr(d, "Labels") ,method = METHODS[method] ,call = match.call() ,dist.method = attr(d, "method") ) ) class(dendrogram) <- "hclust" return (dendrogram) } hclust.vector <- function(X, method='single', members=NULL, metric='euclidean', p=NULL) { # Hierarchical clustering, on vector data. 
METHODS <- c("single", "ward", "centroid", "median") methodidx <- pmatch(method, METHODS) if (is.na(methodidx)) stop(paste("Invalid clustering method '", method, "' for vector data.", sep='')) if (methodidx == -1) stop("Ambiguous clustering method.") METRICS <- c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski") metric = pmatch(metric, METRICS) if (is.na(metric) || metric > 6) stop("Invalid metric.") if (metric == -1) stop("Ambiguous metric.") if (metric == 4 && getRversion() < "3.5.0") metric <- as.integer(7) # special metric code for backwards compatibility if (methodidx!=1 && metric!=1) stop("The Euclidean methods 'ward', 'centroid' and 'median' require the 'euclidean' metric.") X <- as.matrix(X) dendrogram <- c( .Call(fastcluster_vector, methodidx, metric, X, members, p), list( labels = dimnames(X)[[1L]] ,method = METHODS[methodidx] ,call = match.call() ,dist.method = METRICS[metric] ) ) class(dendrogram) <- "hclust" return (dendrogram) } fastcluster/NEWS.md0000644000176200001440000001714414541365140013721 0ustar liggesusers# fastcluster: Fast hierarchical clustering routines for R and Python ## Copyright * Until package version 1.1.23: © 2011 Daniel Müllner * All changes from version 1.1.24 on: © Google Inc. ## Version history ### Version 1.0.0, 03/14/2011 * Initial release, dependent on Rcpp. Not available on CRAN. ### Version 1.0.1, 03/15/2011 * Removed the dependence on Rcpp; only R's original C interface is used. ### Version 1.0.2, 03/17/2011 * File DESCRIPTION: Fixed a typo ### Version 1.0.3, 03/20/2011 * File README: Removed the warning about false results from the flashClust package since the new flashClust version 1.01 has this error corrected. * Cleaned the test file fastcluster_test.R up. (No dependence on the MASS package any more) ### Version 1.0.4, 03/21/2011 * Changed the name of the external function from the outdated "Rcpp_linkage" to "fastcluster". * Registered the external function "fastcluster" in R. 
* Configured the C header inclusions to work on Fedora (thanks to Peter Langfelder). ### Version 1.1.0, 08/21/2011 * Routines for clustering vector data. * Added a User's manual * Revision of all files ### Version 1.1.1, 10/08/2011 * Fixed test scripts, which indicated an error on some architectures, even if results were correct. (The assumption was that ties in single linkage clustering are resolved in the same way, both for dissimilarity input and for vector input. This is not necessarily true if the floating point unit uses "excess precision". Now the test scripts are content with arbitrary resolution of ties and do not assume a specific scheme.) * Bug fix: uninitialized function pointer in Version 1.1.0 ### Version 1.1.2, 10/11/2011 * Fix for Solaris: replaced ssize_t by ptrdiff_t in the C++ code. * Removed the NN-chain algorithm for vector input: it was not clear that it would work under all circumstances with the intricacies of floating- point arithmetic. Especially the effects of the excess precision on the x87 are impossible to control in a portable way. Now, the memory-saving routines for the “Ward” linkage use the generic algorithm, as “centroid” and “median” linkage do. ### Version 1.1.3, 12/10/2011 * Replaced ptrdiff_t by std::ptrdiff_t, as GCC 4.6.1 complains about this. ### Version 1.1.4, 02/01/2012 * Release the GIL in the Python package, so that it can be used efficiently in multithreaded applications. * Improved performance for the "Ward" method with vector input. * The "members" parameter in the R interface is now treated as a double array, not an integer array as before. This was a slight incompatibility with the stats::hclust function. Thanks to Matthias Studer, University of Geneva, for pointing this out. ### Version 1.1.5, 02/14/2012 * Updated the "members" specification in the User's manual to reflect the recent change. ### Version 1.1.6, 03/12/2012 * Bug fix related to GIL release in the Python wrapper. 
Thanks to Massimo Di Stefano for the bug report. * Small compatibility changes in the Python test scripts (again thanks to Massimo Di Stefano for the report). ### Version 1.1.7, 09/17/2012 * Scipy import is now optional (suggested by Forest Gregg) * Compatibility fix for NumPy 1.7. Thanks to Semihcan Doken for the bug report. ### Version 1.1.8, 08/28/2012 * Test for NaN dissimilarity values: Now the algorithms produce an error message instead of silently giving false results. The documentation was updated accordingly. This is the final design as intended: the fastcluster package handles infinity values correctly but complains about NaNs. * The Python interface now works with both Python 2 and Python 3. * Changed the license to BSD. ### Version 1.1.9, 03/15/2013 * Compatibility fix for the MSVC compilers on Windows. * Simplified GIL release in the Python interface. ### Version 1.1.10, 05/22/2013 * Updated citation information (JSS paper). * Suppress warnings where applicable. Compilation with GCC should not produce any warning at all, even if all compiler warnings are enabled. (The switch -pedantic still does not work, but this is due to the Python headers.) * Optimization: Hidden symbols. Only the interface functions are exported to the symbol table with GCC. ### Version 1.1.11, 05/23/2013 * Compatibility fix for Solaris. ### Version 1.1.12, 12/10/2013 * Tiny maintenance updates: new author web page and e-mail address, new location for R vignette. ### Version 1.1.13, 12/17/2013 * Moved the "python" directory due to CRAN requirements. ### Version 1.1.14, 01/02/2015 * Updated the DESCRIPTION file according to CRAN rules. * Renamed the “ward” method for dissimilarity input to “ward.D” in the R interface and created a new method “ward.D2”, following changes in R's hclust package. ### Version 1.1.15, 01/05/2015 * Fixed the unit test to work with old and new R versions (see the changes in stats::hclust in R 3.1.0). 
### Version 1.1.16, 01/07/2015
* Support for large distance matrices (more than 2^31 entries, R's long vector support since version 3.0.0).

### Version 1.1.17, 07/03/2015
* Resolved MSVC compiler warnings.

### Version 1.1.18, 07/16/2015
* Fixed missing NumPy header include path.

### Version 1.1.19, 07/19/2015
* Fixed unit tests. They can be run with "python setup.py test" now.

### Version 1.1.20, 07/19/2015
* New version number due to PyPI upload error.

### Version 1.1.21, 09/18/2016
* Appropriate use of std namespace, as required by CRAN.

### Version 1.1.22, 06/12/2016
* No fenv header usage if software floating-point emulation is used (bug report: NaN test failed on Debian armel).

### Version 1.1.23, 03/24/2017
* setup.py: Late NumPy import for better dependency management.

### Version 1.1.24, 08/04/2017
* R 3.5 corrects the formula for the “Canberra” metric. See https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17285. The formula in the fastcluster package was changed accordingly. This concerns only the R interface. SciPy and fastcluster's Python interface always had the correct formula.

### Version 1.1.25, 05/27/2018
* Removed all “#pragma GCC diagnostic” directives in .cpp files due to changed CRAN requirements (CRAN repository only, not the GitHub repository).
* Updated build scripts for Python 3.7 (thanks to Forest Gregg).
* Simplified setup.py.
* Updated and corrected documentation.

### Version 1.1.26, 12/30/2019
* Small updates for Python 3.8.

### Version 1.1.27, 01/20/2021
* Updated NumPy dependency for Python 3.9.

### Version 1.1.28, 02/03/2021
* Replace deprecated “numpy.bool” and “numpy.int” by “bool” and “int”.

### Version 1.2.0, 05/23/2021
* Dropped support for Python 2.
* Python interface: Updated definition of the Yule distance function, following a change in SciPy 1.6.3.

### Version 1.2.1, 05/24/2021
* Documentation update.

### Version 1.2.2, 05/24/2021
* Documentation update.

### Version 1.2.3, 05/24/2021
* Fixed #pragma in fastcluster.cpp.
### Version 1.2.4, 08/21/2021
* Fixed NumPy versions in Travis builds.

### Version 1.2.5, 02/26/2022
* Updated pyproject.toml to work with Python 3.10 and future Python versions.

### Version 1.2.6, 02/27/2022
* Move to GitHub action for CI. No code changes.
\def\fastclusterversion{1.2.6}
\documentclass[fontsize=10pt,paper=letter,BCOR=-6mm,DIV=8]{scrartcl}
\usepackage[utf8]{inputenc}
\usepackage{lmodern}
\normalfont
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\newcommand*\q{\textquotesingle}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{xcolor}
\usepackage{ifpdf}
\ifpdf
\newcommand*\driver{}
\else
\newcommand*\driver{dvipdfmx}
\fi
\usepackage[%
pdftitle={fastcluster manual},
pdfauthor={Daniel Müllner},
% pdfsubject={},
pdfdisplaydoctitle=true,
% pdfduplex=DuplexFlipLongEdge,
pdfstartview=FitH,
colorlinks=True,
pdfhighlight=/I,
% pdfborder={0 0 1},
% linkbordercolor={1 .8 .8},
% citebordercolor={.5 .9 .5},
% urlbordercolor={.5 .7 1},
% linkcolor={blue},
% citecolor={blue},
urlcolor={blue!80!black},
linkcolor={red!80!black},
% runcolor={blue},
% filecolor={blue},
pdfpagemode=UseOutlines,
bookmarksopen=true,
bookmarksopenlevel=1,
bookmarksdepth=2,
breaklinks=true,
unicode=true,
\driver
]{hyperref}
% Optimize the PDF targets and make the PDF file smaller
\ifpdf\RequirePackage{hypdestopt}\fi
\renewcommand*\sectionautorefname{Section}
\usepackage{typearea}
\DeclareMathOperator\size{size}
\DeclareMathOperator\Var{Var}
\newcommand*\linkage{\href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html}{\texttt{linkage}}}
\newcommand*\hierarchy{\href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html}{\texttt{scipy.\hskip0pt cluster.\hskip0pt hierarchy}}}
\newcommand*\hclust{\href{https://stat.ethz.ch/R-manual/R-patched/library/stats/html/hclust.html}{\texttt{hclust}}}
\newcommand*\stats{\href{https://stat.ethz.ch/R-manual/R-patched/library/stats/html/00Index.html}{\texttt{stats}}}
\newcommand*\flashClustPack{\href{https://CRAN.R-project.org/package=flashClust}{\texttt{flashClust}}}
\newcommand*\dist{\href{https://stat.ethz.ch/R-manual/R-patched/library/stats/html/dist.html}{\texttt{dist}}}
\newcommand*\print{\href{https://stat.ethz.ch/R-manual/R-patched/library/base/html/print.html}{\texttt{print}}}
\newcommand*\plot{\href{https://stat.ethz.ch/R-manual/R-patched/library/graphics/html/plot.html}{\texttt{plot}}}
\newcommand*\identify{\href{https://stat.ethz.ch/R-manual/R-patched/library/stats/html/identify.hclust.html}{\texttt{identify}}}
\newcommand*\rect{\href{https://stat.ethz.ch/R-manual/R-patched/library/stats/html/rect.hclust.html}{\texttt{rect.hclust}}}
\newcommand*\NA{\href{https://stat.ethz.ch/R-manual/R-patched/library/base/html/NA.html}{\texttt{NA}}}
\newcommand*\double{\href{https://stat.ethz.ch/R-manual/R-patched/library/base/html/double.html}{\texttt{double}}}
\newcommand*\matchcall{\href{https://stat.ethz.ch/R-manual/R-patched/library/base/html/match.call.html}{\texttt{match.call}}}
\newcommand*\NumPyarray{\href{https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html}{NumPy array}}
\newcommand*\maskedarrays{\href{https://docs.scipy.org/doc/numpy/reference/maskedarray.html}{masked arrays}}
\newcommand*\pdist{\href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html}{\texttt{scipy.spatial.distance.pdist}}}
%\usepackage{showframe}
\makeatletter
\newenvironment{methods}{%
\list{}{\labelwidth\z@ \itemindent-\leftmargin
\let\makelabel\methodslabel}%
}{%
\endlist
}
\newcommand*{\methodslabel}[1]{%
%\hspace{\labelsep}%
\hbox to \textwidth{\hspace{\labelsep}%
\normalfont\bfseries\ttfamily #1\hskip-\labelsep\hfill}%
}
\makeatother

\setkomafont{descriptionlabel}{\normalfont\ttfamily\bfseries}

\begin{document}

%\VignetteIndexEntry{User's manual}

\title{The \textit{fastcluster} package: User's manual}
\author{\href{https://danifold.net}{Daniel Müllner}}
\date{February 27, 2022}
\subtitle{Version \fastclusterversion}

\maketitle

\makeatletter
\renewenvironment{quotation}{%
\list{}{\listparindent 1em%
\itemindent \listparindent
\leftmargin2.5em
\rightmargin \leftmargin
\parsep \z@ \@plus\p@
}%
\item\relax
}{%
\endlist
}
\makeatother

\begin{abstract}\noindent\small
The fastcluster package is a C++ library for hierarchical, agglomerative clustering. It efficiently implements the seven most widely used clustering schemes: single, complete, average, weighted/\hskip0pt mcquitty, Ward, centroid and median linkage. The library currently has interfaces to two languages: R and Python/SciPy. Part of the functionality is designed as drop-in replacement for existing routines: \linkage{} in the SciPy package \hierarchy{}, \hclust{} in R's \stats{} package, and the \flashClustPack{} package. Once the fastcluster library is loaded at the beginning of the code, every program that uses hierarchical clustering can benefit immediately and effortlessly from the performance gain. Moreover, there are memory-saving routines for clustering of vector data, which go beyond what the existing packages provide.
\end{abstract}

\noindent This document describes the usage for the two interfaces for R and Python and is meant as the reference document for the end user. Installation instructions are given in the file INSTALL in the source distribution and are not repeated here.
The sections about the two interfaces are independent and consequently somewhat redundant, so that users who need a reference for one interface need to consult only one section.

If you use the fastcluster package for scientific work, please cite it as:
\begin{quote}
Daniel Müllner, \textit{fastcluster: Fast Hierarchical, Agglomerative Clustering Routines for R and Python}, Journal of Statistical Software, \textbf{53} (2013), no.~9, 1--18, \url{https://doi.org/10.18637/jss.v053.i09}.
\end{quote}

The \hyperref[yule]{“Yule” distance function} changed in the Python interface of fastcluster version 1.2.0. This follows a \href{https://github.com/scipy/scipy/commit/3b22d1da98dc1b5f64bc944c21f398d4ba782bce}{change in SciPy 1.6.3}. \textbf{It is recommended to use fastcluster version 1.1.x together with SciPy versions before 1.6.3 and fastcluster 1.2.x with SciPy $\geq{}$1.6.3.} The R interface does not have the “Yule” distance function, hence is not affected by this change.

The fastcluster package is considered stable and will undergo few changes from now on. If some years from now there have not been any updates, this does not necessarily mean that the package is unmaintained; it may simply mean that nothing needed correcting. Of course, please still report potential bugs and incompatibilities to \texttt{daniel@danifold.net}.

\tableofcontents

\section{The R interface}

Load the package with the following command:
\begin{quote}
\texttt{library(\q fastcluster\q)}
\end{quote}
The package overwrites the function \hclust{} from the \stats{} package (in the same way as the \flashClustPack{} package does). Please remove any references to the \flashClustPack{} package in your R files so that you do not accidentally overwrite the \hclust{} function with the \flashClustPack{} version.

The \hyperref[hclust]{new \texttt{hclust} function} has exactly the same calling conventions as the old one.
You may just load the package and immediately and effortlessly enjoy the performance improvements. The function is also an improvement to the \texttt{flashClust} function from the \flashClustPack{} package. Just replace every call to \texttt{flashClust} by \hyperref[hclust]{\texttt{hclust}} and expect your code to work as before, only faster.\footnote{If you are using flashClust prior to version 1.01, update it! See the change log for \flashClustPack{} at \url{https://cran.r-project.org/web/packages/flashClust/ChangeLog}.} In case the data includes infinite or NaN values, see \autoref{sec:infnan}. If you need to access the old function or make sure that the right function is called, specify the package as follows: \begin{quote} \texttt{\hyperref[hclust]{fastcluster::hclust}(…)}\\ \texttt{flashClust::hclust(…)}\\ \texttt{stats::hclust(…)} \end{quote} Vector data can be clustered with a memory-saving algorithm with the command: \begin{quote} \texttt{\hyperref[hclust.vector]{hclust.vector}(…)} \end{quote} The following sections contain comprehensive descriptions of these methods. \begin{methods} \item [\normalfont\texttt{\textbf{hclust}}\,(\textit{d, method=\q complete\q, members=NULL})] \phantomsection\label{hclust} \addcontentsline{toc}{subsection}{\texttt{hclust}} Hierarchical, agglomerative clustering on a condensed dissimilarity matrix. This method has the same specifications as the method \hclust{} in the package \stats{} and \texttt{hclust} alias \texttt{flashClust} in the package \flashClustPack{}. In particular, the \print{}, \plot{}, \rect{} and \identify{} methods work as expected. The argument $d$ is a condensed distance matrix, as it is produced by \dist. The argument \textit{method} is one of the strings \textit{\q single\q}, \textit{\q complete\q}, \textit{\q average\q}, \textit{\q mcquitty\q}, \textit{\q centroid\q}, \textit{\q median\q}, \textit{\q ward.D\q}, \textit{\q ward.D2\q} or an unambiguous abbreviation thereof. 
The argument \textit{members} specifies the sizes of the initial nodes, ie.\ the number of observations in the initial clusters. The default value \texttt{NULL} says that all initial nodes are singletons, ie.\ have size 1. Otherwise, \textit{members} must be a vector whose size is the number of input points. The vector is processed as a \double{} array so that not only integer cardinalities of nodes can be accounted for but also weighted nodes with real weights.

The general scheme of the agglomerative clustering procedure is as follows:
\begin{enumerate}
\item Start with $N$ singleton clusters (nodes) labeled $-1,\ldots, -N$, which represent the input points.
\item Find a pair of nodes with minimal distance among all pairwise distances.
\item Join the two nodes into a new node and remove the two old nodes. The new nodes are labeled consecutively $1,2,\ldots$
\item The distances from the new node to all other nodes are determined by the \textit{method} parameter (see below).
\item Repeat $N-1$ times from step 2, until there is one big node, which contains all original input points.
\end{enumerate}

The output of \texttt{hclust} is an object of class \texttt{\q hclust\q} and represents a \emph{stepwise dendrogram}. It contains the following fields:
\begin{description}
\item[\normalfont\textit{merge}] This is an $(N-1)\times 2$ array. Row $i$ specifies the labels of the nodes which are joined in step $i$ of the clustering.
\item[\normalfont\textit{height}] This is a vector of length $N-1$. It contains the sequence of dissimilarities at which every pair of nearest nodes is joined.
\item[\normalfont\textit{order}] This is a vector of length $N$. It contains a permutation of the numbers $1,\ldots,N$ for the \plot{} method. When the dendrogram is plotted, this is the order in which the singleton nodes are plotted as the leaves of a rooted tree.
The order is computed so that the dendrogram is plotted without intersections (except the case when there are inversions for the \textit{\q centroid\q} and \textit{\q median\q} methods). The choice of the \textit{\q order\q} sequence follows the same scheme as the \texttt{stats} package does, only with a faster algorithm. Note that there are many valid choices to order the nodes in a dendrogram without intersections. Also, subsequent points in the \textit{\q order\q} field are not always close in the ultrametric given by the dendrogram.
\item[\normalfont\textit{labels}] This copies the attribute \textit{\q Labels\q} from the first input parameter $d$. It contains the labels for the objects being clustered.
\item[\normalfont\textit{method}] The (unabbreviated) string for the \textit{\q method\q} parameter. See below for a specification of all available methods.
\item[\normalfont\textit{call}] The full command that produced the result. See \matchcall.
\item[\normalfont\textit{dist.method}] This is the \textit{\q method\q} attribute of the first input parameter $d$. It specifies which metric was used in the \texttt{dist} method which generated the first argument.
\end{description}

The parameter \textit{method} specifies which clustering scheme to use. The clustering scheme determines the distance from a new node to the other nodes. Denote the dissimilarities by $d$, the nodes to be joined by $I,J$, the new node by $K$ and any other node by $L$. The symbol $|I|$ denotes the size of the cluster $I$.
\begin{description} \item [\normalfont\textit{method=\q single\q}:] $\displaystyle d(K,L) = \min(d(I,L), d(J,L))$ The distance between two clusters $A,B$ is the closest distance between any two points in each cluster: \[ d(A,B)=\min_{a\in A, b\in B}d(a,b) \] \item [\normalfont\textit{method=\q complete\q}:] $\displaystyle d(K,L) = \max(d(I,L), d(J,L))$ The distance between two clusters $A,B$ is the maximal distance between any two points in each cluster: \[ d(A,B)=\max_{a\in A, b\in B}d(a,b) \] \item [\normalfont\textit{method=\q average\q}:] $\displaystyle d(K,L) = \frac{|I|\cdot d(I,L)+|J|\cdot d(J,L)}{|I|+|J|}$ The distance between two clusters $A,B$ is the average distance between the points in the two clusters: \[ d(A,B)=\frac1{|A||B|}\sum_{a\in A, b\in B}d(a,b) \] \item [\normalfont\textit{method=\q mcquitty\q}:] $\displaystyle d(K,L) = \tfrac12(d(I,L)+d(J,L))$ There is no global description for the distance between clusters since the distance depends on the order of the merging steps. \end{description} The following three methods are intended for Euclidean data only, ie.\ when $X$ contains the pairwise \textbf{squared} distances between vectors in Euclidean space. The algorithm will work on any input, however, and it is up to the user to make sure that applying the methods makes sense. \begin{description} \item [\normalfont\textit{method=\q centroid\q}:] $\displaystyle d(K,L) = \frac{|I|\cdot d(I,L)+|J|\cdot d(J,L)}{|I|+|J|}-\frac{|I|\cdot|J|\cdot d(I,J)}{(|I|+|J|)^2}$ There is a geometric interpretation: $d(A,B)$ is the distance between the centroids (ie.\ barycenters) of the clusters in Euclidean space: \[ d(A,B) = \|\vec c_A-\vec c_B\|^2, \] where $\vec c_A$ denotes the centroid of the points in cluster $A$. 
\item [\normalfont\textit{method=\q median\q}:] $\displaystyle d(K,L) = \tfrac12 d(I,L)+\tfrac12 d(J,L)-\tfrac14 d(I,J)$ Define the midpoint $\vec w_K$ of a cluster $K$ iteratively as $\vec w_K=k$ if $K=\{k\}$ is a singleton and as the midpoint $\frac12(\vec w_I+\vec w_J)$ if $K$ is formed by joining $I$ and $J$. Then we have \[ d(A,B)=\|\vec w_A-\vec w_B\|^2 \] in Euclidean space for all nodes $A,B$. Notice however that this distance depends on the order of the merging steps. \item [\normalfont\textit{method=\q ward.D\q}:] $\displaystyle d(K,L) = \frac{(|I|+|L|)\cdot d(I,L)+(|J|+|L|)\cdot d(J,L)-|L|\cdot d(I,J)}{|I|+|J|+|L|}$ The global cluster dissimilarity can be expressed as \[ d(A,B)=\frac{2|A||B|}{|A|+|B|}\cdot\|\vec c_A-\vec c_B\|^2, \] where $\vec c_A$ again denotes the centroid of the points in cluster $A$. \item [\normalfont\textit{method=\q ward.D2\q}:] This is the equivalent of \textit{\q ward.D\q}, but for input consisting of untransformed (in particular: \textbf{non-squared}) Euclidean distances. Internally, all distances are squared first, then method \textit{ward.D} is applied, and finally the square root of all heights in the dendrogram is taken. Thus, global cluster dissimilarity can be expressed as the square root of that for \textit{ward.D}, namely \[ d(A,B)=\sqrt{\frac{2|A||B|}{|A|+|B|}}\cdot\|\vec c_A-\vec c_B\|. \] \end{description} \item [\normalfont\texttt{\textbf{hclust.vector}}\,(\textit{X, method=\q single\q, members=NULL, metric=\q euclidean\q, p=NULL})] \phantomsection\label{hclust.vector} \addcontentsline{toc}{subsection}{\texttt{hclust.vector}} This performs hierarchical, agglomerative clustering on vector data with memory-saving algorithms. While the \hyperref[hclust]{\texttt{hclust}} method requires $\Theta(N^2)$ memory for clustering of $N$ points, this method needs $\Theta(ND)$ for $N$ points in $\mathbb R^D$, which is usually much smaller. The argument $X$ must be a two-dimensional matrix with \double{} precision values. 
It describes $N$ data points in $\mathbb R^D$ as an $(N\times D)$ matrix. The parameter \textit{\q members\q} is the same as for \hyperref[hclust]{\texttt{hclust}}. The parameter \textit{\q method\q} is one of the strings \textit{\q single\q}, \textit{\q centroid\q}, \textit{\q median\q}, \textit{\q ward\q}, or an unambiguous abbreviation thereof. If \textit{method} is \textit{\q single\q}, single linkage clustering is performed on the data points with the metric which is specified by the \textit{metric} parameter. The choices are the same as in the \dist{} method: \textit{\q euclidean\q}, \textit{\q maximum\q}, \textit{\q manhattan\q}, \textit{\q canberra\q}, \textit{\q binary\q} and \textit{\q minkowski\q}. Any unambiguous substring can be given. The parameter \textit{p} is used for the \textit{\q minkowski\q} metric only. The call \begin{quote} \texttt{hclust.vector(X, method=\q single\q, metric=[...])} \end{quote} is equivalent to \begin{quote} \texttt{hclust(dist(X, metric=[...]), method=\q single\q)} \end{quote} but uses less memory and is equally fast. Ties may be resolved differently, ie.\ if two pairs of nodes have equal, minimal dissimilarity values at some point, in the specific computer's representation for floating point numbers, either pair may be chosen for the next merging step in the dendrogram. Note that the formula for the \textit{\q canberra\q} metric changed in R 3.5.0: Before R version 3.5.0, the \textit{\q canberra\q} metric was computed as \[ d(u,v) = \sum_j\frac{|u_j-v_j|}{|u_j+v_j|}. \] Starting with R version 3.5.0, the formula was corrected to \[ d(u,v) = \sum_j\frac{|u_j-v_j|}{|u_j|+|v_j|}. \] Summands with $u_j=v_j=0$ always contribute 0 to the sum. The second, newer formula equals SciPy's definition. The fastcluster package detects the R version at runtime and chooses the formula accordingly, so that fastcluster and the \dist{} method always use the same formula for a given R version. 
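To make the change concrete, the two \textit{\q canberra\q} formulas can be compared directly. The following is an illustrative sketch in plain Python (the function names are ours; this is neither fastcluster nor R code):

```python
# Illustrative sketch (ours, not fastcluster/R code): the two "canberra"
# formulas discussed above. Summands with u_j = v_j = 0 are skipped, so
# they contribute 0 to the sum, as in both definitions.

def canberra_pre_r350(u, v):
    # Formula used before R 3.5.0: |u_j - v_j| / |u_j + v_j|
    return sum(abs(x - y) / abs(x + y)
               for x, y in zip(u, v) if not (x == 0 and y == 0))

def canberra_corrected(u, v):
    # Corrected formula (R >= 3.5.0 and SciPy): |u_j - v_j| / (|u_j| + |v_j|)
    return sum(abs(x - y) / (abs(x) + abs(y))
               for x, y in zip(u, v) if not (x == 0 and y == 0))

# The formulas agree on nonnegative data ...
assert canberra_pre_r350((1, 2), (3, 4)) == canberra_corrected((1, 2), (3, 4))
# ... but differ as soon as a coordinate pair has mixed signs:
assert canberra_pre_r350((1, -2), (3, 4)) == 3.5   # 2/4 + 6/2
assert canberra_corrected((1, -2), (3, 4)) == 1.5  # 2/4 + 6/6
```

For nonnegative coordinates, $|u_j+v_j| = |u_j|+|v_j|$, so the correction only matters for data with negative coordinates.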
If \textit{method} is one of \textit{\q centroid\q}, \textit{\q median\q}, \textit{\q ward\q}, clustering is performed with respect to Euclidean distances. In this case, the parameter \textit{metric} must be \textit{\q euclidean\q}. Notice that \texttt{hclust.vector} operates on Euclidean distances for compatibility reasons with the \dist{} method, while \hyperref[hclust]{\texttt{hclust}} assumes \textbf{squared} Euclidean distances for compatibility with the \href{https://stat.ethz.ch/R-manual/R-patched/library/stats/html/hclust.html}{\texttt{stats::hclust}} method! Hence, the call \phantomsection\label{squared} \begin{quote} \texttt{hc = hclust.vector(X, method=\q centroid\q)} \end{quote} is, aside from the lesser memory requirements, equivalent to \begin{quote} \texttt{d = dist(X)}\\ \texttt{hc = hclust(d\textasciicircum 2, method=\q centroid\q)}\\ \texttt{hc\$height = sqrt(hc\$height)} \end{quote} The same applies to the \textit{\q median\q} method. The \textit{\q ward\q} method in \hyperref[hclust.vector]{\texttt{hclust.vector}} is equivalent to \hyperref[hclust]{\texttt{hclust}} with method \textit{\q ward.D2\q}, but to method \textit{\q ward.D\q} only after squaring as above. Differences in these algebraically equivalent methods may arise only from floating-point inaccuracies and the resolution of ties (which may, however, in extreme cases affect the entire clustering result due to the inherently unstable nature of the clustering schemes). 
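The squared-distance convention can be verified numerically. Below is a hedged plain-Python sketch (ours, independent of the R interface) checking, for three singleton clusters, that the \textit{ward.D} update formula applied to \textbf{squared} Euclidean distances reproduces the global dissimilarity $\frac{2|A||B|}{|A|+|B|}\,\|\vec c_A-\vec c_B\|^2$ quoted earlier:

```python
# Numerical check (illustrative, ours): 'ward.D' Lance-Williams update on
# squared Euclidean distances vs. the global centroid-based formula.

def sqdist(p, q):
    # squared Euclidean distance between two points
    return sum((a - b) ** 2 for a, b in zip(p, q))

# Singleton clusters I = {p0}, J = {p1} merge into K; L = {p2} remains.
p0, p1, p2 = (0.0, 0.0), (2.0, 0.0), (1.0, 3.0)
nI = nJ = nL = 1
d_IL, d_JL, d_IJ = sqdist(p0, p2), sqdist(p1, p2), sqdist(p0, p1)

# 'ward.D' update formula (input distances are squared):
d_KL = ((nI + nL) * d_IL + (nJ + nL) * d_JL - nL * d_IJ) / (nI + nJ + nL)

# Global formula with |K| = 2, |L| = 1 and centroids c_K and c_L = p2:
c_K = tuple((a + b) / 2 for a, b in zip(p0, p1))
expected = 2 * 2 * 1 / (2 + 1) * sqdist(c_K, p2)
assert abs(d_KL - expected) < 1e-12
```

Applying \textit{ward.D} to non-squared distances, by contrast, would not satisfy this identity, which is the reason for the D/D2 distinction.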
\end{methods} \section{The Python interface} The fastcluster package is imported as usual by: \begin{quote} \texttt{import fastcluster} \end{quote} It provides the following functions: \begin{quote} \hyperref[linkage]{\texttt{linkage}}\,(\textit{X, method=\q single\q, metric=\q euclidean\q, preserve\_input=True})\\ \hyperref[single]{\texttt{single}}\,($X$)\\ \hyperref[complete]{\texttt{complete}}\,($X$)\\ \hyperref[average]{\texttt{average}}\,($X$)\\ \hyperref[weighted]{\texttt{weighted}}\,($X$)\\ \hyperref[ward]{\texttt{ward}}\,($X$)\\ \hyperref[centroid]{\texttt{centroid}}\,($X$)\\ \hyperref[median]{\texttt{median}}\,($X$)\\ \hyperref[linkage_vector]{\texttt{linkage\_vector}}\,(\textit{X, method=\q single\q, metric=\q euclidean\q, extraarg=None}) \end{quote} The following sections contain comprehensive descriptions of these methods. \begin{methods} \item [\normalfont\texttt{fastcluster.\textbf{linkage}}\,(\textit{X, method=\q single\q, metric=\q euclidean\q, preserve\_input=\q True\q})] \phantomsection\label{linkage} \addcontentsline{toc}{subsection}{\texttt{linkage}} Hierarchical, agglomerative clustering on a condensed dissimilarity matrix or on vector data. Apart from the argument \textit{preserve\_input}, the method has the same input parameters and output format as the function of the same name in the module \hierarchy. The argument $X$ is preferably a \NumPyarray{} with floating point entries (\texttt{X.dtype\hskip0pt==\hskip0pt numpy.double}). Any other data format will be converted before it is processed. NumPy's \maskedarrays{} are not treated as special, and the mask is simply ignored. If $X$ is a one-dimensional array, it is considered a condensed matrix of pairwise dissimilarities in the format which is returned by \pdist. It contains the flattened, upper-triangular part of a pairwise dissimilarity matrix. 
That is, if there are $N$ data points and the matrix $d$ contains the dissimilarity between the $i$-th and $j$-th observation at position $d_{i,j}$, the vector $X$ has length $\binom N2$ and is ordered as follows:
\[ d = \begin{pmatrix} 0&d_{0,1}&d_{0,2}&\ldots&d_{0,n-1}\\ & 0&d_{1,2} & \ldots\\ &&0&\ldots\\ &&&\ddots\\ &&&&0 \end{pmatrix} = \begin{pmatrix} 0&X[0] &X[1]&\ldots&X[n-2]\\ & 0&X[n-1] & \ldots\\ &&0&\ldots\\ &&&\ddots\\ &&&&0 \end{pmatrix} \]
The \textit{metric} argument is ignored in case of dissimilarity input.

The optional argument \textit{preserve\_input} specifies whether the method makes a working copy of the dissimilarity vector or writes temporary data into the existing array. If the dissimilarities are generated for the clustering step only and are not needed afterward, approximately half the memory can be saved by specifying \textit{preserve\_input=False}. Note that the input array $X$ contains unspecified values after this procedure. It is therefore safer to write
\begin{verbatim}
linkage(X, method="...", preserve_input=False)
del X
\end{verbatim}
to make sure that the matrix $X$ is not accessed accidentally after it has been used as scratch memory. (The single linkage algorithm does not write to the distance matrix or its copy anyway, so the \textit{preserve\_input} flag has no effect in this case.)

If $X$ contains vector data, it must be a two-dimensional array with $N$ observations in $D$ dimensions as an $(N\times D)$ array. The \textit{preserve\_input} argument is ignored in this case. The specified \textit{metric} is used to generate pairwise distances from the input.
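The condensed ordering described above can be captured by a small index helper. The following plain-Python sketch (ours, not part of fastcluster's API) maps a pair $(i,j)$ with $i<j$ to its position in the condensed vector:

```python
# Illustrative helper (ours): index of d_{i,j} (i < j) in the condensed
# dissimilarity vector of length N*(N-1)/2, row-major upper triangle.

def condensed_index(n, i, j):
    if not 0 <= i < j < n:
        raise ValueError("need 0 <= i < j < n")
    # entries of row i start after the i previous rows of decreasing length
    return n * i - i * (i + 1) // 2 + (j - i - 1)

# For N = 4 the vector is ordered d01, d02, d03, d12, d13, d23:
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
assert [condensed_index(4, i, j) for i, j in pairs] == [0, 1, 2, 3, 4, 5]
```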
The following two function calls yield equivalent output:
\begin{verbatim}
linkage(pdist(X, metric), method="...", preserve_input=False)
linkage(X, metric=metric, method="...")
\end{verbatim}
The two results are identical in most cases, but differences occur if ties are resolved differently: if the minimum in step 2 below is attained for more than one pair of nodes, either pair may be chosen. It is not guaranteed that both \texttt{linkage} variants choose the same pair in this case.

The general scheme of the agglomerative clustering procedure is as follows:
\begin{enumerate}
\item Start with $N$ singleton clusters (nodes) labeled $0,\ldots, N-1$, which represent the input points.
\item Find a pair of nodes with minimal distance among all pairwise distances.
\item Join the two nodes into a new node and remove the two old nodes. The new nodes are labeled consecutively $N,N+1,\ldots$
\item The distances from the new node to all other nodes are determined by the \textit{method} parameter (see below).
\item Repeat $N-1$ times from step 2, until there is one big node, which contains all original input points.
\end{enumerate}

The output of \texttt{linkage} is a \emph{stepwise dendrogram}, which is represented as an $(N-1)\times 4$ \NumPyarray{} with floating point entries (\texttt{dtype=numpy.double}). The first two columns contain the node indices which are joined in each step. The input nodes are labeled $0,\ldots,N-1$, and the newly generated nodes have the labels $N,\ldots, 2N-2$. The third column contains the distance between the two nodes at each step, ie.\ the current minimal distance at the time of the merge. The fourth column counts the number of points which comprise each new node.

The parameter \textit{method} specifies which clustering scheme to use. The clustering scheme determines the distance from a new node to the other nodes. Denote the dissimilarities by $d$, the nodes to be joined by $I,J$, the new node by $K$ and any other node by $L$.
The symbol $|I|$ denotes the size of the cluster $I$. \begin{description} \item [\normalfont\textit{method=\q single\q}:] $\displaystyle d(K,L) = \min(d(I,L), d(J,L))$ The distance between two clusters $A,B$ is the closest distance between any two points in each cluster: \[ d(A,B)=\min_{a\in A, b\in B}d(a,b) \] \item [\normalfont\textit{method=\q complete\q}:] $\displaystyle d(K,L) = \max(d(I,L), d(J,L))$ The distance between two clusters $A,B$ is the maximal distance between any two points in each cluster: \[ d(A,B)=\max_{a\in A, b\in B}d(a,b) \] \item [\normalfont\textit{method=\q average\q}:] $\displaystyle d(K,L) = \frac{|I|\cdot d(I,L)+|J|\cdot d(J,L)}{|I|+|J|}$ The distance between two clusters $A,B$ is the average distance between the points in the two clusters: \[ d(A,B)=\frac1{|A||B|}\sum_{a\in A, b\in B}d(a,b) \] \item [\normalfont\textit{method=\q weighted\q}:] $\displaystyle d(K,L) = \tfrac12(d(I,L)+d(J,L))$ There is no global description for the distance between clusters since the distance depends on the order of the merging steps. \end{description} The following three methods are intended for Euclidean data only, ie.\ when $X$ contains the pairwise (non-squared!)\ distances between vectors in Euclidean space. The algorithm will work on any input, however, and it is up to the user to make sure that applying the methods makes sense. \begin{description} \item [\normalfont\textit{method=\q centroid\q}:] $\displaystyle d(K,L) = \sqrt{\frac{|I|\cdot d(I,L)^2+|J|\cdot d(J,L)^2}{|I|+|J|}-\frac{|I|\cdot|J|\cdot d(I,J)^2}{(|I|+|J|)^2}}$ There is a geometric interpretation: $d(A,B)$ is the distance between the centroids (ie.\ barycenters) of the clusters in Euclidean space: \[ d(A,B) = \|\vec c_A-\vec c_B\|, \] where $\vec c_A$ denotes the centroid of the points in cluster $A$. 
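As a concrete illustration of the centroid formula above, here is a hedged plain-Python sketch (ours, not fastcluster code) checking, for three points, that the update applied to non-squared Euclidean distances equals the distance between the cluster centroids:

```python
import math

# Numerical check (illustrative, ours): the 'centroid' update on plain
# (non-squared) Euclidean distances equals the centroid distance.

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Singletons I = {p0}, J = {p1} merge into K; L = {p2} is another node.
p0, p1, p2 = (0.0, 0.0), (2.0, 0.0), (1.0, 3.0)
nI = nJ = 1
d_IL, d_JL, d_IJ = dist(p0, p2), dist(p1, p2), dist(p0, p1)

# Lance-Williams 'centroid' update, SciPy/Python convention (sqrt form):
d_KL = math.sqrt((nI * d_IL**2 + nJ * d_JL**2) / (nI + nJ)
                 - nI * nJ * d_IJ**2 / (nI + nJ) ** 2)

c_K = tuple((a + b) / 2 for a, b in zip(p0, p1))  # centroid of {p0, p1}
assert abs(d_KL - dist(c_K, p2)) < 1e-12
```

This is the non-squared convention of the Python interface; the R interface's centroid formula is the same identity with all distances squared.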
\item [\normalfont\textit{method=\q median\q}:] $\displaystyle d(K,L) = \sqrt{\tfrac12 d(I,L)^2+\tfrac12 d(J,L)^2-\tfrac14 d(I,J)^2}$ Define the midpoint $\vec w_K$ of a cluster $K$ iteratively as $\vec w_K=k$ if $K=\{k\}$ is a singleton and as the midpoint $\frac12(\vec w_I+\vec w_J)$ if $K$ is formed by joining $I$ and $J$. Then we have \[ d(A,B)=\|\vec w_A-\vec w_B\| \] in Euclidean space for all nodes $A,B$. Notice however that this distance depends on the order of the merging steps. \item [\normalfont\textit{method=\q ward\q}:] \raggedright $\displaystyle d(K,L) = \sqrt{\frac{(|I|+|L|)\cdot d(I,L)^2+(|J|+|L|)\cdot d(J,L)^2-|L|\cdot d(I,J)^2}{|I|+|J|+|L|}}$ The global cluster dissimilarity can be expressed as \[ d(A,B)=\sqrt{\frac{2|A||B|}{|A|+|B|}}\cdot\|\vec c_A-\vec c_B\|, \] where $\vec c_A$ again denotes the centroid of the points in cluster $A$. \end{description} \item [\normalfont\texttt{fastcluster.\textbf{single}}\,(\textit{X})] \phantomsection\addcontentsline{toc}{subsection}{\texttt{single}}\label{single} Alias for \texttt{fastcluster.\textbf{linkage}}\,(\textit{X, method=\q single\q}). \item [\normalfont\texttt{fastcluster.\textbf{complete}}\,(\textit{X})] \phantomsection\addcontentsline{toc}{subsection}{\texttt{complete}}\label{complete} Alias for \texttt{fastcluster.\textbf{linkage}}\,(\textit{X, method=\q complete\q}). \item [\normalfont\texttt{fastcluster.\textbf{average}}\,(\textit{X})] \phantomsection\addcontentsline{toc}{subsection}{\texttt{average}}\label{average} Alias for \texttt{fastcluster.\textbf{linkage}}\,(\textit{X, method=\q average\q}). \item [\normalfont\texttt{fastcluster.\textbf{weighted}}\,(\textit{X})] \phantomsection\addcontentsline{toc}{subsection}{\texttt{weighted}}\label{weighted} Alias for \texttt{fastcluster.\textbf{linkage}}\,(\textit{X, method=\q weighted\q}). 
\item [\normalfont\texttt{fastcluster.\textbf{centroid}}\,(\textit{X})] \phantomsection\addcontentsline{toc}{subsection}{\texttt{centroid}}\label{centroid}
Alias for \texttt{fastcluster.\textbf{linkage}}\,(\textit{X, method=\q centroid\q}).
\item [\normalfont\texttt{fastcluster.\textbf{median}}\,(\textit{X})] \phantomsection\addcontentsline{toc}{subsection}{\texttt{median}}\label{median}
Alias for \texttt{fastcluster.\textbf{linkage}}\,(\textit{X, method=\q median\q}).
\item [\normalfont\texttt{fastcluster.\textbf{ward}}\,(\textit{X})] \phantomsection\addcontentsline{toc}{subsection}{\texttt{ward}}\label{ward}
Alias for \texttt{fastcluster.\textbf{linkage}}\,(\textit{X, method=\q ward\q}).
\item [\normalfont\texttt{fastcluster.\textbf{linkage\_vector}}\,(\textit{X, method=\q single\q, metric=\q euclidean\q, extraarg=\q None\q})] \phantomsection\addcontentsline{toc}{subsection}{\texttt{linkage\_vector}}\label{linkage_vector}
This performs hierarchical, agglomerative clustering on vector data with memory-saving algorithms. While the \hyperref[linkage]{\texttt{linkage}} method requires $\Theta(N^2)$ memory for clustering of $N$ points, this method needs $\Theta(ND)$ for $N$ points in $\mathbb R^D$, which is usually much smaller. The argument $X$ has the same format as before, when $X$ describes vector data, ie.\ it is an $(N\times D)$ array. Also the output array has the same format. The parameter \textit{method} must be one of \textit{\q single\q}, \textit{\q centroid\q}, \textit{\q median\q}, \textit{\q ward\q}, since memory-saving algorithms currently exist only for these methods. If \textit{method} is one of \textit{\q centroid\q}, \textit{\q median\q}, \textit{\q ward\q}, the \textit{metric} must be \textit{\q euclidean\q}. Like the \texttt{linkage} method, \texttt{linkage\_vector} does not treat NumPy's \maskedarrays{} as special and simply ignores the mask. For single linkage clustering, any dissimilarity function may be chosen.
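For an intuition of how single linkage can achieve the $\Theta(ND)$ memory bound, here is a minimal sketch (pure Python, illustrative only, not the fastcluster implementation): single-linkage merge heights coincide with the sorted edge lengths of a minimum spanning tree, and Prim's algorithm computes these while storing only one candidate distance per point instead of the full $N\times N$ matrix.

```python
import math

def single_linkage_heights(X):
    """Return sorted merge heights for single linkage on vector data.

    Memory use is O(N) beyond the input: only the current best
    distance from each unvisited point to the growing tree is kept,
    never the full pairwise distance matrix.
    """
    n = len(X)
    dist_to_tree = [math.inf] * n
    in_tree = [False] * n
    edges = []
    current = 0
    in_tree[0] = True
    for _ in range(n - 1):
        # Relax candidate distances against the point just added.
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(X[current], X[j])
                if d < dist_to_tree[j]:
                    dist_to_tree[j] = d
        # Prim's step: attach the closest remaining point.
        current = min((j for j in range(n) if not in_tree[j]),
                      key=lambda j: dist_to_tree[j])
        edges.append(dist_to_tree[current])
        in_tree[current] = True
    return sorted(edges)

X = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 0.5)]
print(single_linkage_heights(X))  # → [0.5, 1.0, 4.0]
```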
Basically, every metric which is implemented in the method \pdist{} is reimplemented here. However, the metrics differ in some instances since a number of mistakes and typos (both in the code and in the documentation) were corrected in the \textit{fastcluster} package.\footnote{Hopefully, the SciPy metrics will be corrected in future versions and some day coincide with the \textit{fastcluster} definitions. See the bug reports at \url{https://github.com/scipy/scipy/issues/2009}, \url{https://github.com/scipy/scipy/issues/2011}.} Therefore, the available metrics with their definitions are listed below as a reference. The symbols $u$ and $v$ mostly denote vectors in $\mathbb R^D$ with coordinates $u_j$ and $v_j$ respectively. See below for additional metrics for Boolean vectors. Unless otherwise stated, the input array $X$ is converted to a floating point array (\texttt{X.dtype==\allowbreak numpy.double}) if it does not already have the required data type. Some metrics accept Boolean input; where this is the case, it is stated explicitly below.
\begin{description}
\item[\normalfont\textit{\q euclidean\q}:] Euclidean metric, $L_2$ norm
\[ d(u,v) = \| u-v\|_2 = \sqrt{\sum_j (u_j-v_j)^2} \]
\item[\normalfont\textit{\q sqeuclidean\q}:] squared Euclidean metric
\[ d(u,v) = \| u-v\|^2_2 = \sum_j (u_j-v_j)^2 \]
\item[\normalfont\textit{\q seuclidean\q}:] standardized Euclidean metric
\[ d(u,v) = \sqrt{\sum_j (u_j-v_j)^2 /V_j} \]
The vector $V=(V_0,\ldots,V_{D-1})$ is given as the \textit{extraarg} argument. If no \textit{extraarg} is given, $V_j$ is by default the unbiased sample variance of all observations in the $j$-th coordinate, $V_j = \Var_i(X_{i,j})=\frac1{N-1}\sum_i(X_{i,j}^2-\mu(X_j)^2)$. (Here, $\mu(X_j)$ denotes as usual the mean of $X_{i,j}$ over all rows $i$.)
\item[\normalfont\textit{\q mahalanobis\q}:] Mahalanobis distance
\[ d(u,v) = \sqrt{(u-v)^{\mkern-3mu\top}V (u-v)} \]
Here, $V=\textit{extraarg}$, a $(D\times D)$-matrix.
If $V$ is not specified, the inverse of the covariance matrix \texttt{numpy.linalg.inv(numpy.\allowbreak cov(\allowbreak X, rowvar=False))} is used: \[ (V^{-1})_{j,k} = \frac1{N-1} \sum_i (X_{i,j}-\mu(X_j))(X_{i,k}-\mu(X_k)) \] \item[\normalfont\textit{\q cityblock\q}:] the Manhattan distance, $L_1$ norm \[ d(u,v) = \sum_j |u_j-v_j| \] \item[\normalfont\textit{\q chebychev\q}:] the supremum norm, $L_\infty$ norm \[ d(u,v) = \max_j |u_j-v_j| \] \item[\normalfont\textit{\q minkowski\q}:] the $L_p$ norm \[ d(u,v) = \left(\sum_j |u_j-v_j|^p\right)^{1/p} \] This metric coincides with the \textit{cityblock}, \textit{euclidean} and \textit{chebychev} metrics for $p=1$, $p=2$ and $p=\infty$ (\texttt{numpy.inf}), respectively. The parameter $p$ is given as the \textit{\q extraarg\q} argument. \item[\normalfont\textit{\q cosine\q}] \[ d(u,v) = 1 - \frac{\langle u,v\rangle}{\|u\|\cdot\|v\|} = 1 - \frac{\sum_j u_jv_j}{\sqrt{\sum_j u_j^2\cdot \sum_j v_j^2}} \] \item[\normalfont\textit{\q correlation\q}:] This method first mean-centers the rows of $X$ and then applies the \textit{cosine} distance. Equivalently, the \textit{correlation} distance measures $1-{}$\textrm{(Pearson's correlation coefficient)}. \[ d(u,v) = 1 - \frac{\langle u-\mu(u),v-\mu(v)\rangle}{\|u-\mu(u)\|\cdot\|v-\mu(v)\|}, \] \item[\normalfont\textit{\q canberra\q}] \[ d(u,v) = \sum_j\frac{|u_j-v_j|}{|u_j|+|v_j|} \] Summands with $u_j=v_j=0$ contribute 0 to the sum. \item[\normalfont\textit{\q braycurtis\q}] \[ d(u,v) = \frac{\sum_j |u_j-v_j|}{\sum_j |u_j+v_j|} \] \item[\textnormal{(user function):}] The parameter \textit{metric} may also be a function which accepts two NumPy floating point vectors and returns a number. Eg.\ the Euclidean distance could be emulated with \begin{quote} \texttt{fn = lambda u, v: numpy.sqrt(((u-v)*(u-v)).sum())}\\ \texttt{linkage\_vector(X, method=\q single\q, metric=fn)} \end{quote} This method, however, is much slower than the built-in function. 
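The stated coincidence of the Minkowski metric with \textit{cityblock}, \textit{euclidean} and \textit{chebychev} for $p=1$, $p=2$ and $p=\infty$ can be checked directly; the following is a minimal pure-Python sketch of these definitions (illustrative only, not the fastcluster code):

```python
import math

def cityblock(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def chebychev(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

def minkowski(u, v, p):
    if math.isinf(p):          # the L_infinity limit is the supremum norm
        return chebychev(u, v)
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

u, v = (1.0, 4.0, -2.0), (3.0, 1.0, 0.0)
print(math.isclose(minkowski(u, v, 1), cityblock(u, v)))   # p = 1
print(math.isclose(minkowski(u, v, 2), euclidean(u, v)))   # p = 2
print(minkowski(u, v, math.inf) == chebychev(u, v))        # p = inf
```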
\item[\normalfont\textit{\q hamming\q}:] The Hamming distance accepts a Boolean array (\texttt{X.\allowbreak dtype\allowbreak ==\allowbreak bool}) for efficient storage. Any other data type is converted to \texttt{numpy.double}.
\[ d(u,v) = \frac{|\{j\mid u_j\neq v_j\}|}{D} \]
\item[\normalfont\textit{\q jaccard\q}:] The Jaccard distance accepts a Boolean array (\texttt{X.dtype\allowbreak ==\allowbreak bool}) for efficient storage. Any other data type is converted to \texttt{numpy.double}.
\[ d(u,v) = \frac{|\{j\mid u_j\neq v_j\}|}{|\{j\mid u_j\neq 0\text{ or } v_j\neq 0\}|} \]
\[ d(0,0) = 0 \]
Python represents \texttt{True} by 1 and \texttt{False} by 0. In the Boolean case, the Jaccard distance is therefore:
\[ d(u,v) = \frac{|\{j\mid u_j\neq v_j\}|}{|\{j\mid u_j\lor v_j\}|} \]
\end{description}
The following metrics are designed for Boolean vectors. The input array is converted to the \texttt{bool} data type if it is not Boolean already. Use the following abbreviations for the entries of a contingency table:
\begin{align*} a &= |\{j\mid u_j\land v_j \}| & b &= |\{j\mid u_j\land(\lnot v_j)\}|\\ c &= |\{j\mid (\lnot u_j)\land v_j \}| & d &= |\{j\mid (\lnot u_j)\land(\lnot v_j)\}| \end{align*}
Recall that $D$ denotes the number of dimensions, hence $D=a+b+c+d$.
\begin{description}
\item[\normalfont\textit{\q yule\q}] \phantomsection\label{yule}
\begin{align*} d(u,v) &= \frac{2bc}{ad+bc} && \text{if $bc \neq 0$}\\ d(u,v) &= 0 && \text{if $bc = 0$} \end{align*}
Note that the second clause $d(u,v)=0$ if $bc = 0$ was introduced in fastcluster version 1.2.0. Before, the result was NaN if the denominator in the formula was zero. fastcluster follows a \href{https://github.com/scipy/scipy/commit/3b22d1da98dc1b5f64bc944c21f398d4ba782bce}{change in SciPy 1.6.3} here.
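The contingency counts and the \textit{yule} definition, including the $bc=0$ clause, can be sketched in pure Python (illustrative only, not the fastcluster code):

```python
def contingency(u, v):
    """Contingency counts a, b, c, d for two Boolean vectors."""
    a = sum(x and y for x, y in zip(u, v))
    b = sum(x and not y for x, y in zip(u, v))
    c = sum(not x and y for x, y in zip(u, v))
    d = sum(not x and not y for x, y in zip(u, v))
    return a, b, c, d

def yule(u, v):
    a, b, c, d = contingency(u, v)
    if b * c == 0:          # clause introduced in fastcluster 1.2.0
        return 0.0
    return 2 * b * c / (a * d + b * c)

u = [True, True, False, False]
v = [True, False, True, False]
print(yule(u, v))                          # a = b = c = d = 1, so 2/(1+1) = 1.0
print(yule([True, False], [True, False]))  # bc = 0, so 0.0
```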
\item[\normalfont\textit{\q dice\q}]
\begin{gather*} d(u,v) = \frac{b+c}{2a+b+c}\\ d(0,0) = 0 \end{gather*}
\item[\normalfont\textit{\q rogerstanimoto\q}]
\[ d(u,v) = \frac{2(b+c)}{b+c+D} \]
\item[\normalfont\textit{\q russellrao\q}]
\[ d(u,v) = \frac{b+c+d}{D} \]
\item[\normalfont\textit{\q sokalsneath\q}]
\begin{gather*} d(u,v) = \frac{2(b+c)}{a+2(b+c)}\\ d(0,0) = 0 \end{gather*}
\item[\normalfont\textit{\q kulsinski\q}]
\[ d(u,v) = \frac 12\cdot\left(\frac b{a+b} + \frac c{a+c}\right) \]
\item[\normalfont\textit{\q matching\q}]
\[ d(u,v) = \frac{b+c}{D} \]
Notice that when given a Boolean array, the \textit{matching} and \textit{hamming} distances are the same. The \textit{matching} distance formula, however, converts every input to Boolean first. Hence, the vectors $(0,1)$ and $(0,2)$ have zero \textit{matching} distance since they are both converted to $(\mathrm{False}, \mathrm{True})$, but the \textit{hamming} distance is $0.5$.
\item[\normalfont\textit{\q sokalmichener\q}] is an alias for \textit{\q matching\q}.
\end{description}
\end{methods}
\section{Behavior for NaN and infinite values}\label{sec:infnan}
Whenever the fastcluster package encounters a NaN value as the distance between nodes, either as the initial distance or as an updated distance after some merging steps, it raises an error. This behavior is intentional, even if there might be ways to propagate NaNs through the algorithms in a more or less sensible way. Indeed, since the clustering result depends on every single distance value, the presence of NaN values usually indicates a dubious clustering result, and therefore NaN values should be eliminated in preprocessing.
In the R interface for vector input, coordinates with {\NA} value are interpreted as missing data and treated in the same way as R's {\dist} function does. This results in valid output whenever the resulting distances are not NaN.
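Since NaN values should be eliminated in preprocessing, a simple check on a condensed distance matrix before clustering gives a more informative diagnostic than the error raised later. A minimal pure-Python sketch (the helper name is illustrative):

```python
import math

def nan_positions(distances):
    """Indices of NaN entries in a condensed distance matrix.

    Infinite values are left alone: as described above, fastcluster
    handles them as long as the update formulas remain NaN-free.
    """
    return [i for i, d in enumerate(distances) if math.isnan(d)]

D = [1.0, 2.5, float("nan"), 0.7, math.inf, 3.1]
print(nan_positions(D))  # → [2]
```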
The Python interface does not provide any way of handling missing coordinates, and data should be processed accordingly and given as pairwise distances to the clustering algorithms in this case. The fastcluster package handles node distances and coordinates with infinite values correctly, as long as the formulas for the distance updates and the metric (in case of vector input) make sense. In concordance with the statement above, an error is produced if a NaN value results from performing arithmetic with infinity. Also, the usual proviso applies: internal formulas in the code are mathematically equivalent to the formulas as stated in the documentation only for finite, real numbers but might produce different results for $\pm\infty$. Apart from obvious cases like single or complete linkage, it is therefore recommended that users think about how they want infinite values to be treated by the distance update and metric formulas and then check whether the fastcluster code does exactly what they want in these special cases. \section{Differences between the two interfaces} \begin{itemize} \item The \textit{\q mcquitty\q} method in R is called \textit{\q weighted\q} in Python. \item R and SciPy use different conventions for the ``Euclidean'' methods \textit{\q centroid\q}, \textit{\q median\q}! R assumes that the dissimilarity matrix consists of squared Euclidean distances, while SciPy expects non-squared Euclidean distances. The fastcluster package respects these conventions and uses different formulas in the two interfaces. The \textit{\q ward\q} method in the Python interface is identical to \textit{\q ward.D2\q} in the R interface. 
To obtain the same results in both interfaces, the \hyperref[hclust]{\texttt{hclust}} function in R must be given the entry-wise square of the distance matrix, \verb!d^2!, for the \textit{\q ward.D\q}, \textit{\q centroid\q} and \textit{\q median\q} methods, and later the square root of the height field in the dendrogram must be taken. The \hyperref[hclust.vector]{\texttt{hclust.vector}} method calculates non-squared Euclidean distances, like R's \dist{} method and identically to the Python interface. See the \hyperref[squared]{example} in the \hyperref[hclust.vector]{\texttt{hclust.vector}} documentation above.
For the \textit{\q average\q} and \textit{\q weighted\q} alias \textit{\q mcquitty\q} methods, the same non-squared distance matrix \texttt{d} as in the Python interface must be used to obtain the same results. The \textit{\q single\q} and \textit{\q complete\q} methods depend only on the relative order of the distances, hence it does not make a difference whether the method operates on the distances or the squared distances.
The code example in the R documentation (enter \texttt{?hclust} or \texttt{ex\-am\-ple(hclust)} in R) contains another instance where the squared distance matrix is generated from Euclidean data.
\item The Python interface is not designed to deal with missing values, and NaN values in the vector data raise an error message. The \hyperref[hclust.vector]{\texttt{hclust.vector}} method in the R interface, in contrast, deals with NaN and the (R specific) {\NA} values in the same way as the \dist{} method does. See the documentation for \dist{} for details.
\end{itemize}
\section{References}
\begin{trivlist}
\item \textit{NumPy: Scientific computing tools for Python}, \url{http://numpy.org/}.
\item Eric Jones, Travis Oliphant, Pearu Peterson et al., \textit{SciPy: Open Source Scientific Tools for Python}, 2001, \url{https://www.scipy.org}.
\item \textit{R: A Language and Environment for Statistical Computing}, R Foundation for Statistical Computing, Vienna, 2011, \url{https://www.r-project.org}.
\end{trivlist}
\end{document}

%%% Local variables:
%%% mode: latex
%%% TeX-master: "fastcluster.Rtex"
%%% TeX-PDF-mode: t
%%% End:
ǩH25|eS][[I=!c#aٖ4_kQ1Op\DEu~{Ʉ # S{\eFsKdEb9Ȣ7B!395-ĺFB8h|6Xf#"s'bq@lUs0'<- U|~U;M9ypddF. *qS8 Zp .γ8 Ρ5:˺8aa>RAm] kNxMAy;^!huI\u9~ [7Q|?~~M%׵h3$oLol3ډQFF;x(F :#DG/,S "*{矝~/ʈĸ"ӁUSE?hL8WpCep]mȁ+NCNt11Ǵ̾/%*"UApZzt i,M&픛R;t^[e약ӝ$^dTz-^߭d5Pw}+}DQV']KiyݙY]oe+uij6'<)Jb#JB{%-Jv7C9*qJe[`z-2`Sd1.ÝNp\{5dqO殬xjXؓ.=K7%6)09Vnu̴iWLbj7ZGd^4=\ [d(WX_tW܂0-CJV%ݐKĥ,Ygٕ ";Cl2S%%GCcOFVFQX]r%C08h/i `M PffNUP1(]}}xJph@CC}ER/QYiOK df sˠk$õW1J 7"Z,+͕EJaH%D&>;;'7`8wB?__/)[X s6}[t endstream endobj 298 0 obj <> stream xڅX\TWK{D &^4T ( bY:,U@`RRD%Vb %&q5j4KĨ)$o{wE!/9̙gfe#ɬVo+CC޲'"K[%Hʤ2CI4u'!6/E ׍0:2_[ytޒ1́ӧھ$dW7f팩S8f[4'!Qa~6^;l9Oq _L ٶ+&mzkYj{kޘob ˼L4"iYdq`V2.l=be0131aL93`3 ό`F2+ƚͼ 3ѝ6ʼdf1sBfg0Ke̻#ĬrU{jƕYø1ku;ldPЌ rY * VTF+Y<\>W~ŵ7kWf?3[ib?Cw =a>|4w +× f[gJ#>VlV9ydY6s O$Q=7!8)sVF6AWf92W@+7.R@g2n?;m{>X;=pun}NLy42N)' +rD%(V:SW긦0uTϞ-̾$[hѼ̷p# µxpN>]Q XyEOyzZŔo5<="0ʹաd+"Eӓoo8gkM=p{Es|_k%+ 8eZnT%8Lp D$ت(vD54V"c>56ԉEJ2d.Y+*. bINcl]m"@pcN(a>/ʄTbT"DKE{Vy/15! @-> r`ok6Ts֧?k^r`/p*Es2܌ IB𠀝6 sTΛ$gĤC*Uxv *d/ErM_v(:[/nwA;-5 J:JTUZVqYj  +rIiFSEQĀM qyFF=$8(D'b`D}rƵ74ͣdNOriQA`+Y ѐs{Rtf\Vd&pr ?εDžA`Pǖ7@;3Ccy`-!U4ʫCxq+pC91?gwSN$@iZ?-3DXq %.V)! l LJ,ϵ)2s7j,׃&Z>X G{\|ɱ pieҊ3r[[CpW?!H+" TGS'YəwWYi+n% VG<|$Ԫ}-fQUŸ "8tT;"eSSDR%B%-hE {yKW#!-R+"UA);f ph'֠q_/ݵ))*C wpVc߻")? jlVtD=j4JӍO-Yzӈv 1?ޟNy,U8L,L )Y_j>:TPGP:FoQXp[uuz1h xhXHD23KFftv2Tdhx;!Rv5ViIiiɴj>bݯ8{'f=TXu#eM+)~fOGxO4WAB N}cn6lg~/G[ĵ5"}驙{ 3Aߴz=KAR 8o{8uo3-=A&Hiִb?w8m,?0iKrG|qJxOq*]]ꃽ/kd(b:WC )84)K^ *b_明xdI_ܣ.>l l bJ)#M *%鸜.+f裔BY\ٛϷ"[JsRAe2?>R ^aw/=ֽRnlReDUKw-(XU(}\z y8s:>n7BkKV%þ{0+љأ;:lљ!Β1Xn\xMSԉK8޸U^cьDk6`&3~Ӿ \|'){~&<?sLyN8W  _}?O ՏWB?߯)n$9JG/SsS 5Ú37`5N';<]GKӸiƒVhqZ>ZhUp 9w.k!oA5qB94ewꜲ}䋳KE, 9>cgBtjŵTK JSA!rG ХWmmu5X 8 T?6G_ Nz_vxH Qc)CO? OeN?-ܐ_x5nh1GM&ArfjTZBf;f-&+5!EE}kA" fǡ%VyUwwKliZ'HE<O'Ym[)&O!~MGl_8io_1{U{~ɵ+kt/.%KBZdFG_aqrxdWLF0 濌GZz ,ű RŖݸ58 ʢ9vkp8Օڟ͙t^qy[mayBt/؍b{3hh9\F ]E.[6_ֺj]m=y F*uѭk$ؘŏMߪ-M]x[/cR" |˔Eja! 
-cFO(} enH9*H2TU y9eyŹ;{xZRcjJeK-N]W,v\ʢ*p=h 3nFݾқr'_/ ~{p[Ü>,z2CgΕHvM%-{%=iS?{zo%cӘtRi= dTe嘴K'tZ=]%#OOuٗ#{ ;C}w $ 9\ȁB)\,)~8}4"#B_MMk&C)_@h-+7npO2mPf,2_i~;{7"AƯػ7є2OD45OJF`ndoG7}߹n㼽mͰDy'rOI(p%N"AEviGϹ&j>w4)kRQ 5YBkr8p l)8{Я!2J@O2W}nX݃j ?w571`>NڰQ8Zjz\)hdn02eȀX"# o7 JC讅E9yTG:(iUc~Ҿ7=Qi% b"gcj*@g{N&8ђ"2RaOźN}wkˤ'/th~n_kJh?aфLL,{-!ÌC^4tM'E%=TcوȻ7}ﴘ@,6*&jZ5ZYK%C5%p|2UY 1J?+6&N)@|SҨI%Ғ\PW &Fn!fC LTc5=TQ{WfWeo+6*Վ_} endstream endobj 300 0 obj <> stream xuT}lS?6?Բ#k]6a&Kbr,.dz{e:7-n.M0;lnAo;d0olX&kk٦Hu|VgpҠk&VǤ# Nb4Z-:[oY;u{u[oٹևOi0d$}RUQ,zD;+)$erԛta*M@-PtWET(^E>UNnS\ʞ3Tݒ: (N/r9|*3ں%`BޠFk>cˌ=eBu>!"$I`^'~?n@^),@ja$L>i)~APHBS>1$B}z")i_wX_~/A~U?\rksLLn(>㤑V + D` xJ{).C3| DQL{'#Ę%y@Nҫy+cQ&Ym{5= ǎHX(4 \05?X(]$,4Yn@&ě#8)Ž1$KɹV]Q RynL &EJ82 IHB".f _$inG:9]3|Y-o#,4;_ކs'fI4|s zACvcN\#4~ lnb6 [ޣ>-DHf05 ;?xd-bdD:D>mmK&ձw'^Rn꽰WMyq zm.þ586 B _ 3PޘdNcHJ9.E !B65a@K-CD 0%-%7(@܃J'sǷ!W畔Mn,JoqCU=X2&),2 |oqɫ㸗-nd`A7IA2S\eW.#H!~  (Q[->X-6$eeʹU|ڻP[db2*"bTx6Nbi)#?׮ 5ɉ endstream endobj 302 0 obj <> stream xcd`aa`ddw q74 JM/I,əa!C<?/*UD~@5`tb`cdͮn,,/I-KI+)64300q/,L(Q0200 I Nz ^ٙ y) ^zz ~@L<ԌĜ44`נ` Ѐ`M=t~E9@g1v&FFk,\{c ߏ)ڹr^I7GWUUg n)^}=i9[ ?]X@ߡ z&L0qpN>u{= 0a~Du z'zpT EaɆ endstream endobj 304 0 obj <> stream xcd`aa`ddp p M3 JM/I,əf!Cy?wȰI*BD>@`\+SReg``_PYQ`d`` $-*ˋ3R|ˁ y I9i i ! A Az΅rs`4dd`bf`fdd} &,Y߽~yIww[ʮ=c}:{} 9O/>}h9ǟIYurlY=> ^luͭvݵ}ؾX.~Wo&ڦ.^1gܖur._~[+n_߭$;9~<{)XBy8'pupŽwR+xxO?r$^9 Ie endstream endobj 306 0 obj <> stream xcd`aa`ddpr M3 JM/I,əf!Cy?aU |" H' ]-&H9(3=DXWHZ*$U*8)x%&ggg*$(x)34R3sBR#B]܃C5Н &h xLR7t9mJ`W鞱F.ݹrު)Kv_=NE.!P@xP;԰ߟgxwS?0X_4ѳժ?wk=9jʕ[u_.oߢfZ-Y1wºrxa[+n'Wߍs%s8~-<{ XBy8'pupoǽwRO<<z7O;W1+ v~ endstream endobj 308 0 obj <> stream xڝW XSg>I 爠16vbDJqzVDzB / _!k< *:UNG/umg\ާ>7g9[}# @ ݰwknTE+ߟ1lg"8r߹[m!Lj8  YM-W,s ".c*:)6bgxǬ3ucHHU`vUkש#!a¶8nYqƏ6qq%ܗשb<:S7x+ "RXZ'@%!$=@8.+1"%RbXA$VkuGzb&b3K[T!! `Hp]\/<+Z(:o5*Ck@ry=uLtlжIb};jlĎAn-{Mf wrKN^' ɟ^$$? 
M) 44T>w| eX8O% `5#'t tQh-"pkiFM 24'yA$/yKp׷{NT H}W(D(hUҮQF'D2.H{n%TH[EuaOI{tJR4Ŷ6ۘcxqE"k>dBScQ1LTt.Qȡ2Krl"9;$">Zonoh6["" SC,Jx n 3+PJRtPgdaw<{ PsS 5&{Za‹Y V*(Me!??3ubή}3N ~ D ! ݭ`Q?X h(kO~1Ԑ[L+j쬳7 wp2r&C&^.F:䬶~i lx5_T  B'҃uw#MxF.#!Lp^+lT+ߊ.y ]]Tp/ 4pϮo`awp;,DMTMZ5B9H[ da?22ʙ}-(.-,*,1s#Pe#a;BoLf/@K*6!{RvQNx8reT}\[̀qOu uy2ABT9+,y):}1W=pLa=@k^qRB7΋ 8.}<2 =̚QXe\{H!AN8flݖDHn-'?ߗU6%kq,s>J 2O`NrBbjA # *kZ:ǾKQU]^xk>цޯmsDპh[l%eZn:#;Att̀NCG*3/ҨD3C6״^Z:fb'ӑҁld/Nb |Ȭ-z0G,*q W;u~|c+MC~3KSiޓHͪЀ]]]}p Ηn0vOsVB5͂ATh8B5ߜtN@:u>V 7Agx '` "eћ\$.A5vx6mI#ͪfݧoqvCEl < FS`"1kȖ5IOS$0?)37RݦZ({ʑ+Rۿ;uwWMT1q^@plj{-c"a))Zސ@yI=#m[(B XTR\r$+にep0y.SE[ߧSB TZSQ]%#7fOb*2; (Z[dЩ7UU7˚fː dk£g=77`A =H8cMw\KɯLefAOR$i+Yn]4C tJ.^ܶI9J,x ϐȡw  P9cUK$cqiI#,"o*.@m6r%lLR'W6o3Z^ X;Q+Zژ~!DQݼ M;sM~4/o+s'X kRI7֭WXM?~:U9靾W [6Z^oQp`stN&6nEEH"Kx}kퟻHg_AڑȍtDU5*3n: lCr2 R뫆{fJsntAgD$8m*)E=ZAVWfG5ȃ jHIgf9)!GeNN}(1ﳴcx{L;z:<_RLQ Er7dg,5o;It@› 7'z%ea+nȅ.p8#]MHz>6IGZ١ |Iݷ{Pn`E?L)~ʹ9o͢9y'+7A)î78p]_Bi?=aHXMd8&2:^whJ!24y|saPͰ6?^DZKjxS;/aar6#=C OiM.:!Sn9pi;ݮ F8-ȭ}t:*/3;B0@^le (3Лh.}3.}0 K ]4VC39+j^IY0ME /rsqS2o6m)PcNA: yID-] TlJȊ  B*RA;0pya)Z͑6}Y9J*W/fl6lڀXX>5*uEź֎ ':&7ӑly endstream endobj 310 0 obj <> stream xڅW TSG! kZZB)Vk"*;C(D DEjEJM]jݶ+gnsqPpmIΜ3}7%P"U+._Tw]H Q[_,LgEd &

Rj9KNAQ+*ZEޤR* 6PsȠY4EdoA\+KNmlt&3| $vJq[gGaXC* pPwG;$BAh}PLkd*&2]Pk~?^}H7M֔bHԐ^&y!,粧^BdU 8q;gn慅BS??%e)etBG#<Dliߢ`1c_d'g}/?Xi!ļhvՂ-l-V{ -I8[J v(!vDxƏ]`0fgdk̽D@*" iw0H!w O4LZ`NWSZ=0޺xx[#w j*vIC`Z%>Fɐʰ5i%ۀY} 1:Θ,=oG24XmIVFs^tF<] ]K4Pw@}q?XIw˼^?Ν~zgO˜5.ZPXFloze2a. r6D3ٗ7OBbA]Β] *қ1\xj̽$~ (b"*2&z:hKI)ic{2266A'rٳ#;=&p_fȰ6?u qYFYV|V d燸rAsk Mg=FzX?"{J*L?{se1fnm;SϢg{΢ugM*ˊ~Ʋ?N\"g>uk ["rCLudCNeǃ z>a? ( y QK[y+w0%cUvdfҥ<߶/۸譴\SchlⱿ\V [Em|޲#YO,}=gC<<էc>o! j$UfDf X؟Z"xBQ}w27 !I&h }VEFq0"~&L=:ZH{1C\;"쑙4MqqM\\djj2q8,4-sxEi@ ;c9ݟIC(" TR4AE>. ]haD_$:KRt U#*i<"$ bJ[48׼3&N!3@_l ۄ=XАLRHĎ`NcWd;fgmDŽzI1ʖ6EChfBTUZt~l<~ߚC7A;{1.<G>R5T%r8]}PCF|1{Td{DƸd6_!G@pVYG`u&12;s;wY!s_7/\_9TmFs8u g%AWypj;"j@˚Ӻq/ j5c"5R cգkSʕA456+J`>oF6f36BrASE[)|޷: 0{:r,FV #73D(.DrAXJ$n?g:{9k"n.a ɧ j J`N=LPupUP_Ve:1!,C⌞X`%BzRrіV(W5(͕s÷BupsҘDXmsőkF . ;F#0׼3C{w`\JЯAH.;M`҈4kbEBX8PG`i[B5gj3Ongk2mtsҎ_# _Pnء75KeWD7x޸#0C#T> stream xڕUkTSWrsQkV뽙Z8B7ڢcb R@(H v%cuUGk:s;sPY_sZgw77B  9j-[4j攅A5}O!w,qg}kꅳB5iێu9^yn(0((xFOR)EAALl4 JTO1pKd&iԒXT'I"bwJ"n߾-2,|^`}b\@/]IBF$V!ZbJl 6-VbF|DD$E v.p#fAAWۮѮB @Au6Og ` O[RR)kSwijog<>O Y JfɸS\% Yöu6o:7Fkysت65*-Tt3gg+&* g)Z\\IahU):;L\g N@YY.>K Pq,I]Jak?JHvE Bs,Ğ9Eww.i4=DŽ0thiT4k+ @V`٣rvzwj :F俣z1 &%YG"'U,i})UL? Hک6X\F0oLbr~ f#pBK(1 _{0x4q /TĝSԍ\^SQB _!nIjHHHJR%4$642/iؕ/%C auE:-.lM$4!-1HYjKM? elq)Qq33P46L-FS_ |3pK)2)u0ѩ6<~2B՝6]:щBh@l:>Y{\@iɫR9{_M[$;y47mJ,\.tUgFAabϻYst<+uV#du[]CWt(׀; ( 2aEݿU#+ .aP-c$DB?DCWe%׮_}+W_Gl*|{=h$^Dqu9F -֗b HSq֐]]ỳ.3eދ7mtvDtBC/wʷ@0' !w[|>ܱ$$R.TtC$2'7Cg{翃D1E$-N+=bؼn`c@3K}h%u-2ʯAerq!8iC};TZ؀ss r&ڣV. 
ytܕPw_z(Mk)̚EN9&6 co?q{u6bFLyw`0a+&?V,V^;㄄(ItZ.K,24?a$ U:aꍹy֓<p;>NaJa嬰yA|?[ܡ)Jc;4rul)EΦ:\|4p01}BţnM,Ja_4 eMA KhD[ |ςңUt95\%=J%OKMI;mRcdEh<<)u%SZ]rc`(QM-v: endstream endobj 314 0 obj <> stream xcd`aa`ddp u M34,ILa!C<^Iy&Ú',AD@1PQ@44$83/='UH9(3=DXWHZ*$U*8)x%&ggg*$(x)34R3sBR#B]܃C5н&Ũ$Y|hj_e?X +|Rs^Ԫn@ؕ^{D;{Kod3OYz{DI3gr.<{qlr\,!> stream xڵWixTU"hjoP(҄(@؄IAJ*ܪ־$$TTV8,DYTzpZaq9NSi9ϓʹ}'LD"u5\.[%KߗxQҦ}饑 ?D٘0''Na?^-!>f/nkc(Q"+;gEgr8J";SVTEZQT(͓%>IJJ"볉 Wg-OL/J\0ea"91?q^QabFv^ĢWS7isMS7l~bO_#PgzNjEjxJ1T1U$L^j j5OZ@%QS+j-HJꨇb(5z.zBsPJ|r< ?6!H@OomJ=9OcCq}Ec-Ayn7CQ:^.A,AS(@5T~h2̍&8Cš,BB.*~=$rS< 4 ,f>}W׷``Ą>itP*) k ಺N4;øU\#88A+'= yd'm!eZoz@@X'u ?%q[ R VErA h7Nj`қtL[B lTn f-T)Vy{45˸@7zo\w jвf1 uB+Ѓ̍öeXN[|B @@om$02qb6>\C݃VNucIZ(y#3!h N?xPSD NGPhA {Z4)>43ixLjlWj7ˀN)=X{I+96Wtq0 A? }x5Ѫ B 8-vp;c3ɮA։g`r0>xFBD]25\1p!vm: b :]6;NS*xlfe#6?R4 { P@EYUQELȽPzu?An6ˏ&—W CV4ebm Iuk@Ǿl1 U#Xf޽-vmL[H5Q @Ȏ8`p9d"U+uZ/gF;~"`A['R"ڧ"-8ORHJύC.Gp/s`4 Ƅ[vV='`,i- >H+^ ʪYk*Z-4f~u0'E~ DgXmePSΗ+ ;_%:ǒNµ붱\cwy^liFjvQ]ދf 5p$ fcn??r!Sh@- `et$ά6&h8fkXk⫌j[]5 N\&l&IGY<'K/:_yfhn<q??`g6H=S6u:vY^apHtT@->>k-vCIiPWP1U.kR7ww9"~'9o&|\}bZmV- 8G ,V> FP^k&K.M$&6vhLc%-֞=QqDkkH*=;[ujד;-)j|$=`;o ^*u]ɫޣ)QJI;vcuGqzIXbS~ ϊÏp͵2Y@^_ɦ{ ԃ@I6wӝϺbq5u-J9꺪L;6mԀ$iť#Ku羨;o~)+X\Jwh ί^ 5;yqpa5L;hV^*[.݊Y5߄K܉}/Abt{q᳻GQjXz5&IjzH X3؍IKNtk8N\RyQ 3] TɌodv.(k7@}G#xIRJʭq5d`2$4-﨣$#[3 ;<_G+3Gs (cgP"Ђ1Kޏ4 6ؑ+Ҝڳ?'OP+דuٷк|h"]wBwtH Lmts;v  =iЫ6w1ήP3H[ކ`1婷yᑻ=Wre:_1-O1̥ˁު}=?V난Cv7I 59*R`;b) ތC6*~v$6n9z8%Je%,(z_t:Sd'Xxug v RM^]ԧGPщH {І↪(K`>QV%/|gq![}tzN/;f}bm)GzR>|8IJwb; !T]ڍvA6蝼Kk.zC8|cJ/ h2+\%mlK &]]֠ԑƵƟU)@.o>UJHlh< F5v:m; ׏ q)OuOicӃ K||Rk Ý endstream endobj 318 0 obj <> stream xڍV TT>39*ar/3u]B LTQ"&DF:g9-QU{-Yif#iuk;{ֿ9?rp$k@`AI[<݃cR\q"%NTea(?I%gNcvbL&+7$6:%bI25q1)WmTz#"ҷ'ĩ"F=M^/HU\?(Q}Gڝ~ꛘ:sH:kE}ֳdC%ڄ쏗bw< o{"?ܐ7M+3ܧIP)DwӞ Sɚ8ܷ_'[8WIWL{.sNJC h5 w@/T X(rk*̮]Xsaa~m,cWSV"GY0pYA@4v\` yЁޜy{Y 8K[IsAq -$F2IU@BQeOl̀vE-!\Z-dncK}U6%)N%Io]A,oWNjI' ιSO%+јc MY1 zf6`oԔJ xvK.=o}2`X) b|pj.g΄ZLBIYU}$X+2뷘`BQ[Nxa- =;9<2{`:r@7ˍ(v!.ə!wnMo:ouv]5d.VHMňf&1389Ɗne;ş0eJSQ\2ic&{_䶫C nx 
5<2\oV&hF177ލvSyh:X1GZPѼpW56+DVِVa8UnUԢu vC3TꛊA ):r""CDB~^CC#.23kli(ϐ64&1Ɇ6Z4$V.j)R _W/į `TX %gHgXy]gjLhxեsoFPo"蒠]2Te3nuIdtCÃl<)/3GDZŹ`ew3`zP_|Ch b:f߆前;[v76P]n[E{6j&^[[(N*ư4qpð%{#zizs?5Xq} _f_Yor֊ErDo4FYo*"M/N\"})ǡ0MaN-<3M&v½FNG/GJ(CӾJt?$"M}I;_yA?kE0F//.=ċ2=UZ9)yl7jaT~+"UMiNKOl{#y+{HBDw/4;'q?(X7<ƕ rҎ>ii-H:w4x4`Ǡ`M} !q59^ sݧj# {49~q 8\@:o\ 3W᜿sV&#[UlK?ehI(XHmM/, =gB@.bE#kHvU(dRZ=Ͷu=FF'.kL;*;! rw,tYx7T^o42lƔ. 6\ Y0z`ˆ@o [Jf4lfNB/J j,~|qK& !//{auD]R-̢F;$9*w clX&d,7wBLҲ'8߯O\G endstream endobj 320 0 obj <> stream xڍ{pSUo6\(qo-"*/uX\ }ЖPGKKIӦM4wNM)MRJk[*V]˪ܝqO.{ ;/;9;G@`?'UWH+K/ڸgRZ:q#HMKN_L9񖌹!X8+߆#UwhgK{F&o)Wz[_e4d=8+[ZZ!SVgIvge/[A˳ȪJI+ffmڳ-ksѳm\Pd5p*)b}B*(O"vq;q/qq?0XBs6m2b/B{ k"8GC/Zьw5wIBT&] @tXv+6dj)VB]S:/DH|DW.]M+D9N-=f4LTi|x p;.D_ 24vzJ!Yb4>/Ot"L;81Y0* t`иՠrTbPE_\$?hn$5 L"E?}1_*-tX`Z)ơ4nc4 d5Z h-'A igCpFEҍҁ ] |̃$(>7ݘ)~z*~!wH$FZXQ5\HcN1=x^kaj)'srAūhp#^ۦ*/'k@·{EdĹM }}rRrׂnJN}]L.n Quj-it^˄(A9@K@f ؂V,[CRے\ƏP@9 iLn=SOU2Ps*^?9Ϭ 2K^k*hH?xMG 60K2KCzi8+OJic*x#b٫^jY k63b 7A lfK3/H>GD~M+v3uTUҨa Q ~=㚙\;D!p9섏B/'mtXJ13HU;n7DYU &.0OԂր5u; @V#1!O4W?̹}h`\d߈Bg Gw`|wk1cG;GQHGh >hx'o)UzIȽc {6,O43Z6 INТӂwG\bb4u噵VUu8#>A A4 4&b 4ׂt#$hZ@?~B %n~6 No02ܝ`9rdȣUR-÷lL_+v]D&Rb}x&nIlb* }JtKFJlZ?G rFg305(VkցylߞBt>W^q ,?&r0VX-e<;bGm;L,{k7#ff~8y'^ % xU=Gk.MG3qv8:Dg_J83u5q4 (pgpgFCRM2;91P;[K픖,Qgdm%ߦ MuA;O5И l3z 6kF+Nu-m'}X~p|v6R P7γ+rpjux (5^*@1(* ^F|av'fGz8-{r#T?՚j"Qvh|.[qG\=޾c#i2mg 6Jel .-Y2i[ЂɭTcv_yE*( P[(W7'gnQ6[6/J2o_͇ vH_3R\m=HT/y&dnYf%4\3{85zL~Tu%H%oeNdNi9p8qۃ\؎ endstream endobj 322 0 obj <> stream xm_HSqzjsdPؽ?3IW[naAj3sZ.36xݵ ܃OejE^z"zASQ= fп߽VvD ||΁ao=>_@ǻ;l uJhZ&@ "u4x!OfgpsF ځBmЋ'nxH u"P==WyyE ih jw."QBMrQQAPg@ #9NΠV]y[[O\jH76gEx ^Y&ݼu O4*VpO GcSx\V)<%3Y-gJyRGD* POg*Rl"Q&bDED Nka,-!-hrxaˮOo C)c1˛U2 mP::0' AJ,Z7wav!cy_ֳKݴIRIyR+?g>sOߦBj֟M?l"cI|] O_L#mژ?S|mg1f9dXrtxxhlе1.2 endstream endobj 324 0 obj <> stream xڅT{lTU?3sd%3.i BEFڰ<*-:-MN;39}Lo^N[ "(#Y5뮉 貜3sngJc7|~b"K6o|囷l19<;jm+ Wl쳚M2Dty}RCWhҭ)\8P*{GT̃n#hA(-DKQZ6nTڐqt 
uX+RoT;͵53flν^c69onrliqۻyu4XMvSUnuCmͽfUAaS5LWՊ<VyCɴ쨫MU-V[3/ʏeeZ-۪rNbw%۷UXV?MV}/\D\kNԈ> Aa8q =rD_V|>mI'FMv6"}!95/TTH,$i7c ])1?TQW&}O cfA ykpAŶNs;*clP1<C龢!g@줒=FֹG6^ޠ/ŋϜbOV[ ߳qekw/D(9?d^$@btrnO6Isđ&52O^*% d(g0~ k$O !q: }橻@-5LWOՌt!b<:#`Z6KE4w%Y|x׈:@7Wo\*)b_ Pk V>~Yc2}~Dw F qot|GޢYi)IzO&1o7s~|Vɲف5jM [gϴ#zS+NKEwu@ $iYBLC:ZbvFhI o%7ն T13!O?@p_}CK韕D}r @A%N4ep?X (c XrYS ra_tazYBd6vQ`0" HMRIɁI BG* 4_tG@ڠ(F1INk?:=S]/t}{F[ N 7sX0+~ct{6"iʶ"ҙ>u5b5R*?xscpА*dMn<'%}Jڟ>Rz/9^o0ަe!:#9ܢHZMB#m۵_3e%f(>-MLVGp}EJDwKlȫaO]NuHJ=V._'mS C?Qay7|65&يK enl(-#GrCtqB0W[n˘͘ Nν2o|, ¡hDFPx0ʟx> stream x][Ha㤫yڐ&[`jj$ں;j`x@>)ILYt́P/DH#H$nl{ )9G 32\c.B$"2b#T;e$B{&nR&S|5 %EN;j/Xhk춲Z&e3)&RR*yI7d,[,R+eL~1s.f>Vj*-(+3Q EZ! )P5Smx4pCP={^jj-I5-uLd>< Ґ79=ۣdHnٴ(a~qaÇ遅)LзL`>A!X:N*wVupg;^):M=`K rhYm*VWVDڗU9qt]u%43T?WƼ3o7잗¤8あǗGctw߽= 􏩤D=3S+lR фUm08y |!%91@5C4JKgmx2AmD]OxGw(*u"ܼ_T'ZOL{;oxT:?E&h@/-\.%h\]n}-ht{^8# endstream endobj 327 0 obj <> stream x]Pj sRJL=I%Ҵht 1} {y̼)Vr 0j<.vaIRV {#E !fy{/ܖ/pQΛC;m 8wf(c% Ok3չ_s򶓴 '$za&$XW('칊LJ㗸;Tݞ\s4\YTmO endstream endobj 329 0 obj <> stream xcd`aa`dd t v54TC Yr?Yd { TX9j *23J45 --u ,sS2|K2RsKԒJ +}rbt;M̒ ԢT kiNj1A@g30110]gZ}?˺D׳}]3zqʴI}݋9˱͝=W{r뤦޴QgWL<ss|7slݕU')=%rv\FͩnjG{Ţ7޽@-Mrպr#T*8~~]н䃭~T6-˻peڽy.w6?ٻ &Wr|ZtROY {g*_~'Mg̵{ b1';ч endstream endobj 331 0 obj <> stream xڅT{LSW 而Y 4 D])y2)---@WZ(GwVAxLeMbĹ\ dK$y|9;88r踴%bJ2Igmf5Ƭ<3y7{aGxOx0+C=x߃x ļq\.-Vmx}@U$9S"%, (2\+ΔʕRHL.+Ih<_#e٢C䨤dQtRBjb^ˋ2NIJpVta`{h,KRT, a8u|6^{"/YXE3z̡ jI5@}MQNEnr8m&WhB#ѮI g@>rv=:&5 G{[ݲq^{~]iUA91;H/t O.ϭoTQOSe蹛`~j?$kUƼYDZ/)sol\CrTPOY}}''\{u(DiYFgoٵQ z,Öi J8q\4Q/mCd C6!p#gݥ7x]t.=9s|,.+Tc6 bJ<96::E*n?Q,`\ nZ6 N;t:4PNm}W_::~L mڲZ)Wo1㧫~y>\B41ǗV 3_  Pi 5\*Bc8H4԰OJ*"~L6FvCoc1f#nŪ`v8=1D<b<[w>}@q2G%)OZBk&޶َKg!GGF_VdGzcIcnB -@:˥B&{gcs{*z6 >ĥaZ;g aKGHC7s< g*mHHN L&~Įx VyUjj]up+H$cpLDơ((R2MCg#ߍҔ8 3($Ȧ: #ҕ2 \ge-,QqvW@U0Bmz:R yz1vp@;(ԕU# &x|_zVx۾3OFfGG۪h,D೾k̭ELe VHsGǜ{+VEAIBW")"j4*O;Yfn"[f?_gޗn064sN߰d46by-: endstream endobj 333 0 obj <> stream xcd`aa`dd M34 JM/I,If!Cy?Nɰv*RD>@`b`edd-1300q/,L(Q0200 I Nz ^ٙ y) ^zz ~@L<ԌĜ44`נ` 
Ѐ`M= rsa:E̜e}+W:ccecs֭3gn.<{޴ XBy8'pupgRoo %[&N0Wp endstream endobj 335 0 obj <> stream x]n0E /E摇(>$@̐"cwCz<8(6,/7#̲M`N7BEƊxoG3 |ZaAF˺.i6(LH|civڷ^Ht-\\r[{,C>b7zlaט+, sUU.TG.iF*rd\1rDz1G sBr\z=<3N" 8]F̼E9 25߁Ef E${/$EiUtVzJӊ7 %g9l ԋ)*C endstream endobj 346 0 obj <> stream x]Mn0tQET`,Cܾ1ʢ Ûg{WE?f5,Tg`Fk6( rvʱў6c>K#AU۾*u^`T?$߼nj kߦ3]*SxiJvHN̺`u/ %ez7f Ej[S, չe> -=# L8,G>&:Qr@>&>cN'o/`T;DʓI ^Ic}  endstream endobj 349 0 obj <> stream x]Pn -/EH@",C`X0.غ"ŲcF3K9DzL0#~ a8RqF*2|%-E6Voqܧ5 .:'=Ao&S\/[;jMp5}K0Rׅ(qRanB"AmM3#Qe$Mr"|8/cccl>]%Ƭܢۄs6T?fo9 endstream endobj 358 0 obj <> stream x]͎0}.ńhBH4a1?\DJS·^,_e]j<6l~Pyz A( 2o9G?xJDZOHzxyZsyfk| ڗ W+C;,YQxɩYL0,x~< T[-^ZH8iӂ'.+bg'g78c8A>OJnp!8BnpqXe>2˼/:QR)[tT"&P!Y8t1';,6K0ƕw7k=(xzҸ?v> endstream endobj 361 0 obj <> stream x]n0 yWCVX%h!zώ;qv;AYW|I7~0y; (ݠuzlʷ־#n|>Q*;y u]Wc^`M?<R_k^Cn^Op\\6ܭf( .jl[ 5Wy2? erwP4L}!xOƞ3|@N +fE\1W3++++眊r*UF s\OJ|!9!f; jzkSِ9}5Jߴ9~4߳(<0 endstream endobj 366 0 obj <> stream x]n0E|,E6iJ)%EbчJR0q [ggF3qQ*k4Rhƻ vs3GSQ\|0% 3W}KmH8?T?&C%,llbli_ZeKQ{+u$sߎ(pRf(K#نW$2 jU'骧1\,ogx]p}oSJ{8].2OVr/?nL/qB5*Wk endstream endobj 263 0 obj <> stream x\s۶޿N;x?t2MĎ8|P$֭,&?X)RRg3"bG(L)Ɍ376N5e 44gF(:eg֋LX9m2/HXy/0Wp `BWqc J?…Z0bX/*eM?]ϲаWE)?/Uplp?\|HZВĤ$&%1) ]$*("*+ꮨ;,S]QwM5u]SwM54C6Qll%CQwƴĝ%,qgIDY96;[ܹ;Gc:E $l\{WmȄOGs$SaaBܰ= }O ;8 8 Mh,$j51s]x^Yxe╍G#5N$M SDVDd%:䐡$C˽uǑ|%JXgjs)l/e@VoL ʡs y<:B] >./'[dLdbB$:4.YYŐɩoY=׽`6; SVZ ۲='&1!!\k_a*Gc,1pY0Q1^x/<# ][=e(7@a -QXx2zUW\?實墸D0L*(Ռ:,CP{SPP˫2Ujx"Dcߢ^|ɟӛ? & p~YMQYx?H@i3b_?n16 IgSbRWٺM 7bdS'x/AZY\FטN`` {4*G%I>l`vhzjveeCX1q uǘU=JC]kmprZRYs P+4q%+9cP"ԫn P@&%1cHD"量:W5FpY\K'LEl}kc 3!|WC#Ng]0¨i&_LPK!F8m:,A8]ZhbT55swa7VZC~C w/qiө: 6VybLjJhӎ++8XS6yB,Q9֦7=Uk[S]xn~]!6Lep@26zCԾ=[p@E8tFG{tVdC@'45р @&C~X56-Vl&EŒ[S +KQLIT_\P!&t\9l[iiAy 5UVDGuDZC@GH1'gzי+Ilh-d4sMVL 8,> \aqQ`K_Y# Z7@n.EHXN@O c!0+hRQm S J+;&4OK;*ZR4YLWҤyb;9JRr< 8dW*/4z&gdK"z)7'+S&Zi?mg(lD2Z?kxu^ |8p\`![S?q[rҪmm9ۛVd1?K3N@ݱr?e1]ep\Э(5#T?\|餼Zs<\GW% N02^LpH"/^ϧPW@ m?0ē>|hb6 rzS. 
MCA~4y\L/x_= 8VfX~^om\p6\e*}ps猸ţ\O鬀Ń01/EY,Iy"/E_,Khg_Ug1Njb?ȏY<Q7(I*:&_櫼_ןg))˖ -H} ){Aw/68<.wFlK\ \ǺUMpG0Ğ|L]FB㷿7/ף9/l,t'2'+`Wx6n/G"q0? fQ1`_dZ,t';nG?Xg](&D\.Z>n^E\FfUlGoYlh/ #ˁoGճgbUr^ T!D'Vuyvn-Md40io0II0vͷ;wO.dH0 s[η҆`LϳYqQ7d\|F ol2&.BP. l$"O!視R~LR~?I~,1mL[; wxN/@}k'4}_[ ux`+0hyYCW*F~gx>kж uk--c"x~{mל>z+؁5 XxS,M,DnMR̊k6r:.EΉA6:uq[8<|!뗲n >N(tq_9-Z@>O/VGg&yT݄^+$f]7)7wK|9~hSL6$ֶC*n=yqaY msO52uV[ "c^"o eI'0L3q7xtf#gb7iFw/_ml1ZDJ\nt/^Kwgo  ×Z؟׳YQ&M_I ;^?^TGnv BHZz4c!~Qr_! "{KUQ]sMNH]o;}Ppޗ,8tqNvXӺuo´&ng~ѿ/Ɍ?k&:s[vH&9 ࿤<_ endstream endobj 372 0 obj <<1a7155de541053e0f4484d23720f8a05>]/Root 1 0 R/Info 2 0 R/Size 373/W[1 3 2]/Filter/FlateDecode/Length 853>> stream x-WUAEMqgETLL1TyN4Mr*'4Uݴ O+Q泞}k3/"L6`Nae&;M.I/6_cNmFa$F,M{N4vױ#v]0b7o`쉽xJ!9{c0cwULhJp.CdY o ,Ra8GHq b2M,v?kM9Sia%.mDc,ii4O GN ĉjT3NMlNTiNt|3eMpk#˓/d~8!E\J˽y&p>^/…s»!ly&%/ ej|4`