densityClust/ 0000755 0001762 0000144 00000000000 14555733152 012756 5 ustar ligges users densityClust/NAMESPACE 0000644 0001762 0000144 00000002300 14471055374 014170 0 ustar ligges users # Generated by roxygen2: do not edit by hand
S3method(clustered,densityCluster)
S3method(clusters,densityCluster)
S3method(findClusters,densityCluster)
S3method(labels,densityCluster)
S3method(plot,densityCluster)
S3method(plotMDS,densityCluster)
S3method(plotTSNE,densityCluster)
S3method(print,densityCluster)
export(clustered)
export(clusters)
export(densityClust)
export(estimateDc)
export(findClusters)
export(plotDensityClust)
export(plotMDS)
export(plotTSNE)
importFrom(FNN,get.knn)
importFrom(RColorBrewer,brewer.pal)
importFrom(Rtsne,Rtsne)
importFrom(ggplot2,aes_string)
importFrom(ggplot2,geom_label)
importFrom(ggplot2,geom_line)
importFrom(ggplot2,geom_point)
importFrom(ggplot2,geom_segment)
importFrom(ggplot2,geom_text)
importFrom(ggplot2,ggplot)
importFrom(ggplot2,labs)
importFrom(ggplot2,scale_color_manual)
importFrom(ggplot2,theme)
importFrom(ggplot2,theme_bw)
importFrom(ggrepel,geom_label_repel)
importFrom(grDevices,rainbow)
importFrom(graphics,legend)
importFrom(graphics,locator)
importFrom(graphics,plot)
importFrom(graphics,points)
importFrom(gridExtra,grid.arrange)
importFrom(stats,cmdscale)
importFrom(stats,dist)
importFrom(stats,rnorm)
useDynLib(densityClust, .registration = TRUE)
densityClust/README.md 0000644 0001762 0000144 00000011200 14471074477 014234 0 ustar ligges users
# Clustering by fast search and find of density peaks
This package implements the clustering algorithm described by Alex
Rodriguez and Alessandro Laio (2014). It provides the user with tools
for generating the initial rho and delta values for each observation as
well as using these to assign observations to clusters. This is done in
two passes so the user is free to reassign observations to clusters
using a new set of rho and delta thresholds, without needing to
recalculate everything.
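The two-pass workflow can be sketched as follows. This is a minimal illustration, assuming the package is installed; the threshold values are arbitrary and chosen only to show that the second pass can be repeated without recomputing rho and delta.

``` r
# Sketch only: the threshold values below are arbitrary illustrations
if (requireNamespace("densityClust", quietly = TRUE)) {
  library(densityClust)
  irisDist <- dist(iris[, 1:4])

  # Pass 1: compute rho and delta once
  irisClust <- densityClust(irisDist, gaussian = TRUE)

  # Pass 2: assign clusters, then reassign with a different pair of thresholds
  fitA <- findClusters(irisClust, rho = 2, delta = 2)
  fitB <- findClusters(irisClust, rho = 1, delta = 1)

  # The rho values are shared by both assignments
  print(identical(fitA$rho, fitB$rho))
}
```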
## Plotting
Two types of plots are supported by this package, and both mimic the
types of plots used in the publication for the algorithm. The standard
plot function produces a decision plot, with optional colouring of
cluster peaks if these are assigned. Furthermore, `plotMDS()` performs a
multidimensional scaling of the distance matrix and plots this as a
scatterplot. If clusters are assigned, observations are coloured
according to their assignment.
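The idea behind the MDS scatterplot can be sketched in base R. This is not the package's plotting code; it only assumes that classical scaling via `stats::cmdscale()` is a reasonable stand-in, and it uses the known species as hypothetical cluster labels purely for colouring.

``` r
# Base-R sketch of an MDS scatterplot; not the package's implementation
irisDist <- dist(iris[, 1:4])

# Classical MDS projects the distance matrix onto two dimensions
coords <- cmdscale(irisDist, k = 2)

# Hypothetical labels for illustration: species stand in for cluster ids
labels <- as.integer(iris$Species)

plot(coords, col = labels, pch = 19,
     xlab = "MDS dimension 1", ylab = "MDS dimension 2")
```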
## Cluster detection
The two main functions of this package are `densityClust()` and
`findClusters()`. The former takes a distance matrix and optionally a
distance cutoff and calculates rho and delta for each observation. The
latter takes the output of `densityClust()` and makes a cluster assignment
for each observation based on user-defined rho and delta thresholds. If
the thresholds are not specified, the user is able to supply them
interactively by clicking on a decision plot.
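To make rho and delta concrete, here is a base-R sketch of the two quantities as defined by Rodriguez and Laio (2014), using the simple cutoff kernel. It is not the package's implementation, and the cutoff `dc` is a hypothetical value chosen for illustration.

``` r
# Sketch of rho and delta with a cutoff kernel; not the package's code
d  <- as.matrix(dist(iris[, 1:4]))
dc <- 0.3  # hypothetical distance cutoff, chosen only for illustration

# rho: number of other observations closer than dc
rho <- rowSums(d < dc) - 1  # subtract 1 to exclude the observation itself

# delta: minimum distance to any observation of higher density;
# the densest observation(s) get their largest distance instead
delta <- vapply(seq_along(rho), function(i) {
  higher <- rho > rho[i]
  if (any(higher)) min(d[i, higher]) else max(d[i, ])
}, numeric(1))

# Observations with both large rho and large delta are candidate peaks
head(data.frame(rho = rho, delta = delta)[order(rho * delta, decreasing = TRUE), ])
```

Plotting `delta` against `rho` gives the kind of decision plot on which thresholds are chosen.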
## Usage
``` r
library(densityClust)
irisDist <- dist(iris[,1:4])
irisClust <- densityClust(irisDist, gaussian=TRUE)
#> Distance cutoff calculated to 0.2767655
plot(irisClust) # Inspect clustering attributes to define thresholds
```
``` r
irisClust <- findClusters(irisClust, rho=2, delta=2)
plotMDS(irisClust)
```
``` r
split(iris[,5], irisClust$clusters)
#> $`1`
#> [1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
#> [11] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
#> [21] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
#> [31] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
#> [41] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
#> Levels: setosa versicolor virginica
#>
#> $`2`
#> [1] versicolor versicolor versicolor versicolor versicolor versicolor
#> [7] versicolor versicolor versicolor versicolor versicolor versicolor
#> [13] versicolor versicolor versicolor versicolor versicolor versicolor
#> [19] versicolor versicolor versicolor versicolor versicolor versicolor
#> [25] versicolor versicolor versicolor versicolor versicolor versicolor
#> [31] versicolor versicolor versicolor versicolor versicolor versicolor
#> [37] versicolor versicolor versicolor versicolor versicolor versicolor
#> [43] versicolor versicolor versicolor versicolor versicolor versicolor
#> [49] versicolor versicolor virginica virginica virginica virginica
#> [55] virginica virginica virginica virginica virginica virginica
#> [61] virginica virginica virginica virginica virginica virginica
#> [67] virginica virginica virginica virginica virginica virginica
#> [73] virginica virginica virginica virginica virginica virginica
#> [79] virginica virginica virginica virginica virginica virginica
#> [85] virginica virginica virginica virginica virginica virginica
#> [91] virginica virginica virginica virginica virginica virginica
#> [97] virginica virginica virginica virginica
#> Levels: setosa versicolor virginica
```
Note that while the iris dataset contains information on three different
species of iris, only two clusters are detected by the algorithm. This
is because two of the species (versicolor and virginica) are not clearly
separated by their data.
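A more compact way to compare species with detected clusters is a cross-tabulation. The sketch below assumes the package is installed and reuses the thresholds from the usage example above.

``` r
# Sketch: cross-tabulate species against the detected clusters
if (requireNamespace("densityClust", quietly = TRUE)) {
  library(densityClust)
  irisClust <- densityClust(dist(iris[, 1:4]), gaussian = TRUE)
  irisClust <- findClusters(irisClust, rho = 2, delta = 2)

  # Rows: species; columns: detected cluster
  tab <- table(iris$Species, irisClust$clusters)
  print(tab)
}
```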
## References
Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of
density peaks. Science, 344(6191), 1492-1496.
densityClust/man/ 0000755 0001762 0000144 00000000000 14555723732 013534 5 ustar ligges users densityClust/man/densityClust.Rd 0000644 0001762 0000144 00000007613 13173652543 016520 0 ustar ligges users % Generated by roxygen2: do not edit by hand
% Please edit documentation in R/densityClust.R
\name{densityClust}
\alias{densityClust}
\title{Calculate clustering attributes based on the densityClust algorithm}
\usage{
densityClust(distance, dc, gaussian = FALSE, verbose = FALSE, ...)
}
\arguments{
\item{distance}{A distance matrix or a matrix (or data.frame) for the
coordinates of the data. If a matrix or data.frame is used the distances and
local density will be estimated using a fast k-nearest neighbor approach.}
\item{dc}{A distance cutoff for calculating the local density. If missing it
will be estimated with \code{estimateDc(distance)}}
\item{gaussian}{Logical. Should a gaussian kernel be used to estimate the
density (defaults to FALSE)}
\item{verbose}{Logical. Should the running details be reported}
\item{...}{Additional parameters passed on to \link[FNN:get.knn]{get.knn}}
}
\value{
A densityCluster object. See details for a description.
}
\description{
This function takes a distance matrix and optionally a distance cutoff and
calculates the values necessary for clustering based on the algorithm
proposed by Alex Rodriguez and Alessandro Laio (see references). The actual
assignment to clusters is done in a later step, based on user-defined
threshold values. If a distance matrix is passed into \code{distance}, the
original algorithm described in the paper is used. If a matrix or data.frame
is passed instead, it is interpreted as point coordinates and rho will be
estimated based on the k-nearest neighbors of each point (rho is estimated as
\code{exp(-mean(x))} where \code{x} is the distance to the nearest
neighbors). This can be useful when the data is so large that calculating the
full distance matrix is prohibitive.
}
\details{
The function calculates rho and delta for the observations in the provided
distance matrix. If a distance cutoff is not provided, it is first estimated
using \code{\link[=estimateDc]{estimateDc()}} with default values.
The information kept in the densityCluster object is:
\describe{
\item{\code{rho}}{A vector of local density values}
\item{\code{delta}}{A vector of minimum distances to observations of higher density}
\item{\code{distance}}{The initial distance matrix}
\item{\code{dc}}{The distance cutoff used to calculate rho}
\item{\code{threshold}}{A named vector specifying the threshold values for rho and delta used for cluster detection}
\item{\code{peaks}}{A vector of indexes specifying the cluster center for each cluster}
\item{\code{clusters}}{A vector of cluster affiliations for each observation. The clusters are referenced as indexes in the peaks vector}
\item{\code{halo}}{A logical vector specifying for each observation if it is considered part of the halo}
\item{\code{knn_graph}}{kNN graph constructed. It is only applicable to the case where coordinates are used as input. Currently it is set as NA.}
\item{\code{nearest_higher_density_neighbor}}{index for the nearest sample with higher density. It is only applicable to the case where coordinates are used as input.}
\item{\code{nn.index}}{indices for each observation's k-nearest neighbors. It is only applicable to the case where coordinates are used as input.}
\item{\code{nn.dist}}{distance to each observation's k-nearest neighbors. It is only applicable to the case where coordinates are used as input.}
}
Before running \code{findClusters()}, the threshold, peaks, clusters and halo
values are \code{NA}.
}
\examples{
irisDist <- dist(iris[,1:4])
irisClust <- densityClust(irisDist, gaussian=TRUE)
plot(irisClust) # Inspect clustering attributes to define thresholds
irisClust <- findClusters(irisClust, rho=2, delta=2)
plotMDS(irisClust)
split(iris[,5], irisClust$clusters)
}
\references{
Rodriguez, A., & Laio, A. (2014). \emph{Clustering by fast search and find of density peaks.} Science, \strong{344}(6191), 1492-1496. doi:10.1126/science.1242072
}
\seealso{
\code{\link[=estimateDc]{estimateDc()}}, \code{\link[=findClusters]{findClusters()}}
}
densityClust/man/plotMDS.Rd 0000644 0001762 0000144 00000002705 13173652543 015345 0 ustar ligges users % Generated by roxygen2: do not edit by hand
% Please edit documentation in R/densityClust.R
\name{plotMDS}
\alias{plotMDS}
\title{Plot observations using multidimensional scaling and colour by cluster}
\usage{
plotMDS(x, ...)
}
\arguments{
\item{x}{A densityCluster object as produced by \code{\link[=densityClust]{densityClust()}}}
\item{...}{Additional parameters. Currently ignored}
}
\description{
This function produces an MDS scatterplot based on the distance matrix of the
densityCluster object (if only the coordinate information is available, a distance
matrix will be calculated first), and, if clusters are defined, colours each
observation according to cluster affiliation. Observations belonging to a cluster
core are plotted with filled circles and observations belonging to the halo with
hollow circles. This plot is not suitable for large datasets (for example,
datasets with > 1000 samples). Users are advised to use other methods, for example
tSNE, to visualize clustering results for such datasets.
}
\examples{
irisDist <- dist(iris[,1:4])
irisClust <- densityClust(irisDist, gaussian=TRUE)
plot(irisClust) # Inspect clustering attributes to define thresholds
irisClust <- findClusters(irisClust, rho=2, delta=2)
plotMDS(irisClust)
split(iris[,5], irisClust$clusters)
}
\seealso{
\code{\link[=densityClust]{densityClust()}} for creating \code{densityCluster}
objects, and \code{\link[=plotTSNE]{plotTSNE()}} for an alternative plotting approach.
}
densityClust/man/findClusters.Rd 0000644 0001762 0000144 00000003672 14470606244 016472 0 ustar ligges users % Generated by roxygen2: do not edit by hand
% Please edit documentation in R/densityClust.R
\name{findClusters}
\alias{findClusters}
\alias{findClusters.densityCluster}
\title{Detect clusters in a densityCluster object}
\usage{
findClusters(x, ...)
\method{findClusters}{densityCluster}(x, rho, delta, plot = FALSE, peaks = NULL, verbose = FALSE, ...)
}
\arguments{
\item{x}{A densityCluster object as produced by \code{\link[=densityClust]{densityClust()}}}
\item{...}{Additional parameters passed on}
\item{rho}{The threshold for local density when detecting cluster peaks}
\item{delta}{The threshold for minimum distance to higher density when detecting cluster peaks}
\item{plot}{Logical. Should a decision plot be shown after cluster detection}
\item{peaks}{A numeric vector giving the indices of the density peaks used for clustering. This vector should be retrieved from the decision plot with caution, as no checking is performed.}
\item{verbose}{Logical. Should the running details be reported}
}
\value{
A densityCluster object with clusters assigned to all observations
}
\description{
This function uses the supplied rho and delta thresholds to detect cluster
peaks and assign the rest of the observations to one of these clusters.
Furthermore, core/halo status is calculated. If either the rho or delta threshold
is missing, the user is presented with a decision plot where they are able to
click on the plot area to set the threshold. If either rho or delta is set,
this takes precedence over the value found by clicking.
}
\examples{
irisDist <- dist(iris[,1:4])
irisClust <- densityClust(irisDist, gaussian=TRUE)
plot(irisClust) # Inspect clustering attributes to define thresholds
irisClust <- findClusters(irisClust, rho=2, delta=2)
plotMDS(irisClust)
split(iris[,5], irisClust$clusters)
}
\references{
Rodriguez, A., & Laio, A. (2014). \emph{Clustering by fast search and find of density peaks.} Science, \strong{344}(6191), 1492-1496. doi:10.1126/science.1242072
}
densityClust/man/clusters.Rd 0000644 0001762 0000144 00000003127 13173652543 015666 0 ustar ligges users % Generated by roxygen2: do not edit by hand
% Please edit documentation in R/densityClust.R
\name{clusters}
\alias{clusters}
\alias{clusters.densityCluster}
\title{Extract cluster membership from a densityCluster object}
\usage{
clusters(x, ...)
\method{clusters}{densityCluster}(x, as.list = FALSE, halo.rm = TRUE, ...)
}
\arguments{
\item{x}{The densityCluster object. \code{\link[=findClusters]{findClusters()}} must have
been performed prior to this call to avoid throwing an error.}
\item{...}{Currently ignored}
\item{as.list}{Logical. Should the output be in the list format. Defaults to FALSE}
\item{halo.rm}{Logical. Should halo observations be removed. Defaults to TRUE}
}
\value{
A vector or list with cluster memberships for the observations in the
initial distance matrix
}
\description{
This function allows the user to extract the cluster membership of all the
observations in the given densityCluster object. The output can be formatted
in two ways as described below. Halo observations can be chosen to be removed
from the output.
}
\details{
Two formats for the output are available. The first is a vector of integers
denoting, for each observation, which cluster the observation belongs to. If
halo observations are removed, these are set to NA. The second format is a
list with a vector for each group containing the indexes of the member
observations in the group. If halo observations are removed, their indexes are
omitted. The list format corresponds to the following transform of the vector
format: \code{split(1:length(clusters), clusters)}, where \code{clusters} is
the cluster information in vector format.
}
densityClust/man/figures/ 0000755 0001762 0000144 00000000000 14534577027 015201 5 ustar ligges users densityClust/man/figures/README-unnamed-chunk-2-1.png 0000644 0001762 0000144 00000070062 14471074477 021702 0 ustar ligges users