dynamicTreeCut/0000755000176200001440000000000012670402626013177 5ustar liggesusersdynamicTreeCut/NAMESPACE0000644000176200001440000000015412670334431014414 0ustar liggesusersexportPattern("^[^\\.]") importFrom("stats", "cutree") importFrom("utils", "flush.console", "read.table") dynamicTreeCut/Changelog0000644000176200001440000001127112516111576015013 0ustar liggesusers 2015-04-22: 1.63 . Bugfix fixes integer overflow problems with dendrograms that have more than aout 46k singletons . Bugfix makes handling externalBranchSplitFnc more robust . ExternalBranchSplitFnc can now be supplied as a simple vector of function names, rather than a list. . Fractional deepSplit is now accepted and will lead to linearly interpolated settings of minGap and maxCoreScatter 2014-05-07: 1.62 . New arguments minSplitHeight and minSplitAbsHeight allow the user to specify a height below which branches will automatically be merged. 2013-11-22: 1.61 . New argument externalSplitFncNeedsDistance can prevent unnecessary passing of the dissimilarity (distance) matrix to the external split functions (if they are used at all) 2013-11-19: 1.60-3 . A separate error/bug that could result in incorrectly interpreted cluster assignment fixed 2013-11-16: 1.60-2 . Occasional error fixed that occurred when only one cluster is present 2013-11-03: 1.60-1 . Occasional error crash in the function fixed (reported by Axel Kohei Glaubitz). 2013-08-20: 1.60 . Major clean-up: all expr- and stability-related merging criteria are removed. Instead, these should be supplied via the externalBranchSplitFnc. This cleans up the dynamic tree cut and puts the correlation network-specific criteria where they belong, namely into the WGCNA package. . Multiple externalBranchSplitFnc can be given. . Argument externalSplitThreshold is now called minExternalSplit. 2013-07-24: 1.55 . New arguments externalBranchSplitFnc, externalSplitThreshold, and externalSplitOptions let the users supply their own branch merging criterion for cutreeHybrid. 2011-01-26: 1.51 . Minor speed and efficiency gain via internal re-structuring . Bug fix: useMedoids = TRUE spurious error fixed 2010-12-10: 1.50 . Major new functionality: branch merging criteria expanded to include: 1. module split (a measure of separation between the two branches), 2. correlation of eigennodes, 3. similarity measure based on clustering of re-sampled data Criteria 1 and 2 are only applicable to correlation networks. Criterion 3 assumes that a stability study has been carried out before the final module identification. All three new criteria are optional and ignored if the requisite inputs are not supplied. 2009-11-22: 1.21 . new function merge2Clusters 2009-01-29: 1.20 . Support for cluster trimming has been dropped . New functionality: PAM stage can now optionally respect the dendrogram. If this option is set, for each object PAM stage will only consider those clusters that are on the same branch as the object being PAMed. In other words, an object can only be asigned by PAM to one of the clusters lying below the object on the branch on which the object is merged. Intuitively, this should produce a label assignment that is more compatible with the dendrogram. It will also prevent assignment of far outlying objects, resulting in more unassigned objects and cleaner, tighter clusters. 2008-06-12: . Changed verbose output from stage 1,2 to Tree Cut, PAM stage . distM is only modified (copied) if necessary 2008-06-03: 1.12 . labelUnlabeled is now deprecated and replaced by (renamed to) pamStage. . deepSplit range extended to 0 through 4 2008-04-03: cleanup: function .assignModuleColor commented out to remove a R CMD check warning. 2008-03-12: 1.11-3 another bug in interpretation of deepSplit fixed; in particular, the default now works. 2008-03-07: 1.11-2 bug in interpretation of deepSplit fixed 2008-03-05: 1.11-1 dynamicHybrid: When number of merges below the cut is smaller than minClusterSize, simply return with all labels=0 instead of throwing an error. 2008-02-19: 1.11 change default cutHeight from max(dendro$height) to 99% of the range of between maximum and 5th percentile (from minimum) of dendro$height change in printFlush: use cat, not print to get rid of the [1] " leading the output 2008-02-16: 1.10-02 cutreeDynamic: if method=="hybrid" and distM not given, issue a warning and default to method="tree" 2007-12-11: 1.10 cutreeHybrid: distM is a mandatory parameter now (no more NULL default) since it is used in stage 1 as well. 2007-11-22: 1.09 cutreeDynamicTree: bug where the last element on the dendrogram remained unlabeled is fixed by Bin Zhang. 2007-11-13: 1.09 Changes in Dynamic Tree: minimum significant forward runlength is now set as a fraction (1/3)+1 of the minimum cluster size; minAttachModuleSize is now 2*minModuleSize instead of a hardcoded number 100. 2007-09-17 Updated URL in description to http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting/ dynamicTreeCut/R/0000755000176200001440000000000012670334576013410 5ustar liggesusersdynamicTreeCut/R/PrintFlush.R0000644000176200001440000000047412174310651015621 0ustar liggesusersprintFlush = function(...) { # x = print(...) cat(...); cat("\n"); if (exists("flush.console")) flush.console(); } indentSpaces = function(indent = 0) { if (indent>0) { spaces = paste(rep(" ", times=indent), collapse=""); } else { spaces = ""; } spaces; } dynamicTreeCut/R/cutreeWrapper.R0000644000176200001440000001000512325313407016342 0ustar liggesusers#---------------------------------------------------------------------------------------------- # # cutreeDynamic # #---------------------------------------------------------------------------------------------- # A wrapper function for cutreeHybrid and cutreeDynamicTree. cutreeDynamic = function( dendro, cutHeight = NULL, minClusterSize = 20, method = "hybrid", distM = NULL, deepSplit = (ifelse(method=="hybrid", 1, FALSE)), # Advanced options maxCoreScatter = NULL, minGap = NULL, maxAbsCoreScatter = NULL, minAbsGap = NULL, minSplitHeight = NULL, minAbsSplitHeight = NULL, # External (user-supplied) measure of branch split externalBranchSplitFnc = NULL, minExternalSplit = NULL, externalSplitOptions = list(), externalSplitFncNeedsDistance = NULL, assumeSimpleExternalSpecification = TRUE, # PAM stage options pamStage = TRUE, pamRespectsDendro = TRUE, useMedoids = FALSE, maxDistToLabel = NULL, maxPamDist = cutHeight, respectSmallClusters = TRUE, # Various options verbose = 2, indent = 0) { #if (!is.null(labelUnlabeled)) #{ # pamStage = labelUnlabeled; # warning("The argument 'labelUnlabeled' is deprecated. Please use 'pamStage' instead."); #} if (!is.null(maxDistToLabel)) { printFlush("cutreeDynamic: maxDistToLabel in deprecated. Please use maxPamDist instead"); maxPamDist = maxDistToLabel; } if (class(dendro)!="hclust") stop("Argument dendro must have class hclust."); methods = c("hybrid", "tree"); met = charmatch(method, methods); if ( (met==1) && (is.null(distM)) ) { warning(paste('cutreeDynamic: method "hybrid" requires a valid dissimilarity matrix "distM".', '\nDefaulting to method "tree".')); met = 2; } if (is.na(met)) { stop(paste("Invalid method argument. Accepted values are (unique abbreviations of)", paste(methods, collapse = ", "))); } else if (met==1) { # if (is.null(distM)) stop('distM must be given when using method "hybrid"'); return(cutreeHybrid(dendro = dendro, distM = distM, cutHeight = cutHeight, minClusterSize = minClusterSize, deepSplit = deepSplit, maxCoreScatter = maxCoreScatter, minGap = minGap, maxAbsCoreScatter = maxAbsCoreScatter, minAbsGap = minAbsGap, minSplitHeight = minSplitHeight, minAbsSplitHeight = minAbsSplitHeight, # External (user-supplied) measure of branch split externalBranchSplitFnc = externalBranchSplitFnc, minExternalSplit = minExternalSplit, externalSplitOptions = externalSplitOptions, externalSplitFncNeedsDistance = externalSplitFncNeedsDistance, assumeSimpleExternalSpecification = assumeSimpleExternalSpecification, pamStage = pamStage, pamRespectsDendro = pamRespectsDendro, useMedoids = useMedoids, maxPamDist = maxPamDist, respectSmallClusters = respectSmallClusters, verbose = verbose, indent = indent)$labels); } else { return(cutreeDynamicTree(dendro = dendro, maxTreeHeight = cutHeight, deepSplit = deepSplit, minModuleSize = minClusterSize)); } } #---------------------------------------------------------------------------------------------- # # merge2Clusters # #---------------------------------------------------------------------------------------------- # Manually merge 2 clusters. merge2Clusters= function(labels, mainClusterLabel, minorClusterLabel) { labels2 = ifelse(as.character(labels)==minorClusterLabel, mainClusterLabel, as.character(labels)) if (class(labels)=="numeric") labels2 = as.numeric(labels2); if (class(labels)=="factor") labels2 = factor(labels2) labels2; } dynamicTreeCut/R/cutreeDynamic.R0000644000176200001440000003774011140453740016324 0ustar liggesusers#------------------------------------------------------------------------------------------ # # cutreeDynamic # #------------------------------------------------------------------------------------------ # Modification(s) by Peter Langfelder: returns numerical labels (not colors) in a vector (not a factor). # Several rendundant blocks of code removed; duplicate function definitions removed; unused # functions removed. # maxTreeHeight now checked for being too large. #.minAttachModuleSize = 100; cutreeDynamicTree = function(dendro, maxTreeHeight=1, deepSplit=TRUE, minModuleSize=50) { if (is.null(maxTreeHeight)) maxTreeHeight = 0.99 * max(dendro$height); if (maxTreeHeight > max(dendro$height)) maxTreeHeight = 0.99 * max(dendro$height); staticCutCluster = .cutTreeStatic(dendro=dendro, heightcutoff=maxTreeHeight, minsize1=minModuleSize) #get tree height for every singleton #node_index tree_height demdroHeiAll= rbind( cbind(dendro$merge[,1], dendro$height), cbind(dendro$merge[,2], dendro$height) ) #singletons will stand at the front of the list myorder = order(demdroHeiAll[,1]) #get # of singletons no.singletons = length(dendro$order) demdroHeiAll.sort = demdroHeiAll[myorder, ] demdroHei.sort = demdroHeiAll.sort[c(1:no.singletons), ] demdroHei = demdroHei.sort[seq(no.singletons, 1, by=-1), ] demdroHei[,1] = -demdroHei[,1] # combine with prelimilary cluster-cutoff results demdroHei = cbind(demdroHei, as.integer(staticCutCluster)) # re-order the order based on the dendrogram order dendro$order demdroHei.order = demdroHei[dendro$order, ] static.clupos = .locateCluster(demdroHei.order[, 3]) if (is.null(static.clupos) ){ module.assign = rep(0, no.singletons) return ( module.assign ) } static.no = dim(static.clupos)[1] static.clupos2 = static.clupos static.no2 = static.no #split individual cluster if there are sub clusters embedded mcycle=1 while(1==1){ clupos = NULL for (i in c(1:static.no)){ mydemdroHei.order = demdroHei.order[ c(static.clupos[i,1]:static.clupos[i,2]), ] #index to [1, clusterSize] mydemdroHei.order[, 1] = mydemdroHei.order[, 1] - static.clupos[i, 1] + 1 #cat("Cycle ", as.character(mcycle), "cluster (", static.clupos[i,1], static.clupos[i,2], ")\n") #cat("i=", as.character(i), "\n") iclupos = .processIndividualCluster(mydemdroHei.order, cminModuleSize = minModuleSize, cminAttachModuleSize = 2*minModuleSize) iclupos[,1] = iclupos[,1] + static.clupos[i, 1] -1 #recover the original index iclupos[,2] = iclupos[,2] + static.clupos[i, 1] -1 clupos = rbind(clupos, iclupos) #put in the final output buffer } if(deepSplit==FALSE){ break } if(dim(clupos)[1] != static.no) { static.clupos = clupos static.no = dim(static.clupos)[1] }else{ break } mcycle = mcycle + 1 #static.clupos } final.cnt = dim(clupos)[1] #assign colors for modules module.assign = rep(0, no.singletons) module.cnt=1 for (i in c(1:final.cnt )) { sdx = clupos[i, 1] #module start point edx = clupos[i, 2] #module end point module.size = edx - sdx +1 if(module.size 0 : running length (with same sign) to right, otherwise to the left #mysign = -1: negative value, mysign = -1: positive value .runlengthSign = function(mysequence, leftOrright=-1, mysign=-1){ seqlen = length(mysequence) if(leftOrright<0){ pseq = rev(mysequence) }else{ pseq = mysequence } if(mysign<0){ #see where the first POSITIVE number occurs nonezero.bool = (pseq > 0) }else{ #see where the first NEGATIVE number occur nonezero.bool = (pseq < 0) } if( sum(nonezero.bool) > 0){ runlength = min( c(1:seqlen)[nonezero.bool] ) - 1 }else{ runlength = 0 } } #"0" is for grey module #.assignModuleColor = function(labelpred, minsize1=50, anameallmodules=FALSE, auseblackwhite=FALSE) { # here we define modules by using a height cut-off for the branches #labelpred= cutree(dendro,h=heightcutoff) #cat(labelpred) #"0", grey module doesn't participate color assignment, directly assigned as "grey" #labelpredNoZero = labelpred[ labelpred >0 ] #sort1=-sort(-table(labelpredNoZero)) ## sort1 #modulename= as.numeric(names(sort1)) #modulebranch= sort1 >= minsize1 #no.modules = sum(modulebranch) # #colorcode=GlobalStandardColors; # ##"grey" means not in any module; #colorhelp=rep("grey",length(labelpred)) #if ( no.modules==0){ #print("No module detected.") #} #else{ #if ( no.modules > length(colorcode) ){ #print( paste("Too many modules:", as.character(no.modules)) ) #} # #if ( (anameallmodules==FALSE) || (no.modules <=length(colorcode)) ){ #labeledModules = min(no.modules, length(colorcode) ) #for (i in c(1:labeledModules)) { #colorhelp=ifelse(labelpred==modulename[i],colorcode[i],colorhelp) #} #colorhelp=factor(colorhelp,levels=c(colorcode[1:labeledModules],"grey")) #}else{#nameallmodules==TRUE and no.modules >length(colorcode) #maxcolors=length(colorcode) #labeledModules = no.modules #extracolors=NULL #blackwhite=c("red", "black") #for(i in c((maxcolors+1):no.modules)){ #if(auseblackwhite==FALSE){ #icolor=paste("module", as.character(i), sep="") #}else{#use balck white alternatively represent extra colors, for display only ##here we use the ordered label to avoid put the same color for two neighboring clusters #icolor=blackwhite[1+(as.integer(modulename[i])%%2) ] #} #extracolors=c(extracolors, icolor) #} # ##combine the true-color code and the extra colorcode into a uniform colorcode for ##color assignment #allcolorcode=c(colorcode, extracolors) # #for (i in c(1:labeledModules)) { #colorhelp=ifelse(labelpred==modulename[i],allcolorcode[i],colorhelp) #} #colorhelp=factor(colorhelp,levels=c(allcolorcode[1:labeledModules],"grey")) #} #} # #colorhelp #} # This function written by Peter Langfelder, based on .assignModuleColor above but simplified. # Assigns module numbers, not colors. All modules are labeled. #"0" is for grey module .assignModuleNumber = function(labelpred, minsize1=50) { #"0", grey module doesn't participate color assignment, directly assigned as "grey" labelpredNoZero = labelpred[ labelpred >0 ] sort1=-sort(-table(labelpredNoZero)) # sort1 modulename= as.numeric(names(sort1)) modulebranch= sort1 >= minsize1 no.modules = sum(modulebranch) #"grey" means not in any module; colorhelp=rep(0,length(labelpred)) for (i in c(1:no.modules)) { colorhelp=ifelse(labelpred==modulename[i],i,colorhelp) } colorhelp } #locate the start/end positions of each cluster in the ordered cluster label sequence #where "-1" indicating no cluster #3-1 -1 1 1 1 1 2 2 2 #3 3 -1-1 1 1 1 1 2 2 2 (shift) #--------------------------------- #0-4 0 2 0 0 0 1 0 0 0 (difference) # * * @ .locateCluster = function(clusterlabels) { no.nodes = length(clusterlabels) clusterlabels.shift = c(clusterlabels[1], c(clusterlabels[1:(no.nodes-1)]) ) #a non-zero point is the start point of a cluster and it previous point is the end point of the previous #cluster label.diff = abs(clusterlabels - clusterlabels.shift) #process the first and last positions as start/end points if they belong to a cluster instead of no # cluster "-1" if(clusterlabels[1] >0) {label.diff[1]=1} if(clusterlabels[no.nodes]>0) {label.diff[no.nodes]=1} flagpoints.bool = label.diff > 0 if( sum(flagpoints.bool) ==0){ return(NULL) } flagpoints = c(1:no.nodes)[flagpoints.bool] no.points = length(flagpoints) myclupos=NULL for(i in c(1:(no.points-1)) ){ idx = flagpoints[i] if(clusterlabels[idx]>0){ if(flagpoints[i+1]==no.nodes) {#boundary effect myclupos = rbind(myclupos, c(idx, flagpoints[i+1]) ) break }else{ myclupos = rbind(myclupos, c(idx, flagpoints[i+1]-1) ) } } } myclupos } #input is the cluster demdrogram of an individual cluster, we want to find its embbeded subclusters #execution order: mean-height ==> (mean+max)/2 ==> (mean+min)/2 #useMean: =0 ~ use mean-height as calibation line # =1 ~ use (mean+max)/2 as calibation line to detect relatively a small cluster sitting on the head of a bigger one, # so mean-height is too low to detect the two modules. # =-1~ use (mean+min)/2 as calibation line to detect relatively a small cluster sitting on the tail of a bigger one, # so mean-height & (mean+max)/2 are too high to detect the two modules .processIndividualCluster = function(clusterDemdroHei, cminModuleSize=50, cminAttachModuleSize = 2* cminModuleSize, minTailRunlength= as.integer(cminModuleSize/3)+1, useMean=0) { #for debug: use all genes #clusterDemdroHei =demdroHei.order no.cnodes = dim(clusterDemdroHei)[1] cmaxhei = max(clusterDemdroHei[, 2]) cminhei = min(clusterDemdroHei[, 2]) cmeanhei = mean(clusterDemdroHei[, 2]) cmidhei = (cmeanhei + cmaxhei)/2.0 cdwnhei = (cmeanhei + cminhei)/2.0 if (useMean==1){ comphei = cmidhei }else if (useMean==-1){ comphei = cdwnhei }else{ #normal case comphei = cmeanhei } # compute height diffrence with mean height heidiff = clusterDemdroHei[,2] - comphei heidiff.shift = .shiftSequence(heidiff, -1) # get cut positions # detect the end point of a cluster, whose height should be less than meanhei # and the node behind it is the start point of the next cluster which has a height above meanhei cuts.bool = (heidiff<0) & (heidiff.shift > 0) cuts.bool[1] = TRUE cuts.bool[no.cnodes] = TRUE if(sum(cuts.bool)==2){ if (useMean==0){ new.clupos=.processIndividualCluster(clusterDemdroHei=clusterDemdroHei, cminModuleSize=cminModuleSize, cminAttachModuleSize=cminAttachModuleSize, useMean=1) }else if(useMean==1){ new.clupos=.processIndividualCluster(clusterDemdroHei=clusterDemdroHei, cminModuleSize=cminModuleSize, cminAttachModuleSize=cminAttachModuleSize, useMean=-1) }else{ new.clupos = rbind(c(1, no.cnodes)) } return (new.clupos) } #a good candidate cluster-end point should have significant # of ahead nodes with head < meanHei cutindex =c(1:no.cnodes)[cuts.bool] no.cutps = length(cutindex) runlens = rep(999, no.cutps) cuts.bool2 = cuts.bool for(i in c(2:(no.cutps-1)) ){ seq = c( (cutindex[i-1]+1):cutindex[i] ) runlens[i] = .runlengthSign(heidiff[seq], leftOrright=-1, mysign=-1) if(runlens[i] < minTailRunlength){ #cat("run length=", runlens[i], "\n") cuts.bool2[ cutindex[i] ] = FALSE } } #attach SMALL cluster to the left-side BIG cluster if the small one has smaller mean height cuts.bool3=cuts.bool2 if(sum(cuts.bool2) > 3) { curj = 2 while (1==1){ cutindex2 =c(1:no.cnodes)[cuts.bool2] no.clus = length(cutindex2) -1 if (curj>no.clus){ break } pre.sdx = cutindex2[ curj-1 ]+1 #previous module start point pre.edx = cutindex2[ curj ] #previous module end point pre.module.size = pre.edx - pre.sdx +1 pre.module.hei = mean(clusterDemdroHei[c(pre.sdx:pre.edx) , 2]) cur.sdx = cutindex2[ curj ]+1 #previous module start point cur.edx = cutindex2[ curj+1 ] #previous module end point cur.module.size = cur.edx - cur.sdx +1 cur.module.hei = mean(clusterDemdroHei[c(cur.sdx:cur.edx) , 2]) #merge to the leftside major module, don't change the current index "curj" #if( (pre.module.size >minAttachModuleSize)&(cur.module.hei 2){ if( (cutindex2[no.cutps] - cutindex2[no.cutps-1]+1) < cminModuleSize ){ cuts.bool2[ cutindex2[no.cutps-1] ] =FALSE } } cutindex2 = c(1:no.cnodes)[cuts.bool2] cutindex2[1]=cutindex2[1]-1 #the first no.cutps2 = length(cutindex2) if(no.cutps2 > 2){ new.clupos = cbind( cutindex2[c(1:(no.cutps2-1))]+1, cutindex2[c(2:no.cutps2)] ) }else{ new.clupos = cbind( 1, no.cnodes) } if ( dim(new.clupos)[1] == 1 ){ if (useMean==0){ new.clupos=.processIndividualCluster(clusterDemdroHei=clusterDemdroHei, cminModuleSize=cminModuleSize, cminAttachModuleSize=cminAttachModuleSize, useMean=1) }else if(useMean==1){ new.clupos=.processIndividualCluster(clusterDemdroHei=clusterDemdroHei, cminModuleSize=cminModuleSize, cminAttachModuleSize=cminAttachModuleSize, useMean=-1) } } new.clupos } #delta >0 : shift to right, otherwise to the left .shiftSequence = function(mysequence, delta){ seqlen = length(mysequence) if(delta>0){ finalseq=c(mysequence[1:delta], mysequence[1:(seqlen-delta)]) }else{ posdelta = -delta finalseq=c(mysequence[(posdelta+1):seqlen], mysequence[(seqlen-posdelta+1):seqlen]) } finalseq } #use height cutoff to remove .cutTreeStatic = function(dendro,heightcutoff=0.99, minsize1=50) { # here we define modules by using a height cut-off for the branches labelpred= cutree(dendro,h=heightcutoff) sort1=-sort(-table(labelpred)) sort1 modulename= as.numeric(names(sort1)) modulebranch= sort1 >= minsize1 no.modules=sum(modulebranch) colorhelp = rep(-1, length(labelpred) ) if ( no.modules==0){ print("No module detected") } else{ for (i in c(1:no.modules)) { colorhelp=ifelse(labelpred==modulename[i],i ,colorhelp) } } colorhelp } dynamicTreeCut/R/treeCut.R0000644000176200001440000011703312651027714015143 0ustar liggesusers # Attempt to speed up the function a bit. There is some on-the-fly growing of vectors that can be removed, # and perhaps some variables could be deleted from the branch structure. # - remove Clusters, LastMerge # - add counters keeping track o number of merges, number of singletons, number of basic clusters # ClusterTrim is removed for now; it would have been a bit more complicated to make it work with the # enhanced PAM stage. May be re-introduced later if anyone shows any interest in this. # Tree cut # 1.11-2 # . Bug fix in interpretation of deepSplit fixed # 1.11-1 # . If no merge lies below the cut height, simply exit with all labels=0 instead of throwing an # error. # . If cutHeight is above the highest merge, set it to the highest merge. # 1.11 # . change the default cutHeight to 99% of dendogram height range # 1.10-02: # . if distM isn't given, method defaults to "tree". # 1.10: # . Bug fix: since distM us necessary in the first stage as well, the function now complains if # distM is not given # 1.09: # . Bug fix: number of Unassigned = 1 is now handled correctly # 1.08: # . Changing the meaning of minGap, maxCoreScatter: both now relative (i.e., fractions). minGap will # be the minimum cluster gap expressed as a fraction of the range between a certain quantile of the # merging heights and the cutHeight; maxCoreScatter will be interpreted in the same way. Adding two # more parameters, namely minAbsGap, maxAbsCoreScatter that can be used to supply hitherto used # absolute values. If they are given, they override minGap and maxCoreScatter, respectively. # . Change of names: clusterMinSize -> minClusterSize # . Default value of minClusterSize now 20 # . Fixed stage 2 labeling for non-medoids: ClusterDiam{eter} now given is the maximum of average # distances of points to the rest of the cluster. # . Default for maxDistToLabel is now cutHeight even though it may mean that some objects above the # cutHeight may be labeled. There doesn't seem to be a clean and simple way to only label objects # whose merging distance to (the continuation of) the cluster is below cutHeight # . If cutHeight not given, will be set to max(dendro$height) for the DynamicTree as well. # 1.07: Changing parameter names to be more intuitive and in accord with generally used terminology # Also changing internal function names by prepending a . to them # . All extra functions (EvaluateCLusters etc) removed. # . Rename the function from GetClusters to cutreeHybrid # 1.06: The tree cut is exactly the same as 1.05, but the color handling is relegated to NetworkFunctions. # The ColorsFromLabels in NetworkFunctions use a slightly different input format; # in particular, 0 is considered grey. This means, among other things, that # 1.05: # . The PAM stage is changed: instead of calculating distances to medoids, will calculate average # distances to clusters. This is intuitively less desirable than the medoids, but simulations # seem to indicate that large clusters are too greedy. The average linkage may help that a bit. # It is not quite clear that cluster trimming makes a lot of sense in this scenario, but I'll # keep it in (assigning elements based on average distance to a trimmed cluster is not quite the # same as an untrimmed cluster, even for the elements that were trimmed). # . Improve trimming: instead of the lowest joining heights, keep all singletons up to the first joined # object (branch or singleton) whose merging height is above the threshold; the rest (content of all # branches merged higher than threshold) is trimmed. # 1.04: # . Implement Bin's idea that small branches unassigned in Stage 1 should not be broken # up by Stage 2. Implemented as follows: Only keep those branches unbroken whose only reason for not # being a cluster is that they don't have enough objects. # While going through the merge tree: mark branches that are (1) merged into composite clusters, (2) # not clusters themselves because of failing the minimum size requirement only. # . Change boudaries of clusters. Instead of regarding everything up to the merge automatically part # the respective cluster, only elements whose joining heights are less than a cutoff given for # the cluster are considered part of the cluster automatically; the rest is assigned in pam-like # manner. # 1.03: # . Changing the definition of the core from the most connected points to the first points added # to the cluster. Note that the order depends on how branches are merged, so that better be # correct as well. This makes the core stable against adding outliers to the cluster. # 1.02.01: fixing a bug in the main function that was referencing nonexistent heights member of # dendro. # . Fixing a correctness issue: the core average distance is the average distance between points # within the core, not the average distance of points within the core to all points in the # cluster. # 1.02: # . Changing core size: instead of minClusterSize + sqrt(Size - minClusterSize) it is now # CoreSize + sqrt(Size - CoreSize), with CoreSize = as.integer(minClusterSize) + 1. # 1.01.03: Fixing memory usage and a bug in which singletons were added twice (and more times). # In this version, to simplify things, Singletons are only kept for basic clusters. For composite # clusters they are NULL; CoreScatter should never be called for composite clusters as that can get # hugely time and memory intensive. # Another bug concerning couting unassigned points fixed. # -02: adding distance matrix to the parameters of GetClusters. # -03: Merging GetClusters and AssignLabel together; changes in variable names to make code more # readable. # Progress indicator... .initProgInd = function( leadStr = "..", trailStr = "", quiet = !interactive()) { oldStr = " "; cat(oldStr); progInd = list(oldStr = oldStr, leadStr = leadStr, trailStr = trailStr); class(progInd) = "progressIndicator"; .updateProgInd(0, progInd, quiet); } .updateProgInd = function(newFrac, progInd, quiet = !interactive()) { if (class(progInd)!="progressIndicator") stop("Parameter progInd is not of class 'progressIndicator'. Use initProgInd() to initialize", "it prior to use."); newStr = paste(progInd$leadStr, as.integer(newFrac*100), "% ", progInd$trailStr, sep = ""); if (newStr!=progInd$oldStr) { if (quiet) { progInd$oldStr = newStr; } else { cat(paste(rep("\b", nchar(progInd$oldStr)), collapse="")); cat(newStr); if (exists("flush.console")) flush.console(); progInd$oldStr = newStr; } } progInd; } # The following are supporting function for GetClusters. .CoreSize = function(BranchSize, minClusterSize) { BaseCoreSize = minClusterSize/2 + 1; if (BaseCoreSize < BranchSize) { CoreSize = as.integer(BaseCoreSize + sqrt(BranchSize - BaseCoreSize)); } else CoreSize = BranchSize; CoreSize; } # This assumes the diagonal of the distance matrix # is zero, BranchDist is a square matrix whose dimension is at least 2. .CoreScatter = function(BranchDist, minClusterSize) { nPoints = dim(BranchDist)[1]; PointAverageDistances = colSums(BranchDist) / (nPoints-1); CoreSize = minClusterSize/2 + 1; if (CoreSize < nPoints) { EffCoreSize = as.integer(CoreSize + sqrt(nPoints - CoreSize)); ord = order(PointAverageDistances); Core = ord[c(1:EffCoreSize)]; } else { Core = c(1:nPoints); EffCoreSize = nPoints; } CoreAverageDistances = colSums(BranchDist[Core, Core]) / (EffCoreSize-1); mean(CoreAverageDistances); } .interpolate = function(data, index) { i = round(index); n = length(data); if (i<1) return(data[1]); if (i>=n) return(data[n]); r = index-i; data[i] * (1-r) + data[i+1] * r; } .chunkSize = 100; #------------------------------------------------------------------------------------------- # # cutreeHybrid # #------------------------------------------------------------------------------------------- # Traverses a given clustering tree and detects branches whose size is at least minClusterSize, average # singleton joining height is at most maxCoreScatter and split (attaching height minus average # height) is at least minGap. If cutHeight is set, all clusters are cut at that height. # stabilityDistM is assumed normalized to the maximum possible distance, i.e. range between 0 and 1 and # distance 1 means the two genes were put in separate modules in all sampled clusterings. cutreeHybrid = function( # Input data: basic dendro, distM, # Branch cut criteria and options cutHeight = NULL, minClusterSize = 20, deepSplit = 1, # Advanced options maxCoreScatter = NULL, minGap = NULL, maxAbsCoreScatter = NULL, minAbsGap = NULL, minSplitHeight = NULL, minAbsSplitHeight = NULL, # External (user-supplied) measure of branch split externalBranchSplitFnc = NULL, minExternalSplit = NULL, externalSplitOptions = list(), externalSplitFncNeedsDistance = NULL, assumeSimpleExternalSpecification = TRUE, # PAM stage options pamStage = TRUE, pamRespectsDendro = TRUE, useMedoids = FALSE, maxPamDist = cutHeight, respectSmallClusters = TRUE, # Various options verbose = 2, indent = 0) { spaces = indentSpaces(indent); # if (verbose>0) printFlush(paste(spaces, "cutreeHybrid cluster detection starting...")); # No. of merges in the tree nMerge = length(dendro$height); if (nMerge < 1) stop("The given dendrogram is suspicious: number of merges is zero."); if (is.null(distM)) stop("distM must be non-NULL") if (is.null(dim(distM))) stop("distM must be a matrix."); if ( (dim(distM)[1] != nMerge+1) | (dim(distM)[2]!=nMerge+1) ) stop("distM has incorrect dimensions."); if (pamRespectsDendro & !respectSmallClusters) printFlush(paste("cutreeHybrid Warning: parameters pamRespectsDendro (TRUE)", "and respectSmallClusters (FALSE) imply contradictory intent.\n", "Although the code will work, please check you really", "intented these settings for the two arguments.")); if (any(diag(distM!=0))) diag(distM) = 0; refQuantile = 0.05; refMerge = round(nMerge * refQuantile); if (refMerge < 1) refMerge = 1; refHeight = dendro$height[refMerge]; if (is.null(cutHeight)) { cutHeight = 0.99 * (max(dendro$height) - refHeight) + refHeight; if (verbose>0) printFlush(paste(spaces, "..cutHeight not given, setting it to", signif(cutHeight,3), " ===> 99% of the (truncated) height range in dendro.")); } else { if (cutHeight > max(dendro$height)) cutHeight = max(dendro$height); } # If maxPamDist is not set, set it to cutHeight if (is.null(maxPamDist)) maxPamDist = cutHeight; nMergeBelowCut = sum(dendro$height <= cutHeight); if (nMergeBelowCut < minClusterSize) { if (verbose>0) printFlush(paste(spaces, "cutHeight set too low: no merges below the cut.")); return(list(labels = rep(0, times = nMerge+1))) } # Check external branch split function(s), if given. nExternalSplits = length(externalBranchSplitFnc); if (nExternalSplits>0) { if (length(minExternalSplit)<1) stop("'minExternalBranchSplit' must be given."); if (assumeSimpleExternalSpecification && nExternalSplits==1) { externalSplitOptions = list(externalSplitOptions); } externalBranchSplitFnc = lapply(externalBranchSplitFnc, match.fun); for (es in 1:nExternalSplits) { externalSplitOptions[[es]]$tree = dendro; if (length(externalSplitFncNeedsDistance)==0 || externalSplitFncNeedsDistance[es]) externalSplitOptions[[es]]$dissimMat = distM; } } MxBranches = nMergeBelowCut branch.isBasic = rep(TRUE, MxBranches); branch.isTopBasic = rep(TRUE, MxBranches); branch.failSize = rep(FALSE, MxBranches); branch.rootHeight = rep(NA, MxBranches); branch.size = rep(2, MxBranches); branch.nMerge = rep(1, MxBranches); branch.nSingletons = rep(2, MxBranches); branch.nBasicClusters = rep(0, MxBranches); branch.mergedInto = rep(0, MxBranches); branch.attachHeight = rep(NA, MxBranches); branch.singletons = vector(mode = "list", length = MxBranches); branch.basicClusters = vector(mode = "list", length = MxBranches); branch.mergingHeights = vector(mode = "list", length = MxBranches); branch.singletonHeights = vector(mode = "list", length = MxBranches); # Note: size equals the number of singletons on a basic branch, but not on a composite branch. nBranches = 0; spyIndex = NULL; if (file.exists(".dynamicTreeCutSpyFile")) { spyIndex = read.table(".dynamicTreeCutSpyFile", header = FALSE); printFlush("Found 'spy file' with indices of objects to watch for."); spyIndex= as.numeric(spyIndex[, 1]); printFlush(paste(spyIndex, collapse = ", ")); } # Default values for maxCoreScatter and minGap: defMCS = c(0.64, 0.73, 0.82, 0.91, 0.95); # Default max core scatter defMG = (1-defMCS)*3/4; # Default minimum gap nSplitDefaults = length(defMCS); # Convert deep split to range 1..5 if (is.logical(deepSplit)) deepSplit = as.integer(deepSplit)*(nSplitDefaults - 2); deepSplit = deepSplit + 1; if ((deepSplit<1) | (deepSplit>nSplitDefaults)) stop(paste("Parameter deepSplit (value", deepSplit, ") out of range: allowable range is 0 through", nSplitDefaults-1)); # If not set, set the cluster gap and core scatter according to deepSplit. if (is.null(maxCoreScatter)) maxCoreScatter = .interpolate(defMCS, deepSplit) if (is.null(minGap)) minGap = .interpolate(defMG, deepSplit); # Convert (relative) minGap and maxCoreScatter to corresponding absolute quantities if the latter were # not given. if (is.null(maxAbsCoreScatter)) maxAbsCoreScatter = refHeight + maxCoreScatter * (cutHeight - refHeight); if (is.null(minAbsGap)) minAbsGap = minGap * (cutHeight - refHeight); # if minSplitHeight was not given, set it to 0 if (is.null(minSplitHeight)) minSplitHeight = 0; # Convert (relative) minSplitHeight to corresponding minAbsSplitHeight if the latter was not given. if (is.null(minAbsSplitHeight)) minAbsSplitHeight = refHeight + minSplitHeight * (cutHeight - refHeight); nPoints = nMerge+1; # For each merge, record the cluster that it belongs to IndMergeToBranch = rep(0, times = nMerge) # For each object that joins a composite branch, record the number of the branch onBranch = rep(0, nPoints); # The root RootBranch = 0; if (verbose>2) { printFlush(paste(spaces, "..Going through the merge tree")); pind = .initProgInd(); } mergeDiagnostics = data.frame(smI = rep(NA, nMerge), smSize = rep(NA, nMerge), smCrSc = rep(NA, nMerge), smGap = rep(NA, nMerge), lgI = rep(NA, nMerge), lgSize = rep(NA, nMerge), lgCrSc = rep(NA, nMerge), lgGap = rep(NA, nMerge), merged = rep(NA, nMerge)); if (nExternalSplits > 0) { externalMergeDiags = matrix(NA, nMerge, nExternalSplits); colnames(externalMergeDiags) = paste("externalBranchSplit", 1:nExternalSplits, sep = "."); } extender = rep(0, .chunkSize); for (merge in 1:nMerge) if (dendro$height[merge]<=cutHeight) { # are both merged objects sigletons? if (dendro$merge[merge,1]<0 & dendro$merge[merge,2]<0) { # Yes; start a new branch. nBranches = nBranches + 1; branch.isBasic[nBranches] = TRUE; branch.isTopBasic[nBranches] = TRUE; branch.singletons[[nBranches]] = c(-dendro$merge[merge,], extender); branch.basicClusters[[nBranches]] = extender; branch.mergingHeights[[nBranches]] = c(rep(dendro$height[merge], 2), extender); branch.singletonHeights[[nBranches]] = c(rep(dendro$height[merge], 2), extender); IndMergeToBranch[merge] = nBranches; RootBranch = nBranches; } else if (sign(dendro$merge[merge,1]) * sign(dendro$merge[merge,2]) <0) { # merge the sigleton into the branch clust = IndMergeToBranch[max(dendro$merge[merge,])]; if (clust==0) stop("Internal error: a previous merge has no associated cluster. Sorry!"); gene = -min(dendro$merge[merge,]); ns = branch.nSingletons[clust] + 1; nm = branch.nMerge[clust] + 1; if (branch.isBasic[clust]) { if (ns>length(branch.singletons[[clust]])) { branch.singletons[[clust]] = c(branch.singletons[[clust]], extender); branch.singletonHeights[[clust]] = c(branch.singletonHeights[[clust]], extender) } branch.singletons[[clust]] [ns] = gene; branch.singletonHeights[[clust]] [ns] = dendro$height[merge] } else { onBranch[gene] = clust; } if (nm >= length(branch.mergingHeights[[clust]])) branch.mergingHeights[[clust]] = c(branch.mergingHeights[[clust]], extender) branch.mergingHeights[[clust]] [nm] = dendro$height[merge]; branch.size[clust] = branch.size[clust] + 1; branch.nMerge[clust] = nm; branch.nSingletons[clust] = ns; IndMergeToBranch[merge] = clust; RootBranch = clust; } else { # attempt to merge two branches: clusts = IndMergeToBranch[dendro$merge[merge,]]; sizes = branch.size[clusts]; # Note: for 2 elements, rank and order are the same. rnk = rank(sizes, ties.method = "first"); small = clusts[rnk[1]]; large = clusts[rnk[2]]; sizes = sizes[rnk]; branch1 = branch.singletons[[large]] [1:sizes[2]]; branch2 = branch.singletons[[small]] [1:sizes[1]]; spyMatch = FALSE; if (!is.null(spyIndex)) { n1 = length(intersect(branch1, spyIndex)); if ( (n1/length(branch1) > 0.99 && n1/length(spyIndex) > 0.99) ) { printFlush(paste("Found spy match for branch 1 on merge", merge)) spyMatch = TRUE } n2 = length(intersect(branch2, spyIndex)); if ( (n2/length(branch1) > 0.99 && n2/length(spyIndex) > 0.99) ) { printFlush(paste("Found spy match for branch 2 on merge", merge)) spyMatch = TRUE } } if (branch.isBasic[small]) { coresize = .CoreSize(branch.nSingletons[small], minClusterSize); Core = branch.singletons[[small]] [c(1:coresize)]; # SmAveDist = mean(apply(distM[Core, Core], 2, sum)/(coresize-1)); SmAveDist = mean(colSums(distM[Core, Core, drop = FALSE])/(coresize-1)); } else { SmAveDist = 0; } if (branch.isBasic[large]) { coresize = .CoreSize(branch.nSingletons[large], minClusterSize); Core = branch.singletons[[large]] [c(1:coresize)]; LgAveDist = mean(colSums(distM[Core, Core])/(coresize-1)); } else { LgAveDist = 0; } mergeDiagnostics[merge, ] = c(small, branch.size[small], SmAveDist, dendro$height[merge] - SmAveDist, large, branch.size[large], LgAveDist, dendro$height[merge] - LgAveDist, NA); # We first check each cluster separately for being too small, too diffuse, or too shallow: SmallerScores = c(branch.isBasic[small], branch.size[small] < minClusterSize, SmAveDist > maxAbsCoreScatter, dendro$height[merge] - SmAveDist < minAbsGap, dendro$height[merge] < minAbsSplitHeight ); if ( SmallerScores[1] * sum(SmallerScores[-1]) > 0 ) { DoMerge = TRUE; SmallerFailSize = !(SmallerScores[3] | SmallerScores[4]); # Smaller fails only due to size } else { LargerScores = c(branch.isBasic[large], branch.size[large] < minClusterSize, LgAveDist > maxAbsCoreScatter, dendro$height[merge] - LgAveDist < minAbsGap, dendro$height[merge] < minAbsSplitHeight ); if ( LargerScores[1] * sum(LargerScores[-1]) > 0 ) { # Actually: the large one is the one to be merged DoMerge = TRUE; SmallerFailSize = !(LargerScores[3] | LargerScores[4]); # cluster fails only due to size x = small; small = large; large = x; sizes = rev(sizes); } else { DoMerge = FALSE; # None of the two satisfies merging criteria } } if (DoMerge) { mergeDiagnostics$merged[merge] = 1; } #else #browser(); # Still not merging? If user-supplied criterion is given, check whether the criterion is below the # specified threshold. if (!DoMerge && (nExternalSplits > 0) && branch.isBasic[small] && branch.isBasic[large]) { if (verbose > 4) printFlush(paste0("Entering external split code on merge ", merge)); branch1 = branch.singletons[[large]] [1:sizes[2]]; branch2 = branch.singletons[[small]] [1:sizes[1]]; if (verbose > 4 | spyMatch) printFlush(paste0(" ..branch lengths: ", sizes[1], ", ", sizes[2])) #if (any(is.na(branch1)) || any(branch1==0)) browser(); #if (any(is.na(branch2)) || any(branch2==0)) browser(); es = 0; while (es < nExternalSplits && !DoMerge) { es = es + 1; args = externalSplitOptions[[es]]; args = c(args, list(branch1 = branch1, branch2 = branch2)); extSplit = do.call(externalBranchSplitFnc[[es]], args); if (spyMatch) printFlush(" .. external criterion ", es, ": ", extSplit); DoMerge = extSplit < minExternalSplit[es]; externalMergeDiags[merge, es] = extSplit; mergeDiagnostics$merged[merge] = if (DoMerge) 2 else 0; } } if (DoMerge) { # merge the small into the large cluster and close it. branch.failSize[[small]] = SmallerFailSize; branch.mergedInto[small] = large; branch.attachHeight[small] = dendro$height[merge]; branch.isTopBasic[small] = FALSE; nss = branch.nSingletons[small]; nsl = branch.nSingletons[large]; ns = nss + nsl; if (branch.isBasic[large]) { nExt = ceiling( (ns - length(branch.singletons[[large]]))/.chunkSize ); if (nExt > 0) { if (verbose > 5) printFlush(paste("Extending singletons for branch", large, "by", nExt, " extenders.")); branch.singletons[[large]] = c(branch.singletons[[large]], rep(extender, nExt)); branch.singletonHeights[[large]] = c(branch.singletonHeights[[large]], rep(extender, nExt)); } branch.singletons[[large]] [(nsl+1):ns] = branch.singletons[[small]][1:nss]; branch.singletonHeights[[large]] [(nsl+1):ns] = branch.singletonHeights[[small]][1:nss]; branch.nSingletons[large] = ns; } else { if (!branch.isBasic[small]) stop("Internal error: merging two composite clusters. Sorry!"); onBranch[ branch.singletons[[small]] ] = large; } nm = branch.nMerge[large] + 1; if (nm > length(branch.mergingHeights[[large]])) branch.mergingHeights[[large]] = c(branch.mergingHeights[[large]], extender); branch.mergingHeights[[large]] [nm] = dendro$height[merge]; branch.nMerge[large] = nm; branch.size[large] = branch.size[small] + branch.size[large]; IndMergeToBranch[merge] = large; RootBranch = large; } else { # start or continue a composite cluster. # If large is basic and small is not basic, switch them. if (branch.isBasic[large] & !branch.isBasic[small]) { x = large; large = small; small = x; sizes = rev(sizes); } # Note: if pamRespectsDendro, need to start a new composite cluster every time two branches merge, # otherwise will not have the necessary information. # Otherwise, if the large cluster is already composite, I can simply merge both clusters into # one of the non-composite clusters. if (branch.isBasic[large] | (pamStage & pamRespectsDendro)) { nBranches = nBranches + 1; branch.attachHeight[c(large, small)] = dendro$height[merge]; branch.mergedInto[c(large, small)] = nBranches; if (branch.isBasic[small]) { addBasicClusters = small; # add basic clusters } else addBasicClusters = branch.basicClusters[[small]]; if (branch.isBasic[large]) { addBasicClusters = c(addBasicClusters, large); } else addBasicClusters = c(addBasicClusters, branch.basicClusters[[large]]); # print(paste(" Starting a composite cluster with number", nBranches)); branch.isBasic[nBranches] = FALSE; branch.isTopBasic[nBranches] = FALSE; branch.basicClusters[[nBranches]] = addBasicClusters; branch.mergingHeights[[nBranches]] = c(rep(dendro$height[merge], 2), extender); branch.nMerge[nBranches] = 2; branch.size[nBranches] = sum(sizes); branch.nBasicClusters[nBranches] = length(addBasicClusters); IndMergeToBranch[merge] = nBranches; RootBranch = nBranches; } else { # Add small branch to the large one addBasicClusters = if (branch.isBasic[small]) small else branch.basicClusters[[small]]; nbl = branch.nBasicClusters[large]; nb = branch.nBasicClusters[large] + length(addBasicClusters); if (nb > length(branch.basicClusters[[large]])) { nExt = ceiling( ( nb - length(branch.basicClusters[[large]]))/.chunkSize); branch.basicClusters[[large]] = c(branch.basicClusters[[large]], rep(extender, nExt)); } branch.basicClusters[[large]] [(nbl+1):nb] = addBasicClusters; branch.nBasicClusters[large] = nb; branch.size[large] = branch.size[large] + branch.size[small]; nm = branch.nMerge[large] + 1; if (nm > length(branch.mergingHeights[[large]])) branch.mergingHeights[[large]] = c(branch.mergingHeights[[large]], extender); branch.mergingHeights[[large]] [nm] = dendro$height[merge]; branch.nMerge[large] = nm; branch.attachHeight[small] = dendro$height[merge]; branch.mergedInto[small] = large; IndMergeToBranch[merge] = large; RootBranch = large; } } } if (verbose > 2) pind = .updateProgInd(merge/nMerge, pind); } if (verbose > 2) { pind = .updateProgInd(1, pind); printFlush(""); } if (verbose>2) printFlush(paste(spaces, "..Going through detected branches and marking clusters..")); isCluster = rep(FALSE, times = nBranches); SmallLabels = rep(0, times = nPoints); for (clust in 1:nBranches) { if (is.na(branch.attachHeight[clust])) branch.attachHeight[clust] = cutHeight; if (branch.isTopBasic[clust]) { coresize = .CoreSize(branch.nSingletons[clust], minClusterSize); Core = branch.singletons[[clust]] [c(1:coresize)]; CoreScatter = mean(colSums(distM[Core, Core])/(coresize-1)); isCluster[clust] = branch.isTopBasic[clust] & (branch.size[clust] >= minClusterSize) & (CoreScatter < maxAbsCoreScatter) & (branch.attachHeight[clust] - CoreScatter > minAbsGap); } else { CoreScatter = 0; } if (branch.failSize[clust]) SmallLabels[branch.singletons[[clust]]] = clust; } if (!respectSmallClusters) SmallLabels = rep(0, times = nPoints); if (verbose>2) printFlush(paste(spaces, "..Assigning Tree Cut stage labels..")); Colors = rep(0, times = nPoints); coreLabels = rep(0, times = nPoints); clusterBranches = c(1:nBranches)[isCluster]; branchLabels = rep(0, nBranches); color = 0; for (clust in clusterBranches) { color = color+1; Colors[branch.singletons[[clust]]] = color; SmallLabels[branch.singletons[[clust]]] = 0; coresize = .CoreSize(branch.nSingletons[clust], minClusterSize); Core = branch.singletons[[clust]] [c(1:coresize)]; coreLabels[Core] = color; branchLabels[clust] = color; } Labeled = c(1:nPoints)[Colors!=0]; Unlabeled = c(1:nPoints)[Colors==0]; nUnlabeled = length(Unlabeled); UnlabeledExist = (nUnlabeled>0); if (length(Labeled)>0) { LabelFac = factor(Colors[Labeled]); nProperLabels = nlevels(LabelFac); } else nProperLabels = 0; if (pamStage & UnlabeledExist & nProperLabels>0) { if (verbose>2) printFlush(paste(spaces, "..Assigning PAM stage labels..")); nPAMed = 0; # Assign some of the grey genes to the nearest module. Define nearest as the distance to the medoid, # that is the point in the cluster that has the lowest average distance to all other points in the # cluster. First get the medoids. if (useMedoids) { Medoids = rep(0, times = nProperLabels); ClusterRadii = rep(0, times = nProperLabels); for (cluster in 1:nProperLabels) { InCluster = c(1:nPoints)[Colors==cluster]; DistInCluster = distM[InCluster, InCluster]; DistSums = colSums(DistInCluster); Medoids[cluster] = InCluster[which.min(DistSums)]; ClusterRadii[cluster] = max(DistInCluster[, which.min(DistSums)]) } # If small clusters are to be respected, assign those first based on medoid-medoid distances. if (respectSmallClusters) { FSmallLabels = factor(SmallLabels); SmallLabLevs = as.numeric(levels(FSmallLabels)); nSmallClusters = nlevels(FSmallLabels) - (SmallLabLevs[1]==0); if (nSmallClusters>0) for (sclust in SmallLabLevs[SmallLabLevs!=0]) { InCluster = c(1:nPoints)[SmallLabels==sclust]; if (pamRespectsDendro) { onBr = unique(onBranch[InCluster]); if (length(onBr)>1) stop(paste("Internal error: objects in a small cluster are marked to belong", "\nto several large branches:", paste(onBr, collapse = ", "))); if (onBr > 0) { basicOnBranch = branch.basicClusters[[onBr]]; labelsOnBranch = branchLabels[basicOnBranch] } else { labelsOnBranch = NULL; } } else { labelsOnBranch = c(1:nProperLabels) } # printFlush(paste("SmallCluster", sclust, "has", length(InCluster), "elements.")); DistInCluster = distM[InCluster, InCluster, drop = FALSE]; if (length(labelsOnBranch) > 0) { if (length(InCluster)>1) { DistSums = apply(DistInCluster, 2, sum); smed = InCluster[which.min(DistSums)]; DistToMeds = distM[Medoids[labelsOnBranch], smed]; closest = which.min(DistToMeds); DistToClosest = DistToMeds[closest]; closestLabel = labelsOnBranch[closest] if ( (DistToClosest < ClusterRadii[closestLabel]) | (DistToClosest < maxPamDist) ) { Colors[InCluster] = closestLabel; nPAMed = nPAMed + length(InCluster); } else Colors[InCluster] = -1; # This prevents individual points from being assigned later } } else Colors[InCluster] = -1; } } # Assign leftover unlabeled objects to clusters with nearest medoids Unlabeled = c(1:nPoints)[Colors==0]; if (length(Unlabeled>0)) for (obj in Unlabeled) { if (pamRespectsDendro) { onBr = onBranch[obj] if (onBr > 0) { basicOnBranch = branch.basicClusters[[onBr]]; labelsOnBranch = branchLabels[basicOnBranch] } else { labelsOnBranch = NULL; } } else { labelsOnBranch = c(1:nProperLabels) } if (!is.null(labelsOnBranch)) { UnassdToMedoidDist = distM[Medoids[labelsOnBranch], obj]; nearest= which.min(UnassdToMedoidDist) NearestCenterDist = UnassdToMedoidDist[nearest]; nearestMed = labelsOnBranch[nearest] if ( (NearestCenterDist < ClusterRadii[nearestMed]) | (NearestCenterDist < maxPamDist)) { Colors[obj] = nearestMed; nPAMed = nPAMed + 1; } } } UnlabeledExist = (sum(Colors==0)>0); } else # Instead of medoids, use average distances { # This is the default method, so I will try to tune it for speed a bit. ClusterDiam = rep(0, times = nProperLabels); for (cluster in 1:nProperLabels) { InCluster = c(1:nPoints)[Colors==cluster]; nInCluster = length(InCluster) DistInCluster = distM[InCluster, InCluster]; if (nInCluster>1) { AveDistInClust = colSums(DistInCluster)/(nInCluster-1); ClusterDiam[cluster] = max(AveDistInClust); } else { ClusterDiam[cluster] = 0; } } # If small clusters are respected, assign them first based on average cluster-cluster distances. ColorsX = Colors; if (respectSmallClusters) { FSmallLabels = factor(SmallLabels); SmallLabLevs = as.numeric(levels(FSmallLabels)); nSmallClusters = nlevels(FSmallLabels) - (SmallLabLevs[1]==0); if (nSmallClusters>0) { if (pamRespectsDendro) { for (sclust in SmallLabLevs[SmallLabLevs!=0]) { InCluster = c(1:nPoints)[SmallLabels==sclust]; onBr = unique(onBranch[InCluster]); if (length(onBr)>1) stop(paste("Internal error: objects in a small cluster are marked to belong", "\nto several large branches:", paste(onBr, collapse = ", "))); if (onBr > 0) { basicOnBranch = branch.basicClusters[[onBr]]; labelsOnBranch = branchLabels[basicOnBranch] useObjects = ColorsX %in% unique(labelsOnBranch) DistSClustClust = distM[InCluster, useObjects, drop = FALSE]; MeanDist = colMeans(DistSClustClust); useColorsFac = factor(ColorsX[useObjects]) MeanMeanDist = tapply(MeanDist, useColorsFac, mean); nearest = which.min(MeanMeanDist); NearestDist = MeanMeanDist[nearest]; nearestLabel = as.numeric(levels(useColorsFac)[nearest]) if ( ((NearestDist < ClusterDiam[nearestLabel]) | (NearestDist < maxPamDist)) ) { Colors[InCluster] = nearestLabel; nPAMed = nPAMed + length(InCluster); } else Colors[InCluster] = -1; # This prevents individual points from being assigned later } } } else { labelsOnBranch = c(1:nProperLabels) useObjects = c(1:nPoints)[ColorsX!=0]; for (sclust in SmallLabLevs[SmallLabLevs!=0]) { InCluster = c(1:nPoints)[SmallLabels==sclust]; DistSClustClust = distM[InCluster, useObjects, drop = FALSE]; MeanDist = colMeans(DistSClustClust); useColorsFac = factor(ColorsX[useObjects]) MeanMeanDist = tapply(MeanDist, useColorsFac, mean); nearest = which.min(MeanMeanDist); NearestDist = MeanMeanDist[nearest]; nearestLabel = as.numeric(levels(useColorsFac)[nearest]) if ( ((NearestDist < ClusterDiam[nearestLabel]) | (NearestDist < maxPamDist)) ) { Colors[InCluster] = nearestLabel; nPAMed = nPAMed + length(InCluster); } else Colors[InCluster] = -1; # This prevents individual points from being assigned later } } } } # Assign leftover unlabeled objects to clusters with nearest medoids Unlabeled = c(1:nPoints)[Colors==0]; #ColorsX = Colors; if (length(Unlabeled)>0) { if (pamRespectsDendro) { unlabOnBranch = Unlabeled[onBranch[Unlabeled] > 0]; for (obj in unlabOnBranch) { onBr = onBranch[obj] basicOnBranch = branch.basicClusters[[onBr]]; labelsOnBranch = branchLabels[basicOnBranch] useObjects = ColorsX %in% unique(labelsOnBranch); useColorsFac = factor(ColorsX[useObjects]) UnassdToClustDist = tapply(distM[useObjects, obj], useColorsFac, mean); nearest = which.min(UnassdToClustDist); NearestClusterDist = UnassdToClustDist[nearest]; nearestLabel = as.numeric(levels(useColorsFac)[nearest]) if ((NearestClusterDist < ClusterDiam[nearestLabel]) | (NearestClusterDist < maxPamDist) ) { Colors[obj] = nearestLabel nPAMed = nPAMed + 1; } } } else { useObjects = c(1:nPoints)[ColorsX !=0]; useColorsFac = factor(ColorsX[useObjects]) nUseColors = nlevels(useColorsFac); UnassdToClustDist = apply(distM[useObjects, Unlabeled, drop = FALSE], 2, tapply, useColorsFac, mean); # Fix dimensions for the case when there's only one cluster dim(UnassdToClustDist) = c(nUseColors, length(Unlabeled)); nearest = apply(UnassdToClustDist, 2, which.min); nearestDist = apply(UnassdToClustDist, 2, min); nearestLabel = as.numeric(levels(useColorsFac)[nearest]) assign = (nearestDist < ClusterDiam[nearestLabel]) | (nearestDist < maxPamDist) Colors[Unlabeled[assign]] = nearestLabel[assign]; nPAMed = nPAMed + sum(assign); } } } if (verbose>2) printFlush(paste(spaces, "....assigned", nPAMed, "objects to existing clusters.")); } # Relabel labels such that 1 corresponds to the largest cluster etc. Colors[Colors<0] = 0; UnlabeledExist = (sum(Colors==0)>0); NumLabs = as.numeric(as.factor(Colors)); Sizes = table(NumLabs); if (UnlabeledExist) { if (length(Sizes)>1) { SizeRank = c(1, rank(-Sizes[2:length(Sizes)], ties.method="first")+1); } else { SizeRank = 1; } OrdNumLabs = SizeRank[NumLabs]; } else { SizeRank = rank(-Sizes[1:length(Sizes)], ties.method="first"); OrdNumLabs = SizeRank[NumLabs]; } ordCoreLabels = OrdNumLabs-UnlabeledExist; ordCoreLabels[coreLabels==0] = 0; if (verbose>0) printFlush(paste(spaces, "..done.")); list(labels = OrdNumLabs-UnlabeledExist, cores = ordCoreLabels, smallLabels = SmallLabels, onBranch = onBranch, mergeDiagnostics = if (nExternalSplits==0) mergeDiagnostics else cbind(mergeDiagnostics, externalMergeDiags), mergeCriteria = list(maxCoreScatter = maxCoreScatter, minGap = minGap, maxAbsCoreScatter = maxAbsCoreScatter, minAbsGap = minAbsGap, minExternalSplit = minExternalSplit), branches = list(nBranches = nBranches, # Branches = Branches, IndMergeToBranch = IndMergeToBranch, RootBranch = RootBranch, isCluster = isCluster, nPoints = nMerge+1)); } dynamicTreeCut/MD50000644000176200001440000000132712670402626013512 0ustar liggesusers162a906c3129568482e8e24c418a79e3 *Changelog 3f116359373cf6c899d9495f6de3c21e *DESCRIPTION 3d6984458495c92134fdc64509225ad2 *NAMESPACE aa64464d938a5b354ea8a9f700001a35 *R/PrintFlush.R 6f3a385245554c0932505e50816184c7 *R/cutreeDynamic.R a45b8aa6b392e776c2e488cd19f92c0a *R/cutreeWrapper.R c71cb23e3a6d5dbe66f2b82cd6847a1e *R/treeCut.R aa44f6066f0690623259b7e7f49c5099 *man/cutreeDynamic.Rd 37b26bc25b47a212b32002b991b92d95 *man/cutreeDynamicTree.Rd 8adffcff5bd149a23e48225a625c3678 *man/cutreeHybrid.Rd 9aa9c0644cf4548f1a59a9687d4e4778 *man/dynamicTreeCut-package.Rd 5cc3a3cfe58b2dbffc36525de940dbc7 *man/indentSpaces.Rd e1cbf06795f531f86668c7edf679aa37 *man/merge2Clusters.Rd 51a28a6303310b9b1bb292520428f547 *man/printFlush.Rd dynamicTreeCut/DESCRIPTION0000644000176200001440000000134612670402626014711 0ustar liggesusersPackage: dynamicTreeCut Version: 1.63-1 Date: 2016-03-10 Title: Methods for Detection of Clusters in Hierarchical Clustering Dendrograms Author: Peter Langfelder and Bin Zhang , with contributions from Steve Horvath Maintainer: Peter Langfelder Depends: R (>= 2.3.0), stats ZipData: no License: GPL (>= 2) Description: Contains methods for detection of clusters in hierarchical clustering dendrograms. URL: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting/ NeedsCompilation: no Packaged: 2016-03-10 18:25:27 UTC; plangfelder Repository: CRAN Date/Publication: 2016-03-11 00:39:02 dynamicTreeCut/man/0000755000176200001440000000000012670336002013744 5ustar liggesusersdynamicTreeCut/man/cutreeDynamicTree.Rd0000644000176200001440000000364612516112255017661 0ustar liggesusers\name{cutreeDynamicTree} \alias{cutreeDynamicTree} \title{Dynamic Dendrogram Pruning Based on Dendrogram Only} \description{ Detect clusters in a hierarchical dendrogram using a variable cut height approach. Uses only the information in the dendrogram itself is used (which may give incorrect assignment for outlying objects). } \usage{ cutreeDynamicTree(dendro, maxTreeHeight = 1, deepSplit = TRUE, minModuleSize = 50) } \arguments{ \item{dendro}{Hierarchical clustering dendrogram such produced by \code{\link[stats]{hclust}}.} \item{maxTreeHeight}{Maximum joining height of objects to be considered part of clusters.} \item{deepSplit}{If \code{TRUE}, method will favor sensitivity and produce more smaller clusters. When \code{FALSE}, there will be fewer bigger clusters.} \item{minModuleSize}{Minimum module size. Branches containing fewer than \code{minModuleSize} objects will be left unlabeled.} } \details{ A variable height branch pruning technique for dendrograms produced by hierarchical clustering. Initially, branches are cut off at the height \code{maxTreeHeight}; the resulting clusters are then examined for substructure and if subclusters are detected, they are assigned separate labels. Subclusters are detected by structure and are required to have a minimum of \code{minModuleSize} objects on them to be assigned a separate label. A rough degree of control over what it means to be a subcluster is implemented by the parameter \code{deepSplit}. } \value{ A vector of numerical labels giving assignment of objects to modules. Unassigned objects are labeled 0, the largest module has label 1, next largest 2 etc. } \references{ \url{http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting} } \author{ Bin Zhang, \email{binzhang.ucla@gmail.com}, with contributions by Peter Langfelder, \email{Peter.Langfelder@gmail.com}.} \seealso{\code{\link[stats]{hclust}}, \code{\link{cutreeHybrid}}} \keyword{misc} dynamicTreeCut/man/cutreeHybrid.Rd0000644000176200001440000002224612516112307016671 0ustar liggesusers\name{cutreeHybrid} \alias{cutreeHybrid} \title{Hybrid Adaptive Tree Cut for Hierarchical Clustering Dendrograms} \description{ Detect clusters in a dendorgram produced by the function \code{hclust}. } \usage{ cutreeHybrid( # Input data: basic tree cutiing dendro, distM, # Branch cut criteria and options cutHeight = NULL, minClusterSize = 20, deepSplit = 1, # Advanced options maxCoreScatter = NULL, minGap = NULL, maxAbsCoreScatter = NULL, minAbsGap = NULL, minSplitHeight = NULL, minAbsSplitHeight = NULL, # External (user-supplied) measure of branch split externalBranchSplitFnc = NULL, minExternalSplit = NULL, externalSplitOptions = list(), externalSplitFncNeedsDistance = NULL, assumeSimpleExternalSpecification = TRUE, # PAM stage options pamStage = TRUE, pamRespectsDendro = TRUE, useMedoids = FALSE, maxPamDist = cutHeight, respectSmallClusters = TRUE, # Various options verbose = 2, indent = 0) } \arguments{ \item{dendro}{a hierarchical clustering dendorgram such as one returned by \code{hclust}. } \item{distM}{Distance matrix that was used as input to \code{hclust}. } \item{cutHeight}{Maximum joining heights that will be considered. It defaults to 99% of the range between the 5th percentile and the maximum of the joining heights on the dendrogram. } \item{minClusterSize}{Minimum cluster size. } \item{deepSplit}{Either logical or integer in the range 0 to 4. Provides a rough control over sensitivity to cluster splitting. The higher the value, the more and smaller clusters will be produced. A finer control can be achieved via \code{maxBranchCor}, \code{minBranchSplit}, \code{maxCoreScatter} and \code{minGap} below.} \item{maxCoreScatter}{ Maximum scatter of the core for a branch to be a cluster, given as the fraction of \code{cutHeight} relative to the 5th percentile of joining heights. See Details. } \item{minGap}{Minimum cluster gap given as the fraction of the difference between \code{cutHeight} and the 5th percentile of joining heights. } \item{maxAbsCoreScatter}{Maximum scatter of the core for a branch to be a cluster given as absolute heights. If given, overrides \code{maxCoreScatter}. } \item{minAbsGap}{ Minimum cluster gap given as absolute height difference. If given, overrides \code{minGap}. } \item{minSplitHeight}{Minimum split height given as the fraction of the difference between \code{cutHeight} and the 5th percentile of joining heights. Branches merging below this height will automatically be merged. Defaults to zero but is used only if \code{minAbsSplitHeight} below is \code{NULL}.} \item{minAbsSplitHeight}{Minimum split height given as an absolute height. Branches merging below this height will automatically be merged. If not given (default), will be determined from \code{minSplitHeight} above.} \item{externalBranchSplitFnc}{Optional function to evaluate split (dissimilarity) between two branches. Either a single function or a list in which each component is a function (see \code{assumeSimpleExternalSpecification} below for how to specify a single function). Each function can be specified by name (a character string) or the actual function object. Each given function must take arguments \code{branch1} and \code{branch2} that specify the indices of objects in the two branches whose dissimilarity is to be evaluated, and possibly other arguments. It must return a number that quantifies the dissimilarity of the two branches. The returned value will be compared to \code{minExternalSplit} (see below). This argument is only used for method "hybrid".} \item{minExternalSplit}{Thresholds to decide whether two branches should be merged. It should be a numeric vector of the same length as the number of functions in \code{externalBranchSplitFnc} above. Only used for method "hybrid". } \item{externalSplitOptions}{Further arguments to function \code{externalBranchSplitFnc}. If only one external function is specified in \code{externalBranchSplitFnc} above, \code{externalSplitOptions} can be a named list of arguments or a list with one component that is in turn the named list of further arguments for \code{externalBranchSplitFnc[[1]]}. The argument \code{assumeSimpleExternalSpecification} controls which of the two possibilities should be assumed. If multiple functions are specified in \code{externalBranchSplitFnc}, \code{externalSplitOptions} must be a list in which each component is a named list giving the further arguments for the corresponding function in \code{externalBranchSplitFnc}. Only used for method "hybrid".} \item{externalSplitFncNeedsDistance}{Optional specification of whether the external branch split functions need the distance matrix as one of their arguments. Either \code{NULL} or a logical vector with one element per branch split function that specifies whether the corresponding branch split function expects the distance matrix as one of its arguments. The default \code{NULL} is interpreted as a vector of \code{TRUE}. When dealing with a large number of objects, setting this argument to \code{FALSE} whenever possible can prevent unnecessary memory utilization.} \item{assumeSimpleExternalSpecification}{Logical: when \code{minExternalSplit} above is a scalar (has length 1), should the function assume a simple specification of \code{externalBranchSplitFnc} and \code{externalSplitOptions}? If \code{TRUE}, \code{externalBranchSplitFnc} is taken as the function specification and \code{externalSplitOptions} the named list of options. This is suitable for simple direct calls of this function. If \code{FALSE}, \code{externalBranchSplitFnc} is assumed to be a list with a single component which specifies the function, and \code{externalSplitOptions} is a list with one component that is in turn the named list of further arguments for \code{externalBranchSplitFnc[[1]]}.} \item{pamStage}{Logical, only used for method "hybrid". If \code{TRUE}, the second (PAM-like) stage will be performed. } \item{pamRespectsDendro}{Logical, only used for method "hybrid". If \code{TRUE}, the PAM stage will respect the dendrogram in the sense an object can be PAM-assigned only to clusters that lie below it on the branch that the object is merged into. See \code{\link[dynamicTreeCut]{cutreeDynamic}} for more details. } \item{useMedoids}{if TRUE, the second stage will be use object to medoid distance; if FALSE, it will use average object to cluster distance. The default (FALSE) is recommended. } \item{maxPamDist}{Maximum object distance to closest cluster that will result in the object assigned to that cluster. Defaults to \code{cutHeight}. } \item{respectSmallClusters}{If TRUE, branches that failed to be clusters in stage 1 only because of insufficient size will be assigned together in stage 2. If FALSE, all objects will be assigned individually.} \item{verbose}{Controls the verbosity of the output. 0 will make the function completely quiet, values up to 4 gradually increase verbosity.} \item{indent}{Controls indentation of printed messages (see \code{verbose} above). Each unit adds two spaces before printed messages; useful when several functions' output is to be nested. } } \details{ The function detects clusters in a hierarchical dendrogram based on the shape of branches on the dendrogram. For details on the method, see \url{http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting}. In order to make the shape parameters \code{maxCoreScatter} and \code{minGap} more universal, their values are interpreted relative to \code{cutHeight} and the 5th percetile of the merging heights (we arbitrarily chose the 5th percetile rather than the minimum for reasons of stability). Thus, the absolute maximum allowable core scatter is calculated as \code{maxCoreScatter * (cutHeight - refHeight) + refHeight} and the absolute minimum allowable gap as \code{minGap * (cutHeight - refHeight)}, where \code{refHeight} is the 5th percentile of the merging heights. } \value{ A list containg the following elements: \item{labels}{Numerical labels of clusters, with 0 meaning unassigned, label 1 labeling the largest cluster etc. } \item{cores}{Numerical labels indicating cores of found clusters. } \item{smallLabels}{Numerical labels for branches that failed to be recognized clusters only because of insufficient number of objects.} \item{mergeDiagnostics}{A data.frame with one row per merge in the input dendrogram. The columns give the values of the various merging criteria used by the algorithm. Missing data indicate that at least one of the "branches" merged was actually a singleton (single node) and hence the branch merging was automatic. } \item{mergeCriteria}{Values of the merging thresholds. Either a copy of the corresponding input thresholds or values determined by \code{deepSplit}. } \item{branches}{A list detailing the deteced branch structure. } } \references{ Langfelder P, Zhang B, Horvath S, 2007. \url{http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting} } \author{ Peter Langfelder, \email{Peter.Langfelder@gmail.com} } \seealso{ \code{\link[stats]{hclust}}, \code{\link[stats]{as.dist}} } \keyword{cluster} dynamicTreeCut/man/printFlush.Rd0000644000176200001440000000131212516112401016360 0ustar liggesusers\name{printFlush} \alias{printFlush} \title{Print Arguments and Flush the Console} \description{Passes all its arguments unchaged to the standard \code{print} function; after the execution of print it flushes the console, if possible. } \usage{ printFlush(\dots) } \arguments{ \item{\dots}{Arguments to be passed to the standard \code{\link[base]{print}} function.} } \details{ Passes all its arguments unchaged to the standard \code{print} function; after the execution of print it flushes the console, if possible. } \value{Returns the value of the \code{\link[base]{print}} function.} \author{ Peter Langfelder, \email{Peter.Langfelder@gmail.com} } \seealso{ \code{\link[base]{print}} } \keyword{print} dynamicTreeCut/man/merge2Clusters.Rd0000644000176200001440000000167412516112364017153 0ustar liggesusers\name{merge2Clusters} \Rdversion{1.1} \alias{merge2Clusters} \title{ Merge Two Clusters } \description{ Merge 2 clusters into 1. } \usage{ merge2Clusters(labels, mainClusterLabel, minorClusterLabel) } \arguments{ \item{labels}{ a vector or factor giving the cluster labels } \item{mainClusterLabel}{ label of the first merged cluster. The merged cluster will have this label. } \item{minorClusterLabel}{ label of the second merged cluster. } } \value{ A vector or factor of the merged labels. } \author{ Bin Zhang and Peter Langfelder } \examples{ options(stringsAsFactors = FALSE); # Works with character labels: labels = c(rep("grey", 5), rep("blue", 2), rep("red", 3)) merge2Clusters(labels, "blue", "red") # Works with factor labels: labelsF = factor(labels) merge2Clusters(labelsF, "blue", "red") # Works also with numeric labels: labelsN = as.numeric(factor(labels)) labelsN merge2Clusters(labelsF, 1, 3) } \keyword{ misc } dynamicTreeCut/man/cutreeDynamic.Rd0000644000176200001440000002250512516112234017031 0ustar liggesusers\name{cutreeDynamic} \alias{cutreeDynamic} \title{Adaptive Branch Pruning of Hierarchical Clustering Dendrograms} \description{ This wrapper provides a common access point for two methods of adaptive branch pruning of hierarchical clustering dendrograms. } \usage{ cutreeDynamic( dendro, cutHeight = NULL, minClusterSize = 20, # Basic tree cut options method = "hybrid", distM = NULL, deepSplit = (ifelse(method=="hybrid", 1, FALSE)), # Advanced options maxCoreScatter = NULL, minGap = NULL, maxAbsCoreScatter = NULL, minAbsGap = NULL, minSplitHeight = NULL, minAbsSplitHeight = NULL, # External (user-supplied) measure of branch split externalBranchSplitFnc = NULL, minExternalSplit = NULL, externalSplitOptions = list(), externalSplitFncNeedsDistance = NULL, assumeSimpleExternalSpecification = TRUE, # PAM stage options pamStage = TRUE, pamRespectsDendro = TRUE, useMedoids = FALSE, maxDistToLabel = NULL, maxPamDist = cutHeight, respectSmallClusters = TRUE, # Various options verbose = 2, indent = 0) } \arguments{ \item{dendro}{A hierarchical clustering dendorgram such as one returned by \code{hclust}. } \item{cutHeight}{Maximum joining heights that will be considered. For \code{method=="tree"} it defaults to 0.99. For \code{method=="hybrid"} it defaults to 99\% of the range between the 5th percentile and the maximum of the joining heights on the dendrogram.} \item{minClusterSize}{Minimum cluster size. } \item{method}{Chooses the method to use. Recognized values are "hybrid" and "tree". } \item{distM}{Only used for method "hybrid". The distance matrix used as input to \code{hclust}. If not given and \code{method == "hybrid"}, the function will issue a warning and default to \code{method = "tree"}.} \item{deepSplit}{For method "hybrid", can be either logical or integer in the range 0 to 4. For method "tree", must be logical. In both cases, provides a rough control over sensitivity to cluster splitting. The higher the value (or if \code{TRUE}), the more and smaller clusters will be produced. For the "hybrid" method, a finer control can be achieved via \code{maxCoreScatter} and \code{minGap} below.} \item{maxCoreScatter}{Only used for method "hybrid". Maximum scatter of the core for a branch to be a cluster, given as the fraction of \code{cutHeight} relative to the 5th percentile of joining heights. See Details. } \item{minGap}{Only used for method "hybrid". Minimum cluster gap given as the fraction of the difference between \code{cutHeight} and the 5th percentile of joining heights. } \item{maxAbsCoreScatter}{Only used for method "hybrid". Maximum scatter of the core for a branch to be a cluster given as absolute heights. If given, overrides \code{maxCoreScatter}. } \item{minAbsGap}{Only used for method "hybrid". Minimum cluster gap given as absolute height difference. If given, overrides \code{minGap}. } \item{minSplitHeight}{Minimum split height given as the fraction of the difference between \code{cutHeight} and the 5th percentile of joining heights. Branches merging below this height will automatically be merged. Defaults to zero but is used only if \code{minAbsSplitHeight} below is \code{NULL}.} \item{minAbsSplitHeight}{Minimum split height given as an absolute height. Branches merging below this height will automatically be merged. If not given (default), will be determined from \code{minSplitHeight} above.} \item{externalBranchSplitFnc}{Optional function to evaluate split (dissimilarity) between two branches. Either a single function or a list in which each component is a function (see \code{assumeSimpleExternalSpecification} below for how to specify a single function). Each function can be specified by name (a character string) or the actual function object. Each given function must take arguments \code{branch1} and \code{branch2} that specify the indices of objects in the two branches whose dissimilarity is to be evaluated, and possibly other arguments. It must return a number that quantifies the dissimilarity of the two branches. The returned value will be compared to \code{minExternalSplit} (see below). This argument is only used for method "hybrid".} \item{minExternalSplit}{Thresholds to decide whether two branches should be merged. It should be a numeric vector of the same length as the number of functions in \code{externalBranchSplitFnc} above. Only used for method "hybrid". } \item{externalSplitOptions}{Further arguments to function \code{externalBranchSplitFnc}. If only one external function is specified in \code{externalBranchSplitFnc} above, \code{externalSplitOptions} can be a named list of arguments or a list with one component that is in turn the named list of further arguments for \code{externalBranchSplitFnc[[1]]}. The argument \code{assumeSimpleExternalSpecification} controls which of the two possibilities should be assumed. If multiple functions are specified in \code{externalBranchSplitFnc}, \code{externalSplitOptions} must be a list in which each component is a named list giving the further arguments for the corresponding function in \code{externalBranchSplitFnc}. Only used for method "hybrid".} \item{externalSplitFncNeedsDistance}{Optional specification of whether the external branch split functions need the distance matrix as one of their arguments. Either \code{NULL} or a logical vector with one element per branch split function that specifies whether the corresponding branch split function expects the distance matrix as one of its arguments. The default \code{NULL} is interpreted as a vector of \code{TRUE}. When dealing with a large number of objects, setting this argument to \code{FALSE} whenever possible can prevent unnecessary memory utilization.} \item{assumeSimpleExternalSpecification}{Logical: when \code{minExternalSplit} above is a scalar (has length 1), should the function assume a simple specification of \code{externalBranchSplitFnc} and \code{externalSplitOptions}? If \code{TRUE}, \code{externalBranchSplitFnc} is taken as the function specification and \code{externalSplitOptions} the named list of options. This is suitable for simple direct calls of this function. If \code{FALSE}, \code{externalBranchSplitFnc} is assumed to be a list with a single component which specifies the function, and \code{externalSplitOptions} is a list with one component that is in turn the named list of further arguments for \code{externalBranchSplitFnc[[1]]}.} \item{pamStage}{Only used for method "hybrid". If TRUE, the second (PAM-like) stage will be performed. } \item{pamRespectsDendro}{Logical, only used for method "hybrid". If \code{TRUE}, the PAM stage will respect the dendrogram in the sense that objects and small clusters will only be assigned to clusters that belong to the same branch that the objects or small clusters being assigned belong to. } \item{useMedoids}{Only used for method "hybrid" and only if \code{labelUnlabeled==TRUE}. If TRUE, the second stage will be use object to medoid distance; if FALSE, it will use average object to cluster distance. The default (FALSE) is recommended. } \item{maxDistToLabel}{Deprecated, use \code{maxPamDist} instead. Only used for method "hybrid" and only if \code{labelUnlabeled==TRUE}. Maximum object distance to closest cluster that will result in the object assigned to that cluster. } \item{maxPamDist}{Only used for method "hybrid" and only if \code{labelUnlabeled==TRUE}. Maximum object distance to closest cluster that will result in the object assigned to that cluster. Defaults to \code{cutHeight}. } \item{respectSmallClusters}{Only used for method "hybrid" and only if \code{labelUnlabeled==TRUE}. If TRUE, branches that failed to be clusters in stage 1 only because of insufficient size will be assigned together in stage 2. If FALSE, all objects will be assigned individually.} \item{verbose}{Controls the verbosity of the output. 0 will make the function completely quiet, values up to 4 gradually increase verbosity.} \item{indent}{Controls indentation of printed messages (see \code{verbose} above). Each unit adds two spaces before printed messages; useful when several functions' output is to be nested. } } \details{ This is a wrapper for two related but different methods for cluster detection in hierarchical clustering dendrograms. In order to make the shape parameters \code{maxCoreScatter} and \code{minGap} more universal, their values are interpreted relative to \code{cutHeight} and the 5th percetile of the merging heights (we arbitrarily chose the 5th percetile rather than the minimum for reasons of stability). Thus, the absolute maximum allowable core scatter is calculated as \code{maxCoreScatter * (cutHeight - refHeight) + refHeight} and the absolute minimum allowable gap as \code{minGap * (cutHeight - refHeight)}, where \code{refHeight} is the 5th percentile of the merging heights. } \value{ A vector of numerical labels giving assignment of objects to modules. Unassigned objects are labeled 0, the largest module has label 1, next largest 2 etc. } \references{ Langfelder P, Zhang B, Horvath S, 2007. \url{http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting} } \author{ Peter Langfelder, \email{Peter.Langfelder@gmail.com} } \seealso{ \code{\link{hclust}}, \code{\link{cutreeHybrid}}, \code{\link{cutreeDynamicTree}}. } \keyword{cluster} dynamicTreeCut/man/dynamicTreeCut-package.Rd0000644000176200001440000000264312670336002020551 0ustar liggesusers\name{dynamicTreeCut-package} \alias{dynamicTreeCut-package} \alias{dynamicTreeCut} \docType{package} \title{ Methods for Detection of Clusters in Hierarchical Clustering Dendrograms } \description{ Contains methods for detection of clusters in hierarchical clustering dendrograms. } \details{ \tabular{ll}{ Package: \tab dynamicTreeCut\cr Version: \tab 1.63-1\cr Date: \tab 2016-03-10\cr Depends: \tab R, stats\cr ZipData: \tab no\cr License: \tab GPL version 2 or newer\cr URL: \tab http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting/\cr } Index: \preformatted{ cutreeDynamic Adaptive branch pruning of hierarchical clustering dendrograms. cutreeDynamicTree Dynamic dendrogram pruning based on dendrogram only cutreeHybrid Hybrid adaptive tree cut for hierarchical clustering dendrograms. indentSpaces Spaces for indented output. merge2Clusters Merge two clusters printFlush Print arguments and flush the console. treecut-package Methods for detection of clusters in hierarchical clustering dendrograms. } } \author{ Peter Langfelder and Bin Zhang , with contributions from Steve Horvath Maintainer: Peter Langfelder } \keyword{ package } dynamicTreeCut/man/indentSpaces.Rd0000644000176200001440000000120412516112351016646 0ustar liggesusers\name{indentSpaces} \alias{indentSpaces} \title{Spaces for Indented Output} \description{Returns a character string containing two times \code{indent} spaces. } \usage{ indentSpaces(indent = 0) } \arguments{ \item{indent}{Desired level of indentation. The number of returned spaces will be twice this argument.} } \value{A character string containing spaces, of length twice \code{indent}. } \author{ Peter Langfelder, \email{Peter.Langfelder@gmail.com} } \examples{ spaces = indentSpaces(0); print(paste(spaces, "This output is not indented...")); spaces = indentSpaces(1); print(paste(spaces, "...while this one is.")) } \keyword{print}