RobustRankAggreg/0000755000176200001440000000000014316541412013462 5ustar liggesusersRobustRankAggreg/NAMESPACE0000644000176200001440000000035614263077661014720 0ustar liggesusers# Generated by roxygen2: do not edit by hand export(aggregateRanks) export(betaScores) export(rankMatrix) export(rhoScores) import(methods) importFrom(stats,median) importFrom(stats,pbeta) importFrom(stats,pnorm) importFrom(stats,qbeta) RobustRankAggreg/data/0000755000176200001440000000000013035053335014372 5ustar liggesusersRobustRankAggreg/data/cellCycleKO.RData0000755000176200001440000003130511622437254017453 0ustar liggesusersˎ4KVsZ(*A#ůDFx\2"C fR0o# ހQOy~y2<PFFk=O~7ۏ~~E?CϷ?o2~<α?}ܗܛCo>[6>߼cj}+!ߞmz77K]KU-kf{{47ۜ/2(Us5m]~ÿϗS 3z/JfE)}|(6_GZ|XR !HmǞuR{Iz4ǡߌRwj"~D~v $^W+Ew[ނ\73;=eں{KxN]eeĆ~Uh[QǺ!LQ߹qHG~\΄ߌņGqZ6;}5zV5Xz>ԗw.:YSN2Dorl vW=1{>x;Pb,~g$1<Ǧ; WW+ ~F^yipZ_HٯVkYC)Ongև:M8KIy"#gv[pĖ= z_wVJss5x]t"imG6UCK?40a?P e|0RJvpp|ZJ ȁFϐ d~" }q->Z\nq}`n*WT#=En-|FnVG̓I|WCY`>mcv8 ri×衮v^h)[jH.x_|Yat~E,}~o]`A<>‰yB=OCk䌗?:vzS5[1#ܹ[/BVo".\1$NżH?>њ?ShݽG_Jjee۝u ~E-i4`;TxFFf|Rr$ˮHT sI"9F7kΉ(閫YɟMN>x\!vοLn)y?ym6y@΍L2&Y988~so<_=73 3>{_aR"ofD7v xo_7i˜ά+9=2Az wqN;RY קּAH}PBCdnkU7H&Xɞʼn1g%! Y2ztߺlvL`2q7Hh j:fj7f)kԇ ވ4p@{o(3rnLsObznE m0P-<'eV;x_m8l"D@@,ME]g@ | qJ ղ-3σ{5Jֿ,$ Lʙ~T؝#K8}hn"h2c~~˄ذ u}=bօч(Bh`V3wt`gT?9n&n;ԅep#?̂<4;ƃb&<ί5'N2OaEa(UP.Opǁ EOk!G)ps/Wy@'4TKB錏mH,ٵ;F ;kh<_Ϻn..5z޾(XuV[v[vc/;曬A' 'p/p{#C bлX͂ h7KڰxۃՕg.ё__5rkcq-viZ/|6l3q#x$ho0G!#=!a Bzy$RrȜ`a($&U=?;k7ȨFr;֠#(%CB~@0Iۈ(`ep+sH@6B!͖C⒰r\ 0vm@h L`eGhݪ[H&a$āMA-d'S`,U#Ig+Ƒ8bf$Ć=g^cI_Ii΃5O\}|(.~Z& km,2*bXECBC!D'taAZ %%%i4'!30E.3ՐFYLHNGc*~WhHGyN ٲ*½z 遥'AlK8J"&U _zDDh5 /֭As_Y R|P/WO+ ޲X }cgo0 ˥`8ނgmW%QV 9`'V $c^YhM`Ҍ x ,۱/#<eD[t0ݎT}ўWecq,B#%7$HO\kAAid =9tZq٣X֡FRs"BS_]^7t}ϘOtzJs1:0,`On-vD,y"o|0I#|אٛ_xؖ-0k^i6 rHc>u&O!_ ^,P3Bs1+k`gtKlxʑ4Ji2̨*P-'rd'Ηm:+V?ARgCn,`ȿYt&~56vpG]GK_YX$kV!y }0lcyzv) KӌƳy wjg)iز8vd>R8/^u DsرW"O?n~So?e3£&m6{cX*'̰[+|D߻Q 3{پÚ|0 FD.tiґ1&\+)X3(wdmt<5w>7LV T|ϐD):d2(,nOGh]$жze4-ic5a[ӝtO`[`iP9Gz⛏;|-!vނ _ϭ-HRLwLGl=|dKXP9qY93C|a@, a{> |"Ct dP+DQKiuB n-+Ŋҕil3&"PYu*!P1H0Ja`)Op="t5ATYO41h #͖#sgQGb60=ilM("DdDG;V}-UY1ָž)H0HPhveeP*S2L ś )(:t+j_!dGP)хr1pF4Z#`ڤkrJ. ppS1'3 ?D'"Ss'EtDh@Ĉ‡"G2;PqpNZ;>` TϷġ'TJ_DM֋lY(y HQ=w&V ;#MIܑ')L}t[ƼlCІA֪T I聁Оp`f:<Y'ejPAYfAX0ȊoYbIꐖ"z%esCj9QDrk@|Uʢk)T#3MJ n\UѴ~!8@H!4~OT09Ɗ,;ܠ{IԪ"u4r$ŗE^FYR#Svp~;[i}T9W~ՆclMaAF)/1[[ L H03࿋䣝jzCj6blxػ TP^FM8|HZ>4(ڛ"M7@u=32y`a=^c|| [i=O-o)E"TЗ% ضlaFb I8φ@">R~1! W wi.Vk$ҙ–M-Ĥa%T ,&VӼ R=&P'fAoy̻vDq+6~bD[D! DArxƵ>QC!2@ KyBm/NnOlN^ވ ֊lA{wU'}ek!;ĺ>~$,^DTx6\>0 1dCt4¿`F=)Tf'6[Ds`6'Y03J(}̈"UGA&r=V!OTG+ @kgJ-KYBd2zz}5R:-,"  G&XZֹ?wg9=&bY%#BKd2=9rUYə,1w"Vwe. j ^)Npb Gr~#zyr l{^ERB)d/W@_ Z!8#ڡy oL,Ek*2#ÂEF՟OidI Xd6RۇxV˵H7g)N2s`C<(O/>gDv,#ގK)υ-G,v 5,8Q-iI8y2n6C 魼$YV-8+^xTgGL3ǪFHHئKC9,rN dAϾ_ljG͈L^=Π.Q͖y+, Ymae$ # %'d\&z=)[&F؉ 2!30ths itȭ3VgT970m﬍+۴#I[Tl"CU(+ n!kbGBWYX+~-X5RI%@-QYJڠ-b0iW8 3ct<7dq Oc#0FXU_GL ِ`-Y(§u$:%CM& Wޖ wL1"al+E>ћ ZAv )e^Vqa)? jBeXTc6?&!K|SabDEopfSVryQ8d#@ʰ #zj|]73QBH$%FyJت}M"?&:$֡sёB5E@5%ܳ0җViVDy⠝OEӆBW}(vUYqj~P愈[A`3[AFwGs;a5R0z<ڸbZOE{<HE(dW=GOEB*[B~[!=]8;+cD`(BHpɖlC2ZxUE:GC_S9ub( g&ѷX3,AWsuYen_ EE?RAx#@(2cIJ&Hs׽HDTUf⠚4L#X ](w! #و(S`2.Q>8]V5/ ݑk6A7ɨt-RYe *WCߝ—w1hqL,LdG*c J2}^ *<N4UH=5$Uׂg(3~VxÍPr Zy2q‹ޒF6t0XgEӆ-Výca[oף2mJMcB+q$ eb{#ҽ~وg%8G׃SG~҃Pռkw}NlT'Iۭ^ϩozIWᙵ~?oI:[y~3 U^5)Rv 﫼؎⺞xjp dg?/ 2+Y~({?e867%܃ ӡ^vB |j|#I]Xo\t4*`p\@TXݽr$N\o7_v3._z6rl)?oH+q`οy-+s.3؈ó;K#߻/ 7 -)#9FU|N6RTE1L\noވ5L,N;baa@ECy߄36{|"qe,%sPpdb@]*1[?j4a"#JHơ|,05<,^ 4#^'3Y|H4D#Wޤ:[0Yo.Қ(cBgrU Omhָ5.#6(' i'XIf71n 87Nv:Z}Xu~zcPs'LY Q<*HN U\0 OQmҒB#Ұ&"uGR}.ĝ0\̵L4[rlِEўmyG Bg;嫄#w/d2e!4qD96BȺ20}iaSA0/L#!NʝMg2QMzA*@E”ųJNAd|xF/,\X!6^>c]4>9woNM|ҿ(O-k@4ْ-`v d/fKtP೤WD ~c ',_ީw3x}J#TL?ÃdTK,#?[iṺgK ao|axK܌pU$w/)+o*PD!jS~ @G@0#]RO:h+ew.R'nHSx#}n_8iYxFe &`WW*dDa>6 ւ&n3pT06H#m^V#9e9 22{.jm_3&*v!YPn #MMiYm m 6~م7c53X!󞨡QzWY;dDTUy2U3AL8/֧u;f(V˂+݉%8AOTEj~e$nh-4.NEz#o/-9'W F(AS˦#سeYכG\\q`Ɖ^aL\ߕbE!QȞWz,18ff?D(B l x`AP(+)fPX䑞39og{:ȃJ$*R#@T'Y2kOt )M1.ӥ]hDKNԤYDLet,'Edxf*OBODK h!cGv`( <})'b0']"!r3xeRCQY L#'*pVdXљ|UKcLjI 9h~_I"x&=bWD.4_^aL: 4?g=~,i{9`$kG%NEFlXeށN[5( ,RPFJ#ntR͌wb:@^A)#oGž ѱ781:11^(c=dB 8ɂÑ)}yeࡔ8(YJ6&8v/T@~WB7x$6x4dCs!Ɛ^YdHxJ~R HPB> $'J$&!1$֒D" R(hoe&19.K|(PaӜ[8 pL㩚A'eŐIH*-s)(N~5 ,^ë 4Z-r9mo5a"  Tܒ!QBR}'m-hdL{ :d셧Hv+ԋaJ%J<F#./aa#\JT c5$+¼|pc}??{|?o_?ǿmv?7?<>׿<q??=?];'vLϟK-xJW"Z0TqmH)SJeFOoXݬF0Q-w|[%s_-'rMeTdq%nҜ4ѺƲĎL_; IjQ^]O؞Vs: [~}j$-οL<ٱHwqVߎF;#O -LS%#2mR6P2ȲmD QjdZ0hH dP$([ T^Rڧڈra~ yd$.2;o7ycXF,KփeOqK=L;O&󊟨q?!5]tOgr*轲n]I|W܄LeӜp h!2U:̥,X{I:;QCQz*/ʶ$%D7l> ZniVa{#-s a } RobustRankAggreg/man/cellCycleKO.Rd0000644000176200001440000000246414307620624016666 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/aggregateRanks.R \docType{data} \name{cellCycleKO} \alias{cellCycleKO} \title{A dataset based on Reimand \emph{et al} and Hu \emph{et al}.} \description{ The dataset contains lists yeast genes that were most influenced by 12 cell cycle related transcription factor knockouts. The dataset is a list with 3 slots \enumerate{ \item \code{gl} - set of gene lists in a format suitable for \code{\link{aggregateRanks}}; \item \code{N} - number of yeast genes; \item \code{ref} - reference list of cell cycle related genes taken from de Lichtenberg \emph{et al}. } } \references{ Reimand, J., Vaquerizas, J. M., Todd, A. E., Vilo, J., and Luscombe, N. M. (2010). "Comprehensive reanalysis of transcription factor knockout expression data in saccharomyces cerevisiae reveals many new targets. Nucleic Acids Res." Hu, Z., Killion, P. J., and Iyer, V. R. (2007). "Genetic reconstruction of a functional transcriptional regulatory network." Nat. Genet., 39(5), 683-7 de Lichtenberg, U., Jensen, L. J., Fausboll, A., Jensen, T. S., Bork, P., and Brunak, S. (2005). "Comparison of computational methods for the identification of cell cycle- regulated genes. Bioinformatics, 21(7), 1164-71." } \author{ Raivo Kolde } \keyword{data} RobustRankAggreg/man/aggregateRanks.Rd0000644000176200001440000001123114262737724017464 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/aggregateRanks.R \name{aggregateRanks} \alias{aggregateRanks} \alias{RobustRankAggreg} \title{Aggregate ranked lists} \usage{ aggregateRanks( glist, rmat = rankMatrix(glist, N, full = full), N = NA, method = "RRA", full = FALSE, exact = FALSE, topCutoff = NA ) } \arguments{ \item{glist}{list of element vectors, the order of the vectors is used as the ranking.} \item{rmat}{the rankings in matrix format. The glist is by default converted to this format.} \item{N}{the number of ranked elements, important when using only top-k ranks, by default it is calculated as the number of unique elements in the input.} \item{method}{rank aggregation method, by default \code{'RRA'}, other options are \code{'min'}, \code{'geom.mean'}, \code{'mean'}, \code{'median'} and \code{'stuart'}} \item{full}{indicates if the full rankings are given, used if the the sets of ranked elements do not match perfectly} \item{exact}{indicator showing if exact p-value will be calculated based on rho score (Default: if number of lists smaller than 10, exact is used)} \item{topCutoff}{a vector of cutoff values used to limit the number of elements in the input lists elements do not match perfectly} } \value{ Returns a two column dataframe with the element names and associated scores or p-values. } \description{ Method implementing various gene list aggregation methods, most notably Robust Rank Aggregation. } \details{ All the methods implemented in this function make an assumtion that the number of ranked items is known. This assumption is satisfied for example in the case of gene lists (number of all genes known to certain extent), but not when aggregating results from google searches (there are too many web pages). This parameter N can be set manually and has strong influence on the end result. The p-values from RRA algorithm can be trusted only if N is close to the real value. The rankings can be either full or partial. Tests with the RRA algorithm show that one does not lose too much information if only top-k rankings are used. The missing values are assumed to be equal to maximal value and that way taken into account appropriately. The function can handle also the case when elements of the different rankings do not overlap perfectly. For example if we combine results from different microarray platforms with varying coverage. In this case these structurally missing values are substituted with NA-s and handled differently than omitted parts of the rankings. The function accepts as an input either list of rankings or rank matrix based on them. It converts the list to rank matrix automatically using the function \code{\link{rankMatrix}}. For most cases the ranking list is more convenient. Only in complicated cases, for example with top-k lists and structural missing values one would like to construct the rank matrix manually. When the number of top elements included into input is specified in advance, for example some lists are limited to 100 elements, and the lengths of these lists differ significantly, we can use more sensitive and accurate algorithm for the score calculation. Then one has to specify in the input also the parameter topCutoff, which is a vector defining an cutoff value for each input list. For example if we have three lists of 1000 elements but first is limited to 100, second 200 and third to 900 elements, then the topCutoff parameter should be c(0.1, 0.2, 0.9). } \examples{ # Make sample input data glist <- list(sample(letters, 4), sample(letters, 10), sample(letters, 12)) # Aggregate the inputs aggregateRanks(glist = glist, N = length(letters)) aggregateRanks(glist = glist, N = length(letters), method = "stuart") # Since we know the cutoffs for the lists in advance (4, 10, 12) we can use # the more accurate algorithm with parameter topCutoff # Use the rank matrix instead of the gene lists as the input r = rankMatrix(glist) aggregateRanks(rmat = r) # Example, when the input lists represent full rankings but the domains do not match glist <- list(sample(letters[4:24]), sample(letters[2:22]), sample(letters[1:20])) r = rankMatrix(glist, full = TRUE) head(r) aggregateRanks(rmat = r, method = "RRA") # Dataset representing significantly changed genes after knockouts # of cell cycle specific trancription factors data(cellCycleKO) r = rankMatrix(cellCycleKO$gl, N = cellCycleKO$N) ar = aggregateRanks(rmat = r) head(ar) } \references{ Raivo Kolde, Sven Laur, Priit Adler, Jaak Vilo, Robust rank aggregation for gene list integration and meta-analysis, Bioinformatics, 2012,, https://doi.org/10.1093/bioinformatics/btr709 } \author{ Raivo Kolde } RobustRankAggreg/man/rhoScores.Rd0000644000176200001440000000214214307617565016507 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/aggregateRanks.R \name{rhoScores} \alias{rhoScores} \title{Calculate rho scores} \usage{ rhoScores(r, topCutoff = NA, exact = FALSE) } \arguments{ \item{r}{vector of values in [0, 1]} \item{topCutoff}{a vector of cutoff values used to limit the number of elements in the input lists} \item{exact}{indicator if exact p-values should be calculated (Warning: it is computationally unstable and does to give considerable gain)} } \value{ A rho score for the normalized rank vector. } \description{ Calculate Rho score for normalized rank vector } \details{ Takes in a vector with values in [0, 1]. Applies \code{\link{betaScores}} to the vector, takes the minimum of the beta scores and converts it to a valid p-value. } \examples{ rhoScores(c(runif(15))) rhoScores(c(runif(10), rbeta(5, 1, 50))) } \references{ Raivo Kolde, Sven Laur, Priit Adler, Jaak Vilo, Robust rank aggregation for gene list integration and meta-analysis, Bioinformatics, 2012,, https://doi.org/10.1093/bioinformatics/btr709 } \author{ Raivo Kolde } RobustRankAggreg/man/rankMatrix.Rd0000644000176200001440000000266014250171256016652 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/aggregateRanks.R \name{rankMatrix} \alias{rankMatrix} \title{Create rank matrix} \usage{ rankMatrix(glist, N = NA, full = FALSE) } \arguments{ \item{glist}{list of preference lists} \item{N}{number of all rankable elements} \item{full}{logical showing if the given rankings are complete} } \value{ A matrix, with as many columns as input rankings and rows as unique elements in all the rankings combined. } \description{ Convert a set of ranked lists into a rank matrix } \details{ The lists are converted to a format that is used by aggregateRanks. If partial rankings are given to the function, all the missing values are substituted by the maximum rank N, which can be specified manually. This parameter has a very strong influence on the performance of RRA algorithm, therefore it should be reasonably accurate. If the N is different for the gene lists, it can be also given as a vector. Parameter full is used, when full rankings are given, but the sets of ranked elements do not match perfectly. Then the structurally missing values are substituted with NA-s. } \examples{ # Make sample input data glist <- list(sample(letters, 4), sample(letters, 10), sample(letters, 12)) r = rankMatrix(glist) r = rankMatrix(glist, full = TRUE) # Use real data data(cellCycleKO) r = rankMatrix(cellCycleKO$gl, N = cellCycleKO$N) } \author{ Raivo Kolde \email{rkolde@gmail.com} } RobustRankAggreg/DESCRIPTION0000755000176200001440000000230514316541412015173 0ustar liggesusersPackage: RobustRankAggreg Type: Package Title: Methods for Robust Rank Aggregation Version: 1.2.1 Date: 2022-09-12 Authors@R: c( person("Raivo", "Kolde", role = c("aut", "cre"), email = "rkolde@gmail.com", comment = c(ORCID = "0000-0003-2886-6298")), person("Sven", "Laur", role = "ctb", comment = c(ORCID = "0000-0002-9891-3347")) ) Maintainer: Raivo Kolde Description: Methods for aggregating ranked lists, especially lists of genes. It implements the Robust Rank Aggregation Kolde et al (2012) and some other simple algorithms for the task. RRA method uses a probabilistic model for aggregation that is robust to noise and also facilitates the calculation of significance probabilities for all the elements in the final ranking. License: GPL-2 LazyLoad: yes Depends: methods Collate: 'aggregateRanks.R' RoxygenNote: 7.2.1 Encoding: UTF-8 NeedsCompilation: no Packaged: 2022-09-12 12:15:36 UTC; raivokolde Author: Raivo Kolde [aut, cre] (), Sven Laur [ctb] () Repository: CRAN Date/Publication: 2022-10-03 11:10:02 UTC RobustRankAggreg/R/0000755000176200001440000000000013035166765013676 5ustar liggesusersRobustRankAggreg/R/aggregateRanks.R0000755000176200001440000003470214307620616016750 0ustar liggesusers #' Create rank matrix #' #' Convert a set of ranked lists into a rank matrix #' #' The lists are converted to a format that is used by aggregateRanks. If partial #' rankings are given to the function, all the missing values are substituted by the #' maximum rank N, which can be specified manually. This parameter has a very strong #' influence on the performance of RRA algorithm, therefore it should be reasonably #' accurate. If the N is different for the gene lists, it can be also given as a vector. #' #' Parameter full is used, when full rankings are given, but the sets of ranked elements #' do not match perfectly. Then the structurally missing values are substituted with #' NA-s. #' #' @param glist list of preference lists #' @param N number of all rankable elements #' @param full logical showing if the given rankings are complete #' @return A matrix, with as many columns as input rankings and rows as unique elements #' in all the rankings combined. #' @author Raivo Kolde \email{rkolde@@gmail.com} #' @examples #' # Make sample input data #' glist <- list(sample(letters, 4), sample(letters, 10), sample(letters, 12)) #' #' r = rankMatrix(glist) #' r = rankMatrix(glist, full = TRUE) #' #' # Use real data #' data(cellCycleKO) #' r = rankMatrix(cellCycleKO$gl, N = cellCycleKO$N) #' #' @export rankMatrix <- function(glist, N = NA, full = FALSE){ u = unique(c(glist, recursive = TRUE)) if(all(is.na(N))){ N = length(u) } if(!full){ rmat = matrix(1, nrow = length(u), ncol = length(glist), dimnames = list(u, names(glist))) if(length(N) == 1){ N = rep(N, ncol(rmat)) } } else{ rmat = matrix(NA, nrow = length(u), ncol = length(glist), dimnames = list(u, names(glist))) N = unlist(lapply(glist, length)) } for(i in 1:length(glist)){ rmat[glist[[i]], i] = (1:length(glist[[i]])) / N[i] } return(rmat) } # Output function formatOutput <- function(scores, score.names, ordering = "ascending"){ res = data.frame(Name = score.names, Score = scores) if(ordering == "ascending"){ res = res[order(res$Score), ] } else{ res = res[order(res$Score, decreasing = TRUE), ] } return(res) } # Stuart-Aerts method helper functions sumStuart <- function(v, r){ k = length(v) l_k = 1:k ones = (-1)**(l_k + 1) f = factorial(l_k) p = r ** l_k return(ones %*% (rev(v) * p / f)) } qStuart <- function(r){ N = sum(!is.na(r)) v = rep(1, N + 1) for(k in 1:N){ v[k + 1] = sumStuart(v[1:k], r[N - k + 1]) } return(factorial(N) * v[N + 1]) } stuart <- function(rmat){ rmat <- t(apply(rmat, 1, sort, na.last = TRUE)) return(apply(rmat, 1, qStuart)) } # RRA helper functions #' Calculate beta scores #' #' Calculate the beta scores for normalized rank vector. #' #' Takes in a vector with values in [0, 1]. It sorts the values to get the order #' statistics and calculates p-values for each of the order statistics. These are based #' on their expected distribution under the null hypothesis of uniform distribution. #' #' In RRA algorithm context the inputs are supposed to be normalized ranks. However, #' p-values in general follow the uniform distribution, therefore it can be used with any #' kind of p-value vectors, to see if there are more small values than expected. #' #' The NA values are removed before calculation and all results take into account only #' existing values. #' #' @param r vector of values in [0, 1] #' @return The functions returns a vector of p-values, that correspond to the sorted #' input vector. The NA-s are pushed to the end. #' #' @references Raivo Kolde, Sven Laur, Priit Adler, Jaak Vilo, Robust rank aggregation for gene list integration and meta-analysis, Bioinformatics, 2012,, https://doi.org/10.1093/bioinformatics/btr709 #' #' @author Raivo Kolde #' @examples #' betaScores(c(runif(15))) #' betaScores(c(runif(10), rbeta(5, 1, 50))) #' #' @export betaScores <- function(r){ n <- sum(!is.na(r)) p <- rep(1, n) r <- sort(r, na.last = TRUE) p <- pbeta(r, 1:n, n - 1:n + 1) return(p) } thresholdBetaScore <- function(r, k = seq_along(r), n = length(r), sigma = rep(1,n)){ if(length(sigma) != n) stop("The length of sigma does not match n") if(length(r) != n) stop("The length of pvalues does not match n") if(min(sigma)< 0 || max(sigma) > 1) stop("Elements of sigma are not in the range [0,1]") if(any(!is.na(r) & r > sigma)) stop("Elements of r must be smaller than elements of sigma") x <- sort(r[!is.na(r)]) sigma <- sort(sigma, decreasing = TRUE) beta <- rep(NA, length(k)) for(i in seq_along(k)) { if(k[i] > n) { beta[i] <- 0 next; } if(k[i] > length(x)) { beta[i] <- 1 next; } if(sigma[n] >= x[k[i]]) { beta[i] <- pbeta(x[k[i]], k[i], n + 1 - k[i]) next; } # Non-trivial cases # Find the last element such that sigma[n0] <= x[k[i]] n0 <- which(sigma < x[k[i]])[1] - 1 # Compute beta score vector beta(n,k) for n = n0 and k = 1..k[i] if(n0 == 0) { B <- c(1, rep(0, k[i])) } else if(k[i] > n0) { B <- c(1, pbeta(x[k[i]], 1 : n0, n0 : 1), rep(0, k[i] - n0)) } else { B <- c(1, pbeta(x[k[i]], 1 : k[i], n0 + 1 - c(1 : k[i]))) } # In the following update steps sigma < x[k[i]] z <- sigma[(n0 + 1) : n] for(j in seq_len(n - n0)) { B[2 : (k[i] + 1)] <- (1 -z[j]) * B[2 : (k[i] + 1)] + z[j] * B[1 : k[i]] } beta[i] <- B[k[i]+1] } names(beta) <- k return(beta) } correctBetaPvalues <- function(p, k){ p <- min(p * k, 1) return(p) } correctBetaPvaluesExact <- function(p, k){ rm = 1 - t(sapply(p, qbeta, 1:k, k - 1:k + 1)) res = 1 - stuart(rm) return(res) } #' Calculate rho scores #' #' Calculate Rho score for normalized rank vector #' #' Takes in a vector with values in [0, 1]. Applies \code{\link{betaScores}} to the vector, takes the minimum of the beta scores and converts it to a valid p-value. #' #' @param r vector of values in [0, 1] #' @param topCutoff a vector of cutoff values used to limit the number of elements in the #' input lists #' @param exact indicator if exact p-values should be calculated (Warning: it is computationally unstable and does to give considerable gain) #' @return A rho score for the normalized rank vector. #' @references Raivo Kolde, Sven Laur, Priit Adler, Jaak Vilo, Robust rank aggregation for gene list integration and meta-analysis, Bioinformatics, 2012,, https://doi.org/10.1093/bioinformatics/btr709 #' @author Raivo Kolde #' @examples #' rhoScores(c(runif(15))) #' rhoScores(c(runif(10), rbeta(5, 1, 50))) #' #' @export rhoScores <- function(r, topCutoff = NA, exact = FALSE){ if(is.na(topCutoff[1])){ x <- betaScores(r) } else{ r <- r[!is.na(r)] r[r == 1] <- NA x <- thresholdBetaScore(r, sigma = topCutoff) } if(exact){ rho <- correctBetaPvaluesExact(min(x, na.rm = TRUE), k = sum(!is.na(x))) } else{ rho <- correctBetaPvalues(min(x, na.rm = TRUE), k = sum(!is.na(x))) } return(rho) } # The dynamic algorithm for more accurate BetaScore calculation #' Aggregate ranked lists #' #' Method implementing various gene list aggregation methods, most notably Robust Rank #' Aggregation. #' #' All the methods implemented in this function make an assumtion that the number of #' ranked items is known. This assumption is satisfied for example in the case of #' gene lists (number of all genes known to certain extent), but not when aggregating #' results from google searches (there are too many web pages). This parameter N can be #' set manually and has strong influence on the end result. The p-values from RRA #' algorithm can be trusted only if N is close to the real value. #' #' The rankings can be either full or partial. Tests with the RRA algorithm show that one #' does not lose too much information if only top-k rankings are used. The missing values #' are assumed to be equal to maximal value and that way taken into account #' appropriately. #' #' The function can handle also the case when elements of the different rankings do not #' overlap perfectly. For example if we combine results from different microarray #' platforms with varying coverage. In this case these structurally missing values are #' substituted with NA-s and handled differently than omitted parts of the rankings. #' The function accepts as an input either list of rankings or rank matrix based on them. #' It converts the list to rank matrix automatically using the function #' \code{\link{rankMatrix}}. For most cases the ranking list is more convenient. Only #' in complicated cases, for example with top-k lists and structural missing values one #' would like to construct the rank matrix manually. #' #' When the number of top elements included into input is specified in advance, for #' example some lists are limited to 100 elements, and the lengths of these lists differ #' significantly, we can use more sensitive and accurate algorithm for the score #' calculation. Then one has to specify in the input also the parameter topCutoff, which #' is a vector defining an cutoff value for each input list. For example if we have three #' lists of 1000 elements but first is limited to 100, second 200 and third to 900 #' elements, then the topCutoff parameter should be c(0.1, 0.2, 0.9). #' #' @param glist list of element vectors, the order of the vectors is used as the ranking. #' @param rmat the rankings in matrix format. The glist is by default converted to this #' format. #' @param N the number of ranked elements, important when using only top-k ranks, by #' default it is calculated as the number of unique elements in the input. #' @param method rank aggregation method, by default \code{'RRA'}, other options are #' \code{'min'}, \code{'geom.mean'}, \code{'mean'}, \code{'median'} and \code{'stuart'} #' @param full indicates if the full rankings are given, used if the the sets of ranked #' elements do not match perfectly #' @param exact indicator showing if exact p-value will be calculated based on rho score (Default: if number of lists smaller than 10, exact is used) #' @param topCutoff a vector of cutoff values used to limit the number of elements in the #' input lists #' elements do not match perfectly #' @return Returns a two column dataframe with the element names and associated scores #' or p-values. #' @references Raivo Kolde, Sven Laur, Priit Adler, Jaak Vilo, Robust rank aggregation for gene list integration and meta-analysis, Bioinformatics, 2012,, https://doi.org/10.1093/bioinformatics/btr709 #' @author Raivo Kolde #' @examples #' # Make sample input data #' glist <- list(sample(letters, 4), sample(letters, 10), sample(letters, 12)) #' #' # Aggregate the inputs #' aggregateRanks(glist = glist, N = length(letters)) #' aggregateRanks(glist = glist, N = length(letters), method = "stuart") #' #' # Since we know the cutoffs for the lists in advance (4, 10, 12) we can use #' # the more accurate algorithm with parameter topCutoff #' #' # Use the rank matrix instead of the gene lists as the input #' r = rankMatrix(glist) #' #' aggregateRanks(rmat = r) #' #' # Example, when the input lists represent full rankings but the domains do not match #' glist <- list(sample(letters[4:24]), sample(letters[2:22]), sample(letters[1:20])) #' r = rankMatrix(glist, full = TRUE) #' head(r) #' #' aggregateRanks(rmat = r, method = "RRA") #' #' # Dataset representing significantly changed genes after knockouts #' # of cell cycle specific trancription factors #' data(cellCycleKO) #' r = rankMatrix(cellCycleKO$gl, N = cellCycleKO$N) #' ar = aggregateRanks(rmat = r) #' head(ar) #' #' @aliases RobustRankAggreg #' #' @export aggregateRanks <- function(glist, rmat = rankMatrix(glist, N, full = full), N = NA, method = "RRA", full = FALSE, exact = FALSE, topCutoff = NA){ if(!(method %in% c("mean", "min", "median", "geom.mean", "RRA", "stuart"))){ stop("method should be one of: 'min', 'geom.mean', 'mean', 'median', 'stuart' or 'RRA' ") } if(is.na(N)){ N <- nrow(rmat) } if(is.null(rownames(rmat))){ rownames(rmat) <- 1:nrow(rmat) } if(method == "min"){ a <- apply(rmat, 1, min, na.rm = TRUE) return(formatOutput(scores = a, score.names = names(a), ordering = "ascending")) } if(method == "median"){ a <- apply(rmat, 1, median, na.rm = TRUE) return(formatOutput(scores = a, score.names = names(a), ordering = "ascending")) } if(method == "geom.mean"){ a <- apply(rmat, 1, function(x) exp(mean(log(x), na.rm = TRUE))) return(formatOutput(scores = a, score.names = names(a), ordering = "ascending")) } if(method == "RRA"){ a = apply(rmat, 1, rhoScores, topCutoff = topCutoff, exact = exact) names(a) <- rownames(rmat) return(formatOutput(scores = a, score.names = names(a), ordering = "ascending")) } if(method == "mean"){ a <- apply(rmat, 1, mean, na.rm = TRUE) n <- apply(rmat, 1, function(x) sum(!is.na(x))) b <- pnorm(a, 0.5, sqrt(1/12/n)) return(formatOutput(scores = b, score.names = names(a), ordering = "ascending")) } if(method == "stuart"){ a <- stuart(rmat) return(formatOutput(scores = a, score.names = names(a), ordering = "ascending")) } } # # require(foreach) # space = paste("X", 1:10000, sep = "") # gl = foreach(i = 1:10) %do% {sample(space)[1:1000]} # ar = aggregateRanks(gl, exact = TRUE, N = 10000) # ar2 = aggregateRanks(gl, exact = FALSE, N = 10000) # quartz(); hist(ar[, 2]) #' A dataset based on Reimand \emph{et al} and Hu \emph{et al}. #' #' The dataset contains lists #' yeast genes that were most influenced by 12 cell cycle related transcription factor #' knockouts. #' The dataset is a list with 3 slots #' \enumerate{ #' \item \code{gl} - set of gene lists in a format suitable for #' \code{\link{aggregateRanks}}; #' \item \code{N} - number of yeast genes; #' \item \code{ref} - reference list of cell cycle related genes taken from de #' Lichtenberg \emph{et al}. #' } #' #' @name cellCycleKO #' @docType data #' @author Raivo Kolde #' @references Reimand, J., Vaquerizas, J. M., Todd, A. E., Vilo, J., and Luscombe, N. M. #' (2010). "Comprehensive reanalysis of transcription factor knockout expression data #' in saccharomyces cerevisiae reveals many new targets. Nucleic Acids Res." #' #' Hu, Z., Killion, P. J., and Iyer, V. R. (2007). "Genetic reconstruction of #' a functional transcriptional regulatory network." Nat. Genet., 39(5), 683-7 #' #' de Lichtenberg, U., Jensen, L. J., Fausboll, A., Jensen, T. S., Bork, P., #' and Brunak, S. (2005). "Comparison of computational methods for the identification of #' cell cycle- regulated genes. Bioinformatics, 21(7), 1164-71." #' @keywords data NULL #' @importFrom stats median pbeta pnorm qbeta #' @import methods NULL RobustRankAggreg/MD50000644000176200001440000000072014316541412013771 0ustar liggesusers72376ea96fe00c73a8a07451758c085b *DESCRIPTION 480cec37c71535e272f908bc96290035 *NAMESPACE f029ace52579b8ece4e592e2595677f2 *R/aggregateRanks.R aeeb965cf7a7d5061a87222218dbb1ba *data/cellCycleKO.RData 77abeac2cebf24111009feb0361daad4 *man/aggregateRanks.Rd 7dedc1933a63d69aa5029b7791d45f5b *man/betaScores.Rd 740194a876b3c54c409bb88917e98bf6 *man/cellCycleKO.Rd aef4732ce0686efbd71b8abd949bb192 *man/rankMatrix.Rd 9a99171eac075be8b17955ad7cb7cd5a *man/rhoScores.Rd