DESeq2/DESCRIPTION

Package: DESeq2
Type: Package
Title: Differential gene expression analysis based on the negative
    binomial distribution
Version: 1.18.1
Author: Michael Love, Simon Anders, Wolfgang Huber
Maintainer: Michael Love
Description: Estimate variance-mean dependence in count data from
    high-throughput sequencing assays and test for differential
    expression based on a model using the negative binomial
    distribution.
License: LGPL (>= 3)
VignetteBuilder: knitr
Imports: BiocGenerics (>= 0.7.5), Biobase, BiocParallel, genefilter,
    methods, locfit, geneplotter, ggplot2, Hmisc, Rcpp (>= 0.11.0)
Depends: S4Vectors (>= 0.9.25), IRanges, GenomicRanges,
    SummarizedExperiment (>= 1.1.6)
Suggests: testthat, knitr, BiocStyle, vsn, pheatmap, RColorBrewer, IHW,
    apeglm, ashr, tximport, tximportData, readr, pbapply, airway,
    pasilla (>= 0.2.10)
LinkingTo: Rcpp, RcppArmadillo
URL: https://github.com/mikelove/DESeq2
biocViews: Sequencing, ChIPSeq, RNASeq, SAGE, DifferentialExpression,
    GeneExpression, Transcription
RoxygenNote: 6.0.1
NeedsCompilation: yes
Packaged: 2017-11-12 00:26:42 UTC; biocbuild

DESeq2/NAMESPACE

# Generated by roxygen2: do not edit by hand

S3method(coef,DESeqDataSet)
export("dispersionFunction<-")
export("dispersions<-")
export("normalizationFactors<-")
export("priorInfo<-")
export(DESeq)
export(DESeqDataSet)
export(DESeqDataSetFromHTSeqCount)
export(DESeqDataSetFromMatrix)
export(DESeqDataSetFromTximport)
export(DESeqResults)
export(DESeqTransform)
export(collapseReplicates)
export(dispersionFunction)
export(dispersions)
export(estimateBetaPriorVar)
export(estimateDispersionsFit)
export(estimateDispersionsGeneEst)
export(estimateDispersionsMAP)
export(estimateDispersionsPriorVar)
export(estimateMLEForBetaPriorVar)
export(estimateSizeFactorsForMatrix)
export(fpkm)
export(fpm)
export(getVarianceStabilizedData)
export(lfcShrink)
export(makeExampleDESeqDataSet)
export(nbinomLRT)
export(nbinomWaldTest)
export(normTransform)
export(normalizationFactors)
export(normalizeGeneLength)
export(plotCounts)
export(plotSparsity)
export(priorInfo)
export(removeResults)
export(replaceOutliers)
export(replaceOutliersWithTrimmedMean)
export(results)
export(resultsNames)
export(rlog)
export(rlogTransformation)
export(summary.DESeqResults)
export(unmix)
export(varianceStabilizingTransformation)
export(vst)
exportClasses(DESeqDataSet)
exportClasses(DESeqResults)
exportClasses(DESeqTransform)
exportMethods("counts<-")
exportMethods("design<-")
exportMethods("dispersionFunction<-")
exportMethods("dispersions<-")
exportMethods("normalizationFactors<-")
exportMethods("priorInfo<-")
exportMethods("sizeFactors<-")
exportMethods(counts)
exportMethods(design)
exportMethods(dispersionFunction)
exportMethods(dispersions)
exportMethods(estimateDispersions)
exportMethods(estimateSizeFactors)
exportMethods(normalizationFactors)
exportMethods(plotDispEsts)
exportMethods(plotMA)
exportMethods(plotPCA)
exportMethods(priorInfo)
exportMethods(show)
exportMethods(sizeFactors)
import(Biobase)
import(BiocGenerics)
import(BiocParallel)
import(GenomicRanges)
import(IRanges)
import(Rcpp)
import(S4Vectors)
import(SummarizedExperiment)
import(methods)
importFrom(Hmisc,wtd.quantile)
importFrom(genefilter,filtered_p)
importFrom(genefilter,rowVars)
importFrom(geneplotter,plotMA)
importFrom(ggplot2,aes_string)
importFrom(ggplot2,coord_fixed)
importFrom(ggplot2,geom_point)
importFrom(ggplot2,ggplot)
importFrom(ggplot2,xlab)
importFrom(ggplot2,ylab)
importFrom(graphics,axis)
importFrom(graphics,hist)
importFrom(graphics,plot)
importFrom(graphics,points)
importFrom(locfit,locfit)
importFrom(stats,Gamma)
importFrom(stats,as.formula)
importFrom(stats,coef)
importFrom(stats,coefficients)
importFrom(stats,df)
importFrom(stats,dnbinom)
importFrom(stats,dnorm)
importFrom(stats,formula)
importFrom(stats,glm)
importFrom(stats,loess)
importFrom(stats,lowess)
importFrom(stats,model.matrix)
importFrom(stats,optim)
importFrom(stats,p.adjust)
importFrom(stats,pchisq)
importFrom(stats,pnorm)
importFrom(stats,prcomp)
importFrom(stats,predict)
importFrom(stats,pt)
importFrom(stats,qf)
importFrom(stats,qnorm)
importFrom(stats,rchisq)
importFrom(stats,relevel)
importFrom(stats,rnbinom)
importFrom(stats,rnorm)
importFrom(stats,runif)
importFrom(stats,splinefun)
importFrom(stats,terms)
importFrom(stats,terms.formula)
importFrom(utils,packageVersion)
importFrom(utils,read.table)
useDynLib(DESeq2)

DESeq2/NEWS

CHANGES IN VERSION 1.18.0
-------------------------

o lfcShrink() offers alternative estimators type="apeglm" and type="ashr",
  making use of shrinkage estimators in the 'apeglm' and 'ashr' packages,
  respectively. See ?lfcShrink for more details and appropriate references
  (a short usage sketch follows below). The integration of these alternative
  shrinkage estimators is still in development. Additionally, the
  DESeqResults object gains priorInfo(res), which passes along details of
  the fitted prior on LFC.

o Factor levels using characters other than letters, numbers, '_' and '.'
  will print a message (not a warning or error) that it is recommended to
  restrict to these "safe characters". This follows a suggestion from the
  Bioconductor support site to avoid user errors.

CHANGES IN VERSION 1.16.0
-------------------------

o For DESeq() and nbinomWaldTest(), the default setting will be
  betaPrior=FALSE, and the recommended pipeline will be to use lfcShrink()
  for producing shrunken LFC.

o Added a new function unmix(), for unmixing samples according to a linear
  combination of pure components, e.g. "tissue deconvolution".

o Added a new size factor estimator, "poscounts", which evolved out of use
  cases in Paul McMurdie's phyloseq package.

o Ability to specify observation-specific weights, using
  assays(dds)[["weights"]]. These weights are picked up by dispersion and
  NB GLM fitting functions.

CHANGES IN VERSION 1.15.40
--------------------------

o Adding a new function unmix(), for unmixing samples according to pure
  components, e.g. "tissue deconvolution". The pure components are added
  on the gene expression scale (either normalized counts or TPMs), and the
  loss is calculated in a variance stabilized space.

CHANGES IN VERSION 1.15.39
--------------------------

o Added a new size factor estimator, "poscounts", which evolved out of use
  cases in Paul McMurdie's phyloseq package.

CHANGES IN VERSION 1.15.36
--------------------------

o Ability to specify observation-specific weights, using
  assays(dds)[["weights"]]. These weights are picked up by dispersion and
  NB GLM fitting functions.

CHANGES IN VERSION 1.15.28
--------------------------

o Removed some code that would "zero out" LFCs when both groups involved
  in a contrast had zero counts. This led to inconsistency when similar
  contrasts were performed by refactoring.
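A minimal sketch of the lfcShrink() estimators described in the 1.18.0
entry above (assuming a DESeqDataSet 'dds' already processed by DESeq();
the coefficient number is illustrative and should be chosen from
resultsNames(dds)):

    # default normal prior, and the two alternative shrinkage estimators
    resNorm <- lfcShrink(dds, coef=2, type="normal")
    resApe  <- lfcShrink(dds, coef=2, type="apeglm")  # requires the 'apeglm' package
    resAsh  <- lfcShrink(dds, coef=2, type="ashr")    # requires the 'ashr' package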
CHANGES IN VERSION 1.15.12
--------------------------

o For DESeq() and nbinomWaldTest(), the default setting will be
  betaPrior=FALSE, and the recommended pipeline will be to use lfcShrink()
  for producing shrunken log2 fold changes for visualization and ranking.
  An explanation for the change is presented in the vignette section:
  "Methods changes since the 2014 DESeq2 paper"

CHANGES IN VERSION 1.15.9
-------------------------

o Adding prototype function lfcShrink().

o Vignette conversion to Rmarkdown / HTML.

CHANGES IN VERSION 1.15.3
-------------------------

o Removing the betaPrior option for nbinomLRT, in an effort to clean up
  and reduce old unused functionality.

CHANGES IN VERSION 1.13.8
-------------------------

o Use a linear model to estimate the expected counts for dispersion
  estimation in estDispGeneEst() if the number of groups in the model
  matrix is equal to the number of columns of the model matrix. Should
  provide a speed-up for dispersion estimation for model matrices with
  many samples.

CHANGES IN VERSION 1.13.3
-------------------------

o Fixed bug: fpm() and fpkm() for tximport.

o Fixed bug: normalization factors and VST.

o Added an error if tximport lengths contain 0.

o Added an error if user matrices are not full rank.

o More helpful error for a constant factor in the design.

CHANGES IN VERSION 1.12.0
-------------------------

o Added DESeqDataSetFromTximport() to import counts using tximport.

o Added vst(), a fast wrapper for the VST.

o Added support for IHW p-value adjustment.

CHANGES IN VERSION 1.11.42
--------------------------

o Update summary() to be IHW-results-aware.

o Small change to fitted mu values to improve fit stability when counts
  are very low. Inference for high count genes is not affected.

o Galaxy script inst/script/deseq2.R moves to the Galaxy repo.

CHANGES IN VERSION 1.11.33
--------------------------

o Changed the 'filterFun' API to accommodate IHW: independent hypothesis
  weighting in results(), see the vignette for example code. Thanks to
  Nikolaos Ignatiadis, maintainer of the IHW package.

CHANGES IN VERSION 1.11.18
--------------------------

o Added a function vst(), which is a fast wrapper for
  varianceStabilizingTransformation(). The speed-up is accomplished by
  subsetting to a smaller number of genes for the estimation of the
  dispersion trend.

CHANGES IN VERSION 1.11.5
-------------------------

o Adding in functionality to import estimated counts and average
  transcript length offsets from tximport, using DESeqDataSetFromTximport().

CHANGES IN VERSION 1.10.0
-------------------------

o Added MLE argument to plotMA().

o Added normTransform() for a simple log2(K/s + 1) transformation, where
  K is a count and s is a size factor.

o When the design contains an interaction, DESeq() will use
  betaPrior=FALSE. This makes coefficients easier to interpret.

o Independent filtering will be less greedy, using as a threshold the
  lowest quantile of the filter such that the number of rejections is
  within 1 SD from the maximum. See ?results.

o summary() and plotMA() will use 'alpha' from results().

CHANGES IN VERSION 1.9.42
-------------------------

o New function 'normTransform', for making DESeqTransform objects from
  normalized counts plus a pseudocount (default 1) then applying a
  transformation (default log2).

o Added MLE argument to plotMA(); if results() was run with addMLE=TRUE,
  this allows for comparison of shrunken and unshrunken estimates of fold
  change.

o summary() and plotMA() use the 'alpha' which was specified in results()
  rather than defaulting to 0.1.
o Removed rlog's fast option, and instead recommending the VST for very
  large matrices of counts (100s of samples).

CHANGES IN VERSION 1.9.17
-------------------------

o Independent filtering: results() no longer uses the maximum of the
  number of rejections as calculated by the filter_p() function from the
  genefilter package. Small numbers of rejections at a high quantile of
  the filter threshold could result in a high filter threshold. Instead,
  the results() function will now use the lowest quantile of the filter
  for which the number of rejections is close to the peak of a lowess
  curve fit through the number of rejections over the filter quantiles.
  'Close to' is defined as within 1 residual standard deviation.

CHANGES IN VERSION 1.9.16
-------------------------

o When the design formula contains interaction terms, the DESeq() function
  will by default not use a beta prior (betaPrior=FALSE). The previous
  implementation of a log fold change prior for interaction terms returned
  accurate inference, but was confusing for users to interpret. New
  instructions on building results tables for designs with interactions
  will be included in the software vignette.

CHANGES IN VERSION 1.8.0
------------------------

o Added support for user-supplied model matrices to DESeq(),
  estimateDispersions() and nbinomWaldTest(). This helps when the model
  matrix needs to be edited by the user.

CHANGES IN VERSION 1.7.45
-------------------------

o Added a test in rlog for sparse data, mostly zero and some very large
  counts, which will give a warning and a suggestion for alternate
  transformations.

o Added plotSparsity(), which will help diagnose issues for using rlog:
  data which do not resemble negative binomial due to many genes with
  mostly zeros and a few very large counts.

CHANGES IN VERSION 1.7.43
-------------------------

o Added a 'replaced' argument to counts() and plotCounts() such that the
  assay in "replaceCounts" will be used if it exists.

o Raised the minimum dispersion value used in Cook's calculation, so that
  other counts in a group with an outlier won't get extreme Cook's
  distances themselves.

CHANGES IN VERSION 1.7.32
-------------------------

o Added logic to results() which will zero out the LFC and Wald statistic
  and set the p-value to 1, for 'contrast' argument results tables where
  the contrasted groups all have zero count. Non-zero LFCs were otherwise
  occurring due to large differences in the size factors.

CHANGES IN VERSION 1.7.11
-------------------------

o Added support for user-supplied model matrices to DESeq(),
  estimateDispersions() and nbinomWaldTest().

CHANGES IN VERSION 1.7.9
------------------------

o Added Genome Biology citation for the DESeq2 methods.

o Introduced type="iterate" for estimateSizeFactors, an alternative
  estimator for the size factors, which can be used even when all genes
  have a sample with a count of zero. See the man page for details.

CHANGES IN VERSION 1.7.3
------------------------

o Fixed two minor bugs: DESeq() with parallel=TRUE was dropping rows with
  all zero counts, instead of propagating NAs; for nbinomLRT() with
  matrices provided to 'full' and 'reduced' and a design of ~ 1, the
  matrices were being ignored.

CHANGES IN VERSION 1.6.0
------------------------

o DESeq() and results() get a 'parallel' argument.

o results() gets an 'addMLE' argument.

o results() gets a 'test' argument, for constructing Wald tests after
  DESeq() was run using the likelihood ratio test.

o results() argument 'format' for GRanges or GRangesList results.

o new plotCounts() function.
o Less outlier calling from Cook's distance for analyses with many samples
  and many conditions.

o More robust beta prior variance and log fold change shrinkage.

CHANGES IN VERSION 1.5.70
-------------------------

o Added 'parallel' also for results(), which can be slow if run with 100s
  of samples.

CHANGES IN VERSION 1.5.54
-------------------------

o Added a 'parallel' argument to DESeq() which splits up the analysis over
  genes for those steps which are easily done in parallel, leveraging
  BiocParallel's bplapply (a short sketch follows below).

CHANGES IN VERSION 1.5.50
-------------------------

o A matrix can be provided to rlog or to the VST and a matrix will be
  returned. A 'fitType' argument is also included, which is passed on to
  estimateDispersions in case dispersions have not been estimated.

CHANGES IN VERSION 1.5.49
-------------------------

o The fast=TRUE implementation of rlog is even faster, subsetting genes
  along the range of base mean to estimate the dispersion trend and for
  fitting the optimal amount of shrinkage.

CHANGES IN VERSION 1.5.40
-------------------------

o Further improved the code behind the robust estimation of variance for
  the Cook's cutoff, resulting in fewer outlier calls due to an individual
  condition with few samples and high variance.

CHANGES IN VERSION 1.5.28
-------------------------

o New results() argument 'addMLE' adds the unshrunken fold changes for
  simple contrasts or interaction terms to the results tables.

CHANGES IN VERSION 1.5.27
-------------------------

o Applied the beta prior variance calculation from v1.5.22 to the
  regularized logarithm.

o Added MLE coefficients as MLE_condition_B_vs_A columns to mcols(dds).

o Fixed the statistic which is returned when lfcThreshold is used.
  Previously, only the p-value and adjusted p-value were changed.

o plotPCA() with argument 'returnData' will return a data.frame which can
  be used for custom plotting.

CHANGES IN VERSION 1.5.25
-------------------------

o Improved the robust variance estimate used for calculating Cook's
  distances. The previous estimate could lead to outlier calls in datasets
  with many conditions, and when a single condition had large, highly
  variable counts for all its samples.

CHANGES IN VERSION 1.5.22
-------------------------

o Adding an alternate method for beta prior variance calculation in
  nbinomWaldTest. This helps to produce more robust prior variance
  estimates when many genes have small counts and highly variable MLE log
  fold changes.

CHANGES IN VERSION 1.5.15
-------------------------

o For the likelihood ratio test, expanded model matrices are not the
  default. Some improvements in fit time from the handling of genes with
  dispersions that do not converge using line search.

CHANGES IN VERSION 1.5.13
-------------------------

o Adding a 'test' argument to results(), which allows users to perform a
  Wald test after DESeq(dds, test="LRT") / nbinomLRT has been run.

CHANGES IN VERSION 1.5.11
-------------------------

o Swapping in ggplot2 for lattice for the plotPCA function.

CHANGES IN VERSION 1.5.9
------------------------

o Added a VST for fitType = mean. Allowed designs with ~ 0 and
  betaPrior = FALSE. Fixed some potential metadata column insertion bugs.

CHANGES IN VERSION 1.5.8
------------------------

o Suppress the glm.fit convergence warning from the parametric dispersion
  curve fitting procedure; instead use this for the iterative convergence
  test.

CHANGES IN VERSION 1.5.3
------------------------

o Speeding up and reducing copying for DESeqDataSet construction.
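A minimal sketch of the parallel execution described in the 1.5.70 and
1.5.54 entries above (the backend and worker count are illustrative):

    library(BiocParallel)
    register(MulticoreParam(4))  # use SnowParam() instead on Windows
    dds <- DESeq(dds, parallel=TRUE)
    res <- results(dds, parallel=TRUE)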
CHANGES IN VERSION 1.5.2
------------------------

o Added a 'format' argument to results, which will attach results to a
  GRangesList or GRanges if requested (default is DataFrame).

CHANGES IN VERSION 1.4.4
------------------------

o Fixed a hang which could occur in the GLM fitting procedure.

CHANGES IN VERSION 1.4.3
------------------------

o Fixed a simple bug when using normalizationFactors and running
  nbinomWaldTest; the error was "no method for coercing this S4 class to a
  vector".

CHANGES IN VERSION 1.4.2
------------------------

o Fixed bugs: estimating the beta prior for an interaction between a
  factor and a numeric variable; counts() not returning row names;
  construction of a DESeqDataSet giving the wrong error when there are
  empty factor levels (the empty levels are now dropped for the user).

CHANGES IN VERSION 1.4.1
------------------------

o Fixed a bug where DESeqDataSetFromHTSeqCount() imported the special
  rows, "_ambiguous", etc.

CHANGES IN VERSION 1.4.0
------------------------

o *** USAGE NOTE *** Expanded model matrices are now used when
  betaPrior = TRUE (the default). Therefore, level comparison results
  should be extracted using the 'contrast' argument to the results()
  function. Expanded model matrices produce shrinkage of log fold changes
  that is independent of the choice of base level. Expanded model matrices
  are not used in the case of designs with an interaction term between
  factors with only 2 levels.

o The order of the arguments 'name' and 'contrast' to the results()
  function is swapped, to indicate that 'contrast' should be used for the
  standard comparisons of levels against each other. Calling results()
  with no arguments will still produce the same comparison: the fold
  change of the last level of the last design variable over the first
  level of the last design variable. See ?results for more details.

o The DESeq() function will automatically replace count outliers flagged
  by Cook's distance when there are 7 or more replicates. The DESeq()
  argument 'minReplicatesForReplace' (default 7) is used to decide which
  samples are eligible for automatic replacement. This default behavior
  helps to prevent filtering genes based on Cook's distance when there are
  many degrees of freedom.

CHANGES IN VERSION 1.3.58
-------------------------

o Added a list() option to the 'contrast' argument of results(). See
  examples in ?results (and the short sketch below).

CHANGES IN VERSION 1.3.24
-------------------------

o rlogTransformation() gains an argument 'fast', which switches to an
  approximation of the rlog transformation. Speed-up is ~ 2x.

o A more robust estimator for the beta prior variance is used: instead of
  taking the mean of squared MLE betas, the prior variance is found by
  matching an upper quantile of the absolute value of MLE betas with an
  upper quantile of a zero-centered Normal distribution.

CHANGES IN VERSION 1.3.17
-------------------------

o It is possible to use a log2 fold change prior (beta prior) and obtain
  likelihood ratio test p-values, although the default for test="LRT" is
  still betaPrior=FALSE.
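A sketch of the 'contrast' forms noted in the 1.4.0 and 1.3.58 entries
above (the factor 'condition' with levels A and B is illustrative;
coefficient names for the list form should be taken from
resultsNames(dds)):

    # character vector form: factor name, numerator level, denominator level
    res <- results(dds, contrast=c("condition","B","A"))
    # list form: numerator and denominator coefficient names
    res <- results(dds, contrast=list("conditionB","conditionA"))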
CHANGES IN VERSION 1.3.15
-------------------------

o The DESeq() function will automatically replace count outliers flagged
  by Cook's distance when there are 7 or more replicates. The DESeq()
  argument 'minReplicatesForReplace' (default 7) is used to decide which
  samples are eligible for automatic replacement. This default behavior
  helps to prevent filtering genes based on Cook's distance when there are
  many degrees of freedom.

o The results() function produces an object of class 'DESeqResults' which
  is a simple subclass of 'DataFrame'. This class allows for methods to be
  written specifically for DESeq2 results. For example, plotMA() can be
  called on a 'DESeqResults' object.

CHANGES IN VERSION 1.3.12
-------------------------

o Added a check in nbinomWaldTest which ensures that priors on logarithmic
  fold changes are only estimated for interaction terms, in the case that
  interaction terms are present in the design formula.

CHANGES IN VERSION 1.3.6
------------------------

o Reduced the amount of filtering from the Cook's cutoff: the maximum no
  longer includes samples from experimental groups with only 2 samples,
  the default F quantile is raised to 0.99, and a robust estimate of
  dispersion is used to calculate Cook's distance instead of the fitted
  dispersion.

CHANGES IN VERSION 1.3.5
------------------------

o New arguments to results(), 'lfcThreshold' and 'altHypothesis', allow
  for tests of log fold changes which are above or below a given
  threshold.

o The plotMA() function now passes ellipses arguments to the results()
  function.

CHANGES IN VERSION 1.1.32
-------------------------

o By default, use QR decomposition on the design matrix X. This stabilizes
  the GLM fitting. Can be turned off with the useQR argument of
  nbinomWaldTest() and nbinomLRT().

o Allow for "frozen" normalization of new samples using previously
  estimated parameters for the functions: estimateSizeFactors(),
  varianceStabilizingTransformation(), and rlogTransformation(). See the
  manual pages for details and examples.

CHANGES IN VERSION 1.1.31
-------------------------

o The adjustment of p-values and use of Cook's distance for outlier
  detection is moved to the results() function instead of
  nbinomWaldTest(), nbinomLRT(), or DESeq(). This allows the user to
  change parameter settings without having to refit the model.

CHANGES IN VERSION 1.1.24
-------------------------

o The results() function allows the user to specify a contrast of
  coefficients, either using the names of the factor and levels, or using
  a numeric contrast vector. Contrasts are only available for the Wald
  test differential analysis.

CHANGES IN VERSION 1.1.23
-------------------------

o The results() function automatically performs independent filtering
  using the genefilter package and optimizing over the mean of normalized
  counts.

CHANGES IN VERSION 1.1.21
-------------------------

o The regularized log transformation uses the fitted dispersions instead
  of the MAP dispersions. This prevents large, true log fold changes from
  being moderated due to a large dispersion estimate blind to the design
  formula. This behavior is also more consistent with the variance
  stabilizing transformation.

CHANGES IN VERSION 1.0.10
-------------------------

o Outlier detection: Cook's distances are calculated for each sample per
  gene and the matrix is stored in the assays list. These values are used
  to determine genes in which a single sample disproportionately
  influences the fitted coefficients. These genes are flagged and the
  p-values set to NA. The argument 'cooksCutoff' of nbinomWaldTest() and
  nbinomLRT() can be used to control this functionality.
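A sketch of controlling the outlier handling described in the 1.0.10 entry
above; per the 1.1.31 entry, the cutoff is now set from results() (the
numeric value shown is illustrative):

    # disable Cook's distance filtering, or supply a custom cutoff
    res <- results(dds, cooksCutoff=FALSE)
    res <- results(dds, cooksCutoff=10)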
CHANGES IN VERSION 1.0.0
------------------------

o Base class: SummarizedExperiment is used as the superclass for storing
  the data.

o Workflow: The wrapper function DESeq() performs all steps for a
  differential expression analysis. Individual steps are still accessible.

o Statistics: Incorporation of prior distributions into the estimation of
  dispersions and fold changes (empirical-Bayes shrinkage). A Wald test
  for significance is provided as the default inference method, with the
  likelihood ratio test of the previous version also available.

o Normalization: it is possible to provide a matrix of sample- *and*
  gene-specific normalization factors.

DESeq2/R/AllClasses.R

#' @rdname DESeqDataSet
#' @export
setClass("DESeqDataSet",
         contains = "RangedSummarizedExperiment",
         representation = representation(
           design = "formula",
           dispersionFunction = "function"))

setValidity("DESeqDataSet", function(object) {
  if (! ("counts" %in% assayNames(object)) )
    return( "the assays slot must contain a matrix named 'counts'" )
  if ( !is.numeric( counts(object) ) )
    return( "the count data is not numeric" )
  if ( any( is.na( counts(object) ) ) )
    return( "NA values are not allowed in the count matrix" )
  if ( !is.integer( counts(object) ) )
    return( "the count data is not in integer mode" )
  if ( any( counts(object) < 0 ) )
    return( "the count data contains negative values" )
  design <- design(object)
  designVars <- all.vars(design)
  if (!all(designVars %in% names(colData(object)))) {
    return("all variables in design formula must be columns in colData")
  }
  designVarsClass <- sapply(designVars, function(v) class(colData(object)[[v]]))
  if (any(designVarsClass == "character")) {
    return("variables in design formula are character vectors.
  convert these columns of colData(object) to factors before including in the design formula")
  }
  designFactors <- designVars[designVarsClass == "factor"]
  # levels would duplicate after make.names()
  if (any(sapply(designFactors, function(v) {
    factor.lvls <- levels(colData(object)[[v]])
    factor.nms <- make.names(factor.lvls)
    any(duplicated(factor.nms))
  }))) {
    return("levels of factors in the design have non-unique level names after make.names() is applied.
  best to only use letters and numbers for levels of factors in the design")
  }
  # levels contain characters other than letters, numbers, and underscore
  if (any(sapply(designFactors, function(v) {
    factor.lvls <- levels(colData(object)[[v]])
    any(!grepl("^[A-Za-z0-9_.]+$", factor.lvls))
  }))) {
    # just a message for now
    message("  Note: levels of factors in the design contain characters other than
  letters, numbers, '_' and '.'. It is recommended (but not required) to use
  only letters, numbers, and delimiters '_' or '.', as these are safe characters
  for column names in R. [This is a message, not a warning or error]")
  }
  # else...
  TRUE
})

#' DESeqDataSet object and constructors
#'
#' \code{DESeqDataSet} is a subclass of \code{RangedSummarizedExperiment},
#' used to store the input values, intermediate calculations and results of an
#' analysis of differential expression. The \code{DESeqDataSet} class
#' enforces non-negative integer values in the "counts" matrix stored as
#' the first element in the assay list.
#' In addition, a formula which specifies the design of the experiment must be provided.
#' The constructor functions create a DESeqDataSet object
#' from various types of input:
#' a RangedSummarizedExperiment, a matrix, count files generated by
#' the python package HTSeq, or a list from the tximport function in the
#' tximport package.
#' See the vignette for examples of construction from different types.
#'
#' Note on the error message "assay colnames() must be NULL or equal colData rownames()":
#' this means that the colnames of countData are different than the rownames of colData.
#' Fix this with: \code{colnames(countData) <- NULL}
#'
#' @param se a \code{RangedSummarizedExperiment} with columns of variables
#' indicating sample information in \code{colData},
#' and the counts as the first element in the assays list, which will
#' be renamed "counts". A \code{RangedSummarizedExperiment} object can be
#' generated by the function \code{summarizeOverlaps} in the GenomicAlignments
#' package.
#' @param design a \code{formula} which expresses how the counts for each gene
#' depend on the variables in \code{colData}. Many R \code{formula} objects are valid,
#' including designs with multiple variables, e.g., \code{~ group + condition},
#' and designs with interactions, e.g., \code{~ genotype + treatment + genotype:treatment}.
#' See \code{\link{results}} for a variety of designs and how to extract results tables.
#' By default, the functions in this package will use
#' the last variable in the formula for building results tables and plotting.
#' \code{~ 1} can be used for no design, although users need to remember
#' to switch to another design for differential testing.
#' @param countData for matrix input: a matrix of non-negative integers
#' @param colData for matrix input: a \code{DataFrame} or \code{data.frame} with at least a single column.
#' Rows of colData correspond to columns of countData
#' @param tidy for matrix input: whether the first column of countData is the rownames for the count matrix
#' @param sampleTable for htseq-count: a \code{data.frame} with three or more columns. Each row
#' describes one sample. The first column is the sample name, the second column
#' the file name of the count file generated by htseq-count, and the remaining
#' columns are sample metadata which will be stored in \code{colData}
#' @param txi for tximport: the simple list output of the \code{tximport} function
#' @param directory for htseq-count: the directory relative to which the filenames are specified. defaults to current directory
#' @param ignoreRank use of this argument is reserved for DEXSeq developers only.
#' Users will immediately encounter an error upon trying to estimate dispersion
#' using a design with a model matrix which is not full rank.
#' @param ... arguments provided to \code{SummarizedExperiment} including rowRanges and metadata. Note that
#' for Bioconductor 3.1, rowRanges must be a GRanges or GRangesList, with potential metadata columns
#' as a DataFrame accessed and stored with \code{mcols}. If a user wants to store metadata columns
#' about the rows of the countData, but does not have GRanges or GRangesList information,
#' first construct the DESeqDataSet without rowRanges and then add the DataFrame with \code{mcols(dds)}.
#'
#' @return A DESeqDataSet object.
#'
#' @aliases DESeqDataSet DESeqDataSet-class DESeqDataSetFromMatrix DESeqDataSetFromHTSeqCount
#'
#' @references See \url{http://www-huber.embl.de/users/anders/HTSeq} for htseq-count
#'
#' @docType class
#'
#' @examples
#'
#' countData <- matrix(1:100,ncol=4)
#' condition <- factor(c("A","A","B","B"))
#' dds <- DESeqDataSetFromMatrix(countData, DataFrame(condition), ~ condition)
#'
#' @rdname DESeqDataSet
#' @importFrom utils packageVersion
#' @export
DESeqDataSet <- function(se, design, ignoreRank=FALSE) {
  if (!is(se, "RangedSummarizedExperiment")) {
    if (is(se, "SummarizedExperiment")) {
      se <- as(se, "RangedSummarizedExperiment")
    } else {
      stop("'se' must be a RangedSummarizedExperiment object")
    }
  }
  if (is.null(assayNames(se)) || assayNames(se)[1] != "counts") {
    message("renaming the first element in assays to 'counts'")
    assayNames(se)[1] <- "counts"
  }
  # before validity check, try to convert assay to integer mode
  if (any(is.na(assay(se))))
    stop("NA values are not allowed in the count matrix")
  if (any(assay(se) < 0)) {
    stop("some values in assay are negative")
  }
  if (!is.integer(assay(se))) {
    if (any(round(assay(se)) != assay(se))) {
      stop("some values in assay are not integers")
    }
    message("converting counts to integer mode")
    mode(assay(se)) <- "integer"
  }
  if (all(assay(se) == 0)) {
    stop("all samples have 0 counts for all genes. check the counting script.")
  }
  if (all(rowSums(assay(se) == assay(se)[,1]) == ncol(se))) {
    warning("all genes have equal values for all samples. will not be able to perform differential analysis")
  }
  if (any(duplicated(rownames(se)))) {
    warning(sum(duplicated(rownames(se)))," duplicate rownames were renamed by adding numbers")
    rnms <- rownames(se)
    dups <- unique(rnms[duplicated(rnms)])
    for (rn in dups) {
      idx <- which(rnms == rn)
      rnms[idx[-1]] <- paste(rnms[idx[-1]], c(seq_len(length(idx) - 1)), sep=".")
    }
    rownames(se) <- rnms
  }
  designVars <- all.vars(design)
  if (!all(designVars %in% names(colData(se)))) {
    stop("all variables in design formula must be columns in colData")
  }
  designVarsClass <- sapply(designVars, function(v) class(colData(se)[[v]]))
  if (any(designVarsClass == "character")) {
    warning("some variables in design formula are characters, converting to factors")
    for (v in designVars[designVarsClass == "character"]) {
      colData(se)[[v]] <- factor(colData(se)[[v]])
    }
  }
  if (length(designVars) == 1) {
    var <- colData(se)[[designVars]]
    if (all(var == var[1])) {
      stop("design has a single variable, with all samples having the same value.
  use instead a design of '~ 1'. estimateSizeFactors, rlog and the VST can then be used")
    }
  }
  designVarsNumeric <- sapply(designVars, function(v) is.numeric(colData(se)[[v]]))
  if (any(designVarsNumeric)) {
    warnIntVars <- FALSE
    for (v in designVars[designVarsNumeric]) {
      if (all(colData(se)[[v]] == round(colData(se)[[v]]))) {
        warnIntVars <- TRUE
      }
    }
    if (warnIntVars) {
      message("the design formula contains a numeric variable with integer values,
  specifying a model with increasing fold change for higher values.
  did you mean for this to be a factor? if so, first convert
  this variable to a factor using the factor() function")
    }
  }
  designFactors <- designVars[designVarsClass == "factor"]
  missingLevels <- sapply(designFactors, function(v) any(table(colData(se)[[v]]) == 0))
  if (any(missingLevels)) {
    message("factor levels were dropped which had no samples")
    for (v in designFactors[missingLevels]) {
      colData(se)[[v]] <- droplevels(colData(se)[[v]])
    }
  }
  singleLevel <- sapply(designFactors, function(v) all(colData(se)[[v]] == colData(se)[[v]][1]))
  if (any(singleLevel)) {
    stop("design contains one or more variables with all samples having the same value,
  remove these variables from the design")
  }
  modelMatrix <- stats::model.matrix.default(design, data=as.data.frame(colData(se)))
  if (!ignoreRank) {
    checkFullRank(modelMatrix)
  }
  # if the last variable in the design formula is a
  # factor, and has a level 'control', check if it is
  # the reference level and if not print a message
  lastDV <- length(designVars)
  if (length(designVars) > 0 && designVarsClass[lastDV] == "factor") {
    lastDVLvls <- levels(colData(se)[[designVars[lastDV]]])
    controlSynonyms <- c("control","Control","CONTROL")
    for (cSyn in controlSynonyms) {
      if (cSyn %in% lastDVLvls) {
        if (cSyn != lastDVLvls[1]) {
          message(paste0("it appears that the last variable in the design formula, '",
                         designVars[lastDV],
                         "', has a factor level, '",cSyn,
                         "', which is not the reference level. we recommend
  to use factor(...,levels=...) or relevel() to set this as the reference level
  before proceeding. for more information, please see the 'Note on factor levels'
  in vignette('DESeq2')."))
        }
      }
    }
  }
  # Add columns on the columns
  mcolsCols <- DataFrame(type=rep("input",ncol(colData(se))),
                         description=rep("",ncol(colData(se))))
  mcols(colData(se)) <- if (is.null(mcols(colData(se)))) {
    mcolsCols
  } else if (all(names(mcols(colData(se))) == c("type","description"))) {
    mcolsCols
  } else {
    cbind(mcols(colData(se)), mcolsCols)
  }
  object <- new("DESeqDataSet", se, design = design)
  # now we know we have at least an empty GRanges or GRangesList for rowRanges
  # so we can create a metadata column 'type' for the mcols
  # and we label any incoming columns as 'input'
  # this is metadata columns on the rows
  mcolsRows <- DataFrame(type=rep("input",ncol(mcols(object))),
                         description=rep("",ncol(mcols(object))))
  mcols(mcols(object)) <- if (is.null(mcols(mcols(object)))) {
    mcolsRows
  } else if (all(names(mcols(mcols(object))) == c("type","description"))) {
    mcolsRows
  } else {
    cbind(mcols(mcols(object)), mcolsRows)
  }
  # stash the package version
  metadata(object)[["version"]] <- packageVersion("DESeq2")
  return(object)
}
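# A minimal usage sketch for the constructor above (the 'countMatrix' object
# and the 'condition' factor are illustrative, not part of the package):
#
#   se <- SummarizedExperiment(assays=SimpleList(counts=countMatrix),
#                              colData=DataFrame(condition=factor(c("A","A","B","B"))))
#   dds <- DESeqDataSet(se, design = ~ condition)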
#' @rdname DESeqDataSet
#' @export
DESeqDataSetFromMatrix <- function(countData, colData, design,
                                   tidy=FALSE, ignoreRank=FALSE, ...) {
  if (tidy) {
    stopifnot(ncol(countData) > 1)
    rownms <- as.character(countData[,1])
    countData <- countData[,-1,drop=FALSE]
    rownames(countData) <- rownms
  }
  # check that these agree in number
  stopifnot(ncol(countData) == nrow(colData))
  # we expect a matrix of counts, which are non-negative integers
  countData <- as.matrix( countData )
  if (is(colData,"data.frame")) colData <- as(colData, "DataFrame")
  # check if the rownames of colData are simply in different order
  # than the colnames of the countData, if so throw an error
  # as the user probably should investigate what's wrong
  if (!is.null(rownames(colData)) & !is.null(colnames(countData))) {
    if (all(sort(rownames(colData)) == sort(colnames(countData)))) {
      if (!all(rownames(colData) == colnames(countData))) {
        stop(paste("rownames of the colData: ",
                   paste(rownames(colData),collapse=","),
                   " are not in the same order as the colnames of the countData: ",
                   paste(colnames(countData),collapse=",")))
      }
    }
  }
  if (is.null(rownames(colData)) & !is.null(colnames(countData))) {
    rownames(colData) <- colnames(countData)
  }
  se <- SummarizedExperiment(assays = SimpleList(counts=countData),
                             colData = colData, ...)
  object <- DESeqDataSet(se, design = design, ignoreRank)
  return(object)
}

#' @rdname DESeqDataSet
#' @export
DESeqDataSetFromHTSeqCount <- function(sampleTable, directory=".", design,
                                       ignoreRank=FALSE, ...) {
  if (missing(design)) stop("design is missing")
  l <- lapply( as.character( sampleTable[,2] ),
               function(fn) read.table( file.path( directory, fn ) ) )
  if( ! all( sapply( l, function(a) all( a$V1 == l[[1]]$V1 ) ) ) )
    stop( "Gene IDs (first column) differ between files." )
  tbl <- sapply( l, function(a) a$V2 )
  colnames(tbl) <- sampleTable[,1]
  rownames(tbl) <- l[[1]]$V1
  rownames(sampleTable) <- sampleTable[,1]
  oldSpecialNames <- c("no_feature","ambiguous","too_low_aQual",
                       "not_aligned","alignment_not_unique")
  # either starts with two underscores
  # or is one of the old special names (htseq-count backward compatibility)
  specialRows <- (substr(rownames(tbl),1,1) == "_") | rownames(tbl) %in% oldSpecialNames
  tbl <- tbl[ !specialRows, , drop=FALSE ]
  object <- DESeqDataSetFromMatrix(countData=tbl,
                                   colData=sampleTable[,-(1:2),drop=FALSE],
                                   design=design,
                                   ignoreRank, ...)
  return(object)
}

#' @rdname DESeqDataSet
#' @export
DESeqDataSetFromTximport <- function(txi, colData, design, ...) {
  counts <- round(txi$counts)
  mode(counts) <- "integer"
  object <- DESeqDataSetFromMatrix(countData=counts, colData=colData, design=design, ...)
  stopifnot(txi$countsFromAbundance %in% c("no","scaledTPM","lengthScaledTPM"))
  if (txi$countsFromAbundance %in% c("scaledTPM","lengthScaledTPM")) {
    message("using just counts from tximport")
  } else {
    message("using counts and average transcript lengths from tximport")
    lengths <- txi$length
    stopifnot(all(lengths > 0))
    dimnames(lengths) <- dimnames(object)
    assays(object)[["avgTxLength"]] <- lengths
  }
  return(object)
}
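# A usage sketch for the tximport path above (the 'files' vector, 'tx2gene'
# table and 'sampleTable' are illustrative):
#
#   txi <- tximport::tximport(files, type="salmon", tx2gene=tx2gene)
#   dds <- DESeqDataSetFromTximport(txi, colData=sampleTable, design=~ condition)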
#' @rdname DESeqResults
#' @export
setClass("DESeqResults", contains="DataFrame",
         representation = representation(priorInfo = "list"))

#' DESeqResults object and constructor
#'
#' This constructor function would not typically be used by "end users".
#' This simple class extends the DataFrame class of the IRanges package
#' to allow other packages to write methods for results
#' objects from the DESeq2 package. It is used by \code{\link{results}}
#' to wrap up the results table.
#'
#' @param DataFrame a DataFrame of results, standard column names are:
#' baseMean, log2FoldChange, lfcSE, stat, pvalue, padj.
#' @param priorInfo a list giving information on the log fold change prior
#'
#' @return a DESeqResults object
#' @docType class
#' @aliases DESeqResults-class
#' @rdname DESeqResults
#' @export
DESeqResults <- function(DataFrame, priorInfo=list()) {
  new("DESeqResults", DataFrame, priorInfo=priorInfo)
}

#' @rdname DESeqTransform
#' @export
setClass("DESeqTransform", contains="RangedSummarizedExperiment")

#' DESeqTransform object and constructor
#'
#' This constructor function would not typically be used by "end users".
#' This simple class extends the RangedSummarizedExperiment class of the
#' SummarizedExperiment package.
#' It is used by \code{\link{rlog}} and
#' \code{\link{varianceStabilizingTransformation}}
#' to wrap up the results into a class for downstream methods,
#' such as \code{\link{plotPCA}}.
#'
#' @param SummarizedExperiment a RangedSummarizedExperiment
#'
#' @return a DESeqTransform object
#' @docType class
#' @aliases DESeqTransform-class
#' @rdname DESeqTransform
#' @export
DESeqTransform <- function(SummarizedExperiment) {
  se <- SummarizedExperiment
  if (!is(se, "RangedSummarizedExperiment")) {
    if (is(se, "SummarizedExperiment")) {
      se <- as(se, "RangedSummarizedExperiment")
    } else {
      stop("'SummarizedExperiment' must be a RangedSummarizedExperiment object")
    }
  }
  new("DESeqTransform", se)
}

DESeq2/R/AllGenerics.R

#' @rdname dispersionFunction
#' @export
setGeneric("dispersionFunction", function(object,...) standardGeneric("dispersionFunction"))

#' @rdname dispersionFunction
#' @export
setGeneric("dispersionFunction<-", function(object,...,value) standardGeneric("dispersionFunction<-"))

#' @rdname dispersions
#' @export
setGeneric("dispersions", function(object,...) standardGeneric("dispersions"))

#' @rdname dispersions
#' @export
setGeneric("dispersions<-", function(object,...,value) standardGeneric("dispersions<-"))

#' @rdname normalizationFactors
#' @export
setGeneric("normalizationFactors", function(object,...) standardGeneric("normalizationFactors"))

#' @rdname normalizationFactors
#' @export
setGeneric("normalizationFactors<-", function(object,...,value) standardGeneric("normalizationFactors<-"))

#' @rdname priorInfo
#' @export
setGeneric("priorInfo", function(object,...) standardGeneric("priorInfo"))
#' @rdname priorInfo
#' @export
setGeneric("priorInfo<-", function(object,...,value) standardGeneric("priorInfo<-"))

DESeq2/R/RcppExports.R

# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

fitDisp <- function(ySEXP, xSEXP, mu_hatSEXP, log_alphaSEXP, log_alpha_prior_meanSEXP,
                    log_alpha_prior_sigmasqSEXP, min_log_alphaSEXP, kappa_0SEXP, tolSEXP,
                    maxitSEXP, usePriorSEXP, weightsSEXP, useWeightsSEXP) {
    .Call('_DESeq2_fitDisp', PACKAGE = 'DESeq2', ySEXP, xSEXP, mu_hatSEXP, log_alphaSEXP,
          log_alpha_prior_meanSEXP, log_alpha_prior_sigmasqSEXP, min_log_alphaSEXP,
          kappa_0SEXP, tolSEXP, maxitSEXP, usePriorSEXP, weightsSEXP, useWeightsSEXP)
}

fitBeta <- function(ySEXP, xSEXP, nfSEXP, alpha_hatSEXP, contrastSEXP, beta_matSEXP,
                    lambdaSEXP, weightsSEXP, useWeightsSEXP, tolSEXP, maxitSEXP, useQRSEXP) {
    .Call('_DESeq2_fitBeta', PACKAGE = 'DESeq2', ySEXP, xSEXP, nfSEXP, alpha_hatSEXP,
          contrastSEXP, beta_matSEXP, lambdaSEXP, weightsSEXP, useWeightsSEXP,
          tolSEXP, maxitSEXP, useQRSEXP)
}

fitDispGrid <- function(ySEXP, xSEXP, mu_hatSEXP, disp_gridSEXP, log_alpha_prior_meanSEXP,
                        log_alpha_prior_sigmasqSEXP, usePriorSEXP, weightsSEXP, useWeightsSEXP) {
    .Call('_DESeq2_fitDispGrid', PACKAGE = 'DESeq2', ySEXP, xSEXP, mu_hatSEXP, disp_gridSEXP,
          log_alpha_prior_meanSEXP, log_alpha_prior_sigmasqSEXP, usePriorSEXP,
          weightsSEXP, useWeightsSEXP)
}

DESeq2/R/core.R

############################################################
#
# DESeq2 organization of R files
#
# core ........... most of the statistical code (example call below)
# fitNbinomGLMs .. three functions for fitting NB GLMs
# methods ........ the S4 methods (estimateSizeFactors, etc.)
# AllClasses ..... class definitions and object constructors
# AllGenerics .... the generics defined in DESeq2
# results ........ results() function and helpers
# plots .......... all plotting functions
# lfcShrink ...... log2 fold change shrinkage
# helper ......... unmix, collapseReplicates, fpkm, fpm, DESeqParallel
# expanded ....... helpers for dealing with expanded model matrices
# wrappers ....... the R wrappers for the C++ functions (mine)
# RcppExports .... the R wrappers for the C++ functions (auto)
#
# rlogTransformation ... rlog
# varianceStabilizingTransformation ... VST
#
# general outline of the internal function calls.
# note: not all of these functions are exported.
#
# DESeq
# |- estimateSizeFactors
#    |- estimateSizeFactorsForMatrix
# |- estimateDispersions
#    |- estimateDispersionsGeneEst
#       |- fitNbinomGLMs
#          |- fitBeta (C++)
#       |- fitDisp (C++)
#    |- estimateDispersionsFit
#    |- estimateDispersionsMAP
#       |- estimateDispersionPriorVar
#       |- fitDisp (C++)
# |- nbinomWaldTest
#    |- fitGLMsWithPrior
#       |- fitNbinomGLMs
#          |- fitBeta (C++)
#       |- estimateBetaPriorVar
#       |- fitNbinomGLMs
#          |- fitBeta (C++)
#
############################################################

#' DESeq2 package for differential analysis of count data
#'
#' The main functions for differential analysis are \code{\link{DESeq}} and
#' \code{\link{results}}. See the examples at \code{\link{DESeq}} for basic analysis steps.
#' Two transformations offered for count data are
#' the "regularized logarithm", \code{\link{rlog}},
#' and \code{\link{varianceStabilizingTransformation}}.
#' For more detailed information on usage, see the package vignette, by typing
#' \code{vignette("DESeq2")}, or the workflow linked to on the first page
#' of the vignette. All support questions should be posted to the Bioconductor
#' support site: \url{http://support.bioconductor.org}.
#'
#' @references
#'
#' DESeq2 reference:
#'
#' Love, M.I., Huber, W., Anders, S. (2014) Moderated estimation of fold change and
#' dispersion for RNA-seq data with DESeq2. Genome Biology, 15:550.
#' \url{http://dx.doi.org/10.1186/s13059-014-0550-8}
#'
#' DESeq reference:
#'
#' Simon Anders, Wolfgang Huber (2010) Differential expression analysis for sequence
#' count data. Genome Biology, 11:106.
#' \url{http://dx.doi.org/10.1186/gb-2010-11-10-r106}
#'
#' @author Michael Love, Wolfgang Huber, Simon Anders
#'
#' @docType package
#' @name DESeq2-package
#' @aliases DESeq2-package
#' @keywords package
NULL

#' Differential expression analysis based on the Negative Binomial (a.k.a. Gamma-Poisson) distribution
#'
#' This function performs a default analysis through the steps:
#' \enumerate{
#' \item estimation of size factors: \code{\link{estimateSizeFactors}}
#' \item estimation of dispersion: \code{\link{estimateDispersions}}
#' \item Negative Binomial GLM fitting and Wald statistics: \code{\link{nbinomWaldTest}}
#' }
#' For complete details on each step, see the manual pages of the respective
#' functions. After the \code{DESeq} function returns a DESeqDataSet object,
#' results tables (log2 fold changes and p-values) can be generated
#' using the \code{\link{results}} function. See the manual page
#' for \code{\link{results}} for information on independent filtering and
#' p-value adjustment for multiple test correction.
#'
#' The differential expression analysis uses a generalized linear model of the form:
#'
#' \deqn{ K_{ij} \sim \textrm{NB}( \mu_{ij}, \alpha_i) }{ K_ij ~ NB(mu_ij, alpha_i) }
#' \deqn{ \mu_{ij} = s_j q_{ij} }{ mu_ij = s_j q_ij }
#' \deqn{ \log_2(q_{ij}) = x_{j.} \beta_i }{ log2(q_ij) = x_j. beta_i }
#'
#' where counts \eqn{K_{ij}}{K_ij} for gene i, sample j are modeled using
#' a Negative Binomial distribution with fitted mean \eqn{\mu_{ij}}{mu_ij}
#' and a gene-specific dispersion parameter \eqn{\alpha_i}{alpha_i}.
#' The fitted mean is composed of a sample-specific size factor
#' \eqn{s_j}{s_j} and a parameter \eqn{q_{ij}}{q_ij} proportional to the
#' expected true concentration of fragments for sample j.
#' The coefficients \eqn{\beta_i}{beta_i} give the log2 fold changes for gene i for each
#' column of the model matrix \eqn{X}{X}.
#' The sample-specific size factors can be replaced by
#' gene-specific normalization factors for each sample using
#' \code{\link{normalizationFactors}}.
#'
#' For details on the fitting of the log2 fold changes and calculation of p-values,
#' see \code{\link{nbinomWaldTest}} if using \code{test="Wald"},
#' or \code{\link{nbinomLRT}} if using \code{test="LRT"}.
#'
#' Experiments without replicates do not allow for estimation of the dispersion
#' of counts around the expected value for each group, which is critical for
#' differential expression analysis. If an experimental design is
#' supplied which does not contain the necessary degrees of freedom for differential
#' analysis, \code{DESeq} will provide a warning to the user and follow
#' the strategy outlined in Anders and Huber (2010)
#' under the section 'Working without replicates', wherein all the samples
#' are considered as replicates of a single group for the estimation of dispersion.
#' As noted in the reference above: "Some overestimation of the variance
#' may be expected, which will make that approach conservative."
#' Furthermore, "while one may not want to draw strong conclusions from such an analysis,
#' it may still be useful for exploration and hypothesis generation."
#' We provide this approach for data exploration only, but for accurately
#' identifying differential expression, biological replicates are required.
#'
#' The argument \code{minReplicatesForReplace} is used to decide which samples
#' are eligible for automatic replacement in the case of extreme Cook's distance.
#' By default, \code{DESeq} will replace outliers if the Cook's distance is
#' large for a sample which has 7 or more replicates (including itself).
#' This replacement is performed by the \code{\link{replaceOutliers}}
#' function. This default behavior helps to prevent filtering genes
#' based on Cook's distance when there are many degrees of freedom.
#' See \code{\link{results}} for more information about filtering using
#' Cook's distance, and the 'Dealing with outliers' section of the vignette.
#' Unlike the behavior of \code{\link{replaceOutliers}}, here original counts are
#' kept in the matrix returned by \code{\link{counts}}, original Cook's
#' distances are kept in \code{assays(dds)[["cooks"]]}, and the replacement
#' counts used for fitting are kept in \code{assays(dds)[["replaceCounts"]]}.
#'
#' Note that if a log2 fold change prior is used (betaPrior=TRUE)
#' then expanded model matrices will be used in fitting. These are
#' described in \code{\link{nbinomWaldTest}} and in the vignette. The
#' \code{contrast} argument of \code{\link{results}} should be used for
#' generating results tables.
#'
#' @return a \code{\link{DESeqDataSet}} object with results stored as
#' metadata columns. These results should be accessed by calling the \code{\link{results}}
#' function. By default this will return the log2 fold changes and p-values for the last
#' variable in the design formula. See \code{\link{results}} for how to access results
#' for other variables.
#'
#' @param object a DESeqDataSet object, see the constructor functions
#' \code{\link{DESeqDataSet}},
#' \code{\link{DESeqDataSetFromMatrix}},
#' \code{\link{DESeqDataSetFromHTSeqCount}}.
#' @param test either "Wald" or "LRT", which will then use either
#' Wald significance tests (defined by \code{\link{nbinomWaldTest}}),
#' or the likelihood ratio test on the difference in deviance between a
#' full and reduced model formula (defined by \code{\link{nbinomLRT}})
#' @param fitType either "parametric", "local", or "mean"
#' for the type of fitting of dispersions to the mean intensity.
#' See \code{\link{estimateDispersions}} for description.
#' @param betaPrior whether or not to put a zero-mean normal prior on
#' the non-intercept coefficients.
#' See \code{\link{nbinomWaldTest}} for description of the calculation
#' of the beta prior. In versions \code{>=1.16}, the default is set
#' to \code{FALSE}, and shrunken LFCs are obtained afterwards using
#' \code{\link{lfcShrink}}.
#' @param full for \code{test="LRT"}, the full model formula,
#' which is restricted to the formula in \code{design(object)}.
#' alternatively, it can be a model matrix constructed by the user.
#' advanced use: specifying a model matrix for full and \code{test="Wald"}
#' is possible if \code{betaPrior=FALSE}
#' @param reduced for \code{test="LRT"}, a reduced formula to compare against,
#' i.e., the full formula with the term(s) of interest removed.
#' alternatively, it can be a model matrix constructed by the user
#' @param quiet whether to print messages at each step
#' @param minReplicatesForReplace the minimum number of replicates required
#' in order to use \code{\link{replaceOutliers}} on a
#' sample. If there are samples with that many replicates, the model will
#' be refit after replacing these outliers, flagged by Cook's distance.
#' Set to \code{Inf} in order to never replace outliers.
#' @param modelMatrixType either "standard" or "expanded", which describe
#' how the model matrix, X of the GLM formula, is formed.
#' "standard" is as created by \code{model.matrix} using the
#' design formula. "expanded" includes an indicator variable for each
#' level of factors in addition to an intercept. for more information
#' see the Description of \code{\link{nbinomWaldTest}}.
#' betaPrior must be set to TRUE in order for expanded model matrices
#' to be fit.
#' @param parallel if FALSE, no parallelization. if TRUE, parallel
#' execution using \code{BiocParallel}, see next argument \code{BPPARAM}.
#' A note on running in parallel using \code{BiocParallel}: it may be
#' advantageous to remove large, unneeded objects from your current
#' R environment before calling \code{DESeq},
#' as it is possible that R's internal garbage collection
#' will copy these objects while running on worker nodes.
#' @param BPPARAM an optional parameter object passed internally
#' to \code{\link{bplapply}} when \code{parallel=TRUE}.
#' If not specified, the parameters last registered with
#' \code{\link{register}} will be used.
#'
#' @author Michael Love
#'
#' @references
#'
#' Love, M.I., Huber, W., Anders, S. (2014) Moderated estimation of fold change and
#' dispersion for RNA-seq data with DESeq2. Genome Biology, 15:550.
\url{https://doi.org/10.1186/s13059-014-0550-8} #' #' @import BiocGenerics BiocParallel S4Vectors IRanges GenomicRanges SummarizedExperiment Biobase Rcpp methods #' #' @importFrom locfit locfit #' @importFrom genefilter rowVars filtered_p #' @importFrom Hmisc wtd.quantile #' #' @importFrom graphics axis hist plot points #' @importFrom stats Gamma as.formula coefficients df dnbinom dnorm formula glm loess lowess model.matrix optim p.adjust pchisq pnorm prcomp predict pt qf qnorm rchisq relevel rnbinom rnorm runif splinefun terms terms.formula #' @importFrom utils read.table #' #' @useDynLib DESeq2 #' #' @seealso \code{\link{nbinomWaldTest}}, \code{\link{nbinomLRT}} #' #' @examples #' #' # see vignette for suggestions on generating #' # count tables from RNA-Seq data #' cnts <- matrix(rnbinom(n=1000, mu=100, size=1/0.5), ncol=10) #' cond <- factor(rep(1:2, each=5)) #' #' # object construction #' dds <- DESeqDataSetFromMatrix(cnts, DataFrame(cond), ~ cond) #' #' # standard analysis #' dds <- DESeq(dds) #' res <- results(dds) #' #' # moderated log2 fold changes #' resultsNames(dds) #' resLFC <- lfcShrink(dds, coef=2, res=res) #' #' # an alternate analysis: likelihood ratio test #' ddsLRT <- DESeq(dds, test="LRT", reduced= ~ 1) #' resLRT <- results(ddsLRT) #' #' @export DESeq <- function(object, test=c("Wald","LRT"), fitType=c("parametric","local","mean"), betaPrior, full=design(object), reduced, quiet=FALSE, minReplicatesForReplace=7, modelMatrixType, parallel=FALSE, BPPARAM=bpparam()) { # check arguments stopifnot(is(object, "DESeqDataSet")) test <- match.arg(test, choices=c("Wald","LRT")) fitType <- match.arg(fitType, choices=c("parametric","local","mean")) stopifnot(is.logical(quiet)) stopifnot(is.numeric(minReplicatesForReplace)) stopifnot(is.logical(parallel)) modelAsFormula <- !is.matrix(full) if (missing(betaPrior)) { betaPrior <- FALSE } else { stopifnot(is.logical(betaPrior)) } # get rid of any NA in the mcols(mcols(object)) object <- sanitizeRowRanges(object) if (test == "LRT") { if (missing(reduced)) { stop("likelihood ratio test requires a 'reduced' design, see ?DESeq") } if (betaPrior) { stop("test='LRT' does not support use of LFC shrinkage, use betaPrior=FALSE") } if (!missing(modelMatrixType) && modelMatrixType=="expanded") { stop("test='LRT' does not support use of expanded model matrix") } if (is.matrix(full) | is.matrix(reduced)) { if (!(is.matrix(full) & is.matrix(reduced))) { stop("if one of 'full' and 'reduced' is a matrix, the other must be also a matrix") } } if (modelAsFormula) { checkLRT(full, reduced) } else { checkFullRank(full) checkFullRank(reduced) if (ncol(full) <= ncol(reduced)) { stop("the number of columns of 'full' should be more than the number of columns of 'reduced'") } } } if (test == "Wald" & !missing(reduced)) { stop("'reduced' ignored when test='Wald'") } if (modelAsFormula) { # run some tests common to DESeq, nbinomWaldTest, nbinomLRT designAndArgChecker(object, betaPrior) if (design(object) == formula(~1)) { warning("the design is ~ 1 (just an intercept). 
is this intended?") } if (full != design(object)) { stop("'full' specified as formula should equal design(object)") } modelMatrix <- NULL } else { if (betaPrior == TRUE) { stop("betaPrior=TRUE is not supported for user-provided model matrices") } checkFullRank(full) # this will be used for dispersion estimation and testing modelMatrix <- full } attr(object, "betaPrior") <- betaPrior stopifnot(length(parallel) == 1 & is.logical(parallel)) if (!is.null(sizeFactors(object)) || !is.null(normalizationFactors(object))) { if (!quiet) { if (!is.null(normalizationFactors(object))) { message("using pre-existing normalization factors") } else { message("using pre-existing size factors") } } } else { if (!quiet) message("estimating size factors") object <- estimateSizeFactors(object) } if (!parallel) { if (!quiet) message("estimating dispersions") object <- estimateDispersions(object, fitType=fitType, quiet=quiet, modelMatrix=modelMatrix) if (!quiet) message("fitting model and testing") if (test == "Wald") { object <- nbinomWaldTest(object, betaPrior=betaPrior, quiet=quiet, modelMatrix=modelMatrix, modelMatrixType=modelMatrixType) } else if (test == "LRT") { object <- nbinomLRT(object, full=full, reduced=reduced, quiet=quiet) } } else if (parallel) { object <- DESeqParallel(object, test=test, fitType=fitType, betaPrior=betaPrior, full=full, reduced=reduced, quiet=quiet, modelMatrix=modelMatrix, modelMatrixType=modelMatrixType, BPPARAM=BPPARAM) } # if there are sufficient replicates, then pass through to refitting function sufficientReps <- any(nOrMoreInCell(attr(object,"modelMatrix"),minReplicatesForReplace)) if (sufficientReps) { object <- refitWithoutOutliers(object, test=test, betaPrior=betaPrior, full=full, reduced=reduced, quiet=quiet, minReplicatesForReplace=minReplicatesForReplace, modelMatrix=modelMatrix, modelMatrixType=modelMatrixType) } # stash the package version (again, also in construction) metadata(object)[["version"]] <- packageVersion("DESeq2") object } #' Make a simulated DESeqDataSet #' #' Constructs a simulated dataset of Negative Binomial data from #' two conditions. By default, there are no fold changes between #' the two conditions, but this can be adjusted with the \code{betaSD} argument. #' #' @param n number of rows #' @param m number of columns #' @param betaSD the standard deviation for non-intercept betas, i.e. beta ~ N(0,betaSD) #' @param interceptMean the mean of the intercept betas (log2 scale) #' @param interceptSD the standard deviation of the intercept betas (log2 scale) #' @param dispMeanRel a function specifying the relationship of the dispersions on #' \code{2^trueIntercept} #' @param sizeFactors multiplicative factors for each sample #' #' @return a \code{\link{DESeqDataSet}} with true dispersion, #' intercept and beta values in the metadata columns. Note that the true #' betas are provided on the log2 scale. 
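#'
#' As a sketch of the simulation implemented below: for gene i and sample j,
#' counts are drawn as
#' \eqn{K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i)}{K_ij ~ NB(mu_ij, alpha_i)},
#' with \eqn{\mu_{ij} = s_j 2^{x_{j.} \beta_i}}{mu_ij = s_j 2^(x_j. beta_i)},
#' where x is a standard model matrix on \code{condition}, the betas
#' (intercept and condition effect) are drawn as described above,
#' \eqn{s_j}{s_j} is given by \code{sizeFactors}, and the dispersion
#' \eqn{\alpha_i}{alpha_i} is \code{dispMeanRel} evaluated at \code{2^trueIntercept}.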
#' #' @examples #' #' dds <- makeExampleDESeqDataSet() #' dds #' #' @export makeExampleDESeqDataSet <- function(n=1000,m=12,betaSD=0,interceptMean=4,interceptSD=2, dispMeanRel=function(x) 4/x + .1,sizeFactors=rep(1,m)) { beta <- cbind(rnorm(n,interceptMean,interceptSD),rnorm(n,0,betaSD)) dispersion <- dispMeanRel(2^(beta[,1])) colData <- DataFrame(condition=factor(rep(c("A","B"),times=c(ceiling(m/2),floor(m/2))))) x <- if (m > 1) { stats::model.matrix.default(~ colData$condition) } else { cbind(rep(1,m),rep(0,m)) } mu <- t(2^(x %*% t(beta)) * sizeFactors) countData <- matrix(rnbinom(m*n, mu=mu, size=1/dispersion), ncol=m) mode(countData) <- "integer" colnames(countData) <- paste("sample",1:m,sep="") rowRanges <- GRanges("1",IRanges(start=(1:n - 1) * 100 + 1,width=100)) names(rowRanges) <- paste0("gene",1:n) # set environment to global environment, # to avoid the formula carrying with it all the objects # here including 'object' itself. design <- if (m > 1) { as.formula("~ condition", env=.GlobalEnv) } else { as.formula("~ 1", env=.GlobalEnv) } object <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = design, rowRanges = rowRanges) trueVals <- DataFrame(trueIntercept = beta[,1], trueBeta = beta[,2], trueDisp = dispersion) mcols(trueVals) <- DataFrame(type=rep("input",ncol(trueVals)), description=c("simulated intercept values", "simulated beta values", "simulated dispersion values")) mcols(object) <- cbind(mcols(object),trueVals) return(object) } #' Low-level function to estimate size factors with robust regression. #' #' Given a matrix or data frame of count data, this function estimates the size #' factors as follows: Each column is divided by the geometric means of the #' rows. The median (or, if requested, another location estimator) of these #' ratios (skipping the genes with a geometric mean of zero) is used as the size #' factor for this column. Typically, one will not call this function directly, but use #' \code{\link{estimateSizeFactors}}. #' #' @param counts a matrix or data frame of counts, i.e., non-negative integer #' values #' @param locfunc a function to compute a location for a sample. By default, the #' median is used. However, especially for low counts, the #' \code{\link[genefilter]{shorth}} function from genefilter may give better results. #' @param geoMeans by default this is not provided, and the #' geometric means of the counts are calculated within the function. #' A vector of geometric means from another count matrix can be provided #' for a "frozen" size factor calculation #' @param controlGenes optional, numeric or logical index vector specifying those genes to #' use for size factor estimation (e.g. 
housekeeping or spike-in genes)
#' @return a vector with the estimated size factors, one element per column
#' @author Simon Anders
#' @seealso \code{\link{estimateSizeFactors}}
#' @examples
#'
#' dds <- makeExampleDESeqDataSet()
#' estimateSizeFactorsForMatrix(counts(dds))
#' geoMeans <- exp(rowMeans(log(counts(dds))))
#' estimateSizeFactorsForMatrix(counts(dds),geoMeans=geoMeans)
#'
#' @export
estimateSizeFactorsForMatrix <- function(counts, locfunc=stats::median,
                                         geoMeans, controlGenes) {
  if (missing(geoMeans)) {
    incomingGeoMeans <- FALSE
    loggeomeans <- rowMeans(log(counts))
  } else {
    incomingGeoMeans <- TRUE
    if (length(geoMeans) != nrow(counts)) {
      stop("geoMeans should be as long as the number of rows of counts")
    }
    loggeomeans <- log(geoMeans)
  }
  if (all(is.infinite(loggeomeans))) {
    stop("every gene contains at least one zero, cannot compute log geometric means")
  }
  sf <- if (missing(controlGenes)) {
    apply(counts, 2, function(cnts) {
      exp(locfunc((log(cnts) - loggeomeans)[is.finite(loggeomeans) & cnts > 0]))
    })
  } else {
    if ( !( is.numeric(controlGenes) | is.logical(controlGenes) ) ) {
      stop("controlGenes should be either a numeric or logical vector")
    }
    loggeomeansSub <- loggeomeans[controlGenes]
    apply(counts[controlGenes,,drop=FALSE], 2, function(cnts) {
      exp(locfunc((log(cnts) - loggeomeansSub)[is.finite(loggeomeansSub) & cnts > 0]))
    })
  }
  if (incomingGeoMeans) {
    # stabilize size factors to have geometric mean of 1
    sf <- sf/exp(mean(log(sf)))
  }
  sf
}

#' Low-level functions to fit dispersion estimates
#'
#' Normal users should instead use \code{\link{estimateDispersions}}.
#' These low-level functions are called by \code{\link{estimateDispersions}},
#' but are exported and documented for non-standard usage.
#' For instance, it is possible to replace fitted values with a custom fit and continue
#' with the maximum a posteriori dispersion estimation, as demonstrated in the
#' examples below.
#'
#' @param object a DESeqDataSet
#' @param fitType either "parametric", "local", or "mean"
#' for the type of fitting of dispersions to the mean intensity.
#' See \code{\link{estimateDispersions}} for description.
#' @param outlierSD the number of standard deviations of log
#' gene-wise estimates above the prior mean (fitted value),
#' above which dispersion estimates will be labelled
#' outliers. Outliers will keep their original value and
#' not be shrunk using the prior.
#' @param dispPriorVar the variance of the normal prior on the log dispersions.
#' If not supplied, this is calculated as the difference between
#' the mean squared residuals of gene-wise estimates to the
#' fitted dispersion and the expected sampling variance
#' of the log dispersion
#' @param minDisp small value for the minimum dispersion, to allow
#' for calculations in log scale; one order of magnitude above this value is used
#' as a test for inclusion in mean-dispersion fitting
#' @param kappa_0 control parameter used in setting the initial proposal
#' in backtracking search, higher kappa_0 results in larger steps
#' @param dispTol control parameter to test for convergence of log dispersion,
#' stop when increase in log posterior is less than dispTol
#' @param maxit control parameter: maximum number of iterations to allow for convergence
#' @param quiet whether to print messages at each step
#' @param modelMatrix for advanced use only,
#' a substitute model matrix for gene-wise and MAP dispersion estimation
#' @param niter number of times to iterate between estimation of means and
#' estimation of dispersion
#' @param linearMu estimate the expected counts matrix using a linear model,
#' default is NULL, in which case a linear model is used if the
#' number of groups defined by the model matrix is equal to the number
#' of columns of the model matrix
#' @param minmu lower bound on the estimated count for fitting gene-wise dispersion
#'
#' @return a DESeqDataSet with gene-wise, fitted, or final MAP
#' dispersion estimates in the metadata columns of the object.
#'
#' \code{estimateDispersionsPriorVar} is called inside of \code{estimateDispersionsMAP}
#' and stores the dispersion prior variance as an attribute of
#' \code{dispersionFunction(dds)}, which can be manually provided to
#' \code{estimateDispersionsMAP} for parallel execution.
#'
#' @aliases estimateDispersionsGeneEst estimateDispersionsFit estimateDispersionsMAP estimateDispersionsPriorVar
#'
#' @examples
#'
#' dds <- makeExampleDESeqDataSet()
#' dds <- estimateSizeFactors(dds)
#' dds <- estimateDispersionsGeneEst(dds)
#' dds <- estimateDispersionsFit(dds)
#' dds <- estimateDispersionsMAP(dds)
#' plotDispEsts(dds)
#'
#' # after having run estimateDispersionsFit()
#' # the dispersion prior variance over all genes
#' # can be obtained like so:
#'
#' dispPriorVar <- estimateDispersionsPriorVar(dds)
#'
#' @seealso \code{\link{estimateDispersions}}
#'
#' @export
estimateDispersionsGeneEst <- function(object, minDisp=1e-8, kappa_0=1,
                                       dispTol=1e-6, maxit=100, quiet=FALSE,
                                       modelMatrix=NULL, niter=1, linearMu=NULL,
                                       minmu=0.5) {
  if (!is.null(mcols(object)$dispGeneEst)) {
    if (!quiet) message("found already estimated gene-wise dispersions, removing these")
    removeCols <- c("dispGeneEst")
    mcols(object) <- mcols(object)[,!names(mcols(object)) %in% removeCols,drop=FALSE]
  }
  stopifnot(length(minDisp) == 1)
  stopifnot(length(kappa_0) == 1)
  stopifnot(length(dispTol) == 1)
  stopifnot(length(maxit) == 1)
  if (log(minDisp/10) <= -30) {
    stop("for computational stability, log(minDisp/10) should be above -30")
  }
  # in case the class of the mcols(mcols(object)) are not character
  object <- sanitizeRowRanges(object)
  if (is.null(modelMatrix)) {
    modelMatrix <- getModelMatrix(object)
    checkFullRank(modelMatrix)
    if (nrow(modelMatrix) == ncol(modelMatrix)) {
      stop("the number of samples and the number of model coefficients are equal,
  i.e., there are no replicates to estimate the dispersion.
use an alternate design formula") } } else { message("using supplied model matrix") } object <- getBaseMeansAndVariances(object) # only continue on the rows with non-zero row mean objectNZ <- object[!mcols(object)$allZero,,drop=FALSE] # this rough dispersion estimate (alpha_hat) # is for estimating mu # and for the initial starting point for line search # first check if model matrix is full rank fullRank <- qr(modelMatrix)$rank == ncol(modelMatrix) alpha_hat <- if (fullRank) { # if full rank use this estimator which compares normalized counts to mu roughDisp <- roughDispEstimate(y = counts(objectNZ,normalized=TRUE), x = modelMatrix) momentsDisp <- momentsDispEstimate(objectNZ) pmin(roughDisp, momentsDisp) } else { # if not full rank use method of moments across all samples momentsDispEstimate(objectNZ) } # bound the rough estimated alpha between minDisp and maxDisp for numeric stability maxDisp <- max(10, ncol(object)) alpha_hat <- alpha_hat_new <- alpha_init <- pmin(pmax(minDisp, alpha_hat), maxDisp) stopifnot(length(niter) == 1 & niter > 0) # use weights if they are present in assays(object) # (we need this already to decide about linear mu fitting) wlist <- getAndCheckWeights(object, modelMatrix) weights <- wlist$weights # don't let weights go below 1e-6 weights <- pmax(weights, 1e-6) useWeights <- wlist$useWeights # use a linear model to estimate the expected counts # if the number of groups according to the model matrix # is equal to the number of columns if (is.null(linearMu)) { modelMatrixGroups <- modelMatrixGroups(modelMatrix) linearMu <- nlevels(modelMatrixGroups) == ncol(modelMatrix) # also check for weights (then can't do linear mu) if (useWeights) { linearMu <- FALSE } } # below, iterate between mean and dispersion estimation (niter) times fitidx <- rep(TRUE,nrow(objectNZ)) mu <- matrix(0, nrow=nrow(objectNZ), ncol=ncol(objectNZ)) dispIter <- numeric(nrow(objectNZ)) # bound the estimated count by 'minmu' # this helps make the fitting more robust, # because 1/mu occurs in the weights for the NB GLM for (iter in seq_len(niter)) { if (!linearMu) { fit <- fitNbinomGLMs(objectNZ[fitidx,,drop=FALSE], alpha_hat=alpha_hat[fitidx], modelMatrix=modelMatrix) fitMu <- fit$mu } else { fitMu <- linearModelMuNormalized(objectNZ[fitidx,,drop=FALSE], modelMatrix) } fitMu[fitMu < minmu] <- minmu mu[fitidx,] <- fitMu # use of kappa_0 in backtracking search # initial proposal = log(alpha) + kappa_0 * deriv. of log lik. w.r.t. 
log(alpha)
    # use log(minDisp/10) to stop if dispersions going to -infinity
    dispRes <- fitDispWrapper(ySEXP = counts(objectNZ)[fitidx,,drop=FALSE],
                              xSEXP = modelMatrix,
                              mu_hatSEXP = fitMu,
                              log_alphaSEXP = log(alpha_hat)[fitidx],
                              log_alpha_prior_meanSEXP = log(alpha_hat)[fitidx],
                              log_alpha_prior_sigmasqSEXP = 1,
                              min_log_alphaSEXP = log(minDisp/10),
                              kappa_0SEXP = kappa_0,
                              tolSEXP = dispTol,
                              maxitSEXP = maxit,
                              usePriorSEXP = FALSE,
                              weightsSEXP = weights,
                              useWeightsSEXP = useWeights)
    dispIter[fitidx] <- dispRes$iter
    alpha_hat_new[fitidx] <- pmin(exp(dispRes$log_alpha), maxDisp)
    # only rerun those rows which moved
    fitidx <- abs(log(alpha_hat_new) - log(alpha_hat)) > .05
    alpha_hat <- alpha_hat_new
    if (sum(fitidx) == 0) break
  }
  # don't accept moves if the log posterior did not
  # increase by more than one millionth,
  # and set the small estimates to the minimum dispersion
  dispGeneEst <- alpha_hat
  if (niter == 1) {
    noIncrease <- dispRes$last_lp < dispRes$initial_lp + abs(dispRes$initial_lp)/1e6
    dispGeneEst[which(noIncrease)] <- alpha_init[which(noIncrease)]
  }
  dispGeneEstConv <- dispIter < maxit
  # if lacking convergence from fitDisp() (C++)...
  refitDisp <- !dispGeneEstConv & dispGeneEst > minDisp*10
  if (sum(refitDisp) > 0) {
    dispGrid <- fitDispGridWrapper(y = counts(objectNZ)[refitDisp,,drop=FALSE],
                                   x = modelMatrix,
                                   mu = mu[refitDisp,,drop=FALSE],
                                   logAlphaPriorMean = rep(0,sum(refitDisp)),
                                   logAlphaPriorSigmaSq = 1, usePrior = FALSE,
                                   weightsSEXP = weights,
                                   useWeightsSEXP = useWeights)
    dispGeneEst[refitDisp] <- dispGrid
  }
  dispGeneEst <- pmin(pmax(dispGeneEst, minDisp), maxDisp)
  dispDataFrame <- buildDataFrameWithNARows(list(dispGeneEst=dispGeneEst),
                                            mcols(object)$allZero)
  mcols(dispDataFrame) <- DataFrame(type=rep("intermediate",ncol(dispDataFrame)),
                                    description=c("gene-wise estimates of dispersion"))
  mcols(object) <- cbind(mcols(object), dispDataFrame)
  assays(object)[["mu"]] <- buildMatrixWithNARows(mu, mcols(object)$allZero)
  return(object)
}

#' @rdname estimateDispersionsGeneEst
#' @export
estimateDispersionsFit <- function(object,fitType=c("parametric","local","mean"),
                                   minDisp=1e-8, quiet=FALSE) {
  if (is.null(mcols(object)$allZero)) {
    object <- getBaseMeansAndVariances(object)
  }
  objectNZ <- object[!mcols(object)$allZero,,drop=FALSE]
  useForFit <- mcols(objectNZ)$dispGeneEst > 100*minDisp
  if (sum(useForFit) == 0) {
    stop("all gene-wise dispersion estimates are within 2 orders of magnitude
  from the minimum value, and so the standard curve fitting techniques will not work.
  One can instead use the gene-wise estimates as final estimates:
  dds <- estimateDispersionsGeneEst(dds)
  dispersions(dds) <- mcols(dds)$dispGeneEst
  ...then continue with testing using nbinomWaldTest or nbinomLRT")
  }
  fitType <- match.arg(fitType, choices=c("parametric","local","mean"))
  stopifnot(length(fitType)==1)
  stopifnot(length(minDisp)==1)
  if (fitType == "parametric") {
    trial <- try(dispFunction <- parametricDispersionFit(mcols(objectNZ)$baseMean[useForFit],
                                                         mcols(objectNZ)$dispGeneEst[useForFit]),
                 silent=TRUE)
    if (inherits(trial,"try-error")) {
      message("-- note: fitType='parametric', but the dispersion trend was not well captured by the
  function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.")
      fitType <- "local"
    }
  }
  if (fitType == "local") {
    dispFunction <- localDispersionFit(means = mcols(objectNZ)$baseMean[useForFit],
                                       disps = mcols(objectNZ)$dispGeneEst[useForFit],
                                       minDisp = minDisp)
  }
  if (fitType == "mean") {
    useForMean <- mcols(objectNZ)$dispGeneEst > 10*minDisp
    meanDisp <- mean(mcols(objectNZ)$dispGeneEst[useForMean],na.rm=TRUE,trim=0.001)
    dispFunction <- function(means) meanDisp
    attr( dispFunction, "mean" ) <- meanDisp
  }
  if (!(fitType %in% c("parametric","local","mean"))) {
    stop("unknown fitType")
  }
  # store the dispersion function and attributes
  attr( dispFunction, "fitType" ) <- fitType
  if (quiet) {
    suppressMessages({ dispersionFunction(object) <- dispFunction })
  } else {
    dispersionFunction(object) <- dispFunction
  }
  return(object)
}

#' @rdname estimateDispersionsGeneEst
#' @export
estimateDispersionsMAP <- function(object, outlierSD=2, dispPriorVar,
                                   minDisp=1e-8, kappa_0=1,
                                   dispTol=1e-6, maxit=100, modelMatrix=NULL,
                                   quiet=FALSE) {
  stopifnot(length(outlierSD)==1)
  stopifnot(length(minDisp)==1)
  stopifnot(length(kappa_0)==1)
  stopifnot(length(dispTol)==1)
  stopifnot(length(maxit)==1)
  if (is.null(mcols(object)$allZero)) {
    object <- getBaseMeansAndVariances(object)
  }
  if (!is.null(mcols(object)$dispersion)) {
    if (!quiet) message("found already estimated dispersions, removing these")
    removeCols <- c("dispersion","dispOutlier","dispMAP","dispIter","dispConv")
    mcols(object) <- mcols(object)[,!names(mcols(object)) %in% removeCols,drop=FALSE]
  }
  if (is.null(modelMatrix)) {
    modelMatrix <- getModelMatrix(object)
  } else {
    message("using supplied model matrix")
  }
  # fill in the calculated dispersion prior variance
  if (missing(dispPriorVar)) {
    # if no gene-wise estimates above minimum
    if (sum(mcols(object)$dispGeneEst >= minDisp*100,na.rm=TRUE) == 0) {
      warning(paste0("all genes have dispersion estimates < ",minDisp*100,
                     ", returning disp = ",minDisp*10))
      resultsList <- list(dispersion = rep(minDisp*10, sum(!mcols(object)$allZero)))
      dispDataFrame <- buildDataFrameWithNARows(resultsList, mcols(object)$allZero)
      mcols(dispDataFrame) <- DataFrame(type="intermediate",
                                        description="final estimates of dispersion")
      mcols(object) <- cbind(mcols(object), dispDataFrame)
      dispFn <- dispersionFunction(object)
      attr( dispFn, "dispPriorVar" ) <- 0.25
      dispersionFunction(object, estimateVar=FALSE) <- dispFn
      return(object)
    }
    dispPriorVar <- estimateDispersionsPriorVar(object, modelMatrix=modelMatrix)
    dispFn <- dispersionFunction(object)
    attr( dispFn, "dispPriorVar" ) <- dispPriorVar
    dispersionFunction(object, estimateVar=FALSE) <- dispFn
  } else {
    dispFn <- dispersionFunction(object)
    attr( dispFn, "dispPriorVar" ) <- dispPriorVar
    dispersionFunction(object, estimateVar=FALSE) <- dispFn
  }
  stopifnot(length(dispPriorVar)==1)
  objectNZ <- object[!mcols(object)$allZero,,drop=FALSE]
  varLogDispEsts <- attr( dispersionFunction(object), "varLogDispEsts" )
  # set prior variance for fitting dispersion
  log_alpha_prior_sigmasq <- dispPriorVar
  # get previously calculated mu
  mu <- assays(objectNZ)[["mu"]]
  # start fitting at gene estimate unless the points are one order of magnitude
  # below the fitted line, then start at fitted line
  dispInit <- ifelse(mcols(objectNZ)$dispGeneEst > 0.1 * mcols(objectNZ)$dispFit,
                     mcols(objectNZ)$dispGeneEst, mcols(objectNZ)$dispFit)
  # if any missing values, fill in the fitted value to initialize
  dispInit[is.na(dispInit)] <- mcols(objectNZ)$dispFit[is.na(dispInit)]
  # use weights if they are present in assays(object)
  wlist <- getAndCheckWeights(object, modelMatrix)
  weights <- wlist$weights
  useWeights <- wlist$useWeights
  # run with prior
  dispResMAP <- fitDispWrapper(ySEXP = counts(objectNZ),
                               xSEXP = modelMatrix,
                               mu_hatSEXP = mu,
                               log_alphaSEXP = log(dispInit),
                               log_alpha_prior_meanSEXP = log(mcols(objectNZ)$dispFit),
                               log_alpha_prior_sigmasqSEXP = log_alpha_prior_sigmasq,
                               min_log_alphaSEXP = log(minDisp/10),
                               kappa_0SEXP = kappa_0,
                               tolSEXP = dispTol,
                               maxitSEXP = maxit,
                               usePriorSEXP = TRUE,
                               weightsSEXP = weights,
                               useWeightsSEXP = useWeights)
  # prepare dispersions for storage in mcols(object)
  dispMAP <- exp(dispResMAP$log_alpha)
  # when lacking convergence from fitDisp() (C++)
  # we use a function to maximize dispersion parameter
  # along an adaptive grid (also C++)
  dispConv <- dispResMAP$iter < maxit
  refitDisp <- !dispConv
  if (sum(refitDisp) > 0) {
    dispGrid <- fitDispGridWrapper(y = counts(objectNZ)[refitDisp,,drop=FALSE],
                                   x = modelMatrix,
                                   mu = mu[refitDisp,,drop=FALSE],
                                   logAlphaPriorMean = log(mcols(objectNZ)$dispFit)[refitDisp],
                                   logAlphaPriorSigmaSq = log_alpha_prior_sigmasq,
                                   usePrior=TRUE,
                                   weightsSEXP = weights,
                                   useWeightsSEXP = useWeights)
    dispMAP[refitDisp] <- dispGrid
  }
  # bound the dispersion estimate between minDisp and maxDisp for numeric stability
  maxDisp <- max(10, ncol(object))
  dispMAP <- pmin(pmax(dispMAP, minDisp), maxDisp)
  dispersionFinal <- dispMAP
  # detect outliers which have gene-wise estimates
  # outlierSD * standard deviation of log gene-wise estimates
  # above the fitted mean (prior mean)
  # and keep the original gene-est value for these.
  # Note: we use the variance of log dispersion estimates
  # from all the genes, not only those below the fitted line
  dispOutlier <- log(mcols(objectNZ)$dispGeneEst) >
                 log(mcols(objectNZ)$dispFit) + outlierSD * sqrt(varLogDispEsts)
  dispOutlier[is.na(dispOutlier)] <- FALSE
  dispersionFinal[dispOutlier] <- mcols(objectNZ)$dispGeneEst[dispOutlier]
  resultsList <- list(dispersion = dispersionFinal,
                      dispIter = dispResMAP$iter,
                      dispOutlier = dispOutlier,
                      dispMAP = dispMAP)
  dispDataFrame <- buildDataFrameWithNARows(resultsList, mcols(object)$allZero)
  mcols(dispDataFrame) <- DataFrame(type=rep("intermediate",ncol(dispDataFrame)),
                                    description=c("final estimate of dispersion",
                                                  "number of iterations",
                                                  "dispersion flagged as outlier",
                                                  "maximum a posteriori estimate"))
  mcols(object) <- cbind(mcols(object), dispDataFrame)
  return(object)
}

#' @rdname estimateDispersionsGeneEst
#' @export
estimateDispersionsPriorVar <- function(object, minDisp=1e-8, modelMatrix=NULL) {
  objectNZ <- object[!mcols(object)$allZero,,drop=FALSE]
  aboveMinDisp <- mcols(objectNZ)$dispGeneEst >= minDisp*100
  if (is.null(modelMatrix)) {
    modelMatrix <- getModelMatrix(object)
  }
  # estimate the variance of the distribution of the
  # log dispersion estimates around the fitted value
  dispResiduals <- log(mcols(objectNZ)$dispGeneEst) - log(mcols(objectNZ)$dispFit)
  if (sum(aboveMinDisp,na.rm=TRUE) == 0) {
    stop("no data found which is greater than minDisp")
  }
  varLogDispEsts <- attr(dispersionFunction(object), "varLogDispEsts")
  m <- nrow(modelMatrix)
  p <- ncol(modelMatrix)
  # if the residual degrees of freedom is between 1 and 3, the distribution
  # of log dispersions is especially asymmetric and poorly estimated
  # by the MAD.
  # we then use an alternate estimator, a Monte Carlo
  # approach to match the distribution
  if (((m - p) <= 3) & (m > p)) {
    # in order to produce identical results we set the seed,
    # and so we need to save and restore the .Random.seed value first
    if (exists(".Random.seed")) {
      oldRandomSeed <- .Random.seed
    }
    set.seed(2)
    # The residuals are the observed distribution we try to match
    obsDist <- dispResiduals[aboveMinDisp]
    brks <- -20:20/2
    obsDist <- obsDist[obsDist > min(brks) & obsDist < max(brks)]
    obsVarGrid <- seq(from=0,to=8,length=200)
    obsDistHist <- hist(obsDist,breaks=brks,plot=FALSE)
    klDivs <- sapply(obsVarGrid, function(x) {
      randDist <- log(rchisq(1e4,df=(m-p))) + rnorm(1e4,0,sqrt(x)) - log(m - p)
      randDist <- randDist[randDist > min(brks) & randDist < max(brks)]
      randDistHist <- hist(randDist,breaks=brks,plot=FALSE)
      z <- c(obsDistHist$density,randDistHist$density)
      small <- min(z[z > 0])
      kl <- sum(obsDistHist$density * (log(obsDistHist$density + small) - log(randDistHist$density + small)))
      kl
    })
    lofit <- loess(klDivs ~ obsVarGrid, span=.2)
    obsVarFineGrid <- seq(from=0,to=8,length=1000)
    lofitFitted <- predict(lofit,obsVarFineGrid)
    argminKL <- obsVarFineGrid[which.min(lofitFitted)]
    expVarLogDisp <- trigamma((m - p)/2)
    dispPriorVar <- pmax(argminKL, 0.25)
    # finally, restore the .Random.seed if it existed beforehand
    if (exists("oldRandomSeed")) {
      .Random.seed <<- oldRandomSeed
    }
    return(dispPriorVar)
  }
  # estimate the expected sampling variance of the log estimates
  # Var(log(cX)) = Var(log(X))
  # X ~ chi-squared with m - p degrees of freedom
  if (m > p) {
    expVarLogDisp <- trigamma((m - p)/2)
    # set the variance of the prior using these two estimates
    # with a minimum of .25
    dispPriorVar <- pmax((varLogDispEsts - expVarLogDisp), 0.25)
  } else {
    # we have m = p, so do not try to subtract sampling variance
    dispPriorVar <- varLogDispEsts
    expVarLogDisp <- 0
  }
  dispPriorVar
}

#' Wald test for the GLM coefficients
#'
#' This function tests for significance of coefficients in a Negative
#' Binomial GLM, using previously calculated \code{\link{sizeFactors}}
#' (or \code{\link{normalizationFactors}})
#' and dispersion estimates. See \code{\link{DESeq}} for the GLM formula.
#'
#' The fitting proceeds as follows: standard maximum likelihood estimates
#' for GLM coefficients (synonymous with "beta", "log2 fold change", "effect size")
#' are calculated.
#' Then, optionally, a zero-centered Normal prior distribution
#' (\code{betaPrior}) is assumed for the coefficients other than the intercept.
#'
#' Note that this posterior log2 fold change
#' estimation is no longer the default setting for \code{nbinomWaldTest},
#' as the standard workflow for coefficient shrinkage has moved to
#' an additional function \code{\link{lfcShrink}}.
#'
#' For calculating Wald test p-values, the coefficients are scaled by their
#' standard errors and then compared to a standard Normal distribution.
#' The \code{\link{results}}
#' function without any arguments will automatically perform a contrast of the
#' last level of the last variable in the design formula over the first level.
#' The \code{contrast} argument of the \code{\link{results}} function can be used
#' to generate other comparisons.
#'
#' The Wald test can be replaced with the \code{\link{nbinomLRT}}
#' for an alternative test of significance.
#'
#' Notes on the log2 fold change prior:
#'
#' The variance of the prior distribution for each
#' non-intercept coefficient is calculated using the observed
#' distribution of the maximum likelihood coefficients.
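#' (Schematically, for each gene the fit with prior maximizes the Negative
#' Binomial log likelihood plus the log prior,
#' \eqn{\ell(\beta) - \sum_j \beta_j^2/(2\sigma_j^2)}{l(beta) - sum_j beta_j^2/(2 sigma_j^2)},
#' where \eqn{\sigma_j^2}{sigma_j^2} is the prior variance for non-intercept
#' coefficient j; the intercept receives a very wide prior and is effectively
#' unpenalized.)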
#' The final coefficients are then maximum a posteriori estimates
#' using this prior (Tikhonov/ridge regularization).
#' See below for details on the prior variance,
#' and see the Methods section of the DESeq2 manuscript for more detail.
#' The use of a prior has little effect on genes with high counts and helps to
#' moderate the large spread in coefficients for genes with low counts.
#'
#' The prior variance is calculated by matching the 0.05 upper quantile
#' of the observed MLE coefficients to a zero-centered Normal distribution.
#' In a change of methods since the 2014 paper,
#' the weighted upper quantile is calculated using the
#' \code{wtd.quantile} function from the Hmisc package. The weights are
#' the inverse of the expected variance of log counts, so the inverse of
#' \eqn{1/\bar{\mu} + \alpha_{tr}}{1/mu-bar + alpha_tr} using the mean of
#' normalized counts and the trended dispersion fit. The weighting ensures
#' that noisy estimates of log fold changes from small count genes do not
#' overly influence the calculation of the prior variance.
#' See \code{\link{estimateBetaPriorVar}}.
#' The final prior variance for a factor level is the average of the
#' estimated prior variance over all contrasts of all levels of the factor.
#'
#' When a log2 fold change prior is used (betaPrior=TRUE),
#' then \code{nbinomWaldTest} will by default use expanded model matrices,
#' as described in the \code{modelMatrixType} argument, unless this argument
#' is used to override the default behavior.
#' This ensures that log2 fold changes will be independent of the choice
#' of reference level. In this case, the beta prior variance for each factor
#' is calculated as the average of the mean squared maximum likelihood
#' estimates for each level and every possible contrast.
#'
#' @param object a DESeqDataSet
#' @param betaPrior whether or not to put a zero-mean normal prior on
#' the non-intercept coefficients
#' @param betaPriorVar a vector with length equal to the number of
#' model terms including the intercept.
#' betaPriorVar gives the variance of the prior on the sample betas
#' on the log2 scale. if missing (default) this is estimated from the data
#' @param modelMatrix an optional matrix, typically this is set to NULL
#' and created within the function. can only be supplied if betaPrior=FALSE
#' @param modelMatrixType either "standard" or "expanded", which describes
#' how the model matrix, X of the formula in \code{\link{DESeq}}, is
#' formed. "standard" is as created by \code{model.matrix} using the
#' design formula. "expanded" includes an indicator variable for each
#' level of factors in addition to an intercept.
#' betaPrior must be set to TRUE in order for expanded model matrices
#' to be fit.
#' @param betaTol control parameter defining convergence
#' @param maxit the maximum number of iterations to allow for convergence of the
#' coefficient vector
#' @param useOptim whether to use the native optim function on rows which do not
#' converge within maxit
#' @param quiet whether to print messages at each step
#' @param useT whether to use a t-distribution as a null distribution,
#' for significance testing of the Wald statistics.
#' If FALSE, a standard normal null distribution is used.
#' @param df the degrees of freedom for the t-distribution
#' @param useQR whether to use the QR decomposition on the design
#' matrix X while fitting the GLM
#'
#' @return a DESeqDataSet with results columns accessible
#' with the \code{\link{results}} function.
The coefficients and standard errors are #' reported on a log2 scale. #' #' @seealso \code{\link{DESeq}}, \code{\link{nbinomLRT}} #' #' @examples #' #' dds <- makeExampleDESeqDataSet() #' dds <- estimateSizeFactors(dds) #' dds <- estimateDispersions(dds) #' dds <- nbinomWaldTest(dds) #' res <- results(dds) #' #' @export nbinomWaldTest <- function(object, betaPrior=FALSE, betaPriorVar, modelMatrix=NULL, modelMatrixType, betaTol=1e-8, maxit=100, useOptim=TRUE, quiet=FALSE, useT=FALSE, df, useQR=TRUE) { if (is.null(dispersions(object))) { stop("testing requires dispersion estimates, first call estimateDispersions()") } stopifnot(length(maxit)==1) # in case the class of the mcols(mcols(object)) are not character object <- sanitizeRowRanges(object) if ("results" %in% mcols(mcols(object))$type) { if (!quiet) message("found results columns, replacing these") object <- removeResults(object) } if (is.null(mcols(object)$allZero)) { object <- getBaseMeansAndVariances(object) } # only continue on the rows with non-zero row mean objectNZ <- object[!mcols(object)$allZero,,drop=FALSE] # model matrix not provided... if (is.null(modelMatrix)) { modelAsFormula <- TRUE termsOrder <- attr(terms.formula(design(object)),"order") interactionPresent <- any(termsOrder > 1) if (missing(betaPrior)) { betaPrior <- FALSE } # run some tests common to DESeq, nbinomWaldTest, nbinomLRT designAndArgChecker(object, betaPrior) # what kind of model matrix to use stopifnot(is.logical(betaPrior)) blindDesign <- design(object) == formula(~ 1) if (blindDesign) { betaPrior <- FALSE } if (missing(modelMatrixType) || is.null(modelMatrixType)) { modelMatrixType <- if (betaPrior) { "expanded" } else { "standard" } } if (modelMatrixType == "expanded" & !betaPrior) { stop("expanded model matrices require a beta prior") } # store modelMatrixType so it can be accessed by estimateBetaPriorVar attr(object, "modelMatrixType") <- modelMatrixType hasIntercept <- attr(terms(design(object)),"intercept") == 1 renameCols <- hasIntercept } else { # model matrix was provided... 
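    # a user-supplied matrix is used as-is: there is no expanded model
    # matrix, coefficient columns are not renamed, and a beta prior is
    # not allowed (betaPrior must be FALSE), as checked below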
if (missing(betaPrior)) { betaPrior <- FALSE } else { if (betaPrior) stop("the model matrix can only be user-supplied if betaPrior=FALSE") } message("using supplied model matrix") modelAsFormula <- FALSE attr(object, "modelMatrixType") <- "user-supplied" renameCols <- FALSE } if (!betaPrior) { # fit the negative binomial GLM without a prior # (in actuality a very wide prior with standard deviation 1e3 on log2 fold changes) fit <- fitNbinomGLMs(objectNZ, betaTol=betaTol, maxit=maxit, useOptim=useOptim, useQR=useQR, renameCols=renameCols, modelMatrix=modelMatrix) H <- fit$hat_diagonals mu <- fit$mu modelMatrix <- fit$modelMatrix modelMatrixNames <- fit$modelMatrixNames # record the wide prior variance which was used in fitting betaPriorVar <- rep(1e6, ncol(fit$modelMatrix)) } else { priorFitList <- fitGLMsWithPrior(object=object, betaTol=betaTol, maxit=maxit, useOptim=useOptim, useQR=useQR, betaPriorVar=betaPriorVar) fit <- priorFitList$fit H <- priorFitList$H mu <- priorFitList$mu betaPriorVar <- priorFitList$betaPriorVar modelMatrix <- priorFitList$modelMatrix mleBetaMatrix <- priorFitList$mleBetaMatrix # will add the MLE betas, so remove any which exist already # (possibly coming from estimateMLEForBetaPriorVar) mcols(object) <- mcols(object)[,grep("MLE_",names(mcols(object)),invert=TRUE)] } # store mu in case the user did not call estimateDispersionsGeneEst dimnames(mu) <- NULL assays(objectNZ)[["mu"]] <- mu assays(object)[["mu"]] <- buildMatrixWithNARows(mu, mcols(object)$allZero) # store the prior variance directly as an attribute # of the DESeqDataSet object, so it can be pulled later by # the results function (necessary for setting max Cook's distance) attr(object,"betaPrior") <- betaPrior attr(object,"betaPriorVar") <- betaPriorVar attr(object,"modelMatrix") <- modelMatrix attr(object,"test") <- "Wald" # calculate Cook's distance dispModelMatrix <- if (modelAsFormula) { getModelMatrix(object) } else { modelMatrix } attr(object,"dispModelMatrix") <- dispModelMatrix cooks <- calculateCooksDistance(objectNZ, H, dispModelMatrix) # record maximum Cook's maxCooks <- recordMaxCooks(design(object), colData(object), dispModelMatrix, cooks, nrow(objectNZ)) # store Cook's distance for each sample assays(object)[["cooks"]] <- buildMatrixWithNARows(cooks, mcols(object)$allZero) # add betas, standard errors and Wald p-values to the object modelMatrixNames <- colnames(modelMatrix) betaMatrix <- fit$betaMatrix colnames(betaMatrix) <- modelMatrixNames betaSE <- fit$betaSE colnames(betaSE) <- paste0("SE_",modelMatrixNames) WaldStatistic <- betaMatrix/betaSE colnames(WaldStatistic) <- paste0("WaldStatistic_",modelMatrixNames) # if useT is set to TRUE, use a t-distribution if (useT) { dispPriorVar <- attr( dispersionFunction(object), "dispPriorVar" ) stopifnot(length(df)==1 | length(df)==nrow(object)) if (length(df) == nrow(object)) { df <- df[!mcols(object)$allZero] stopifnot(length(df)==nrow(WaldStatistic)) } WaldPvalue <- 2*pt(abs(WaldStatistic),df=df,lower.tail=FALSE) } else { WaldPvalue <- 2*pnorm(abs(WaldStatistic),lower.tail=FALSE) } colnames(WaldPvalue) <- paste0("WaldPvalue_",modelMatrixNames) betaConv <- fit$betaConv if (any(!betaConv)) { if (!quiet) message(paste(sum(!betaConv),"rows did not converge in beta, labelled in mcols(object)$betaConv. 
Use larger maxit argument with nbinomWaldTest"))
  }
  mleBetas <- if (betaPrior) {
    matrixToList(mleBetaMatrix)
  } else {
    NULL
  }
  resultsList <- c(matrixToList(betaMatrix),
                   matrixToList(betaSE),
                   mleBetas,
                   matrixToList(WaldStatistic),
                   matrixToList(WaldPvalue),
                   list(betaConv = betaConv,
                        betaIter = fit$betaIter,
                        deviance = -2 * fit$logLike,
                        maxCooks = maxCooks))
  WaldResults <- buildDataFrameWithNARows(resultsList, mcols(object)$allZero)
  modelMatrixNamesSpaces <- gsub("_"," ",modelMatrixNames)
  lfcType <- if (attr(object,"betaPrior")) "MAP" else "MLE"
  coefInfo <- paste(paste0("log2 fold change (",lfcType,"):"),modelMatrixNamesSpaces)
  seInfo <- paste("standard error:",modelMatrixNamesSpaces)
  mleInfo <- if (betaPrior) {
    gsub("_"," ",colnames(mleBetaMatrix))
  } else {
    NULL
  }
  statInfo <- paste("Wald statistic:",modelMatrixNamesSpaces)
  pvalInfo <- paste("Wald test p-value:",modelMatrixNamesSpaces)
  mcols(WaldResults) <- DataFrame(type = rep("results",ncol(WaldResults)),
                                  description = c(coefInfo, seInfo, mleInfo,
                                                  statInfo, pvalInfo,
                                                  "convergence of betas",
                                                  "iterations for betas",
                                                  "deviance for the fitted model",
                                                  "maximum Cook's distance for row"))
  mcols(object) <- cbind(mcols(object),WaldResults)
  return(object)
}

#' Steps for estimating the beta prior variance
#'
#' These lower-level functions are called within \code{\link{DESeq}} or \code{\link{nbinomWaldTest}}.
#' End users should use those higher-level functions instead.
#' NOTE: \code{estimateBetaPriorVar} returns a numeric vector, not a DESeqDataSet!
#' For advanced users: to use these functions, first run \code{estimateMLEForBetaPriorVar}
#' and then run \code{estimateBetaPriorVar}.
#'
#' @param object a DESeqDataSet
#'
#' @param maxit as defined in \code{\link{nbinomWaldTest}}
#' @param useOptim as defined in \code{\link{nbinomWaldTest}}
#' @param useQR as defined in \code{\link{nbinomWaldTest}}
#'
#' @param modelMatrixType an optional override for the type which is set internally
#'
#' @param betaPriorMethod the method for calculating the beta prior variance,
#' either "quantile" or "weighted":
#' "quantile" matches a normal distribution using the upper quantile of the finite MLE betas.
#' "weighted" matches a normal distribution using the upper quantile, but weighting by the variance of the MLE betas.
#' @param upperQuantile the upper quantile to be used for the
#' "quantile" or "weighted" method of beta prior variance estimation
#'
#' @return for \code{estimateMLEForBetaPriorVar}, a DESeqDataSet, with the
#' necessary information stored in order to calculate the prior variance.
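#' (Concretely, the "MLE_"-prefixed coefficient columns in
#' \code{mcols(object)} and the hat matrix diagonals stored in
#' \code{assays(object)[["H"]]}; see the function body below.)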
#' for \code{estimateBetaPriorVar}, the vector of variances for the prior #' on the betas in the \code{\link{DESeq}} GLM #' #' @aliases estimateBetaPriorVar estimateMLEForBetaPriorVar #' #' @export estimateBetaPriorVar <- function(object, betaPriorMethod=c("weighted","quantile"), upperQuantile=0.05) { objectNZ <- object[!mcols(object)$allZero,,drop=FALSE] betaMatrix <- as.matrix(mcols(objectNZ)[,grep("MLE_", names(mcols(object))),drop=FALSE]) colnamesBM <- colnames(betaMatrix) colnamesBM <- gsub("MLE_(.*)","\\1",colnamesBM) # renaming in reverse: # make these standard colnames as from model.matrix convertNames <- renameModelMatrixColumns(colData(object),design(object)) colnamesBM <- sapply(colnamesBM, function(x) { if (x %in% convertNames$to) { convertNames$from[convertNames$to == x] } else { x } }) colnames(betaMatrix) <- colnamesBM # this is the model matrix from an MLE run modelMatrix <- getModelMatrix(object) modelMatrixType <- attr(object, "modelMatrixType") betaPriorMethod <- match.arg(betaPriorMethod, choices=c("weighted","quantile")) # estimate the variance of the prior on betas # if expanded, first calculate LFC for all possible contrasts if (modelMatrixType == "expanded") { betaMatrix <- addAllContrasts(objectNZ, betaMatrix) } # weighting by 1/Var(log(K)) # Var(log(K)) ~ Var(K)/mu^2 = 1/mu + alpha # and using the fitted alpha dispFit <- mcols(objectNZ)$dispFit if (is.null(dispFit)) { # betaPrior routine could have been called w/o the dispersion fitted trend dispFit <- mean(dispersions(objectNZ)) } varlogk <- 1/mcols(objectNZ)$baseMean + dispFit weights <- 1/varlogk betaPriorVar <- if (nrow(betaMatrix) > 1) { apply(betaMatrix, 2, function(x) { # this test removes genes which have betas # tending to +/- infinity useFinite <- abs(x) < 10 # if no more betas pass test, return wide prior if (sum(useFinite) == 0 ) { return(1e6) } else { if (betaPriorMethod=="quantile") { return(matchUpperQuantileForVariance(x[useFinite],upperQuantile)) } else if (betaPriorMethod=="weighted") { return(matchWeightedUpperQuantileForVariance(x[useFinite],weights[useFinite],upperQuantile)) } } }) } else { (betaMatrix)^2 } names(betaPriorVar) <- colnames(betaMatrix) # intercept set to wide prior if ("Intercept" %in% names(betaPriorVar)) { betaPriorVar[which(names(betaPriorVar) == "Intercept")] <- 1e6 } if (modelMatrixType == "expanded") { # bring over beta priors from the GLM fit without prior. 
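    # (this averaging keeps the shrinkage symmetric over the levels of a
    # factor, so that shrunken LFCs do not depend on the choice of
    # reference level; see the modelMatrixType documentation in nbinomWaldTest)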
# for factors: prior variance of each level are the average of the # prior variances for the levels present in the previous GLM fit betaPriorExpanded <- averagePriorsOverLevels(objectNZ, betaPriorVar) betaPriorVar <- betaPriorExpanded } betaPriorVar } #' @rdname estimateBetaPriorVar #' @export estimateMLEForBetaPriorVar <- function(object, maxit=100, useOptim=TRUE, useQR=TRUE, modelMatrixType=NULL) { # this function copies code from other functions, # in order to allow parallelization objectNZ <- object[!mcols(object)$allZero,,drop=FALSE] if (is.null(modelMatrixType)) { # this code copied from nbinomWaldTest() blindDesign <- design(object) == formula(~ 1) mmTypeTest <- !blindDesign modelMatrixType <- if (mmTypeTest) { "expanded" } else { "standard" } } attr(object, "modelMatrixType") <- modelMatrixType # this code copied from fitGLMsWithPrior() fit <- fitNbinomGLMs(objectNZ, maxit=maxit, useOptim=useOptim, useQR=useQR, renameCols = (modelMatrixType == "standard")) modelMatrix <- fit$modelMatrix modelMatrixNames <- colnames(modelMatrix) H <- fit$hat_diagonal betaMatrix <- fit$betaMatrix modelMatrixNames[modelMatrixNames == "(Intercept)"] <- "Intercept" modelMatrixNames <- make.names(modelMatrixNames) colnames(betaMatrix) <- modelMatrixNames convertNames <- renameModelMatrixColumns(colData(object), design(objectNZ)) convertNames <- convertNames[convertNames$from %in% modelMatrixNames,,drop=FALSE] modelMatrixNames[match(convertNames$from, modelMatrixNames)] <- convertNames$to mleBetaMatrix <- fit$betaMatrix colnames(mleBetaMatrix) <- paste0("MLE_",modelMatrixNames) # remove any MLE columns if they exist mcols(object) <- mcols(object)[,grep("MLE_",names(mcols(object)),invert=TRUE)] mcols(object) <- cbind(mcols(object), buildDataFrameWithNARows(DataFrame(mleBetaMatrix), mcols(object)$allZero)) assays(object)[["H"]] <- buildMatrixWithNARows(H, mcols(object)$allZero) object } #' Likelihood ratio test (chi-squared test) for GLMs #' #' This function tests for significance of change in deviance between a #' full and reduced model which are provided as \code{formula}. #' Fitting uses previously calculated \code{\link{sizeFactors}} (or \code{\link{normalizationFactors}}) #' and dispersion estimates. #' #' The difference in deviance is compared to a chi-squared distribution #' with df = (reduced residual degrees of freedom - full residual degrees of freedom). #' This function is comparable to the \code{nbinomGLMTest} of the previous version of DESeq #' and an alternative to the default \code{\link{nbinomWaldTest}}. #' #' @param object a DESeqDataSet #' @param full the full model formula, this should be the formula in #' \code{design(object)}. #' alternatively, can be a matrix #' @param reduced a reduced formula to compare against, e.g. #' the full model with a term or terms of interest removed. #' alternatively, can be a matrix #' @param betaTol control parameter defining convergence #' @param maxit the maximum number of iterations to allow for convergence of the #' coefficient vector #' @param useOptim whether to use the native optim function on rows which do not #' converge within maxit #' @param quiet whether to print messages at each step #' @param useQR whether to use the QR decomposition on the design #' matrix X while fitting the GLM #' #' @return a DESeqDataSet with new results columns accessible #' with the \code{\link{results}} function. The coefficients and standard errors are #' reported on a log2 scale. 
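#'
#' As a hypothetical usage sketch (assuming colData columns \code{batch} and
#' \code{condition}), a test for a condition effect controlling for batch
#' could be specified as
#' \code{nbinomLRT(dds, full = ~ batch + condition, reduced = ~ batch)},
#' where the degrees of freedom is the difference in the number of columns
#' of the two model matrices.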
#' #' @seealso \code{\link{DESeq}}, \code{\link{nbinomWaldTest}} #' #' @examples #' #' dds <- makeExampleDESeqDataSet() #' dds <- estimateSizeFactors(dds) #' dds <- estimateDispersions(dds) #' dds <- nbinomLRT(dds, reduced = ~ 1) #' res <- results(dds) #' #' @export nbinomLRT <- function(object, full=design(object), reduced, betaTol=1e-8, maxit=100, useOptim=TRUE, quiet=FALSE, useQR=TRUE) { if (is.null(dispersions(object))) { stop("testing requires dispersion estimates, first call estimateDispersions()") } if (missing(reduced)) { stop("provide a reduced formula for the LRT, e.g. nbinomLRT(object, reduced= ~1)") } # in case the class of the mcols(mcols(object)) are not character object <- sanitizeRowRanges(object) # run check on the formula modelAsFormula <- !(is.matrix(full) & is.matrix(reduced)) if (modelAsFormula) { checkLRT(full, reduced) # run some tests common to DESeq, nbinomWaldTest, nbinomLRT designAndArgChecker(object, betaPrior=FALSE) # try to form model matrices, test for difference # in residual degrees of freedom fullModelMatrix <- stats::model.matrix.default(full, data=as.data.frame(colData(object))) reducedModelMatrix <- stats::model.matrix.default(reduced, data=as.data.frame(colData(object))) df <- ncol(fullModelMatrix) - ncol(reducedModelMatrix) } else { message("using supplied model matrix") df <- ncol(full) - ncol(reduced) } if (df < 1) stop("less than one degree of freedom, perhaps full and reduced models are not in the correct order") if (any(mcols(mcols(object))$type == "results")) { if (!quiet) message("found results columns, replacing these") object <- removeResults(object) } if (is.null(mcols(object)$allZero)) { object <- getBaseMeansAndVariances(object) } if (modelAsFormula) { modelMatrixType <- "standard" # check for intercept hasIntercept <- attr(terms(design(object)),"intercept") == 1 renameCols <- hasIntercept } else { modelMatrixType <- "user-supplied" renameCols <- FALSE } # store modelMatrixType attr(object,"modelMatrixType") <- modelMatrixType # only continue on the rows with non-zero row mean objectNZ <- object[!mcols(object)$allZero,,drop=FALSE] if (modelAsFormula) { fullModel <- fitNbinomGLMs(objectNZ, modelFormula=full, renameCols=renameCols, betaTol=betaTol, maxit=maxit, useOptim=useOptim, useQR=useQR, warnNonposVar=FALSE) modelMatrix <- fullModel$modelMatrix reducedModel <- fitNbinomGLMs(objectNZ, modelFormula=reduced, betaTol=betaTol, maxit=maxit, useOptim=useOptim, useQR=useQR, warnNonposVar=FALSE) } else { fullModel <- fitNbinomGLMs(objectNZ, modelMatrix=full, renameCols=FALSE, betaTol=betaTol, maxit=maxit, useOptim=useOptim, useQR=useQR, warnNonposVar=FALSE) modelMatrix <- full reducedModel <- fitNbinomGLMs(objectNZ, modelMatrix=reduced, renameCols=FALSE, betaTol=betaTol, maxit=maxit, useOptim=useOptim, useQR=useQR, warnNonposVar=FALSE) } betaPriorVar <- rep(1e6, ncol(modelMatrix)) attr(object,"betaPrior") <- FALSE attr(object,"betaPriorVar") <- betaPriorVar attr(object,"modelMatrix") <- modelMatrix attr(object,"reducedModelMatrix") <- reducedModel$modelMatrix attr(object,"test") <- "LRT" # store mu in case the user did not call estimateDispersionsGeneEst dimnames(fullModel$mu) <- NULL assays(objectNZ)[["mu"]] <- fullModel$mu assays(object)[["mu"]] <- buildMatrixWithNARows(fullModel$mu, mcols(object)$allZero) H <- fullModel$hat_diagonals # calculate Cook's distance dispModelMatrix <- modelMatrix attr(object,"dispModelMatrix") <- dispModelMatrix cooks <- calculateCooksDistance(objectNZ, H, dispModelMatrix) # record maximum of Cook's maxCooks <- 
recordMaxCooks(design(object), colData(object), dispModelMatrix, cooks, nrow(objectNZ))
  # store Cook's distance for each sample
  assays(object)[["cooks"]] <- buildMatrixWithNARows(cooks, mcols(object)$allZero)
  if (any(!fullModel$betaConv)) {
    if (!quiet) message(paste(sum(!fullModel$betaConv),"rows did not converge in beta, labelled in mcols(object)$fullBetaConv. Use larger maxit argument with nbinomLRT"))
  }
  # calculate LRT statistic and p-values
  LRTStatistic <- (2 * (fullModel$logLike - reducedModel$logLike))
  LRTPvalue <- pchisq(LRTStatistic, df=df, lower.tail=FALSE)
  # no need to store additional betas (no beta prior)
  mleBetas <- NULL
  # continue storing LRT results
  resultsList <- c(matrixToList(fullModel$betaMatrix),
                   matrixToList(fullModel$betaSE),
                   mleBetas,
                   list(LRTStatistic = LRTStatistic,
                        LRTPvalue = LRTPvalue,
                        fullBetaConv = fullModel$betaConv,
                        reducedBetaConv = reducedModel$betaConv,
                        betaIter = fullModel$betaIter,
                        deviance = -2 * fullModel$logLike,
                        maxCooks = maxCooks))
  LRTResults <- buildDataFrameWithNARows(resultsList, mcols(object)$allZero)
  modelComparison <- if (modelAsFormula) {
    paste0("'",paste(as.character(full),collapse=" "),
           "' vs '", paste(as.character(reduced),collapse=" "),"'")
  } else {
    "full vs reduced"
  }
  modelMatrixNames <- colnames(fullModel$betaMatrix)
  modelMatrixNamesSpaces <- gsub("_"," ",modelMatrixNames)
  lfcType <- "MLE"
  coefInfo <- paste(paste0("log2 fold change (",lfcType,"):"),modelMatrixNamesSpaces)
  seInfo <- paste("standard error:",modelMatrixNamesSpaces)
  mleInfo <- NULL
  statInfo <- paste("LRT statistic:",modelComparison)
  pvalInfo <- paste("LRT p-value:",modelComparison)
  mcols(LRTResults) <- DataFrame(type = rep("results",ncol(LRTResults)),
                                 description = c(coefInfo, seInfo, mleInfo,
                                                 statInfo, pvalInfo,
                                                 "convergence of betas for full model",
                                                 "convergence of betas for reduced model",
                                                 "iterations for betas for full model",
                                                 "deviance of the full model",
                                                 "maximum Cook's distance for row"))
  mcols(object) <- cbind(mcols(object),LRTResults)
  return(object)
}

#' Replace outliers with trimmed mean
#'
#' Note that this function is called within \code{\link{DESeq}}, so it is not
#' necessary to call on top of a \code{DESeq} call. See the \code{minReplicatesForReplace}
#' argument documented in \code{\link{DESeq}}.
#'
#' This function replaces outlier counts flagged by extreme Cook's distances,
#' as calculated by \code{\link{DESeq}}, \code{\link{nbinomWaldTest}}
#' or \code{\link{nbinomLRT}}, with values predicted by the trimmed mean
#' over all samples (and adjusted by size factor or normalization factor).
#' This function replaces the counts in the matrix returned by \code{counts(dds)}
#' and the Cook's distances in \code{assays(dds)[["cooks"]]}. Original counts are
#' preserved in \code{assays(dds)[["originalCounts"]]}.
#'
#' The \code{\link{DESeq}} function calculates a diagnostic measure called
#' Cook's distance for every gene and every sample. The \code{\link{results}}
#' function then sets the p-values to \code{NA} for genes which contain
#' an outlying count as defined by a Cook's distance above a threshold.
#' With many degrees of freedom, i.e., many more samples than the number of
#' parameters to be estimated, it might be undesirable to remove entire genes
#' from the analysis just because their data include a single count outlier.
#' An alternate strategy is to replace the outlier counts
#' with the trimmed mean over all samples, adjusted by the size factor or normalization
#' factor for that sample.
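#' (For example, with the default \code{trim=0.2}, the lowest and highest 20
#' percent of the normalized counts for a gene are removed before taking the
#' mean; the replacement count for a flagged observation is this trimmed mean
#' scaled by that sample's size factor or normalization factor, coerced to an
#' integer, as can be seen in the function body below.)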
#' The following simple function performs this replacement
#' for the user, for samples which have at least \code{minReplicates} number
#' of replicates (including that sample).
#' For more information on Cook's distance, please see the two
#' sections of the vignette: 'Dealing with count outliers' and 'Count outlier detection'.
#'
#' @param object a DESeqDataSet object, which has already been processed by
#' either DESeq, nbinomWaldTest or nbinomLRT, and therefore contains the matrix
#' of Cook's distances in \code{assays(dds)[["cooks"]]}. These are the Cook's
#' distances which will be used to define outlier counts.
#' @param trim the fraction (0 to 0.5) of observations to be trimmed from
#' each end of the normalized counts for a gene before the mean is computed
#' @param cooksCutoff the threshold for defining an outlier to be replaced.
#' Defaults to the .99 quantile of the F(p, m - p) distribution, where p is
#' the number of parameters and m is the number of samples.
#' @param minReplicates the minimum number of replicate samples necessary to consider
#' a sample eligible for replacement (including itself). Outlier counts will not be replaced
#' if the sample is in a cell which has fewer than minReplicates replicates.
#' @param whichSamples optional, a numeric or logical index to specify
#' which samples should have outliers replaced. if missing, this is determined using
#' minReplicates.
#'
#' @seealso \code{\link{DESeq}}
#'
#' @aliases replaceOutliersWithTrimmedMean
#'
#' @return a DESeqDataSet with replaced counts in the slot returned by
#' \code{\link{counts}} and the original counts preserved in
#' \code{assays(dds)[["originalCounts"]]}
#'
#' @export
replaceOutliers <- function(object, trim=.2, cooksCutoff, minReplicates=7, whichSamples) {
  if (is.null(attr(object,"modelMatrix")) | !("cooks" %in% assayNames(object))) {
    stop("first run DESeq, nbinomWaldTest, or nbinomLRT to identify outliers")
  }
  if (minReplicates < 3) {
    stop("at least 3 replicates are necessary in order to identify a sample as a count outlier")
  }
  stopifnot(is.numeric(minReplicates) & length(minReplicates) == 1)
  p <- ncol(attr(object,"modelMatrix"))
  m <- ncol(object)
  if (m <= p) {
    assays(object)[["originalCounts"]] <- counts(object)
    return(object)
  }
  if (missing(cooksCutoff)) {
    cooksCutoff <- qf(.99, p, m - p)
  }
  idx <- which(assays(object)[["cooks"]] > cooksCutoff)
  mcols(object)$replace <- apply(assays(object)[["cooks"]], 1, function(row) any(row > cooksCutoff))
  mcols(mcols(object),use.names=TRUE)["replace",] <- DataFrame(type="intermediate",
                                                               description="had counts replaced")
  trimBaseMean <- apply(counts(object,normalized=TRUE),1,mean,trim=trim)
  # build a matrix of counts based on the trimmed mean and the size factors
  replacementCounts <- if (!is.null(normalizationFactors(object))) {
    as.integer(matrix(rep(trimBaseMean,ncol(object)),ncol=ncol(object)) * normalizationFactors(object))
  } else {
    as.integer(outer(trimBaseMean, sizeFactors(object), "*"))
  }
  # replace only those values which fall above the cutoff on Cook's distance
  newCounts <- counts(object)
  newCounts[idx] <- replacementCounts[idx]
  if (missing(whichSamples)) {
    whichSamples <- nOrMoreInCell(attr(object,"modelMatrix"), n = minReplicates)
  }
  stopifnot(is.logical(whichSamples))
  object$replaceable <- whichSamples
  mcols(colData(object),use.names=TRUE)["replaceable",] <- DataFrame(type="intermediate",
                                                                     description="outliers can be replaced")
  assays(object)[["originalCounts"]] <- counts(object)
  if (sum(whichSamples) == 0) {
    return(object)
  }
  counts(object)[,whichSamples] <- newCounts[,whichSamples,drop=FALSE]
  object
}

#' @export
#' @rdname replaceOutliers
replaceOutliersWithTrimmedMean <- replaceOutliers

###########################################################
# unexported functions
###########################################################

# Get base means and variances
#
# An internally used function to calculate the row means and variances
# from the normalized counts, which requires that \code{\link{estimateSizeFactors}}
# has already been called.  Adds these and a logical column if the row sums
# are zero to the mcols of the object.
#
# object a DESeqDataSet object
#
# return a DESeqDataSet object with columns baseMean
# and baseVar in the row metadata columns
getBaseMeansAndVariances <- function(object) {
  meanVarZero <- DataFrame(baseMean = unname(rowMeans(counts(object,normalized=TRUE))),
                           baseVar = unname(rowVars(counts(object,normalized=TRUE))),
                           allZero = unname(rowSums(counts(object)) == 0))
  mcols(meanVarZero) <- DataFrame(type = rep("intermediate",ncol(meanVarZero)),
                                  description = c("mean of normalized counts for all samples",
                                                  "variance of normalized counts for all samples",
                                                  "all counts for a gene are zero"))
  if (all(c("baseMean","baseVar","allZero") %in% names(mcols(object)))) {
    mcols(object)[c("baseMean","baseVar","allZero")] <- meanVarZero
  } else {
    mcols(object) <- cbind(mcols(object),meanVarZero)
  }
  return(object)
}

estimateNormFactors <- function(counts, normMatrix, locfunc=median, geoMeans, controlGenes) {
  sf <- estimateSizeFactorsForMatrix(counts / normMatrix, locfunc=locfunc,
                                     geoMeans=geoMeans, controlGenes=controlGenes)
  nf <- t( t(normMatrix) * sf )
  nf / exp(rowMeans(log(nf)))
}

# Estimate a parametric fit of dispersion to the mean intensity
parametricDispersionFit <- function( means, disps ) {
  coefs <- c( .1, 1 )
  iter <- 0
  while(TRUE) {
    residuals <- disps / ( coefs[1] + coefs[2] / means )
    good <- which( (residuals > 1e-4) & (residuals < 15) )
    # check for glm convergence below to exit while-loop
    suppressWarnings({fit <- glm( disps[good] ~ I(1/means[good]),
                                  family=Gamma(link="identity"), start=coefs )})
    oldcoefs <- coefs
    coefs <- coefficients(fit)
    if ( !all( coefs > 0 ) ) stop(simpleError("parametric dispersion fit failed"))
    if ( ( sum( log( coefs / oldcoefs )^2 ) < 1e-6 ) & fit$converged ) break
    iter <- iter + 1
    if ( iter > 10 ) stop(simpleError("dispersion fit did not converge"))
  }
  names( coefs ) <- c( "asymptDisp", "extraPois" )
  ans <- function(q) coefs[1] + coefs[2] / q
  attr( ans, "coefficients" ) <- coefs
  ans
}

# Local fit of dispersion to the mean intensity
# fitting is done on log dispersion, log mean scale
localDispersionFit <- function( means, disps, minDisp ) {
  if (all(disps < minDisp*10)) {
    return(rep(minDisp,length(disps)))
  }
  d <- data.frame(logDisps = log(disps), logMeans = log(means))
  fit <- locfit(logDisps ~ logMeans, data=d[disps >= minDisp*10,,drop=FALSE],
                weights = means[disps >= minDisp*10])
  dispFunction <- function(means) exp(predict(fit, data.frame(logMeans=log(means))))
  return(dispFunction)
}

# convenience function for testing the log likelihood
# for a count matrix, mu matrix and vector disp
nbinomLogLike <- function(counts, mu, disp, weights, useWeights) {
  if (is.null(disp)) return(NULL)
  if (useWeights) {
    rowSums(weights * matrix(dnbinom(counts,mu=mu,size=1/disp, log=TRUE),ncol=ncol(counts)))
  } else {
    rowSums(matrix(dnbinom(counts,mu=mu,size=1/disp, log=TRUE),ncol=ncol(counts)))
  }
}

# simple function to return a matrix of size factors
# or normalization factors
getSizeOrNormFactors <- function(object) {
  if
(!is.null(normalizationFactors(object))) { return(normalizationFactors(object)) } else { return(matrix(rep(sizeFactors(object),each=nrow(object)), ncol=ncol(object))) } } # convenience function for building results tables # out of a list and filling in NA rows buildDataFrameWithNARows <- function(resultsList, NArows) { lengths <- sapply(resultsList,length) if (!all(lengths == lengths[1])) { stop("lengths of vectors in resultsList must be equal") } if (sum(!NArows) != lengths[1]) { stop("number of non-NA rows must be equal to lengths of vectors in resultsList") } if (sum(NArows) == 0) { return(DataFrame(resultsList)) } dfFull <- DataFrame(lapply(resultsList, function(x) vector(mode(x), length(NArows)))) dfFull[NArows,] <- NA dfFull[!NArows,] <- DataFrame(resultsList) dfFull } # convenience function for building larger matrices # by filling in NA rows buildMatrixWithNARows <- function(m, NARows) { mFull <- matrix(NA, ncol=ncol(m), nrow=length(NARows)) mFull[!NARows,] <- m mFull } # convenience function for building larger matrices # by filling in 0 rows buildMatrixWithZeroRows <- function(m, zeroRows) { mFull <- matrix(0, ncol=ncol(m), nrow=length(zeroRows)) mFull[!zeroRows,] <- m mFull } # convenience function for breaking up matrices # by column and preserving column names matrixToList <- function(m) { l <- split(m, col(m)) names(l) <- colnames(m) l } # calculate a robust method of moments dispersion, # in order to estimate the dispersion excluding # individual outlier counts which would raise the variance estimate robustMethodOfMomentsDisp <- function(object, modelMatrix) { cnts <- counts(object,normalized=TRUE) # if there are 3 or more replicates in any cell threeOrMore <- nOrMoreInCell(modelMatrix,n=3) v <- if (any(threeOrMore)) { cells <- apply(modelMatrix,1,paste0,collapse="") cells <- unname(factor(cells,levels=unique(cells))) levels(cells) <- seq_along(levels(cells)) levelsThreeOrMore <- levels(cells)[table(cells) >= 3] idx <- cells %in% levelsThreeOrMore cntsSub <- cnts[,idx,drop=FALSE] cellsSub <- factor(cells[idx]) trimmedCellVariance(cntsSub, cellsSub) } else { trimmedVariance(cnts) } m <- rowMeans(cnts) alpha <- ( v - m ) / m^2 # cannot use the typical minDisp = 1e-8 here or else all counts in the same # group as the outlier count will get an extreme Cook's distance minDisp <- 0.04 alpha <- pmax(alpha, minDisp) alpha } trimmedCellVariance <- function(cnts, cells) { # how much to trim at different n trimratio <- c(1/3, 1/4, 1/8) # returns an index for the vector above for three sample size bins trimfn <- function(n) as.integer(cut(n, breaks=c(0,3.5,23.5,Inf))) cellMeans <- matrix(sapply(levels(cells), function(lvl) { n <- sum(cells==lvl) apply(cnts[,cells==lvl,drop=FALSE],1,mean,trim=trimratio[trimfn(n)]) }), nrow=nrow(cnts)) qmat <- cellMeans[,as.integer(cells),drop=FALSE] sqerror <- (cnts - qmat)^2 varEst <- matrix(sapply(levels(cells), function(lvl) { n <- sum(cells==lvl) # scale due to trimming of large squares, by e.g. 
1/mean(rnorm(1e6)^2,trim=1/8) scale.c <- c(2.04, 1.86, 1.51)[trimfn(n)] scale.c * apply(sqerror[,cells==lvl,drop=FALSE],1,mean,trim=trimratio[trimfn(n)]) }), nrow=nrow(sqerror)) # take the max of variance estimates from cells # as one condition might have highly variable counts rowMax(varEst) } trimmedVariance <- function(x) { rm <- apply(x,1,mean,trim=1/8) sqerror <- (x - rm)^2 # scale due to trimming of large squares 1.51 * apply(sqerror,1,mean,trim=1/8) } calculateCooksDistance <- function(object, H, modelMatrix) { p <- ncol(modelMatrix) dispersions <- robustMethodOfMomentsDisp(object, modelMatrix) V <- assays(object)[["mu"]] + dispersions * assays(object)[["mu"]]^2 PearsonResSq <- (counts(object) - assays(object)[["mu"]])^2 / V cooks <- PearsonResSq / p * H / (1 - H)^2 cooks } # this function breaks out the logic for calculating the max Cook's distance: # the samples over which max Cook's distance is calculated: # # Cook's distance is considered for those samples with 3 or more replicates per cell # # if m == p or there are no samples over which to calculate max Cook's, then give NA recordMaxCooks <- function(design, colData, modelMatrix, cooks, numRow) { samplesForCooks <- nOrMoreInCell(modelMatrix, n=3) p <- ncol(modelMatrix) m <- nrow(modelMatrix) maxCooks <- if ((m > p) & any(samplesForCooks)) { apply(cooks[,samplesForCooks,drop=FALSE], 1, max) } else { rep(NA, numRow) } maxCooks } # for each sample in the model matrix, # are there n or more replicates in the same cell # (including that sample) # so for a 2 x 3 comparison, the returned vector for n = 3 is: # FALSE, FALSE, TRUE, TRUE, TRUE nOrMoreInCell <- function(modelMatrix, n) { numEqual <- sapply(seq_len(nrow(modelMatrix)), function(i) { modelMatrixDiff <- t(t(modelMatrix) - modelMatrix[i,]) sum(apply(modelMatrixDiff, 1, function(row) all(row == 0))) }) numEqual >= n } # an unexported diagnostic function # to retrieve the covariance matrix # for the GLM coefficients of a single row # only for standard model matrices covarianceMatrix <- function(object, rowNumber) { if (attr(object, "modelMatrixType") != "standard") stop("only for standard model matrices") # convert coefficients to log scale coefColumns <- names(mcols(object))[grep("log2 fold change",mcols(mcols(object))$description)] beta <- log(2) * as.numeric(as.data.frame(mcols(object)[rowNumber,coefColumns,drop=FALSE])) x <- getModelMatrix(object) y <- counts(object)[rowNumber,] sf <- sizeFactors(object) alpha <- dispersions(object)[rowNumber] mu.hat <- as.vector(sf * exp(x %*% beta)) minmu <- 0.5 mu.hat[mu.hat < minmu] <- minmu w <- diag(1/(1/mu.hat^2 * ( mu.hat + alpha * mu.hat^2 ))) betaPriorVar <- attr(object,"betaPriorVar") ridge <- diag(1/(log(2)^2 * betaPriorVar)) sigma <- solve(t(x) %*% w %*% x + ridge) %*% (t(x) %*% w %*% x) %*% t(solve(t(x) %*% w %*% x + ridge)) # convert back to log2 scale sigmaLog2Scale <- log2(exp(1))^2 * sigma sigmaLog2Scale } getDesignFactors <- function(object) { design <- design(object) designVars <- all.vars(design) designVarsClass <- sapply(designVars, function(v) class(colData(object)[[v]])) designVars[designVarsClass == "factor"] } # looking at the values of x which are large # in absolute value, find the zero-centered Normal distribution # with the matching quantile, and return the variance # of that Normal distribution matchUpperQuantileForVariance <- function(x, upperQuantile=.05) { sdEst <- quantile(abs(x), 1 - upperQuantile) / qnorm(1 - upperQuantile/2) unname(sdEst)^2 } matchWeightedUpperQuantileForVariance <- function(x, weights, 
upperQuantile=.05) {
  sdEst <- wtd.quantile(abs(x), weights=weights, 1 - upperQuantile, normwt=TRUE) / qnorm(1 - upperQuantile/2)
  unname(sdEst)^2
}

# rough dispersion estimate using counts and fitted values
roughDispEstimate <- function(y, x) {
  # must be positive
  mu <- linearModelMu(y, x)
  mu <- matrix(pmax(1, mu), ncol=ncol(mu))
  m <- nrow(x)
  p <- ncol(x)
  # an alternate rough estimator with higher mean squared or absolute error
  # (rowSums( (y - mu)^2/(mu * (m - p)) ) - 1)/rowMeans(mu)
  # rough disp ests will be adjusted up to minDisp later
  est <- rowSums( ((y - mu)^2 - mu) / mu^2 ) / (m - p)
  pmax(est, 0)
}

momentsDispEstimate <- function(object) {
  xim <- if (!is.null(normalizationFactors(object))) {
    mean(1/colMeans(normalizationFactors(object)))
  } else {
    mean(1/sizeFactors(object))
  }
  bv <- mcols(object)$baseVar
  bm <- mcols(object)$baseMean
  (bv - xim*bm)/bm^2
}

modelMatrixGroups <- function(x) {
  factor(unname(apply(x, 1, paste0, collapse="__")))
}

linearModelMu <- function(y, x) {
  qrx <- qr(x)
  Q <- qr.Q(qrx)
  Rinv <- solve(qr.R(qrx))
  hatmatrix <- x %*% Rinv %*% t(Q)
  t(hatmatrix %*% t(y))
}

linearModelMuNormalized <- function(object, x) {
  cts <- counts(object)
  norm.cts <- counts(object, normalized=TRUE)
  muhat <- linearModelMu(norm.cts, x)
  nf <- getSizeOrNormFactors(object)
  muhat * nf
}

# checks for LRT formulas, written as function to remove duplicate code
# in DESeq and nbinomLRT
checkLRT <- function(full, reduced) {
  reducedNotInFull <- !all.vars(reduced) %in% all.vars(full)
  if (any(reducedNotInFull)) {
    stop(paste("the following variables in the reduced formula are not in the full formula:",
               paste(all.vars(reduced)[reducedNotInFull],collapse=", ")))
  }
}

# bulky code separated from DESeq()
refitWithoutOutliers <- function(object, test, betaPrior, full, reduced,
                                 quiet, minReplicatesForReplace, modelMatrix, modelMatrixType) {
  cooks <- assays(object)[["cooks"]]
  object <- replaceOutliers(object, minReplicates=minReplicatesForReplace)

  # refit without outliers, if there were any replacements
  nrefit <- sum(mcols(object)$replace, na.rm=TRUE)
  if ( nrefit > 0 ) {
    object <- getBaseMeansAndVariances(object)
    newAllZero <- which(mcols(object)$replace & mcols(object)$allZero)
  }
  # only refit if some of the replacements don't result in all zero counts
  # otherwise, these cases are handled by results()
  if ( nrefit > 0 && nrefit > length(newAllZero) ) {
    if (!quiet) message(paste("-- replacing outliers and refitting for", nrefit,"genes -- DESeq argument 'minReplicatesForReplace' =",minReplicatesForReplace," -- original counts are preserved in counts(dds)"))

    # refit on those rows which had replacement
    refitReplace <- which(mcols(object)$replace & !mcols(object)$allZero)
    objectSub <- object[refitReplace,]
    intermediateOrResults <- which(mcols(mcols(objectSub))$type %in% c("intermediate","results"))
    mcols(objectSub) <- mcols(objectSub)[,-intermediateOrResults,drop=FALSE]

    # estimate gene-wise dispersion
    if (!quiet) message("estimating dispersions")
    objectSub <- estimateDispersionsGeneEst(objectSub, quiet=quiet, modelMatrix=modelMatrix)

    # need to redo fitted dispersion due to changes in base mean
    mcols(objectSub)$dispFit <- dispersionFunction(objectSub)(mcols(objectSub)$baseMean)
    mcols(mcols(objectSub),use.names=TRUE)["dispFit",] <- DataFrame(type="intermediate",
                                                                    description="fitted values of dispersion")
    dispPriorVar <- attr( dispersionFunction(object), "dispPriorVar" )

    # estimate dispersion MAP
    objectSub <- estimateDispersionsMAP(objectSub, quiet=quiet,
                                        dispPriorVar=dispPriorVar, modelMatrix=modelMatrix)

    # fit GLM
    if
(!quiet) message("fitting model and testing") if (test == "Wald") { betaPriorVar <- attr(object, "betaPriorVar") objectSub <- nbinomWaldTest(objectSub, betaPrior=betaPrior, betaPriorVar=betaPriorVar, quiet=quiet, modelMatrix=modelMatrix, modelMatrixType=modelMatrixType) } else if (test == "LRT") { objectSub <- nbinomLRT(objectSub, full=full, reduced=reduced, quiet=quiet) } idx <- match(names(mcols(objectSub)), names(mcols(object))) mcols(object)[refitReplace, idx] <- mcols(objectSub) mcols(object)[newAllZero, mcols(mcols(object))$type == "results"] <- NA # continue to flag if some conditions have less than minReplicatesForReplace if (all(object$replaceable)) { mcols(object)$maxCooks <- NA } else { replaceCooks <- assays(object)[["cooks"]] replaceCooks[,object$replaceable] <- 0 mcols(object)$maxCooks <- recordMaxCooks(design(object), colData(object), attr(object,"dispModelMatrix"), replaceCooks, nrow(object)) } } if ( nrefit > 0 ) { # save the counts used for fitting as replaceCounts assays(object)[["replaceCounts"]] <- counts(object) assays(object)[["replaceCooks"]] <- assays(object)[["cooks"]] # preserve original counts and Cook's distances counts(object) <- assays(object)[["originalCounts"]] assays(object)[["cooks"]] <- cooks # no longer need this assay slot assays(object)[["originalCounts"]] <- NULL } object } sanitizeRowRanges <- function(object) { if (is.null(mcols(mcols(object)))) { mcols(mcols(object)) <- DataFrame(type=rep("input",ncol(mcols(object))), description=character(ncol(mcols(object)))) } class(mcols(mcols(object))$type) <- "character" class(mcols(mcols(object))$description) <- "character" mcols(mcols(object))$type[ is.na(mcols(mcols(object))$type) ] <- "" mcols(mcols(object))$description[ is.na(mcols(mcols(object))$description) ] <- "" object } sanitizeColData <- function(object) { if (is.null(mcols(colData(object)))) { mcols(colData(object)) <- DataFrame(type=rep("input",ncol(colData(object))), description=character(ncol(colData(object)))) } class(mcols(colData(object))$type) <- "character" class(mcols(colData(object))$description) <- "character" mcols(colData(object))$type[ is.na(mcols(colData(object))$type) ] <- "" mcols(colData(object))$description[ is.na(mcols(colData(object))$description) ] <- "" object } estimateSizeFactorsIterate <- function(object, niter=10, Q=0.05) { design(object) <- ~ 1 sf <- rep(1, ncol(object)) idx <- rowSums(counts(object)) > 0 cts <- counts(object)[idx,] for (i in seq_len(niter)) { sizeFactors(object) <- sf object <- estimateDispersions(object, fitType="mean", quiet=TRUE) q <- t(t(assays(object)[["mu"]])/sf)[idx,] disps <- dispersions(object)[idx] sf.old <- sf fn <- function(p) { sf <- exp(p - mean(p)) mu.new <- t(t(q) * sf) ll <- matrix(dnbinom(cts, mu=mu.new, size=1/disps, log=TRUE), ncol=ncol(cts)) gene.ll <- rowSums(ll) sum(gene.ll[ gene.ll > quantile(gene.ll, Q) ]) } res <- optim(log(sf.old), fn, control=list(fnscale=-1), method="L-BFGS-B") if (res$convergence != 0) { stop("iterative size factor normalization did not converge within an iteration") } sf <- exp(res$par - mean(res$par)) # loop more than once, and test for convergence if (i > 1 & sum((log(sf.old) - log(sf))^2) < 1e-4) { break } else { if (i == niter) { stop("iterative size factor normalization did not converge") } } } sf } checkFullRank <- function(modelMatrix) { if (qr(modelMatrix)$rank < ncol(modelMatrix)) { if (any(apply(modelMatrix, 2, function(col) all(col == 0)))) { stop("the model matrix is not full rank, so the model cannot be fit as specified. 
Levels or combinations of levels without any samples have resulted in column(s) of zeros in the model matrix. Please read the vignette section 'Model matrix not full rank': vignette('DESeq2')") } else { stop("the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed. Please read the vignette section 'Model matrix not full rank': vignette('DESeq2')") } } } designAndArgChecker <- function(object, betaPrior) { termsOrder <- attr(terms.formula(design(object)),"order") hasIntercept <- attr(terms(design(object)),"intercept") == 1 interactionPresent <- any(termsOrder > 1) if (betaPrior & !hasIntercept) { stop("betaPrior=TRUE can only be used if the design has an intercept. if specifying + 0 in the design formula, use betaPrior=FALSE") } if (betaPrior & interactionPresent) { stop("betaPrior=FALSE should be used for designs with interactions") } design <- design(object) designVars <- all.vars(design) if (length(designVars) > 0) { if (any(sapply(designVars, function(v) any(is.na(colData(object)[[v]]))))) { stop("variables in the design formula cannot have NA values") } designFactors <- designVars[sapply(designVars, function(v) is(colData(object)[[v]], "factor"))] if (length(designFactors) > 0 && any(sapply(designFactors,function(v) any(table(colData(object)[[v]]) == 0)))) { stop("factors in design formula must have samples for each level. this error can arise when subsetting a DESeqDataSet, in which all the samples for one or more levels of a factor in the design were removed. if this was intentional, use droplevels() to remove these levels, e.g.: dds$condition <- droplevels(dds$condition) ") } if (any(sapply(designVars, function(v) is(colData(object)[[v]], "ordered")))) { stop("the design contains an ordered factor. The internal steps that estimate the beta prior variance and produce resultsNames do not work on ordered factors. You should instead use model.matrix and then provide your custom matrix to 'full' argument of DESeq. (You should also provide a matrix to 'reduced' for test='LRT'.)") } } } getModelMatrix <- function(object) { stats::model.matrix.default(design(object), data=as.data.frame(colData(object))) } getAndCheckWeights <- function(object, modelMatrix) { if ("weights" %in% assayNames(object)) { useWeights <- TRUE weights <- unname(assays(object)[["weights"]]) stopifnot(all(weights >= 0)) weights <- weights / apply(weights, 1, max) # some code for testing whether still full rank # only performed once per analysis, by setting object attribute if (is.null(attr(object, "weightsOK"))) { m <- ncol(modelMatrix) full.rank <- qr(modelMatrix)$rank == m weights.ok <- logical(nrow(weights)) if (full.rank) { for (i in seq_len(nrow(weights))) { weights.ok[i] <- qr(weights[i,] * modelMatrix)$rank == m } } else { # model matrix is not full rank, # e.g. 
expanded model matrix from betaPrior=TRUE: # just check zero columns weights.ok <- rep(TRUE, nrow(weights)) for (j in seq_len(ncol(modelMatrix))) { num.zero <- colSums(t(weights) * modelMatrix[,j] == 0) weights.ok <- weights.ok & (num.zero != nrow(modelMatrix)) } } stopifnot(all(weights.ok)) } attr(object, "weightsOK") <- TRUE } else { useWeights <- FALSE weights <- matrix(1, nrow=nrow(object), ncol=ncol(object)) } list(weights=weights,useWeights=useWeights) } DESeq2/R/expanded.R0000644000175400017540000000767713201671732014762 0ustar00biocbuildbiocbuildmakeExpandedModelMatrix <- function(object) { designFactors <- getDesignFactors(object) coldata <- colData(object) coldata <- rbind(coldata,coldata[nrow(coldata),,drop=FALSE]) for (f in designFactors) { levels(coldata[[f]]) <- c(levels(coldata[[f]]),"_null_level_") coldata[[f]] <- relevel(coldata[[f]],"_null_level_") coldata[[f]][nrow(coldata)] <- "_null_level_" } mm0 <- stats::model.matrix.default(design(object), data=as.data.frame(coldata)) # these can appear when interactions are present without main effect variables nullLvls <- grepl("_null_level_",colnames(mm0)) mm <- mm0[-nrow(mm0),!nullLvls,drop=FALSE] attr(mm,"assign") <- attr(mm0,"assign") colnames(mm)[colnames(mm) == "(Intercept)"] <- "Intercept" colnames(mm) <- make.names(colnames(mm)) mm } averagePriorsOverLevels <- function(object, betaPriorVar) { expandedModelMatrix <- makeExpandedModelMatrix(object) expandedNames <- colnames(expandedModelMatrix) betaPriorIn <- betaPriorVar betaPriorOut <- numeric(length(expandedNames)) names(betaPriorOut) <- expandedNames bpiNms <- names(betaPriorIn) idx <- which(bpiNms %in% expandedNames) betaPriorOut[match(bpiNms[idx],expandedNames)] <- betaPriorIn[idx] designFactors <- getDesignFactors(object) allVars <- all.vars(design(object)) coldata <- colData(object) for (f in designFactors) { lvls <- levels(coldata[[f]]) mmColnames <- make.names(paste0(f,c(lvls,"Cntrst"))) meanPriorVar <- mean(betaPriorIn[names(betaPriorIn) %in% mmColnames]) betaPriorOut[expandedNames %in% mmColnames] <- meanPriorVar } # pre-v1.10 code regarding interactions and the beta prior: # ------------------------------------------------------------ # also set prior for any interactions between design factors # which are new in the expanded model matrix using existing interactions ## termsOrder <- attr(terms.formula(design(object)),"order") ## if (any(termsOrder > 1)) { ## for (f1 in designFactors) { ## for (f2 in allVars) { ## if (f1 == f2) next ## lvls1 <- levels(coldata[[f1]]) ## # the case where f2 is a factor like f1 ## if (f2 %in% designFactors) { ## lvls2 <- levels(coldata[[f2]]) ## mmColnames <- make.names(paste0(f1,rep(lvls1,each=length(lvls2)),":", ## f2,rep(lvls2,times=length(lvls1)))) ## meanPriorVar <- mean(betaPriorIn[names(betaPriorIn) %in% mmColnames]) ## betaPriorOut[expandedNames %in% mmColnames] <- meanPriorVar ## # the case where f2 is not a factor ## } else { ## mmColnames <- make.names(c(paste0(f1,lvls1,":",f2),paste0(f2,":",f1,lvls1))) ## meanPriorVar <- mean(betaPriorIn[names(betaPriorIn) %in% mmColnames]) ## betaPriorOut[expandedNames %in% mmColnames] <- meanPriorVar ## } ## } ## } ## } if (any(is.na(betaPriorOut))) { stop(paste("beta prior for",paste(names(betaPriorOut)[is.na(betaPriorOut)],collapse=","),"is NA")) } if (!all(betaPriorOut > 0)) { stop(paste("beta prior for",paste(names(betaPriorOut)[betaPriorOut <= 0],collapse=","),"is not greater than 0")) } betaPriorOut } # adds all first order contrasts addAllContrasts <- function(object, 
betaMatrix) {
  designFactors <- getDesignFactors(object)
  coldata <- colData(object)
  for (f in designFactors) {
    lvls <- levels(coldata[[f]])
    mmColnames <- make.names(paste0(f,lvls))
    M <- betaMatrix[,colnames(betaMatrix) %in% mmColnames,drop=FALSE]
    n <- ncol(M)
    if (n > 1) {
      if (n == 2) {
        is <- 2
        js <- 1
      } else {
        is <- do.call(c,sapply(seq_len(n-1)+1, function(k) seq(from=k,to=n)))
        js <- rep(seq_len(n-1),rev(seq_len(n-1)))
      }
      contrastCols <- mapply(function(i,j) M[,i] - M[,j], i=is, j=js)
      colnames(contrastCols) <- rep(make.names(paste0(f,"Cntrst")),ncol(contrastCols))
      betaMatrix <- cbind(betaMatrix, contrastCols)
    }
  }
  betaMatrix
}
DESeq2/R/fitNbinomGLMs.R0000644000175400017540000003535313201671732015630 0ustar00biocbuildbiocbuild# Unexported, low-level function for fitting negative binomial GLMs
#
# Users typically call \code{\link{nbinomWaldTest}} or \code{\link{nbinomLRT}}
# which call this function to perform fitting. These functions return
# a \code{\link{DESeqDataSet}} object with the appropriate columns
# added. This function returns results as a list.
#
# object a DESeqDataSet
# modelMatrix the design matrix
# modelFormula a formula specifying how to construct the design matrix
# alpha_hat the dispersion parameter estimates
# lambda the 'ridge' term added for the penalized GLM on the log2 scale
# renameCols whether to give columns variable_B_vs_A style names
# betaTol control parameter: stop when the following is satisfied:
#   abs(dev - dev_old)/(abs(dev) + 0.1) < betaTol
# maxit control parameter: maximum number of iterations to allow for
#   convergence
# useOptim whether to use optim on rows which have not converged:
#   Fisher scoring is not ideal with multiple groups and sparse
#   count distributions
# useQR whether to use the QR decomposition on the design matrix X
# forceOptim whether to use optim on all rows
# warnNonposVar whether to warn about non-positive variances; for advanced
#   users only running LRT without a beta prior, it might be desirable
#   to ignore this warning.
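#
# A minimal usage sketch (hypothetical, since this function is unexported;
# it assumes size factors and dispersions have already been estimated, and
# reaches the function via the non-exported namespace):
#
#   dds <- makeExampleDESeqDataSet(n=100, m=6)
#   dds <- estimateSizeFactors(dds)
#   dds <- estimateDispersions(dds)
#   fit <- DESeq2:::fitNbinomGLMs(dds)
#   head(fit$betaMatrix)  # log2-scale coefficients, one column per model term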
#
# return a list of results, with coefficients and standard
# errors on the log2 scale
fitNbinomGLMs <- function(object, modelMatrix=NULL, modelFormula, alpha_hat, lambda,
                          renameCols=TRUE, betaTol=1e-8, maxit=100, useOptim=TRUE,
                          useQR=TRUE, forceOptim=FALSE, warnNonposVar=TRUE) {
  if (missing(modelFormula)) {
    modelFormula <- design(object)
  }
  if (is.null(modelMatrix)) {
    modelAsFormula <- TRUE
    modelMatrix <- stats::model.matrix.default(modelFormula, data=as.data.frame(colData(object)))
  } else {
    modelAsFormula <- FALSE
  }

  stopifnot(all(colSums(abs(modelMatrix)) > 0))

  # rename columns, for use as columns in DataFrame
  # and to emphasize the reference level comparison
  modelMatrixNames <- colnames(modelMatrix)
  modelMatrixNames[modelMatrixNames == "(Intercept)"] <- "Intercept"
  modelMatrixNames <- make.names(modelMatrixNames)
  if (renameCols) {
    convertNames <- renameModelMatrixColumns(colData(object), modelFormula)
    convertNames <- convertNames[convertNames$from %in% modelMatrixNames,,drop=FALSE]
    modelMatrixNames[match(convertNames$from, modelMatrixNames)] <- convertNames$to
  }
  colnames(modelMatrix) <- modelMatrixNames

  normalizationFactors <- getSizeOrNormFactors(object)

  if (missing(alpha_hat)) {
    alpha_hat <- dispersions(object)
  }

  if (length(alpha_hat) != nrow(object)) {
    stop("alpha_hat needs to be the same length as nrow(object)")
  }

  # set a wide prior for all coefficients
  if (missing(lambda)) {
    lambda <- rep(1e-6, ncol(modelMatrix))
  }

  # use weights if they are present in assays(object)
  wlist <- getAndCheckWeights(object, modelMatrix)
  weights <- wlist$weights
  useWeights <- wlist$useWeights

  # bypass the beta fitting if the model formula is only intercept and
  # the prior variance is large (1e6)
  # i.e., LRT with reduced ~ 1 and no beta prior
  justIntercept <- if (modelAsFormula) {
    modelFormula == formula(~ 1)
  } else {
    ncol(modelMatrix) == 1 & all(modelMatrix == 1)
  }
  if (justIntercept & all(lambda <= 1e-6)) {
    alpha <- alpha_hat
    betaConv <- rep(TRUE, nrow(object))
    betaIter <- rep(1,nrow(object))
    betaMatrix <- if (useWeights) {
      matrix(log2(rowSums(weights*counts(object, normalized=TRUE))/rowSums(weights)),ncol=1)
    } else {
      matrix(log2(rowMeans(counts(object, normalized=TRUE))),ncol=1)
    }
    mu <- normalizationFactors * as.numeric(2^betaMatrix)
    logLikeMat <- dnbinom(counts(object), mu=mu, size=1/alpha, log=TRUE)
    logLike <- if (useWeights) {
      rowSums(weights*logLikeMat)
    } else {
      rowSums(logLikeMat)
    }
    modelMatrix <- stats::model.matrix.default(~ 1, as.data.frame(colData(object)))
    colnames(modelMatrix) <- modelMatrixNames <- "Intercept"
    w <- if (useWeights) {
      weights * (mu^-1 + alpha)^-1
    } else {
      (mu^-1 + alpha)^-1
    }
    xtwx <- rowSums(w)
    sigma <- xtwx^-1
    betaSE <- matrix(log2(exp(1)) * sqrt(sigma),ncol=1)
    hat_diagonals <- w * xtwx^-1
    res <- list(logLike = logLike, betaConv = betaConv, betaMatrix = betaMatrix,
                betaSE = betaSE, mu = mu, betaIter = betaIter,
                modelMatrix=modelMatrix, nterms=1, hat_diagonals=hat_diagonals)
    return(res)
  }

  qrx <- qr(modelMatrix)
  # if full rank, estimate initial betas for IRLS below
  if (qrx$rank == ncol(modelMatrix)) {
    Q <- qr.Q(qrx)
    R <- qr.R(qrx)
    y <- t(log(counts(object,normalized=TRUE) + .1))
    beta_mat <- t(solve(R, t(Q) %*% y))
  } else {
    if ("Intercept" %in% modelMatrixNames) {
      beta_mat <- matrix(0, ncol=ncol(modelMatrix), nrow=nrow(object))
      # use the natural log as fitBeta occurs in the natural log scale
      logBaseMean <- log(rowMeans(counts(object,normalized=TRUE)))
      beta_mat[,which(modelMatrixNames == "Intercept")] <- logBaseMean
    } else {
      beta_mat <- matrix(1, ncol=ncol(modelMatrix), nrow=nrow(object))
    }
  }

  # here we convert from the log2 scale of the betas
  # and the beta prior variance to the log scale
  # used in fitBeta.
  # so we divide by the square of the
  # conversion factor, log(2)
  lambdaNatLogScale <- lambda / log(2)^2

  betaRes <- fitBetaWrapper(ySEXP = counts(object), xSEXP = modelMatrix,
                            nfSEXP = normalizationFactors,
                            alpha_hatSEXP = alpha_hat,
                            beta_matSEXP = beta_mat,
                            lambdaSEXP = lambdaNatLogScale,
                            weightsSEXP = weights,
                            useWeightsSEXP = useWeights,
                            tolSEXP = betaTol, maxitSEXP = maxit,
                            useQRSEXP=useQR)

  # Note on deviance: the 'deviance' calculated in fitBeta() (C++)
  # is not returned in mcols(object)$deviance. Instead, we calculate
  # the log likelihood below and use -2 * logLike.
  # (reason is that we have other ways of estimating beta:
  # above intercept code, and below optim code)

  mu <- normalizationFactors * t(exp(modelMatrix %*% t(betaRes$beta_mat)))
  dispersionVector <- rep(dispersions(object), times=ncol(object))
  logLike <- nbinomLogLike(counts(object), mu, dispersions(object), weights, useWeights)

  # test for stability
  rowStable <- apply(betaRes$beta_mat,1,function(row) sum(is.na(row))) == 0

  # test for positive variances
  rowVarPositive <- apply(betaRes$beta_var_mat,1,function(row) sum(row <= 0)) == 0

  # test for convergence, stability and positive variances
  betaConv <- betaRes$iter < maxit

  # here we transform the betaMatrix and betaSE to a log2 scale
  betaMatrix <- log2(exp(1))*betaRes$beta_mat
  colnames(betaMatrix) <- modelMatrixNames
  colnames(modelMatrix) <- modelMatrixNames
  # warn below regarding these rows with negative variance
  betaSE <- log2(exp(1))*sqrt(pmax(betaRes$beta_var_mat,0))
  colnames(betaSE) <- paste0("SE_",modelMatrixNames)

  # switch based on whether we should also use optim
  # on rows which did not converge
  rowsForOptim <- if (useOptim) {
    which(!betaConv | !rowStable | !rowVarPositive)
  } else {
    which(!rowStable | !rowVarPositive)
  }

  if (forceOptim) {
    rowsForOptim <- seq_along(betaConv)
  }

  if (length(rowsForOptim) > 0) {
    # we use optim if didn't reach convergence with the IRLS code
    resOptim <- fitNbinomGLMsOptim(object,modelMatrix,lambda,
                                   rowsForOptim,rowStable,
                                   normalizationFactors,alpha_hat,
                                   weights,useWeights,
                                   betaMatrix,betaSE,betaConv,
                                   beta_mat,
                                   mu,logLike)
    betaMatrix <- resOptim$betaMatrix
    betaSE <- resOptim$betaSE
    betaConv <- resOptim$betaConv
    mu <- resOptim$mu
    logLike <- resOptim$logLike
  }

  stopifnot(!any(is.na(betaSE)))
  nNonposVar <- sum(rowSums(betaSE == 0) > 0)
  if (warnNonposVar & nNonposVar > 0) warning(nNonposVar, " rows had non-positive estimates of variance for coefficients")

  list(logLike = logLike, betaConv = betaConv, betaMatrix = betaMatrix,
       betaSE = betaSE, mu = mu, betaIter = betaRes$iter,
       modelMatrix=modelMatrix, nterms=ncol(modelMatrix),
       hat_diagonals=betaRes$hat_diagonals)
}

# this function calls fitNbinomGLMs() twice:
# 1 - without the beta prior, in order to calculate the
# beta prior variance and hat matrix
# 2 - again but with the prior in order to get beta matrix and standard errors
fitGLMsWithPrior <- function(object, betaTol, maxit, useOptim, useQR, betaPriorVar) {

  objectNZ <- object[!mcols(object)$allZero,,drop=FALSE]
  modelMatrixType <- attr(object, "modelMatrixType")

  if (missing(betaPriorVar) | !(all(c("mu","H") %in% assayNames(objectNZ)))) {
    # first, fit the negative binomial GLM without a prior,
    # used to construct the prior variances
    # and for the hat matrix diagonals for calculating Cook's distance
    fit <- fitNbinomGLMs(objectNZ, betaTol=betaTol, maxit=maxit,
                         useOptim=useOptim, useQR=useQR,
                         renameCols =
(modelMatrixType == "standard")) modelMatrix <- fit$modelMatrix modelMatrixNames <- colnames(modelMatrix) H <- fit$hat_diagonal betaMatrix <- fit$betaMatrix mu <- fit$mu modelMatrixNames[modelMatrixNames == "(Intercept)"] <- "Intercept" modelMatrixNames <- make.names(modelMatrixNames) colnames(betaMatrix) <- modelMatrixNames # save the MLE log fold changes for addMLE argument of results convertNames <- renameModelMatrixColumns(colData(object), design(objectNZ)) convertNames <- convertNames[convertNames$from %in% modelMatrixNames,,drop=FALSE] modelMatrixNames[match(convertNames$from, modelMatrixNames)] <- convertNames$to mleBetaMatrix <- fit$betaMatrix colnames(mleBetaMatrix) <- paste0("MLE_",modelMatrixNames) # store for use in estimateBetaPriorVar below mcols(objectNZ) <- cbind(mcols(objectNZ), DataFrame(mleBetaMatrix)) } else { # we can skip the first MLE fit because the # beta prior variance and hat matrix diagonals were provided modelMatrix <- getModelMatrix(object) H <- assays(objectNZ)[["H"]] mu <- assays(objectNZ)[["mu"]] mleBetaMatrix <- as.matrix(mcols(objectNZ)[,grep("MLE_",names(mcols(objectNZ))),drop=FALSE]) } if (missing(betaPriorVar)) { betaPriorVar <- estimateBetaPriorVar(objectNZ) } else { # else we are provided the prior variance: # check if the lambda is the correct length # given the design formula if (modelMatrixType == "expanded") { modelMatrix <- makeExpandedModelMatrix(objectNZ) } p <- ncol(modelMatrix) if (length(betaPriorVar) != p) { stop(paste("betaPriorVar should have length",p,"to match:",paste(colnames(modelMatrix),collapse=", "))) } } # refit the negative binomial GLM with a prior on betas if (any(betaPriorVar == 0)) { stop("beta prior variances are equal to zero for some variables") } lambda <- 1/betaPriorVar if (modelMatrixType == "standard") { fit <- fitNbinomGLMs(objectNZ, lambda=lambda, betaTol=betaTol, maxit=maxit, useOptim=useOptim, useQR=useQR) modelMatrix <- fit$modelMatrix } else { modelMatrix <- makeExpandedModelMatrix(objectNZ) fit <- fitNbinomGLMs(objectNZ, lambda=lambda, betaTol=betaTol, maxit=maxit, useOptim=useOptim, useQR=useQR, modelMatrix=modelMatrix, renameCols=FALSE) } res <- list(fit=fit, H=H, betaPriorVar=betaPriorVar, mu=mu, modelMatrix=modelMatrix, mleBetaMatrix=mleBetaMatrix) res } # breaking out the optim backup code from fitNbinomGLMs fitNbinomGLMsOptim <- function(object,modelMatrix,lambda, rowsForOptim,rowStable, normalizationFactors,alpha_hat, weights,useWeights, betaMatrix,betaSE,betaConv, beta_mat, mu,logLike) { x <- modelMatrix lambdaNatLogScale <- lambda / log(2)^2 large <- 30 for (row in rowsForOptim) { betaRow <- if (rowStable[row] & all(abs(betaMatrix[row,]) < large)) { betaMatrix[row,] } else { beta_mat[row,] } nf <- normalizationFactors[row,] k <- counts(object)[row,] alpha <- alpha_hat[row] objectiveFn <- function(p) { mu_row <- as.numeric(nf * 2^(x %*% p)) logLikeVector <- dnbinom(k,mu=mu_row,size=1/alpha,log=TRUE) logLike <- if (useWeights) { sum(weights[row,] * logLikeVector) } else { sum(logLikeVector) } logPrior <- sum(dnorm(p,0,sqrt(1/lambda),log=TRUE)) negLogPost <- -1 * (logLike + logPrior) if (is.finite(negLogPost)) negLogPost else 10^300 } o <- optim(betaRow, objectiveFn, method="L-BFGS-B",lower=-large, upper=large) ridge <- if (length(lambdaNatLogScale) > 1) { diag(lambdaNatLogScale) } else { as.matrix(lambdaNatLogScale,ncol=1) } # if we converged, change betaConv to TRUE if (o$convergence == 0) { betaConv[row] <- TRUE } # with or without convergence, store the estimate from optim betaMatrix[row,] <- o$par # 
# calculate the standard errors
    mu_row <- as.numeric(nf * 2^(x %*% o$par))
    # store the new mu vector
    mu[row,] <- mu_row
    minmu <- 0.5
    mu_row[mu_row < minmu] <- minmu
    w <- if (useWeights) {
      diag(weights[row,] * (mu_row^-1 + alpha)^-1)
    } else {
      diag((mu_row^-1 + alpha)^-1)
    }
    xtwx <- t(x) %*% w %*% x
    xtwxRidgeInv <- solve(xtwx + ridge)
    sigma <- xtwxRidgeInv %*% xtwx %*% xtwxRidgeInv
    # warn below regarding these rows with negative variance
    betaSE[row,] <- log2(exp(1)) * sqrt(pmax(diag(sigma),0))
    logLikeVector <- dnbinom(k,mu=mu_row,size=1/alpha,log=TRUE)
    logLike[row] <- if (useWeights) {
      sum(weights[row,] * logLikeVector)
    } else {
      sum(logLikeVector)
    }
  }
  return(list(betaMatrix=betaMatrix,betaSE=betaSE,
              betaConv=betaConv,mu=mu,logLike=logLike))
}
DESeq2/R/helper.R0000644000175400017540000004563213201671732014442 0ustar00biocbuildbiocbuild#' Unmix samples using loss in a variance stabilized space
#'
#' Unmixes samples in \code{x} according to \code{pure} components,
#' using numerical optimization. The components in \code{pure}
#' are added on the scale of gene expression (either normalized counts, or TPMs).
#' The loss function when comparing fitted expression to the
#' samples in \code{x} occurs in a variance stabilized space.
#' This task is sometimes referred to as "deconvolution",
#' and can be used, for example, to identify contributions from
#' various tissues.
#' Note: if the \code{pbapply} package is installed a progress bar
#' will be displayed while mixing components are fit.
#'
#' @param x normalized counts or TPMs of the samples to be unmixed
#' @param pure normalized counts or TPMs of the "pure" samples
#' @param alpha for normalized counts, the dispersion of the data
#' when a negative binomial model is fit. this can be found by examining
#' the asymptotic value of \code{dispersionFunction(dds)}, when using
#' \code{fitType="parametric"} or the mean value when using
#' \code{fitType="mean"}.
#' @param shift for TPMs, the shift which approximately stabilizes the variance
#' of log shifted TPMs. Can be assessed with \code{vsn::meanSdPlot}.
#' @param loss either 1 (for L1) or 2 (for squared) loss function.
#' Default is 1.
#' @param quiet suppress progress bar. default is FALSE, show progress bar
#' if pbapply is installed.
#'
#' @return mixture components for each sample (rows), which sum to 1.
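#'
#' @examples
#'
#' # a minimal sketch on simulated data (hypothetical values, chosen only
#' # for illustration): three "pure" components and two mixed samples,
#' # with alpha=0.05 standing in for an assumed asymptotic dispersion
#' pure <- matrix(rpois(300, rep(c(10, 100, 1000), each=100)), ncol=3)
#' x <- matrix(rpois(200, rowMeans(pure)), ncol=2)
#' mix <- unmix(x, pure, alpha=0.05, quiet=TRUE)
#' mix # one row per sample in 'x'; each row sums to 1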
#'
#' @export
unmix <- function(x, pure, alpha, shift, loss=1, quiet=FALSE) {
  if (missing(alpha)) stopifnot(!missing(shift))
  if (missing(shift)) stopifnot(!missing(alpha))
  stopifnot(missing(shift) | missing(alpha))
  stopifnot(loss %in% 1:2)
  stopifnot(nrow(x) == nrow(pure))
  stopifnot(ncol(pure) > 1)
  if (requireNamespace("pbapply", quietly=TRUE) & !quiet) {
    lapply <- pbapply::pblapply
  }
  if (missing(shift)) {
    stopifnot(alpha > 0)
    # variance stabilizing transformation for NB w/ fixed dispersion alpha
    vst <- function(q, alpha) ( 2 * asinh(sqrt(alpha * q)) - log(alpha) - log(4) ) / log(2)
    distVST <- function(p, i, vst, alpha, loss) {
      sum(abs(vst(x[,i], alpha) - vst(pure %*% p, alpha))^loss)
    }
    res <- lapply(seq_len(ncol(x)), function(i) {
      optim(par=rep(1, ncol(pure)), fn=distVST, gr=NULL,
            i, vst, alpha, loss,
            method="L-BFGS-B", lower=0, upper=100)$par
    })
  } else {
    stopifnot(shift > 0)
    # VST of shifted log
    vstSL <- function(q, shift) log(q + shift)
    distSL <- function(p, i, vst, shift, loss) {
      sum(abs(vstSL(x[,i], shift) - vstSL(pure %*% p, shift))^loss)
    }
    res <- lapply(seq_len(ncol(x)), function(i) {
      optim(par=rep(1, ncol(pure)), fn=distSL, gr=NULL,
            i, vstSL, shift, loss,
            method="L-BFGS-B", lower=0, upper=100)$par
    })
  }
  mix <- do.call(rbind, res)
  mix <- mix / rowSums(mix)
  colnames(mix) <- colnames(pure)
  return(mix)
}

#' Collapse technical replicates in a RangedSummarizedExperiment or DESeqDataSet
#'
#' Collapses the columns in \code{object} by summing within levels
#' of a grouping factor \code{groupby}. The purpose of this function
#' is to sum up read counts from technical replicates to create an object
#' with a single column of read counts for each sample.
#' Note: by "technical replicates", we mean multiple sequencing runs of the same
#' library, in contrast to "biological replicates" in which multiple
#' libraries are prepared from separate biological units.
#' Optionally renames the columns of the returned object with the levels of the
#' grouping factor.
#' Note: this function is written very simply and
#' can be easily altered to produce other behavior by examining the source code.
#'
#' @param object A \code{RangedSummarizedExperiment} or \code{DESeqDataSet}
#' @param groupby a grouping factor, of the same length as the columns of object
#' @param run optional, the names of each unique column in object. If provided,
#' a new column \code{runsCollapsed} will be added to the \code{colData}
#' which pastes together the names of \code{run}
#' @param renameCols whether to rename the columns of the returned object
#' using the levels of the grouping factor
#'
#' @return the \code{object} with as many columns as levels in \code{groupby}.
#' This object has assay/count data which is summed from the various
#' columns which are grouped together, and the \code{colData} is subset using
#' the first column for each group in \code{groupby}.
#' #' @examples #' #' dds <- makeExampleDESeqDataSet(m=12) #' #' # make data with two technical replicates for three samples #' dds$sample <- factor(sample(paste0("sample",rep(1:9, c(2,1,1,2,1,1,2,1,1))))) #' dds$run <- paste0("run",1:12) #' #' ddsColl <- collapseReplicates(dds, dds$sample, dds$run) #' #' # examine the colData and column names of the collapsed data #' colData(ddsColl) #' colnames(ddsColl) #' #' # check that the sum of the counts for "sample1" is the same #' # as the counts in the "sample1" column in ddsColl #' matchFirstLevel <- dds$sample == levels(dds$sample)[1] #' stopifnot(all(rowSums(counts(dds[,matchFirstLevel])) == counts(ddsColl[,1]))) #' #' @export collapseReplicates <- function(object, groupby, run, renameCols=TRUE) { if (!is.factor(groupby)) groupby <- factor(groupby) groupby <- droplevels(groupby) stopifnot(length(groupby) == ncol(object)) sp <- split(seq(along=groupby), groupby) countdata <- sapply(sp, function(i) rowSums(assay(object)[,i,drop=FALSE])) mode(countdata) <- "integer" colsToKeep <- sapply(sp, `[`, 1) collapsed <- object[,colsToKeep] dimnames(countdata) <- dimnames(collapsed) assay(collapsed) <- countdata if (!missing(run)) { stopifnot(length(groupby) == length(run)) colData(collapsed)$runsCollapsed <- sapply(sp, function(i) paste(run[i],collapse=",")) } if (renameCols) { colnames(collapsed) <- levels(groupby) } stopifnot(sum(as.numeric(assay(object))) == sum(as.numeric(assay(collapsed)))) collapsed } #' FPKM: fragments per kilobase per million mapped fragments #' #' The following function returns fragment counts normalized #' per kilobase of feature length per million mapped fragments #' (by default using a robust estimate of the library size, #' as in \code{\link{estimateSizeFactors}}). #' #' The length of the features (e.g. genes) is calculated one of two ways: #' (1) If there is a matrix named "avgTxLength" in \code{assays(dds)}, #' this will take precedence in the length normalization. #' This occurs when using the tximport-DESeq2 pipeline. #' (2) Otherwise, feature length is calculated #' from the \code{rowRanges} of the dds object, #' if a column \code{basepairs} is not present in \code{mcols(dds)}. #' The calculated length is the number of basepairs in the union of all \code{GRanges} #' assigned to a given row of \code{object}, e.g., #' the union of all basepairs of exons of a given gene. #' Note that the second approach over-estimates the gene length #' (average transcript length, weighted by abundance is a more appropriate #' normalization for gene counts), and so the FPKM will be an underestimate of the true value. #' #' Note that, when the read/fragment counting has inter-feature dependencies, a strict #' normalization would not incorporate the basepairs of a feature which #' overlap another feature. This inter-feature dependence is not taken into #' consideration in the internal union basepair calculation. #' #' @param object a \code{DESeqDataSet} #' @param robust whether to use size factors to normalize #' rather than taking the column sums of the raw counts, #' using the \code{\link{fpm}} function. #' #' @return a matrix which is normalized per kilobase of the #' union of basepairs in the \code{GRangesList} or \code{GRanges} #' of the mcols(object), and per million of mapped fragments, #' either using the robust median ratio method (robust=TRUE, default) #' or using raw counts (robust=FALSE). #' Defining a column \code{mcols(object)$basepairs} takes #' precedence over internal calculation of the kilobases for each row. 
#'
#' @examples
#'
#' # create a matrix with 1 million counts for the
#' # 2nd and 3rd columns; the 1st and 4th have
#' # half and double the counts, respectively.
#' m <- matrix(1e6 * rep(c(.125, .25, .25, .5), each=4),
#'             ncol=4, dimnames=list(1:4,1:4))
#' mode(m) <- "integer"
#' se <- SummarizedExperiment(list(counts=m), colData=DataFrame(sample=1:4))
#' dds <- DESeqDataSet(se, ~ 1)
#'
#' # create 4 GRanges with lengths: 1, 1, 2, 0.5 Kb
#' gr1 <- GRanges("chr1",IRanges(1,1000)) # 1kb
#' gr2 <- GRanges("chr1",IRanges(c(1,1001),c(500,1500))) # 1kb
#' gr3 <- GRanges("chr1",IRanges(c(1,1001),c(1000,2000))) # 2kb
#' gr4 <- GRanges("chr1",IRanges(c(1,1001),c(200,1300))) # 500bp
#' rowRanges(dds) <- GRangesList(gr1,gr2,gr3,gr4)
#'
#' # the raw counts
#' counts(dds)
#'
#' # the FPM values
#' fpm(dds)
#'
#' # the FPKM values
#' fpkm(dds)
#'
#' @seealso \code{\link{fpm}}
#'
#' @docType methods
#' @name fpkm
#' @rdname fpkm
#'
#' @export
fpkm <- function(object, robust=TRUE) {
  fpm <- fpm(object, robust=robust)
  if ("avgTxLength" %in% assayNames(object)) {
    exprs <- 1e3 * fpm / assays(object)[["avgTxLength"]]
    if (robust) {
      sf <- estimateSizeFactorsForMatrix(exprs)
      exprs <- t(t(exprs)/sf)
      return(exprs)
    } else {
      return(exprs)
    }
  }
  if (is.null(mcols(object)$basepairs)) {
    if (class(rowRanges(object)) == "GRangesList") {
      ubp <- DataFrame(basepairs = sum(width(reduce(rowRanges(object)))))
    } else if (class(rowRanges(object)) == "GRanges") {
      ubp <- DataFrame(basepairs = width(rowRanges(object)))
    }
    if (all(ubp$basepairs == 0)) {
      stop("rowRanges(object) has all ranges of zero width. the user should instead supply a column, mcols(object)$basepairs, which will be used to produce FPKM values")
    }
    if (is.null(mcols(mcols(object)))) {
      mcols(object) <- ubp
    } else {
      mcols(ubp) <- DataFrame(type="intermediate",
                              description="count of basepairs in the union of all ranges")
      mcols(object) <- cbind(mcols(object), ubp)
    }
  }
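  # FPKM: scale FPM to the feature length in kilobases,
  # i.e. 1e3 * FPM / basepairs for each row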
#' #' @examples #' #' # generate a dataset with size factors: .5, 1, 1, 2 #' dds <- makeExampleDESeqDataSet(m = 4, n = 1000, #' interceptMean=log2(1e3), #' interceptSD=0, #' sizeFactors=c(.5,1,1,2), #' dispMeanRel=function(x) .01) #' #' # add a few rows with very high count #' counts(dds)[4:10,] <- 2e5L #' #' # in this robust version, the counts are comparable across samples #' round(head(fpm(dds), 3)) #' #' # in this column sum version, the counts are still skewed: #' # sample1 < sample2 & 3 < sample 4 #' round(head(fpm(dds, robust=FALSE), 3)) #' #' # the column sums of the robust version #' # are not equal to 1e6, but the #' # column sums of the non-robust version #' # are equal to 1e6 by definition #' #' colSums(fpm(dds))/1e6 #' colSums(fpm(dds, robust=FALSE))/1e6 #' #' @seealso \code{\link{fpkm}} #' #' @docType methods #' @name fpm #' @rdname fpm #' #' @export fpm <- function(object, robust=TRUE) { # we do something different if average tx lengths are present noAvgTxLen <- !("avgTxLength" %in% assayNames(object)) if (robust & is.null(sizeFactors(object)) & noAvgTxLen) { object <- estimateSizeFactors(object) } k <- counts(object) library.sizes <- if (robust & noAvgTxLen) { sizeFactors(object) * exp(mean(log(colSums(k)))) } else { colSums(k) } 1e6 * sweep(k, 2, library.sizes, "/") } #' Normalize for gene length #' #' Normalize for gene length using the output of transcript abundance estimators #' #' This function is deprecated and moved to a new general purpose package, #' tximport, which will be added to Bioconductor. #' #' @param ... ... #' #' @export normalizeGeneLength <- function(...) { .Deprecated("tximport, a separate package on Bioconductor") } #' Normalized counts transformation #' #' A simple function for creating a \code{\link{DESeqTransform}} #' object after applying: \code{f(count(dds,normalized=TRUE) + pc)}. #' #' @param object a DESeqDataSet object #' @param f a function to apply to normalized counts #' @param pc a pseudocount to add to normalized counts #' #' @seealso \code{\link{varianceStabilizingTransformation}}, \code{\link{rlog}} #' #' @export normTransform <- function(object, f=log2, pc=1) { if (is.null(sizeFactors(object)) & is.null(normalizationFactors(object))) { object <- estimateSizeFactors(object) } nt <- f(counts(object, normalized=TRUE) + pc) se <- SummarizedExperiment( assays = nt, colData = colData(object), rowRanges = rowRanges(object), metadata = metadata(object)) DESeqTransform(se) } ##################### # unexported ##################### # function to split up DESeqDataSet by rows during easily parallelizable steps # TODO: recombining the resulting DESeqDataSets using rbind() is a bit wasteful, # as the count matrix and GRanges from the original object are unchanged. 
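# a small sketch of the row-chunking used below (illustrative only):
# with 4 workers and 10 non-zero rows,
#   factor(sort(rep(seq_len(4), length=10)))
# gives the levels 1 1 1 2 2 2 3 3 4 4, i.e. near-equal
# contiguous blocks of rows, one block per worker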
DESeqParallel <- function(object, test, fitType, betaPrior, full, reduced, quiet, modelMatrix, modelMatrixType, BPPARAM) { # size factors already estimated or supplied # break up the object into equal sized chunks # to be fed to the different workers object <- getBaseMeansAndVariances(object) objectNZ <- object[!mcols(object)$allZero,,drop=FALSE] nworkers <- BPPARAM$workers idx <- factor(sort(rep(seq_len(nworkers),length=nrow(objectNZ)))) if (missing(modelMatrixType)) { modelMatrixType <- NULL } # if no reps, treat samples as replicates and print warning noReps <- checkForExperimentalReplicates(object, modelMatrix) if (noReps) { designIn <- design(objectNZ) design(objectNZ) <- formula(~ 1) } # first parallel execution: gene-wise dispersion estimates if (!quiet) message("estimating dispersions") if (!quiet) message(paste("gene-wise dispersion estimates:",nworkers,"workers")) objectNZ <- do.call(rbind, bplapply(levels(idx), function(l) { estimateDispersionsGeneEst(objectNZ[idx == l,,drop=FALSE], quiet=TRUE, modelMatrix=modelMatrix) }, BPPARAM=BPPARAM)) # the dispersion fit and dispersion prior are estimated over all rows if (!quiet) message("mean-dispersion relationship") objectNZ <- estimateDispersionsFit(objectNZ, fitType=fitType) dispPriorVar <- estimateDispersionsPriorVar(objectNZ, modelMatrix=modelMatrix) # need to condition on whether a beta prior needs to be fit if (betaPrior) { # if so: # if explicitly set if (!is.null(modelMatrixType) && modelMatrixType == "standard") { attr(object, "modelMatrixType") <- "standard" attr(objectNZ, "modelMatrixType") <- "standard" } # second parallel execution: fit the final dispersion estimates and MLE betas if (!quiet) message(paste("final dispersion estimates, MLE betas:",nworkers,"workers")) objectNZ <- do.call(rbind, bplapply(levels(idx), function(l) { objectNZSub <- estimateDispersionsMAP(objectNZ[idx == l,,drop=FALSE], dispPriorVar=dispPriorVar, quiet=TRUE) # replace design if (noReps) design(objectNZSub) <- designIn estimateMLEForBetaPriorVar(objectNZSub, modelMatrixType=modelMatrixType) }, BPPARAM=BPPARAM)) # replace design if (noReps) design(objectNZ) <- designIn # the beta prior is estimated over all rows betaPriorVar <- estimateBetaPriorVar(objectNZ) # the third parallel execution: the final GLM and statistics if (!quiet) message(paste("fitting model and testing:",nworkers,"workers")) objectNZ <- do.call(rbind, bplapply(levels(idx), function(l) { nbinomWaldTest(objectNZ[idx == l,,drop=FALSE], betaPrior=TRUE, betaPriorVar=betaPriorVar, quiet=TRUE, modelMatrixType=modelMatrixType) }, BPPARAM=BPPARAM)) } else { # or, if no beta prior to fit, # second parallel execution: fit the final dispersion estimates and the final GLM and statistics if (!quiet) message(paste("final dispersion estimates, fitting model and testing:",nworkers,"workers")) if (test == "Wald") { objectNZ <- do.call(rbind, bplapply(levels(idx), function(l) { objectNZSub <- estimateDispersionsMAP(objectNZ[idx == l,,drop=FALSE], dispPriorVar=dispPriorVar, quiet=TRUE, modelMatrix=modelMatrix) # replace design if (noReps) design(objectNZSub) <- designIn nbinomWaldTest(objectNZSub, betaPrior=FALSE, quiet=TRUE, modelMatrix=modelMatrix, modelMatrixType="standard") }, BPPARAM=BPPARAM)) } else if (test == "LRT") { objectNZ <- do.call(rbind, bplapply(levels(idx), function(l) { objectNZSub <- estimateDispersionsMAP(objectNZ[idx == l,,drop=FALSE], dispPriorVar=dispPriorVar, quiet=TRUE, modelMatrix=modelMatrix) # replace design if (noReps) design(objectNZSub) <- designIn 
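# run the LRT on this chunk of rows and return the refitted subset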
nbinomLRT(objectNZSub, full=full, reduced=reduced, quiet=TRUE)
      }, BPPARAM=BPPARAM))
    }
  }

  outMcols <- buildDataFrameWithNARows(mcols(objectNZ), mcols(object)$allZero)
  mcols(outMcols) <- mcols(mcols(objectNZ))
  outMu <- buildMatrixWithNARows(assays(objectNZ)[["mu"]], mcols(object)$allZero)
  outCooks <- buildMatrixWithNARows(assays(objectNZ)[["cooks"]], mcols(object)$allZero)

  # now backfill any columns in rowRanges which existed before running DESeq()
  # and which are not of type "intermediate" or "results"
  object <- sanitizeRowRanges(object)
  inMcols <- mcols(object)
  namesCols <- names(mcols(object))
  inputCols <- namesCols[! mcols(mcols(object))$type %in% c("intermediate","results")]
  for (var in inputCols) {
    outMcols[var] <- inMcols[var]
  }
  mcols(object) <- outMcols
  object <- getBaseMeansAndVariances(object)
  assays(object)[["mu"]] <- outMu
  assays(object)[["cooks"]] <- outCooks
  attrNames <- c("dispersionFunction","modelMatrixType","betaPrior",
                 "betaPriorVar","modelMatrix","test","dispModelMatrix")
  for (attrName in attrNames) {
    attr(object, attrName) <- attr(objectNZ, attrName)
  }
  object
}
DESeq2/R/lfcShrink.R0000644000175400017540000003132213201671732015075 0ustar00biocbuildbiocbuild#' Shrink log2 fold changes
#'
#' Adds shrunken log2 fold changes (LFC) and SE to a
#' results table from \code{DESeq} run without LFC shrinkage.
#' Three shrinkage estimators for LFC are available via \code{type}.
#'
#' As of DESeq2 version 1.18, \code{type="apeglm"} and \code{type="ashr"}
#' are new features, and still under development.
#' Specifying \code{type="apeglm"} passes along DESeq2 MLE log2
#' fold changes and standard errors to the \code{apeglm} function
#' in the apeglm package, and re-estimates posterior LFCs for
#' the coefficient specified by \code{coef}.
#' Specifying \code{type="ashr"} passes along DESeq2 MLE log2
#' fold changes and standard errors to the \code{ash} function
#' in the ashr package,
#' with arguments \code{mixcompdist="normal"} and \code{method="shrink"}
#' (\code{coef} and \code{contrast} ignored).
#' See the vignette for a comparison of shrinkage estimators on an example dataset.
#' For all shrinkage methods, details on the prior are included in
#' \code{priorInfo(res)}, including the \code{fitted_g} mixture for ashr.
#' The integration of shrinkage methods from
#' external packages will likely evolve over time. We will likely incorporate an
#' \code{lfcThreshold} argument which can be passed to apeglm
#' to specify regions of the posterior at an arbitrary threshold.
#'
#' For \code{type="normal"}, shrinkage cannot be applied to coefficients
#' in a model with interaction terms.
#'
#' @param dds a DESeqDataSet object, after running \code{\link{DESeq}}
#' @param coef the name or number of the coefficient (LFC) to shrink,
#' consult \code{resultsNames(dds)} after running \code{DESeq(dds)}.
#' note: only \code{coef} or \code{contrast} can be specified, not both.
#' \code{type="apeglm"} requires use of \code{coef}.
#' @param contrast see argument description in \code{\link{results}}.
#' only \code{coef} or \code{contrast} can be specified, not both.
#' @param res a DESeqResults object. Results table produced by the
#' default pipeline, i.e. \code{DESeq} followed by \code{results}.
#' If not provided, it will be generated internally using \code{coef} or \code{contrast} #' @param type \code{"normal"} is the original DESeq2 shrinkage estimator; #' \code{"apeglm"} is the adaptive t prior shrinkage estimator from the 'apeglm' package; #' \code{"ashr"} is the adaptive shrinkage estimator from the 'ashr' package, #' using a fitted mixture of normals prior #' - see the Stephens (2016) reference below for citation #' @param svalue logical, should p-values and adjusted p-values be replaced #' with s-values when using \code{apeglm} or \code{ashr}. #' See Stephens (2016) reference on s-values. #' @param returnList logical, should \code{lfcShrink} return a list, where #' the first element is the results table, and the second element is the #' output of \code{apeglm} or \code{ashr} #' @param apeAdapt logical, should \code{apeglm} use the MLE estimates of #' LFC to adapt the prior, or use default or specified \code{prior.control} #' @param parallel if FALSE, no parallelization. if TRUE, parallel #' execution using \code{BiocParallel}, see same argument of \code{\link{DESeq}} #' parallelization only used with \code{normal} or \code{apeglm} #' @param BPPARAM see same argument of \code{\link{DESeq}} #' @param ... arguments passed to \code{apeglm} and \code{ashr} #' #' @references #' #' \code{type="normal"}: #' #' Love, M.I., Huber, W., Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15:550. \url{https://doi.org/10.1186/s13059-014-0550-8} #' #' \code{type="ashr"}: #' #' Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2. \url{https://doi.org/10.1093/biostatistics/kxw041} #' #' @return a DESeqResults object with the \code{log2FoldChange} and \code{lfcSE} #' columns replaced with shrunken LFC and SE. #' \code{priorInfo(res)} contains information about the shrinkage procedure, #' relevant to the various methods specified by \code{type}. #' #' @export #' #' @examples #' #' set.seed(1) #' dds <- makeExampleDESeqDataSet(n=500,betaSD=1) #' dds <- DESeq(dds) #' res <- results(dds) #' #' res.shr <- lfcShrink(dds=dds, coef=2) #' res.shr <- lfcShrink(dds=dds, contrast=c("condition","B","A")) #' res.ape <- lfcShrink(dds=dds, coef=2, type="apeglm") #' res.ash <- lfcShrink(dds=dds, coef=2, type="ashr") #' lfcShrink <- function(dds, coef, contrast, res, type=c("normal","apeglm","ashr"), svalue=FALSE, returnList=FALSE, apeAdapt=TRUE, parallel=FALSE, BPPARAM=bpparam(), ...) 
{ # TODO: lfcThreshold for types: normal and apeglm type <- match.arg(type, choices=c("normal","apeglm","ashr")) if (attr(dds,"betaPrior")) { stop("lfcShrink should be used downstream of DESeq() with betaPrior=FALSE (the default)") } if (!missing(coef)) { if (is.numeric(coef)) { stopifnot(coef <= length(resultsNames(dds))) coefAlpha <- resultsNames(dds)[coef] coefNum <- coef } else if (is.character(coef)) { stopifnot(coef %in% resultsNames(dds)) coefNum <- which(resultsNames(dds) == coef) coefAlpha <- coef } } if (missing(res)) { if (!missing(coef)) { res <- results(dds, name=coefAlpha) } else if (!missing(contrast)) { res <- results(dds, contrast=contrast) } else { stop("one of coef or contrast required if 'res' is missing") } } if (type %in% c("normal","apeglm")) { if (is.null(dispersions(dds))) { stop("type='normal' and 'apeglm' require dispersion estimates, first call estimateDispersions()") } stopifnot(all(rownames(dds) == rownames(res))) if (parallel) { nworkers <- BPPARAM$workers parallelIdx <- factor(sort(rep(seq_len(nworkers),length=nrow(dds)))) } } if (type == "normal") { ############ ## normal ## ############ termsOrder <- attr(terms.formula(design(dds)),"order") interactionPresent <- any(termsOrder > 1) if (interactionPresent) { stop("LFC shrinkage type='normal' not implemented for designs with interactions") } stopifnot(missing(coef) | missing(contrast)) # find and rename the MLE columns for estimateBetaPriorVar betaCols <- grep("log2 fold change \\(MLE\\)", mcols(mcols(dds))$description) stopifnot(length(betaCols) > 0) if (!any(grepl("MLE_",names(mcols(dds))[betaCols]))) { names(mcols(dds))[betaCols] <- paste0("MLE_", names(mcols(dds))[betaCols]) } if (missing(contrast)) { modelMatrixType <- "standard" } else { modelMatrixType <- "expanded" } attr(dds,"modelMatrixType") <- modelMatrixType betaPriorVar <- estimateBetaPriorVar(dds) stopifnot(length(betaPriorVar) > 0) # parallel fork if (!parallel) { dds.shr <- nbinomWaldTest(dds, betaPrior=TRUE, betaPriorVar=betaPriorVar, modelMatrixType=modelMatrixType, quiet=TRUE) } else { dds.shr <- do.call(rbind, bplapply(levels(parallelIdx), function(l) { nbinomWaldTest(dds[parallelIdx == l,,drop=FALSE], betaPrior=TRUE, betaPriorVar=betaPriorVar, modelMatrixType=modelMatrixType, quiet=TRUE) }, BPPARAM=BPPARAM)) } if (missing(contrast)) { # parallel not necessary here res.shr <- results(dds.shr, name=coefAlpha) } else { res.shr <- results(dds.shr, contrast=contrast, parallel=parallel, BPPARAM=BPPARAM) } res$log2FoldChange <- res.shr$log2FoldChange res$lfcSE <- res.shr$lfcSE mcols(res)$description[2:3] <- mcols(res.shr)$description[2:3] deseq2.version <- packageVersion("DESeq2") priorInfo(res) <- list(type="normal", package="DESeq2", version=deseq2.version, betaPriorVar=betaPriorVar) return(res) } else if (type == "apeglm") { ############ ## apeglm ## ############ if (!requireNamespace("apeglm", quietly=TRUE)) { stop("type='apeglm' requires installing the Bioconductor package 'apeglm'") } message("using 'apeglm' for LFC shrinkage") if (!missing(contrast)) { stop("type='apeglm' shrinkage only for use with 'coef'") } stopifnot(!missing(coef)) incomingCoef <- gsub(" ","_",sub("log2 fold change \\(MLE\\): ","",mcols(res)[2,2])) if (coefAlpha != incomingCoef) { stop("'coef' should specify same coefficient as in results 'res'") } Y <- counts(dds) if (attr(dds, "modelMatrixType") == "user-supplied") { message("using supplied model matrix") design <- attr(dds, "modelMatrix") } else { design <- model.matrix(design(dds), data=colData(dds)) } disps <- 
dispersions(dds) if (is.null(normalizationFactors(dds))) { offset <- matrix(log(sizeFactors(dds)), nrow=nrow(dds), ncol=ncol(dds), byrow=TRUE) } else { offset <- log(normalizationFactors(dds)) } if ("weights" %in% assayNames(dds)) { weights <- assays(dds)[["weights"]] } else { weights <- matrix(1, nrow=nrow(dds), ncol=ncol(dds)) } if (apeAdapt) { mle <- log(2) * cbind(res$log2FoldChange, res$lfcSE) } else { mle <- NULL } if (!parallel) { fit <- apeglm::apeglm(Y=Y, x=design, log.lik=apeglm::logLikNB, param=disps, coef=coefNum, mle=mle, weights=weights, offset=offset, ...) } else { fitList <- bplapply(levels(parallelIdx), function(l) { idx <- parallelIdx == l apeglm::apeglm(Y=Y[idx,,drop=FALSE], x=design, log.lik=apeglm::logLikNB, param=disps[idx], coef=coefNum, mle=mle, weights=weights[idx,,drop=FALSE], offset=offset[idx,,drop=FALSE], ...) }) fit <- list() for (param in c("map","se","fsr","svalue","interval","diag")) { fit[[param]] <- do.call(rbind, lapply(fitList, `[[`, param)) } fit$prior.control <- fitList[[1]]$prior.control fit$svalue <- apeglm::svalue(fit$fsr[,1]) } stopifnot(nrow(fit$map) == nrow(dds)) conv <- fit$diag[,"conv"] if (!all(conv[!is.na(conv)] == 0)) { message("Some rows did not converge in finding the MAP") } res$log2FoldChange <- log2(exp(1)) * fit$map[,coefNum] res$lfcSE <- log2(exp(1)) * fit$se[,coefNum] mcols(res)$description[2] <- sub("MLE","MAP",mcols(res)$description[2]) if (svalue) { coefAlphaSpaces <- gsub("_"," ",coefAlpha) res <- res[,1:3] res$svalue <- as.numeric(fit$svalue) mcols(res)[4,] <- DataFrame(type="results", description=paste("s-value:",coefAlphaSpaces)) } else{ res <- res[,c(1:3,5:6)] } priorInfo(res) <- list(type="apeglm", package="apeglm", version=packageVersion("apeglm"), prior.control=fit$prior.control) if (returnList) { return(list(res=res, fit=fit)) } else{ return(res) } } else if (type == "ashr") { ########## ## ashr ## ########## if (!requireNamespace("ashr", quietly=TRUE)) { stop("type='ashr' requires installing the CRAN package 'ashr'") } message("using 'ashr' for LFC shrinkage. If used in published research, please cite: Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2. https://doi.org/10.1093/biostatistics/kxw041") betahat <- res$log2FoldChange sebetahat <- res$lfcSE fit <- ashr::ash(betahat, sebetahat, mixcompdist="normal", method="shrink", ...) res$log2FoldChange <- fit$result$PosteriorMean res$lfcSE <- fit$result$PosteriorSD mcols(res)$description[2] <- sub("MLE","PostMean",mcols(res)$description[2]) if (svalue) { coefAlphaSpaces <- sub(".*p-value: ","",mcols(res)$description[5]) res <- res[,1:3] res$svalue <- fit$result$svalue mcols(res)[4,] <- DataFrame(type="results", description=paste("s-value:",coefAlphaSpaces)) } else { res <- res[,c(1:3,5:6)] } priorInfo(res) <- list(type="ashr", package="ashr", version=packageVersion("ashr"), fitted_g=fit$fitted_g) if (returnList) { return(list(res=res, fit=fit)) } else{ return(res) } } } DESeq2/R/methods.R0000644000175400017540000010752513201671732014626 0ustar00biocbuildbiocbuildcounts.DESeqDataSet <- function(object, normalized=FALSE, replaced=FALSE) { # Temporary hack for backward compatibility with "old" DESeqDataSet # objects. Remove once all serialized DESeqDataSet objects around have # been updated. if (!.hasSlot(object, "rowRanges")) object <- updateObject(object) if (replaced) { if ("replaceCounts" %in% assayNames(object)) { cnts <- assays(object)[["replaceCounts"]] } else { warning("there are no assays named 'replaceCounts', using original. 
calling DESeq() will replace outliers if they are detected and store this assay.") cnts <- assays(object)[["counts"]] } } else { cnts <- assays(object)[["counts"]] } if (!normalized) { return(cnts) } else { if (!is.null(normalizationFactors(object))) { return( cnts / normalizationFactors(object) ) } else if (is.null(sizeFactors(object)) || any(is.na(sizeFactors(object)))) { stop("first calculate size factors, add normalizationFactors, or set normalized=FALSE") } else { return( t( t( cnts ) / sizeFactors(object) ) ) } } } #' Accessors for the 'counts' slot of a DESeqDataSet object. #' #' The counts slot holds the count data as a matrix of non-negative integer #' count values, one row for each observational unit (gene or the like), and one #' column for each sample. #' #' @docType methods #' @name counts #' @rdname counts #' @aliases counts counts,DESeqDataSet-method counts<-,DESeqDataSet,matrix-method #' #' @param object a \code{DESeqDataSet} object. #' @param normalized logical indicating whether or not to divide the counts by #' the size factors or normalization factors before returning #' (normalization factors always preempt size factors) #' @param replaced after a \code{DESeq} call, this argument will return the counts #' with outliers replaced instead of the original counts, and optionally \code{normalized}. #' The replaced counts are stored by \code{DESeq} in \code{assays(object)[['replaceCounts']]}. #' @param value an integer matrix #' @author Simon Anders #' @seealso \code{\link{sizeFactors}}, \code{\link{normalizationFactors}} #' #' @examples #' #' dds <- makeExampleDESeqDataSet(m=4) #' head(counts(dds)) #' #' dds <- estimateSizeFactors(dds) # run this or DESeq() first #' head(counts(dds, normalized=TRUE)) #' #' @export setMethod("counts", signature(object="DESeqDataSet"), counts.DESeqDataSet) #' @name counts #' @rdname counts #' @exportMethod "counts<-" setReplaceMethod("counts", signature(object="DESeqDataSet", value="matrix"), function( object, value ) { assays(object)[["counts"]] <- value validObject(object) object }) design.DESeqDataSet <- function(object) object@design #' Accessors for the 'design' slot of a DESeqDataSet object. #' #' The design holds the R \code{formula} which expresses how the #' counts depend on the variables in \code{colData}. #' See \code{\link{DESeqDataSet}} for details. #' #' @docType methods #' @name design #' @rdname design #' @aliases design design,DESeqDataSet-method design<-,DESeqDataSet,formula-method #' @param object a \code{DESeqDataSet} object #' @param value a \code{formula} used for estimating dispersion #' and fitting Negative Binomial GLMs #' @examples #' #' dds <- makeExampleDESeqDataSet(m=4) #' design(dds) <- formula(~ 1) #' #' @export setMethod("design", signature(object="DESeqDataSet"), design.DESeqDataSet) #' @name design #' @rdname design #' @exportMethod "design<-" setReplaceMethod("design", signature(object="DESeqDataSet", value="formula"), function( object, value ) { # Temporary hack for backward compatibility with "old" # DESeqDataSet objects. Remove once all serialized # DESeqDataSet objects around have been updated. if (!.hasSlot(object, "rowRanges")) object <- updateObject(object) object@design <- value validObject(object) object }) dispersionFunction.DESeqDataSet <- function(object) object@dispersionFunction #' Accessors for the 'dispersionFunction' slot of a DESeqDataSet object. #' #' The dispersion function is calculated by \code{\link{estimateDispersions}} and #' used by \code{\link{varianceStabilizingTransformation}}. 
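#' As a minimal sketch of how this slot behaves: the stored value is itself a
#' function of the mean of normalized counts (the setter below evaluates it on
#' the vector of base means), so it can be called directly, e.g.
#' \preformatted{
#' dds <- makeExampleDESeqDataSet(m=4)
#' dds <- estimateSizeFactors(dds)
#' dds <- estimateDispersions(dds)
#' dispersionFunction(dds)(c(5, 50, 500)) # fitted dispersion at these means
#' }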
Parametric dispersion #' fits store the coefficients of the fit as attributes in this slot. #' #' Setting this will also overwrite \code{mcols(object)$dispFit} and estimate #' the variance of dispersion residuals; see \code{estimateVar} below. #' #' @docType methods #' @name dispersionFunction #' @rdname dispersionFunction #' @aliases dispersionFunction dispersionFunction,DESeqDataSet-method dispersionFunction<-,DESeqDataSet,function-method #' @param object a \code{DESeqDataSet} object. #' @param value a \code{function} #' @param estimateVar whether to estimate the variance of dispersion residuals. #' setting to FALSE is needed, e.g. within \code{estimateDispersionsMAP} when #' called on a subset of the full dataset in parallel execution. #' @param ... additional arguments #' #' @seealso \code{\link{estimateDispersions}} #' #' @examples #' #' dds <- makeExampleDESeqDataSet(m=4) #' dds <- estimateSizeFactors(dds) #' dds <- estimateDispersions(dds) #' dispersionFunction(dds) #' #' @export setMethod("dispersionFunction", signature(object="DESeqDataSet"), dispersionFunction.DESeqDataSet) #' @name dispersionFunction #' @rdname dispersionFunction #' @exportMethod "dispersionFunction<-" setReplaceMethod("dispersionFunction", signature(object="DESeqDataSet", value="function"), function(object, value, estimateVar=TRUE) { # Temporary hack for backward compatibility with "old" # DESeqDataSet objects. Remove once all serialized # DESeqDataSet objects around have been updated. if (!.hasSlot(object, "rowRanges")) object <- updateObject(object) if (estimateVar) { if (is.null(mcols(object)$baseMean) | is.null(mcols(object)$allZero)) { object <- getBaseMeansAndVariances(object) } if (!is.null(mcols(object)$dispFit)) { message("found already estimated fitted dispersions, removing these") mcols(object) <- mcols(object)[,!names(mcols(object)) == "dispFit",drop=FALSE] } nonzeroIdx <- !mcols(object)$allZero dispFit <- value(mcols(object)$baseMean[nonzeroIdx]) # if the function returns a single value, build the full vector if (length(dispFit) == 1) { dispFit <- rep(dispFit, sum(nonzeroIdx)) } dispDataFrame <- buildDataFrameWithNARows(list(dispFit=dispFit), mcols(object)$allZero) mcols(dispDataFrame) <- DataFrame(type="intermediate", description="fitted values of dispersion") mcols(object) <- cbind(mcols(object), dispDataFrame) # need to estimate variance of log dispersion residuals minDisp <- 1e-8 dispGeneEst <- mcols(object)$dispGeneEst[nonzeroIdx] aboveMinDisp <- dispGeneEst >= minDisp*100 if (sum(aboveMinDisp,na.rm=TRUE) > 0) { dispResiduals <- log(dispGeneEst) - log(dispFit) varLogDispEsts <- mad(dispResiduals[aboveMinDisp],na.rm=TRUE)^2 attr( value, "varLogDispEsts" ) <- varLogDispEsts } else { message("variance of dispersion residuals not estimated (necessary only for differential expression calling)") } } object@dispersionFunction <- value validObject(object) object }) dispersions.DESeqDataSet <- function(object) mcols(object)$dispersion #' Accessor functions for the dispersion estimates in a DESeqDataSet #' object. #' #' The dispersions for each row of the DESeqDataSet. Generally, #' these are set by \code{\link{estimateDispersions}}. #' #' @docType methods #' @name dispersions #' @rdname dispersions #' @aliases dispersions dispersions,DESeqDataSet-method dispersions<-,DESeqDataSet,numeric-method #' @param object a \code{DESeqDataSet} object. #' @param value the dispersions to use for the Negative Binomial modeling #' @param ...
additional arguments #' #' @author Simon Anders #' @seealso \code{\link{estimateDispersions}} #' #' @export setMethod("dispersions", signature(object="DESeqDataSet"), dispersions.DESeqDataSet) #' @name dispersions #' @rdname dispersions #' @exportMethod "dispersions<-" setReplaceMethod("dispersions", signature(object="DESeqDataSet", value="numeric"), function(object, value) { firstRowDataColumn <- ncol(mcols(object)) == 0 mcols(object)$dispersion <- value if (firstRowDataColumn) { mcols(mcols(object)) <- DataFrame(type="input", description="final estimate of dispersion") } validObject( object ) object }) sizeFactors.DESeqDataSet <- function(object) { if (!"sizeFactor" %in% names(colData(object))) return(NULL) sf <- object$sizeFactor names( sf ) <- colnames( object ) sf } #' Accessor functions for the 'sizeFactors' information in a DESeqDataSet #' object. #' #' The sizeFactors vector assigns to each column of the count matrix a value, the #' size factor, such that count values in the columns can be brought to a common #' scale by dividing by the corresponding size factor (as performed by #' \code{counts(dds, normalized=TRUE)}). #' See \code{\link{DESeq}} for a description of the use of size factors. If gene-specific normalization #' is desired for each sample, use \code{\link{normalizationFactors}}. #' #' @docType methods #' @name sizeFactors #' @rdname sizeFactors #' @aliases sizeFactors sizeFactors,DESeqDataSet-method sizeFactors<-,DESeqDataSet,numeric-method #' @param object a \code{DESeqDataSet} object. #' @param value a numeric vector, one size factor for each column in the count #' data. #' @author Simon Anders #' @seealso \code{\link{estimateSizeFactors}} #' #' @export setMethod("sizeFactors", signature(object="DESeqDataSet"), sizeFactors.DESeqDataSet) #' @name sizeFactors #' @rdname sizeFactors #' @exportMethod "sizeFactors<-" setReplaceMethod("sizeFactors", signature(object="DESeqDataSet", value="numeric"), function( object, value ) { stopifnot(all(!is.na(value))) stopifnot(all(is.finite(value))) stopifnot(all(value > 0)) # Temporary hack for backward compatibility with "old" # DESeqDataSet objects. Remove once all serialized # DESeqDataSet objects around have been updated. if (!.hasSlot(object, "rowRanges")) object <- updateObject(object) # have to make sure to remove sizeFactor which might be # coming from a previous CountDataSet object$sizeFactor <- value idx <- which(colnames(colData(object)) == "sizeFactor") metaDataFrame <- DataFrame(type="intermediate", description="a scaling factor for columns") mcols(colData(object))[idx,] <- metaDataFrame validObject( object ) object }) normalizationFactors.DESeqDataSet <- function(object) { # Temporary hack for backward compatibility with "old" DESeqDataSet # objects. Remove once all serialized DESeqDataSet objects around have # been updated. if (!.hasSlot(object, "rowRanges")) object <- updateObject(object) if (!"normalizationFactors" %in% assayNames(object)) return(NULL) assays(object)[["normalizationFactors"]] } #' Accessor functions for the normalization factors in a DESeqDataSet #' object. #' #' Gene-specific normalization factors for each sample can be provided as a matrix, #' which will preempt \code{\link{sizeFactors}}. In some experiments, counts for each #' sample have varying dependence on covariates, e.g. on GC-content for sequencing #' data run on different days, and in this case it makes sense to provide #' gene-specific factors for each sample rather than a single size factor. 
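#' As a sketch of the bookkeeping (assuming the factors have already been
#' supplied), the matrix divides the counts element-wise when normalizing:
#' \preformatted{
#' # counts(dds, normalized=TRUE) is then counts(dds) / normalizationFactors(dds)
#' head(counts(dds, normalized=TRUE))
#' }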
#' #' Normalization factors alter the model of \code{\link{DESeq}} in the following way, for #' counts \eqn{K_{ij}}{K_ij} and normalization factors \eqn{NF_{ij}}{NF_ij} for gene i and sample j: #' #' \deqn{ K_{ij} \sim \textrm{NB}( \mu_{ij}, \alpha_i) }{ K_ij ~ NB(mu_ij, alpha_i) } #' \deqn{ \mu_{ij} = NF_{ij} q_{ij} }{ mu_ij = NF_ij q_ij } #' #' @note Normalization factors are on the scale of the counts (similar to \code{\link{sizeFactors}}) #' and unlike offsets, which are typically on the scale of the predictors (in this case, log counts). #' Normalization factors should include library size normalization. They should have #' row-wise geometric mean near 1, as is the case with size factors, such that the mean of normalized #' counts is close to the mean of unnormalized counts. See example code below. #' #' @docType methods #' @name normalizationFactors #' @rdname normalizationFactors #' @aliases normalizationFactors normalizationFactors,DESeqDataSet-method normalizationFactors<-,DESeqDataSet,matrix-method #' @param object a \code{DESeqDataSet} object. #' @param value the matrix of normalization factors #' @param ... additional arguments #' @examples #' #' dds <- makeExampleDESeqDataSet(n=100, m=4) #' #' normFactors <- matrix(runif(nrow(dds)*ncol(dds),0.5,1.5), #' ncol=ncol(dds),nrow=nrow(dds), #' dimnames=list(1:nrow(dds),1:ncol(dds))) #' #' # the normalization factors matrix should not have 0's in it #' # it should have geometric mean near 1 for each row #' normFactors <- normFactors / exp(rowMeans(log(normFactors))) #' normalizationFactors(dds) <- normFactors #' #' dds <- DESeq(dds) #' #' @export setMethod("normalizationFactors", signature(object="DESeqDataSet"), normalizationFactors.DESeqDataSet) #' @name normalizationFactors #' @rdname normalizationFactors #' @exportMethod "normalizationFactors<-" setReplaceMethod("normalizationFactors", signature(object="DESeqDataSet", value="matrix"), function(object, value) { stopifnot(all(!is.na(value))) stopifnot(all(is.finite(value))) stopifnot(all(value > 0)) # Temporary hack for backward compatibility with "old" # DESeqDataSet objects. Remove once all serialized # DESeqDataSet objects around have been updated. if (!.hasSlot(object, "rowRanges")) object <- updateObject(object) # enforce same dimnames dimnames(value) <- dimnames(object) assays(object)[["normalizationFactors"]] <- value validObject( object ) object }) estimateSizeFactors.DESeqDataSet <- function(object, type=c("ratio","poscounts","iterate"), locfunc=stats::median, geoMeans, controlGenes, normMatrix) { type <- match.arg(type, c("ratio","poscounts","iterate")) # Temporary hack for backward compatibility with "old" DESeqDataSet # objects. Remove once all serialized DESeqDataSet objects around have # been updated. 
if (!.hasSlot(object, "rowRanges")) { object <- updateObject(object) } object <- sanitizeColData(object) if (type == "iterate") { sizeFactors(object) <- estimateSizeFactorsIterate(object) } else { if (type == "poscounts") { geoMeanNZ <- function(x) { if (all(x == 0)) { 0 } else { exp( sum(log(x[x > 0])) / length(x) ) } } geoMeans <- apply(counts(object), 1, geoMeanNZ) } if ("avgTxLength" %in% assayNames(object)) { nm <- assays(object)[["avgTxLength"]] nm <- nm / exp(rowMeans(log(nm))) # divide out the geometric mean normalizationFactors(object) <- estimateNormFactors(counts(object), normMatrix=nm, locfunc=locfunc, geoMeans=geoMeans, controlGenes=controlGenes) message("using 'avgTxLength' from assays(dds), correcting for library size") } else if (missing(normMatrix)) { sizeFactors(object) <- estimateSizeFactorsForMatrix(counts(object), locfunc=locfunc, geoMeans=geoMeans, controlGenes=controlGenes) } else { normalizationFactors(object) <- estimateNormFactors(counts(object), normMatrix=normMatrix, locfunc=locfunc, geoMeans=geoMeans, controlGenes=controlGenes) message("using 'normMatrix', adding normalization factors which correct for library size") } } object } #' Estimate the size factors for a \code{\link{DESeqDataSet}} #' #' This function estimates the size factors using the #' "median ratio method" described by Equation 5 in Anders and Huber (2010). #' The estimated size factors can be accessed using the accessor function \code{\link{sizeFactors}}. #' Alternative library size estimators can also be supplied #' using the assignment function \code{\link{sizeFactors<-}}. #' #' Typically, the function is called with the idiom: #' #' \code{dds <- estimateSizeFactors(dds)} #' #' See \code{\link{DESeq}} for a description of the use of size factors in the GLM. #' One should call this function after \code{\link{DESeqDataSet}} #' unless size factors are manually specified with \code{\link{sizeFactors}}. #' Alternatively, gene-specific normalization factors for each sample can be provided using #' \code{\link{normalizationFactors}} which will always preempt \code{\link{sizeFactors}} #' in calculations. #' #' Internally, the function calls \code{\link{estimateSizeFactorsForMatrix}}, #' which provides more details on the calculation. #' #' @docType methods #' @name estimateSizeFactors #' @rdname estimateSizeFactors #' @aliases estimateSizeFactors estimateSizeFactors,DESeqDataSet-method #' #' @param object a DESeqDataSet #' @param type Method for estimation: either "ratio", "poscounts", or "iterate". #' "ratio" uses the standard median ratio method introduced in DESeq. The size factor is the #' median ratio of the sample over a "pseudosample": for each gene, the geometric mean #' of all samples. #' "poscounts" and "iterate" offer alternative estimators, which can be #' used even when all genes contain a sample with a zero (a problem for the #' default method, as the geometric mean becomes zero, and the ratio undefined). #' The "poscounts" estimator deals with a gene with some zeros, by calculating a #' modified geometric mean by taking the n-th root of the product of the non-zero counts. #' This evolved out of use cases with Paul McMurdie's phyloseq package for metagenomic samples. #' The "iterate" estimator iterates between estimating the dispersion with a design of ~1, and #' finding a size factor vector by numerically optimizing the likelihood #' of the ~1 model. #' @param locfunc a function to compute a location for a sample. By default, the #' median is used. 
However, especially for low counts, the #' \code{\link[genefilter]{shorth}} function from the genefilter package may give better results. #' @param geoMeans by default this is not provided and the #' geometric means of the counts are calculated within the function. #' A vector of geometric means from another count matrix can be provided #' for a "frozen" size factor calculation. #' @param controlGenes optional, numeric or logical index vector specifying those genes to #' use for size factor estimation (e.g. housekeeping or spike-in genes) #' @param normMatrix optional, a matrix of normalization factors which do not yet #' control for library size. Note that this argument should not be used (and #' will be ignored) if the \code{dds} object was created using \code{tximport}. #' In this case, the information in \code{assays(dds)[["avgTxLength"]]} #' is automatically used to create appropriate normalization factors. #' Providing \code{normMatrix} will estimate size factors on the #' count matrix divided by \code{normMatrix} and store the product of the #' size factors and \code{normMatrix} as \code{\link{normalizationFactors}}. #' It is recommended to divide out the row-wise geometric mean of #' \code{normMatrix} so the rows are roughly centered on 1. #' #' @return The DESeqDataSet passed as a parameter, with the size factors filled #' in. #' @author Simon Anders #' @seealso \code{\link{estimateSizeFactorsForMatrix}} #' #' @references #' #' Reference for the median ratio method: #' #' Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 2010, 11:R106. \url{http://dx.doi.org/10.1186/gb-2010-11-10-r106} #' #' @examples #' #' dds <- makeExampleDESeqDataSet(n=1000, m=4) #' dds <- estimateSizeFactors(dds) #' sizeFactors(dds) #' #' dds <- estimateSizeFactors(dds, controlGenes=1:200) #' #' m <- matrix(runif(1000 * 4, .5, 1.5), ncol=4) #' dds <- estimateSizeFactors(dds, normMatrix=m) #' normalizationFactors(dds)[1:3,] #' #' geoMeans <- exp(rowMeans(log(counts(dds)))) #' dds <- estimateSizeFactors(dds,geoMeans=geoMeans) #' sizeFactors(dds) #' #' @export setMethod("estimateSizeFactors", signature(object="DESeqDataSet"), estimateSizeFactors.DESeqDataSet) estimateDispersions.DESeqDataSet <- function(object, fitType=c("parametric","local","mean"), maxit=100, quiet=FALSE, modelMatrix=NULL) { # Temporary hack for backward compatibility with "old" DESeqDataSet # objects. Remove once all serialized DESeqDataSet objects around have # been updated. if (!.hasSlot(object, "rowRanges")) object <- updateObject(object) if (is.null(sizeFactors(object)) & is.null(normalizationFactors(object))) { stop("first call estimateSizeFactors or provide a normalizationFactor matrix before estimateDispersions") } # size factors could have slipped into colData from a previous run if (!is.null(sizeFactors(object))) { if (!is.numeric(sizeFactors(object))) { stop("the sizeFactor column in colData is not numeric. this column could have come in during colData import and should be removed.") } if (any(is.na(sizeFactors(object)))) { stop("the sizeFactor column in colData contains NA. this column could have come in during colData import and should be removed.") } } if (all(rowSums(counts(object) == counts(object)[,1]) == ncol(object))) { stop("all genes have equal values for all samples.
will not be able to perform differential analysis") } if (!is.null(dispersions(object))) { if (!quiet) message("found already estimated dispersions, replacing these") mcols(object) <- mcols(object)[,!(mcols(mcols(object))$type %in% c("intermediate","results")),drop=FALSE] } stopifnot(length(maxit)==1) fitType <- match.arg(fitType, choices=c("parametric","local","mean")) noReps <- checkForExperimentalReplicates(object, modelMatrix) if (noReps) { designIn <- design(object) design(object) <- formula(~ 1) } if (!quiet) message("gene-wise dispersion estimates") object <- estimateDispersionsGeneEst(object, maxit=maxit, quiet=quiet, modelMatrix=modelMatrix) if (!quiet) message("mean-dispersion relationship") object <- estimateDispersionsFit(object, fitType=fitType, quiet=quiet) if (!quiet) message("final dispersion estimates") object <- estimateDispersionsMAP(object, maxit=maxit, quiet=quiet, modelMatrix=modelMatrix) # replace the previous design if (noReps) design(object) <- designIn return(object) } checkForExperimentalReplicates <- function(object, modelMatrix) { # Temporary hack for backward compatibility with "old" DESeqDataSet # objects. Remove once all serialized DESeqDataSet objects around have # been updated. if (!.hasSlot(object, "rowRanges")) object <- updateObject(object) noReps <- if (is.null(modelMatrix)) { mmtest <- getModelMatrix(object) nrow(mmtest) == ncol(mmtest) } else { nrow(modelMatrix) == ncol(modelMatrix) } if (noReps) { if (!is.null(modelMatrix)) stop("same number of samples and coefficients to fit with supplied model matrix") warning("same number of samples and coefficients to fit, estimating dispersion by treating samples as replicates. please read the ?DESeq section on 'Experiments without replicates'. in summary: this analysis is only potentially useful for data exploration; accurate differential expression analysis requires replication") } noReps } #' Estimate the dispersions for a DESeqDataSet #' #' This function obtains dispersion estimates for Negative Binomial distributed data. #' #' Typically the function is called with the idiom: #' #' \code{dds <- estimateDispersions(dds)} #' #' The fitting proceeds as follows: for each gene, an estimate of the dispersion #' is found which maximizes the Cox Reid-adjusted profile likelihood #' (the methods of Cox Reid-adjusted profile likelihood maximization for #' estimation of dispersion in RNA-Seq data were developed by McCarthy, #' et al. (2012), first implemented in the edgeR package in 2010); #' a trend line capturing the dispersion-mean relationship is fit to the maximum likelihood estimates; #' a normal prior is determined for the log dispersion estimates centered #' on the predicted value from the trended fit #' with variance equal to the difference between the observed variance of the #' log dispersion estimates and the expected sampling variance; #' finally maximum a posteriori dispersion estimates are returned. #' This final dispersion parameter is used in subsequent tests. #' The final dispersion estimates can be accessed from an object using \code{\link{dispersions}}. #' The fitted dispersion-mean relationship is also used in #' \code{\link{varianceStabilizingTransformation}}. #' All of the intermediate values (gene-wise dispersion estimates, fitted dispersion #' estimates from the trended fit, etc.) are stored in \code{mcols(dds)}, with #' information about these columns in \code{mcols(mcols(dds))}. #' #' The log normal prior on the dispersion parameter has been proposed #' by Wu, et al.
(2012) and is also implemented in the DSS package. #' #' In DESeq2, the dispersion estimation procedure described above replaces the #' different methods of dispersion estimation from the previous version of the DESeq package. #' #' \code{estimateDispersions} checks for the case of an analysis #' with as many samples as the number of coefficients to fit, #' and will temporarily substitute a design formula \code{~ 1} for the #' purposes of dispersion estimation. This treats the samples as #' replicates for the purpose of dispersion estimation. As mentioned in the DESeq paper: #' "While one may not want to draw strong conclusions from such an analysis, #' it may still be useful for exploration and hypothesis generation." #' #' The lower-level functions called by \code{estimateDispersions} are: #' \code{\link{estimateDispersionsGeneEst}}, #' \code{\link{estimateDispersionsFit}}, and #' \code{\link{estimateDispersionsMAP}}. #' #' @docType methods #' @name estimateDispersions #' @rdname estimateDispersions #' @aliases estimateDispersions estimateDispersions,DESeqDataSet-method #' @param object a DESeqDataSet #' @param fitType either "parametric", "local", or "mean" #' for the type of fitting of dispersions to the mean intensity. #' \itemize{ #' \item parametric - fit a dispersion-mean relation of the form: #' \deqn{dispersion = asymptDisp + extraPois / mean} #' via a robust gamma-family GLM. The coefficients \code{asymptDisp} and \code{extraPois} #' are given in the attribute \code{coefficients} of the \code{\link{dispersionFunction}} #' of the object. #' \item local - use the locfit package to fit a local regression #' of log dispersions over log base mean (normal scale means and dispersions #' are input and output for \code{\link{dispersionFunction}}). The points #' are weighted by normalized mean count in the local regression. #' \item mean - use the mean of gene-wise dispersion estimates. #' } #' @param maxit control parameter: maximum number of iterations to allow for convergence #' @param quiet whether to print messages at each step #' @param modelMatrix an optional matrix which will be used for fitting the expected counts. #' by default, the model matrix is constructed from \code{design(object)} #' #' @return The DESeqDataSet passed as a parameter, with the dispersion information #' filled in as metadata columns, accessible via \code{mcols}, or the final dispersions #' accessible via \code{\link{dispersions}}. #' #' @references \itemize{ #' \item Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 11 (2010) R106, \url{http://dx.doi.org/10.1186/gb-2010-11-10-r106} #' \item McCarthy, DJ, Chen, Y, Smyth, GK: Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40 (2012), 4288-4297, \url{http://dx.doi.org/10.1093/nar/gks042} #' \item Wu, H., Wang, C. & Wu, Z. A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics (2012).
\url{http://dx.doi.org/10.1093/biostatistics/kxs033} #' } #' #' @examples #' #' dds <- makeExampleDESeqDataSet() #' dds <- estimateSizeFactors(dds) #' dds <- estimateDispersions(dds) #' head(dispersions(dds)) #' #' @export setMethod("estimateDispersions", signature(object="DESeqDataSet"), estimateDispersions.DESeqDataSet) #' Show method for DESeqResults objects #' #' Prints out the information from the metadata columns #' of the results object regarding the log2 fold changes #' and p-values, then shows the DataFrame using the #' standard method. #' #' @docType methods #' @name show #' @rdname show #' @aliases show show,DESeqResults-method #' @author Michael Love #' #' @param object a DESeqResults object #' #' @export setMethod("show", signature(object="DESeqResults"), function(object) { cat(mcols(object)$description[ colnames(object) == "log2FoldChange"],"\n") cat(mcols(object)$description[ colnames(object) == "pvalue"],"\n") show(DataFrame(object)) }) #' Extract a matrix of model coefficients/standard errors #' #' \strong{Note:} results tables with log2 fold change, p-values, adjusted p-values, etc. #' for each gene are best generated using the \code{\link{results}} function. The \code{coef} #' function is designed for advanced users who wish to inspect all model coefficients at once. #' #' Estimated model coefficients or estimated standard errors are provided in a matrix #' form, number of genes by number of parameters, on the log2 scale. #' The columns correspond to columns of the model matrix for final GLM fitting, i.e., #' \code{attr(dds, "modelMatrix")}. #' #' @param object a DESeqDataSet returned by \code{\link{DESeq}}, \code{\link{nbinomWaldTest}}, #' or \code{\link{nbinomLRT}}. #' @param SE whether to give the standard errors instead of coefficients. #' defaults to FALSE so that the coefficients are given. #' @param ... additional arguments #' #' @docType methods #' @name coef #' @rdname coef #' @aliases coef coef.DESeqDataSet #' @author Michael Love #' @importFrom stats coef #' #' @examples #' #' dds <- makeExampleDESeqDataSet(m=4) #' dds <- DESeq(dds) #' coef(dds)[1,] #' coef(dds, SE=TRUE)[1,] #' #' @export coef.DESeqDataSet <- function(object, SE=FALSE, ...) { # Temporary hack for backward compatibility with "old" DESeqDataSet # objects. Remove once all serialized DESeqDataSet objects around have # been updated. if (!.hasSlot(object, "rowRanges")) object <- updateObject(object) resNms <- resultsNames(object) if (length(resNms) == 0) { stop("no coefficients have been generated yet, first call DESeq()") } if (!SE) { as.matrix(mcols(object,use.names=TRUE)[resNms]) } else { as.matrix(mcols(object,use.names=TRUE)[paste0("SE_",resNms)]) } } #' Summarize DESeq results #' #' Print a summary of the results from a DESeq analysis. #' #' @usage #' \method{summary}{DESeqResults}(object, alpha, \dots) #' #' @param object a \code{\link{DESeqResults}} object #' @param alpha the adjusted p-value cutoff. If not set, this #' defaults to the \code{alpha} argument which was used in #' \code{\link{results}} to set the target FDR for independent #' filtering, or if independent filtering was not performed, #' to 0.1. #' @param ...
additional arguments #' #' @docType methods #' @name summary #' @rdname summary #' @aliases summary summary.DESeqResults #' @author Michael Love #' #' @examples #' #' dds <- makeExampleDESeqDataSet(m=4) #' dds <- DESeq(dds) #' res <- results(dds) #' summary(res) #' #' @export summary.DESeqResults <- function(object, alpha, ...) { if (missing(alpha)) { alpha <- if (is.null(metadata(object)$alpha)) { 0.1 } else { metadata(object)$alpha } } cat("\n") notallzero <- sum(object$baseMean > 0) up <- sum(object$padj < alpha & object$log2FoldChange > 0, na.rm=TRUE) down <- sum(object$padj < alpha & object$log2FoldChange < 0, na.rm=TRUE) filt <- sum(!is.na(object$pvalue) & is.na(object$padj)) outlier <- sum(object$baseMean > 0 & is.na(object$pvalue)) ft <- if (is.null(metadata(object)$filterThreshold)) { 0 } else { round(metadata(object)$filterThreshold) } ihw <- "ihwResult" %in% names(metadata(object)) printsig <- function(x) format(x, digits=2) cat("out of",notallzero,"with nonzero total read count\n") cat(paste0("adjusted p-value < ",alpha,"\n")) cat(paste0("LFC > 0 (up) : ",up,", ",printsig(up/notallzero*100),"% \n")) cat(paste0("LFC < 0 (down) : ",down,", ",printsig(down/notallzero*100),"% \n")) cat(paste0("outliers [1] : ",outlier,", ",printsig(outlier/notallzero*100),"% \n")) if (!ihw) cat(paste0("low counts [2] : ",filt,", ",printsig(filt/notallzero*100),"% \n")) if (!ihw) cat(paste0("(mean count < ",ft,")\n")) cat("[1] see 'cooksCutoff' argument of ?results\n") if (!ihw) cat("[2] see 'independentFiltering' argument of ?results\n") if (ihw) cat("[2] see metadata(res)$ihwResult on hypothesis weighting\n") cat("\n") } #' Accessors for the 'priorInfo' slot of a DESeqResults object. #' #' The priorInfo slot contains details about the prior on log fold changes #' #' @docType methods #' @name priorInfo #' @rdname priorInfo #' @aliases priorInfo priorInfo,DESeqResults-method priorInfo<-,DESeqResults,list-method #' #' @param object a \code{DESeqResults} object #' @param value a \code{list} #' @param ... additional arguments #' #' @export setMethod("priorInfo", signature(object="DESeqResults"), function(object) object@priorInfo) #' @name priorInfo #' @rdname priorInfo #' @exportMethod "priorInfo<-" setReplaceMethod("priorInfo", signature(object="DESeqResults", value="list"), function(object, value) { object@priorInfo <- value object }) DESeq2/R/plots.R0000644000175400017540000003277613201671732014331 0ustar00biocbuildbiocbuildplotDispEsts.DESeqDataSet <- function( object, ymin, CV=FALSE, genecol = "black", fitcol = "red", finalcol = "dodgerblue", legend=TRUE, xlab, ylab, log = "xy", cex = 0.45, ... 
) { if (missing(xlab)) xlab <- "mean of normalized counts" if (missing(ylab)) { if (CV) { ylab <- "coefficient of variation" } else { ylab <- "dispersion" } } px = mcols(object)$baseMean sel = (px>0) px = px[sel] # transformation of dispersion into CV or not f <- if (CV) sqrt else I py = f(mcols(object)$dispGeneEst[sel]) if(missing(ymin)) ymin = 10^floor(log10(min(py[py>0], na.rm=TRUE))-0.1) plot(px, pmax(py, ymin), xlab=xlab, ylab=ylab, log=log, pch=ifelse(py < ymin, 6, 20), col=genecol, cex=cex, ...) # overlay the final (MAP) estimates and the fitted trend if (!is.null(dispersions(object))) { points(px, f(dispersions(object)[sel]), col=finalcol, cex=cex, pch=20) } if (!is.null(mcols(object)$dispFit)) { points(px, f(mcols(object)$dispFit[sel]), col=fitcol, cex=cex, pch=20) } if (legend) { legend("bottomright", c("gene-est","fitted","final"), pch=20, col=c(genecol, fitcol, finalcol), bg="white") } } setMethod("plotDispEsts", signature(object="DESeqDataSet"), plotDispEsts.DESeqDataSet) plotPCA.DESeqTransform <- function(object, intgroup="condition", ntop=500, returnData=FALSE) { # calculate the variance for each gene rv <- rowVars(assay(object)) # select the ntop genes by variance select <- order(rv, decreasing=TRUE)[seq_len(min(ntop, length(rv)))] # perform a PCA on the data in assay(x) for the selected genes pca <- prcomp(t(assay(object)[select,])) # the contribution to the total variance for each component percentVar <- pca$sdev^2 / sum( pca$sdev^2 ) if (!all(intgroup %in% names(colData(object)))) { stop("the argument 'intgroup' should specify columns of colData(dds)") } intgroup.df <- as.data.frame(colData(object)[, intgroup, drop=FALSE]) # add the intgroup factors together to create a new grouping factor group <- if (length(intgroup) > 1) { factor(apply( intgroup.df, 1, paste, collapse=":")) } else { colData(object)[[intgroup]] } # assemble the data for the plot d <- data.frame(PC1=pca$x[,1], PC2=pca$x[,2], group=group, intgroup.df, name=colnames(object)) if (returnData) { attr(d, "percentVar") <- percentVar[1:2] return(d) } ggplot(data=d, aes_string(x="PC1", y="PC2", color="group")) + geom_point(size=3) + xlab(paste0("PC1: ",round(percentVar[1] * 100),"% variance")) + ylab(paste0("PC2: ",round(percentVar[2] * 100),"% variance")) + coord_fixed() } #' Sample PCA plot for transformed data #' #' This plot helps to check for batch effects and the like. #' #' @docType methods #' @name plotPCA #' @rdname plotPCA #' @aliases plotPCA plotPCA,DESeqTransform-method #' #' @param object a \code{\link{DESeqTransform}} object, with data in \code{assay(x)}, #' produced for example by either \code{\link{rlog}} or #' \code{\link{varianceStabilizingTransformation}}. #' @param intgroup interesting groups: a character vector of #' names in \code{colData(x)} to use for grouping #' @param ntop number of top genes to use for principal components, #' selected by highest row variance #' @param returnData should the function only return the data.frame of PC1 and PC2 #' with intgroup covariates for custom plotting (default is FALSE) #' #' @return An object created by \code{ggplot}, which can be assigned and further customized. #' #' @author Wolfgang Huber #' #' @note See the vignette for an example of variance stabilization and PCA plots. #' Note that the source code of \code{plotPCA} is very simple. #' The source can be found by typing \code{DESeq2:::plotPCA.DESeqTransform} #' or \code{getMethod("plotPCA","DESeqTransform")}, or #' browsed on github at \url{https://github.com/Bioconductor-mirror/DESeq2/blob/master/R/plots.R} #' Users should find it easy to customize this function. #' #' @examples #' #' # using rlog transformed data: #' dds <- makeExampleDESeqDataSet(betaSD=1) #' rld <- rlog(dds) #' plotPCA(rld) #' #' # also possible to perform custom transformation: #' dds <- estimateSizeFactors(dds) #' # shifted log of normalized counts #' se <- SummarizedExperiment(log2(counts(dds, normalized=TRUE) + 1), #' colData=colData(dds)) #' # the call to DESeqTransform() is needed to #' # trigger our plotPCA method. #' plotPCA( DESeqTransform( se ) ) #' #' @export setMethod("plotPCA", signature(object="DESeqTransform"), plotPCA.DESeqTransform) #' Plot of normalized counts for a single gene #' #' Normalized counts plus a pseudocount of 0.5 are shown by default. #' #' @param dds a \code{DESeqDataSet} #' @param gene a character, specifying the name of the gene to plot #' @param intgroup interesting groups: a character vector of names in \code{colData(x)} to use for grouping #' @param normalized whether the counts should be normalized by size factor #' (default is TRUE) #' @param transform whether to have log scale y-axis or not.
#' defaults to TRUE #' @param main as in 'plot' #' @param xlab as in 'plot' #' @param returnData should the function only return the data.frame of counts and #' covariates for custom plotting (default is FALSE) #' @param replaced use the outlier-replaced counts if they exist #' @param pc pseudocount for log transform #' @param ... arguments passed to plot #' #' @examples #' #' dds <- makeExampleDESeqDataSet() #' plotCounts(dds, "gene1") #' #' @export plotCounts <- function(dds, gene, intgroup="condition", normalized=TRUE, transform=TRUE, main, xlab="group", returnData=FALSE, replaced=FALSE, pc, ...) { stopifnot(length(gene) == 1 & (is.character(gene) | (is.numeric(gene) & (gene >= 1 & gene <= nrow(dds))))) if (!all(intgroup %in% names(colData(dds)))) stop("all variables in 'intgroup' must be columns of colData") stopifnot(returnData | all(sapply(intgroup, function(v) is(colData(dds)[[v]], "factor")))) if (missing(pc)) { pc <- if (transform) 0.5 else 0 } if (is.null(sizeFactors(dds)) & is.null(normalizationFactors(dds))) { dds <- estimateSizeFactors(dds) } cnts <- counts(dds,normalized=normalized,replaced=replaced)[gene,] group <- if (length(intgroup) == 1) { colData(dds)[[intgroup]] } else if (length(intgroup) == 2) { lvls <- as.vector(t(outer(levels(colData(dds)[[intgroup[1]]]), levels(colData(dds)[[intgroup[2]]]), function(x,y) paste(x,y,sep=":")))) droplevels(factor(apply( as.data.frame(colData(dds)[, intgroup, drop=FALSE]), 1, paste, collapse=":"), levels=lvls)) } else { factor(apply( as.data.frame(colData(dds)[, intgroup, drop=FALSE]), 1, paste, collapse=":")) } data <- data.frame(count=cnts + pc, group=as.integer(group)) logxy <- if (transform) "y" else "" if (missing(main)) { main <- if (is.numeric(gene)) { rownames(dds)[gene] } else { gene } } ylab <- ifelse(normalized,"normalized count","count") if (returnData) return(data.frame(count=data$count, colData(dds)[intgroup])) plot(data$group + runif(ncol(dds),-.05,.05), data$count, xlim=c(.5,max(data$group)+.5), log=logxy, xaxt="n", xlab=xlab, ylab=ylab, main=main, ...) axis(1, at=seq_along(levels(group)), levels(group)) } #' Sparsity plot #' #' A simple plot of the concentration of counts in a single sample over the #' sum of counts per gene. Not technically the same as "sparsity", but this #' plot is a useful diagnostic for datasets which might not fit a negative #' binomial assumption: genes with many zeros and individual very large #' counts are difficult to model with the negative binomial distribution. #' #' @param x a matrix or DESeqDataSet #' @param normalized whether to normalize the counts from a DESeqDataSet #' @param ... passed to \code{plot} #' #' @examples #' #' dds <- makeExampleDESeqDataSet(n=1000,m=4,dispMeanRel=function(x) .5) #' dds <- estimateSizeFactors(dds) #' plotSparsity(dds) #' #' @export plotSparsity <- function(x, normalized=TRUE, ...) { if (is(x, "DESeqDataSet")) { x <- counts(x, normalized=normalized) } rs <- rowSums(x) rmx <- apply(x, 1, max) plot(rs[rs > 0], (rmx/rs)[rs > 0], log="x", ylim=c(0,1), xlab="sum of counts per gene", ylab="max count / sum", main="Concentration of counts over total sum of counts", ...)
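# values of max/sum near 1 flag genes whose counts come almost entirely from a single sample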
} # convenience function for adding alpha transparency to named colors ## col2useful <- function(col,alpha) { ## x <- col2rgb(col)/255 ## rgb(x[1],x[2],x[3],alpha) ## } DESeq2/R/results.R0000644000175400017540000013641013201671732014657 0ustar00biocbuildbiocbuild#' Extract results from a DESeq analysis #' #' \code{results} extracts a result table from a DESeq analysis giving base means across samples, #' log2 fold changes, standard errors, test statistics, p-values and adjusted p-values; #' \code{resultsNames} returns the names of the estimated effects (coefficients) of the model; #' \code{removeResults} returns a \code{DESeqDataSet} object with results columns removed. #' #' The results table when printed will provide the information about #' the comparison, e.g. "log2 fold change (MAP): condition treated vs untreated", meaning #' that the estimates are of log2(treated / untreated), as would be returned by #' \code{contrast=c("condition","treated","untreated")}. #' Multiple results can be returned for analyses beyond a simple two group comparison, #' so \code{results} takes arguments \code{contrast} and \code{name} to help #' the user pick out the comparisons of interest for printing a results table. #' The use of the \code{contrast} argument is recommended for exact specification #' of the levels which should be compared and their order. #' #' If \code{results} is run without specifying \code{contrast} or \code{name}, #' it will return the comparison of the last level of the last variable in the #' design formula over the first level of this variable. For example, for a simple two-group #' comparison, this would return the log2 fold changes of the second group over the #' first group (the reference level). Please see examples below and in the vignette. #' #' The argument \code{contrast} can be used to generate results tables for #' any comparison of interest, for example, the log2 fold change between #' two levels of a factor, and its usage is described below. It can also #' accommodate more complicated numeric comparisons. #' The test statistic used for a contrast is: #' #' \deqn{ c^t \beta / \sqrt{c^t \Sigma c } }{ c' beta / sqrt( c' Sigma c ) } #' #' The argument \code{name} can be used to generate results tables for #' individual effects, which must be individual elements of \code{resultsNames(object)}. #' These individual effects could represent continuous covariates, effects #' for individual levels, or individual interaction effects. #' #' Information on the comparison which was used to build the results table, #' and the statistical test which was used for p-values (Wald test or likelihood ratio test) #' is stored within the object returned by \code{results}. This information is in #' the metadata columns of the results table, which is accessible by calling \code{mcols} #' on the \code{\link{DESeqResults}} object returned by \code{results}. #' #' On p-values: #' #' By default, independent filtering is performed to select a set of genes #' for multiple test correction which maximizes the number of adjusted #' p-values less than a given critical value \code{alpha} (by default 0.1). #' See the reference in this man page for details on independent filtering. #' The filter used for maximizing the number of rejections is the mean #' of normalized counts for all samples in the dataset.
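#' As a sketch, this default filter statistic corresponds to
#' \preformatted{
#' filterStat <- rowMeans(counts(dds, normalized=TRUE)) # the 'baseMean' column
#' }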
#' Several arguments from the \code{\link[genefilter]{filtered_p}} function of #' the genefilter package (used within the \code{results} function) #' are provided here to control the independent filtering behavior. #' In DESeq2 version >= 1.10, the threshold that is chosen is #' the lowest quantile of the filter for which the #' number of rejections is close to the peak of a curve fit #' to the number of rejections over the filter quantiles. #' 'Close to' is defined as within 1 residual standard deviation. #' The adjusted p-values for the genes which do not pass the filter threshold #' are set to \code{NA}. #' #' By default, \code{results} assigns a p-value of \code{NA} #' to genes containing count outliers, as identified using Cook's distance. #' See the \code{cooksCutoff} argument for control of this behavior. #' Cook's distances for each sample are accessible as a matrix "cooks" #' stored in the \code{assays()} list. This measure is useful for identifying rows where the #' observed counts might not fit to a Negative Binomial distribution. #' #' For analyses using the likelihood ratio test (using \code{\link{nbinomLRT}}), #' the p-values are determined solely by the difference in deviance between #' the full and reduced model formula. A single log2 fold change is printed #' in the results table for consistency with other results table outputs, #' however the test statistic and p-values may nevertheless involve #' the testing of one or more log2 fold changes. #' Which log2 fold change is printed in the results table can be controlled #' using the \code{name} argument, or by default this will be the estimated #' coefficient for the last element of \code{resultsNames(object)}. #' #' @references Richard Bourgon, Robert Gentleman, Wolfgang Huber: Independent #' filtering increases detection power for high-throughput experiments. #' PNAS (2010), \url{http://dx.doi.org/10.1073/pnas.0914005107} #' #' @param object a DESeqDataSet, on which one #' of the following functions has already been called: #' \code{\link{DESeq}}, \code{\link{nbinomWaldTest}}, or \code{\link{nbinomLRT}} #' @param contrast this argument specifies what comparison to extract from #' the \code{object} to build a results table. one of either: #' \itemize{ #' \item a character vector with exactly three elements: #' the name of a factor in the design formula, #' the name of the numerator level for the fold change, #' and the name of the denominator level for the fold change #' (simplest case) #' \item a list of 2 character vectors: the names of the fold changes #' for the numerator, and the names of the fold changes #' for the denominator. #' these names should be elements of \code{resultsNames(object)}. #' if the list is length 1, a second element is added which is the #' empty character vector, \code{character()}. #' (more general case, can be to combine interaction terms and main effects) #' \item a numeric contrast vector with one element #' for each element in \code{resultsNames(object)} (most general case) #' } #' If specified, the \code{name} argument is ignored. #' @param name the name of the individual effect (coefficient) for #' building a results table. Use this argument rather than \code{contrast} #' for continuous variables, individual effects or for individual interaction terms. #' The value provided to \code{name} must be an element of \code{resultsNames(object)}. #' @param lfcThreshold a non-negative value which specifies a log2 fold change #' threshold. 
The default value is 0, corresponding to a test that #' the log2 fold changes are equal to zero. The user can #' specify the alternative hypothesis using the \code{altHypothesis} argument, #' which defaults to testing #' for log2 fold changes greater in absolute value than a given threshold. #' If \code{lfcThreshold} is specified, #' the results are for Wald tests, and LRT p-values will be overwritten. #' @param altHypothesis character which specifies the alternative hypothesis, #' i.e. those values of log2 fold change which the user is interested in #' finding. The complement of this set of values is the null hypothesis which #' will be tested. If the log2 fold change specified by \code{name} #' or by \code{contrast} is written as \eqn{ \beta }{ beta }, then the possible values for #' \code{altHypothesis} represent the following alternate hypotheses: #' \itemize{ #' \item greaterAbs: \eqn{|\beta| > \textrm{lfcThreshold} }{ |beta| > lfcThreshold }, #' and p-values are two-tailed #' \item lessAbs: \eqn{ |\beta| < \textrm{lfcThreshold} }{ |beta| < lfcThreshold }, #' NOTE: this requires that \code{betaPrior=FALSE} has been specified in the #' previous \code{\link{DESeq}} call. #' p-values are the maximum of the upper and lower tests. #' \item greater: \eqn{ \beta > \textrm{lfcThreshold} }{ beta > lfcThreshold } #' \item less: \eqn{ \beta < -\textrm{lfcThreshold} }{ beta < -lfcThreshold } #' } #' @param listValues only used if a list is provided to \code{contrast}: #' a numeric of length two: the log2 fold changes in the list are multiplied by these values. #' the first number should be positive and the second negative. #' by default this is \code{c(1,-1)} #' @param cooksCutoff threshold on Cook's distance, such that if one or more #' samples for a row have a higher distance, the p-value for the row is #' set to NA. The default cutoff is the .99 quantile of the F(p, m-p) distribution, #' where p is the number of coefficients being fitted and m is the number of samples. #' Set to \code{Inf} or \code{FALSE} to disable the resetting of p-values to NA. #' Note: this test excludes the Cook's distance of samples belonging to experimental #' groups with only 2 samples. #' @param independentFiltering logical, whether independent filtering should be #' applied automatically #' @param alpha the significance cutoff used for optimizing the independent #' filtering (by default 0.1). If the adjusted p-value cutoff (FDR) will be a #' value other than 0.1, \code{alpha} should be set to that value. #' @param filter the vector of filter statistics over which the independent #' filtering will be optimized. By default the mean of normalized counts is used. #' @param theta the quantiles at which to assess the number of rejections #' from independent filtering #' @param pAdjustMethod the method to use for adjusting p-values, see \code{?p.adjust} #' @param filterFun an optional custom function for performing independent filtering #' and p-value adjustment, with arguments \code{res} (a DESeqResults object), #' \code{filter} (the quantity for filtering tests), #' \code{alpha} (the target FDR), #' \code{pAdjustMethod}. This function should return a DESeqResults object #' with a \code{padj} column.
#' @param format character, either \code{"DataFrame"}, \code{"GRanges"}, or \code{"GRangesList"}, #' whether the results should be printed as a \code{\link{DESeqResults}} DataFrame, #' or if the results DataFrame should be attached as metadata columns to #' the \code{GRanges} or \code{GRangesList} \code{rowRanges} of the \code{DESeqDataSet}. #' If the \code{rowRanges} is a \code{GRangesList}, and \code{GRanges} is requested, #' the range of each gene will be returned #' @param test this is automatically detected internally if not provided. #' the one exception is after \code{nbinomLRT} has been run, \code{test="Wald"} #' will generate Wald statistics and Wald test p-values. #' @param addMLE if \code{betaPrior=TRUE} was used, #' whether the "unshrunken" maximum likelihood estimates (MLE) #' of log2 fold change should be added as a column to the results table (default is FALSE). #' This argument is preserved for backward compatibility, as now the #' recommended pipeline is to generate shrunken MAP estimates using \code{\link{lfcShrink}}. #' This argument functionality is only implemented for \code{contrast} #' specified as three-element character vectors. #' @param tidy whether to output the results table with rownames as a first column 'row'. #' the table will also be coerced to \code{data.frame} #' @param parallel if FALSE, no parallelization. if TRUE, parallel #' execution using \code{BiocParallel}, see next argument \code{BPPARAM} #' @param BPPARAM an optional parameter object passed internally #' to \code{\link{bplapply}} when \code{parallel=TRUE}. #' If not specified, the parameters last registered with #' \code{\link{register}} will be used. #' @param ... optional arguments passed to \code{filterFun} #' #' @return For \code{results}: a \code{\link{DESeqResults}} object, which is #' a simple subclass of DataFrame. This object contains the results columns: #' \code{baseMean}, \code{log2FoldChange}, \code{lfcSE}, \code{stat}, #' \code{pvalue} and \code{padj}, #' and also includes metadata columns of variable information. #' The \code{lfcSE} gives the standard error of the \code{log2FoldChange}. #' For the Wald test, \code{stat} is the Wald statistic: the \code{log2FoldChange} #' divided by \code{lfcSE}, which is compared to a standard Normal distribution #' to generate a two-tailed \code{pvalue}. For the likelihood ratio test (LRT), #' \code{stat} is the difference in deviance between the reduced model and the full model, #' which is compared to a chi-squared distribution to generate a \code{pvalue}. #' #' For \code{resultsNames}: the names of the columns available as results, #' usually a combination of the variable name and a level #' #' For \code{removeResults}: the original \code{DESeqDataSet} with results metadata columns removed #' #' @seealso \code{\link{DESeq}}, \code{\link[genefilter]{filtered_R}} #' #' @examples #' #' ## Example 1: two-group comparison #' #' dds <- makeExampleDESeqDataSet(m=4) #' #' dds <- DESeq(dds) #' res <- results(dds, contrast=c("condition","B","A")) #' #' # with more than two groups, the call would look similar, e.g.: #' # results(dds, contrast=c("condition","C","A")) #' # etc.
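#' # the same comparison can also be requested by coefficient name, or as a
#' # numeric contrast vector with one weight per element of resultsNames(dds)
#' # (here "Intercept" and "condition_B_vs_A"); a sketch of the interfaces
#' # described above:
#' results(dds, name="condition_B_vs_A")
#' results(dds, contrast=c(0, 1))
#'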
#' #' ## Example 2: two conditions, two genotypes, with an interaction term #' #' dds <- makeExampleDESeqDataSet(n=100,m=12) #' dds$genotype <- factor(rep(rep(c("I","II"),each=3),2)) #' #' design(dds) <- ~ genotype + condition + genotype:condition #' dds <- DESeq(dds) #' resultsNames(dds) #' #' # Note: designs with interaction terms by default have betaPrior=FALSE #' #' # the condition effect for genotype I (the main effect) #' results(dds, contrast=c("condition","B","A")) #' #' # the condition effect for genotype II #' # this is, by definition, the main effect *plus* the interaction term #' # (the extra condition effect in genotype II compared to genotype I). #' results(dds, list( c("condition_B_vs_A","genotypeII.conditionB") )) #' #' # the interaction term, answering: is the condition effect *different* across genotypes? #' results(dds, name="genotypeII.conditionB") #' #' ## Example 3: two conditions, three genotypes #' #' # ~~~ Using interaction terms ~~~ #' #' dds <- makeExampleDESeqDataSet(n=100,m=18) #' dds$genotype <- factor(rep(rep(c("I","II","III"),each=3),2)) #' design(dds) <- ~ genotype + condition + genotype:condition #' dds <- DESeq(dds) #' resultsNames(dds) #' #' # the condition effect for genotype I (the main effect) #' results(dds, contrast=c("condition","B","A")) #' #' # the condition effect for genotype III. #' # this is the main effect *plus* the interaction term #' # (the extra condition effect in genotype III compared to genotype I). #' results(dds, contrast=list( c("condition_B_vs_A","genotypeIII.conditionB") )) #' #' # the interaction term for condition effect in genotype III vs genotype I. #' # this tests if the condition effect is different in III compared to I #' results(dds, name="genotypeIII.conditionB") #' #' # the interaction term for condition effect in genotype III vs genotype II. #' # this tests if the condition effect is different in III compared to II #' results(dds, contrast=list("genotypeIII.conditionB", "genotypeII.conditionB")) #' #' # Note that a likelihood ratio test could be used to test if there are any #' # differences in the condition effect between the three genotypes. #' #' # ~~~ Using a grouping variable ~~~ #' #' # This is a useful construction when users just want to compare #' # specific groups which are combinations of variables. #' #' dds$group <- factor(paste0(dds$genotype, dds$condition)) #' design(dds) <- ~ group #' dds <- DESeq(dds) #' resultsNames(dds) #' #' # the condition effect for genotypeIII #' results(dds, contrast=c("group", "IIIB", "IIIA")) #' #' @rdname results #' @aliases results resultsNames removeResults #' @export results <- function(object, contrast, name, lfcThreshold=0, altHypothesis=c("greaterAbs","lessAbs","greater","less"), listValues=c(1,-1), cooksCutoff, independentFiltering=TRUE, alpha=0.1, filter, theta, pAdjustMethod="BH", filterFun, format=c("DataFrame","GRanges","GRangesList"), test, addMLE=FALSE, tidy=FALSE, parallel=FALSE, BPPARAM=bpparam(), ...)
{ # match args format <- match.arg(format, choices=c("DataFrame", "GRanges","GRangesList")) altHypothesis <- match.arg(altHypothesis, choices=c("greaterAbs","lessAbs","greater","less")) if (!missing(test)) { test <- match.arg(test, choices=c("Wald","LRT")) } # initial argument testing stopifnot(lfcThreshold >= 0) stopifnot(length(lfcThreshold)==1) stopifnot(length(alpha)==1) stopifnot(alpha > 0 & alpha < 1) stopifnot(length(pAdjustMethod)==1) stopifnot(length(listValues)==2 & is.numeric(listValues)) stopifnot(listValues[1] > 0 & listValues[2] < 0) if (!"results" %in% mcols(mcols(object))$type) { stop("couldn't find results. you should first run DESeq()") } if (missing(test)) { test <- attr(object, "test") } else if (test == "Wald" & attr(object, "test") == "LRT") { # initially test was LRT, now need to add Wald statistics and p-values object <- makeWaldTest(object) } else if (test == "LRT" & attr(object, "test") == "Wald") { stop("the LRT requires the user run nbinomLRT or DESeq(dds,test='LRT')") } if (lfcThreshold == 0 & altHypothesis == "lessAbs") { stop("when testing altHypothesis='lessAbs', set the argument lfcThreshold to a positive value") } if (addMLE) { if (!attr(object,"betaPrior")) { stop("addMLE=TRUE is only for when a beta prior was used. otherwise, the log2 fold changes are already MLE") } if (!missing(name) & missing(contrast)) { stop("addMLE=TRUE should be used by providing character vector of length 3 to 'contrast' instead of using 'name'") } } if (format == "GRanges" & is(rowRanges(object),"GRangesList")) { if (any(elementNROWS(rowRanges(object)) == 0)) { stop("rowRanges is GRangesList and one or more GRanges have length 0. Use format='DataFrame' or 'GRangesList'") } } if (!missing(contrast)) { if (attr(object,"modelMatrixType") == "user-supplied" & is.character(contrast)) { stop("only list- and numeric-type contrasts are supported for user-supplied model matrices") } } hasIntercept <- attr(terms(design(object)),"intercept") == 1 isExpanded <- attr(object, "modelMatrixType") == "expanded" termsOrder <- attr(terms.formula(design(object)),"order") # if neither 'contrast' nor 'name' were specified, create the default result table: # the last level / first level for the last variable in design. # (unless there are interactions, in which case the lastCoefName is pulled below) if ((test == "Wald") & (isExpanded | !hasIntercept) & missing(contrast) & missing(name) & all(termsOrder < 2)) { designVars <- all.vars(design(object)) lastVarName <- designVars[length(designVars)] lastVar <- colData(object)[[lastVarName]] if (is.factor(lastVar)) { nlvls <- nlevels(lastVar) contrast <- c(lastVarName, levels(lastVar)[nlvls], levels(lastVar)[1]) } } if (missing(name)) { name <- lastCoefName(object) } else { if (length(name) != 1 | !is.character(name)) { stop("the argument 'name' should be a character vector of length 1") } } WaldResults <- paste0("WaldPvalue_",name) %in% names(mcols(object)) LRTResults <- "LRTPvalue" %in% names(mcols(object)) # if performing a contrast call the function cleanContrast() if (!missing(contrast)) { resNames <- resultsNames(object) # do some arg checking/cleaning contrast <- checkContrast(contrast, resNames) ### cleanContrast call ### # need to go back to C++ code in order to build the beta covariance matrix # then this is multiplied by the numeric contrast to get the Wald statistic. 
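# note: the parallel branch below splits the rows into one chunk per worker, # runs cleanContrast() on each chunk via bplapply(), and rbinds the chunk # results back together in the original row order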
# with 100s of samples, this can get slow, so offer parallelization res <- if (!parallel) { cleanContrast(object, contrast, expanded=isExpanded, listValues=listValues, test=test) } else if (parallel) { nworkers <- BPPARAM$workers idx <- factor(sort(rep(seq_len(nworkers),length=nrow(object)))) do.call(rbind, bplapply(levels(idx), function(l) { cleanContrast(object[idx == l,,drop=FALSE], contrast, expanded=isExpanded, listValues=listValues, test=test) }, BPPARAM=BPPARAM)) } } else { # if not performing a contrast # pull relevant columns from mcols(object) log2FoldChange <- getCoef(object, name) lfcSE <- getCoefSE(object, name) stat <- getStat(object, test, name) pvalue <- getPvalue(object, test, name) res <- cbind(mcols(object)["baseMean"],log2FoldChange,lfcSE,stat,pvalue) names(res) <- c("baseMean","log2FoldChange","lfcSE","stat","pvalue") } rownames(res) <- rownames(object) # add unshrunken MLE coefficients to the results table if (addMLE) { if (is.numeric(contrast)) stop("addMLE only implemented for: contrast=c('condition','B','A')") if (is.list(contrast)) stop("addMLE only implemented for: contrast=c('condition','B','A')") res <- cbind(res, mleContrast(object, contrast)) res <- res[,c("baseMean","log2FoldChange","lfcMLE","lfcSE","stat","pvalue")] } # only if we need to generate new p-values if ( !(lfcThreshold == 0 & altHypothesis == "greaterAbs") ) { if (test == "LRT") { stop("tests of log fold change above or below a threshold must be Wald tests.") } if (altHypothesis == "greaterAbs") { newStat <- sign(res$log2FoldChange) * pmax(0, (abs(res$log2FoldChange) - lfcThreshold)) / res$lfcSE newPvalue <- pmin(1, 2 * pnorm(abs(res$log2FoldChange), mean = lfcThreshold, sd = res$lfcSE, lower.tail = FALSE)) } else if (altHypothesis == "lessAbs") { # check requirement if betaPrior was set to FALSE if (attr(object,"betaPrior")) { stop("testing altHypothesis='lessAbs' requires setting the DESeq() argument betaPrior=FALSE") } newStatAbove <- pmax(0, lfcThreshold - res$log2FoldChange) / res$lfcSE pvalueAbove <- pnorm(res$log2FoldChange, mean = lfcThreshold, sd = res$lfcSE, lower.tail = TRUE) newStatBelow <- pmax(0, res$log2FoldChange + lfcThreshold) / res$lfcSE pvalueBelow <- pnorm(res$log2FoldChange, mean = -lfcThreshold, sd = res$lfcSE, lower.tail = FALSE) newStat <- pmin(newStatAbove, newStatBelow) newPvalue <- pmax(pvalueAbove, pvalueBelow) } else if (altHypothesis == "greater") { newStat <- pmax(0, res$log2FoldChange - lfcThreshold) / res$lfcSE newPvalue <- pnorm(res$log2FoldChange, mean = lfcThreshold, sd = res$lfcSE, lower.tail = FALSE) } else if (altHypothesis == "less") { newStat <- pmax(0, lfcThreshold - res$log2FoldChange) / res$lfcSE newPvalue <- pnorm(res$log2FoldChange, mean = -lfcThreshold, sd = res$lfcSE, lower.tail = TRUE) } res$stat <- newStat res$pvalue <- newPvalue } # calculate Cook's cutoff m <- nrow(attr(object,"dispModelMatrix")) p <- ncol(attr(object,"dispModelMatrix")) # only if more samples than parameters: if (m > p) { defaultCutoff <- qf(.99, p, m - p) if (missing(cooksCutoff)) { cooksCutoff <- defaultCutoff } stopifnot(length(cooksCutoff)==1) if (is.logical(cooksCutoff) & cooksCutoff) { cooksCutoff <- defaultCutoff } } else { cooksCutoff <- FALSE } # apply cutoff based on maximum Cook's distance performCooksCutoff <- (is.numeric(cooksCutoff) | cooksCutoff) if ((m > p) & performCooksCutoff) { cooksOutlier <- mcols(object)$maxCooks > cooksCutoff res$pvalue[cooksOutlier] <- NA } # if original baseMean was positive, but now zero due to replaced counts, fill in results if (
sum(mcols(object)$replace, na.rm=TRUE) > 0) { nowZero <- which(mcols(object)$replace & mcols(object)$baseMean == 0) res$log2FoldChange[nowZero] <- 0 if (addMLE) { res$lfcMLE[nowZero] <- 0 } res$lfcSE[nowZero] <- 0 res$stat[nowZero] <- 0 res$pvalue[nowZero] <- 1 } # add prior information deseq2.version <- packageVersion("DESeq2") if (!attr(object,"betaPrior")) { priorInfo <- list(type="none", package="DESeq2", version=deseq2.version) } else { betaPriorVar <- attr(object, "betaPriorVar") priorInfo <- list(type="normal", package="DESeq2", version=deseq2.version, betaPriorVar=betaPriorVar) } # make results object deseqRes <- DESeqResults(res, priorInfo=priorInfo) # p-value adjustment if (missing(filterFun)) { deseqRes <- pvalueAdjustment(deseqRes, independentFiltering, filter, theta, alpha, pAdjustMethod) } else { deseqRes <- filterFun(deseqRes, filter, alpha, pAdjustMethod) } # remove rownames and attach as a new column, 'row' if (tidy) { colnms <- colnames(deseqRes) deseqRes$row <- rownames(deseqRes) mcols(deseqRes,use.names=TRUE)["row","type"] <- "results" mcols(deseqRes,use.names=TRUE)["row","description"] <- "row names" deseqRes <- deseqRes[,c("row",colnms)] rownames(deseqRes) <- NULL deseqRes <- as.data.frame(deseqRes) } if (format == "DataFrame") { return(deseqRes) } else if (format == "GRangesList") { if (class(rowRanges(object)) == "GRanges") warning("rowRanges is GRanges, not GRangesList; returning a GRanges with results as mcols") out <- rowRanges(object) mcols(out) <- deseqRes return(out) } else if (format == "GRanges") { if (class(rowRanges(object)) == "GRangesList") { message("rowRanges is GRangesList, performing unlist(range(x)) on the rowRanges") out <- unlist(range(rowRanges(object))) mcols(out) <- deseqRes return(out) } else { out <- rowRanges(object) mcols(out) <- deseqRes return(out) } } } #' @rdname results #' @export resultsNames <- function(object) { names(mcols(object))[grep("log2 fold change",mcols(mcols(object))$description)] } #' @rdname results #' @export removeResults <- function(object) { resCols <- mcols(mcols(object))$type == "results" if (sum(resCols,na.rm=TRUE) > 0) { mcols(object) <- mcols(object)[,-which(resCols),drop=FALSE] } return(object) } ########################################################### # unexported functions ########################################################### pvalueAdjustment <- function(res, independentFiltering, filter, theta, alpha, pAdjustMethod) { # perform independent filtering if (independentFiltering) { if (missing(filter)) { filter <- res$baseMean } if (missing(theta)) { lowerQuantile <- mean(filter == 0) if (lowerQuantile < .95) upperQuantile <- .95 else upperQuantile <- 1 theta <- seq(lowerQuantile, upperQuantile, length=50) } # do filtering using genefilter stopifnot(length(theta) > 1) stopifnot(length(filter) == nrow(res)) filtPadj <- filtered_p(filter=filter, test=res$pvalue, theta=theta, method=pAdjustMethod) numRej <- colSums(filtPadj < alpha, na.rm = TRUE) # prevent over-aggressive filtering when all genes are null, # by requiring the max number of rejections is above a fitted curve. # If the max number of rejections is not greater than 10, then don't # perform independent filtering at all.
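# note: 'numRej' counts adjusted p-values below 'alpha' at each candidate # filter quantile in 'theta'; a lowess curve is fit through these counts, and # the smallest theta whose rejection count exceeds the curve's maximum minus # one root-mean-square residual is selected below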
lo.fit <- lowess(numRej ~ theta, f=1/5) if (max(numRej) <= 10) { j <- 1 } else { residual <- if (all(numRej==0)) { 0 } else { numRej[numRej > 0] - lo.fit$y[numRej > 0] } thresh <- max(lo.fit$y) - sqrt(mean(residual^2)) j <- if (any(numRej > thresh)) { which(numRej > thresh)[1] } else { 1 } } # j <- which.max(numRej) # old method padj <- filtPadj[, j, drop=TRUE] cutoffs <- quantile(filter, theta) filterThreshold <- cutoffs[j] filterNumRej <- data.frame(theta=theta, numRej=numRej) filterTheta <- theta[j] metadata(res)[["filterThreshold"]] <- filterThreshold metadata(res)[["filterTheta"]] <- filterTheta metadata(res)[["filterNumRej"]] <- filterNumRej metadata(res)[["lo.fit"]] <- lo.fit metadata(res)[["alpha"]] <- alpha } else { # regular p-value adjustment # does not include those rows which were removed # by maximum Cook's distance padj <- p.adjust(res$pvalue,method=pAdjustMethod) } res$padj <- padj # add metadata to padj column mcols(res)$type[names(res)=="padj"] <- "results" mcols(res)$description[names(res)=="padj"] <- paste(pAdjustMethod,"adjusted p-values") res } # two low-level functions used by results() to perform contrasts # # getContrast takes a DESeqDataSet object # and a numeric vector specifying a contrast # and returns a vector of Wald statistics # corresponding to the contrast. # # cleanContrast checks for the validity of # the specified contrast (numeric or character vector) # and turns character vector contrast into the appropriate # numeric vector contrast # # results() calls cleanContrast() which calls getContrast() # # the formula used is: # c' beta / sqrt( c' sigma c) # where beta is the coefficient vector # and sigma is the covariance matrix for beta getContrast <- function(object, contrast, useT=FALSE, df) { if (missing(contrast)) { stop("must provide a contrast") } if (is.null(attr(object,"modelMatrix"))) { stop("was expecting a model matrix stored as an attribute of the DESeqDataSet") } modelMatrix <- attr(object, "modelMatrix") # only continue on the rows with non-zero row mean objectNZ <- object[!mcols(object)$allZero,] normalizationFactors <- getSizeOrNormFactors(objectNZ) alpha_hat <- dispersions(objectNZ) coefColumns <- names(mcols(objectNZ))[grep("log2 fold change",mcols(mcols(object))$description)] # convert betas to log scale beta_mat <- log(2) * as.matrix(mcols(objectNZ)[,coefColumns,drop=FALSE]) # convert beta prior variance to log scale lambda = 1/(log(2)^2 * attr(object,"betaPriorVar")) # check if DESeq() replaced outliers countsMatrix <- if ("replaceCounts" %in% assayNames(object)) { assays(objectNZ)[["replaceCounts"]] } else { counts(objectNZ) } # use weights if they are present in assays(object) if ("weights" %in% assayNames(object)) { useWeights <- TRUE weights <- assays(object)[["weights"]] stopifnot(all(weights >= 0)) weights <- weights / apply(weights, 1, max) } else { useWeights <- FALSE weights <- matrix(1, nrow=nrow(object), ncol=ncol(object)) } betaRes <- fitBeta(ySEXP = countsMatrix, xSEXP = modelMatrix, nfSEXP = normalizationFactors, alpha_hatSEXP = alpha_hat, contrastSEXP = contrast, beta_matSEXP = beta_mat, lambdaSEXP = lambda, weightsSEXP = weights, useWeightsSEXP = useWeights, tolSEXP = 1e-8, maxitSEXP = 0, useQRSEXP=FALSE) # QR not relevant, fitting loop isn't entered # convert back to log2 scale contrastEstimate <- log2(exp(1)) * betaRes$contrast_num contrastSE <- log2(exp(1)) * betaRes$contrast_denom contrastStatistic <- contrastEstimate / contrastSE if (useT) { stopifnot(length(df)==1) contrastPvalue <- 
2*pt(abs(contrastStatistic),df=df,lower.tail=FALSE) } else { contrastPvalue <- 2*pnorm(abs(contrastStatistic),lower.tail=FALSE) } contrastList <- list(log2FoldChange=contrastEstimate, lfcSE=contrastSE, stat=contrastStatistic, pvalue=contrastPvalue) contrastResults <- buildDataFrameWithNARows(contrastList, mcols(object)$allZero) names(contrastResults) <- c("log2FoldChange","lfcSE","stat","pvalue") contrastResults } # this function takes a desired contrast as specified by results(), # performs checks, and then either returns the already existing contrast # or generates the contrast by calling getContrast() using a numeric vector cleanContrast <- function(object, contrast, expanded=FALSE, listValues, test) { # get the names of columns in the beta matrix resNames <- resultsNames(object) # if possible, return pre-computed columns, which are # already stored in mcols(dds). this will be the case using # results() with 'name', or if expanded model matrices were not # run and the contrast contains the reference level as numerator or denominator resReady <- FALSE if (is.character(contrast)) { contrastFactor <- contrast[1] if (!contrastFactor %in% names(colData(object))) { stop(paste(contrastFactor,"should be the name of a factor in the colData of the DESeqDataSet")) } contrastNumLevel <- contrast[2] contrastDenomLevel <- contrast[3] contrastBaseLevel <- levels(colData(object)[,contrastFactor])[1] # check for intercept hasIntercept <- attr(terms(design(object)),"intercept") == 1 firstVar <- contrastFactor == all.vars(design(object))[1] # tricky case: if the design has no intercept, the factor is # not the first variable in the design, and one of the numerator or denominator # is the reference level, then the desired contrast is simply a coefficient (or -1 times) noInterceptPullCoef <- !hasIntercept & !firstVar & (contrastBaseLevel %in% c(contrastNumLevel, contrastDenomLevel)) # case 1: standard model matrices: pull coef or build the appropriate contrast # coefficients names are of the form "factor_level_vs_baselevel" # output: contrastNumColumn and contrastDenomColumn if (!expanded & (hasIntercept | noInterceptPullCoef)) { # use make.names() so the column names are # the same as created by DataFrame in mcols(object). 
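# e.g. for contrast=c("condition","B","A") with reference level "A", # the coefficient name expected here is "condition_B_vs_A"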
contrastNumColumn <- make.names(paste0(contrastFactor,"_",contrastNumLevel,"_vs_",contrastBaseLevel)) contrastDenomColumn <- make.names(paste0(contrastFactor,"_",contrastDenomLevel,"_vs_",contrastBaseLevel)) # check that the desired contrast is already # available in mcols(object), and then we can either # take it directly or multiply the log fold # changes and Wald stat by -1 if ( contrastDenomLevel == contrastBaseLevel ) { cleanName <- paste(contrastFactor,contrastNumLevel,"vs",contrastDenomLevel) # the results can be pulled directly from mcols(object) name <- if (!noInterceptPullCoef) { make.names(paste0(contrastFactor,"_",contrastNumLevel,"_vs_",contrastDenomLevel)) } else { make.names(paste0(contrastFactor,contrastNumLevel)) } if (!name %in% resNames) { stop(paste("as",contrastDenomLevel,"is the reference level, was expecting",name,"to be present in 'resultsNames(object)'")) } log2FoldChange <- getCoef(object, name) lfcSE <- getCoefSE(object, name) stat <- getStat(object, test, name) pvalue <- getPvalue(object, test, name) res <- cbind(mcols(object)["baseMean"],log2FoldChange,lfcSE,stat,pvalue) names(res) <- c("baseMean","log2FoldChange","lfcSE","stat","pvalue") lfcType <- if (attr(object,"betaPrior")) "MAP" else "MLE" lfcDesc <- paste0("log2 fold change (",lfcType,"): ",cleanName) mcols(res,use.names=TRUE)["log2FoldChange","description"] <- lfcDesc resReady <- TRUE } else if ( contrastNumLevel == contrastBaseLevel ) { # fetch the results for denom vs num # and multiply the log fold change and stat by -1 cleanName <- paste(contrastFactor,contrastNumLevel,"vs",contrastDenomLevel) swapName <- if (!noInterceptPullCoef) { make.names(paste0(contrastFactor,"_",contrastDenomLevel,"_vs_",contrastNumLevel)) } else { make.names(paste0(contrastFactor,contrastDenomLevel)) } if (!swapName %in% resNames) { stop(paste("as",contrastNumLevel,"is the reference level, was expecting",swapName,"to be present in 'resultsNames(object)'")) } log2FoldChange <- getCoef(object, swapName) lfcSE <- getCoefSE(object, swapName) stat <- getStat(object, test, swapName) pvalue <- getPvalue(object, test, swapName) res <- cbind(mcols(object)["baseMean"],log2FoldChange,lfcSE,stat,pvalue) names(res) <- c("baseMean","log2FoldChange","lfcSE","stat","pvalue") res$log2FoldChange <- -1 * res$log2FoldChange if (test == "Wald") res$stat <- -1 * res$stat lfcType <- if (attr(object,"betaPrior")) "MAP" else "MLE" # rename some of the columns using the flipped contrast if (test == "Wald") { contrastDescriptions <- paste(c(paste0("log2 fold change (",lfcType,"):"), "standard error:", "Wald statistic:","Wald test p-value:"), cleanName) mcols(res,use.names=TRUE)[c("log2FoldChange","lfcSE","stat","pvalue"), "description"] <- contrastDescriptions } else { contrastDescriptions <- paste(c(paste0("log2 fold change (",lfcType,"):"), "standard error:"), cleanName) mcols(res,use.names=TRUE)[c("log2FoldChange","lfcSE"), "description"] <- contrastDescriptions } resReady <- TRUE } else { # check for the case where neither are present # as comparisons against reference level if ( ! (contrastNumColumn %in% resNames & contrastDenomColumn %in% resNames) ) { stop(paste(contrastNumLevel,"and",contrastDenomLevel,"should be levels of",contrastFactor,"such that",contrastNumColumn,"and",contrastDenomColumn,"are contained in 'resultsNames(object)'")) } } # case 2: expanded model matrices or no intercept and first variable # need to then build the appropriate contrast.
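# e.g. with an expanded model matrix, a condition factor with levels A and B # yields separate coefficients named "conditionA" and "conditionB"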
# these coefficient names have the form "factorlevel" # output: contrastNumColumn and contrastDenomColumn } else { # we only need to check validity contrastNumColumn <- make.names(paste0(contrastFactor, contrastNumLevel)) contrastDenomColumn <- make.names(paste0(contrastFactor, contrastDenomLevel)) if ( ! (contrastNumColumn %in% resNames & contrastDenomColumn %in% resNames) ) { stop(paste(paste0(contrastFactor,contrastNumLevel),"and",paste0(contrastFactor,contrastDenomLevel), "are expected to be in resultsNames(object)")) } } } # if the result table not already built in the above code... if (!resReady) { # here, a numeric / list / character contrast which will be converted # into a numeric contrast and run through getContrast() if (is.numeric(contrast)) { # make name for numeric contrast signMap <- c("","","+") contrastSigns <- signMap[sign(contrast)+2] contrastName <- paste(paste0(contrastSigns,as.character(contrast)),collapse=",") } else if (is.list(contrast)) { # interpret list contrast into numeric and make a name for the contrast lc1 <- length(contrast[[1]]) lc2 <- length(contrast[[2]]) # these just used for naming listvalname1 <- round(listValues[1],3) listvalname2 <- round(listValues[2],3) if (lc1 > 0 & lc2 > 0) { listvalname2 <- abs(listvalname2) listvalname1 <- if (listvalname1 == 1) "" else paste0(listvalname1," ") listvalname2 <- if (listvalname2 == 1) "" else paste0(listvalname2," ") contrastName <- paste0(listvalname1,paste(contrast[[1]],collapse="+")," vs ",listvalname2,paste(contrast[[2]],collapse="+")) } else if (lc1 > 0 & lc2 == 0) { listvalname1 <- if (listvalname1 == 1) "" else paste0(listvalname1," ") contrastName <- paste0(listvalname1,paste(contrast[[1]],collapse="+")," effect") } else if (lc1 == 0 & lc2 > 0) { contrastName <- paste(listvalname2,paste(contrast[[2]],collapse="+"),"effect") } contrastNumeric <- rep(0,length(resNames)) contrastNumeric[resNames %in% contrast[[1]]] <- listValues[1] contrastNumeric[resNames %in% contrast[[2]]] <- listValues[2] contrast <- contrastNumeric } else if (is.character(contrast)) { # interpret character contrast into numeric and make a name for the contrast contrastNumeric <- rep(0,length(resNames)) contrastNumeric[resNames == contrastNumColumn] <- 1 contrastNumeric[resNames == contrastDenomColumn] <- -1 contrast <- contrastNumeric contrastName <- paste(contrastFactor,contrastNumLevel,"vs",contrastDenomLevel) } # now get the contrast contrastResults <- getContrast(object, contrast, useT=FALSE, df) lfcType <- if (attr(object,"betaPrior")) "MAP" else "MLE" contrastDescriptions <- paste(c(paste0("log2 fold change (",lfcType,"):"), "standard error:", "Wald statistic:", "Wald test p-value:"), contrastName) mcols(contrastResults) <- DataFrame(type=rep("results",ncol(contrastResults)), description=contrastDescriptions) res <- cbind(mcols(object)["baseMean"], contrastResults) } # if test is "LRT", overwrite the statistic and p-value # (we only ran contrast for the coefficient) if (test == "LRT") { stat <- getStat(object, test, name=NULL) pvalue <- getPvalue(object, test, name=NULL) res <- cbind(res[c("baseMean","log2FoldChange","lfcSE")],stat,pvalue) names(res) <- c("baseMean","log2FoldChange","lfcSE","stat","pvalue") } return(res) } # convenience function to guess the name of the last coefficient # in the model matrix, unless specified this will be used for # plots and accessor functions lastCoefName <- function(object) { resNms <- resultsNames(object) resNms[length(resNms)] } # functions to get coef, coefSE, pvalues and padj from 
mcols(object) getCoef <- function(object,name) { if (missing(name)) { name <- lastCoefName(object) } mcols(object)[name] } getCoefSE <- function(object,name) { if (missing(name)) { name <- lastCoefName(object) } mcols(object)[paste0("SE_",name)] } getStat <- function(object,test="Wald",name) { if (missing(name)) { name <- lastCoefName(object) } if (test == "Wald") { return(mcols(object)[paste0("WaldStatistic_",name)]) } else if (test == "LRT") { return(mcols(object)["LRTStatistic"]) } else { stop("unknown test") } } getPvalue <- function(object,test="Wald",name) { if (missing(name)) { name <- lastCoefName(object) } if (test == "Wald") { return(mcols(object)[paste0("WaldPvalue_",name)]) } else if (test == "LRT") { return(mcols(object)["LRTPvalue"]) } else { stop("unknown test") } } # convenience function to make more descriptive names # for factor variables renameModelMatrixColumns <- function(data, design) { data <- as.data.frame(data) designVars <- all.vars(design) designVarsClass <- sapply(designVars, function(v) class(data[[v]])) factorVars <- designVars[designVarsClass == "factor"] colNamesFrom <- make.names(do.call(c,lapply(factorVars, function(v) paste0(v,levels(data[[v]])[-1])))) colNamesTo <- make.names(do.call(c,lapply(factorVars, function(v) paste0(v,"_",levels(data[[v]])[-1],"_vs_",levels(data[[v]])[1])))) data.frame(from=colNamesFrom,to=colNamesTo,stringsAsFactors=FALSE) } makeWaldTest <- function(object) { betaMatrix <- as.matrix(mcols(object)[,grep("log2 fold change",mcols(mcols(object))$description),drop=FALSE]) modelMatrixNames <- colnames(betaMatrix) betaSE <- as.matrix(mcols(object)[,grep("standard error",mcols(mcols(object))$description),drop=FALSE]) WaldStatistic <- betaMatrix/betaSE colnames(WaldStatistic) <- paste0("WaldStatistic_",modelMatrixNames) WaldPvalue <- 2*pnorm(abs(WaldStatistic),lower.tail=FALSE) colnames(WaldPvalue) <- paste0("WaldPvalue_",modelMatrixNames) modelMatrixNamesSpaces <- gsub("_"," ",modelMatrixNames) statInfo <- paste("Wald statistic:",modelMatrixNamesSpaces) pvalInfo <- paste("Wald test p-value:",modelMatrixNamesSpaces) WaldResults <- DataFrame(c(matrixToList(WaldStatistic), matrixToList(WaldPvalue))) mcols(WaldResults) <- DataFrame(type = rep("results",ncol(WaldResults)), description = c(statInfo, pvalInfo)) mcols(object) <- cbind(mcols(object),WaldResults) return(object) } mleContrast <- function(object, contrast) { contrastFactor <- contrast[1] contrastNumLevel <- contrast[2] contrastDenomLevel <- contrast[3] contrastRefLevel <- levels(colData(object)[,contrastFactor])[1] contrastNumColumn <- make.names(paste0("MLE_",contrastFactor,"_",contrastNumLevel,"_vs_",contrastRefLevel)) contrastDenomColumn <- make.names(paste0("MLE_",contrastFactor,"_",contrastDenomLevel,"_vs_",contrastRefLevel)) cleanName <- paste("log2 fold change (MLE):",contrastFactor,contrastNumLevel,"vs",contrastDenomLevel) if ( contrastDenomLevel == contrastRefLevel ) { name <- make.names(paste0("MLE_",contrastFactor,"_",contrastNumLevel,"_vs_",contrastDenomLevel)) lfcMLE <- mcols(object)[name] } else if ( contrastNumLevel == contrastRefLevel ) { swapName <- make.names(paste0("MLE_",contrastFactor,"_",contrastDenomLevel,"_vs_",contrastNumLevel)) lfcMLE <- mcols(object)[swapName] lfcMLE[[1]] <- -1 * lfcMLE[[swapName]] } else { numMLE <- mcols(object)[[contrastNumColumn]] denomMLE <- mcols(object)[[contrastDenomColumn]] lfcMLE <- mcols(object)[contrastNumColumn] lfcMLE[[1]] <- numMLE - denomMLE } names(lfcMLE) <- "lfcMLE" mcols(lfcMLE)$description <- cleanName lfcMLE } 
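# (illustrative sketch, kept as a comment so nothing is added to the package # namespace) the quantity computed by getContrast() via the C++ code is the # generalized Wald statistic c' beta / sqrt(c' Sigma c); given a coefficient # vector and its covariance matrix directly, the same arithmetic would be: # waldStat <- function(contr, beta, Sigma) { # as.numeric(t(contr) %*% beta) / sqrt(as.numeric(t(contr) %*% Sigma %*% contr)) # }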
checkContrast <- function(contrast, resNames) { if (!(is.numeric(contrast) | is.character(contrast) | is.list(contrast))) { stop("'contrast' vector should be either a character vector of length 3, a list of length 2 containing character vectors, or a numeric vector, see the argument description in ?results") } # character if (is.character(contrast)) { if (length(contrast) != 3) { stop("'contrast', as a character vector of length 3, should have the form: contrast = c('factorName','numeratorLevel','denominatorLevel'), see the manual page of ?results for more information") } if (contrast[2] == contrast[3]) { stop(paste(contrast[2],"and",contrast[3],"should be different level names")) } } # list if (is.list(contrast)) { if (length(contrast) == 1) { contrast <- list(contrast[[1]], character()) } if (length(contrast) != 2) { stop("'contrast', as a list, should have length 2, or, if length 1, an empty vector will be added for the second element. see the manual page of ?results for more information") } if (!(is.character(contrast[[1]]) & is.character(contrast[[2]]))) { stop("'contrast', as a list of length 2, should have character vectors as elements, see the manual page of ?results for more information") } if (!all(c(contrast[[1]],contrast[[2]]) %in% resNames)) { stop("all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'") } if (length(intersect(contrast[[1]], contrast[[2]])) > 0) { stop("elements in the contrast list should only appear in the numerator (first element of contrast list) or the denominator (second element of contrast list), but not both") } if (length(c(contrast[[1]],contrast[[2]])) == 0) { stop("one of the two elements in the list should be a character vector of non-zero length") } } # numeric if (is.numeric(contrast)) { if (length(contrast) != length(resNames) ) stop("numeric contrast vector should have one element for every element of 'resultsNames(object)'") if (all(contrast==0)) { stop("numeric contrast vector cannot have all elements equal to 0") } } return(contrast) } DESeq2/R/rlog.R0000644000175400017540000003111613201671732014116 0ustar00biocbuildbiocbuild#' Apply a 'regularized log' transformation #' #' This function transforms the count data to the log2 scale in a way #' which minimizes differences between samples for rows with small counts, #' and which normalizes with respect to library size. #' The rlog transformation produces a similar variance stabilizing effect as #' \code{\link{varianceStabilizingTransformation}}, #' though \code{rlog} is more robust in the #' case when the size factors vary widely. #' The transformation is useful when checking for outliers #' or as input for machine learning techniques #' such as clustering or linear discriminant analysis. #' \code{rlog} takes as input a \code{\link{DESeqDataSet}} and returns a #' \code{\link{RangedSummarizedExperiment}} object. #' #' Note that neither rlog transformation nor the VST are used by the #' differential expression estimation in \code{\link{DESeq}}, which always #' occurs on the raw count data, through generalized linear modeling which #' incorporates knowledge of the variance-mean dependence. The rlog transformation #' and VST are offered as separate functionality which can be used for visualization, #' clustering or other machine learning tasks. See the transformation section of the #' vignette for more details. #' #' The transformation does not require that one has already estimated size factors #' and dispersions. 
#' #' The regularization is on the log fold changes of the count for each sample #' over an intercept, for each gene. As nearby count values for low-count genes #' are almost as likely as the observed count, the rlog shrinkage is greater for low counts. #' For high counts, the rlog shrinkage has a much weaker effect. #' The fitted dispersions are used rather than the MAP dispersions #' (so similar to the \code{\link{varianceStabilizingTransformation}}). #' #' The prior variance for the shrinkage of log fold changes is calculated as follows: #' a matrix is constructed of the logarithm of the counts plus a pseudocount of 0.5, #' the log of the row means is then subtracted, leaving an estimate of #' the log fold changes per sample over the fitted value using only an intercept. #' The prior variance is then calculated by matching the upper quantiles of the observed #' log fold change estimates with an upper quantile of the normal distribution. #' A GLM fit is then calculated using this prior. It is also possible to supply the variance of the prior. #' See the vignette for an example of the use and a comparison with \code{varianceStabilizingTransformation}. #' #' The transformed values, rlog(K), are equal to #' \eqn{rlog(K_{ij}) = \log_2(q_{ij}) = \beta_{i0} + \beta_{ij}}{rlog(K_ij) = log2(q_ij) = beta_i0 + beta_ij}, #' with formula terms defined in \code{\link{DESeq}}. #' #' The parameters of the rlog transformation from a previous dataset #' can be frozen and reapplied to new samples. See the 'Data quality assessment' #' section of the vignette for strategies to see if new samples are #' sufficiently similar to previous datasets. #' The frozen rlog is accomplished by saving the dispersion function, #' beta prior variance and the intercept from a previous dataset, #' and running \code{rlog} with 'blind' set to FALSE #' (see example below). #' #' @aliases rlog rlogTransformation #' @rdname rlog #' @name rlog #' #' @param object a DESeqDataSet, or matrix of counts #' @param blind logical, whether to blind the transformation to the experimental #' design. blind=TRUE should be used for comparing samples in a manner unbiased by #' prior information on samples, for example to perform sample QA (quality assurance). #' blind=FALSE should be used for transforming data for downstream analysis, #' where the full use of the design information should be made. #' blind=FALSE will skip re-estimation of the dispersion trend, if this has already been calculated. #' If many genes have large differences in counts due to #' the experimental design, it is important to set blind=FALSE for downstream #' analysis. #' @param intercept by default, this is not provided and calculated automatically. #' if provided, this should be a vector as long as the number of rows of object, #' which is log2 of the mean normalized counts from a previous dataset. #' this will enforce the intercept for the GLM, allowing for a "frozen" rlog #' transformation based on a previous dataset. #' You will also need to provide \code{mcols(object)$dispFit}. #' @param betaPriorVar a single value, the variance of the prior on the sample #' betas, which if missing is estimated from the data #' @param fitType in case dispersions have not yet been estimated for \code{object}, #' this parameter is passed on to \code{\link{estimateDispersions}} (options described there). #' #' @return a \code{\link{DESeqTransform}} if a \code{DESeqDataSet} was provided, #' or a matrix if a count matrix was provided as input.
#' Note that for \code{\link{DESeqTransform}} output, the matrix of #' transformed values is stored in \code{assay(rld)}. #' To avoid returning matrices with NA values, in the case of a row #' of all zeros, the rlog transformation returns zeros #' (essentially adding a pseudocount of 1 only to these rows). #' #' @references #' #' Reference for regularized logarithm (rlog): #' #' Michael I Love, Wolfgang Huber, Simon Anders: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 2014, 15:550. \url{http://dx.doi.org/10.1186/s13059-014-0550-8} #' #' @seealso \code{\link{plotPCA}}, \code{\link{varianceStabilizingTransformation}}, \code{\link{normTransform}} #' @examples #' #' dds <- makeExampleDESeqDataSet(m=6,betaSD=1) #' rld <- rlog(dds) #' dists <- dist(t(assay(rld))) #' plot(hclust(dists)) #' #' # run the rlog transformation on one dataset #' design(dds) <- ~ 1 #' dds <- estimateSizeFactors(dds) #' dds <- estimateDispersions(dds) #' rld <- rlog(dds, blind=FALSE) #' #' # apply the parameters to a new sample #' #' ddsNew <- makeExampleDESeqDataSet(m=1) #' mcols(ddsNew)$dispFit <- mcols(dds)$dispFit #' betaPriorVar <- attr(rld,"betaPriorVar") #' intercept <- mcols(rld)$rlogIntercept #' rldNew <- rlog(ddsNew, blind=FALSE, #' intercept=intercept, #' betaPriorVar=betaPriorVar) #' #' #' @export rlog <- function(object, blind=TRUE, intercept, betaPriorVar, fitType="parametric") { if (is.null(colnames(object))) { colnames(object) <- seq_len(ncol(object)) } if (is.matrix(object)) { matrixIn <- TRUE object <- DESeqDataSetFromMatrix(object, DataFrame(row.names=colnames(object)), ~ 1) } else { matrixIn <- FALSE } if (is.null(sizeFactors(object)) & is.null(normalizationFactors(object))) { object <- estimateSizeFactors(object) } if (blind) { design(object) <- ~ 1 } # sparsity test if (missing(intercept)) { sparseTest(counts(object, normalized=TRUE), .9, 100, .1) } if (blind | is.null(mcols(object)$dispFit)) { # estimate the dispersions on all genes if (is.null(mcols(object)$baseMean)) { object <- getBaseMeansAndVariances(object) } object <- estimateDispersionsGeneEst(object, quiet=TRUE) object <- estimateDispersionsFit(object, fitType, quiet=TRUE) } if (!missing(intercept)) { if (length(intercept) != nrow(object)) { stop("intercept should be as long as the number of rows of object") } } rld <- rlogData(object, intercept, betaPriorVar) if (matrixIn) { return(rld) } se <- SummarizedExperiment( assays = rld, colData = colData(object), rowRanges = rowRanges(object), metadata = metadata(object)) dt <- DESeqTransform(se) attr(dt,"betaPriorVar") <- attr(rld, "betaPriorVar") if (!is.null(attr(rld,"intercept"))) { mcols(dt)$rlogIntercept <- attr(rld,"intercept") } dt } #' @rdname rlog #' @export rlogTransformation <- rlog ###################### unexported rlogData <- function(object, intercept, betaPriorVar) { if (is.null(mcols(object)$dispFit)) { stop("first estimate dispersion") } samplesVector <- as.character(seq_len(ncol(object))) if (!missing(intercept)) { if (length(intercept) != nrow(object)) { stop("intercept should be as long as the number of rows of object") } } if (is.null(mcols(object)$allZero) | is.null(mcols(object)$baseMean)) { object <- getBaseMeansAndVariances(object) } # make a design matrix with a term for every sample # this would typically produce unidentifiable solution # for the GLM, but we add priors for all terms except # the intercept samplesVector <- factor(samplesVector,levels=unique(samplesVector)) if (missing(intercept)) { samples <- 
factor(c("null_level",as.character(samplesVector)), levels=c("null_level",levels(samplesVector))) modelMatrix <- stats::model.matrix.default(~samples)[-1,] modelMatrixNames <- colnames(modelMatrix) modelMatrixNames[modelMatrixNames == "(Intercept)"] <- "Intercept" } else { # or we want to set the intercept using the # provided intercept instead samples <- factor(samplesVector) if (length(samples) > 1) { modelMatrix <- stats::model.matrix.default(~ 0 + samples) } else { modelMatrix <- matrix(1,ncol=1) modelMatrixNames <- "samples1" } modelMatrixNames <- colnames(modelMatrix) if (!is.null(normalizationFactors(object))) { nf <- normalizationFactors(object) } else { sf <- sizeFactors(object) nf <- matrix(rep(sf,each=nrow(object)),ncol=ncol(object)) } # if the intercept is not finite, these rows # were all zero. here we put a small value instead intercept <- as.numeric(intercept) infiniteIntercept <- !is.finite(intercept) intercept[infiniteIntercept] <- -10 normalizationFactors(object) <- nf * 2^intercept # we set the intercept, so replace the all zero # column with the rows which were all zero # in the previous dataset mcols(object)$allZero <- infiniteIntercept } # only continue on the rows with non-zero row sums objectNZ <- object[!mcols(object)$allZero,] stopifnot(all(!is.na(mcols(objectNZ)$dispFit))) # if a prior sigma squared not provided, estimate this # by the matching upper quantiles of the # log2 counts plus a pseudocount if (missing(betaPriorVar)) { logCounts <- log2(counts(objectNZ,normalized=TRUE) + 0.5) logFoldChangeMatrix <- logCounts - log2(mcols(objectNZ)$baseMean + 0.5) logFoldChangeVector <- as.numeric(logFoldChangeMatrix) varlogk <- 1/mcols(objectNZ)$baseMean + mcols(objectNZ)$dispFit weights <- 1/varlogk betaPriorVar <- matchWeightedUpperQuantileForVariance(logFoldChangeVector, rep(weights,ncol(objectNZ))) } stopifnot(length(betaPriorVar)==1) lambda <- 1/rep(betaPriorVar,ncol(modelMatrix)) # except for intercept which we set to wide prior if ("Intercept" %in% modelMatrixNames) { lambda[which(modelMatrixNames == "Intercept")] <- 1e-6 } fit <- fitNbinomGLMs(object=objectNZ, modelMatrix=modelMatrix, lambda=lambda, renameCols=FALSE, alpha_hat=mcols(objectNZ)$dispFit, betaTol=1e-4, useOptim=FALSE, useQR=TRUE) normalizedDataNZ <- t(modelMatrix %*% t(fit$betaMatrix)) normalizedData <- buildMatrixWithZeroRows(normalizedDataNZ, mcols(object)$allZero) # add back in the intercept, if finite if (!missing(intercept)) { normalizedData <- normalizedData + ifelse(infiniteIntercept, 0, intercept) } colnames(normalizedData) <- colnames(object) attr(normalizedData,"betaPriorVar") <- betaPriorVar if ("Intercept" %in% modelMatrixNames) { fittedInterceptNZ <- fit$betaMatrix[,which(modelMatrixNames == "Intercept"),drop=FALSE] fittedIntercept <- buildMatrixWithNARows(fittedInterceptNZ, mcols(object)$allZero) fittedIntercept[is.na(fittedIntercept)] <- -Inf attr(normalizedData,"intercept") <- fittedIntercept } normalizedData } sparseTest <- function(x, p, t1, t2) { rs <- rowSums(x) rmx <- apply(x, 1, max) if (all(rs <= t1)) return(invisible()) prop <- (rmx/rs)[rs > t1] total <- mean(prop > p) if (total > t2) warning("the rlog assumes that data is close to a negative binomial distribution, an assumption which is sometimes not compatible with datasets where many genes have many zero counts despite a few very large counts. 
In this data, for ",round(total,3)*100,"% of genes with a sum of normalized counts above ",t1,", it was the case that a single sample's normalized count made up more than ",p*100,"% of the sum over all samples. The threshold for this warning is ",t2*100,"% of genes. See plotSparsity(dds) for a visualization of this. We recommend instead using the varianceStabilizingTransformation or shifted log (see vignette).") } DESeq2/R/vst.R0000644000175400017540000003100713201671732013766 0ustar00biocbuildbiocbuild#' Apply a variance stabilizing transformation (VST) to the count data #' #' This function calculates a variance stabilizing transformation (VST) from the #' fitted dispersion-mean relation(s) and then transforms the count data (normalized #' by division by the size factors or normalization factors), yielding a matrix #' of values which are now approximately homoskedastic (having constant variance along the range #' of mean values). The transformation also normalizes with respect to library size. #' The \code{\link{rlog}} is less sensitive #' to size factors, which can be an issue when size factors vary widely. #' These transformations are useful when checking for outliers or as input for #' machine learning techniques such as clustering or linear discriminant analysis. #' #' @aliases varianceStabilizingTransformation getVarianceStabilizedData #' #' @param object a DESeqDataSet or matrix of counts #' @param blind logical, whether to blind the transformation to the experimental #' design. blind=TRUE should be used for comparing samples in a manner unbiased by #' prior information on samples, for example to perform sample QA (quality assurance). #' blind=FALSE should be used for transforming data for downstream analysis, #' where the full use of the design information should be made. #' blind=FALSE will skip re-estimation of the dispersion trend, if this has already been calculated. #' If many genes have large differences in counts due to #' the experimental design, it is important to set blind=FALSE for downstream #' analysis. #' @param fitType in case dispersions have not yet been estimated for \code{object}, #' this parameter is passed on to \code{\link{estimateDispersions}} (options described there). #' #' @details For each sample (i.e., column of \code{counts(dds)}), the full variance function #' is calculated from the raw variance (by scaling according to the size factor and adding #' the shot noise). We recommend a blind estimation of the variance function, i.e., #' one ignoring conditions. This is performed by default, and can be modified using the #' 'blind' argument. #' #' Note that neither rlog transformation nor the VST are used by the #' differential expression estimation in \code{\link{DESeq}}, which always #' occurs on the raw count data, through generalized linear modeling which #' incorporates knowledge of the variance-mean dependence. The rlog transformation #' and VST are offered as separate functionality which can be used for visualization, #' clustering or other machine learning tasks. See the transformation section of the #' vignette for more details. #' #' The transformation does not require that one has already estimated size factors #' and dispersions. #' #' A typical workflow is shown in Section \emph{Variance stabilizing transformation} #' in the package vignette.
#' #' If \code{\link{estimateDispersions}} was called with: #' #' \code{fitType="parametric"}, #' a closed-form expression for the variance stabilizing #' transformation is used on the normalized #' count data. The expression can be found in the file \file{vst.pdf} #' which is distributed with the vignette. #' #' \code{fitType="local"}, #' the reciprocal of the square root of the variance of the normalized counts, as derived #' from the dispersion fit, is then numerically #' integrated, and the integral (approximated by a spline function) is evaluated for each #' count value in the column, yielding a transformed value. #' #' \code{fitType="mean"}, a VST is applied for Negative Binomial distributed counts, 'k', #' with a fixed dispersion, 'a': ( 2 asinh(sqrt(a k)) - log(a) - log(4) )/log(2). #' #' In all cases, the transformation is scaled such that for large #' counts, it becomes asymptotically equal to the #' logarithm to base 2 of normalized counts. #' #' The variance stabilizing transformation from a previous dataset #' can be frozen and reapplied to new samples. See the 'Data quality assessment' #' section of the vignette for strategies to see if new samples are #' sufficiently similar to previous datasets. #' The frozen VST is accomplished by saving the dispersion function #' accessible with \code{\link{dispersionFunction}}, assigning this #' to the \code{DESeqDataSet} with the new samples, and running #' varianceStabilizingTransformation with 'blind' set to FALSE #' (see example below). #' Then the dispersion function from the previous dataset will be used #' to transform the new sample(s). #' #' Limitations: In order to preserve normalization, the same #' transformation has to be used for all samples. This results in the #' variance stabilization being only approximate. The more the size #' factors differ, the more residual dependence of the variance on the #' mean will be found in the transformed data. \code{\link{rlog}} is a #' transformation which can perform better in these cases. #' As shown in the vignette, the function \code{meanSdPlot} #' from the package \pkg{vsn} can be used to see whether this is a problem. #' #' @return \code{varianceStabilizingTransformation} returns a #' \code{\link{DESeqTransform}} if a \code{DESeqDataSet} was provided, #' or returns a matrix if a count matrix was provided. #' Note that for \code{\link{DESeqTransform}} output, the matrix of #' transformed values is stored in \code{assay(vsd)}. #' \code{getVarianceStabilizedData} also returns a matrix. #' #' @references #' #' Reference for the variance stabilizing transformation for counts with a dispersion trend: #' #' Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 2010, 11:106.
\url{http://dx.doi.org/10.1186/gb-2010-11-10-r106} #' #' @author Simon Anders #' #' @seealso \code{\link{plotPCA}}, \code{\link{rlog}}, \code{\link{normTransform}} #' #' @examples #' #' dds <- makeExampleDESeqDataSet(m=6) #' vsd <- varianceStabilizingTransformation(dds) #' dists <- dist(t(assay(vsd))) #' plot(hclust(dists)) #' #' # learn the dispersion function of a dataset #' design(dds) <- ~ 1 #' dds <- estimateSizeFactors(dds) #' dds <- estimateDispersions(dds) #' #' # use the previous dispersion function for a new sample #' ddsNew <- makeExampleDESeqDataSet(m=1) #' ddsNew <- estimateSizeFactors(ddsNew) #' dispersionFunction(ddsNew) <- dispersionFunction(dds) #' vsdNew <- varianceStabilizingTransformation(ddsNew, blind=FALSE) #' #' @export varianceStabilizingTransformation <- function (object, blind=TRUE, fitType="parametric") { if (is.null(colnames(object))) { colnames(object) <- seq_len(ncol(object)) } if (is.matrix(object)) { matrixIn <- TRUE object <- DESeqDataSetFromMatrix(object, DataFrame(row.names=colnames(object)), ~1) } else { matrixIn <- FALSE } if (is.null(sizeFactors(object)) & is.null(normalizationFactors(object))) { object <- estimateSizeFactors(object) } if (blind) { design(object) <- ~ 1 } if (blind | is.null(attr(dispersionFunction(object),"fitType"))) { object <- estimateDispersionsGeneEst(object, quiet=TRUE) object <- estimateDispersionsFit(object, quiet=TRUE, fitType) } vsd <- getVarianceStabilizedData(object) if (matrixIn) { return(vsd) } se <- SummarizedExperiment( assays = vsd, colData = colData(object), rowRanges = rowRanges(object), metadata = metadata(object)) DESeqTransform(se) } #' @rdname varianceStabilizingTransformation #' @export getVarianceStabilizedData <- function(object) { if (is.null(attr(dispersionFunction(object),"fitType"))) { stop("call estimateDispersions before calling getVarianceStabilizedData") } ncounts <- counts(object, normalized=TRUE) if( attr( dispersionFunction(object), "fitType" ) == "parametric" ) { coefs <- attr( dispersionFunction(object), "coefficients" ) vst <- function( q ) { log( (1 + coefs["extraPois"] + 2 * coefs["asymptDisp"] * q + 2 * sqrt( coefs["asymptDisp"] * q * ( 1 + coefs["extraPois"] + coefs["asymptDisp"] * q ) ) ) / ( 4 * coefs["asymptDisp"] ) ) / log(2) } return(vst(ncounts)) } else if ( attr( dispersionFunction(object), "fitType" ) == "local" ) { # non-parametric fit -> numerical integration if (is.null(sizeFactors(object))) { stopifnot(!is.null(normalizationFactors(object))) # approximate size factors from columns of NF sf <- exp(colMeans(log(normalizationFactors(object)))) } else { sf <- sizeFactors(object) } xg <- sinh( seq( asinh(0), asinh(max(ncounts)), length.out=1000 ) )[-1] xim <- mean( 1/sf ) baseVarsAtGrid <- dispersionFunction(object)( xg ) * xg^2 + xim * xg integrand <- 1 / sqrt( baseVarsAtGrid ) splf <- splinefun( asinh( ( xg[-1] + xg[-length(xg)] )/2 ), cumsum( ( xg[-1] - xg[-length(xg)] ) * ( integrand[-1] + integrand[-length(integrand)] )/2 ) ) h1 <- quantile( rowMeans(ncounts), .95 ) h2 <- quantile( rowMeans(ncounts), .999 ) eta <- ( log2(h2) - log2(h1) ) / ( splf(asinh(h2)) - splf(asinh(h1)) ) xi <- log2(h1) - eta * splf(asinh(h1)) tc <- sapply( colnames(counts(object)), function(clm) { eta * splf( asinh( ncounts[,clm] ) ) + xi }) rownames( tc ) <- rownames( counts(object) ) return(tc) } else if ( attr( dispersionFunction(object), "fitType" ) == "mean" ) { alpha <- attr( dispersionFunction(object), "mean" ) # the following stabilizes NB counts with fixed dispersion alpha # and converges to log2(q)
as q -> infinity vst <- function(q) ( 2 * asinh(sqrt(alpha * q)) - log(alpha) - log(4) ) / log(2) return(vst(ncounts)) } else { stop( "fitType is not parametric, local or mean" ) } } #' Quickly estimate dispersion trend and apply a variance stabilizing transformation #' #' This is a wrapper for the \code{\link{varianceStabilizingTransformation}} (VST) #' that provides much faster estimation of the dispersion trend used to determine #' the formula for the VST. The speed-up is accomplished by #' subsetting to a smaller number of genes in order to estimate this dispersion trend. #' The subset of genes is chosen deterministically, to span the range #' of genes' mean normalized count. #' This wrapper for the VST is not blind to the experimental design: #' the sample covariate information is used to estimate the global trend #' of genes' dispersion values over the genes' mean normalized count. #' It can be made strictly blind to experimental design by first #' assigning a \code{\link{design}} of \code{~1} before running this function, #' or by avoiding subsetting and using \code{\link{varianceStabilizingTransformation}}. #' #' @param object a DESeqDataSet or a matrix of counts #' @param blind logical, whether to blind the transformation to the experimental #' design (see \code{\link{varianceStabilizingTransformation}}) #' @param nsub the number of genes to subset to (default 1000) #' @param fitType for estimation of dispersions: this parameter #' is passed on to \code{\link{estimateDispersions}} (options described there) #' #' @return a DESeqTransform object or a matrix of transformed, normalized counts #' #' @examples #' #' dds <- makeExampleDESeqDataSet(n=20000, m=20) #' vsd <- vst(dds) #' #' @export vst <- function(object, blind=TRUE, nsub=1000, fitType="parametric") { if (nrow(object) < nsub) { stop("less than 'nsub' rows, it is recommended to use varianceStabilizingTransformation directly") } if (is.null(colnames(object))) { colnames(object) <- seq_len(ncol(object)) } if (is.matrix(object)) { matrixIn <- TRUE object <- DESeqDataSetFromMatrix(object, DataFrame(row.names=colnames(object)), ~ 1) } else { if (blind) { design(object) <- ~ 1 } matrixIn <- FALSE } if (is.null(sizeFactors(object)) & is.null(normalizationFactors(object))) { object <- estimateSizeFactors(object) } baseMean <- rowMeans(counts(object, normalized=TRUE)) if (sum(baseMean > 5) < nsub) { stop("less than 'nsub' rows with mean normalized count > 5, it is recommended to use varianceStabilizingTransformation directly") } # subset to a specified number of genes with mean normalized count > 5 object.sub <- object[baseMean > 5,] baseMean <- baseMean[baseMean > 5] o <- order(baseMean) idx <- o[round(seq(from=1, to=length(o), length=nsub))] object.sub <- object.sub[idx,] # estimate dispersion trend object.sub <- estimateDispersionsGeneEst(object.sub, quiet=TRUE) object.sub <- estimateDispersionsFit(object.sub, fitType=fitType, quiet=TRUE) # assign to the full object suppressMessages({dispersionFunction(object) <- dispersionFunction(object.sub)}) # calculate and apply the VST vsd <- varianceStabilizingTransformation(object, blind=FALSE) if (matrixIn) { return(assay(vsd)) } else { return(vsd) } } DESeq2/R/wrappers.R0000644000175400017540000001303313201671732015014 0ustar00biocbuildbiocbuild# Fit dispersions for negative binomial GLM # # This function estimates the dispersion parameter (alpha) for negative binomial # generalized linear models. The fitting is performed on the log scale.
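# (note) each wrapper in this file first checks its arguments for NA values in R # and stops with an informative error, presumably because the compiled # fitDisp/fitDispGrid/fitBeta routines do not guard against missing values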
# # ySEXP n by m matrix of counts # xSEXP m by k design matrix # mu_hatSEXP n by m matrix, the expected mean values, given beta-hat # log_alphaSEXP n length vector of initial guesses for log(alpha) # log_alpha_prior_meanSEXP n length vector of the fitted values for log(alpha) # log_alpha_prior_sigmasqSEXP a single numeric value for the variance of the prior # min_log_alphaSEXP the minimum value of log alpha # kappa_0SEXP a parameter used in calculating the initial proposal # for the backtracking search # initial proposal = log(alpha) + kappa_0 * deriv. of log lik. w.r.t. log(alpha) # tolSEXP tolerance for convergence in estimates # maxitSEXP maximum number of iterations # usePriorSEXP boolean variable, whether to use a prior or just calculate the MLE # weightsSEXP n by m matrix of weights # useWeightsSEXP whether to use weights # # return a list with elements: log_alpha, iter, iter_accept, last_change, initial_lp, initial_dlp, last_lp, last_dlp, last_d2lp fitDispWrapper <- function (ySEXP, xSEXP, mu_hatSEXP, log_alphaSEXP, log_alpha_prior_meanSEXP, log_alpha_prior_sigmasqSEXP, min_log_alphaSEXP, kappa_0SEXP, tolSEXP, maxitSEXP, usePriorSEXP, weightsSEXP, useWeightsSEXP) { # test for any NAs in arguments arg.names <- names(formals(fitDispWrapper)) na.test <- sapply(mget(arg.names), function(x) any(is.na(x))) if (any(na.test)) stop(paste("in call to fitDisp, the following arguments contain NA:", paste(arg.names[na.test],collapse=", "))) fitDisp(ySEXP=ySEXP, xSEXP=xSEXP, mu_hatSEXP=mu_hatSEXP, log_alphaSEXP=log_alphaSEXP, log_alpha_prior_meanSEXP=log_alpha_prior_meanSEXP, log_alpha_prior_sigmasqSEXP=log_alpha_prior_sigmasqSEXP, min_log_alphaSEXP=min_log_alphaSEXP, kappa_0SEXP=kappa_0SEXP, tolSEXP=tolSEXP, maxitSEXP=maxitSEXP, usePriorSEXP=usePriorSEXP, weightsSEXP=weightsSEXP, useWeightsSEXP=useWeightsSEXP) } # Fit dispersions by evaluating over grid # # This function estimates the dispersion parameter (alpha) for negative binomial # generalized linear models. The fitting is performed on the log scale. # # ySEXP n by m matrix of counts # xSEXP m by k design matrix # mu_hatSEXP n by m matrix, the expected mean values, given beta-hat # disp_gridSEXP the grid over which to estimate # log_alpha_prior_meanSEXP n length vector of the fitted values for log(alpha) # log_alpha_prior_sigmasqSEXP a single numeric value for the variance of the prior # usePriorSEXP boolean variable, whether to use a prior or just calculate the MLE # weightsSEXP n by m matrix of weights # useWeightsSEXP whether to use weights # # return: a numeric vector of dispersion estimates (alpha), on the natural scale fitDispGridWrapper <- function(y, x, mu, logAlphaPriorMean, logAlphaPriorSigmaSq, usePrior, weightsSEXP, useWeightsSEXP) { # test for any NAs in arguments arg.names <- names(formals(fitDispGridWrapper)) na.test <- sapply(mget(arg.names), function(x) any(is.na(x))) if (any(na.test)) stop(paste("in call to fitDispGridWrapper, the following arguments contain NA:", paste(arg.names[na.test],collapse=", "))) minLogAlpha <- log(1e-8) maxLogAlpha <- log(max(10, ncol(y))) dispGrid <- seq(from=minLogAlpha, to=maxLogAlpha, length=15) logAlpha <- fitDispGrid(ySEXP=y, xSEXP=x, mu_hatSEXP=mu, disp_gridSEXP=dispGrid, log_alpha_prior_meanSEXP=logAlphaPriorMean, log_alpha_prior_sigmasqSEXP=logAlphaPriorSigmaSq, usePriorSEXP=usePrior, weightsSEXP=weightsSEXP, useWeightsSEXP=useWeightsSEXP)$log_alpha exp(logAlpha) } # Fit beta coefficients for negative binomial GLM # # This function estimates the coefficients (betas) for negative binomial generalized linear models.
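# (note) 'lambdaSEXP' acts as a ridge penalty on the coefficients; results(), # for example, supplies 1/(log(2)^2 * betaPriorVar), i.e. the prior precision # converted from the log2 scale to the natural log scale used at this level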
#
#  ySEXP n by m matrix of counts
#  xSEXP m by k design matrix
#  nfSEXP n by m matrix of normalization factors
#  alpha_hatSEXP n length vector of the dispersion estimates
#  contrastSEXP a k length vector for a possible contrast
#  beta_matSEXP n by k matrix of the initial estimates for the betas
#  lambdaSEXP k length vector of the ridge values
#  weightsSEXP n by m matrix of weights
#  useWeightsSEXP whether to use weights
#  tolSEXP tolerance for convergence in estimates
#  maxitSEXP maximum number of iterations
#  useQRSEXP whether to use QR decomposition
#
# Note: at this level the betas are on the natural log scale
fitBetaWrapper <- function (ySEXP, xSEXP, nfSEXP, alpha_hatSEXP, contrastSEXP,
                            beta_matSEXP, lambdaSEXP, weightsSEXP, useWeightsSEXP,
                            tolSEXP, maxitSEXP, useQRSEXP) {
  if ( missing(contrastSEXP) ) {
    # contrast is not required, just give 1,0,0,...
    contrastSEXP <- c(1,rep(0,ncol(xSEXP)-1))
  }
  # test for any NAs in arguments
  arg.names <- names(formals(fitBetaWrapper))
  na.test <- sapply(mget(arg.names), function(x) any(is.na(x)))
  if (any(na.test)) stop(paste("in call to fitBeta, the following arguments contain NA:",
                               paste(arg.names[na.test],collapse=", ")))
  fitBeta(ySEXP=ySEXP, xSEXP=xSEXP, nfSEXP=nfSEXP, alpha_hatSEXP=alpha_hatSEXP,
          contrastSEXP=contrastSEXP, beta_matSEXP=beta_matSEXP, lambdaSEXP=lambdaSEXP,
          weightsSEXP=weightsSEXP, useWeightsSEXP=useWeightsSEXP,
          tolSEXP=tolSEXP, maxitSEXP=maxitSEXP, useQRSEXP=useQRSEXP)
}
DESeq2/build/0000755000175400017540000000000013201712502013713 5ustar00biocbuildbiocbuildDESeq2/build/vignette.rds0000644000175400017540000000033513201712502016253 0ustar00biocbuildbiocbuild[binary RDS data omitted]
DESeq2/inst/0000755000175400017540000000000013201712502013571 5ustar00biocbuildbiocbuildDESeq2/inst/CITATION0000644000175400017540000000127513201671732014744 0ustar00biocbuildbiocbuildcitEntry(entry="article",
  title = "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2",
  author = personList( as.person("Michael I.
Love"), as.person("Wolfgang Huber"), as.person("Simon Anders")), year = 2014, journal = "Genome Biology", doi = "10.1186/s13059-014-0550-8", volume = 15, issue = 12, pages = 550, textVersion = paste("Love, M.I., Huber, W., Anders, S.", "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2", "Genome Biology 15(12):550 (2014)" ) ) DESeq2/inst/doc/0000755000175400017540000000000013201712502014336 5ustar00biocbuildbiocbuildDESeq2/inst/doc/DESeq2.R0000644000175400017540000004604013201712477015523 0ustar00biocbuildbiocbuild## ----setup, echo=FALSE, results="hide"----------------------------------- knitr::opts_chunk$set(tidy=FALSE, cache=TRUE, dev="png", message=FALSE, error=FALSE, warning=TRUE) ## ----quickStart, eval=FALSE---------------------------------------------- # dds <- DESeqDataSetFromMatrix(countData = cts, # colData = coldata, # design= ~ batch + condition) # dds <- DESeq(dds) # res <- results(dds, contrast=c("condition","treat","ctrl")) # resultsNames(dds) # res <- lfcShrink(dds, coef=2) ## ----txiSetup------------------------------------------------------------ library("tximport") library("readr") library("tximportData") dir <- system.file("extdata", package="tximportData") samples <- read.table(file.path(dir,"samples.txt"), header=TRUE) samples$condition <- factor(rep(c("A","B"),each=3)) rownames(samples) <- samples$run samples[,c("pop","center","run","condition")] ## ----txiFiles------------------------------------------------------------ files <- file.path(dir,"salmon", samples$run, "quant.sf") names(files) <- samples$run tx2gene <- read.csv(file.path(dir, "tx2gene.csv")) ## ----tximport, results="hide"-------------------------------------------- txi <- tximport(files, type="salmon", tx2gene=tx2gene) ## ----txi2dds, results="hide"--------------------------------------------- library("DESeq2") ddsTxi <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ condition) ## ----loadPasilla--------------------------------------------------------- library("pasilla") pasCts <- system.file("extdata", "pasilla_gene_counts.tsv", package="pasilla", mustWork=TRUE) pasAnno <- system.file("extdata", "pasilla_sample_annotation.csv", package="pasilla", mustWork=TRUE) cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id")) coldata <- read.csv(pasAnno, row.names=1) coldata <- coldata[,c("condition","type")] ## ----showPasilla--------------------------------------------------------- head(cts,2) coldata ## ----reorderPasila------------------------------------------------------- rownames(coldata) <- sub("fb", "", rownames(coldata)) all(rownames(coldata) %in% colnames(cts)) all(rownames(coldata) == colnames(cts)) cts <- cts[, rownames(coldata)] all(rownames(coldata) == colnames(cts)) ## ----matrixInput--------------------------------------------------------- library("DESeq2") dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ condition) dds ## ----addFeatureData------------------------------------------------------ featureData <- data.frame(gene=rownames(cts)) mcols(dds) <- DataFrame(mcols(dds), featureData) mcols(dds) ## ----htseqDirI, eval=FALSE----------------------------------------------- # directory <- "/path/to/your/files/" ## ----htseqDirII---------------------------------------------------------- directory <- system.file("extdata", package="pasilla", mustWork=TRUE) ## ----htseqInput---------------------------------------------------------- sampleFiles <- grep("treated",list.files(directory),value=TRUE) sampleCondition <- 
sub("(.*treated).*","\\1",sampleFiles) sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = sampleCondition) ## ----hsteqDds------------------------------------------------------------ library("DESeq2") ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design= ~ condition) ddsHTSeq ## ----loadSumExp---------------------------------------------------------- library("airway") data("airway") se <- airway ## ----sumExpInput--------------------------------------------------------- library("DESeq2") ddsSE <- DESeqDataSet(se, design = ~ cell + dex) ddsSE ## ----prefilter----------------------------------------------------------- keep <- rowSums(counts(dds)) >= 10 dds <- dds[keep,] ## ----factorlvl----------------------------------------------------------- dds$condition <- factor(dds$condition, levels = c("untreated","treated")) ## ----relevel------------------------------------------------------------- dds$condition <- relevel(dds$condition, ref = "untreated") ## ----droplevels---------------------------------------------------------- dds$condition <- droplevels(dds$condition) ## ----deseq--------------------------------------------------------------- dds <- DESeq(dds) res <- results(dds) res ## ----lfcShrink----------------------------------------------------------- resultsNames(dds) resLFC <- lfcShrink(dds, coef=2) resLFC ## ----parallel, eval=FALSE------------------------------------------------ # library("BiocParallel") # register(MulticoreParam(4)) ## ----resOrder------------------------------------------------------------ resOrdered <- res[order(res$pvalue),] ## ----sumRes-------------------------------------------------------------- summary(res) ## ----sumRes01------------------------------------------------------------ sum(res$padj < 0.1, na.rm=TRUE) ## ----resAlpha05---------------------------------------------------------- res05 <- results(dds, alpha=0.05) summary(res05) sum(res05$padj < 0.05, na.rm=TRUE) ## ----IHW----------------------------------------------------------------- library("IHW") resIHW <- results(dds, filterFun=ihw) summary(resIHW) sum(resIHW$padj < 0.1, na.rm=TRUE) metadata(resIHW)$ihwResult ## ----MA------------------------------------------------------------------ plotMA(res, ylim=c(-2,2)) ## ----shrunkMA------------------------------------------------------------ plotMA(resLFC, ylim=c(-2,2)) ## ----MAidentify, eval=FALSE---------------------------------------------- # idx <- identify(res$baseMean, res$log2FoldChange) # rownames(res)[idx] ## ----warning=FALSE------------------------------------------------------- resApe <- lfcShrink(dds, coef=2, type="apeglm") resAsh <- lfcShrink(dds, coef=2, type="ashr") ## ----fig.width=8, fig.height=3------------------------------------------- par(mfrow=c(1,3), mar=c(4,4,2,1)) xlim <- c(1,1e5); ylim <- c(-3,3) plotMA(resLFC, xlim=xlim, ylim=ylim, main="normal") plotMA(resApe, xlim=xlim, ylim=ylim, main="apeglm") plotMA(resAsh, xlim=xlim, ylim=ylim, main="ashr") ## ----plotCounts---------------------------------------------------------- plotCounts(dds, gene=which.min(res$padj), intgroup="condition") ## ----plotCountsAdv------------------------------------------------------- d <- plotCounts(dds, gene=which.min(res$padj), intgroup="condition", returnData=TRUE) library("ggplot2") ggplot(d, aes(x=condition, y=count)) + geom_point(position=position_jitter(w=0.1,h=0)) + scale_y_log10(breaks=c(25,100,400)) ## 
----metadata------------------------------------------------------------ mcols(res)$description ## ----export, eval=FALSE-------------------------------------------------- # write.csv(as.data.frame(resOrdered), # file="condition_treated_results.csv") ## ----subset-------------------------------------------------------------- resSig <- subset(resOrdered, padj < 0.1) resSig ## ----multifactor--------------------------------------------------------- colData(dds) ## ----copyMultifactor----------------------------------------------------- ddsMF <- dds ## ----fixLevels----------------------------------------------------------- levels(ddsMF$type) levels(ddsMF$type) <- sub("-.*", "", levels(ddsMF$type)) levels(ddsMF$type) ## ----replaceDesign------------------------------------------------------- design(ddsMF) <- formula(~ type + condition) ddsMF <- DESeq(ddsMF) ## ----multiResults-------------------------------------------------------- resMF <- results(ddsMF) head(resMF) ## ----multiTypeResults---------------------------------------------------- resMFType <- results(ddsMF, contrast=c("type", "single", "paired")) head(resMFType) ## ----rlogAndVST---------------------------------------------------------- vsd <- vst(dds, blind=FALSE) rld <- rlog(dds, blind=FALSE) head(assay(vsd), 3) ## ----meansd-------------------------------------------------------------- # this gives log2(n + 1) ntd <- normTransform(dds) library("vsn") meanSdPlot(assay(ntd)) meanSdPlot(assay(vsd)) meanSdPlot(assay(rld)) ## ----heatmap------------------------------------------------------------- library("pheatmap") select <- order(rowMeans(counts(dds,normalized=TRUE)), decreasing=TRUE)[1:20] df <- as.data.frame(colData(dds)[,c("condition","type")]) pheatmap(assay(ntd)[select,], cluster_rows=FALSE, show_rownames=FALSE, cluster_cols=FALSE, annotation_col=df) pheatmap(assay(vsd)[select,], cluster_rows=FALSE, show_rownames=FALSE, cluster_cols=FALSE, annotation_col=df) pheatmap(assay(rld)[select,], cluster_rows=FALSE, show_rownames=FALSE, cluster_cols=FALSE, annotation_col=df) ## ----sampleClust--------------------------------------------------------- sampleDists <- dist(t(assay(vsd))) ## ----figHeatmapSamples, fig.height=4, fig.width=6------------------------ library("RColorBrewer") sampleDistMatrix <- as.matrix(sampleDists) rownames(sampleDistMatrix) <- paste(vsd$condition, vsd$type, sep="-") colnames(sampleDistMatrix) <- NULL colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255) pheatmap(sampleDistMatrix, clustering_distance_rows=sampleDists, clustering_distance_cols=sampleDists, col=colors) ## ----figPCA-------------------------------------------------------------- plotPCA(vsd, intgroup=c("condition", "type")) ## ----figPCA2------------------------------------------------------------- pcaData <- plotPCA(vsd, intgroup=c("condition", "type"), returnData=TRUE) percentVar <- round(100 * attr(pcaData, "percentVar")) ggplot(pcaData, aes(PC1, PC2, color=condition, shape=type)) + geom_point(size=3) + xlab(paste0("PC1: ",percentVar[1],"% variance")) + ylab(paste0("PC2: ",percentVar[2],"% variance")) + coord_fixed() ## ----WaldTest, eval=FALSE------------------------------------------------ # dds <- estimateSizeFactors(dds) # dds <- estimateDispersions(dds) # dds <- nbinomWaldTest(dds) ## ----simpleContrast, eval=FALSE------------------------------------------ # results(dds, contrast=c("condition","C","B")) ## ----combineFactors, eval=FALSE------------------------------------------ # dds$group <- factor(paste0(dds$genotype, 
dds$condition)) # design(dds) <- ~ group # dds <- DESeq(dds) # resultsNames(dds) # results(dds, contrast=c("group", "IB", "IA")) ## ----interFig, echo=FALSE, results="hide", fig.height=3------------------ npg <- 20 mu <- 2^c(8,10,9,11,10,12) cond <- rep(rep(c("A","B"),each=npg),3) geno <- rep(c("I","II","III"),each=2*npg) table(cond, geno) counts <- rnbinom(6*npg, mu=rep(mu,each=npg), size=1/.01) d <- data.frame(log2c=log2(counts+1), cond, geno) library("ggplot2") plotit <- function(d, title) { ggplot(d, aes(x=cond, y=log2c, group=geno)) + geom_jitter(size=1.5, position = position_jitter(width=.15)) + facet_wrap(~ geno) + stat_summary(fun.y=mean, geom="line", colour="red", size=0.8) + xlab("condition") + ylab("log2(counts+1)") + ggtitle(title) } plotit(d, "Gene 1") + ylim(7,13) lm(log2c ~ cond + geno + geno:cond, data=d) ## ----interFig2, echo=FALSE, results="hide", fig.height=3----------------- mu[4] <- 2^12 mu[6] <- 2^8 counts <- rnbinom(6*npg, mu=rep(mu,each=npg), size=1/.01) d2 <- data.frame(log2c=log2(counts + 1), cond, geno) plotit(d2, "Gene 2") + ylim(7,13) lm(log2c ~ cond + geno + geno:cond, data=d2) ## ----simpleLRT, eval=FALSE----------------------------------------------- # dds <- DESeq(dds, test="LRT", reduced=~1) # res <- results(dds) ## ----simpleLRT2, eval=FALSE---------------------------------------------- # dds <- DESeq(dds, test="LRT", reduced=~batch) # res <- results(dds) ## ----boxplotCooks-------------------------------------------------------- par(mar=c(8,5,2,2)) boxplot(log10(assays(dds)[["cooks"]]), range=0, las=2) ## ----dispFit------------------------------------------------------------- plotDispEsts(dds) ## ----dispFitCustom------------------------------------------------------- ddsCustom <- dds useForMedian <- mcols(ddsCustom)$dispGeneEst > 1e-7 medianDisp <- median(mcols(ddsCustom)$dispGeneEst[useForMedian], na.rm=TRUE) dispersionFunction(ddsCustom) <- function(mu) medianDisp ddsCustom <- estimateDispersionsMAP(ddsCustom) ## ----filtByMean---------------------------------------------------------- metadata(res)$alpha metadata(res)$filterThreshold plot(metadata(res)$filterNumRej, type="b", ylab="number of rejections", xlab="quantiles of filter") lines(metadata(res)$lo.fit, col="red") abline(v=metadata(res)$filterTheta) ## ----noFilt-------------------------------------------------------------- resNoFilt <- results(dds, independentFiltering=FALSE) addmargins(table(filtering=(res$padj < .1), noFiltering=(resNoFilt$padj < .1))) ## ----lfcThresh----------------------------------------------------------- par(mfrow=c(2,2),mar=c(2,2,1,1)) ylim <- c(-2.5,2.5) resGA <- results(dds, lfcThreshold=.5, altHypothesis="greaterAbs") resLA <- results(dds, lfcThreshold=.5, altHypothesis="lessAbs") resG <- results(dds, lfcThreshold=.5, altHypothesis="greater") resL <- results(dds, lfcThreshold=.5, altHypothesis="less") drawLines <- function() abline(h=c(-.5,.5),col="dodgerblue",lwd=2) plotMA(resGA, ylim=ylim); drawLines() plotMA(resLA, ylim=ylim); drawLines() plotMA(resG, ylim=ylim); drawLines() plotMA(resL, ylim=ylim); drawLines() ## ----mcols--------------------------------------------------------------- mcols(dds,use.names=TRUE)[1:4,1:4] substr(names(mcols(dds)),1,10) mcols(mcols(dds), use.names=TRUE)[1:4,] ## ----muAndCooks---------------------------------------------------------- head(assays(dds)[["mu"]]) head(assays(dds)[["cooks"]]) ## ----dispersions--------------------------------------------------------- head(dispersions(dds)) head(mcols(dds)$dispersion) ## 
----sizefactors--------------------------------------------------------- sizeFactors(dds) ## ----coef---------------------------------------------------------------- head(coef(dds)) ## ----betaPriorVar-------------------------------------------------------- attr(dds, "betaPriorVar") ## ----priorInfo----------------------------------------------------------- priorInfo(resLFC) priorInfo(resApe) priorInfo(resAsh) ## ----dispPriorVar-------------------------------------------------------- dispersionFunction(dds) attr(dispersionFunction(dds), "dispPriorVar") ## ----versionNum---------------------------------------------------------- metadata(dds)[["version"]] ## ----normFactors, eval=FALSE--------------------------------------------- # normFactors <- normFactors / exp(rowMeans(log(normFactors))) # normalizationFactors(dds) <- normFactors ## ----offsetTransform, eval=FALSE----------------------------------------- # cqnOffset <- cqnObject$glm.offset # cqnNormFactors <- exp(cqnOffset) # EDASeqNormFactors <- exp(-1 * EDASeqOffset) ## ----lineardep, echo=FALSE----------------------------------------------- DataFrame(batch=factor(c(1,1,2,2)), condition=factor(c("A","A","B","B"))) ## ----lineardep2, echo=FALSE---------------------------------------------- DataFrame(batch=factor(c(1,1,1,1,2,2)), condition=factor(c("A","A","B","B","C","C"))) ## ----lineardep3, echo=FALSE---------------------------------------------- DataFrame(batch=factor(c(1,1,1,2,2,2)), condition=factor(c("A","B","C","A","B","C"))) ## ----groupeffect--------------------------------------------------------- coldata <- DataFrame(grp=factor(rep(c("X","Y"),each=6)), ind=factor(rep(1:6,each=2)), cnd=factor(rep(c("A","B"),6))) coldata ## ------------------------------------------------------------------------ as.data.frame(coldata) ## ----groupeffect2-------------------------------------------------------- coldata$ind.n <- factor(rep(rep(1:3,each=2),2)) as.data.frame(coldata) ## ----groupeffect3-------------------------------------------------------- model.matrix(~ grp + grp:ind.n + grp:cnd, coldata) ## ----groupeffect4, eval=FALSE-------------------------------------------- # results(dds, contrast=list("grpY.cndB","grpX.cndB")) ## ----missingcombo-------------------------------------------------------- group <- factor(rep(1:3,each=6)) condition <- factor(rep(rep(c("A","B","C"),each=2),3)) d <- DataFrame(group, condition)[-c(17,18),] as.data.frame(d) ## ----missingcombo2------------------------------------------------------- m1 <- model.matrix(~ condition*group, d) colnames(m1) unname(m1) all.zero <- apply(m1, 2, function(x) all(x==0)) all.zero ## ----missingcombo3------------------------------------------------------- idx <- which(all.zero) m1 <- m1[,-idx] unname(m1) ## ----cooksPlot----------------------------------------------------------- W <- res$stat maxCooks <- apply(assays(dds)[["cooks"]],1,max) idx <- !is.na(W) plot(rank(W[idx]), maxCooks[idx], xlab="rank of Wald statistic", ylab="maximum Cook's distance per gene", ylim=c(0,5), cex=.4, col=rgb(0,0,0,.3)) m <- ncol(dds) p <- 3 abline(h=qf(.99, p, m - p)) ## ----indFilt------------------------------------------------------------- plot(res$baseMean+1, -log10(res$pvalue), log="x", xlab="mean of normalized counts", ylab=expression(-log[10](pvalue)), ylim=c(0,30), cex=.4, col=rgb(0,0,0,.3)) ## ----histindepfilt------------------------------------------------------- use <- res$baseMean > metadata(res)$filterThreshold h1 <- hist(res$pvalue[!use], breaks=0:50/50, plot=FALSE) h2 <- 
hist(res$pvalue[use], breaks=0:50/50, plot=FALSE)
colori <- c(`do not pass`="khaki", `pass`="powderblue")

## ----fighistindepfilt----------------------------------------------------
barplot(height = rbind(h1$counts, h2$counts), beside = FALSE,
        col = colori, space = 0, main = "", ylab="frequency")
text(x = c(0, length(h1$counts)), y = 0, label = paste(c(0,1)),
     adj = c(0.5,1.7), xpd=NA)
legend("topright", fill=rev(colori), legend=rev(names(colori)))

## ----vanillaDESeq, eval=FALSE--------------------------------------------
# dds <- DESeq(dds, minReplicatesForReplace=Inf)
# res <- results(dds, cooksCutoff=FALSE, independentFiltering=FALSE)

## ----varGroup, echo=FALSE------------------------------------------------
set.seed(3)
dds1 <- makeExampleDESeqDataSet(n=1000,m=12,betaSD=.3,dispMeanRel=function(x) 0.01)
dds2 <- makeExampleDESeqDataSet(n=1000,m=12,
                                betaSD=.3,
                                interceptMean=mcols(dds1)$trueIntercept,
                                interceptSD=0,
                                dispMeanRel=function(x) 0.2)
dds2 <- dds2[,7:12]
dds2$condition <- rep("C",6)
mcols(dds2) <- NULL
dds12 <- cbind(dds1, dds2)
rld <- rlog(dds12, blind=FALSE, fitType="mean")
plotPCA(rld)

## ----convertNA, eval=FALSE-----------------------------------------------
# res$padj <- ifelse(is.na(res$padj), 1, res$padj)

## ----sessionInfo----------------------------------------------------------
sessionInfo()
DESeq2/inst/doc/DESeq2.Rmd0000644000175400017540000031311313201671732016040 0ustar00biocbuildbiocbuild---
title: "Analyzing RNA-seq data with DESeq2"
author: "Michael I. Love, Simon Anders, and Wolfgang Huber"
date: "`r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('DESeq2')`"
abstract: >
  A basic task in the analysis of count data from RNA-seq is the detection
  of differentially expressed genes. The count data are presented as a table
  which reports, for each sample, the number of sequence fragments that have
  been assigned to each gene. Analogous data also arise for other assay types,
  including comparative ChIP-Seq, HiC, shRNA screening, mass spectrometry.
  An important analysis question is the quantification and statistical
  inference of systematic changes between conditions, as compared to
  within-condition variability. The package DESeq2 provides methods to test
  for differential expression by use of negative binomial generalized linear
  models; the estimates of dispersion and logarithmic fold changes incorporate
  data-driven prior distributions. This vignette explains the use of the
  package and demonstrates typical workflows. [An RNA-seq
  workflow](http://www.bioconductor.org/help/workflows/rnaseqGene/) on the
  Bioconductor website covers similar material to this vignette but at a
  slower pace, including the generation of count matrices from FASTQ files.
  DESeq2 package version: `r packageVersion("DESeq2")`
output:
  rmarkdown::html_document:
    highlight: pygments
    toc: true
    fig_width: 5
bibliography: library.bib
vignette: >
  %\VignetteIndexEntry{Analyzing RNA-seq data with DESeq2}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding[utf8]{inputenc}
---

```{r setup, echo=FALSE, results="hide"}
knitr::opts_chunk$set(tidy=FALSE, cache=TRUE, dev="png",
                      message=FALSE, error=FALSE, warning=TRUE)
```

# Standard workflow

**Note:** if you use DESeq2 in published research, please cite:

> Love, M.I., Huber, W., Anders, S. (2014)
> Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.
> *Genome Biology*, **15**:550.
> [10.1186/s13059-014-0550-8](http://dx.doi.org/10.1186/s13059-014-0550-8)

Other Bioconductor packages with similar aims are
[edgeR](http://bioconductor.org/packages/edgeR),
[limma](http://bioconductor.org/packages/limma),
[DSS](http://bioconductor.org/packages/DSS),
[EBSeq](http://bioconductor.org/packages/EBSeq), and
[baySeq](http://bioconductor.org/packages/baySeq).

## Quick start

Here we show the most basic steps for a differential expression analysis. There are a variety of steps upstream of DESeq2 that result in the generation of counts or estimated counts for each sample, which we will discuss in the sections below. This code chunk assumes that you have a count matrix called `cts` and a table of sample information called `coldata`. The `design` indicates how to model the samples, here, that we want to measure the effect of the condition, controlling for batch differences. The two factor variables `batch` and `condition` should be columns of `coldata`.

```{r quickStart, eval=FALSE}
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design= ~ batch + condition)
dds <- DESeq(dds)
res <- results(dds, contrast=c("condition","treat","ctrl"))
resultsNames(dds)
res <- lfcShrink(dds, coef=2)
```

The following starting functions will be explained below:

* If you have transcript quantification files, as produced by *Salmon*, *Sailfish*, or *kallisto*, you would use *DESeqDataSetFromTximport*.
* If you have *htseq-count* files, the first line would use *DESeqDataSetFromHTSeqCount*.
* If you have a *RangedSummarizedExperiment*, the first line would use *DESeqDataSet*.

## How to get help for DESeq2

Any and all DESeq2 questions should be posted to the **Bioconductor support site**, which serves as a searchable knowledge base of questions and answers: <https://support.bioconductor.org>

Posting a question and tagging with "DESeq2" will automatically send an alert to the package authors to respond on the support site. See the first question in the list of [Frequently Asked Questions](#FAQ) (FAQ) for information about how to construct an informative post. You should **not** email your question to the package authors, as we will just reply that the question should be posted to the **Bioconductor support site**.

## Input data

### Why un-normalized counts?

As input, the DESeq2 package expects count data as obtained, e.g., from RNA-seq or another high-throughput sequencing experiment, in the form of a matrix of integer values. The value in the *i*-th row and the *j*-th column of the matrix tells how many reads can be assigned to gene *i* in sample *j*. Analogously, for other types of assays, the rows of the matrix might correspond e.g. to binding regions (with ChIP-Seq) or peptide sequences (with quantitative mass spectrometry). We will list methods for obtaining count matrices in sections below.

The values in the matrix should be un-normalized counts or estimated counts of sequencing reads (for single-end RNA-seq) or fragments (for paired-end RNA-seq). The [RNA-seq workflow](http://www.bioconductor.org/help/workflows/rnaseqGene/) describes multiple techniques for preparing such count matrices. It is important to provide count matrices as input, as this is required for DESeq2's statistical model [@Love2014] to hold; only the count values allow assessing the measurement precision correctly. The DESeq2 model internally corrects for library size, so transformed or normalized values such as counts scaled by library size should not be used as input.
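As an informal sanity check, one can verify that a prepared matrix really contains un-normalized counts before building a *DESeqDataSet*. The following is a minimal sketch (not part of the DESeq2 API; the chunk label is arbitrary), assuming a count matrix `cts` as in the examples below:

```{r rawCountCheck, eval=FALSE}
# raw counts are non-negative integers; normalized or transformed
# values (e.g. FPKM, TPM, log scale) will typically fail these checks
stopifnot(is.numeric(cts), all(cts >= 0))
stopifnot(all(cts == round(cts)))
# note: estimated counts imported with tximport may be non-integer;
# these are handled by DESeqDataSetFromTximport, so do not round them here
```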
### The DESeqDataSet

The object class used by the DESeq2 package to store the read counts and the intermediate estimated quantities during statistical analysis is the *DESeqDataSet*, which will usually be represented in the code here as an object `dds`.

A technical detail is that the *DESeqDataSet* class extends the *RangedSummarizedExperiment* class of the [SummarizedExperiment](http://bioconductor.org/packages/SummarizedExperiment) package. The "Ranged" part refers to the fact that the rows of the assay data (here, the counts) can be associated with genomic ranges (the exons of genes). This association facilitates downstream exploration of results, making use of other Bioconductor packages' range-based functionality (e.g. find the closest ChIP-seq peaks to the differentially expressed genes).

A *DESeqDataSet* object must have an associated *design formula*. The design formula expresses the variables which will be used in modeling. The formula should be a tilde (~) followed by the variables with plus signs between them (it will be coerced into a *formula* if it is not already). The design can be changed later, however then all differential analysis steps should be repeated, as the design formula is used to estimate the dispersions and to estimate the log2 fold changes of the model.

*Note*: In order to benefit from the default settings of the package, you should put the variable of interest at the end of the formula and make sure the control level is the first level.

We will now show 4 ways of constructing a *DESeqDataSet*, depending on what pipeline was used upstream of DESeq2 to generate counts or estimated counts:

1) From [transcript abundance files and tximport](#tximport)
2) From a [count matrix](#countmat)
3) From [htseq-count files](#htseq)
4) From a [SummarizedExperiment](#se) object

### Transcript abundance files and *tximport* input

A newer and recommended pipeline is to use fast transcript abundance quantifiers upstream of DESeq2, and then to create gene-level count matrices for use with DESeq2 by importing the quantification data using the [tximport](http://bioconductor.org/packages/tximport) package. This workflow allows users to import transcript abundance estimates from a variety of external software, including the following methods:

* [Salmon](http://combine-lab.github.io/salmon/) [@Patro2017Salmon]
* [Sailfish](http://www.cs.cmu.edu/~ckingsf/software/sailfish/) [@Patro2014Sailfish]
* [kallisto](https://pachterlab.github.io/kallisto/about.html) [@Bray2016Near]
* [RSEM](http://deweylab.github.io/RSEM/) [@Li2011RSEM]

Some advantages of using the above methods for transcript abundance estimation are: (i) this approach corrects for potential changes in gene length across samples (e.g. from differential isoform usage) [@Trapnell2013Differential], (ii) some of these methods (*Salmon*, *Sailfish*, *kallisto*) are substantially faster and require less memory and disk usage compared to alignment-based methods that require creation and storage of BAM files, and (iii) it is possible to avoid discarding those fragments that can align to multiple genes with homologous sequence, thus increasing sensitivity [@Robert2015Errors].

Full details on the motivation and methods for importing transcript level abundance and count estimates, summarizing to gene-level count matrices and producing an offset which corrects for potential changes in average transcript length across samples are described in [@Soneson2015].
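In compressed form, the tximport-to-DESeq2 pattern looks as follows; this is only a sketch of the worked example below, where `quantFiles` (a named vector of paths to quantification files), `tx2gene` (a data.frame linking transcript IDs to gene IDs), and `samples` (the sample table) are placeholders:

```{r txiSketch, eval=FALSE}
library("tximport")
library("DESeq2")
# summarize transcript-level estimates to gene-level counts and offsets
txi <- tximport(quantFiles, type="salmon", tx2gene=tx2gene)
# build the dataset; the average-transcript-length offset is carried along
ddsTxi <- DESeqDataSetFromTximport(txi, colData = samples,
                                   design = ~ condition)
```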
Note that the tximport-to-DESeq2 approach uses *estimated* gene counts from the transcript abundance quantifiers, but not *normalized* counts.

A tutorial on how to use the *Salmon* software for quantifying transcript abundance can be found [here](https://combine-lab.github.io/salmon/getting_started/). We recommend using the `--gcBias` [flag](http://salmon.readthedocs.io/en/latest/salmon.html#gcbias) which estimates a correction factor for systematic biases commonly present in RNA-seq data [@Love2016Modeling; @Patro2017Salmon], unless you are certain that your data do not contain such bias.

Here, we demonstrate how to import transcript abundances and construct a gene-level *DESeqDataSet* object from *Salmon* `quant.sf` files, which are stored in the [tximportData](http://bioconductor.org/packages/tximportData) package. You do not need the `tximportData` package for your analysis; it is only used here for demonstration.

Note that, instead of locating `dir` using *system.file*, a user would typically just provide a path, e.g. `/path/to/quant/files`. For a typical use, the `condition` information should already be present as a column of the sample table `samples`, while here we construct artificial condition labels for demonstration.

```{r txiSetup}
library("tximport")
library("readr")
library("tximportData")
dir <- system.file("extdata", package="tximportData")
samples <- read.table(file.path(dir,"samples.txt"), header=TRUE)
samples$condition <- factor(rep(c("A","B"),each=3))
rownames(samples) <- samples$run
samples[,c("pop","center","run","condition")]
```

Next we specify the path to the files using the appropriate columns of `samples`, and we read in a table that links transcripts to genes for this dataset.

```{r txiFiles}
files <- file.path(dir,"salmon", samples$run, "quant.sf")
names(files) <- samples$run
tx2gene <- read.csv(file.path(dir, "tx2gene.csv"))
```

We import the necessary quantification data for DESeq2 using the *tximport* function. For further details on use of *tximport*, including the construction of the `tx2gene` table for linking transcripts to genes in your dataset, please refer to the [tximport](http://bioconductor.org/packages/tximport) package vignette.

```{r tximport, results="hide"}
txi <- tximport(files, type="salmon", tx2gene=tx2gene)
```

Finally, we can construct a *DESeqDataSet* from the `txi` object and sample information in `samples`.

```{r txi2dds, results="hide"}
library("DESeq2")
ddsTxi <- DESeqDataSetFromTximport(txi,
                                   colData = samples,
                                   design = ~ condition)
```

The `ddsTxi` object here can then be used as `dds` in the following analysis steps.

### Count matrix input

Alternatively, the function *DESeqDataSetFromMatrix* can be used if you already have a matrix of read counts prepared from another source. Another method for quickly producing count matrices from alignment files is the *featureCounts* function [@Liao2013feature] in the [Rsubread](http://bioconductor.org/packages/Rsubread) package. To use *DESeqDataSetFromMatrix*, the user should provide the counts matrix, the information about the samples (the columns of the count matrix) as a *DataFrame* or *data.frame*, and the design formula.

To demonstrate the use of *DESeqDataSetFromMatrix*, we will read in count data from the [pasilla](http://bioconductor.org/packages/pasilla) package. We read in a count matrix, which we will name `cts`, and the sample information table, which we will name `coldata`. Further below we describe how to extract these objects from, e.g. *featureCounts* output.
```{r loadPasilla}
library("pasilla")
pasCts <- system.file("extdata",
                      "pasilla_gene_counts.tsv",
                      package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata",
                       "pasilla_sample_annotation.csv",
                       package="pasilla", mustWork=TRUE)
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
coldata <- read.csv(pasAnno, row.names=1)
coldata <- coldata[,c("condition","type")]
```

We examine the count matrix and column data to see if they are consistent in terms of sample order.

```{r showPasilla}
head(cts,2)
coldata
```

Note that these are not in the same order with respect to samples! It is absolutely critical that the columns of the count matrix and the rows of the column data (information about samples) are in the same order. DESeq2 will not make guesses as to which column of the count matrix belongs to which row of the column data; these must be provided to DESeq2 already in consistent order. As they are not in the correct order as given, we need to re-arrange one or the other so that they are consistent in terms of sample order (if we do not, later functions would produce an error). We additionally need to chop off the `"fb"` of the row names of `coldata`, so the naming is consistent.

```{r reorderPasila}
rownames(coldata) <- sub("fb", "", rownames(coldata))
all(rownames(coldata) %in% colnames(cts))
all(rownames(coldata) == colnames(cts))
cts <- cts[, rownames(coldata)]
all(rownames(coldata) == colnames(cts))
```

If you have used the *featureCounts* function [@Liao2013feature] in the [Rsubread](http://bioconductor.org/packages/Rsubread) package, the matrix of read counts can be directly provided from the `"counts"` element in the list output. The count matrix and column data can typically be read into R from flat files using base R functions such as *read.csv* or *read.delim*. For *htseq-count* files, see the dedicated input function below.

With the count matrix, `cts`, and the sample information, `coldata`, we can construct a *DESeqDataSet*:

```{r matrixInput}
library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)
dds
```

If you have additional feature data, it can be added to the *DESeqDataSet* by adding to the metadata columns of a newly constructed object. (Here we add redundant data just for demonstration, as the gene names are already the rownames of the `dds`.)

```{r addFeatureData}
featureData <- data.frame(gene=rownames(cts))
mcols(dds) <- DataFrame(mcols(dds), featureData)
mcols(dds)
```

### *htseq-count* input

You can use the function *DESeqDataSetFromHTSeqCount* if you have used *htseq-count* from the [HTSeq](http://www-huber.embl.de/users/anders/HTSeq) python package [@Anders:2014:htseq]. For an example of using the python scripts, see the [pasilla](http://bioconductor.org/packages/pasilla) data package. First you will want to specify a variable which points to the directory in which the *htseq-count* output files are located.

```{r htseqDirI, eval=FALSE}
directory <- "/path/to/your/files/"
```

However, for demonstration purposes only, the following line of code points to the directory for the demo *htseq-count* output files packaged for the [pasilla](http://bioconductor.org/packages/pasilla) package.

```{r htseqDirII}
directory <- system.file("extdata", package="pasilla", mustWork=TRUE)
```

We specify which files to read in using *list.files*, and select those files which contain the string `"treated"` using *grep*.
The *sub* function is used to chop up the sample filename to obtain the condition status, or you might alternatively read in a phenotypic table using *read.table*.

```{r htseqInput}
sampleFiles <- grep("treated",list.files(directory),value=TRUE)
sampleCondition <- sub("(.*treated).*","\\1",sampleFiles)
sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName = sampleFiles,
                          condition = sampleCondition)
```

Then we build the *DESeqDataSet* using the following function:

```{r hsteqDds}
library("DESeq2")
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                       directory = directory,
                                       design= ~ condition)
ddsHTSeq
```

### *SummarizedExperiment* input

An example of the steps to produce a *RangedSummarizedExperiment* can be found in the [RNA-seq workflow](http://www.bioconductor.org/help/workflows/rnaseqGene/) and in the vignette for the data package [airway](http://bioconductor.org/packages/airway). Here we load the *RangedSummarizedExperiment* from that package in order to build a *DESeqDataSet*.

```{r loadSumExp}
library("airway")
data("airway")
se <- airway
```

The constructor function below shows the generation of a *DESeqDataSet* from a *RangedSummarizedExperiment* `se`.

```{r sumExpInput}
library("DESeq2")
ddsSE <- DESeqDataSet(se, design = ~ cell + dex)
ddsSE
```

### Pre-filtering

While it is not necessary to pre-filter low count genes before running the DESeq2 functions, there are two reasons which make pre-filtering useful: by removing rows in which there are very few reads, we reduce the memory size of the `dds` data object, and we increase the speed of the transformation and testing functions within DESeq2. Here we perform a minimal pre-filtering to keep only rows that have at least 10 reads total. Note that more strict filtering to increase power is *automatically* applied via [independent filtering](#indfilt) on the mean of normalized counts within the *results* function.

```{r prefilter}
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
```

### Note on factor levels

By default, R will choose a *reference level* for factors based on alphabetical order. Then, if you never tell the DESeq2 functions which level you want to compare against (e.g. which level represents the control group), the comparisons will be based on the alphabetical order of the levels. There are two solutions: you can either explicitly tell *results* which comparison to make using the `contrast` argument (this will be shown later), or you can explicitly set the factor levels. You should only change the factor levels of variables in the design **before** running the DESeq2 analysis, not during or afterward. Setting the factor levels can be done in two ways, either using factor:

```{r factorlvl}
dds$condition <- factor(dds$condition, levels = c("untreated","treated"))
```

...or using *relevel*, just specifying the reference level:

```{r relevel}
dds$condition <- relevel(dds$condition, ref = "untreated")
```

If you need to subset the columns of a *DESeqDataSet*, i.e., when removing certain samples from the analysis, it is possible that all the samples for one or more levels of a variable in the design formula would be removed.
In this case, the *droplevels* function can be used to remove those levels which do not have samples in the current *DESeqDataSet*:

```{r droplevels}
dds$condition <- droplevels(dds$condition)
```

### Collapsing technical replicates

DESeq2 provides a function *collapseReplicates* which can assist in combining the counts from technical replicates into single columns of the count matrix. The term *technical replicate* implies multiple sequencing runs of the same library. You should not collapse biological replicates using this function. See the manual page for an example of the use of *collapseReplicates*.

### About the pasilla dataset

We continue with the [pasilla](http://bioconductor.org/packages/pasilla) data constructed from the count matrix method above. This data set is from an experiment on *Drosophila melanogaster* cell cultures that investigated the effect of RNAi knock-down of the splicing factor *pasilla* [@Brooks2010]. The detailed transcript of the production of the [pasilla](http://bioconductor.org/packages/pasilla) data is provided in the vignette of the data package [pasilla](http://bioconductor.org/packages/pasilla).

## Differential expression analysis

The standard differential expression analysis steps are wrapped into a single function, *DESeq*. The estimation steps performed by this function are described [below](#theory), in the manual page for `?DESeq` and in the Methods section of the DESeq2 publication [@Love2014]. Results tables are generated using the function *results*, which extracts a results table with log2 fold changes, *p* values and adjusted *p* values. With no additional arguments to *results*, the log2 fold change and Wald test *p* value will be for the last variable in the design formula, and if this is a factor, the comparison will be the last level of this variable over the first level. However, the order of the variables of the design does not matter so long as the user specifies the comparison using the `name` or `contrast` arguments of *results* (described later and in `?results`).

Details about the comparison are printed to the console, above the results table. The text, `condition treated vs untreated`, tells you that the estimates are of the logarithmic fold change log2(treated/untreated).

```{r deseq}
dds <- DESeq(dds)
res <- results(dds)
res
```

In previous versions of DESeq2, the *DESeq* function by default would produce moderated, or shrunken, log2 fold changes through the use of the `betaPrior` argument. In version 1.16 and higher, we have split the moderation of log2 fold changes into a separate function, *lfcShrink*, for reasons described in the [changes section](#changes) below. Here we provide the `dds` object and the number of the coefficient we want to moderate. It is also possible to specify a `contrast`, instead of `coef`, which works the same as the `contrast` argument of the *results* function. If a results object is provided, the `log2FoldChange` column will be swapped out; otherwise *lfcShrink* returns a vector of shrunken log2 fold changes.

```{r lfcShrink}
resultsNames(dds)
resLFC <- lfcShrink(dds, coef=2)
resLFC
```

The above steps should take less than 30 seconds for most analyses. For experiments with many samples (e.g. 100 samples), one can take advantage of parallelized computation. Parallelizing `DESeq`, `results`, and `lfcShrink` can be easily accomplished by loading the BiocParallel package, and then setting the following arguments: `parallel=TRUE` and `BPPARAM=MulticoreParam(4)`, for example, splitting the job over 4 cores.
Note that `results` for coefficients or contrasts listed in `resultsNames(dds)` is fast and will not need parallelization. As an alternative to `BPPARAM`, one can `register` cores at the beginning of an analysis, and then just specify `parallel=TRUE` to the functions when called.

```{r parallel, eval=FALSE}
library("BiocParallel")
register(MulticoreParam(4))
```

We can order our results table by the smallest *p* value:

```{r resOrder}
resOrdered <- res[order(res$pvalue),]
```

We can summarize some basic tallies using the *summary* function.

```{r sumRes}
summary(res)
```

How many adjusted p-values were less than 0.1?

```{r sumRes01}
sum(res$padj < 0.1, na.rm=TRUE)
```

The *results* function contains a number of arguments to customize the results table which is generated. You can read about these arguments by looking up `?results`. Note that the *results* function automatically performs independent filtering based on the mean of normalized counts for each gene, optimizing the number of genes which will have an adjusted *p* value below a given FDR cutoff, `alpha`. Independent filtering is further discussed [below](#indfilt). By default the argument `alpha` is set to $0.1$. If the adjusted *p* value cutoff will be a value other than $0.1$, `alpha` should be set to that value:

```{r resAlpha05}
res05 <- results(dds, alpha=0.05)
summary(res05)
sum(res05$padj < 0.05, na.rm=TRUE)
```

A generalization of the idea of *p* value filtering is to *weight* hypotheses to optimize power. A Bioconductor package, [IHW](http://bioconductor.org/packages/IHW), is available that implements the method of *Independent Hypothesis Weighting* [@Ignatiadis2016]. Here we show the use of *IHW* for *p* value adjustment of DESeq2 results. For more details, please see the vignette of the [IHW](http://bioconductor.org/packages/IHW) package. The *IHW* result object is stored in the metadata.

**Note:** If the results of independent hypothesis weighting are used in published research, please cite:

> Ignatiadis, N., Klaus, B., Zaugg, J.B., Huber, W. (2016)
> Data-driven hypothesis weighting increases detection power in genome-scale multiple testing.
> *Nature Methods*, **13**:7.
> [10.1038/nmeth.3885](http://dx.doi.org/10.1038/nmeth.3885)

```{r IHW}
library("IHW")
resIHW <- results(dds, filterFun=ihw)
summary(resIHW)
sum(resIHW$padj < 0.1, na.rm=TRUE)
metadata(resIHW)$ihwResult
```

If a multi-factor design is used, or if the variable in the design formula has more than two levels, the `contrast` argument of *results* can be used to extract different comparisons from the *DESeqDataSet* returned by *DESeq*. The use of the `contrast` argument is further discussed [below](#contrasts).

For advanced users, note that all the values calculated by the DESeq2 package are stored in the *DESeqDataSet* object, and access to these values is discussed [below](#access).

## Exploring and exporting results

### MA-plot

In DESeq2, the function *plotMA* shows the log2 fold changes attributable to a given variable over the mean of normalized counts for all the samples in the *DESeqDataSet*. Points will be colored red if the adjusted *p* value is less than 0.1. Points which fall out of the window are plotted as open triangles pointing either up or down.

```{r MA}
plotMA(res, ylim=c(-2,2))
```

It is more useful to visualize the MA-plot for the shrunken log2 fold changes, which remove the noise associated with log2 fold changes from low count genes without requiring arbitrary filtering thresholds.
```{r shrunkMA}
plotMA(resLFC, ylim=c(-2,2))
```

After calling *plotMA*, one can use the function *identify* to interactively detect the row number of individual genes by clicking on the plot. One can then recover the gene identifiers by saving the resulting indices:

```{r MAidentify, eval=FALSE}
idx <- identify(res$baseMean, res$log2FoldChange)
rownames(res)[idx]
```

### Alternative shrinkage estimators

The moderated log fold changes proposed by @Love2014 use a normal prior distribution, centered on zero and with a scale that is fit to the data. The shrunken log fold changes are useful for ranking and visualization, without the need for arbitrary filters on low count genes. The normal prior can sometimes produce too strong of shrinkage for certain datasets. In DESeq2 version 1.18, we include two additional adaptive shrinkage estimators, available via the `type` argument of `lfcShrink`. For more details, see `?lfcShrink`.

The options for `type` are:

* `normal` is the original DESeq2 shrinkage estimator, an adaptive normal prior
* `apeglm` is the adaptive t prior shrinkage estimator from the [apeglm](http://bioconductor.org/packages/apeglm) package
* `ashr` is the adaptive shrinkage estimator from the [ashr](https://github.com/stephens999/ashr) package [@Stephens2016]. Here DESeq2 uses the ashr option to fit a mixture of normal distributions to form the prior, with `method="shrinkage"`

**Note:** if the shrinkage estimator `type="ashr"` is used in published research, please cite:

> Stephens, M. (2016)
> False discovery rates: a new deal. *Biostatistics*, **18**:2.
> [10.1093/biostatistics/kxw041](https://doi.org/10.1093/biostatistics/kxw041)

```{r warning=FALSE}
resApe <- lfcShrink(dds, coef=2, type="apeglm")
resAsh <- lfcShrink(dds, coef=2, type="ashr")
```

```{r fig.width=8, fig.height=3}
par(mfrow=c(1,3), mar=c(4,4,2,1))
xlim <- c(1,1e5); ylim <- c(-3,3)
plotMA(resLFC, xlim=xlim, ylim=ylim, main="normal")
plotMA(resApe, xlim=xlim, ylim=ylim, main="apeglm")
plotMA(resAsh, xlim=xlim, ylim=ylim, main="ashr")
```

**Note:** due to the nature of the statistical model and optimization approach, `apeglm` is usually a factor of ~5 slower than `normal`. For example, with 10,000 genes and 10 samples, `normal` may take ~3 seconds, while `apeglm` takes ~15 seconds (on a laptop). However, `apeglm` can be more than an order of magnitude slower when there are many coefficients, e.g. 10 or more coefficients in `resultsNames(dds)`. The method `ashr` is fairly fast and does not depend on the number of coefficients, as it uses only the estimated MLE coefficients and their standard errors.

A solution for speeding up `normal` and `apeglm` is to use multiple cores. This can be easily accomplished by loading the BiocParallel package, and then setting the following arguments of `lfcShrink`: `parallel=TRUE` and `BPPARAM=MulticoreParam(4)`, for example, splitting the job over 4 cores. This approach can also be used with `DESeq` and `results`, as mentioned [above](#parallel).

**Note:** If there is unwanted variation present in the data (e.g. batch effects) it is always recommended to correct for this, which can be accommodated in DESeq2 by including in the design any known batch variables or by using functions/packages such as `svaseq` in [sva](http://bioconductor.org/packages/sva) [@Leek2014] or the `RUV` functions in [RUVSeq](http://bioconductor.org/packages/RUVSeq) [@Risso2014] to estimate variables that capture the unwanted variation.
In addition, the ashr developers have a [specific method](https://github.com/dcgerard/vicar) for accounting for unwanted variation in combination with ashr [@Gerard2017]. ### Plot counts It can also be useful to examine the counts of reads for a single gene across the groups. A simple function for making this plot is *plotCounts*, which normalizes counts by sequencing depth and adds a pseudocount of 1/2 to allow for log scale plotting. The counts are grouped by the variables in `intgroup`, where more than one variable can be specified. Here we specify the gene which had the smallest *p* value from the results table created above. You can select the gene to plot by rowname or by numeric index. ```{r plotCounts} plotCounts(dds, gene=which.min(res$padj), intgroup="condition") ``` For customized plotting, an argument `returnData` specifies that the function should only return a *data.frame* for plotting with *ggplot*. ```{r plotCountsAdv} d <- plotCounts(dds, gene=which.min(res$padj), intgroup="condition", returnData=TRUE) library("ggplot2") ggplot(d, aes(x=condition, y=count)) + geom_point(position=position_jitter(w=0.1,h=0)) + scale_y_log10(breaks=c(25,100,400)) ``` ### More information on results columns Information about which variables and tests were used can be found by calling the function *mcols* on the results object. ```{r metadata} mcols(res)$description ``` For a particular gene, a log2 fold change of -1 for `condition treated vs untreated` means that the treatment induces a multiplicative change in observed gene expression level of $2^{-1} = 0.5$ compared to the untreated condition. If the variable of interest is continuous-valued, then the reported log2 fold change is per unit of change of that variable. **Note on p-values set to NA**: some values in the results table can be set to `NA` for one of the following reasons: * If within a row, all samples have zero counts, the `baseMean` column will be zero, and the log2 fold change estimates, *p* value and adjusted *p* value will all be set to `NA`. * If a row contains a sample with an extreme count outlier then the *p* value and adjusted *p* value will be set to `NA`. These outlier counts are detected by Cook's distance. Customization of this outlier filtering and description of functionality for replacement of outlier counts and refitting is described [below](#outlier) * If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted *p* value will be set to `NA`. Description and customization of independent filtering is described [below](#indfilt) ### Rich visualization and reporting of results **ReportingTools.** An HTML report of the results with plots and sortable/filterable columns can be generated using the [ReportingTools](http://bioconductor.org/packages/ReportingTools) package on a *DESeqDataSet* that has been processed by the *DESeq* function. For a code example, see the *RNA-seq differential expression* vignette at the [ReportingTools](http://bioconductor.org/packages/ReportingTools) page, or the manual page for the *publish* method for the *DESeqDataSet* class. **regionReport.** An HTML and PDF summary of the results with plots can also be generated using the [regionReport](http://bioconductor.org/packages/regionReport) package. The *DESeq2Report* function should be run on a *DESeqDataSet* that has been processed by the *DESeq* function. 
For more details see the manual page for *DESeq2Report* and an example vignette in the [regionReport](http://bioconductor.org/packages/regionReport) package.

**Glimma.** Interactive visualization of DESeq2 output, including MA-plots (also called MD-plot) can be generated using the [Glimma](http://bioconductor.org/packages/Glimma) package. See the manual page for *glMDPlot.DESeqResults*.

**pcaExplorer.** Interactive visualization of DESeq2 output, including PCA plots, boxplots of counts and other useful summaries can be generated using the [pcaExplorer](http://bioconductor.org/packages/pcaExplorer) package. See the *Launching the application* section of the package vignette.

### Exporting results to CSV files

A plain-text file of the results can be exported using the base R functions *write.csv* or *write.table*. We suggest using a descriptive file name indicating the variable and levels which were tested.

```{r export, eval=FALSE}
write.csv(as.data.frame(resOrdered),
          file="condition_treated_results.csv")
```

Exporting only the results which pass an adjusted *p* value threshold can be accomplished with the *subset* function, followed by the *write.csv* function.

```{r subset}
resSig <- subset(resOrdered, padj < 0.1)
resSig
```

## Multi-factor designs

Experiments with more than one factor influencing the counts can be analyzed using design formulas that include the additional variables. In fact, DESeq2 can analyze any possible experimental design that can be expressed with fixed effects terms (multiple factors, designs with interactions, designs with continuous variables, splines, and so on are all possible). By adding variables to the design, one can control for additional variation in the counts. For example, if the condition samples are balanced across experimental batches, by including the `batch` factor in the design, one can increase the sensitivity for finding differences due to `condition`. There are multiple ways to analyze experiments when the additional variables are of interest and not just controlling factors (see [section on interactions](#interactions)).

The data in the [pasilla](http://bioconductor.org/packages/pasilla) package have a condition of interest (the column `condition`), as well as information on the type of sequencing which was performed (the column `type`), as we can see below:

```{r multifactor}
colData(dds)
```

We create a copy of the *DESeqDataSet*, so that we can rerun the analysis using a multi-factor design.

```{r copyMultifactor}
ddsMF <- dds
```

We change the levels of `type` so it only contains letters (numbers, underscore and period are also allowed in design factor levels). Be careful when changing level names to use the same order as the current levels.

```{r fixLevels}
levels(ddsMF$type)
levels(ddsMF$type) <- sub("-.*", "", levels(ddsMF$type))
levels(ddsMF$type)
```

We can account for the different types of sequencing, and get a clearer picture of the differences attributable to the treatment. As `condition` is the variable of interest, we put it at the end of the formula. Thus the *results* function will by default pull the `condition` results unless `contrast` or `name` arguments are specified. Then we can re-run *DESeq*:

```{r replaceDesign}
design(ddsMF) <- formula(~ type + condition)
ddsMF <- DESeq(ddsMF)
```

Again, we access the results using the *results* function.

```{r multiResults}
resMF <- results(ddsMF)
head(resMF)
```

It is also possible to retrieve the log2 fold changes, *p* values and adjusted *p* values of the `type` variable.
The `contrast` argument of the function *results* takes a character vector of length three: the name of the variable, the name of the factor level for the numerator of the log2 ratio, and the name of the factor level for the denominator. The `contrast` argument can also take other forms, as described in the help page for *results* and [below](#contrasts).

```{r multiTypeResults}
resMFType <- results(ddsMF,
                     contrast=c("type", "single", "paired"))
head(resMFType)
```

If the variable is continuous or an interaction term (see [section on interactions](#interactions)) then the results can be extracted using the `name` argument to *results*, where the name is one of the elements returned by `resultsNames(dds)`.

# Data transformations and visualization

## Count data transformations

In order to test for differential expression, we operate on raw counts and use discrete distributions as described in the previous section on differential expression. However for other downstream analyses -- e.g. for visualization or clustering -- it might be useful to work with transformed versions of the count data.

Perhaps the most obvious choice of transformation is the logarithm. Since count values for a gene can be zero in some conditions (and non-zero in others), some advocate the use of *pseudocounts*, i.e. transformations of the form:

$$ y = \log_2(n + n_0) $$

where *n* represents the count values and $n_0$ is a positive constant.

In this section, we discuss two alternative approaches that offer more theoretical justification and a rational way of choosing parameters equivalent to $n_0$ above. One makes use of the concept of variance stabilizing transformations (VST) [@Tibshirani1988; @sagmb2003; @Anders:2010:GB], and the other is the *regularized logarithm* or *rlog*, which incorporates a prior on the sample differences [@Love2014]. Both transformations produce transformed data on the log2 scale which has been normalized with respect to library size or other normalization factors.

The point of these two transformations, the VST and the *rlog*, is to remove the dependence of the variance on the mean, particularly the high variance of the logarithm of count data when the mean is low. Both the VST and the *rlog* use the experiment-wide trend of variance over mean to transform the data so that this trend is removed. Note that we do not require or desire that all the genes have *exactly* the same variance after transformation. Indeed, in a figure below, you will see that after the transformations the genes with the same mean do not have exactly the same standard deviations, but that the experiment-wide trend has flattened. It is those genes with row variance above the trend which will allow us to cluster samples into interesting groups.

**Note on running time:** if you have many samples (e.g. 100s), the *rlog* function might take too long, and so the *vst* function will be a faster choice. The rlog and VST have similar properties, but the rlog requires fitting a shrinkage term for each sample and each gene which takes time. See the DESeq2 paper for more discussion on the differences [@Love2014].

### Blind dispersion estimation

The two functions, *vst* and *rlog*, have an argument `blind`, for whether the transformation should be blind to the sample information specified by the design formula. When `blind` equals `TRUE` (the default), the functions will re-estimate the dispersions using only an intercept.
This setting should be used in order to compare samples in a manner wholly unbiased by the information about experimental groups, for example to perform sample QA (quality assurance) as demonstrated below.

However, blind dispersion estimation is not the appropriate choice if one expects that many or the majority of genes (rows) will have large differences in counts which are explainable by the experimental design, and one wishes to transform the data for downstream analysis. In this case, using blind dispersion estimation will lead to large estimates of dispersion, as it attributes differences due to the experimental design to unwanted *noise*, and will result in overly shrinking the transformed values towards each other. By setting `blind` to `FALSE`, the dispersions already estimated will be used to perform transformations, or if not present, they will be estimated using the current design formula. Note that only the fitted dispersion estimates from the mean-dispersion trend line are used in the transformation (the global dependence of dispersion on mean for the entire experiment). So setting `blind` to `FALSE` is still for the most part not using the information about which samples were in which experimental group in applying the transformation.

### Extracting transformed values

These transformation functions return an object of class *DESeqTransform* which is a subclass of *RangedSummarizedExperiment*. For ~20 samples, running on a newly created `DESeqDataSet`, *rlog* may take 30 seconds, while *vst* takes less than 1 second. The running times are shorter when using `blind=FALSE` and if the function *DESeq* has already been run, because then it is not necessary to re-estimate the dispersion values. The *assay* function is used to extract the matrix of normalized values.

```{r rlogAndVST}
vsd <- vst(dds, blind=FALSE)
rld <- rlog(dds, blind=FALSE)
head(assay(vsd), 3)
```

### Variance stabilizing transformation

Above, we used a parametric fit for the dispersion. In this case, the closed-form expression for the variance stabilizing transformation is used by the *vst* function. If a local fit is used (option `fitType="locfit"` to *estimateDispersions*) a numerical integration is used instead. The transformed data should be approximately variance stabilized and also includes correction for size factors or normalization factors. The transformed data is on the log2 scale for large counts.

### Regularized log transformation

The function *rlog* stands for *regularized log*; it transforms the original count data to the log2 scale by fitting a model with a term for each sample and a prior distribution on the coefficients which is estimated from the data. This is the same kind of shrinkage (sometimes referred to as regularization, or moderation) of log fold changes used by *DESeq* and *nbinomWaldTest*. The resulting data contains elements defined as:

$$ \log_2(q_{ij}) = \beta_{i0} + \beta_{ij} $$

where $q_{ij}$ is a parameter proportional to the expected true concentration of fragments for gene *i* and sample *j* (see formula [below](#theory)), $\beta_{i0}$ is an intercept which does not undergo shrinkage, and $\beta_{ij}$ is the sample-specific effect which is shrunk toward zero based on the dispersion-mean trend over the entire dataset. The trend typically captures high dispersions for low counts, and therefore these genes exhibit higher shrinkage from the *rlog*.
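As a quick illustration of this shrinkage (a sketch, assuming `dds` has been processed by *DESeq* and `rld` has been computed as above), one can compare the shifted log of normalized counts with the rlog values for a low-count gene:

```{r rlogShrinkSketch, eval=FALSE}
# pick the gene with the lowest mean of normalized counts
gene <- which.min(rowMeans(counts(dds, normalized=TRUE)))
logNorm <- log2(counts(dds, normalized=TRUE)[gene,] + 1)
plot(logNorm, assay(rld)[gene,],
     xlab="log2(normalized count + 1)", ylab="rlog value")
abline(0, 1, col="grey")
# the rlog values lie closer together than the shifted log values,
# reflecting shrinkage of sample differences toward the gene mean
```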
Note that, as $q_{ij}$ represents the part of the mean value $\mu_{ij}$ after the size factor $s_j$ has been divided out, it is clear that the rlog transformation inherently accounts for differences in sequencing depth. Without priors, this design matrix would lead to a non-unique solution, however the addition of a prior on non-intercept betas allows for a unique solution to be found.

### Effects of transformations on the variance

The figure below plots the standard deviation of the transformed data, across samples, against the mean, using the shifted logarithm transformation, the regularized log transformation and the variance stabilizing transformation. The shifted logarithm has elevated standard deviation in the lower count range, and the regularized log to a lesser extent, while for the variance stabilized data the standard deviation is roughly constant along the whole dynamic range.

Note that the vertical axis in such plots is the square root of the variance over all samples, so including the variance due to the experimental conditions. While a flat curve of the square root of variance over the mean may seem like the goal of such transformations, this may be unreasonable in the case of datasets with many true differences due to the experimental conditions.

```{r meansd}
# this gives log2(n + 1)
ntd <- normTransform(dds)
library("vsn")
meanSdPlot(assay(ntd))
meanSdPlot(assay(vsd))
meanSdPlot(assay(rld))
```

## Data quality assessment by sample clustering and visualization

Data quality assessment and quality control (i.e. the removal of insufficiently good data) are essential steps of any data analysis. These steps should typically be performed very early in the analysis of a new data set, preceding or in parallel to the differential expression testing.

We define the term *quality* as *fitness for purpose*. Our purpose is the detection of differentially expressed genes, and we are looking in particular for samples whose experimental treatment suffered from an abnormality that renders the data points obtained from these particular samples detrimental to our purpose.

### Heatmap of the count matrix

To explore a count matrix, it is often instructive to look at it as a heatmap. Below we show how to produce such a heatmap for various transformations of the data.

```{r heatmap}
library("pheatmap")
select <- order(rowMeans(counts(dds,normalized=TRUE)),
                decreasing=TRUE)[1:20]
df <- as.data.frame(colData(dds)[,c("condition","type")])
pheatmap(assay(ntd)[select,], cluster_rows=FALSE, show_rownames=FALSE,
         cluster_cols=FALSE, annotation_col=df)
pheatmap(assay(vsd)[select,], cluster_rows=FALSE, show_rownames=FALSE,
         cluster_cols=FALSE, annotation_col=df)
pheatmap(assay(rld)[select,], cluster_rows=FALSE, show_rownames=FALSE,
         cluster_cols=FALSE, annotation_col=df)
```

### Heatmap of the sample-to-sample distances

Another use of the transformed data is sample clustering. Here, we apply the *dist* function to the transpose of the transformed count matrix to get sample-to-sample distances.

```{r sampleClust}
sampleDists <- dist(t(assay(vsd)))
```

A heatmap of this distance matrix gives us an overview of similarities and dissimilarities between samples. We have to provide a hierarchical clustering `hc` to the heatmap function based on the sample distances, or else the heatmap function would calculate a clustering based on the distances between the rows/columns of the distance matrix.
```{r figHeatmapSamples, fig.height=4, fig.width=6}
library("RColorBrewer")
sampleDistMatrix <- as.matrix(sampleDists)
rownames(sampleDistMatrix) <- paste(vsd$condition, vsd$type, sep="-")
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
         clustering_distance_rows=sampleDists,
         clustering_distance_cols=sampleDists,
         col=colors)
```

### Principal component plot of the samples

Related to the distance matrix is the PCA plot, which shows the samples in the 2D plane spanned by their first two principal components. This type of plot is useful for visualizing the overall effect of experimental covariates and batch effects.

```{r figPCA}
plotPCA(vsd, intgroup=c("condition", "type"))
```

It is also possible to customize the PCA plot using the *ggplot* function.

```{r figPCA2}
pcaData <- plotPCA(vsd, intgroup=c("condition", "type"), returnData=TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color=condition, shape=type)) +
  geom_point(size=3) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 
  coord_fixed()
```

# Variations to the standard workflow

## Wald test individual steps

The function *DESeq* runs the following functions in order:

```{r WaldTest, eval=FALSE}
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
```

## Contrasts

A contrast is a linear combination of estimated log2 fold changes, which can be used to test if differences between groups are equal to zero. The simplest use case for contrasts is an experimental design containing a factor with three levels, say A, B and C. Contrasts enable the user to generate results for all three possible differences: log2 fold change of B vs A, of C vs A, and of C vs B. The `contrast` argument of the *results* function is used to extract test results of log2 fold changes of interest, for example:

```{r simpleContrast, eval=FALSE}
results(dds, contrast=c("condition","C","B"))
```

Log2 fold changes can also be added and subtracted by providing a `list` to the `contrast` argument which has two elements: the names of the log2 fold changes to add, and the names of the log2 fold changes to subtract. The names used in the list should come from `resultsNames(dds)`. Alternatively, a numeric vector of the length of `resultsNames(dds)` can be provided, for manually specifying the linear combination of terms. Demonstrations of the use of contrasts for various designs can be found in the examples section of the help page for the *results* function. The mathematical formula that is used to generate the contrasts can be found [below](#theory).

## Interactions

Interaction terms can be added to the design formula, in order to test, for example, if the log2 fold change attributable to a given condition is *different* based on another factor, for example if the condition effect differs across genotype.

Many users begin to add interaction terms to the design formula, when in fact a much simpler approach would give all the results tables that are desired. We will explain this approach first, because it is much simpler to perform.
If the comparisons of interest are, for example, the effect of a condition for different sets of samples, a simpler approach than adding interaction terms explicitly to the design formula is to perform the following steps:

* combine the factors of interest into a single factor with all combinations of the original factors
* change the design to include just this factor, e.g. ~ group

Using this design is similar to adding an interaction term, in that it models multiple condition effects which can be easily extracted with *results*. Suppose we have two factors `genotype` (with values I, II, and III) and `condition` (with values A and B), and we want to extract the condition effect specifically for each genotype. We could use the following approach to obtain, e.g., the condition effect for genotype I:

```{r combineFactors, eval=FALSE}
dds$group <- factor(paste0(dds$genotype, dds$condition))
design(dds) <- ~ group
dds <- DESeq(dds)
resultsNames(dds)
results(dds, contrast=c("group", "IB", "IA"))
```

The following two plots diagram hypothetical genotype-specific condition effects, which could be modeled with interaction terms by using a design of `~genotype + condition + genotype:condition`.

In the first plot (Gene 1), note that the condition effect is consistent across genotypes. Although condition A has a different baseline for I, II, and III, the condition effect is a log2 fold change of about 2 for each genotype. Using a model with an interaction term `genotype:condition`, the interaction terms for genotype II and genotype III will be nearly 0.

Here, the y-axis represents log2(n+1), and each group has 20 samples (black dots). A red line connects the mean of the groups within each genotype.

```{r interFig, echo=FALSE, results="hide", fig.height=3}
npg <- 20
mu <- 2^c(8,10,9,11,10,12)
cond <- rep(rep(c("A","B"),each=npg),3)
geno <- rep(c("I","II","III"),each=2*npg)
table(cond, geno)
counts <- rnbinom(6*npg, mu=rep(mu,each=npg), size=1/.01)
d <- data.frame(log2c=log2(counts+1), cond, geno)
library("ggplot2")
plotit <- function(d, title) {
  ggplot(d, aes(x=cond, y=log2c, group=geno)) + 
    geom_jitter(size=1.5, position = position_jitter(width=.15)) +
    facet_wrap(~ geno) + 
    stat_summary(fun.y=mean, geom="line", colour="red", size=0.8) + 
    xlab("condition") + ylab("log2(counts+1)") + ggtitle(title)
}
plotit(d, "Gene 1") + ylim(7,13)
lm(log2c ~ cond + geno + geno:cond, data=d)
```

In the second plot (Gene 2), we can see that the condition effect is not consistent across genotype. Here the main condition effect (the effect for the reference genotype I) is again 2. However, this time the interaction terms will be around 1 for genotype II and -4 for genotype III. This is because the condition effect is higher by 1 for genotype II compared to genotype I, and lower by 4 for genotype III compared to genotype I. The condition effect for genotype II (or III) is obtained by adding the main condition effect and the interaction term for that genotype. Such a plot can be made using the *plotCounts* function as shown above.

```{r interFig2, echo=FALSE, results="hide", fig.height=3}
mu[4] <- 2^12
mu[6] <- 2^8
counts <- rnbinom(6*npg, mu=rep(mu,each=npg), size=1/.01)
d2 <- data.frame(log2c=log2(counts + 1), cond, geno)
plotit(d2, "Gene 2") + ylim(7,13)
lm(log2c ~ cond + geno + geno:cond, data=d2)
```

Now we will continue to explain the use of interactions in order to test for *differences* in condition effects. We continue with the example of condition effects across three genotypes (I, II, and III).
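A sketch of such an analysis is given below; note that `genotype` and `condition` here are the hypothetical factors diagrammed above, not columns of the example dataset used in this vignette:

```{r interactionSketch, eval=FALSE}
# hypothetical design with an interaction term
design(dds) <- ~ genotype + condition + genotype:condition
dds <- DESeq(dds)
resultsNames(dds)
# the interaction term: the difference between the condition effect
# for genotype III and the condition effect for the reference genotype I
results(dds, name="genotypeIII.conditionB")
```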
The key point to remember about designs with interaction terms is that, unlike for a design `~genotype + condition`, where the condition effect represents the *overall* effect controlling for differences due to genotype, by adding `genotype:condition`, the main condition effect only represents the effect of condition for the *reference level* of genotype (I, or whichever level was defined by the user as the reference level). The interaction terms `genotypeII.conditionB` and `genotypeIII.conditionB` give the *difference* between the condition effect for a given genotype and the condition effect for the reference genotype.

This genotype-condition interaction example is examined in further detail in Example 3 in the help page for *results*, which can be found by typing `?results`. In particular, we show how to test for differences in the condition effect across genotype, and we show how to obtain the condition effect for non-reference genotypes.

## Time-series experiments

There are a number of ways to analyze time-series experiments, depending on the biological question of interest. In order to test for any differences over multiple time points, one can use a design including the time factor, and then test using the likelihood ratio test as described in the following section, where the time factor is removed in the reduced formula. For a control and treatment time series, one can use a design formula containing the condition factor, the time factor, and the interaction of the two. In this case, using the likelihood ratio test with a reduced model which does not contain the interaction terms will test whether the condition induces a change in gene expression at any time point after the reference level time point (time 0). An example of the latter analysis is provided in our [RNA-seq workflow](http://www.bioconductor.org/help/workflows/rnaseqGene).

## Likelihood ratio test

DESeq2 offers two kinds of hypothesis tests: the Wald test, where we use the estimated standard error of a log2 fold change to test if it is equal to zero, and the likelihood ratio test (LRT). The LRT examines two models for the counts, a *full* model with a certain number of terms and a *reduced* model, in which some of the terms of the *full* model are removed. The test determines if the increased likelihood of the data using the extra terms in the *full* model is more than expected if those extra terms are truly zero.

The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables. The LRT for count data is conceptually similar to an analysis of variance (ANOVA) calculation in linear regression, except that in the case of the Negative Binomial GLM, we use an analysis of deviance (ANODEV), where the *deviance* captures the difference in likelihood between a full and a reduced model.

The likelihood ratio test can be performed by specifying `test="LRT"` when using the *DESeq* function, and providing a reduced design formula, e.g. one in which a number of terms from `design(dds)` are removed. The degrees of freedom for the test are obtained from the difference between the number of parameters in the two models. A simple likelihood ratio test, if the full design was `~condition`, would look like:

```{r simpleLRT, eval=FALSE}
dds <- DESeq(dds, test="LRT", reduced=~1)
res <- results(dds)
```

If the full design contained other variables, such as a batch variable, e.g.
`~batch + condition`, then the likelihood ratio test would look like:

```{r simpleLRT2, eval=FALSE}
dds <- DESeq(dds, test="LRT", reduced=~batch)
res <- results(dds)
```

## Approach to count outliers

RNA-seq data sometimes contain isolated instances of very large counts that are apparently unrelated to the experimental or study design, and which may be considered outliers. There are many reasons why outliers can arise, including rare technical or experimental artifacts, read mapping problems in the case of genetically differing samples, and genuine, but rare biological events. In many cases, users are primarily interested in genes that show a consistent behavior, and this is the reason why by default, genes that are affected by such outliers are set aside by DESeq2, or if there are sufficient samples, outlier counts are replaced for model fitting. These two behaviors are described below.

The *DESeq* function calculates, for every gene and for every sample, a diagnostic test for outliers called *Cook's distance*. Cook's distance is a measure of how much a single sample is influencing the fitted coefficients for a gene, and a large value of Cook's distance is intended to indicate an outlier count. The Cook's distances are stored as a matrix available in `assays(dds)[["cooks"]]`.

The *results* function automatically flags genes which contain a Cook's distance above a cutoff for samples which have 3 or more replicates. The *p* values and adjusted *p* values for these genes are set to `NA`. At least 3 replicates are required for flagging, as it is difficult to judge which sample might be an outlier with only 2 replicates. This filtering can be turned off with `results(dds, cooksCutoff=FALSE)`.

With many degrees of freedom -- i.e., many more samples than number of parameters to be estimated -- it is undesirable to remove entire genes from the analysis just because their data include a single count outlier. When there are 7 or more replicates for a given sample, the *DESeq* function will automatically replace counts with large Cook's distance with the trimmed mean over all samples, scaled up by the size factor or normalization factor for that sample. This approach is conservative: it will not lead to false positives, as it replaces the outlier value with the value predicted by the null hypothesis. This outlier replacement only occurs when there are 7 or more replicates, and can be turned off with `DESeq(dds, minReplicatesForReplace=Inf)`.

The default Cook's distance cutoff for the two behaviors described above depends on the sample size and number of parameters to be estimated. The default is to use the 99% quantile of the F(p,m-p) distribution (with *p* the number of parameters including the intercept and *m* the number of samples). The default for gene flagging can be modified using the `cooksCutoff` argument to the *results* function. For outlier replacement, *DESeq* preserves the original counts in `counts(dds)`, saving the replacement counts as a matrix named `replaceCounts` in `assays(dds)`.

Note that with continuous variables in the design, outlier detection and replacement is not automatically performed, as our current methods involve a robust estimation of within-group variance which does not extend easily to continuous covariates. However, users can examine the Cook's distances in `assays(dds)[["cooks"]]`, in order to perform manual visualization and filtering if necessary.

**Note on many outliers:** if there are very many outliers (e.g.
many hundreds or thousands) reported by `summary(res)`, one might consider further exploration to see if a single sample or a few samples should be removed due to low quality. The automatic outlier filtering/replacement is most useful in situations in which the number of outliers is limited. When there are thousands of reported outliers, it might make more sense to turn off the outlier filtering/replacement (*DESeq* with `minReplicatesForReplace=Inf` and *results* with `cooksCutoff=FALSE`) and perform manual inspection: first, it would be advantageous to make a PCA plot as described above to spot individual sample outliers; second, one can make a boxplot of the Cook's distances to see if one sample is consistently higher than others (here this is not the case):

```{r boxplotCooks}
par(mar=c(8,5,2,2))
boxplot(log10(assays(dds)[["cooks"]]), range=0, las=2)
```

## Dispersion plot and fitting alternatives

Plotting the dispersion estimates is a useful diagnostic. The dispersion plot below is typical, with the final estimates shrunk from the gene-wise estimates towards the fitted estimates. Some gene-wise estimates are flagged as outliers and not shrunk towards the fitted value (this outlier detection is described in the manual page for *estimateDispersionsMAP*). The amount of shrinkage can be more or less than seen here, depending on the sample size, the number of coefficients, the row mean and the variability of the gene-wise estimates.

```{r dispFit}
plotDispEsts(dds)
```

### Local or mean dispersion fit

A local smoothed dispersion fit is automatically substituted in the case that the parametric curve doesn't fit the observed dispersion-mean relationship. This can be prespecified by providing the argument `fitType="local"` to either *DESeq* or *estimateDispersions*. Additionally, using the mean of gene-wise dispersion estimates as the fitted value can be specified by providing the argument `fitType="mean"`.

### Supply a custom dispersion fit

Any fitted values can be provided during dispersion estimation, using the lower-level functions described in the manual page for *estimateDispersionsGeneEst*. In the code chunk below, we store the gene-wise estimates which were already calculated and saved in the metadata column `dispGeneEst`. Then we calculate the median value of the dispersion estimates above a threshold, and save these values as the fitted dispersions, using the replacement function for *dispersionFunction*. In the last line, the function *estimateDispersionsMAP* uses the fitted dispersions to generate maximum *a posteriori* (MAP) estimates of dispersion.

```{r dispFitCustom}
ddsCustom <- dds
useForMedian <- mcols(ddsCustom)$dispGeneEst > 1e-7
medianDisp <- median(mcols(ddsCustom)$dispGeneEst[useForMedian],
                     na.rm=TRUE)
dispersionFunction(ddsCustom) <- function(mu) medianDisp
ddsCustom <- estimateDispersionsMAP(ddsCustom)
```

## Independent filtering of results

The *results* function of the DESeq2 package performs independent filtering by default using the mean of normalized counts as a filter statistic. A threshold on the filter statistic is found which optimizes the number of adjusted *p* values lower than a significance level `alpha` (we use the standard variable name for significance level, though it is unrelated to the dispersion parameter $\alpha$). The theory behind independent filtering is discussed in greater detail [below](#indfilttheory). The adjusted *p* values for the genes which do not pass the filter threshold are set to `NA`.
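As a quick check, one can tabulate how many genes received an `NA` adjusted *p* value (the exact counts will vary by dataset):

```{r countNApadj}
# genes with NA padj were removed by independent filtering or outlier flagging
table(is.na(res$padj))
```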
The default independent filtering is performed using the *filtered_p* function of the [genefilter](http://bioconductor.org/packages/genefilter) package, and all of the arguments of *filtered_p* can be passed to the *results* function. The filter threshold value and the number of rejections at each quantile of the filter statistic are available as metadata of the object returned by *results*.

For example, we can visualize the optimization by plotting the `filterNumRej` attribute of the results object. The *results* function maximizes the number of rejections (adjusted *p* value less than a significance level), over the quantiles of a filter statistic (the mean of normalized counts). The threshold chosen (vertical line) is the lowest quantile of the filter for which the number of rejections is within 1 residual standard deviation of the peak of a curve fit to the number of rejections over the filter quantiles:

```{r filtByMean}
metadata(res)$alpha
metadata(res)$filterThreshold
plot(metadata(res)$filterNumRej, 
     type="b", ylab="number of rejections",
     xlab="quantiles of filter")
lines(metadata(res)$lo.fit, col="red")
abline(v=metadata(res)$filterTheta)
```

Independent filtering can be turned off by setting `independentFiltering` to `FALSE`.

```{r noFilt}
resNoFilt <- results(dds, independentFiltering=FALSE)
addmargins(table(filtering=(res$padj < .1),
                 noFiltering=(resNoFilt$padj < .1)))
```

## Tests of log2 fold change above or below a threshold

It is also possible to provide thresholds for constructing Wald tests of significance. Two arguments to the *results* function allow for threshold-based Wald tests: `lfcThreshold`, which takes a non-negative numeric threshold value, and `altHypothesis`, which specifies the kind of test. Note that the *alternative hypothesis* is specified by the user, i.e. those genes which the user is interested in finding, and the test provides *p* values for the null hypothesis, the complement of the set defined by the alternative. The `altHypothesis` argument can take one of the following four values, where $\beta$ is the log2 fold change specified by the `name` argument, and $x$ is the `lfcThreshold`.

* `greaterAbs` - $|\beta| > x$ - tests are two-tailed
* `lessAbs` - $|\beta| < x$ - *p* values are the maximum of the upper and lower tests
* `greater` - $\beta > x$
* `less` - $\beta < -x$

The four possible values of `altHypothesis` are demonstrated in the following code and visually by MA-plots in the following figures.

```{r lfcThresh}
par(mfrow=c(2,2),mar=c(2,2,1,1))
ylim <- c(-2.5,2.5)
resGA <- results(dds, lfcThreshold=.5, altHypothesis="greaterAbs")
resLA <- results(dds, lfcThreshold=.5, altHypothesis="lessAbs")
resG <- results(dds, lfcThreshold=.5, altHypothesis="greater")
resL <- results(dds, lfcThreshold=.5, altHypothesis="less")
drawLines <- function() abline(h=c(-.5,.5),col="dodgerblue",lwd=2)
plotMA(resGA, ylim=ylim); drawLines()
plotMA(resLA, ylim=ylim); drawLines()
plotMA(resG, ylim=ylim); drawLines()
plotMA(resL, ylim=ylim); drawLines()
```

## Access to all calculated values

All row-wise calculated values (intermediate dispersion calculations, coefficients, standard errors, etc.) are stored in the *DESeqDataSet* object, e.g. `dds` in this vignette. These values are accessible by calling *mcols* on `dds`. Descriptions of the columns are accessible by two calls to *mcols*. Note that the call to `substr` below is only for display purposes.
```{r mcols}
mcols(dds,use.names=TRUE)[1:4,1:4]
substr(names(mcols(dds)),1,10) 
mcols(mcols(dds), use.names=TRUE)[1:4,]
```

The mean values $\mu_{ij} = s_j q_{ij}$ and the Cook's distances for each gene and sample are stored as matrices in the assays slot:

```{r muAndCooks}
head(assays(dds)[["mu"]])
head(assays(dds)[["cooks"]])
```

The dispersions $\alpha_i$ can be accessed with the *dispersions* function.

```{r dispersions}
head(dispersions(dds))
head(mcols(dds)$dispersion)
```

The size factors $s_j$ are accessible via *sizeFactors*:

```{r sizefactors}
sizeFactors(dds)
```

For advanced users, we also include a convenience function *coef* for extracting the matrix $[\beta_{ir}]$ for all genes *i* and model coefficients $r$. This function can also return a matrix of standard errors, see `?coef`. The columns of this matrix correspond to the effects returned by *resultsNames*. Note that the *results* function is best for building results tables with *p* values and adjusted *p* values.

```{r coef}
head(coef(dds))
```

The beta prior variance $\sigma_r^2$ is stored as an attribute of the *DESeqDataSet*:

```{r betaPriorVar}
attr(dds, "betaPriorVar")
```

General information about the prior used for log fold change shrinkage is also stored in a slot of the *DESeqResults* object. This would also contain information about what other packages were used for log2 fold change shrinkage.

```{r priorInfo}
priorInfo(resLFC)
priorInfo(resApe)
priorInfo(resAsh)
```

The dispersion prior variance $\sigma_d^2$ is stored as an attribute of the dispersion function:

```{r dispPriorVar}
dispersionFunction(dds)
attr(dispersionFunction(dds), "dispPriorVar")
```

The version of DESeq2 which was used to construct the *DESeqDataSet* object, or the version used when *DESeq* was run, is stored here:

```{r versionNum}
metadata(dds)[["version"]]
```

## Sample-/gene-dependent normalization factors

In some experiments, there might be gene-dependent biases which vary across samples. For instance, GC-content bias or length bias might vary across samples coming from different labs or processed at different times. We use the terms *normalization factors* for a gene x sample matrix, and *size factors* for a single number per sample. Incorporating normalization factors, the mean parameter $\mu_{ij}$ becomes:

$$ \mu_{ij} = NF_{ij} q_{ij} $$

with normalization factor matrix *NF* having the same dimensions as the counts matrix *K*. This matrix can be incorporated as shown below. We recommend providing a matrix with row-wise geometric means of 1, so that the mean of normalized counts for a gene is close to the mean of the unnormalized counts. This can be accomplished by dividing out the current row geometric means.

```{r normFactors, eval=FALSE}
# 'normFactors' is assumed to be a pre-existing gene x sample matrix,
# e.g. produced by a bias-correction package
normFactors <- normFactors / exp(rowMeans(log(normFactors)))
normalizationFactors(dds) <- normFactors
```

These steps then replace *estimateSizeFactors* which occurs within the *DESeq* function. The *DESeq* function will look for pre-existing normalization factors and use these in the place of size factors (and a message will be printed confirming this).

The methods provided by the [cqn](http://bioconductor.org/packages/cqn) or [EDASeq](http://bioconductor.org/packages/EDASeq) packages can help correct for GC or length biases. They both describe in their vignettes how to create matrices which can be used by DESeq2.
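Putting the pieces together, here is a self-contained sketch, in which a uniform random matrix stands in for a real bias estimate (purely illustrative):

```{r normFactorsSketch, eval=FALSE}
# made-up gene x sample bias matrix, standing in for the output of
# a package such as cqn or EDASeq
normFactors <- matrix(runif(nrow(dds)*ncol(dds), 0.5, 1.5),
                      nrow=nrow(dds), ncol=ncol(dds))
# center the row-wise geometric means at 1, then assign
normFactors <- normFactors / exp(rowMeans(log(normFactors)))
normalizationFactors(dds) <- normFactors
dds <- DESeq(dds)  # size factor estimation is then skipped
```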
From the formula above, we see that normalization factors should be on the scale of the counts, like size factors, and unlike offsets which are typically on the scale of the predictors (i.e. the logarithmic scale for the negative binomial GLM). At the time of writing, the transformation from the matrices provided by these packages should be:

```{r offsetTransform, eval=FALSE}
cqnOffset <- cqnObject$glm.offset
cqnNormFactors <- exp(cqnOffset)
EDASeqNormFactors <- exp(-1 * EDASeqOffset)
```

## "Model matrix not full rank"

While most experimental designs run easily using design formulas, some design formulas can cause problems and result in the *DESeq* function returning an error with the text: "the model matrix is not full rank, so the model cannot be fit as specified." There are two main reasons for this problem: either one or more columns in the model matrix are linear combinations of other columns, or there are levels of factors or combinations of levels of multiple factors which are missing samples. We address these two problems below and discuss possible solutions:

### Linear combinations

The simplest case is the linear combination, or linear dependency problem, when two variables contain exactly the same information, such as in the following sample table. The software cannot fit an effect for `batch` and `condition`, because they produce identical columns in the model matrix. This is also referred to as *perfect confounding*. A unique solution of coefficients (the $\beta_i$ in the formula [below](#theory)) is not possible.

```{r lineardep, echo=FALSE}
DataFrame(batch=factor(c(1,1,2,2)),
          condition=factor(c("A","A","B","B")))
```

Another situation which will cause problems is when the variables are not identical, but one variable can be formed by the combination of other factor levels. In the following example, the effect of batch 2 vs 1 cannot be fit because it is identical to a column in the model matrix which represents the condition C vs A effect.

```{r lineardep2, echo=FALSE}
DataFrame(batch=factor(c(1,1,1,1,2,2)),
          condition=factor(c("A","A","B","B","C","C")))
```

In both of these cases above, the batch effect cannot be fit and must be removed from the model formula. There is just no way to tell apart the condition effects and the batch effects. The options are either to assume there is no batch effect (which we know is highly unlikely given the literature on batch effects in sequencing datasets) or to repeat the experiment and properly balance the conditions across batches. A balanced design would look like:

```{r lineardep3, echo=FALSE}
DataFrame(batch=factor(c(1,1,1,2,2,2)),
          condition=factor(c("A","B","C","A","B","C")))
```

### Group-specific condition effects, individuals nested within groups

Finally, there is a case where we *can* in fact perform inference, but we may need to re-arrange terms to do so. Consider an experiment with grouped individuals, where we seek to test the group-specific effect of a condition or treatment, while controlling for individual effects. The individuals are nested within the groups: an individual can only be in one of the groups, although each individual has one or more observations across condition. An example of such an experiment is below:

```{r groupeffect}
coldata <- DataFrame(grp=factor(rep(c("X","Y"),each=6)),
                     ind=factor(rep(1:6,each=2)),
                     cnd=factor(rep(c("A","B"),6)))
coldata
```

Note that individual (`ind`) is a *factor*, not a numeric. This is very important.
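If such a column had been imported as numeric, it could be converted first, e.g.:

```{r indToFactor, eval=FALSE}
# convert a numeric individual identifier into a factor
coldata$ind <- factor(coldata$ind)
```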
To make R display all the rows, we can do:

```{r}
as.data.frame(coldata)
```

We have two groups of samples X and Y, each with three distinct individuals (labeled here 1-6). For each individual, we have conditions A and B (for example, this could be control and treated).

This design can be analyzed by DESeq2 but requires a bit of refactoring in order to fit the model terms. Here we will use a trick described in the [edgeR](http://bioconductor.org/packages/edgeR) user guide, from the section *Comparisons Both Between and Within Subjects*. If we try to analyze with a formula such as `~ ind + grp*cnd`, we will obtain an error, because the effect for group is a linear combination of the individuals.

However, the following steps allow for an analysis of group-specific condition effects, while controlling for differences in individual. For object construction, you can use a simple design, such as `~ ind + cnd`, as long as you remember to replace it before running *DESeq*. Then add a column `ind.n` which distinguishes the individuals nested within a group. Here, we add this column to `coldata`, but in practice you would add this column to `dds`.

```{r groupeffect2}
coldata$ind.n <- factor(rep(rep(1:3,each=2),2))
as.data.frame(coldata)
```

Now we can reassign our *DESeqDataSet* a design of `~ grp + grp:ind.n + grp:cnd`, before we call *DESeq*. This new design will result in the following model matrix:

```{r groupeffect3}
model.matrix(~ grp + grp:ind.n + grp:cnd, coldata)
```

Note that, if you have unbalanced numbers of individuals in the two groups, you will have zeros for some of the interactions between `grp` and `ind.n`. You can remove these columns manually from the model matrix and pass the corrected model matrix to the `full` argument of the *DESeq* function. See example code in the next section.

Above, the terms `grpX.cndB` and `grpY.cndB` give the group-specific condition effects, in other words, the condition B vs A effect for group X samples, and likewise for group Y samples. These terms control for all of the six individual effects. These group-specific condition effects can be extracted using *results* with the `name` argument.

Furthermore, `grpX.cndB` and `grpY.cndB` can be contrasted using the `contrast` argument, in order to test if the condition effect is different across groups:

```{r groupeffect4, eval=FALSE}
results(dds, contrast=list("grpY.cndB","grpX.cndB"))
```

### Levels without samples

The base R function for creating model matrices will produce a column of zeros if a level is missing from a factor or a combination of levels is missing from an interaction of factors. The solution to the first case is to call *droplevels* on the column, which will remove levels without samples. This was shown at the beginning of this vignette.

The second case is also solvable, by manually editing the model matrix, and then providing this to *DESeq*. Here we construct an example dataset to illustrate:

```{r missingcombo}
group <- factor(rep(1:3,each=6))
condition <- factor(rep(rep(c("A","B","C"),each=2),3))
d <- DataFrame(group, condition)[-c(17,18),]
as.data.frame(d)
```

Note that if we try to estimate all interaction terms, we introduce a column with all zeros, as there are no condition C samples for group 3. (Here, *unname* is used to display the matrix concisely.)
```{r missingcombo2}
m1 <- model.matrix(~ condition*group, d)
colnames(m1)
unname(m1)
all.zero <- apply(m1, 2, function(x) all(x==0))
all.zero
```

We can remove this column like so:

```{r missingcombo3}
idx <- which(all.zero)
m1 <- m1[,-idx]
unname(m1)
```

Now this matrix `m1` can be provided to the `full` argument of *DESeq*. For a likelihood ratio test of interactions, a model matrix using a reduced design such as `~ condition + group` can be given to the `reduced` argument. Wald tests can also be generated instead of the likelihood ratio test, but for user-supplied model matrices, the argument `betaPrior` must be set to `FALSE`.

# Theory behind DESeq2

## The DESeq2 model

The DESeq2 model and all the steps taken in the software are described in detail in our publication [@Love2014], and we include the formula and descriptions in this section as well. The differential expression analysis in DESeq2 uses a generalized linear model of the form:

$$ K_{ij} \sim \textrm{NB}(\mu_{ij}, \alpha_i) $$

$$ \mu_{ij} = s_j q_{ij} $$

$$ \log_2(q_{ij}) = x_{j.} \beta_i $$

where counts $K_{ij}$ for gene *i*, sample *j* are modeled using a negative binomial distribution with fitted mean $\mu_{ij}$ and a gene-specific dispersion parameter $\alpha_i$. The fitted mean is composed of a sample-specific size factor $s_j$ and a parameter $q_{ij}$ proportional to the expected true concentration of fragments for sample *j*. The coefficients $\beta_i$ give the log2 fold changes for gene *i* for each column of the model matrix $X$. Note that the model can be generalized to use sample- and gene-dependent normalization factors $s_{ij}$.

The dispersion parameter $\alpha_i$ defines the relationship between the variance of the observed count and its mean value. In other words, it describes how far we expect the observed count to be from the mean value, which depends both on the size factor $s_j$ and the covariate-dependent part $q_{ij}$ as defined above.

$$ \textrm{Var}(K_{ij}) = E[ (K_{ij} - \mu_{ij})^2 ] = \mu_{ij} + \alpha_i \mu_{ij}^2 $$

An option in DESeq2 is to provide maximum *a posteriori* estimates of the log2 fold changes in $\beta_i$ after incorporating a zero-centered Normal prior (`betaPrior`). While previously, these moderated, or shrunken, estimates were generated by the *DESeq* or *nbinomWaldTest* functions, they are now produced by the *lfcShrink* function. Dispersions are estimated using expected mean values from the maximum likelihood estimate of log2 fold changes, and optimizing the Cox-Reid adjusted profile likelihood, as first implemented for RNA-seq data in [edgeR](http://bioconductor.org/packages/edgeR) [@CR; @edgeR_GLM]. The steps performed by the *DESeq* function are documented in its manual page `?DESeq`; briefly, they are:

1) estimation of size factors $s_j$ by *estimateSizeFactors*
2) estimation of dispersion $\alpha_i$ by *estimateDispersions*
3) negative binomial GLM fitting for $\beta_i$ and Wald statistics by *nbinomWaldTest*

For access to all the values calculated during these steps, see the section [above](#access).

## Changes compared to DESeq

The main changes in the package *DESeq2*, compared to the (older) version *DESeq*, are as follows:

* *RangedSummarizedExperiment* is used as the superclass for storage of input data, intermediate calculations and results.
* Optional, maximum *a posteriori* estimation of GLM coefficients incorporating a zero-centered Normal prior with variance estimated from data (equivalent to Tikhonov/ridge regularization).
This adjustment has little effect on genes with high counts, yet it helps to moderate the otherwise large variance in log2 fold change estimates for genes with low counts or highly variable counts. These estimates are now provided by the *lfcShrink* function.
* Maximum *a posteriori* estimation of dispersion replaces the `sharingMode` options `fit-only` or `maximum` of the previous version of the package. This is similar to the dispersion estimation methods of DSS [@Wu2012New].
* All estimation and inference is based on the generalized linear model, which includes the two condition case (previously the *exact test* was used).
* The Wald test for significance of GLM coefficients is provided as the default inference method, with the likelihood ratio test of the previous version still available.
* It is possible to provide a matrix of sample-/gene-dependent normalization factors.
* Automatic independent filtering on the mean of normalized counts.
* Automatic outlier detection and handling.

## Methods changes since the 2014 DESeq2 paper

* In version 1.18 (November 2017), we add two [alternative shrinkage estimators](#alternative-shrinkage-estimators), which can be used via `lfcShrink`: an estimator using a t prior from the apeglm package, and an estimator with a fitted mixture of normals prior from the ashr package.
* In version 1.16 (November 2016), the log2 fold change shrinkage is no longer default for the *DESeq* and *nbinomWaldTest* functions, by setting the defaults of these to `betaPrior=FALSE`, and by introducing a separate function *lfcShrink*, which performs log2 fold change shrinkage for visualization and ranking of genes. While for the majority of bulk RNA-seq experiments, the LFC shrinkage did not affect statistical testing, DESeq2 has become used as an inference engine by a wider community, and certain sequencing datasets show better performance with the testing separated from the use of the LFC prior. Also, the separation of LFC shrinkage to a separate function `lfcShrink` allows for easier methods development of alternative effect size estimators.
* A small change to the independent filtering routine: instead of taking the quantile of the filter (the mean of normalized counts) which directly *maximizes* the number of rejections, the threshold chosen is the lowest quantile of the filter for which the number of rejections is close to the peak of a curve fit to the number of rejections over the filter quantiles. "Close to" is defined as within 1 residual standard deviation. This change was introduced in version 1.10 (October 2015).
* For the calculation of the beta prior variance, instead of matching the empirical quantile to the quantile of a Normal distribution, DESeq2 now uses the weighted quantile function of the Hmisc package. The weighting is described in the manual page for *nbinomWaldTest*. The weights are the inverse of the expected variance of log counts (as used in the diagonals of the matrix $W$ in the GLM). The effect of the change is that the estimated prior variance is robust against noisy estimates of log fold change from genes with very small counts. This change was introduced in version 1.6 (October 2014).

For a list of all changes since version 1.0.0, see the `NEWS` file included in the package.

## Count outlier detection

DESeq2 relies on the negative binomial distribution to make estimates and perform statistical inference on differences.
While the negative binomial is versatile in having a mean and dispersion parameter, extreme counts in individual samples might not fit well to the negative binomial. For this reason, we perform automatic detection of count outliers. We use Cook's distance, which is a measure of how much the fitted coefficients would change if an individual sample were removed [@Cook1977Detection]. For more on the implementation of Cook's distance see the manual page for the *results* function. Below we plot the maximum value of Cook's distance for each row over the rank of the test statistic to justify its use as a filtering criterion.

```{r cooksPlot}
W <- res$stat
maxCooks <- apply(assays(dds)[["cooks"]],1,max)
idx <- !is.na(W)
plot(rank(W[idx]), maxCooks[idx], xlab="rank of Wald statistic", 
     ylab="maximum Cook's distance per gene",
     ylim=c(0,5), cex=.4, col=rgb(0,0,0,.3))
m <- ncol(dds)
p <- 3
abline(h=qf(.99, p, m - p))
```

## Contrasts

Contrasts can be calculated for a *DESeqDataSet* object for which the GLM coefficients have already been fit using the Wald test steps (*DESeq* with `test="Wald"` or using *nbinomWaldTest*). The vector of coefficients $\beta$ is left multiplied by the contrast vector $c$ to form the numerator of the test statistic. The denominator is formed by multiplying the covariance matrix $\Sigma$ for the coefficients on either side by the contrast vector $c$. The square root of this product is an estimate of the standard error for the contrast. The contrast statistic is then compared to a normal distribution, as are the Wald statistics for the DESeq2 package.

$$ W = \frac{c^t \beta}{\sqrt{c^t \Sigma c}} $$

## Expanded model matrices

For the specific combination of `lfcShrink` with the type `normal` and using `contrast`, DESeq2 uses *expanded model matrices* to produce shrunken log2 fold change estimates where the shrinkage is independent of the choice of reference level. In all other cases, DESeq2 uses standard model matrices, as produced by `model.matrix`. The expanded model matrices differ from the standard model matrices, in that they have an indicator column (and therefore a coefficient) for each level of factors in the design formula in addition to an intercept. This is described in the DESeq2 paper, but the DESeq2 software package has moved away from this approach, with more support for shrinkage of individual coefficients (although the expanded model matrix approach is still supported using the above combination of functions and arguments).

## Independent filtering and multiple testing

### Filtering criteria

The goal of independent filtering is to filter out those tests from the procedure that have little or no chance of showing significant evidence, without even looking at their test statistic. Typically, this results in increased detection power at the same experiment-wide type I error. Here, we measure experiment-wide type I error in terms of the false discovery rate.

A good choice for a filtering criterion is one that 1) is statistically independent from the test statistic under the null hypothesis, 2) is correlated with the test statistic under the alternative, and 3) does not notably change the dependence structure -- if there is any -- between the tests that pass the filter, compared to the dependence structure between the tests before filtering. The benefit from filtering relies on property (2), and we will explore it further below.
Its statistical validity relies on property (1) -- which is simple to formally prove for many combinations of filter criteria with test statistics -- and (3), which is less easy to establish theoretically from first principles, but is rarely a problem in practice. We refer to [@Bourgon:2010:PNAS] for further discussion of this topic.

A simple filtering criterion readily available in the results object is the mean of normalized counts irrespective of biological condition, and so this is the criterion which is used automatically by the *results* function to perform independent filtering. Genes with very low counts are not likely to show significant differences, typically due to high dispersion. For example, we can plot the $-\log_{10}$ *p* values from all genes over the normalized mean counts:

```{r indFilt}
plot(res$baseMean+1, -log10(res$pvalue),
     log="x", xlab="mean of normalized counts",
     ylab=expression(-log[10](pvalue)),
     ylim=c(0,30),
     cex=.4, col=rgb(0,0,0,.3))
```

### Why does it work?

Consider the *p* value histogram below. It shows how the filtering ameliorates the multiple testing problem -- and thus the severity of a multiple testing adjustment -- by removing a background set of hypotheses whose *p* values are distributed more or less uniformly in [0,1].

```{r histindepfilt}
use <- res$baseMean > metadata(res)$filterThreshold
h1 <- hist(res$pvalue[!use], breaks=0:50/50, plot=FALSE)
h2 <- hist(res$pvalue[use], breaks=0:50/50, plot=FALSE)
colori <- c(`do not pass`="khaki", `pass`="powderblue")
```

Histogram of p values for all tests. The area shaded in blue indicates the subset of those that pass the filtering, the area in khaki those that do not pass:

```{r fighistindepfilt}
barplot(height = rbind(h1$counts, h2$counts), beside = FALSE,
        col = colori, space = 0, main = "", ylab="frequency")
text(x = c(0, length(h1$counts)), y = 0, label = paste(c(0,1)),
     adj = c(0.5,1.7), xpd=NA)
legend("topright", fill=rev(colori), legend=rev(names(colori)))
```

# Frequently asked questions

## How can I get support for DESeq2?

We welcome questions about our software, and want to ensure that we eliminate issues if and when they appear. We have a few requests to optimize the process:

* all questions should take place on the Bioconductor support site: <https://support.bioconductor.org>, which serves as a repository of questions and answers. This helps to save the developers' time in responding to similar questions. Make sure to tag your post with `deseq2`. It is often very helpful in addition to describe the aim of your experiment.
* before posting, first search the Bioconductor support site mentioned above for past threads which might have answered your question.
* if you have a question about the behavior of a function, read the sections of the manual page for this function by typing a question mark and the function name, e.g. `?results`. We spend a lot of time documenting individual functions and the exact steps that the software is performing.
* include all of your R code, especially the creation of the *DESeqDataSet* and the design formula. Include complete warning or error messages, and conclude your message with the full output of `sessionInfo()`.
* if possible, include the output of `as.data.frame(colData(dds))`, so that we can have a sense of the experimental setup. If this contains confidential information, you can replace the levels of those factors using *levels()*.

## Why are some *p* values set to NA?

See the details [above](#pvaluesNA).

## How can I get unfiltered DESeq2 results?

Users can obtain unfiltered GLM results, i.e.
without outlier removal or independent filtering, with the following call:

```{r vanillaDESeq, eval=FALSE}
dds <- DESeq(dds, minReplicatesForReplace=Inf)
res <- results(dds, cooksCutoff=FALSE, independentFiltering=FALSE)
```

In this case, the only *p* values set to `NA` are those from genes with all counts equal to zero.

## How do I use VST or rlog data for differential testing?

The variance stabilizing and rlog transformations are provided for applications other than differential testing, for example clustering of samples or other machine learning applications. For differential testing we recommend the *DESeq* function applied to raw counts as outlined [above](#de).

## Can I use DESeq2 to analyze paired samples?

Yes, you should use a multi-factor design which includes the sample information as a term in the design formula. This will account for differences between the samples while estimating the effect due to the condition. The condition of interest should go at the end of the design formula, e.g. `~ subject + condition`.

## If I have multiple groups, should I run all together or split into pairs of groups?

Typically, we recommend users to run samples from all groups together, and then use the `contrast` argument of the *results* function to extract comparisons of interest after fitting the model using *DESeq*.

The model fit by *DESeq* estimates a single dispersion parameter for each gene, which defines how far we expect the observed count for a sample to be from the mean value from the model given its size factor and its condition group. See the section [above](#theory) and the DESeq2 paper for full details. Having a single dispersion parameter for each gene is usually sufficient for analyzing multi-group data, as the final dispersion value will incorporate the within-group variability across all groups.

However, for some datasets, exploratory data analysis (EDA) plots could reveal that one or more groups has much higher within-group variability than the others. A simulated example of such a set of samples is shown below. This is a case where comparing groups A and B separately -- subsetting a *DESeqDataSet* to only samples from those two groups and then running *DESeq* on this subset -- will be more sensitive than a model including all samples together. It should be noted that such an extreme range of within-group variability is not common, although it could arise if certain treatments produce an extreme reaction (e.g. cell death). Again, this can be easily detected from the EDA plots such as PCA described in this vignette.

Here we diagram an extreme range of within-group variability with a simulated dataset. Typically, it is recommended to run *DESeq* across samples from all groups, for datasets with multiple groups. However, this simulated dataset shows a case where it would be preferable to compare groups A and B by creating a smaller dataset without the C samples. Group C has much higher within-group variability, which would inflate the per-gene dispersion estimate for groups A and B as well:

```{r varGroup, echo=FALSE}
set.seed(3)
dds1 <- makeExampleDESeqDataSet(n=1000,m=12,betaSD=.3,dispMeanRel=function(x) 0.01)
dds2 <- makeExampleDESeqDataSet(n=1000,m=12,
                                betaSD=.3,
                                interceptMean=mcols(dds1)$trueIntercept,
                                interceptSD=0,
                                dispMeanRel=function(x) 0.2)
dds2 <- dds2[,7:12]
dds2$condition <- rep("C",6)
mcols(dds2) <- NULL
dds12 <- cbind(dds1, dds2)
rld <- rlog(dds12, blind=FALSE, fitType="mean")
plotPCA(rld)
```

## Can I run DESeq2 to contrast the levels of many groups?
## Can I run DESeq2 to contrast the levels of many groups?

DESeq2 will work with any kind of design specified using an R formula. We encourage users to consider exploratory data analysis such as principal components analysis rather than performing statistical testing of all pairs of many groups of samples. Statistical testing is one of many ways of describing differences between samples.

Regarding multiple test correction, if a user is planning to contrast all pairs of many levels, and then to selectively report the results of only a *subset* of those pairs, one needs to perform multiple testing across *contrasts* as well as genes to control for this additional form of multiple testing. This can be done by using the `p.adjust` function across a long vector of *p* values from all pairs of contrasts, then re-assigning these adjusted *p* values to the appropriate results table, as sketched below.
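A minimal sketch of this adjustment, assuming hypothetical results objects `res.AB`, `res.AC` and `res.BC` from three pairwise contrasts over the same genes:

```{r padjAcrossContrasts, eval=FALSE}
# pool the p values from all pairwise contrasts before adjusting
pvals <- c(res.AB$pvalue, res.AC$pvalue, res.BC$pvalue)
padj <- p.adjust(pvals, method="BH")
# re-assign the adjusted p values to the individual results tables
n <- nrow(res.AB)
res.AB$padj <- padj[1:n]
res.AC$padj <- padj[(n+1):(2*n)]
res.BC$padj <- padj[(2*n+1):(3*n)]
```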
Regarding the speed of fitting very large models, note that each additional level of a factor in the design formula adds another parameter to the GLM which is fit by DESeq2. Users might consider first removing genes with very few reads, e.g. genes with row sum of 1, as this will speed up the fitting procedure.

## Can I use DESeq2 to analyze a dataset without replicates?

If a *DESeqDataSet* is provided with an experimental design without replicates, a warning is printed that the samples will be treated as replicates for estimation of dispersion. This kind of analysis is only useful for exploring the data, but cannot provide proper statistical inference on differences between groups. Without biological replicates, it is not possible to estimate the biological variability of each gene. More details can be found in the manual page for `?DESeq`.

## How can I include a continuous covariate in the design formula?

Continuous covariates can be included in the design formula in exactly the same manner as factorial covariates, and then *results* for the continuous covariate can be extracted by specifying `name`. Continuous covariates might make sense in certain experiments, where a constant fold change might be expected for each unit of the covariate. However, in many cases, more meaningful results can be obtained by cutting continuous covariates into a factor defined over a small number of bins (e.g. 3-5). In this way, the average effect of each group is controlled for, regardless of the trend over the continuous covariate. In R, *numeric* vectors can be converted into *factors* using the function *cut*.

## I ran a likelihood ratio test, but results() only gives me one comparison.

"... How do I get the *p* values for all of the variables/levels that were removed in the reduced design?"

This is explained in the help page for `?results` in the section about likelihood ratio test p-values, but we will restate the answer here. When one performs a likelihood ratio test, the *p* values and the test statistic (the `stat` column) are values for the test that removes all of the variables which are present in the full design and not in the reduced design. This tests the null hypothesis that all the coefficients from these variables and levels of these factors are equal to zero.

The likelihood ratio test *p* values therefore represent a test of *all the variables and all the levels of factors* which are among these variables. However, the results table only has space for one column of log fold change, so a single variable and a single comparison is shown (among the potentially multiple log fold changes which were tested in the likelihood ratio test). This is indicated at the top of the results table with the text, e.g., `log2 fold change (MLE): condition C vs A`, followed by, `LRT p-value: '~ batch + condition' vs '~ batch'`. This indicates that the *p* value is for the likelihood ratio test of *all the variables and all the levels*, while the log fold change is a single comparison from among those variables and levels. See the help page for *results* for more details.

## What are the exact steps performed by DESeq()?

See the manual page for *DESeq*, which links to the subfunctions which are called in order, where complete details are listed. Also you can read the three steps listed in the [DESeq2 model](#theory) in this document.

## Is there an official Galaxy tool for DESeq2?

Yes. The repository for the DESeq2 tool and a link to its location in the Tool Shed are available online.

## I want to benchmark DESeq2 against other DE tools.

One aspect which can cause problems for comparison is that, by default, DESeq2 outputs `NA` values for adjusted *p* values based on independent filtering of genes which have low counts. This is a way for DESeq2 to give extra information about why the adjusted *p* value for a given gene is not small. Additionally, *p* values can be set to `NA` based on extreme count outlier detection. These `NA` values should be considered *negatives* for purposes of estimating sensitivity and specificity. The easiest way to work with the adjusted *p* values in a benchmarking context is probably to convert these `NA` values to 1:

```{r convertNA, eval=FALSE}
res$padj <- ifelse(is.na(res$padj), 1, res$padj)
```

## I have trouble installing DESeq2 on Ubuntu/Linux...

"*I try to install DESeq2 using biocLite(), but I get an error trying to install the R packages XML and/or RCurl:*"

`ERROR: configuration failed for package XML`

`ERROR: configuration failed for package RCurl`

You need to install the following devel versions of packages using your standard package manager, e.g. `sudo apt-get install` or `sudo apt install`:

* libxml2-dev
* libcurl4-openssl-dev

# Acknowledgments

We have benefited in the development of DESeq2 from the help and feedback of many individuals, including but not limited to: The Bioconductor Core Team, Alejandro Reyes, Andrzej Oles, Aleksandra Pekowska, Felix Klein, Nikolaos Ignatiadis (IHW), Anqi Zhu (apeglm), Joseph Ibrahim (apeglm), Vince Carey, Owen Solberg, Ruping Sun, Devon Ryan, Steve Lianoglou, Jessica Larson, Christina Chaivorapol, Pan Du, Richard Bourgon, Willem Talloen, Elin Videvall, Hanneke van Deutekom, Todd Burwell, Jesse Rowley, Igor Dolgalev, Stephen Turner, Ryan C Thompson, Tyr Wiesner-Hanks, Konrad Rudolph, David Robinson, Mingxiang Teng, Mathias Lesche, Sonali Arora, Jordan Ramilowski, Ian Dworkin, Bjorn Gruning, Ryan McMinds, Paul Gordon, Leonardo Collado Torres, Enrico Ferrero, Peter Langfelder.

# Session info

```{r sessionInfo}
sessionInfo()
```

# References

DESeq2/inst/doc/DESeq2.html

Analyzing RNA-seq data with DESeq2

Standard workflow

Note: if you use DESeq2 in published research, please cite:

Love, M.I., Huber, W., Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15:550. 10.1186/s13059-014-0550-8

Other Bioconductor packages with similar aims are edgeR, limma, DSS, EBSeq, and baySeq.

Quick start

Here we show the most basic steps for a differential expression analysis. There are a variety of steps upstream of DESeq2 that result in the generation of counts or estimated counts for each sample, which we will discuss in the sections below. This code chunk assumes that you have a count matrix called cts and a table of sample information called coldata. The design indicates how to model the samples, here, that we want to measure the effect of the condition, controlling for batch differences. The two factor variables batch and condition should be columns of coldata.

dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design= ~ batch + condition)
dds <- DESeq(dds)
res <- results(dds, contrast=c("condition","treat","ctrl"))
resultsNames(dds)
res <- lfcShrink(dds, coef=2)

The following starting functions will be explained below:

  • If you have transcript quantification files, as produced by Salmon, Sailfish, or kallisto, you would use DESeqDataSetFromTximport.
  • If you have htseq-count files, the first line would use DESeqDataSetFromHTSeqCount.
  • If you have a RangedSummarizedExperiment, the first line would use DESeqDataSet.

How to get help for DESeq2

Any and all DESeq2 questions should be posted to the Bioconductor support site, which serves as a searchable knowledge base of questions and answers:

https://support.bioconductor.org

Posting a question and tagging with “DESeq2” will automatically send an alert to the package authors to respond on the support site. See the first question in the list of Frequently Asked Questions (FAQ) for information about how to construct an informative post.

You should not email your question to the package authors, as we will just reply that the question should be posted to the Bioconductor support site.

Input data

Why un-normalized counts?

As input, the DESeq2 package expects count data as obtained, e.g., from RNA-seq or another high-throughput sequencing experiment, in the form of a matrix of integer values. The value in the i-th row and the j-th column of the matrix tells how many reads can be assigned to gene i in sample j. Analogously, for other types of assays, the rows of the matrix might correspond e.g. to binding regions (with ChIP-Seq) or peptide sequences (with quantitative mass spectrometry). We will list methods for obtaining count matrices in sections below.

The values in the matrix should be un-normalized counts or estimated counts of sequencing reads (for single-end RNA-seq) or fragments (for paired-end RNA-seq). The RNA-seq workflow describes multiple techniques for preparing such count matrices. It is important to provide count matrices as input, as DESeq2’s statistical model (Love, Huber, and Anders 2014) holds only for un-normalized counts, and only the count values allow assessing the measurement precision correctly. The DESeq2 model internally corrects for library size, so transformed or normalized values such as counts scaled by library size should not be used as input.

The DESeqDataSet

The object class used by the DESeq2 package to store the read counts and the intermediate estimated quantities during statistical analysis is the DESeqDataSet, which will usually be represented in the code here as an object dds.

A technical detail is that the DESeqDataSet class extends the RangedSummarizedExperiment class of the SummarizedExperiment package. The “Ranged” part refers to the fact that the rows of the assay data (here, the counts) can be associated with genomic ranges (the exons of genes). This association facilitates downstream exploration of results, making use of other Bioconductor packages’ range-based functionality (e.g. find the closest ChIP-seq peaks to the differentially expressed genes).
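As a hypothetical sketch of such range-based exploration, assuming the rows of dds carry genomic ranges and peaks is an assumed GRanges object of ChIP-seq peaks (both names are placeholders):

library("GenomicRanges")
# index of the nearest ChIP-seq peak for each gene; NA if none is found
idx <- nearest(rowRanges(dds), peaks, ignore.strand=TRUE)
closestPeaks <- peaks[na.omit(idx)]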

A DESeqDataSet object must have an associated design formula. The design formula expresses the variables which will be used in modeling. The formula should be a tilde (~) followed by the variables with plus signs between them (it will be coerced into a formula if it is not already). The design can be changed later, however then all differential analysis steps should be repeated, as the design formula is used to estimate the dispersions and to estimate the log2 fold changes of the model.

Note: In order to benefit from the default settings of the package, you should put the variable of interest at the end of the formula and make sure the control level is the first level.

We will now show 4 ways of constructing a DESeqDataSet, depending on what pipeline was used upstream of DESeq2 to generate counts or estimated counts:

  1. From transcript abundance files and tximport
  2. From a count matrix
  3. From htseq-count files
  4. From a SummarizedExperiment object

Transcript abundance files and tximport input

A newer and recommended pipeline is to use fast transcript abundance quantifiers upstream of DESeq2, and then to create gene-level count matrices for use with DESeq2 by importing the quantification data using the tximport package. This workflow allows users to import transcript abundance estimates from a variety of external software, including the following methods:

Some advantages of using the above methods for transcript abundance estimation are: (i) this approach corrects for potential changes in gene length across samples (e.g. from differential isoform usage) (Trapnell et al. 2013), (ii) some of these methods (Salmon, Sailfish, kallisto) are substantially faster and require less memory and disk usage compared to alignment-based methods that require creation and storage of BAM files, and (iii) it is possible to avoid discarding those fragments that can align to multiple genes with homologous sequence, thus increasing sensitivity (Robert and Watson 2015).

Full details on the motivation and methods for importing transcript level abundance and count estimates, summarizing to gene-level count matrices and producing an offset which corrects for potential changes in average transcript length across samples are described in (Soneson, Love, and Robinson 2015). Note that the tximport-to-DESeq2 approach uses estimated gene counts from the transcript abundance quantifiers, but not normalized counts.

A tutorial on how to use the Salmon software for quantifying transcript abundance can be found here. We recommend using the --gcBias flag which estimates a correction factor for systematic biases commonly present in RNA-seq data (Love, Hogenesch, and Irizarry 2016; Patro et al. 2017), unless you are certain that your data do not contain such bias.

Here, we demonstrate how to import transcript abundances and construct a gene-level DESeqDataSet object from Salmon quant.sf files, which are stored in the tximportData package. You do not need the tximportData package for your analysis; it is only used here for demonstration.

Note that, instead of locating dir using system.file, a user would typically just provide a path, e.g. /path/to/quant/files. For a typical use, the condition information should already be present as a column of the sample table samples, while here we construct artificial condition labels for demonstration.

library("tximport")
library("readr")
library("tximportData")
dir <- system.file("extdata", package="tximportData")
samples <- read.table(file.path(dir,"samples.txt"), header=TRUE)
samples$condition <- factor(rep(c("A","B"),each=3))
rownames(samples) <- samples$run
samples[,c("pop","center","run","condition")]
##           pop center       run condition
## ERR188297 TSI  UNIGE ERR188297         A
## ERR188088 TSI  UNIGE ERR188088         A
## ERR188329 TSI  UNIGE ERR188329         A
## ERR188288 TSI  UNIGE ERR188288         B
## ERR188021 TSI  UNIGE ERR188021         B
## ERR188356 TSI  UNIGE ERR188356         B

Next we specify the path to the files using the appropriate columns of samples, and we read in a table that links transcripts to genes for this dataset.

files <- file.path(dir,"salmon", samples$run, "quant.sf")
names(files) <- samples$run
tx2gene <- read.csv(file.path(dir, "tx2gene.csv"))

We import the necessary quantification data for DESeq2 using the tximport function. For further details on use of tximport, including the construction of the tx2gene table for linking transcripts to genes in your dataset, please refer to the tximport package vignette.

txi <- tximport(files, type="salmon", tx2gene=tx2gene)

Finally, we can construct a DESeqDataSet from the txi object and sample information in samples.

library("DESeq2")
ddsTxi <- DESeqDataSetFromTximport(txi,
                                   colData = samples,
                                   design = ~ condition)

The ddsTxi object here can then be used as dds in the following analysis steps.

Count matrix input

Alternatively, the function DESeqDataSetFromMatrix can be used if you already have a matrix of read counts prepared from another source. Another method for quickly producing count matrices from alignment files is the featureCounts function (Liao, Smyth, and Shi 2013) in the Rsubread package. To use DESeqDataSetFromMatrix, the user should provide the counts matrix, the information about the samples (the columns of the count matrix) as a DataFrame or data.frame, and the design formula.

To demonstrate the use of DESeqDataSetFromMatrix, we will read in count data from the pasilla package. We read in a count matrix, which we will name cts, and the sample information table, which we will name coldata. Further below we describe how to extract these objects from, e.g. featureCounts output.

library("pasilla")
pasCts <- system.file("extdata",
                      "pasilla_gene_counts.tsv",
                      package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata",
                       "pasilla_sample_annotation.csv",
                       package="pasilla", mustWork=TRUE)
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
coldata <- read.csv(pasAnno, row.names=1)
coldata <- coldata[,c("condition","type")]

We examine the count matrix and column data to see if they are consistent in terms of sample order.

head(cts,2)
##             untreated1 untreated2 untreated3 untreated4 treated1 treated2
## FBgn0000003          0          0          0          0        0        0
## FBgn0000008         92        161         76         70      140       88
##             treated3
## FBgn0000003        1
## FBgn0000008       70
coldata
##              condition        type
## treated1fb     treated single-read
## treated2fb     treated  paired-end
## treated3fb     treated  paired-end
## untreated1fb untreated single-read
## untreated2fb untreated single-read
## untreated3fb untreated  paired-end
## untreated4fb untreated  paired-end

Note that these are not in the same order with respect to samples!

It is absolutely critical that the columns of the count matrix and the rows of the column data (information about samples) are in the same order. DESeq2 will not make guesses as to which column of the count matrix belongs to which row of the column data; these must be provided to DESeq2 already in consistent order.

As they are not in the correct order as given, we need to re-arrange one or the other so that they are consistent in terms of sample order (if we do not, later functions would produce an error). We additionally need to remove the "fb" suffix from the row names of coldata, so the naming is consistent.

rownames(coldata) <- sub("fb", "", rownames(coldata))
all(rownames(coldata) %in% colnames(cts))
## [1] TRUE
all(rownames(coldata) == colnames(cts))
## [1] FALSE
cts <- cts[, rownames(coldata)]
all(rownames(coldata) == colnames(cts))
## [1] TRUE

If you have used the featureCounts function (Liao, Smyth, and Shi 2013) in the Rsubread package, the matrix of read counts can be directly provided from the "counts" element in the list output. The count matrix and column data can typically be read into R from flat files using base R functions such as read.csv or read.delim. For htseq-count files, see the dedicated input function below.
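For instance, a hedged sketch of this usage (the BAM file names and GTF annotation file here are placeholders):

library("Rsubread")
fc <- featureCounts(files=c("sampleA.bam","sampleB.bam"),
                    annot.ext="genes.gtf", isGTFAnnotationFile=TRUE)
cts <- fc$counts  # a count matrix suitable for DESeqDataSetFromMatrix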

With the count matrix, cts, and the sample information, coldata, we can construct a DESeqDataSet:

library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)
dds
## class: DESeqDataSet 
## dim: 14599 7 
## metadata(1): version
## assays(1): counts
## rownames(14599): FBgn0000003 FBgn0000008 ... FBgn0261574
##   FBgn0261575
## rowData names(0):
## colnames(7): treated1 treated2 ... untreated3 untreated4
## colData names(2): condition type

If you have additional feature data, it can be added to the DESeqDataSet by adding to the metadata columns of a newly constructed object. (Here we add redundant data just for demonstration, as the gene names are already the rownames of the dds.)

featureData <- data.frame(gene=rownames(cts))
mcols(dds) <- DataFrame(mcols(dds), featureData)
mcols(dds)
## DataFrame with 14599 rows and 1 column
##              gene
##          <factor>
## 1     FBgn0000003
## 2     FBgn0000008
## 3     FBgn0000014
## 4     FBgn0000015
## 5     FBgn0000017
## ...           ...
## 14595 FBgn0261571
## 14596 FBgn0261572
## 14597 FBgn0261573
## 14598 FBgn0261574
## 14599 FBgn0261575

htseq-count input

You can use the function DESeqDataSetFromHTSeqCount if you have used htseq-count from the HTSeq python package (Anders, Pyl, and Huber 2014). For an example of using the python scripts, see the pasilla data package. First you will want to specify a variable which points to the directory in which the htseq-count output files are located.

directory <- "/path/to/your/files/"

However, for demonstration purposes only, the following line of code points to the directory of the demo htseq-count output files packaged with the pasilla package.

directory <- system.file("extdata", package="pasilla",
                         mustWork=TRUE)

We specify which files to read in using list.files, and select those files which contain the string "treated" using grep. The sub function is used to extract the condition status from the sample filenames, or you might alternatively read in a phenotypic table using read.table.

sampleFiles <- grep("treated",list.files(directory),value=TRUE)
sampleCondition <- sub("(.*treated).*","\\1",sampleFiles)
sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName = sampleFiles,
                          condition = sampleCondition)

Then we build the DESeqDataSet using the following function:

library("DESeq2")
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                       directory = directory,
                                       design= ~ condition)
ddsHTSeq
## class: DESeqDataSet 
## dim: 70463 7 
## metadata(1): version
## assays(1): counts
## rownames(70463): FBgn0000003:001 FBgn0000008:001 ...
##   FBgn0261575:001 FBgn0261575:002
## rowData names(0):
## colnames(7): treated1fb.txt treated2fb.txt ... untreated3fb.txt
##   untreated4fb.txt
## colData names(1): condition

SummarizedExperiment input

An example of the steps to produce a RangedSummarizedExperiment can be found in the RNA-seq workflow and in the vignette for the data package airway. Here we load the RangedSummarizedExperiment from that package in order to build a DESeqDataSet.

library("airway")
data("airway")
se <- airway

The constructor function below shows the generation of a DESeqDataSet from a RangedSummarizedExperiment se.

library("DESeq2")
ddsSE <- DESeqDataSet(se, design = ~ cell + dex)
ddsSE
## class: DESeqDataSet 
## dim: 64102 8 
## metadata(2): '' version
## assays(1): counts
## rownames(64102): ENSG00000000003 ENSG00000000005 ... LRG_98 LRG_99
## rowData names(0):
## colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
## colData names(9): SampleName cell ... Sample BioSample

Pre-filtering

While it is not necessary to pre-filter low count genes before running the DESeq2 functions, there are two reasons which make pre-filtering useful: by removing rows in which there are very few reads, we reduce the memory size of the dds data object, and we increase the speed of the transformation and testing functions within DESeq2. Here we perform a minimal pre-filtering to keep only rows that have at least 10 reads total. Note that more strict filtering to increase power is automatically applied via independent filtering on the mean of normalized counts within the results function.

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

Note on factor levels

By default, R will choose a reference level for factors based on alphabetical order. Then, if you never tell the DESeq2 functions which level you want to compare against (e.g. which level represents the control group), the comparisons will be based on the alphabetical order of the levels. There are two solutions: you can either explicitly tell results which comparison to make using the contrast argument (this will be shown later), or you can explicitly set the factor levels. You should only change the factor levels of variables in the design before running the DESeq2 analysis, not during or afterward. Setting the factor levels can be done in two ways, either using factor:

dds$condition <- factor(dds$condition, levels = c("untreated","treated"))

…or using relevel, just specifying the reference level:

dds$condition <- relevel(dds$condition, ref = "untreated")

If you need to subset the columns of a DESeqDataSet, i.e., when removing certain samples from the analysis, it is possible that all the samples for one or more levels of a variable in the design formula would be removed. In this case, the droplevels function can be used to remove those levels which do not have samples in the current DESeqDataSet:

dds$condition <- droplevels(dds$condition)

Collapsing technical replicates

DESeq2 provides a function collapseReplicates which can assist in combining the counts from technical replicates into single columns of the count matrix. The term technical replicate implies multiple sequencing runs of the same library. You should not collapse biological replicates using this function. See the manual page for an example of the use of collapseReplicates.
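For instance, a minimal sketch, assuming hypothetical colData columns sample (the biological sample) and run (the sequencing run of each library):

# sum up the counts of technical replicates belonging to the same sample
ddsColl <- collapseReplicates(dds, groupby = dds$sample, run = dds$run)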

About the pasilla dataset

We continue with the pasilla data constructed from the count matrix method above. This data set is from an experiment on Drosophila melanogaster cell cultures, which investigated the effect of RNAi knock-down of the splicing factor pasilla (Brooks et al. 2011). The detailed transcript of the production of the pasilla data is provided in the vignette of the data package pasilla.

Differential expression analysis

The standard differential expression analysis steps are wrapped into a single function, DESeq. The estimation steps performed by this function are described below, in the manual page for ?DESeq and in the Methods section of the DESeq2 publication (Love, Huber, and Anders 2014).

Results tables are generated using the function results, which extracts a results table with log2 fold changes, p values and adjusted p values. With no additional arguments to results, the log2 fold change and Wald test p value will be for the last variable in the design formula, and if this is a factor, the comparison will be the last level of this variable over the first level. However, the order of the variables of the design does not matter so long as the user specifies the comparison using the name or contrast arguments of results (described later and in ?results).

Details about the comparison are printed to the console, above the results table. The text, condition treated vs untreated, tells you that the estimates are of the logarithmic fold change log2(treated/untreated).

dds <- DESeq(dds)
res <- results(dds)
res
## log2 fold change (MLE): condition treated vs untreated 
## Wald test p-value: condition treated vs untreated 
## DataFrame with 9921 rows and 6 columns
##                baseMean log2FoldChange     lfcSE        stat     pvalue
##               <numeric>      <numeric> <numeric>   <numeric>  <numeric>
## FBgn0000008   95.144292    0.002276428 0.2237292  0.01017493 0.99188172
## FBgn0000014    1.056523   -0.495113878 2.1431096 -0.23102593 0.81729466
## FBgn0000017 4352.553569   -0.239918945 0.1263378 -1.89902705 0.05756092
## FBgn0000018  418.610484   -0.104673913 0.1484903 -0.70492106 0.48085936
## FBgn0000024    6.406200    0.210848562 0.6895923  0.30575830 0.75978868
## ...                 ...            ...       ...         ...        ...
## FBgn0261570 3208.388610     0.29553289 0.1273514  2.32061001  0.0203079
## FBgn0261572    6.197188    -0.95882276 0.7753130 -1.23669125  0.2162017
## FBgn0261573 2240.979511     0.01271946 0.1133028  0.11226079  0.9106166
## FBgn0261574 4857.680373     0.01539243 0.1925619  0.07993497  0.9362890
## FBgn0261575   10.682520     0.16356865 0.9308661  0.17571663  0.8605166
##                  padj
##             <numeric>
## FBgn0000008 0.9972093
## FBgn0000014        NA
## FBgn0000017 0.2880108
## FBgn0000018 0.8268644
## FBgn0000024 0.9435005
## ...               ...
## FBgn0261570 0.1442486
## FBgn0261572 0.6078453
## FBgn0261573 0.9826550
## FBgn0261574 0.9881787
## FBgn0261575 0.9679223

In previous versions of DESeq2, the DESeq function by default would produce moderated, or shrunken, log2 fold changes through the use of the betaPrior argument. In version 1.16 and higher, we have split the moderation of log2 fold changes into a separate function, lfcShrink, for reasons described in the changes section below.

Here we provide the dds object and the number of the coefficient we want to moderate. It is also possible to specify a contrast, instead of coef, which works the same as the contrast argument of the results function. If a results object is provided, the log2FoldChange column will be swapped out, otherwise lfcShrink returns a vector of shrunken log2 fold changes.

resultsNames(dds)
## [1] "Intercept"                      "condition_treated_vs_untreated"
resLFC <- lfcShrink(dds, coef=2)
resLFC
## log2 fold change (MAP): condition treated vs untreated 
## Wald test p-value: condition treated vs untreated 
## DataFrame with 9921 rows and 6 columns
##                baseMean log2FoldChange      lfcSE        stat     pvalue
##               <numeric>      <numeric>  <numeric>   <numeric>  <numeric>
## FBgn0000008   95.144292     0.00155932 0.15353974  0.01017493 0.99188172
## FBgn0000014    1.056523    -0.01200958 0.05031619 -0.23102593 0.81729466
## FBgn0000017 4352.553569    -0.20935007 0.11026137 -1.89902705 0.05756092
## FBgn0000018  418.610484    -0.08715582 0.12358964 -0.70492106 0.48085936
## FBgn0000024    6.406200     0.03956173 0.12886543  0.30575830 0.75978868
## ...                 ...            ...        ...         ...        ...
## FBgn0261570 3208.388610     0.25744403  0.1109237  2.32061001  0.0203079
## FBgn0261572    6.197188    -0.14674317  0.1220335 -1.23669125  0.2162017
## FBgn0261573 2240.979511     0.01138380  0.1014129  0.11226079  0.9106166
## FBgn0261574 4857.680373     0.01150000  0.1438479  0.07993497  0.9362890
## FBgn0261575   10.682520     0.01898573  0.1043720  0.17571663  0.8605166
##                  padj
##             <numeric>
## FBgn0000008 0.9972093
## FBgn0000014        NA
## FBgn0000017 0.2880108
## FBgn0000018 0.8268644
## FBgn0000024 0.9435005
## ...               ...
## FBgn0261570 0.1442486
## FBgn0261572 0.6078453
## FBgn0261573 0.9826550
## FBgn0261574 0.9881787
## FBgn0261575 0.9679223

The above steps should take less than 30 seconds for most analyses. For experiments with many samples (e.g. 100 samples), one can take advantage of parallelized computation. Parallelizing DESeq, results, and lfcShrink can be easily accomplished by loading the BiocParallel package, and then setting the following arguments: parallel=TRUE and BPPARAM=MulticoreParam(4), for example, splitting the job over 4 cores. Note that extracting results for coefficients or contrasts listed in resultsNames(dds) is fast and will not need parallelization. As an alternative to BPPARAM, one can register cores at the beginning of an analysis, and then just specify parallel=TRUE to the functions when called.

library("BiocParallel")
register(MulticoreParam(4))

We can order our results table by the smallest p value:

resOrdered <- res[order(res$pvalue),]

We can summarize some basic tallies using the summary function.

summary(res)
## 
## out of 9921 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up)     : 518, 5.2% 
## LFC < 0 (down)   : 536, 5.4% 
## outliers [1]     : 1, 0.01% 
## low counts [2]   : 1539, 16% 
## (mean count < 6)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results

How many adjusted p-values were less than 0.1?

sum(res$padj < 0.1, na.rm=TRUE)
## [1] 1054

The results function contains a number of arguments to customize the results table which is generated. You can read about these arguments by looking up ?results. Note that the results function automatically performs independent filtering based on the mean of normalized counts for each gene, optimizing the number of genes which will have an adjusted p value below a given FDR cutoff, alpha. Independent filtering is further discussed below. By default the argument alpha is set to \(0.1\). If the adjusted p value cutoff will be a value other than \(0.1\), alpha should be set to that value:

res05 <- results(dds, alpha=0.05)
summary(res05)
## 
## out of 9921 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)     : 407, 4.1% 
## LFC < 0 (down)   : 431, 4.3% 
## outliers [1]     : 1, 0.01% 
## low counts [2]   : 1347, 14% 
## (mean count < 5)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
sum(res05$padj < 0.05, na.rm=TRUE)
## [1] 838

A generalization of the idea of p value filtering is to weight hypotheses to optimize power. A Bioconductor package, IHW, is available that implements the method of Independent Hypothesis Weighting (Ignatiadis et al. 2016). Here we show the use of IHW for p value adjustment of DESeq2 results. For more details, please see the vignette of the IHW package. The IHW result object is stored in the metadata.

Note: If the results of independent hypothesis weighting are used in published research, please cite:

Ignatiadis, N., Klaus, B., Zaugg, J.B., Huber, W. (2016) Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nature Methods, 13:7. 10.1038/nmeth.3885

library("IHW")
resIHW <- results(dds, filterFun=ihw)
summary(resIHW)
## 
## out of 9921 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up)     : 504, 5.1% 
## LFC < 0 (down)   : 540, 5.4% 
## outliers [1]     : 1, 0.01% 
## [1] see 'cooksCutoff' argument of ?results
## [2] see metadata(res)$ihwResult on hypothesis weighting
sum(resIHW$padj < 0.1, na.rm=TRUE)
## [1] 1044
metadata(resIHW)$ihwResult
## ihwResult object with 9921 hypothesis tests 
## Nominal FDR control level: 0.1 
## Split into 6 bins, based on an ordinal covariate

If a multi-factor design is used, or if the variable in the design formula has more than two levels, the contrast argument of results can be used to extract different comparisons from the DESeqDataSet returned by DESeq. The use of the contrast argument is further discussed below.

For advanced users, note that all the values calculated by the DESeq2 package are stored in the DESeqDataSet object, and access to these values is discussed below.

Exploring and exporting results

MA-plot

In DESeq2, the function plotMA shows the log2 fold changes attributable to a given variable over the mean of normalized counts for all the samples in the DESeqDataSet. Points will be colored red if the adjusted p value is less than 0.1. Points which fall out of the window are plotted as open triangles pointing either up or down.

plotMA(res, ylim=c(-2,2))

It is more useful to visualize the MA-plot for the shrunken log2 fold changes, which remove the noise associated with log2 fold changes from low count genes without requiring arbitrary filtering thresholds.

plotMA(resLFC, ylim=c(-2,2))

After calling plotMA, one can use the function identify to interactively detect the row number of individual genes by clicking on the plot. One can then recover the gene identifiers by saving the resulting indices:

idx <- identify(res$baseMean, res$log2FoldChange)
rownames(res)[idx]

Alternative shrinkage estimators

The moderated log fold changes proposed by Love, Huber, and Anders (2014) use a normal prior distribution, centered on zero and with a scale that is fit to the data. The shrunken log fold changes are useful for ranking and visualization, without the need for arbitrary filters on low count genes. The normal prior can sometimes produce shrinkage that is too strong for certain datasets. In DESeq2 version 1.18, we include two additional adaptive shrinkage estimators, available via the type argument of lfcShrink. For more details, see ?lfcShrink.

The options for type are:

  • normal is the original DESeq2 shrinkage estimator, an adaptive normal prior
  • apeglm is the adaptive t prior shrinkage estimator from the apeglm package
  • ashr is the adaptive shrinkage estimator from the ashr package (Stephens 2016). Here DESeq2 uses the ashr option to fit a mixture of normal distributions to form the prior, with method="shrinkage"

Note: if the shrinkage estimator type="ashr" is used in published research, please cite:

Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2. 10.1093/biostatistics/kxw041

resApe <- lfcShrink(dds, coef=2, type="apeglm")
resAsh <- lfcShrink(dds, coef=2, type="ashr")
par(mfrow=c(1,3), mar=c(4,4,2,1))
xlim <- c(1,1e5); ylim <- c(-3,3)
plotMA(resLFC, xlim=xlim, ylim=ylim, main="normal")
plotMA(resApe, xlim=xlim, ylim=ylim, main="apeglm")
plotMA(resAsh, xlim=xlim, ylim=ylim, main="ashr")

Note: due to the nature of the statistical model and optimization approach, apeglm is usually a factor of ~5 slower than normal. For example, with 10,000 genes and 10 samples, normal may take ~3 seconds, while apeglm takes ~15 seconds (on a laptop). However, apeglm can be more than an order of magnitude slower when there are many coefficients, e.g. 10 or more coefficients in resultsNames(dds). The method ashr is fairly fast and does not depend on the number of coefficients, as it uses only the estimated MLE coefficients and their standard errors. A solution for speeding up normal and apeglm is to use multiple cores. This can be easily accomplished by loading the BiocParallel package, and then setting the following arguments of lfcShrink: parallel=TRUE and BPPARAM=MulticoreParam(4), for example, splitting the job over 4 cores. This approach can also be used with DESeq and results, as mentioned above.
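For instance, a sketch of such a parallel call to lfcShrink (the core count of 4 is arbitrary):

library("BiocParallel")
resApe <- lfcShrink(dds, coef=2, type="apeglm",
                    parallel=TRUE, BPPARAM=MulticoreParam(4))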

Note: If there is unwanted variation present in the data (e.g. batch effects) it is always recommended to correct for this, which can be accommodated in DESeq2 by including in the design any known batch variables or by using functions/packages such as svaseq in sva (Leek 2014) or the RUV functions in RUVSeq (Risso et al. 2014) to estimate variables that capture the unwanted variation. In addition, the ashr developers have a specific method for accounting for unwanted variation in combination with ashr (Gerard and Stephens 2017).

Plot counts

It can also be useful to examine the counts of reads for a single gene across the groups. A simple function for making this plot is plotCounts, which normalizes counts by sequencing depth and adds a pseudocount of 1/2 to allow for log scale plotting. The counts are grouped by the variables in intgroup, where more than one variable can be specified. Here we specify the gene which had the smallest p value from the results table created above. You can select the gene to plot by rowname or by numeric index.

plotCounts(dds, gene=which.min(res$padj), intgroup="condition")

For customized plotting, an argument returnData specifies that the function should only return a data.frame for plotting with ggplot.

d <- plotCounts(dds, gene=which.min(res$padj), intgroup="condition", 
                returnData=TRUE)
library("ggplot2")
ggplot(d, aes(x=condition, y=count)) + 
  geom_point(position=position_jitter(w=0.1,h=0)) + 
  scale_y_log10(breaks=c(25,100,400))

More information on results columns

Information about which variables and tests were used can be found by calling the function mcols on the results object.

mcols(res)$description
## [1] "mean of normalized counts for all samples"             
## [2] "log2 fold change (MLE): condition treated vs untreated"
## [3] "standard error: condition treated vs untreated"        
## [4] "Wald statistic: condition treated vs untreated"        
## [5] "Wald test p-value: condition treated vs untreated"     
## [6] "BH adjusted p-values"

For a particular gene, a log2 fold change of -1 for condition treated vs untreated means that the treatment induces a multiplicative change in observed gene expression level of \(2^{-1} = 0.5\) compared to the untreated condition. If the variable of interest is continuous-valued, then the reported log2 fold change is per unit of change of that variable.

Note on p-values set to NA: some values in the results table can be set to NA for one of the following reasons:

  • If within a row, all samples have zero counts, the baseMean column will be zero, and the log2 fold change estimates, p value and adjusted p value will all be set to NA.
  • If a row contains a sample with an extreme count outlier then the p value and adjusted p value will be set to NA. These outlier counts are detected by Cook’s distance. Customization of this outlier filtering and description of functionality for replacement of outlier counts and refitting is described below.
  • If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted p value will be set to NA. Description and customization of independent filtering is described below.
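As a rough check of these three cases, one can tally the NA patterns in a results object (a sketch; rows where only padj is NA were removed by independent filtering, while rows where both are NA had all-zero counts or a count outlier):

table(pvalue_NA = is.na(res$pvalue), padj_NA = is.na(res$padj))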

Rich visualization and reporting of results

ReportingTools. An HTML report of the results with plots and sortable/filterable columns can be generated using the ReportingTools package on a DESeqDataSet that has been processed by the DESeq function. For a code example, see the RNA-seq differential expression vignette at the ReportingTools page, or the manual page for the publish method for the DESeqDataSet class.

regionReport. An HTML and PDF summary of the results with plots can also be generated using the regionReport package. The DESeq2Report function should be run on a DESeqDataSet that has been processed by the DESeq function. For more details see the manual page for DESeq2Report and an example vignette in the regionReport package.

Glimma. Interactive visualization of DESeq2 output, including MA-plots (also called MD-plot) can be generated using the Glimma package. See the manual page for glMDPlot.DESeqResults.

pcaExplorer. Interactive visualization of DESeq2 output, including PCA plots, boxplots of counts and other useful summaries can be generated using the pcaExplorer package. See the Launching the application section of the package vignette.

Exporting results to CSV files

A plain-text file of the results can be exported using the base R functions write.csv or write.table. We suggest using a descriptive file name indicating the variable and levels which were tested.

write.csv(as.data.frame(resOrdered), 
          file="condition_treated_results.csv")

Exporting only the results which pass an adjusted p value threshold can be accomplished with the subset function, followed by the write.csv function.

resSig <- subset(resOrdered, padj < 0.1)
resSig
## log2 fold change (MLE): condition treated vs untreated 
## Wald test p-value: condition treated vs untreated 
## DataFrame with 1054 rows and 6 columns
##               baseMean log2FoldChange      lfcSE      stat        pvalue
##              <numeric>      <numeric>  <numeric> <numeric>     <numeric>
## FBgn0039155   730.5677      -4.618741 0.16912665 -27.30936 3.283533e-164
## FBgn0025111  1501.4479       2.899946 0.12735926  22.76981 9.134947e-115
## FBgn0029167  3706.0240      -2.196912 0.09792037 -22.43570 1.765125e-111
## FBgn0003360  4342.8321      -3.179541 0.14356712 -22.14672 1.121885e-108
## FBgn0035085   638.2193      -2.560242 0.13781525 -18.57735  4.901323e-77
## ...                ...            ...        ...       ...           ...
## FBgn0037073  973.10163     -0.2521459 0.10099319 -2.496662    0.01253684
## FBgn0029976 2312.58850     -0.2211265 0.08858313 -2.496259    0.01255108
## FBgn0030938   24.80638      0.9576449 0.38364585  2.496169    0.01255428
## FBgn0034753 7775.27113      0.3935148 0.15767268  2.495770    0.01256840
## FBgn0039260 1088.27659     -0.2592536 0.10387902 -2.495727    0.01256994
##                      padj
##                 <numeric>
## FBgn0039155 2.751929e-160
## FBgn0025111 3.827999e-111
## FBgn0029167 4.931170e-108
## FBgn0003360 2.350629e-105
## FBgn0035085  8.215598e-74
## ...                   ...
## FBgn0037073     0.0999513
## FBgn0029976     0.0999513
## FBgn0030938     0.0999513
## FBgn0034753     0.0999513
## FBgn0039260     0.0999513

Multi-factor designs

Experiments with more than one factor influencing the counts can be analyzed using design formulas that include the additional variables. In fact, DESeq2 can analyze any possible experimental design that can be expressed with fixed effects terms (multiple factors, designs with interactions, designs with continuous variables, splines, and so on are all possible).

By adding variables to the design, one can control for additional variation in the counts. For example, if the condition samples are balanced across experimental batches, by including the batch factor in the design, one can increase the sensitivity for finding differences due to condition. There are multiple ways to analyze experiments when the additional variables are of interest and not just controlling factors (see section on interactions).

The data in the pasilla package have a condition of interest (the column condition), as well as information on the type of sequencing which was performed (the column type), as we can see below:

colData(dds)
## DataFrame with 7 rows and 3 columns
##            condition        type sizeFactor
##             <factor>    <factor>  <numeric>
## treated1     treated single-read  1.6355014
## treated2     treated  paired-end  0.7612159
## treated3     treated  paired-end  0.8326603
## untreated1 untreated single-read  1.1383376
## untreated2 untreated single-read  1.7935406
## untreated3 untreated  paired-end  0.6494828
## untreated4 untreated  paired-end  0.7516005

We create a copy of the DESeqDataSet, so that we can rerun the analysis using a multi-factor design.

ddsMF <- dds

We change the levels of type so it only contains letters (numbers, underscore and period are also allowed in design factor levels). Be careful when changing level names to use the same order as the current levels.

levels(ddsMF$type)
## [1] "paired-end"  "single-read"
levels(ddsMF$type) <- sub("-.*", "", levels(ddsMF$type))
levels(ddsMF$type)
## [1] "paired" "single"

We can account for the different types of sequencing, and get a clearer picture of the differences attributable to the treatment. As condition is the variable of interest, we put it at the end of the formula. Thus the results function will by default pull the condition results unless contrast or name arguments are specified.

Then we can re-run DESeq:

design(ddsMF) <- formula(~ type + condition)
ddsMF <- DESeq(ddsMF)

Again, we access the results using the results function.

resMF <- results(ddsMF)
head(resMF)
## log2 fold change (MLE): condition treated vs untreated 
## Wald test p-value: condition treated vs untreated 
## DataFrame with 6 rows and 6 columns
##                baseMean log2FoldChange     lfcSE        stat     pvalue
##               <numeric>      <numeric> <numeric>   <numeric>  <numeric>
## FBgn0000008   95.144292    -0.04055736 0.2200633 -0.18429862 0.85377920
## FBgn0000014    1.056523    -0.08351882 2.0760816 -0.04022906 0.96791051
## FBgn0000017 4352.553569    -0.25605701 0.1122166 -2.28181137 0.02250048
## FBgn0000018  418.610484    -0.06461523 0.1313488 -0.49193622 0.62276444
## FBgn0000024    6.406200     0.30898382 0.7560075  0.40870468 0.68275640
## FBgn0000032  989.720217    -0.04837925 0.1208420 -0.40035128 0.68889781
##                  padj
##             <numeric>
## FBgn0000008 0.9494634
## FBgn0000014        NA
## FBgn0000017 0.1302627
## FBgn0000018 0.8593904
## FBgn0000024 0.8877697
## FBgn0000032 0.8902013

It is also possible to retrieve the log2 fold changes, p values and adjusted p values of the type variable. The contrast argument of the function results takes a character vector of length three: the name of the variable, the name of the factor level for the numerator of the log2 ratio, and the name of the factor level for the denominator. The contrast argument can also take other forms, as described in the help page for results and below.

resMFType <- results(ddsMF,
                     contrast=c("type", "single", "paired"))
head(resMFType)
## log2 fold change (MLE): type single vs paired 
## Wald test p-value: type single vs paired 
## DataFrame with 6 rows and 6 columns
##                baseMean log2FoldChange     lfcSE       stat     pvalue
##               <numeric>      <numeric> <numeric>  <numeric>  <numeric>
## FBgn0000008   95.144292     -0.2623745 0.2185279 -1.2006453 0.22988882
## FBgn0000014    1.056523      3.2898915 2.0531720  1.6023457 0.10907917
## FBgn0000017 4352.553569     -0.1000200 0.1120782 -0.8924127 0.37217178
## FBgn0000018  418.610484      0.2290491 0.1302603  1.7583952 0.07868028
## FBgn0000024    6.406200      0.3060704 0.7514061  0.4073303 0.68376543
## FBgn0000032  989.720217      0.2374130 0.1202745  1.9739259 0.04839017
##                  padj
##             <numeric>
## FBgn0000008 0.5361332
## FBgn0000014        NA
## FBgn0000017 0.6831308
## FBgn0000018 0.2917133
## FBgn0000024 0.8804532
## FBgn0000032 0.2175655

If the variable is continuous or an interaction term (see section on interactions) then the results can be extracted using the name argument to results, where the name is one of the elements returned by resultsNames(dds).
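For instance, a sketch using the multi-factor dataset from above (the name string "type_single_vs_paired" is assumed here; use whichever element resultsNames(ddsMF) actually returns for that effect):

resultsNames(ddsMF)
resTypeByName <- results(ddsMF, name="type_single_vs_paired")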

Data transformations and visualization

Count data transformations

In order to test for differential expression, we operate on raw counts and use discrete distributions as described in the previous section on differential expression. However for other downstream analyses – e.g. for visualization or clustering – it might be useful to work with transformed versions of the count data.

Maybe the most obvious choice of transformation is the logarithm. Since count values for a gene can be zero in some conditions (and non-zero in others), some advocate the use of pseudocounts, i.e. transformations of the form:

\[ y = \log_2(n + n_0) \]

where n represents the count values and \(n_0\) is a positive constant.

In this section, we discuss two alternative approaches that offer more theoretical justification and a rational way of choosing parameters equivalent to \(n_0\) above. One makes use of the concept of variance stabilizing transformations (VST) (Tibshirani 1988; Huber et al. 2003; Anders and Huber 2010), and the other is the regularized logarithm or rlog, which incorporates a prior on the sample differences (Love, Huber, and Anders 2014). Both transformations produce transformed data on the log2 scale which has been normalized with respect to library size or other normalization factors.

The point of these two transformations, the VST and the rlog, is to remove the dependence of the variance on the mean, particularly the high variance of the logarithm of count data when the mean is low. Both VST and rlog use the experiment-wide trend of variance over mean, in order to transform the data to remove the experiment-wide trend. Note that we do not require or desire that all the genes have exactly the same variance after transformation. Indeed, in a figure below, you will see that after the transformations the genes with the same mean do not have exactly the same standard deviations, but that the experiment-wide trend has flattened. It is those genes with row variance above the trend which will allow us to cluster samples into interesting groups.

Note on running time: if you have many samples (e.g. 100s), the rlog function might take too long, and so the vst function will be a faster choice. The rlog and VST have similar properties, but the rlog requires fitting a shrinkage term for each sample and each gene which takes time. See the DESeq2 paper for more discussion on the differences (Love, Huber, and Anders 2014).

Blind dispersion estimation

The two functions, vst and rlog, have an argument blind, for whether the transformation should be blind to the sample information specified by the design formula. When blind equals TRUE (the default), the functions will re-estimate the dispersions using only an intercept. This setting should be used in order to compare samples in a manner wholly unbiased by the information about experimental groups, for example to perform sample QA (quality assurance) as demonstrated below.

However, blind dispersion estimation is not the appropriate choice if one expects that many or the majority of genes (rows) will have large differences in counts which are explainable by the experimental design, and one wishes to transform the data for downstream analysis. In this case, using blind dispersion estimation will lead to large estimates of dispersion, as it attributes differences due to experimental design as unwanted noise, and will result in overly shrinking the transformed values towards each other. By setting blind to FALSE, the dispersions already estimated will be used to perform transformations, or if not present, they will be estimated using the current design formula. Note that only the fitted dispersion estimates from mean-dispersion trend line are used in the transformation (the global dependence of dispersion on mean for the entire experiment). So setting blind to FALSE is still for the most part not using the information about which samples were in which experimental group in applying the transformation.
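For instance, a minimal sketch contrasting the two settings (vsdQA is a name introduced here for illustration):

vsdQA <- vst(dds, blind=TRUE)   # blind to the design, for unbiased sample QA
vsd   <- vst(dds, blind=FALSE)  # re-uses existing dispersion estimates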

Extracting transformed values

These transformation functions return an object of class DESeqTransform which is a subclass of RangedSummarizedExperiment. For ~20 samples, running on a newly created DESeqDataSet, rlog may take 30 seconds, while vst takes less than 1 second. The running times are shorter when using blind=FALSE and if the function DESeq has already been run, because then it is not necessary to re-estimate the dispersion values. The assay function is used to extract the matrix of normalized values.

vsd <- vst(dds, blind=FALSE)
rld <- rlog(dds, blind=FALSE)
head(assay(vsd), 3)
##              treated1  treated2  treated3 untreated1 untreated2 untreated3
## FBgn0000008  7.607799  7.834807  7.594933   7.567177   7.642058   7.844499
## FBgn0000014  6.318607  6.040987  6.040987   6.412578   6.173698   6.040987
## FBgn0000017 11.938303 12.024550 12.013558  12.045714  12.284641  12.455933
##             untreated4
## FBgn0000008   7.669033
## FBgn0000014   6.040987
## FBgn0000017  12.077397

Variance stabilizing transformation

Above, we used a parametric fit for the dispersion. In this case, the closed-form expression for the variance stabilizing transformation is used by the vst function. If a local fit is used (option fitType="locfit" to estimateDispersions) a numerical integration is used instead. The transformed data should be approximately variance stabilized and also includes correction for size factors or normalization factors. The transformed data is on the log2 scale for large counts.

Regularized log transformation

The function rlog, which stands for regularized log, transforms the original count data to the log2 scale by fitting a model with a term for each sample and a prior distribution on the coefficients which is estimated from the data. This is the same kind of shrinkage (sometimes referred to as regularization, or moderation) of log fold changes used by DESeq and nbinomWaldTest. The resulting data contains elements defined as:

\[ \log_2(q_{ij}) = \beta_{i0} + \beta_{ij} \]

where \(q_{ij}\) is a parameter proportional to the expected true concentration of fragments for gene i and sample j (see formula below), \(\beta_{i0}\) is an intercept which does not undergo shrinkage, and \(\beta_{ij}\) is the sample-specific effect which is shrunk toward zero based on the dispersion-mean trend over the entire dataset. The trend typically captures high dispersions for low counts, and therefore these genes exhibit higher shrinkage from the rlog.

Note that, as \(q_{ij}\) represents the part of the mean value \(\mu_{ij}\) after the size factor \(s_j\) has been divided out, it is clear that the rlog transformation inherently accounts for differences in sequencing depth. Without priors, this design matrix would lead to a non-unique solution, however the addition of a prior on non-intercept betas allows for a unique solution to be found.

Effects of transformations on the variance

The figure below plots the standard deviation of the transformed data, across samples, against the mean, using the shifted logarithm transformation, the regularized log transformation and the variance stabilizing transformation. The shifted logarithm has elevated standard deviation in the lower count range, and the regularized log to a lesser extent, while for the variance stabilized data the standard deviation is roughly constant along the whole dynamic range.

Note that the vertical axis in such plots is the square root of the variance over all samples, so including the variance due to the experimental conditions. While a flat curve of the square root of variance over the mean may seem like the goal of such transformations, this may be unreasonable in the case of datasets with many true differences due to the experimental conditions.

# this gives log2(n + 1)
ntd <- normTransform(dds)
library("vsn")
meanSdPlot(assay(ntd))

meanSdPlot(assay(vsd))

meanSdPlot(assay(rld))

Data quality assessment by sample clustering and visualization

Data quality assessment and quality control (i.e. the removal of insufficiently good data) are essential steps of any data analysis. These steps should typically be performed very early in the analysis of a new data set, preceding or in parallel to the differential expression testing.

We define the term quality as fitness for purpose. Our purpose is the detection of differentially expressed genes, and we are looking in particular for samples whose experimental treatment suffered from an abnormality that renders the data points obtained from these particular samples detrimental to our purpose.

Heatmap of the count matrix

To explore a count matrix, it is often instructive to look at it as a heatmap. Below we show how to produce such a heatmap for various transformations of the data.

library("pheatmap")
select <- order(rowMeans(counts(dds,normalized=TRUE)),
                decreasing=TRUE)[1:20]
df <- as.data.frame(colData(dds)[,c("condition","type")])
pheatmap(assay(ntd)[select,], cluster_rows=FALSE, show_rownames=FALSE,
         cluster_cols=FALSE, annotation_col=df)

pheatmap(assay(vsd)[select,], cluster_rows=FALSE, show_rownames=FALSE,
         cluster_cols=FALSE, annotation_col=df)

pheatmap(assay(rld)[select,], cluster_rows=FALSE, show_rownames=FALSE,
         cluster_cols=FALSE, annotation_col=df)

Heatmap of the sample-to-sample distances

Another use of the transformed data is sample clustering. Here, we apply the dist function to the transpose of the transformed count matrix to get sample-to-sample distances.

sampleDists <- dist(t(assay(vsd)))

A heatmap of this distance matrix gives us an overview of similarities and dissimilarities between samples. We have to provide a hierarchical clustering hc to the heatmap function based on the sample distances, or else the heatmap function would calculate a clustering based on the distances between the rows/columns of the distance matrix.

library("RColorBrewer")
sampleDistMatrix <- as.matrix(sampleDists)
rownames(sampleDistMatrix) <- paste(vsd$condition, vsd$type, sep="-")
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
         clustering_distance_rows=sampleDists,
         clustering_distance_cols=sampleDists,
         col=colors)

Principal component plot of the samples

Related to the distance matrix is the PCA plot, which shows the samples in the 2D plane spanned by their first two principal components. This type of plot is useful for visualizing the overall effect of experimental covariates and batch effects.

plotPCA(vsd, intgroup=c("condition", "type"))

It is also possible to customize the PCA plot using the ggplot function.

pcaData <- plotPCA(vsd, intgroup=c("condition", "type"), returnData=TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color=condition, shape=type)) +
  geom_point(size=3) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 
  coord_fixed()

Variations to the standard workflow

Wald test individual steps

The function DESeq runs the following functions in order:

dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)

Contrasts

A contrast is a linear combination of estimated log2 fold changes, which can be used to test if differences between groups are equal to zero. The simplest use case for contrasts is an experimental design containing a factor with three levels, say A, B and C. Contrasts enable the user to generate results for all 3 possible differences: log2 fold change of B vs A, of C vs A, and of C vs B. The contrast argument of the results function is used to extract test results of log2 fold changes of interest, for example:

results(dds, contrast=c("condition","C","B"))

Log2 fold changes can also be added and subtracted by providing a list to the contrast argument which has two elements: the names of the log2 fold changes to add, and the names of the log2 fold changes to subtract. The names used in the list should come from resultsNames(dds).

Alternatively, a numeric vector of the length of resultsNames(dds) can be provided, for manually specifying the linear combination of terms. Demonstrations of the use of contrasts for various designs can be found in the examples section of the help page for the results function. The mathematical formula that is used to generate the contrasts can be found below.
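
For illustration, here is a hedged sketch of both forms for the three-level example; the coefficient names below are assumptions, and in practice should be checked against the output of resultsNames(dds):

# list form: add the C vs A effect and subtract the B vs A effect,
# yielding C vs B (assumed names, verify with resultsNames(dds))
results(dds, contrast=list("condition_C_vs_A", "condition_B_vs_A"))
# numeric form: one weight per element of resultsNames(dds),
# here assuming c("Intercept", "condition_B_vs_A", "condition_C_vs_A")
results(dds, contrast=c(0, -1, 1))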

Interactions

Interaction terms can be added to the design formula in order to test, for example, if the log2 fold change attributable to a given condition is different based on another factor, e.g. if the condition effect differs across genotype.

Many users begin to add interaction terms to the design formula, when in fact a much simpler approach would give all the results tables that are desired. We will explain this approach first, because it is much simpler to perform. If the comparisons of interest are, for example, the effect of a condition for different sets of samples, a simpler approach than adding interaction terms explicitly to the design formula is to perform the following steps:

  • combine the factors of interest into a single factor with all combinations of the original factors
  • change the design to include just this factor, e.g. ~ group

Using this design is similar to adding an interaction term, in that it models multiple condition effects which can be easily extracted with results. Suppose we have two factors genotype (with values I, II, and III) and condition (with values A and B), and we want to extract the condition effect specifically for each genotype. We could use the following approach to obtain, e.g. the condition effect for genotype I:

dds$group <- factor(paste0(dds$genotype, dds$condition))
design(dds) <- ~ group
dds <- DESeq(dds)
resultsNames(dds)
results(dds, contrast=c("group", "IB", "IA"))

The following two plots diagram hypothetical genotype-specific condition effects, which could be modeled with interaction terms by using a design of ~genotype + condition + genotype:condition.

In the first plot (Gene 1), note that the condition effect is consistent across genotypes. Although condition A has a different baseline for I, II, and III, the condition effect is a log2 fold change of about 2 for each genotype. Using a model with an interaction term genotype:condition, the interaction terms for genotype II and genotype III will be nearly 0.

Here, the y-axis represents log2(n+1), and each group has 20 samples (black dots). A red line connects the mean of the groups within each genotype.

In the second plot (Gene 2), we can see that the condition effect is not consistent across genotype. Here the main condition effect (the effect for the reference genotype I) is again 2. However, this time the interaction terms will be around 1 for genotype II and -4 for genotype III. This is because the condition effect is higher by 1 for genotype II compared to genotype I, and lower by 4 for genotype III compared to genotype I. The condition effect for genotype II (or III) is obtained by adding the main condition effect and the interaction term for that genotype. Such a plot can be made using the plotCounts function as shown above.

Now we will continue to explain the use of interactions in order to test for differences in condition effects. We continue with the example of condition effects across three genotypes (I, II, and III).

The key point to remember about designs with interaction terms is that, unlike for a design ~genotype + condition, where the condition effect represents the overall effect controlling for differences due to genotype, for a design with the added term genotype:condition the main condition effect only represents the effect of condition for the reference level of genotype (I, or whichever level was defined by the user as the reference level). The interaction terms genotypeII.conditionB and genotypeIII.conditionB give the difference between the condition effect for a given genotype and the condition effect for the reference genotype.

This genotype-condition interaction example is examined in further detail in Example 3 in the help page for results, which can be found by typing ?results. In particular, we show how to test for differences in the condition effect across genotype, and we show how to obtain the condition effect for non-reference genotypes.
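
As a brief, hedged sketch of these calls (the coefficient names are assumptions based on the levels in this example; check resultsNames(dds) for the actual names):

# with design ~genotype + condition + genotype:condition:
# difference between the condition effect for genotype III and genotype I
results(dds, name="genotypeIII.conditionB")
# condition effect for genotype III: main effect plus interaction term
results(dds, contrast=list(c("condition_B_vs_A", "genotypeIII.conditionB")))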

Time-series experiments

There are a number of ways to analyze time-series experiments, depending on the biological question of interest. In order to test for any differences over multiple time points, one can use a design including the time factor, and then test using the likelihood ratio test as described in the following section, where the time factor is removed in the reduced formula. For a control and treatment time series, one can use a design formula containing the condition factor, the time factor, and the interaction of the two. In this case, using the likelihood ratio test with a reduced model which does not contain the interaction terms will test whether the condition induces a change in gene expression at any time point after the reference level time point (time 0). An example of the latter analysis is provided in our RNA-seq workflow.
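
A minimal sketch of both designs, assuming hypothetical factors condition and time in colData(dds):

# any change over time: the time factor is removed in the reduced formula
design(dds) <- ~ time
dds <- DESeq(dds, test="LRT", reduced = ~ 1)
# control and treatment time series: test for condition-specific changes
# over time by removing the interaction terms in the reduced formula
design(dds) <- ~ condition + time + condition:time
dds <- DESeq(dds, test="LRT", reduced = ~ condition + time)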

Likelihood ratio test

DESeq2 offers two kinds of hypothesis tests: the Wald test, where we use the estimated standard error of a log2 fold change to test if it is equal to zero, and the likelihood ratio test (LRT). The LRT examines two models for the counts, a full model with a certain number of terms and a reduced model, in which some of the terms of the full model are removed. The test determines if the increased likelihood of the data using the extra terms in the full model is more than expected if those extra terms are truly zero.

The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables. The LRT for count data is conceptually similar to an analysis of variance (ANOVA) calculation in linear regression, except that in the case of the Negative Binomial GLM, we use an analysis of deviance (ANODEV), where the deviance captures the difference in likelihood between a full and a reduced model.

The likelihood ratio test can be performed by specifying test="LRT" when using the DESeq function, and providing a reduced design formula, e.g. one in which a number of terms from design(dds) are removed. The degrees of freedom for the test is obtained from the difference between the number of parameters in the two models. A simple likelihood ratio test, if the full design was ~condition, would look like:

dds <- DESeq(dds, test="LRT", reduced=~1)
res <- results(dds)

If the full design contained other variables, such as a batch variable, e.g. ~batch + condition, then the likelihood ratio test would look like:

dds <- DESeq(dds, test="LRT", reduced=~batch)
res <- results(dds)

Approach to count outliers

RNA-seq data sometimes contain isolated instances of very large counts that are apparently unrelated to the experimental or study design, and which may be considered outliers. There are many reasons why outliers can arise, including rare technical or experimental artifacts, read mapping problems in the case of genetically differing samples, and genuine, but rare biological events. In many cases, users appear primarily interested in genes that show a consistent behavior, and this is the reason why by default, genes that are affected by such outliers are set aside by DESeq2, or if there are sufficient samples, outlier counts are replaced for model fitting. These two behaviors are described below.

The DESeq function calculates, for every gene and for every sample, a diagnostic test for outliers called Cook’s distance. Cook’s distance is a measure of how much a single sample is influencing the fitted coefficients for a gene, and a large value of Cook’s distance is intended to indicate an outlier count. The Cook’s distances are stored as a matrix available in assays(dds)[["cooks"]].

The results function automatically flags genes which contain a Cook’s distance above a cutoff for samples which have 3 or more replicates. The p values and adjusted p values for these genes are set to NA. At least 3 replicates are required for flagging, as it is difficult to judge which sample might be an outlier with only 2 replicates. This filtering can be turned off with results(dds, cooksCutoff=FALSE).

With many degrees of freedom – i.e., many more samples than parameters to be estimated – it is undesirable to remove entire genes from the analysis just because their data include a single count outlier. When there are 7 or more replicates for a given sample, the DESeq function will automatically replace counts with large Cook’s distance with the trimmed mean over all samples, scaled up by the size factor or normalization factor for that sample. This approach is conservative; it will not lead to false positives, as it replaces the outlier value with the value predicted by the null hypothesis. This outlier replacement only occurs when there are 7 or more replicates, and can be turned off with DESeq(dds, minReplicatesForReplace=Inf).

The default Cook’s distance cutoff for the two behaviors described above depends on the sample size and number of parameters to be estimated. The default is to use the 99% quantile of the F(p,m-p) distribution (with p the number of parameters including the intercept and m the number of samples). The default for gene flagging can be modified using the cooksCutoff argument to the results function. For outlier replacement, DESeq preserves the original counts in counts(dds), saving the replacement counts as a matrix named replaceCounts in assays(dds). Note that with continuous variables in the design, outlier detection and replacement is not automatically performed, as our current methods involve a robust estimation of within-group variance which does not extend easily to continuous covariates. However, users can examine the Cook’s distances in assays(dds)[["cooks"]], in order to perform manual visualization and filtering if necessary.
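
For reference, this default cutoff can be computed directly; a minimal sketch, assuming the number of parameters equals the number of coefficients listed by resultsNames:

m <- ncol(dds)                    # number of samples
p <- length(resultsNames(dds))    # number of parameters incl. intercept
qf(.99, p, m - p)                 # default Cook's distance cutoff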

Note on many outliers: if there are very many outliers (e.g. many hundreds or thousands) reported by summary(res), one might consider further exploration to see if a single sample or a few samples should be removed due to low quality. The automatic outlier filtering/replacement is most useful in situations in which the number of outliers is limited. When there are thousands of reported outliers, it might make more sense to turn off the outlier filtering/replacement (DESeq with minReplicatesForReplace=Inf and results with cooksCutoff=FALSE) and perform manual inspection: first, it would be advantageous to make a PCA plot as described above to spot individual sample outliers; second, one can make a boxplot of the Cook’s distances to see if one sample is consistently higher than others (here this is not the case):

par(mar=c(8,5,2,2))
boxplot(log10(assays(dds)[["cooks"]]), range=0, las=2)

Dispersion plot and fitting alternatives

Plotting the dispersion estimates is a useful diagnostic. The dispersion plot below is typical, with the final estimates shrunk from the gene-wise estimates towards the fitted estimates. Some gene-wise estimates are flagged as outliers and not shrunk towards the fitted value (this outlier detection is described in the manual page for estimateDispersionsMAP). The amount of shrinkage can be more or less than seen here, depending on the sample size, the number of coefficients, the row mean and the variability of the gene-wise estimates.

plotDispEsts(dds)

Local or mean dispersion fit

A local smoothed dispersion fit is automatically substituted in the case that the parametric curve doesn’t fit the observed dispersion-mean relationship. This can be prespecified by providing the argument fitType="local" to either DESeq or estimateDispersions. Additionally, using the mean of gene-wise dispersion estimates as the fitted value can be specified by providing the argument fitType="mean".
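
These alternatives can also be requested directly when re-estimating dispersions, for example:

ddsLocal <- estimateDispersions(dds, fitType="local")
ddsMean <- estimateDispersions(dds, fitType="mean")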

Supply a custom dispersion fit

Any fitted values can be provided during dispersion estimation, using the lower-level functions described in the manual page for estimateDispersionsGeneEst. In the code chunk below, we store the gene-wise estimates which were already calculated and saved in the metadata column dispGeneEst. Then we calculate the median value of the dispersion estimates above a threshold, and save these values as the fitted dispersions, using the replacement function for dispersionFunction. In the last line, the function estimateDispersionsMAP uses the fitted dispersions to generate maximum a posteriori (MAP) estimates of dispersion.

ddsCustom <- dds
useForMedian <- mcols(ddsCustom)$dispGeneEst > 1e-7
medianDisp <- median(mcols(ddsCustom)$dispGeneEst[useForMedian],
                     na.rm=TRUE)
dispersionFunction(ddsCustom) <- function(mu) medianDisp
ddsCustom <- estimateDispersionsMAP(ddsCustom)

Independent filtering of results

The results function of the DESeq2 package performs independent filtering by default using the mean of normalized counts as a filter statistic. A threshold on the filter statistic is found which optimizes the number of adjusted p values lower than a significance level alpha (we use the standard variable name for significance level, though it is unrelated to the dispersion parameter \(\alpha\)). The theory behind independent filtering is discussed in greater detail below. The adjusted p values for the genes which do not pass the filter threshold are set to NA.

The default independent filtering is performed using the filtered_p function of the genefilter package, and all of the arguments of filtered_p can be passed to the results function. The filter threshold value and the number of rejections at each quantile of the filter statistic are available as metadata of the object returned by results.

For example, we can visualize the optimization by plotting the filterNumRej attribute of the results object. The results function maximizes the number of rejections (adjusted p value less than a significance level), over the quantiles of a filter statistic (the mean of normalized counts). The threshold chosen (vertical line) is the lowest quantile of the filter for which the number of rejections is within 1 residual standard deviation of the peak of a curve fit to the number of rejections over the filter quantiles:

metadata(res)$alpha
## [1] 0.1
metadata(res)$filterThreshold
## 15.5102% 
## 6.150425
plot(metadata(res)$filterNumRej, 
     type="b", ylab="number of rejections",
     xlab="quantiles of filter")
lines(metadata(res)$lo.fit, col="red")
abline(v=metadata(res)$filterTheta)

Independent filtering can be turned off by setting independentFiltering to FALSE.

resNoFilt <- results(dds, independentFiltering=FALSE)
addmargins(table(filtering=(res$padj < .1),
                 noFiltering=(resNoFilt$padj < .1)))
##          noFiltering
## filtering FALSE TRUE  Sum
##     FALSE  7327    0 7327
##     TRUE     74  980 1054
##     Sum    7401  980 8381

Tests of log2 fold change above or below a threshold

It is also possible to provide thresholds for constructing Wald tests of significance. Two arguments to the results function allow for threshold-based Wald tests: lfcThreshold, which takes a non-negative numeric threshold value, and altHypothesis, which specifies the kind of test. Note that the alternative hypothesis is specified by the user, i.e. those genes which the user is interested in finding, and the test provides p values for the null hypothesis, the complement of the set defined by the alternative. The altHypothesis argument can take one of the following four values, where \(\beta\) is the log2 fold change specified by the name argument, and \(x\) is the lfcThreshold.

  • greaterAbs - \(|\beta| > x\) - tests are two-tailed
  • lessAbs - \(|\beta| < x\) - p values are the maximum of the upper and lower tests
  • greater - \(\beta > x\)
  • less - \(\beta < -x\)

The four possible values of altHypothesis are demonstrated in the following code and visually by MA-plots in the following figures.

par(mfrow=c(2,2),mar=c(2,2,1,1))
ylim <- c(-2.5,2.5)
resGA <- results(dds, lfcThreshold=.5, altHypothesis="greaterAbs")
resLA <- results(dds, lfcThreshold=.5, altHypothesis="lessAbs")
resG <- results(dds, lfcThreshold=.5, altHypothesis="greater")
resL <- results(dds, lfcThreshold=.5, altHypothesis="less")
drawLines <- function() abline(h=c(-.5,.5),col="dodgerblue",lwd=2)
plotMA(resGA, ylim=ylim); drawLines()
plotMA(resLA, ylim=ylim); drawLines()
plotMA(resG, ylim=ylim); drawLines()
plotMA(resL, ylim=ylim); drawLines()

Access to all calculated values

All row-wise calculated values (intermediate dispersion calculations, coefficients, standard errors, etc.) are stored in the DESeqDataSet object, e.g. dds in this vignette. These values are accessible by calling mcols on dds. Descriptions of the columns are accessible by two calls to mcols. Note that the call to substr below is only for display purposes.

mcols(dds,use.names=TRUE)[1:4,1:4]
## DataFrame with 4 rows and 4 columns
##                    gene    baseMean      baseVar   allZero
##                <factor>   <numeric>    <numeric> <logical>
## FBgn0000008 FBgn0000008   95.144292 2.248206e+02     FALSE
## FBgn0000014 FBgn0000014    1.056523 2.961952e+00     FALSE
## FBgn0000017 FBgn0000017 4352.553569 3.615380e+05     FALSE
## FBgn0000018 FBgn0000018  418.610484 2.349027e+03     FALSE
substr(names(mcols(dds)),1,10) 
##  [1] "gene"       "baseMean"   "baseVar"    "allZero"    "dispGeneEs"
##  [6] "dispFit"    "dispersion" "dispIter"   "dispOutlie" "dispMAP"   
## [11] "Intercept"  "condition_" "SE_Interce" "SE_conditi" "WaldStatis"
## [16] "WaldStatis" "WaldPvalue" "WaldPvalue" "betaConv"   "betaIter"  
## [21] "deviance"   "maxCooks"
mcols(mcols(dds), use.names=TRUE)[1:4,]
## DataFrame with 4 rows and 2 columns
##                  type                                   description
##           <character>                                   <character>
## gene            input                                              
## baseMean intermediate     mean of normalized counts for all samples
## baseVar  intermediate variance of normalized counts for all samples
## allZero  intermediate                all counts for a gene are zero

The mean values \(\mu_{ij} = s_j q_{ij}\) and the Cook’s distances for each gene and sample are stored as matrices in the assays slot:

head(assays(dds)[["mu"]])
##               treated1     treated2     treated3  untreated1  untreated2
## FBgn0000008  154.39604   71.8609681   78.6055335  107.292913  169.048447
## FBgn0000014    1.50181    0.6989914    0.7645958    1.473259    2.321236
## FBgn0000017 6450.25959 3002.1618954 3283.9320689 5301.761109 8353.342790
## FBgn0000018  658.34912  306.4172250  335.1762452  492.704366  776.294589
## FBgn0000024   11.44974    5.3290832    5.8292484    6.885635   10.848861
## FBgn0000032 1561.83053  726.9270379  795.1533243 1158.470643 1825.261870
##               untreated3   untreated4
## FBgn0000008   61.2163768   70.8413790
## FBgn0000014    0.8405737    0.9727364
## FBgn0000017 3024.9398185 3500.5486968
## FBgn0000018  281.1143362  325.3137193
## FBgn0000024    3.9286252    4.5463198
## FBgn0000032  660.9697995  764.8935546
head(assays(dds)[["cooks"]])
##               treated1    treated2   treated3 untreated1   untreated2
## FBgn0000008 0.08830678 0.303871771 0.07781046 0.09824099 0.0137763910
## FBgn0000014 1.88673190 0.218246742 0.25186426 1.88310396 0.1847650641
## FBgn0000017 0.01372829 0.004978868 0.00214944 0.08044192 0.0104725895
## FBgn0000018 0.09518416 0.004710886 0.05477260 0.18460854 0.0023367713
## FBgn0000024 0.06631472 0.131131695 0.03122767 0.27064873 0.0004706695
## FBgn0000032 0.07377787 0.015891436 0.02053258 0.34090628 0.0217426932
##             untreated3   untreated4
## FBgn0000008 0.18921869 0.0005147318
## FBgn0000014 0.15347774 0.1887766572
## FBgn0000017 0.17278836 0.0549333693
## FBgn0000018 0.07634172 0.0108036598
## FBgn0000024 0.03100295 0.0813891462
## FBgn0000032 0.02482481 0.0774936225

The dispersions \(\alpha_i\) can be accessed with the dispersions function.

head(dispersions(dds))
## [1] 0.03035168 2.80571497 0.01289667 0.01565291 0.23770103 0.01691016
head(mcols(dds)$dispersion)
## [1] 0.03035168 2.80571497 0.01289667 0.01565291 0.23770103 0.01691016

The size factors \(s_j\) are accessible via sizeFactors:

sizeFactors(dds)
##   treated1   treated2   treated3 untreated1 untreated2 untreated3 
##  1.6355014  0.7612159  0.8326603  1.1383376  1.7935406  0.6494828 
## untreated4 
##  0.7516005

For advanced users, we also include a convenience function coef for extracting the matrix \([\beta_{ir}]\) for all genes i and model coefficients \(r\). This function can also return a matrix of standard errors, see ?coef. The columns of this matrix correspond to the effects returned by resultsNames. Note that the results function is best for building results tables with p values and adjusted p values.

head(coef(dds))
##              Intercept condition_treated_vs_untreated
## FBgn0000008  6.5584825                    0.002276428
## FBgn0000014  0.3720829                   -0.495113878
## FBgn0000017 12.1853275                   -0.239918945
## FBgn0000018  8.7576500                   -0.104673913
## FBgn0000024  2.5966613                    0.210848562
## FBgn0000032  9.9910773                   -0.091788071

The beta prior variance \(\sigma_r^2\) is stored as an attribute of the DESeqDataSet:

attr(dds, "betaPriorVar")
## [1] 1e+06 1e+06

General information about the prior used for log fold change shrinkage is also stored in a slot of the DESeqResults object. This would also contain information about what other packages were used for log2 fold change shrinkage.

priorInfo(resLFC)
## $type
## [1] "normal"
## 
## $package
## [1] "DESeq2"
## 
## $version
## [1] '1.18.1'
## 
## $betaPriorVar
##        Intercept conditiontreated 
##     1.000000e+06     1.094939e-01
priorInfo(resApe)
## $type
## [1] "apeglm"
## 
## $package
## [1] "apeglm"
## 
## $version
## [1] '1.0.0'
## 
## $prior.control
## $prior.control$no.shrink
## [1] 1
## 
## $prior.control$prior.mean
## [1] 0
## 
## $prior.control$prior.scale
## [1] 0.4045597
## 
## $prior.control$prior.df
## [1] 1
## 
## $prior.control$prior.no.shrink.mean
## [1] 0
## 
## $prior.control$prior.no.shrink.scale
## [1] 15
## 
## $prior.control$prior.var
## [1] 0.04091713
priorInfo(resAsh)
## $type
## [1] "ashr"
## 
## $package
## [1] "ashr"
## 
## $version
## [1] '2.0.5'
## 
## $fitted_g
## $pi
##  [1] 1.502747e-01 1.758840e-05 2.349157e-05 4.109161e-05 1.166040e-04
##  [6] 7.118828e-04 1.045939e-02 1.664217e-01 2.877691e-01 3.076675e-02
## [11] 9.989217e-02 8.123288e-02 3.932428e-02 1.038946e-01 3.968250e-10
## [16] 4.355989e-03 2.469746e-02 0.000000e+00 0.000000e+00 0.000000e+00
## [21] 0.000000e+00 3.092738e-07 0.000000e+00
## 
## $mean
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 
## $sd
##  [1]  0.005779228  0.008173063  0.011558456  0.016346125  0.023116912
##  [6]  0.032692251  0.046233824  0.065384502  0.092467649  0.130769003
## [11]  0.184935298  0.261538006  0.369870596  0.523076013  0.739741191
## [16]  1.046152026  1.479482383  2.092304051  2.958964766  4.184608102
## [21]  5.917929531  8.369216205 11.835859063
## 
## attr(,"row.names")
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## attr(,"class")
## [1] "normalmix"

The dispersion prior variance \(\sigma_d^2\) is stored as an attribute of the dispersion function:

dispersionFunction(dds)
## function (q) 
## coefs[1] + coefs[2]/q
## <environment: 0x126e8ad8>
## attr(,"coefficients")
## asymptDisp  extraPois 
##  0.0140678  2.6551441 
## attr(,"fitType")
## [1] "parametric"
## attr(,"varLogDispEsts")
## [1] 0.974076
## attr(,"dispPriorVar")
## [1] 0.4837182
attr(dispersionFunction(dds), "dispPriorVar")
## [1] 0.4837182

The version of DESeq2 which was used to construct the DESeqDataSet object, or the version used when DESeq was run, is stored here:

metadata(dds)[["version"]]
## [1] '1.18.1'

Sample-/gene-dependent normalization factors

In some experiments, there might be gene-dependent biases which vary across samples. For instance, GC-content bias or length bias might vary across samples coming from different labs or processed at different times. We use the term normalization factors for a gene x sample matrix, and size factors for a single number per sample. Incorporating normalization factors, the mean parameter \(\mu_{ij}\) becomes:

\[ \mu_{ij} = NF_{ij} q_{ij} \]

with normalization factor matrix NF having the same dimensions as the counts matrix K. This matrix can be incorporated as shown below. We recommend providing a matrix with row-wise geometric means of 1, so that the mean of normalized counts for a gene is close to the mean of the unnormalized counts. This can be accomplished by dividing out the current row geometric means.

# normFactors is assumed to be a gene x sample matrix on the scale of
# the counts (e.g. derived from cqn or EDASeq output, see below);
# divide out the row-wise geometric means:
normFactors <- normFactors / exp(rowMeans(log(normFactors)))
normalizationFactors(dds) <- normFactors

These steps then replace estimateSizeFactors, which occurs within the DESeq function. The DESeq function will look for pre-existing normalization factors and use these in place of size factors (and a message will be printed confirming this).

The methods provided by the cqn or EDASeq packages can help correct for GC or length biases. They both describe in their vignettes how to create matrices which can be used by DESeq2. From the formula above, we see that normalization factors should be on the scale of the counts, like size factors, and unlike offsets which are typically on the scale of the predictors (i.e. the logarithmic scale for the negative binomial GLM). At the time of writing, the transformation from the matrices provided by these packages should be:

# cqnObject: result of the cqn() function (see the cqn vignette)
cqnOffset <- cqnObject$glm.offset
cqnNormFactors <- exp(cqnOffset)
# EDASeqOffset: offset matrix produced by EDASeq (note the sign convention)
EDASeqNormFactors <- exp(-1 * EDASeqOffset)

“Model matrix not full rank”

While most experimental designs run easily using a design formula, some design formulas can cause problems and result in the DESeq function returning an error with the text: “the model matrix is not full rank, so the model cannot be fit as specified.” There are two main reasons for this problem: either one or more columns in the model matrix are linear combinations of other columns, or there are levels of factors or combinations of levels of multiple factors which are missing samples. We address these two problems below and discuss possible solutions:

Linear combinations

The simplest case is the linear combination, or linear dependency problem, when two variables contain exactly the same information, such as in the following sample table. The software cannot fit an effect for batch and condition, because they produce identical columns in the model matrix. This is also referred to as perfect confounding. A unique solution of coefficients (the \(\beta_i\) in the formula below) is not possible.

## DataFrame with 4 rows and 2 columns
##      batch condition
##   <factor>  <factor>
## 1        1         A
## 2        1         A
## 3        2         B
## 4        2         B

Another situation which will cause problems is when the variables are not identical, but one variable can be formed by the combination of other factor levels. In the following example, the effect of batch 2 vs 1 cannot be fit because it is identical to a column in the model matrix which represents the condition C vs A effect.

## DataFrame with 6 rows and 2 columns
##      batch condition
##   <factor>  <factor>
## 1        1         A
## 2        1         A
## 3        1         B
## 4        1         B
## 5        2         C
## 6        2         C

In both of these cases above, the batch effect cannot be fit and must be removed from the model formula. There is just no way to tell apart the condition effects and the batch effects. The options are either to assume there is no batch effect (which we know is highly unlikely given the literature on batch effects in sequencing datasets) or to repeat the experiment and properly balance the conditions across batches. A balanced design would look like:

## DataFrame with 6 rows and 2 columns
##      batch condition
##   <factor>  <factor>
## 1        1         A
## 2        1         B
## 3        1         C
## 4        2         A
## 5        2         B
## 6        2         C

Group-specific condition effects, individuals nested within groups

Finally, there is a case where we can in fact perform inference, but we may need to re-arrange terms to do so. Consider an experiment with grouped individuals, where we seek to test the group-specific effect of a condition or treatment, while controlling for individual effects. The individuals are nested within the groups: an individual can only be in one of the groups, although each individual has one or more observations across condition.

An example of such an experiment is below:

coldata <- DataFrame(grp=factor(rep(c("X","Y"),each=6)),
                     ind=factor(rep(1:6,each=2)),
                     cnd=factor(rep(c("A","B"),6)))
coldata
## DataFrame with 12 rows and 3 columns
##          grp      ind      cnd
##     <factor> <factor> <factor>
## 1          X        1        A
## 2          X        1        B
## 3          X        2        A
## 4          X        2        B
## 5          X        3        A
## ...      ...      ...      ...
## 8          Y        4        B
## 9          Y        5        A
## 10         Y        5        B
## 11         Y        6        A
## 12         Y        6        B

Note that individual (ind) is a factor, not a numeric. This is very important.

To make R display all the rows, we can do:

as.data.frame(coldata)
##    grp ind cnd
## 1    X   1   A
## 2    X   1   B
## 3    X   2   A
## 4    X   2   B
## 5    X   3   A
## 6    X   3   B
## 7    Y   4   A
## 8    Y   4   B
## 9    Y   5   A
## 10   Y   5   B
## 11   Y   6   A
## 12   Y   6   B

We have two groups of samples X and Y, each with three distinct individuals (labeled here 1-6). For each individual, we have conditions A and B (for example, this could be control and treated).

This design can be analyzed by DESeq2 but requires a bit of refactoring in order to fit the model terms. Here we will use a trick described in the edgeR user guide, from the section Comparisons Both Between and Within Subjects. If we try to analyze with a formula such as ~ ind + grp*cnd, we will obtain an error, because the effect for group is a linear combination of the individuals.

However, the following steps allow for an analysis of group-specific condition effects, while controlling for differences in individual. For object construction, you can use a simple design, such as ~ ind + cnd, as long as you remember to replace it before running DESeq. Then add a column ind.n which distinguishes the individuals nested within a group. Here, we add this column to coldata, but in practice you would add this column to dds.

coldata$ind.n <- factor(rep(rep(1:3,each=2),2))
as.data.frame(coldata)
##    grp ind cnd ind.n
## 1    X   1   A     1
## 2    X   1   B     1
## 3    X   2   A     2
## 4    X   2   B     2
## 5    X   3   A     3
## 6    X   3   B     3
## 7    Y   4   A     1
## 8    Y   4   B     1
## 9    Y   5   A     2
## 10   Y   5   B     2
## 11   Y   6   A     3
## 12   Y   6   B     3

Now we can reassign our DESeqDataSet a design of ~ grp + grp:ind.n + grp:cnd, before we call DESeq. This new design will result in the following model matrix:

model.matrix(~ grp + grp:ind.n + grp:cnd, coldata)
##    (Intercept) grpY grpX:ind.n2 grpY:ind.n2 grpX:ind.n3 grpY:ind.n3
## 1            1    0           0           0           0           0
## 2            1    0           0           0           0           0
## 3            1    0           1           0           0           0
## 4            1    0           1           0           0           0
## 5            1    0           0           0           1           0
## 6            1    0           0           0           1           0
## 7            1    1           0           0           0           0
## 8            1    1           0           0           0           0
## 9            1    1           0           1           0           0
## 10           1    1           0           1           0           0
## 11           1    1           0           0           0           1
## 12           1    1           0           0           0           1
##    grpX:cndB grpY:cndB
## 1          0         0
## 2          1         0
## 3          0         0
## 4          1         0
## 5          0         0
## 6          1         0
## 7          0         0
## 8          0         1
## 9          0         0
## 10         0         1
## 11         0         0
## 12         0         1
## attr(,"assign")
## [1] 0 1 2 2 2 2 3 3
## attr(,"contrasts")
## attr(,"contrasts")$grp
## [1] "contr.treatment"
## 
## attr(,"contrasts")$ind.n
## [1] "contr.treatment"
## 
## attr(,"contrasts")$cnd
## [1] "contr.treatment"

Note that, if you have unbalanced numbers of individuals in the two groups, you will have zeros for some of the interactions between grp and ind.n. You can remove these columns manually from the model matrix and pass the corrected model matrix to the full argument of the DESeq function. See example code in the next section.

Above, the terms grpX.cndB and grpY.cndB give the group-specific condition effects, in other words, the condition B vs A effect for group X samples, and likewise for group Y samples. These terms control for all of the six individual effects. These group-specific condition effects can be extracted using results with the name argument.

Furthermore, grpX.cndB and grpY.cndB can be contrasted using the contrast argument, in order to test if the condition effect is different across group:

results(dds, contrast=list("grpY.cndB","grpX.cndB"))

Levels without samples

The base R function for creating model matrices will produce a column of zeros if a level is missing from a factor or a combination of levels is missing from an interaction of factors. The solution to the first case is to call droplevels on the column, which will remove levels without samples. This was shown in the beginning of this vignette.
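
For the first case, the call is simply, e.g. for a condition column:

dds$condition <- droplevels(dds$condition)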

The second case is also solvable, by manually editing the model matrix, and then providing this to DESeq. Here we construct an example dataset to illustrate:

group <- factor(rep(1:3,each=6))
condition <- factor(rep(rep(c("A","B","C"),each=2),3))
d <- DataFrame(group, condition)[-c(17,18),]
as.data.frame(d)
##    group condition
## 1      1         A
## 2      1         A
## 3      1         B
## 4      1         B
## 5      1         C
## 6      1         C
## 7      2         A
## 8      2         A
## 9      2         B
## 10     2         B
## 11     2         C
## 12     2         C
## 13     3         A
## 14     3         A
## 15     3         B
## 16     3         B

Note that if we try to estimate all interaction terms, we introduce a column with all zeros, as there are no condition C samples for group 3. (Here, unname is used to display the matrix concisely.)

m1 <- model.matrix(~ condition*group, d)
colnames(m1)
## [1] "(Intercept)"       "conditionB"        "conditionC"       
## [4] "group2"            "group3"            "conditionB:group2"
## [7] "conditionC:group2" "conditionB:group3" "conditionC:group3"
unname(m1)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
##  [1,]    1    0    0    0    0    0    0    0    0
##  [2,]    1    0    0    0    0    0    0    0    0
##  [3,]    1    1    0    0    0    0    0    0    0
##  [4,]    1    1    0    0    0    0    0    0    0
##  [5,]    1    0    1    0    0    0    0    0    0
##  [6,]    1    0    1    0    0    0    0    0    0
##  [7,]    1    0    0    1    0    0    0    0    0
##  [8,]    1    0    0    1    0    0    0    0    0
##  [9,]    1    1    0    1    0    1    0    0    0
## [10,]    1    1    0    1    0    1    0    0    0
## [11,]    1    0    1    1    0    0    1    0    0
## [12,]    1    0    1    1    0    0    1    0    0
## [13,]    1    0    0    0    1    0    0    0    0
## [14,]    1    0    0    0    1    0    0    0    0
## [15,]    1    1    0    0    1    0    0    1    0
## [16,]    1    1    0    0    1    0    0    1    0
## attr(,"assign")
## [1] 0 1 1 2 2 3 3 3 3
## attr(,"contrasts")
## attr(,"contrasts")$condition
## [1] "contr.treatment"
## 
## attr(,"contrasts")$group
## [1] "contr.treatment"
all.zero <- apply(m1, 2, function(x) all(x==0))
all.zero
##       (Intercept)        conditionB        conditionC            group2 
##             FALSE             FALSE             FALSE             FALSE 
##            group3 conditionB:group2 conditionC:group2 conditionB:group3 
##             FALSE             FALSE             FALSE             FALSE 
## conditionC:group3 
##              TRUE

We can remove this column like so:

idx <- which(all.zero)
m1 <- m1[,-idx]
unname(m1)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
##  [1,]    1    0    0    0    0    0    0    0
##  [2,]    1    0    0    0    0    0    0    0
##  [3,]    1    1    0    0    0    0    0    0
##  [4,]    1    1    0    0    0    0    0    0
##  [5,]    1    0    1    0    0    0    0    0
##  [6,]    1    0    1    0    0    0    0    0
##  [7,]    1    0    0    1    0    0    0    0
##  [8,]    1    0    0    1    0    0    0    0
##  [9,]    1    1    0    1    0    1    0    0
## [10,]    1    1    0    1    0    1    0    0
## [11,]    1    0    1    1    0    0    1    0
## [12,]    1    0    1    1    0    0    1    0
## [13,]    1    0    0    0    1    0    0    0
## [14,]    1    0    0    0    1    0    0    0
## [15,]    1    1    0    0    1    0    0    1
## [16,]    1    1    0    0    1    0    0    1

Now this matrix m1 can be provided to the full argument of DESeq. For a likelihood ratio test of interactions, a model matrix using a reduced design such as ~ condition + group can be given to the reduced argument. Wald tests can also be generated instead of the likelihood ratio test, but for user-supplied model matrices, the argument betaPrior must be set to FALSE.
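
A hedged sketch of these calls, reusing the matrix m1 from above and assuming dds was constructed from the same 16 samples as d:

# Wald tests with the user-supplied model matrix
dds <- DESeq(dds, full=m1, betaPrior=FALSE)
# likelihood ratio test of the remaining interaction terms
m2 <- model.matrix(~ condition + group, d)
dds <- DESeq(dds, full=m1, reduced=m2, test="LRT", betaPrior=FALSE)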

Theory behind DESeq2

The DESeq2 model

The DESeq2 model and all the steps taken in the software are described in detail in our publication (Love, Huber, and Anders 2014), and we include the formula and descriptions in this section as well. The differential expression analysis in DESeq2 uses a generalized linear model of the form:

\[ K_{ij} \sim \textrm{NB}(\mu_{ij}, \alpha_i) \]

\[ \mu_{ij} = s_j q_{ij} \]

\[ \log_2(q_{ij}) = x_{j.} \beta_i \]

where counts \(K_{ij}\) for gene i, sample j are modeled using a negative binomial distribution with fitted mean \(\mu_{ij}\) and a gene-specific dispersion parameter \(\alpha_i\). The fitted mean is composed of a sample-specific size factor \(s_j\) and a parameter \(q_{ij}\) proportional to the expected true concentration of fragments for sample j. The coefficients \(\beta_i\) give the log2 fold changes for gene i for each column of the model matrix \(X\). Note that the model can be generalized to use sample- and gene-dependent normalization factors \(s_{ij}\).

The dispersion parameter \(\alpha_i\) defines the relationship between the variance of the observed count and its mean value. In other words, it describes how far we expect the observed count to be from the mean value, which depends both on the size factor \(s_j\) and the covariate-dependent part \(q_{ij}\) as defined above.

\[ \textrm{Var}(K_{ij}) = E[ (K_{ij} - \mu_{ij})^2 ] = \mu_{ij} + \alpha_i \mu_{ij}^2 \]

An option in DESeq2 is to provide maximum a posteriori estimates of the log2 fold changes in \(\beta_i\) after incorporating a zero-centered Normal prior (betaPrior). While previously these moderated, or shrunken, estimates were generated by the DESeq or nbinomWaldTest functions, they are now produced by the lfcShrink function. Dispersions are estimated using expected mean values from the maximum likelihood estimate of log2 fold changes, and optimizing the Cox-Reid adjusted profile likelihood, as first implemented for RNA-seq data in edgeR (Cox and Reid 1987; McCarthy, Chen, and Smyth 2012). The steps performed by the DESeq function are documented in its manual page ?DESeq; briefly, they are:

  1. estimation of size factors \(s_j\) by estimateSizeFactors
  2. estimation of dispersion \(\alpha_i\) by estimateDispersions
  3. negative binomial GLM fitting for \(\beta_i\) and Wald statistics by nbinomWaldTest

For access to all the values calculated during these steps, see the section above.

Changes compared to DESeq

The main changes in the package DESeq2, compared to the (older) version DESeq, are as follows:

  • RangedSummarizedExperiment is used as the superclass for storage of input data, intermediate calculations and results.
  • Optional, maximum a posteriori estimation of GLM coefficients incorporating a zero-centered Normal prior with variance estimated from data (equivalent to Tikhonov/ridge regularization). This adjustment has little effect on genes with high counts, yet it helps to moderate the otherwise large variance in log2 fold change estimates for genes with low counts or highly variable counts. These estimates are now provided by the lfcShrink function.
  • Maximum a posteriori estimation of dispersion replaces the sharingMode options fit-only or maximum of the previous version of the package. This is similar to the dispersion estimation methods of DSS (H. Wu, Wang, and Wu 2012).
  • All estimation and inference is based on the generalized linear model, which includes the two condition case (previously the exact test was used).
  • The Wald test for significance of GLM coefficients is provided as the default inference method, with the likelihood ratio test of the previous version still available.
  • It is possible to provide a matrix of sample-/gene-dependent normalization factors.
  • Automatic independent filtering on the mean of normalized counts.
  • Automatic outlier detection and handling.

Methods changes since the 2014 DESeq2 paper

  • In version 1.18 (November 2017), we add two alternative shrinkage estimators, which can be used via lfcShrink: an estimator using a t prior from the apeglm package, and an estimator with a fitted mixture of normals prior from the ashr package.
  • In version 1.16 (November 2016), the log2 fold change shrinkage is no longer default for the DESeq and nbinomWaldTest functions, by setting the defaults of these to betaPrior=FALSE, and by introducing a separate function lfcShrink, which performs log2 fold change shrinkage for visualization and ranking of genes. While for the majority of bulk RNA-seq experiments, the LFC shrinkage did not affect statistical testing, DESeq2 has become used as an inference engine by a wider community, and certain sequencing datasets show better performance with the testing separated from the use of the LFC prior. Also, the separation of LFC shrinkage to a separate function lfcShrink allows for easier methods development of alternative effect size estimators.
  • A small change to the independent filtering routine: instead of taking the quantile of the filter (the mean of normalized counts) which directly maximizes the number of rejections, the threshold chosen is the lowest quantile of the filter for which the number of rejections is close to the peak of a curve fit to the number of rejections over the filter quantiles. “Close to” is defined as within 1 residual standard deviation. This change was introduced in version 1.10 (October 2015).
  • For the calculation of the beta prior variance, instead of matching the empirical quantile to the quantile of a Normal distribution, DESeq2 now uses the weighted quantile function of the Hmisc package. The weighting is described in the manual page for nbinomWaldTest. The weights are the inverse of the expected variance of log counts (as used in the diagonals of the matrix \(W\) in the GLM). The effect of the change is that the estimated prior variance is robust against noisy estimates of log fold change from genes with very small counts. This change was introduced in version 1.6 (October 2014).

For a list of all changes since version 1.0.0, see the NEWS file included in the package.

Count outlier detection

DESeq2 relies on the negative binomial distribution to make estimates and perform statistical inference on differences. While the negative binomial is versatile in having a mean and dispersion parameter, extreme counts in individual samples might not fit well to the negative binomial. For this reason, we perform automatic detection of count outliers. We use Cook’s distance, which is a measure of how much the fitted coefficients would change if an individual sample were removed (Cook 1977). For more on the implementation of Cook’s distance see the manual page for the results function. Below we plot the maximum value of Cook’s distance for each row over the rank of the test statistic to justify its use as a filtering criterion.

W <- res$stat
maxCooks <- apply(assays(dds)[["cooks"]],1,max)
idx <- !is.na(W)
plot(rank(W[idx]), maxCooks[idx], xlab="rank of Wald statistic", 
     ylab="maximum Cook's distance per gene",
     ylim=c(0,5), cex=.4, col=rgb(0,0,0,.3))
m <- ncol(dds)
p <- 3
abline(h=qf(.99, p, m - p))

Contrasts

Contrasts can be calculated for a DESeqDataSet object for which the GLM coefficients have already been fit using the Wald test steps (DESeq with test="Wald" or using nbinomWaldTest). The vector of coefficients \(\beta\) is left multiplied by the contrast vector \(c\) to form the numerator of the test statistic. The denominator is formed by multiplying the covariance matrix \(\Sigma\) for the coefficients on either side by the contrast vector \(c\). The square root of this product is an estimate of the standard error for the contrast. The contrast statistic is then compared to a normal distribution as are the Wald statistics for the DESeq2 package.

\[ W = \frac{c^t \beta}{\sqrt{c^t \Sigma c}} \]
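
For a contrast vector that selects a single coefficient, this statistic reduces to the estimate divided by its standard error; a sketch using the coef convenience function described above:

beta <- coef(dds)          # matrix of coefficients
se <- coef(dds, SE=TRUE)   # matrix of standard errors
W <- beta[,2] / se[,2]     # Wald statistic for the second coefficient
head(W)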

Expanded model matrices

For the specific combination of lfcShrink with the type normal and using contrast, DESeq2 uses expanded model matrices to produce shrunken log2 fold change estimates where the shrinkage is independent of the choice of reference level. In all other cases, DESeq2 uses standard model matrices, as produced by model.matrix. The expanded model matrices differ from the standard model matrices, in that they have an indicator column (and therefore a coefficient) for each level of factors in the design formula in addition to an intercept. This is described in the DESeq2 paper, but the DESeq2 software package has moved away from this approach, with more support for shrinkage of individual coefficients (although the expanded model matrix approach is still supported using the above combination of functions and arguments).

Independent filtering and multiple testing

Filtering criteria

The goal of independent filtering is to filter out those tests from the procedure that have little or no chance of showing significant evidence, without even looking at their test statistic. Typically, this results in increased detection power at the same experiment-wide type I error. Here, we measure experiment-wide type I error in terms of the false discovery rate.

A good choice for a filtering criterion is one that

  1. is statistically independent from the test statistic under the null hypothesis,
  2. is correlated with the test statistic under the alternative, and
  3. does not notably change the dependence structure – if there is any – between the tests that pass the filter, compared to the dependence structure between the tests before filtering.

The benefit from filtering relies on property (2), and we will explore it further below. Its statistical validity relies on property (1) – which is simple to formally prove for many combinations of filter criteria with test statistics – and (3), which is harder to establish theoretically from first principles, but rarely a problem in practice. We refer to (Bourgon, Gentleman, and Huber 2010) for further discussion of this topic.

A simple filtering criterion readily available in the results object is the mean of normalized counts irrespective of biological condition, and so this is the criterion which is used automatically by the results function to perform independent filtering. Genes with very low counts are not likely to show significant differences, typically due to high dispersion. For example, we can plot the \(-\log_{10}\) p values from all genes over the normalized mean counts:

plot(res$baseMean+1, -log10(res$pvalue),
     log="x", xlab="mean of normalized counts",
     ylab=expression(-log[10](pvalue)),
     ylim=c(0,30),
     cex=.4, col=rgb(0,0,0,.3))

Why does it work?

Consider the p value histogram below. It shows how the filtering ameliorates the multiple testing problem – and thus the severity of a multiple testing adjustment – by removing a background set of hypotheses whose p values are distributed more or less uniformly in [0,1].

use <- res$baseMean > metadata(res)$filterThreshold
h1 <- hist(res$pvalue[!use], breaks=0:50/50, plot=FALSE)
h2 <- hist(res$pvalue[use], breaks=0:50/50, plot=FALSE)
colori <- c(`do not pass`="khaki", `pass`="powderblue")

Histogram of p values for all tests. The area shaded in blue indicates the subset of those that pass the filtering, the area in khaki those that do not pass:

barplot(height = rbind(h1$counts, h2$counts), beside = FALSE,
        col = colori, space = 0, main = "", ylab="frequency")
text(x = c(0, length(h1$counts)), y = 0, label = paste(c(0,1)),
     adj = c(0.5,1.7), xpd=NA)
legend("topright", fill=rev(colori), legend=rev(names(colori)))

Frequently asked questions

How can I get support for DESeq2?

We welcome questions about our software, and want to ensure that we eliminate issues if and when they appear. We have a few requests to optimize the process:

  • all questions should take place on the Bioconductor support site: https://support.bioconductor.org, which serves as a repository of questions and answers. This helps to save the developers’ time in responding to similar questions. Make sure to tag your post with deseq2. It is often very helpful in addition to describe the aim of your experiment.
  • before posting, first search the Bioconductor support site mentioned above for past threads which might have answered your question.
  • if you have a question about the behavior of a function, read the sections of the manual page for this function by typing a question mark and the function name, e.g. ?results. We spend a lot of time documenting individual functions and the exact steps that the software is performing.
  • include all of your R code, especially the creation of the DESeqDataSet and the design formula. Include complete warning or error messages, and conclude your message with the full output of sessionInfo().
  • if possible, include the output of as.data.frame(colData(dds)), so that we can have a sense of the experimental setup. If this contains confidential information, you can replace the levels of those factors using levels().

Why are some p values set to NA?

See the details above.

How can I get unfiltered DESeq2 results?

Users can obtain unfiltered GLM results, i.e. without outlier removal or independent filtering, with the following call:

dds <- DESeq(dds, minReplicatesForReplace=Inf)
res <- results(dds, cooksCutoff=FALSE, independentFiltering=FALSE)

In this case, the only p values set to NA are those from genes with all counts equal to zero.

How do I use VST or rlog data for differential testing?

The variance stabilizing and rlog transformations are provided for applications other than differential testing, for example clustering of samples or other machine learning applications. For differential testing we recommend the DESeq function applied to raw counts as outlined above.

Can I use DESeq2 to analyze paired samples?

Yes, you should use a multi-factor design which includes the sample information as a term in the design formula. This will account for differences between the samples while estimating the effect due to the condition. The condition of interest should go at the end of the design formula, e.g. ~ subject + condition.
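
A minimal sketch, assuming a hypothetical subject factor in colData(dds) identifying the pairs:

design(dds) <- ~ subject + condition
dds <- DESeq(dds)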

If I have multiple groups, should I run all together or split into pairs of groups?

Typically, we recommend that users run samples from all groups together, and then use the contrast argument of the results function to extract comparisons of interest after fitting the model using DESeq.
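For example, assuming a factor condition with levels A, B and C, a single fit supports any pairwise comparison (a sketch):

dds <- DESeq(dds)
res.BvsA <- results(dds, contrast=c("condition","B","A"))
res.CvsA <- results(dds, contrast=c("condition","C","A"))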

The model fit by DESeq estimates a single dispersion parameter for each gene, which defines how far we expect the observed count for a sample to be from the mean value predicted by the model, given its size factor and its condition group. See the section above and the DESeq2 paper for full details. Having a single dispersion parameter for each gene is usually sufficient for analyzing multi-group data, as the final dispersion value will incorporate the within-group variability across all groups.

However, for some datasets, exploratory data analysis (EDA) plots could reveal that one or more groups has much higher within-group variability than the others. A simulated example of such a set of samples is shown below. This is a case where comparing groups A and B separately – subsetting a DESeqDataSet to only samples from those two groups and then running DESeq on this subset – will be more sensitive than a model including all samples together. It should be noted that such an extreme range of within-group variability is not common, although it could arise if certain treatments produce an extreme reaction (e.g. cell death). Again, this can easily be detected from EDA plots such as the PCA plot described in this vignette.

Here we diagram an extreme range of within-group variability with a simulated dataset. Typically, it is recommended to run DESeq across samples from all groups, for datasets with multiple groups. However, this simulated dataset shows a case where it would be preferable to compare groups A and B by creating a smaller dataset without the C samples. Group C has much higher within-group variability, which would inflate the per-gene dispersion estimate for groups A and B as well.
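A minimal sketch of the subsetting approach described above, assuming a factor dds$condition with levels A, B and C:

ddsAB <- dds[ , dds$condition %in% c("A","B") ]
ddsAB$condition <- droplevels(ddsAB$condition)  # drop the unused level C
ddsAB <- DESeq(ddsAB)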

Can I run DESeq2 to contrast the levels of many groups?

DESeq2 will work with any kind of design specified using the R formula. We encourage users to consider exploratory data analysis such as principal components analysis rather than performing statistical testing of all pairs of many groups of samples. Statistical testing is one of many ways of describing differences between samples.

Regarding multiple test correction, if a user is planning to contrast all pairs of many levels, and then selectively reporting the results of only a subset of those pairs, one needs to perform multiple testing across contrasts as well as genes to control for this additional form of multiple testing. This can be done by using the p.adjust function across a long vector of p values from all pairs of contrasts, then re-assigning these adjusted p values to the appropriate results table.
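A sketch of this across-contrast correction, assuming a fitted dds with a hypothetical factor condition:

lvls <- levels(dds$condition)
pairs <- combn(lvls, 2, simplify=FALSE)
resList <- lapply(pairs, function(p) results(dds, contrast=c("condition", p[2], p[1])))

# adjust one long vector of p values across all pairs and all genes
allPadj <- p.adjust(unlist(lapply(resList, function(r) r$pvalue)), method="BH")

# re-assign, e.g. to the first results table
resList[[1]]$padj <- allPadj[seq_len(nrow(resList[[1]]))]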

As a speed concern with fitting very large models, note that each additional level of a factor in the design formula adds another parameter to the GLM which is fit by DESeq2. Users might consider first removing genes with very few reads, e.g. genes with row sum of 1, as this will speed up the fitting procedure.
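A one-line sketch of such pre-filtering:

dds <- dds[ rowSums(counts(dds)) > 1, ]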

Can I use DESeq2 to analyze a dataset without replicates?

If a DESeqDataSet is provided with an experimental design without replicates, a warning is printed that the samples will be treated as replicates for estimation of dispersion. This kind of analysis is useful only for exploring the data; it will not provide proper statistical inference on differences between groups. Without biological replicates, it is not possible to estimate the biological variability of each gene. More details can be found in the manual page for ?DESeq.

How can I include a continuous covariate in the design formula?

Continuous covariates can be included in the design formula in exactly the same manner as factorial covariates, and then results for the continuous covariate can be extracted by specifying name. Continuous covariates might make sense in certain experiments, where a constant fold change might be expected for each unit of the covariate. However, in many cases, more meaningful results can be obtained by cutting continuous covariates into a factor defined over a small number of bins (e.g. 3-5). In this way, the average effect of each group is controlled for, regardless of the trend over the continuous covariates. In R, numeric vectors can be converted into factors using the function cut.
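A minimal sketch, assuming a hypothetical numeric column age in the column data:

dds$ageBin <- cut(dds$age, breaks=3)  # cut into 3 bins spanning the range of age
design(dds) <- ~ ageBin + condition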

I ran a likelihood ratio test, but results() only gives me one comparison.

“… How do I get the p values for all of the variables/levels that were removed in the reduced design?”

This is explained in the help page for ?results in the section about likelihood ratio test p-values, but we will restate the answer here. When one performs a likelihood ratio test, the p values and the test statistic (the stat column) are values for the test that removes all of the variables which are present in the full design and not in the reduced design. This tests the null hypothesis that all the coefficients from these variables and levels of these factors are equal to zero.

The likelihood ratio test p values therefore represent a test of all the variables and all the levels of factors which are among these variables. However, the results table only has space for one column of log fold change, so a single variable and a single comparison is shown (among the potentially multiple log fold changes which were tested in the likelihood ratio test). This is indicated at the top of the results table with the text, e.g., log2 fold change (MLE): condition C vs A, followed by, LRT p-value: ‘~ batch + condition’ vs ‘~ batch’. This indicates that the p value is for the likelihood ratio test of all the variables and all the levels, while the log fold change is a single comparison from among those variables and levels. See the help page for results for more details.
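As a sketch of such a test, assuming the full design is ~ batch + condition:

dds <- DESeq(dds, test="LRT", reduced = ~ batch)
res <- results(dds)
# res$pvalue tests '~ batch + condition' vs '~ batch';
# res$log2FoldChange shows a single comparison among the tested levels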

What are the exact steps performed by DESeq()?

See the manual page for DESeq, which links to the subfunctions which are called in order, where complete details are listed. Also you can read the three steps listed in the DESeq2 model in this document.
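In outline, a default DESeq() call runs these three steps, which can also be called individually:

dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)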

Is there an official Galaxy tool for DESeq2?

Yes. The repository for the DESeq2 tool is

https://github.com/galaxyproject/tools-iuc/tree/master/tools/deseq2

and a link to its location in the Tool Shed is

https://toolshed.g2.bx.psu.edu/view/iuc/deseq2/d983d19fbbab.

I want to benchmark DESeq2 comparing to other DE tools.

One aspect which can cause problems for comparison is that, by default, DESeq2 outputs NA values for adjusted p values based on independent filtering of genes which have low counts. This is a way for DESeq2 to give extra information on why the adjusted p value for this gene is not small. Additionally, p values can be set to NA based on extreme count outlier detection. These NA values should be considered negatives for purposes of estimating sensitivity and specificity. The easiest way to work with the adjusted p values in a benchmarking context is probably to convert these NA values to 1:

res$padj <- ifelse(is.na(res$padj), 1, res$padj)

I have trouble installing DESeq2 on Ubuntu/Linux…

I try to install DESeq2 using biocLite(), but I get an error trying to install the R packages XML and/or RCurl:

ERROR: configuration failed for package XML

ERROR: configuration failed for package RCurl

You need to install the following devel versions of packages using your standard package manager, e.g. sudo apt-get install or sudo apt install:

  • libxml2-dev
  • libcurl4-openssl-dev

Acknowledgments

We have benefited in the development of DESeq2 from the help and feedback of many individuals, including but not limited to:

The Bioconductor Core Team, Alejandro Reyes, Andrzej Oles, Aleksandra Pekowska, Felix Klein, Nikolaos Ignatiadis (IHW), Anqi Zhu (apeglm), Joseph Ibrahim (apeglm), Vince Carey, Owen Solberg, Ruping Sun, Devon Ryan, Steve Lianoglou, Jessica Larson, Christina Chaivorapol, Pan Du, Richard Bourgon, Willem Talloen, Elin Videvall, Hanneke van Deutekom, Todd Burwell, Jesse Rowley, Igor Dolgalev, Stephen Turner, Ryan C Thompson, Tyr Wiesner-Hanks, Konrad Rudolph, David Robinson, Mingxiang Teng, Mathias Lesche, Sonali Arora, Jordan Ramilowski, Ian Dworkin, Bjorn Gruning, Ryan McMinds, Paul Gordon, Leonardo Collado Torres, Enrico Ferrero, Peter Langfelder.

Session info

sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.6-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.6-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] RColorBrewer_1.1-2         pheatmap_1.0.8            
##  [3] hexbin_1.27.1              vsn_3.46.0                
##  [5] ggplot2_2.2.1              IHW_1.6.0                 
##  [7] airway_0.112.0             pasilla_1.6.0             
##  [9] DESeq2_1.18.1              SummarizedExperiment_1.8.0
## [11] DelayedArray_0.4.1         matrixStats_0.52.2        
## [13] Biobase_2.38.0             GenomicRanges_1.30.0      
## [15] GenomeInfoDb_1.14.0        IRanges_2.12.0            
## [17] S4Vectors_0.16.0           BiocGenerics_0.24.0       
## [19] tximportData_1.6.0         readr_1.1.1               
## [21] tximport_1.6.0            
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-6            bit64_0.9-7            
##  [3] doParallel_1.0.11       rprojroot_1.2          
##  [5] numDeriv_2016.8-1       tools_3.4.2            
##  [7] backports_1.1.1         affyio_1.48.0          
##  [9] R6_2.2.2                rpart_4.1-11           
## [11] Hmisc_4.0-3             DBI_0.7                
## [13] lazyeval_0.2.1          colorspace_1.3-2       
## [15] nnet_7.3-12             apeglm_1.0.0           
## [17] gridExtra_2.3           preprocessCore_1.40.0  
## [19] bit_1.1-12              compiler_3.4.2         
## [21] fdrtool_1.2.15          htmlTable_1.9          
## [23] labeling_0.3            slam_0.1-40            
## [25] scales_0.5.0            checkmate_1.8.5        
## [27] SQUAREM_2017.10-1       affy_1.56.0            
## [29] genefilter_1.60.0       stringr_1.2.0          
## [31] digest_0.6.12           foreign_0.8-69         
## [33] rmarkdown_1.7           XVector_0.18.0         
## [35] pscl_1.5.2              base64enc_0.1-3        
## [37] htmltools_0.3.6         lpsymphony_1.6.0       
## [39] limma_3.34.1            bbmle_1.0.20           
## [41] htmlwidgets_0.9         rlang_0.1.4            
## [43] RSQLite_2.0             BiocInstaller_1.28.0   
## [45] BiocParallel_1.12.0     acepack_1.4.1          
## [47] RCurl_1.95-4.8          magrittr_1.5           
## [49] GenomeInfoDbData_0.99.1 Formula_1.2-2          
## [51] Matrix_1.2-11           Rcpp_0.12.13           
## [53] munsell_0.4.3           stringi_1.1.5          
## [55] yaml_2.1.14             MASS_7.3-47            
## [57] zlibbioc_1.24.0         plyr_1.8.4             
## [59] grid_3.4.2              blob_1.1.0             
## [61] lattice_0.20-35         splines_3.4.2          
## [63] annotate_1.56.0         hms_0.3                
## [65] locfit_1.5-9.1          knitr_1.17             
## [67] rjson_0.2.15            geneplotter_1.56.0     
## [69] codetools_0.2-15        XML_3.98-1.9           
## [71] evaluate_0.10.1         latticeExtra_0.6-28    
## [73] data.table_1.10.4-3     foreach_1.4.3          
## [75] gtable_0.2.0            assertthat_0.2.0       
## [77] ashr_2.0.5              emdbook_1.3.9          
## [79] xtable_1.8-2            coda_0.19-1            
## [81] survival_2.41-3         truncnorm_1.0-7        
## [83] tibble_1.3.4            iterators_1.0.8        
## [85] AnnotationDbi_1.40.0    memoise_1.1.0          
## [87] cluster_2.0.6           BiocStyle_2.6.0

References

Anders, Simon, and Wolfgang Huber. 2010. “Differential Expression Analysis for Sequence Count Data.” Genome Biology 11: R106. http://genomebiology.com/2010/11/10/R106.

Anders, Simon, Paul Theodor Pyl, and Wolfgang Huber. 2014. “HTSeq – A Python framework to work with high-throughput sequencing data.” Bioinformatics. http://dx.doi.org/10.1093/bioinformatics/btu638.

Bourgon, Richard, Robert Gentleman, and Wolfgang Huber. 2010. “Independent Filtering Increases Detection Power for High-Throughput Experiments.” PNAS 107 (21): 9546–51. http://www.pnas.org/content/107/21/9546.long.

Bray, Nicolas, Harold Pimentel, Pall Melsted, and Lior Pachter. 2016. “Near-Optimal Probabilistic RNA-Seq Quantification.” Nature Biotechnology 34: 525–27. http://dx.doi.org/10.1038/nbt.3519.

Brooks, A. N., L. Yang, M. O. Duff, K. D. Hansen, J. W. Park, S. Dudoit, S. E. Brenner, and B. R. Graveley. 2011. “Conservation of an RNA regulatory map between Drosophila and mammals.” Genome Research, 193–202. doi:10.1101/gr.108662.110.

Cook, R. Dennis. 1977. “Detection of Influential Observation in Linear Regression.” Technometrics, February.

Cox, D. R., and N. Reid. 1987. “Parameter orthogonality and approximate conditional inference.” Journal of the Royal Statistical Society, Series B 49 (1): 1–39. http://www.jstor.org/stable/2345476.

Gerard, David, and Matthew Stephens. 2017. “Empirical Bayes Shrinkage and False Discovery Rate Estimation, Allowing For Unwanted Variation.” ArXiv. https://arxiv.org/abs/1709.10066.

Huber, Wolfgang, Anja von Heydebreck, Holger Sültmann, Annemarie Poustka, and Martin Vingron. 2003. “Parameter Estimation for the Calibration and Variance Stabilization of Microarray Data.” Statistical Applications in Genetics and Molecular Biology 2 (1): Article 3.

Ignatiadis, Nikolaos, Bernd Klaus, Judith Zaugg, and Wolfgang Huber. 2016. “Data-Driven Hypothesis Weighting Increases Detection Power in Genome-Scale Multiple Testing.” Nature Methods. http://dx.doi.org/10.1038/nmeth.3885.

Leek, Jeffrey T. 2014. “svaseq: removing batch effects and other unwanted noise from sequencing data.” Nucleic Acids Research 42 (21). http://dx.doi.org/10.1093/nar/gku864.

Li, Bo, and Colin N. Dewey. 2011. “RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.” BMC Bioinformatics 12: 323+. doi:10.1186/1471-2105-12-323.

Liao, Y., G. K. Smyth, and W. Shi. 2013. “featureCounts: an efficient general purpose program for assigning sequence reads to genomic features.” Bioinformatics, November.

Love, Michael I., John B. Hogenesch, and Rafael A. Irizarry. 2016. “Modeling of RNA-Seq Fragment Sequence Bias Reduces Systematic Errors in Transcript Abundance Estimation.” Nature Biotechnology 34 (12): 1287–91. http://dx.doi.org/10.1038/nbt.3682.

Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome Biology 15 (12): 550. http://dx.doi.org/10.1186/s13059-014-0550-8.

Patro, Rob, Geet Duggal, Michael I. Love, Rafael A. Irizarry, and Carl Kingsford. 2017. “Salmon Provides Fast and Bias-Aware Quantification of Transcript Expression.” Nature Methods. http://dx.doi.org/10.1038/nmeth.4197.

Patro, Rob, Stephen M. Mount, and Carl Kingsford. 2014. “Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms.” Nature Biotechnology 32: 462–64. http://dx.doi.org/10.1038/nbt.2862.

Risso, Davide, John Ngai, Terence P Speed, and Sandrine Dudoit. 2014. “Normalization of RNA-seq data using factor analysis of control genes or samples.” Nature Biotechnology 32 (9). http://dx.doi.org/10.1038/nbt.2931.

Robert, Christelle, and Mick Watson. 2015. “Errors in RNA-Seq quantification affect genes of relevance to human disease.” Genome Biology. doi:10.1186/s13059-015-0734-x.

Soneson, Charlotte, Michael I. Love, and Mark Robinson. 2015. “Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences.” F1000Research 4 (1521). http://dx.doi.org/10.12688/f1000research.7563.1.

Stephens, Matthew. 2016. “False Discovery Rates: A New Deal.” Biostatistics 18 (2). https://doi.org/10.1093/biostatistics/kxw041.

Tibshirani, Robert. 1988. “Estimating Transformations for Regression via Additivity and Variance Stabilization.” Journal of the American Statistical Association 83: 394–405.

Trapnell, Cole, David G Hendrickson, Martin Sauvageau, Loyal Goff, John L Rinn, and Lior Pachter. 2013. “Differential analysis of gene regulation at transcript resolution with RNA-seq.” Nature Biotechnology. doi:10.1038/nbt.2450.

Wu, Hao, Chi Wang, and Zhijin Wu. 2012. “A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data.” Biostatistics, September. Oxford University Press. doi:10.1093/biostatistics/kxs033.

DESeq2/inst/script/makeSim.R

makeSim <- function(n, m, x, beta, meanDispPairs, sf=rep(1,m)) {
  idx <- sample(nrow(meanDispPairs), n, replace=TRUE)
  mu0 <- meanDispPairs[idx,1]
  disp <- meanDispPairs[idx,2]
  betafull <- cbind(log2(mu0), beta)
  mu <- 2^(betafull %*% t(x))
  muMat <- matrix(rep(mu, times=m) * rep(sf, each=n), ncol=m)
  list(mat = matrix(rnbinom(n*m, mu=muMat, size=1/disp), ncol=m),
       disp = disp,
       mu0 = mu0)
}

DESeq2/inst/script/runScripts.R

runDESeq2 <- function(e, retDDS=FALSE) {
  counts <- exprs(e)
  mode(counts) <- "integer"
  dds <- DESeqDataSetFromMatrix(counts, DataFrame(pData(e)), ~ condition)
  dds <- DESeq(dds,quiet=TRUE)
  res <- results(dds)
  beta <- res$log2FoldChange
  pvals <- res$pvalue
  padj <- res$padj
  pvals[is.na(pvals)] <- 1
  pvals[rowSums(exprs(e)) == 0] <- NA
  padj[is.na(padj)] <- 1
  return(list(pvals=pvals, padj=padj, beta=beta))
}

runDESeq2LRT <- function(e, retDDS=FALSE) {
  counts <- exprs(e)
  mode(counts) <- "integer"
  dds <- DESeqDataSetFromMatrix(counts, DataFrame(pData(e)), ~ condition)
  dds <- DESeq(dds,test="LRT",reduced=~1,quiet=TRUE)
  res <- results(dds)
  beta <- res$log2FoldChange
  pvals <- res$pvalue
  padj <- res$padj
  pvals[is.na(pvals)] <- 1
  pvals[rowSums(exprs(e)) == 0] <- NA
  padj[is.na(padj)] <- 1
  return(list(pvals=pvals, padj=padj, beta=beta))
}

runDESeq2NoIF <- function(e, retDDS=FALSE) {
  counts <- exprs(e)
  mode(counts) <- "integer"
  dds <- DESeqDataSetFromMatrix(counts, DataFrame(pData(e)), ~ condition)
  dds <- DESeq(dds,quiet=TRUE)
  res <- results(dds, independentFiltering=FALSE)
  beta <- res$log2FoldChange
  pvals <- res$pvalue
  padj <- res$padj
  pvals[is.na(pvals)] <- 1
  pvals[rowSums(exprs(e)) == 0] <- NA
  padj[is.na(padj)] <- 1
  return(list(pvals=pvals, padj=padj, beta=beta))
}

runDESeq2Outliers <- function(e, retDDS=FALSE) {
  counts <- exprs(e)
  mode(counts) <- "integer"
  dds <- DESeqDataSetFromMatrix(counts, DataFrame(pData(e)), ~ condition)
  ddsDefault <- DESeq(dds, quiet=TRUE)
  ddsNoRepl <- ddsDefault
  if (ncol(e) >= 14) {
    # insert original maximum Cook's distances
    # so the rows with replacement will be filtered
    # this avoids re-running with minReplicateForReplace=Inf
    mcols(ddsNoRepl)$maxCooks <- apply(assays(ddsNoRepl)[["cooks"]], 1, max)
  }
  resDefault <- results(ddsDefault)
  resNoFilt <- results(ddsDefault, cooksCutoff=FALSE)
  resNoRepl <- results(ddsNoRepl)
  resList <- list("DESeq2"=resDefault, "DESeq2-noFilt"=resNoFilt, "DESeq2-noRepl"=resNoRepl)
  resOut <- lapply(resList, function(res) {
    pvals <- res$pvalue
    padj <- res$padj
    pvals[is.na(pvals)] <- 1
    pvals[rowSums(exprs(e)) == 0] <- NA
    padj[is.na(padj)] <- 1
    list(pvals=pvals, padj=padj)
  })
  return(resOut)
}

runEdgeR <- function(e) {
  design <- model.matrix(~ pData(e)$condition)
  dgel <- DGEList(exprs(e))
  dgel <- calcNormFactors(dgel)
  ## dgel <- estimateGLMCommonDisp(dgel, design)
  ## dgel <- estimateGLMTrendedDisp(dgel, design)
  ## dgel <- estimateGLMTagwiseDisp(dgel, design)
  dgel <- estimateDisp(dgel, design)
  edger.fit <- glmFit(dgel, design)
  edger.lrt <- glmLRT(edger.fit)
  predbeta <- predFC(exprs(e), design, offset=getOffset(dgel),
                     dispersion=dgel$tagwise.dispersion)
  predbeta10 <- predFC(exprs(e), design, prior.count=10, offset=getOffset(dgel),
                       dispersion=dgel$tagwise.dispersion)
  pvals <- edger.lrt$table$PValue
  pvals[rowSums(exprs(e)) == 0] <- NA
  padj <- p.adjust(pvals,method="BH")
  padj[is.na(padj)] <- 1
  list(pvals=pvals, padj=padj,
       beta=log2(exp(1)) * edger.fit$coefficients[,"pData(e)$conditionB"],
       predbeta=predbeta[,"pData(e)$conditionB"],
       predbeta10=predbeta10[,"pData(e)$conditionB"])
}

runEdgeRRobust <- function(e) {
  design <- model.matrix(~ pData(e)$condition)
  dgel <- DGEList(exprs(e))
  dgel <- calcNormFactors(dgel)
  # settings for robust from robinson_lab/edgeR_robust/robust_simulation.R
  dgel <- estimateGLMRobustDisp(dgel, design, maxit=6)
  edger.fit <- glmFit(dgel, design)
  edger.lrt <- glmLRT(edger.fit)
  predbeta <- predFC(exprs(e), design, offset=getOffset(dgel),
                     dispersion=dgel$tagwise.dispersion)
  pvals <- edger.lrt$table$PValue
  pvals[rowSums(exprs(e)) == 0] <- NA
  padj <- p.adjust(pvals,method="BH")
  padj[is.na(padj)] <- 1
  list(pvals=pvals, padj=padj,
       beta=log2(exp(1)) * edger.fit$coefficients[,"pData(e)$conditionB"],
       predbeta=predbeta[,"pData(e)$conditionB"])
}

runDSS <- function(e) {
  X <- as.matrix(exprs(e))
  colnames(X) <- NULL
  designs <- as.character(pData(e)$condition)
  seqData <- newSeqCountSet(X, designs)
  seqData <- estNormFactors(seqData)
  seqData <- estDispersion(seqData)
  result <- waldTest(seqData, "B", "A")
  result <- result[match(rownames(seqData),rownames(result)),]
  pvals <- result$pval
  pvals[rowSums(exprs(e)) == 0] <- NA
  padj <- p.adjust(pvals,method="BH")
  padj[is.na(padj)] <- 1
  list(pvals=pvals, padj=padj, beta=( log2(exp(1)) * result$lfc ))
}

runDSSFDR <- function(e) {
  X <- as.matrix(exprs(e))
  colnames(X) <- NULL
  designs <- as.character(pData(e)$condition)
  seqData <- newSeqCountSet(X, designs)
  seqData <- estNormFactors(seqData)
  seqData <- estDispersion(seqData)
  result <- waldTest(seqData, "B", "A")
  result <- result[match(rownames(seqData),rownames(result)),]
  pvals <- result$pval
  pvals[rowSums(exprs(e)) == 0] <- NA
  padj <- result$fdr
  padj[is.na(padj)] <- 1
  list(pvals=pvals, padj=padj, beta=( log2(exp(1)) * result$lfc ))
}

runVoom <- function(e) {
  design <- model.matrix(~ condition, pData(e))
  dgel <- DGEList(exprs(e))
  dgel <- calcNormFactors(dgel)
  v <- voom(dgel,design,plot=FALSE)
  fit <- lmFit(v,design)
  fit <- eBayes(fit)
  tt <- topTable(fit,coef=ncol(design),n=nrow(dgel),sort.by="none")
  pvals <- tt$P.Value
  pvals[rowSums(exprs(e)) == 0] <- NA
  padj <- p.adjust(pvals,method="BH")
  padj[is.na(padj)] <- 1
  list(pvals=pvals, padj=padj, beta=tt$logFC)
}

runSAMseq <- function(e) {
  set.seed(1)
  x <- exprs(e)
  y <- pData(e)$condition
  capture.output({samfit <- SAMseq(x, y, resp.type = "Two class unpaired")})
  beta <- log2(samfit$samr.obj$foldchange)
  pvals <- samr.pvalues.from.perms(samfit$samr.obj$tt, samfit$samr.obj$ttstar)
  pvals[rowSums(exprs(e)) == 0] <- NA
  padj <- p.adjust(pvals,method="BH")
  padj[is.na(padj)] <- 1
  list(pvals=pvals,padj=padj,beta=beta)
}

runSAMseqFDR <- function(e) {
  set.seed(1)
  x <- exprs(e)
  y <- pData(e)$condition
  capture.output({samfit <- SAMseq(x, y, resp.type = "Two class unpaired",
                                   fdr.output=1)})
  padj <- rep(1,nrow(e))
  idx <- as.numeric(samfit$siggenes.table$genes.up[,"Gene Name"])
  padj[idx] <- 1/100 * as.numeric(samfit$siggenes.table$genes.up[,"q-value(%)"])
  idx <- as.numeric(samfit$siggenes.table$genes.lo[,"Gene Name"])
  padj[idx] <- 1/100 * as.numeric(samfit$siggenes.table$genes.lo[,"q-value(%)"])
  beta <- log2(samfit$samr.obj$foldchange)
  pvals <- rep(NA,nrow(e))
  list(pvals=pvals,padj=padj,beta=beta)
}

runEBSeq <- function(e) {
  sizes <- MedianNorm(exprs(e))
  out <- capture.output({
    suppressMessages({
      res <- EBTest(Data = exprs(e),
                    Conditions = pData(e)$condition,
                    sizeFactors = sizes, maxround = 5)
    })
  })
  padj <- rep(1, nrow(exprs(e)))
  # we use 1 - PPDE for the FDR cutoff as this is recommended in the EBSeq vignette
  padj[match(rownames(res$PPMat), rownames(e))] <- res$PPMat[,"PPEE"]
  beta <- rep(0, nrow(exprs(e)))
  pvals <- rep(NA,nrow(e))
  list(pvals=pvals, padj=padj, beta=beta)
}

DESeq2/inst/script/simulateCluster.R

source("makeSim.R")
load("meanDispPairs.RData")
library("DESeq2")
library("PoiClaClu")
library("mclust")
library("parallel")
options(mc.cores=20)
set.seed(1)
n <- 2000
# create 20 samples, then remove first group, leaving 16
# this way the groups are equidistant from each other
m <- 20
k <- 4
methods <- c("norm Eucl","log2 Eucl","rlog Eucl","VST Eucl","PoisDist")
condition0 <- factor(rep(c("null","A","B","C","D"), each = m/(k+1)))
x <- model.matrix(~ condition0)
rnormsds <- list(seq(from=0, to=.6, length=7),
                 seq(from=0, to=.8, length=7),
                 seq(from=0, to=1.2, length=7))
sfs <- list(equal=rep(1,m), unequal=rep(c(1,1,1/3,3), times=(k+1)))
dispScales <- c(.1, .25, 1)
nreps <- 20
res <- do.call(rbind, lapply(seq_along(dispScales), function(idx) {
  dispScale <- dispScales[idx]
  do.call(rbind, lapply(rnormsds[[idx]], function(rnormsd) {
    do.call(rbind, mclapply(seq_along(sfs), function(sf.idx) {
      sf <- sfs[[sf.idx]]
      do.call(rbind, lapply(seq_len(nreps), function(i) {
        beta <- replicate(k, c(rep(0,8/10 * n), rnorm(2/10 * n, 0, rnormsd)))
        mdp <- meanDispPairs
        mdp$disp <- mdp$disp * dispScale
        mat0 <- makeSim(n,m,x,beta,mdp,sf)$mat
        mat <- mat0[,5:20]
        mode(mat) <- "integer"
        condition <- droplevels(condition0[5:20])
        dds <- DESeqDataSetFromMatrix(mat, DataFrame(condition), ~ 1)
        dds <- estimateSizeFactors(dds)
        dds <- estimateDispersionsGeneEst(dds)
        # don't warn if local fit is used
        dds <- suppressWarnings({estimateDispersionsFit(dds)})
        norm <- t(counts(dds, normalized=TRUE))
        lognorm <- t(log2(counts(dds, normalized=TRUE) + 1))
        rld <- t(assay(rlog(dds, blind=FALSE)))
        vsd <- t(assay(varianceStabilizingTransformation(dds, blind=FALSE)))
        poiDist <- PoissonDistance(t(mat))$dd
        normARI <- adjustedRandIndex(condition, cutree(hclust(dist(norm)),k=k))
        lognormARI <- adjustedRandIndex(condition, cutree(hclust(dist(lognorm)),k=k))
        rlogARI <- adjustedRandIndex(condition, cutree(hclust(dist(rld)),k=k))
        vstARI <- adjustedRandIndex(condition, cutree(hclust(dist(vsd)),k=k))
        poiDistARI <- adjustedRandIndex(condition, cutree(hclust(poiDist),k=k))
        data.frame(ARI = c(normARI, lognormARI, rlogARI, vstARI, poiDistARI),
                   method = methods,
                   rnormsd = rep(rnormsd,length(methods)),
                   dispScale = rep(dispScale,length(methods)),
                   sizeFactor = rep(names(sfs)[sf.idx], length(methods)))
      }))
    }))
  }))
}))
res$method <- factor(res$method, methods)
save(res, file="results_simulateCluster.RData")

DESeq2/inst/script/simulateDE.R

source("makeSim.R")
load("meanDispPairs.RData")
library("Biobase")
library("DESeq2")
library("edgeR")
library("limma")
library("samr")
library("DSS")
library("EBSeq")
source("runScripts.R")
algos <- list("DESeq2"=runDESeq2,"DESeq2-LRT"=runDESeq2LRT,"DESeq2-NoIF"=runDESeq2NoIF,
              "edgeR"=runEdgeR,"edgeR-robust"=runEdgeRRobust,
              "DSS"=runDSS,"DSS-FDR"=runDSSFDR,
              "voom"=runVoom,
              "SAMseq"=runSAMseq,"SAMseq-FDR"=runSAMseqFDR,
              "EBSeq"=runEBSeq)
namesAlgos <- names(algos)
n <- 10000
effSizeLevels <- log2(c(2,3,4))
mLevels <- c(6,8,10,20)
nreps <- 6
effSizes <- rep(rep(effSizeLevels, each=nreps), times=length(mLevels))
ms <- rep(mLevels, each=nreps * length(effSizeLevels))
library("parallel")
options(mc.cores=20)
resList <- mclapply(seq_along(ms), function(i) {
  set.seed(i)
  m <- ms[i]
  es <- effSizes[i]
  condition <- factor(rep(c("A","B"), each = m/2))
  x <- model.matrix(~ condition)
  beta <- c(rep(0, n * 8/10), sample(c(-es,es), n * 2/10, TRUE))
  mat <- makeSim(n,m,x,beta,meanDispPairs)$mat
  e <- ExpressionSet(mat, AnnotatedDataFrame(data.frame(condition)))
  resTest <- lapply(algos, function(f) f(e))
  nonzero <- rowSums(exprs(e)) > 0
  sensidx <- abs(beta) > 0 & nonzero
  sens <- sapply(resTest, function(z) mean((z$padj < .1)[sensidx]))
  rmf <- cut(rowMeans(mat), c(0, 20, 100, 300, Inf), include.lowest=TRUE)
  levels(rmf) <- paste0("sens",c("0to20","20to100","100to300","more300"))
  sensStratified <- t(sapply(resTest, function(z)
    tapply((z$padj < .1)[sensidx], rmf[sensidx], mean)))
  oneminusspecpvals <- sapply(resTest, function(z)
    mean((z$pvals < .01)[beta == 0 & nonzero], na.rm=TRUE))
  oneminusspecpadj <- sapply(resTest, function(z)
    mean((z$padj < .1)[beta == 0 & nonzero], na.rm=TRUE))
  oneminusprec <- sapply(resTest, function(z) {
    idx <- which(z$padj < .1)
    ifelse(sum(idx) == 0, 0, mean((beta == 0)[idx]))
  })
  data.frame(sensitivity=sens, sensStratified,
             oneminusspecpvals=oneminusspecpvals,
             oneminusspecpadj=oneminusspecpadj,
             oneminusprec=oneminusprec,
             algorithm=namesAlgos, effSize=es, m=m)
})
res <- do.call(rbind, resList)
res$algorithm <- factor(res$algorithm, namesAlgos)
sessInfo <- sessionInfo()
save(res, namesAlgos, sessInfo, file="results_simulateDE.RData")

DESeq2/inst/script/simulateLFCAccuracy.R

source("makeSim.R")
source("runScripts.R")
load("meanDispPairs.RData")
library("DESeq2")
library("edgeR")
library("Biobase")
algos <- list("DESeq2"=runDESeq2,"edgeR"=runEdgeR)
namesAlgos <- names(algos)
library("parallel")
options(mc.cores=10)
set.seed(1)
nreps <- 10
n <- 1000
ms <- c(4,6,10,16,20)
types <- c("bell","slab bell","slab spike","spike spike")
methods <- c("DESeq2","edgeR predFC","edgeR predFC10")
res <- do.call(rbind, lapply(ms, function(m) {
  do.call(rbind, lapply(types, function(type) {
    do.call(rbind, mclapply(seq_len(nreps), function(i) {
      beta <- if (type == "bell") {
        rnorm(n)
      } else if (type == "slab bell") {
        c(rnorm(n * 8/10), runif(n * 2/10, -4, 4))
      } else if (type == "slab spike") {
        beta <- c(rep(0, n * 8/10), runif(n * 2/10, -4, 4))
      } else if (type == "spike spike") {
        beta <- c(rep(0, n * 8/10), sample(c(-2, 2), n * 2/10, TRUE))
      }
      condition <- factor(rep(c("A","B"), each = m/2))
      x <- model.matrix(~ condition)
      mat <- makeSim(n,m,x,beta,meanDispPairs)$mat
      e <- ExpressionSet(mat, AnnotatedDataFrame(data.frame(condition)))
      res0 <- lapply(algos, function(f) f(e))
      resdf <- do.call(rbind, lapply(methods, function(method) {
        if (method == "edgeR predFC") {
          predbeta <- res0[["edgeR"]]$predbeta
        } else if (method == "edgeR predFC10") {
          predbeta <- res0[["edgeR"]]$predbeta10
        } else {
          predbeta <- res0[[method]]$beta
        }
        SE <- ((beta - predbeta)^2)
        AE <- abs(beta - predbeta)
        nz <- rowSums(exprs(e)) > 0
        de <- beta != 0
        RMSE <- sqrt(mean(SE[nz]))
        MAE <- mean(AE[nz])
        DiffRMSE <- sqrt(mean(SE[nz & de]))
        DiffMAE <- mean(AE[nz & de])
        data.frame(RMSE=RMSE, MAE=MAE, DiffRMSE=DiffRMSE, DiffMAE)
      }))
      data.frame(m=rep(m, length(methods)),
                 type=rep(type, length(methods)),
                 method=methods,
                 RMSE=resdf$RMSE, MAE=resdf$MAE,
                 DiffRMSE=resdf$DiffRMSE, DiffMAE=resdf$DiffMAE)
    }))
  }))
}))
res$method <- factor(res$method, methods)
save(res, file="results_simulateLFCAccuracy.RData")

DESeq2/inst/script/simulateOutliers.R

source("makeSim.R")
load("meanDispPairs.RData")
library("Biobase")
library("DESeq2")
library("edgeR")
source("runScripts.R")
algos <- list("DESeq2"=runDESeq2Outliers, "edgeR"=runEdgeR, "edgeR-robust"=runEdgeRRobust)
namesAlgos <- names(algos)
methods <- c("DESeq2", "DESeq2-noFilt", "DESeq2-noRepl", "edgeR", "edgeR-robust")
library("parallel")
options(mc.cores=10)
set.seed(1)
padjVector <- seq(from=0, to=1, length=201)
pvalsVector <- seq(from=0, to=.4, length=201)
n <- 4000
percentOutliers <- seq(from=0,to=.15,length=4)
ms <- c(10,20)
nreps <- 10
res <- do.call(rbind, lapply(ms, function(m) {
  do.call(rbind, lapply(percentOutliers, function(pOut) {
    resList <- mclapply(seq_len(nreps), function(i) {
      condition <- factor(rep(c("A","B"), each = m/2))
      x <- model.matrix(~ condition)
      beta <- c(rep(0, n * 8/10), sample(c(-1,1), n * 2/10, TRUE))
      mat <- makeSim(n,m,x,beta,meanDispPairs)$mat
      idx.i <- sample(n, round(n*pOut))
      idx.j <- sample(m, round(n*pOut), TRUE)
      mat[cbind(idx.i,idx.j)] <- mat[cbind(idx.i,idx.j)] * 100
      e <- ExpressionSet(mat, AnnotatedDataFrame(data.frame(condition)))
      resTest0 <- lapply(algos, function(f) f(e))
      # this avoids re-running DESeq2 without filtering or replacement
      resTest <- list()
      resTest[names(resTest0[["DESeq2"]])] <- resTest0[["DESeq2"]]
      resTest[c("edgeR","edgeR-robust")] <- resTest0[c("edgeR","edgeR-robust")]
      resTest[["beta"]] <- beta
      resTest[["nonzero"]] <- rowSums(exprs(e)) > 0
      resTest
    })
    sens <- sapply(methods, function(method) {
      sensMat <- sapply(seq_along(resList), function(i) {
        sapply(pvalsVector, function(p) {
          idx <- resList[[i]][["beta"]] != 0 & resList[[i]][["nonzero"]]
          mean((resList[[i]][[method]]$pvals < p)[idx])
        })
      })
      apply(sensMat, 1, median)
    })
    spec <- sapply(methods, function(method) {
      specMat <- sapply(seq_along(resList), function(i) {
        sapply(pvalsVector, function(p) {
          idx <- resList[[i]][["beta"]] == 0 & resList[[i]][["nonzero"]]
          mean((resList[[i]][[method]]$pvals >= p)[idx])
        })
      })
      apply(specMat, 1, median)
    })
    senspadj <- sapply(methods, function(method) {
      padjMat <- sapply(seq_along(resList), function(i) {
        sapply(pvalsVector, function(p) {
          idx <- resList[[i]][["nonzero"]]
          smallp <- resList[[i]][[method]]$pvals[idx] < p
          if (sum(smallp) == 0) 0 else max(resList[[i]][[method]]$padj[idx][ smallp ])
        })
      })
      apply(padjMat, 1, median)
    })
    prec <- sapply(methods, function(method) {
      precMat <- sapply(seq_along(resList), function(i) {
        sapply(padjVector, function(p) {
          idx <- resList[[i]][[method]]$padj < p
          if (sum(idx) == 0) 1 else mean( (resList[[i]][["beta"]] != 0)[idx] )
        })
      })
      apply(precMat, 1, median)
    })
    data.frame(sensitivity = as.vector(sens),
               pvals = rep(pvalsVector, times=length(methods)),
               senspadj = as.vector(senspadj),
               oneminusspec = 1 - as.vector(spec),
               oneminusprec = 1 - as.vector(prec),
               precpadj = rep(padjVector, times=length(methods)),
               algorithm = rep(methods, each=length(pvalsVector)),
               m = rep(m, length(methods) * length(pvalsVector)),
               percentOutlier = rep(pOut, length(methods) * length(pvalsVector)))
  }))
}))
save(res, file="results_simulateOutliers.RData")

DESeq2/inst/script/simulation.Rmd

# Assessment of DESeq2 through simulation

Michael Love, Wolfgang Huber, Simon Anders

```{r options, echo=FALSE, results="hide"}
opts_chunk$set(dev="pdf")
```

Document compiled on:

```{r time, echo=FALSE}
Sys.time()
```

This document and associated files provide the code and plots for the simulation section of the *DESeq2* paper, so that these results can be easily updated over time. For more details, read the paper at the following URL: http://dx.doi.org/10.1101/002832

## Differential expression analysis

We assessed the sensitivity and specificity of various algorithms using simulation to complement an analysis on real data. The following Negative Binomial simulation samples (mean, dispersion) pairs from the joint distribution of estimated means and dispersions from the Pickrell et al dataset. The true differences between two groups are drawn from either *z*, *0* or *-z*, where the *0* component represents 80% of the genes. The absolute value of the effect size *z* for the 20% of genes with differential expression is varied, as is the total sample size m (such that each group has m/2 samples). 10,000 genes were simulated, and each combination of parameters was repeated 6 times. The code to generate these results is in `simulateDE.R` and the code to run each algorithm is in `runScripts.R`.

Note: *DSS* denotes running *DSS* and then Benjamini-Hochberg adjustment on *p*-values. *DSS-FDR* denotes the native FDR estimation of the *DSS* software. *SAMseq* denotes running *SAMseq* with *p*-value estimation and Benjamini-Hochberg adjustment for FDR. *SAMseq-FDR* denotes the native FDR estimation and no *p*-value estimation. *EBSeq* likewise only produces FDR.

```{r loadDE}
load("results_simulateDE.RData")
res$m <- factor(res$m)
levels(res$m) <- paste0("m=",levels(res$m))
res$effSize <- factor(res$effSize)
levels(res$effSize) <- c("fold change 2","fold change 3","fold change 4")
res$algorithm <- factor(res$algorithm)
resClean <- res[!is.na(res$oneminusspecpvals),]
```

```{r simulateDE, fig.width=7, fig.height=5, fig.cap="Sensitivity and specificity on simulated datasets."}
library("ggplot2")
p <- ggplot(resClean, aes(y=sensitivity, x=oneminusspecpvals,
                          color=algorithm, shape=algorithm))
p + geom_point() + theme_bw() + facet_grid(effSize ~ m) +
  scale_shape_manual(values=1:9) +
  xlab("1 - specificity (false positive rate)") +
  coord_cartesian(xlim=c(-.003,.035)) +
  geom_vline(xintercept=.01) +
  scale_x_continuous(breaks=c(0,.02))
```

Use of simulation to assess the sensitivity and specificity of algorithms across combinations of sample size and effect size. The sensitivity was calculated as the fraction of genes with adjusted *p*-value less than 0.1 among the genes with true differences between group means. The specificity was calculated as the fraction of genes with *p*-value greater than 0.01 among the genes with no true differences between group means. The *p*-value was chosen instead of the adjusted *p*-value, as this allows for comparison against the expected fraction of *p*-values less than a critical value given the uniformity of *p*-values under the null hypothesis.

```{r simulateDEPrec, fig.width=7, fig.height=5, fig.cap="Sensitivity and precision on simulated datasets."}
library("ggplot2")
p <- ggplot(res, aes(y=sensitivity, x=oneminusprec,
                     color=algorithm, shape=algorithm))
p + geom_point() + theme_bw() + facet_grid(effSize ~ m) +
  scale_shape_manual(values=1:11) +
  xlab("1 - precision (false discovery rate)") +
  coord_cartesian(xlim=c(-.03, .3)) +
  geom_vline(xintercept=.1)
```

Sensitivity and precision of algorithms across combinations of sample size and effect size. The sensitivity was calculated as the fraction of genes with adjusted *p*-value less than 0.1 among the genes with true differences between group means. The precision was calculated as the fraction of genes with true differences between group means among those with adjusted *p*-value less than 0.1.

```{r simulateDESensStratify, fig.width=7, fig.height=5, fig.cap="Sensitivity dependence on mean count."}
library("reshape")
id.vars <- c("algorithm","effSize","m")
measure.vars <- c("sens0to20","sens20to100","sens100to300","sensmore300")
melted <- melt(res[,c(id.vars,measure.vars)], id.vars=id.vars,
               measure.vars=measure.vars)
names(melted) <- c(id.vars, "aveexp", "sensitivity")
levels(melted$aveexp) <- c("<20","20-100","100-300",">300")
p <- ggplot(melted, aes(y=sensitivity, x=aveexp, group=algorithm,
                        color=algorithm, shape=algorithm))
p + stat_summary(fun.y="mean", geom="line") +
  stat_summary(fun.y="mean", geom="point") +
  theme_bw() + facet_grid(effSize ~ m) +
  scale_shape_manual(values=1:11) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("mean counts")
```

The sensitivity of algorithms across combinations of sample size and effect size, and further stratified by the mean of counts of the differentially expressed genes in the simulation data. Points indicate the average over 6 replicates. Algorithms all show a similar dependence of sensitivity on the mean of counts. The height of the sensitivity curve should be compared with the previous plot indicating the total sensitivity and specificity of each algorithm.

## Performance in the presence of outliers

The following plots examine the effect of outliers on differential calls by the two Negative-Binomial-based methods *DESeq2* and *edgeR*. *DESeq2* was run with default settings, after turning off gene filtering, and after turning off outlier replacement. *edgeR* was run with default settings, and after using the *robust* option. The code to generate these results is in `simulateOutliers.R`.

```{r loadOut}
load("results_simulateOutliers.RData")
# when < 7 replicates DESeq does not replace
res <- res[!(res$algorithm == "DESeq2-noRepl" & res$m < 14),]
# when >= 7 replicates DESeq does not filter
res <- res[!(res$algorithm == "DESeq2-noFilt" & res$m >= 14),]
res$m <- factor(res$m)
levels(res$m) <- paste0("m=",levels(res$m))
res$percentOutlier <- 100 * res$percentOutlier
res$percentOutlier <- factor(res$percentOutlier)
levels(res$percentOutlier) <- paste0(levels(res$percentOutlier),"% outlier")
```

Because the sensitivity-specificity curve is evaluated using the p value, we use the following code to pick out the point on the sensitivity-specificity curve with largest p value such that the nominal adjusted *p*-value is less than 0.1.

```{r}
resSensPadj <- res[res$senspadj < .1,]
resSensPadj <- resSensPadj[nrow(resSensPadj):1,]
resSensPadj <- resSensPadj[!duplicated(with(resSensPadj,
                 paste(algorithm, m, percentOutlier))),]
summary(resSensPadj$senspadj)
```

```{r simulateOutliersSens, fig.width=8, fig.height=4, fig.cap="Sensitivity and specificity in presence of outliers."}
library("ggplot2")
p <- ggplot(res, aes(x=oneminusspec, y=sensitivity, color=algorithm))
p + scale_x_continuous(breaks=c(0,.1,.2)) +
  scale_y_continuous(breaks=c(0,.2,.4,.6,.8)) +
  geom_line() + theme_bw() + facet_grid(m ~ percentOutlier) +
  xlab("1 - specificity") +
  coord_cartesian(xlim=c(-.03, .25), ylim=c(-.05, .9)) +
  geom_point(aes(x=oneminusspec, y=sensitivity, shape=algorithm),
             data=res[res$precpadj == .1,])
```

Sensitivity-specificity curves for detecting true differences in the presence of outliers. Negative Binomial counts were simulated for 4000 genes and total sample sizes (m) of 10 and 20, for a two-group comparison. 80% of the simulated genes had no true differential expression, while 20% of the genes had true logarithmic (base 2) fold changes of -1 or 1. The number of genes with simulated outliers was increased from 0% to 15%. The outliers were constructed for a gene by multiplying the count of a single sample by 100. Sensitivity and specificity were calculated by thresholding on *p*-values. Points indicate an adjusted *p*-value of 0.1. DESeq2 filters genes with potential outliers for samples with 3 to 6 replicates, while replacing outliers for samples with 7 or more replicates, hence the filtering can be turned off for the top row (m=10) and the replacement can be turned off for the bottom row (m=20).

```{r simulateOutliersPrec, fig.width=8, fig.height=4, fig.cap="FDR and target FDR in presence of outliers."}
p <- ggplot(res, aes(x=precpadj, y=oneminusprec, color=algorithm))
p + scale_x_continuous(breaks=c(0,.1,.2)) +
  scale_y_continuous(breaks=c(0,.1,.2)) +
  geom_line() + theme_bw() + facet_grid(m ~ percentOutlier) +
  geom_abline(intercept=0,slope=1) +
  xlab("adjusted p-value") + ylab("1 - precision (FDR)") +
  coord_cartesian(xlim=c(-.03, .25), ylim=c(-.05, .25)) +
  geom_point(aes(x=precpadj, y=oneminusprec, shape=algorithm),
             data=res[res$precpadj == .1,])
```

Outlier handling: One minus the precision (false discovery rate) plotted over various thresholds of adjusted *p*-value. Shown are the results for the same simulation with outliers described in the previous figure. Points indicate an adjusted *p*-value of 0.1.

## Accuracy of log fold change estimates

The following simulations used Negative Binomial random variables with mean and dispersion pairs sampled from the joint distribution of mean-dispersion estimates from the Pickrell data. In addition, true differences between two groups were randomly generated, according to the following models, diagrammed below. The accuracy of four methods for estimating the log fold change between groups was compared by the root mean squared error (RMSE) and the mean absolute error (MAE). The four methods were chosen for their particular focus on the log fold change estimate. The code to generate these results is in `simulateLFCAccuracy.R`.

```{r lfcAccuracyHist, fig.width=6, fig.height=6, fig.cap="Examples of simulated log2 fold changes."}
par(mfrow=c(2,2),mar=c(3,3,3,1))
n <- 1000
brks <- seq(from=-4,to=4,length.out=20)
trimit <- function(x) x[x > -4 & x < 4] # for visualization only
myhist <- function(x, ...) hist(x, breaks=brks, border="white", col="blue",
                                xlab="", ylab="", ...)
myhist(trimit(rnorm(n)), main="bell")
myhist(trimit(c(rnorm(n * 8/10), runif(n * 2/10, -4, 4))), main="slab bell")
myhist(c(rep(0, n * 8/10), runif(n * 2/10, -4, 4)), main="slab spike")
myhist(c(rep(0, n * 8/10), sample(c(-2, 2), n * 2/10, TRUE)), main="spike spike")
```

Benchmarking LFC estimation: Models for simulating logarithmic (base 2) fold changes. For the bell model, true logarithmic fold changes were drawn from a Normal with mean 0 and variance 1. For the slab bell model, true logarithmic fold changes were drawn for 80% of genes from a Normal with mean 0 and variance 1 and for 20% of genes from a Uniform distribution with range from -4 to 4. For the slab spike model, true logarithmic fold changes were drawn similarly to the slab bell model except the Normal is replaced with a spike of logarithmic fold changes at 0. For the spike spike model, true logarithmic fold changes were drawn according to a spike of logarithmic fold changes at 0 (80%) and a spike randomly sampled from -2 or 2 (20%). These spikes represent fold changes of 1/4 and 4, respectively.

```{r lfcAccuracy, fig.width=7, fig.height=5, fig.cap="Root mean squared error in estimating log2 fold changes."}
load("results_simulateLFCAccuracy.RData")
library("ggplot2")
library("Hmisc")
p <- ggplot(data=res, aes(x=m, y=RMSE, color=method, shape=method))
p + stat_summary(fun.y=mean, geom="point") +
  stat_summary(fun.y=mean, geom="line") +
  stat_summary(fun.data=mean_cl_normal, geom="errorbar") +
  theme_bw() + xlab("total sample size") +
  facet_wrap(~ type) + scale_x_continuous(breaks=unique(res$m))
```

Root mean squared error (RMSE) for estimating logarithmic fold changes under the four models of logarithmic fold changes and varying total sample size m. Simulated Negative Binomial counts were generated for two groups and for 1000 genes. Points and error bars are drawn for the mean and 95% confidence interval over 10 replications.

```{r lfcAccuracyDE, fig.width=6, fig.height=2.5, fig.cap="Root mean squared error for only differentially expressed genes."}
p <- ggplot(data=res[grepl("spike",res$type),],
            aes(x=m, y=DiffRMSE, color=method, shape=method))
p + stat_summary(fun.y=mean, geom="point") +
  stat_summary(fun.y=mean, geom="line") +
  stat_summary(fun.data=mean_cl_normal, geom="errorbar") +
  theme_bw() + xlab("total sample size") + ylab("RMSE only of DE genes") +
  facet_wrap(~ type) + scale_x_continuous(breaks=unique(res$m))
```

Root mean squared error (RMSE) of logarithmic fold change estimates, only considering genes with non-zero true logarithmic fold change. For the same simulation, shown here is the error only for the 20% of genes with non-zero true logarithmic fold changes (for bell and slab bell all genes have non-zero logarithmic fold change).

```{r lfcAccuracyMAE, fig.width=7, fig.height=5, fig.cap="Mean absolute error in estimating log2 fold changes."}
p <- ggplot(data=res, aes(x=m, y=MAE, color=method, shape=method))
p + stat_summary(fun.y=mean, geom="point") +
  stat_summary(fun.y=mean, geom="line") +
  stat_summary(fun.data=mean_cl_normal, geom="errorbar") +
  theme_bw() + xlab("total sample size") + ylab("MAE") +
  facet_wrap(~ type) + scale_x_continuous(breaks=unique(res$m))
```

Mean absolute error (MAE) of logarithmic fold change estimates. Results for the same simulation, however here using mean absolute error in place of root mean squared error. Mean absolute error places less weight on the largest errors.

```{r lfcAccuracyDiffMAE, fig.width=6, fig.height=2.5, fig.cap="Mean absolute error for only differentially expressed genes."}
p <- ggplot(data=res[grepl("spike",res$type),],
            aes(x=m, y=DiffMAE, color=method, shape=method))
p + stat_summary(fun.y=mean, geom="point") +
  stat_summary(fun.y=mean, geom="line") +
  stat_summary(fun.data=mean_cl_normal, geom="errorbar") +
  theme_bw() + xlab("total sample size") + ylab("MAE only of DE genes") +
  facet_wrap(~ type) + scale_x_continuous(breaks=unique(res$m))
```

Mean absolute error (MAE) of logarithmic fold change estimates, only considering those genes with non-zero true logarithmic fold change.

## Transformations and distances for recovery of true clusters

The following simulation evaluated a set of methods for transformation, and for calculating distances between vectors of counts, for their performance in recapturing true clusters in simulated data. Negative Binomial counts were generated in four groups, each with four samples. These groups were generated with 20% of genes given Normally-distributed log fold changes from a centroid. The standard deviation of the Normal for the non-null genes was varied to make the clustering easier or more difficult. The mean of the centroid and the dispersion of the counts were drawn as pairs from the joint distribution of estimates from the Pickrell et al dataset. As the Pickrell dataset has high dispersion (RNA-Seq counts of lymphoblast cells across a population of individuals), simulations were also considered wherein the dispersion was 0.1 and 0.25 times the Pickrell dispersions. Hierarchical clustering with complete linkage was used to separate the samples into four predicted clusters, using a variety of combinations of transformation and distance. These predicted clusters were then compared to the true clusters according to the simulation using the adjusted Rand Index. Furthermore, two variations were considered, one in which the size factors between conditions were equal and one in which the size factors within each group were [1, 1, 1/3, 3]. The code to generate these results is in `simulateCluster.R`.

```{r simulateCluster, fig.width=8, fig.height=5, fig.cap="Clustering accuracy over the size of group differences."}
load("results_simulateCluster.RData")
library("ggplot2")
library("Hmisc")
res$sizeFactor <- factor(res$sizeFactor)
levels(res$sizeFactor) <- paste("size factors", levels(res$sizeFactor))
res$dispScale <- factor(res$dispScale)
levels(res$dispScale) <- paste(levels(res$dispScale),"x dispersion")
p <- ggplot(res, aes(x=rnormsd, y=ARI, color=method, shape=method))
p + stat_summary(fun.y=mean, geom="point", aes(shape=method)) +
  stat_summary(fun.y=mean, geom="line") +
  stat_summary(fun.data=mean_cl_normal, geom="errorbar") +
  facet_grid(sizeFactor ~ dispScale, scale="free") +
  theme_bw() + ylab("adjusted Rand Index") + xlab("SD of group differences")
```

Adjusted Rand Index from clusters using various transformation and distances compared to the true clusters from simulation. The methods assessed were Euclidean distance on counts normalized by size factor, log2 of normalized counts plus a pseudocount of 1, and after applying the rlog and variance stabilizing transformation. Additionally, the Poisson Distance from the PoiClaClu package was used for hierarchical clustering. The points indicate the mean from 20 simulations and the bars are 95 percent confidence intervals.

## Genes expressed in only one condition

As discussed by [Robinson and Smyth](http://biostatistics.oxfordjournals.org/content/9/2/321.long) and by [Rapaport et al.](http://genomebiology.com/2013/14/9/R95), it is desirable that, in the situation in which a gene is only expressed in one condition, the statistical significance increase with the signal to noise ratio of expression in the expressed condition. For example, Rapaport et al. plot the $-\log_{10}(p)$ for *p* values over the signal to noise ratio. In the following code chunk we demonstrate that *DESeq2* has increasing $-\log_{10}(p)$ in a comparison of two conditions in which one group has all zero counts, e.g.: $\{0,0,0\}$ vs $\{10,10,10\}$.

```{r onlyonecondition, fig.width=4, fig.height=4, fig.cap="Simulated gene expressed in only one condition"}
library("DESeq2")
m <- 6
disp <- 0.1
ii <- seq(from=1,to=4,length=7)
coldata <- DataFrame(x=factor(rep(1:2,each=m/2)))
pvals <- sapply(ii, function(i) {
  mat <- matrix(as.integer(c(rep(0,m/2),rep(10^i,m/2))),nrow=1)
  dds <- DESeqDataSetFromMatrix(mat, coldata, ~ x)
  sizeFactors(dds) <- rep(1,m)
  dispersions(dds) <- disp
  results(nbinomWaldTest(dds))$pvalue
})
plot(10^ii, -log10(pvals), log="x", type="b", xaxt="n",
     xlab="group 2 counts", ylab="-log10 pvalue")
axis(1,10^(1:4),10^(1:4))
```

## Session information

The session information saved in the `simulateDE` script:

```{r}
sessInfo
```

DESeq2/inst/script/simulation.pdf [binary PDF; content not shown]
DESeq2/inst/script/simulation.pdf0000644000175400017540000140144613201671732017777 0ustar00biocbuildbiocbuild
ΗS2!;J7MBcg9?Zb؜QMP0P"$w00_{ WH:w5FF?Ep!Yad)U 9 gꢈժ)}R]L1T0Wc=$LtԈVo/3]3[HK ŌSOv$ϵ Q0v#7n5 H$cI/3lKUU9j?3~`g,23JF6h)~:vI\QOq"{j>c X肓.)%z2ሏgRqUWIM FMkG\#& HNN7bպF3ܞ_(8::`)JZ62^>l^OY6Ʃ]7c[ :Bb@*쯒wҙzQ$%l{˻$|2xзw"s#hm'4P,?c0TŴ#c+PC7,/ӥX@_b̋ A: 1|t-I&=F82\Hús* ~y*_*ɻz91Pkg :y_g1*ܡ:r:w:EB_0^֫L&u)8X3?]>rxEq04/:N,+T"ۣ/*Z ie7Vd$OgmKlVIEi}(TU6q)>4,"{W|10K1Uu>!38$/p}Chz7ϳχ8_We,fѾifxbDk)I00ez\2 Efrthi9uqD72 T0M\T:;{/?^'뿛w57F$1Y۞Y!:98CGhމEJ'HJHrc.| X7/u. Q %e?)߹O?\K4 1\6x@ z@?U&S* ┧ #H9.o? 8bsP궗<^K0[Pz c7YDwYPl//eΗBa#R;~+5tiS2lO٭?9:Y5U^QiȨ(}$9TY]$ --.HH0Cy,a}/ǣ@Ct',;ɵ>gKυF.کEbYyyv6,?K/iQ?ޠl|9ceޑJh·TV6uVuW MVv9ĿKh2>ZH6#W<0B4 ߖp-sBw6 mN4Bfݓ pb'#h5b dbrJؠ;k~U*9,!ɥ6 )0գ͚OLQ.{wKPxY1aLͫWK?_F7]x ^ȑu p&E\VPL3.x„v>çKhegz/}8騇H*eFbK@aQ~z.MR^+#GM^hK-zRTiZof:CȢ#!!qMt[{l(k'*lH|V`e1ea)4q8,h\mӟkH/%%gj\B--rq5vfv+  G(SɈ`452KZʤ+ZĔs$);r?NL"E6*h\;AuJaU掏l?ћD)TGm*B9R11zOm ~r=Z*ٴ {tsx˶XQ&0+VԿ7@U+d܄iK@EmCZ;b 7-r!#Nm1f[Yݦijf-`aU{/|AP~NO{@e#ʺ`KEC&Z(pMK5?5hƻ Ӣ+sU{П֖!#\Mp- 0xh\* ,:TY-#~ _-PD /h*JL݊{ ϟ<#d/De]VHe'?'ڼo0_346kZA:%75dW)W(`&aTek)NLenGck8 tcn􏜑z V55n()*B"a 0Y r6/r9=-Ƕcv@LikK1Ѻ,ꪙTXE  <փ`ߒZa=|)au|#D&á#J٭Vp3A.ZEo(v{f?NdmYh2Uu-HTkSqn%o76ߑHn]UzK,7[D_2|3{<+?DXմPn@nJ>-BV"D*i&)wҨ >Z "_g ,UvGԘ\{ʅƚapxGjpu O9)ʍz4Y$4 Ej!= :]qx ǧ`:kd-uW18$BlcgX8ҷ w?? clu٧B~RތKz 5*J|"DJ>B{XbITkV(ԓš8Ƈhpl7,>EF5if1"G= _瑡g 0uڤh! e >g U~<`ʖS{tvufme$ŻR.ě@iwhp~*ft,.5-riۇoV09Uu;}2%&b 1H!ֶelLs+, ڸ>!d> Hj6ji iJC5R_}GĐD,[6k? x[  .Wl.,+>\-ubLbUɿz~kym/̘^ܒcwlFRV5B&z0d1&HjӆlʑmCoXkEk%-CVˊR\ٱ4tEF%->x54= # $ :YS>uDw]XV͋F=T׸⛋BPrU?wп]$0#di!ؤ /+:^^.bub?K"bvU2cH!Wg,{۽ms{#JԳp`aKwoS.'J‚v1Ffȃ.t;>!jKrsWcla=)g`Lܧ_z Zgu=K'HDG"t")VCMV!-2X{(@Y%R%jk\/=CiƕM-0}jal1\z kS:X.,,D{6axBޣ F G+XN endstream endobj 223 0 obj << /Length1 1698 /Length2 1990 /Length3 0 /Length 3040 /Filter /FlateDecode >> stream xڵTyLZl 1R0$0Ex/pC84.$$"GF CBXL'4@ E@0́xl' 6 `XN3D|>*e n%tJ`{?d%p:4nifCHh Ҳp[3ߩl&+87P"##!"!FF!d>OϓHT ƈN'p2aDKlqcJ<DZ9y@?@BY3#1 z?08NkDIOoԭP|gh(pyLrp<# \,e/atK['Bxw2&d|HGS Y Ys>a@B"h$oV6a)^7\;X!w,39iir0M -|xB8ˆ!JE ?4C21\d6 a7W9hyZ- Ex /lE< k qy, /6Ж+Yn\o8> D#Su t-^iƯ&r ~e"P2wqH;C_$sA(} @hYX&!@!#(lT@0qCRXh:—-&Zy~dk&@Ca. p T\A4߾*]?D[Y( "Lx;C `X 3 M(su܆'bl˟iD~srcBkֽ Ue]-CclA|5F;O}`[s :LrenI\1DO5aڲ1_߭}P龧hGkl}ں.=mѾ+i7ch$-ٞ6ҹKV㲳V85JϬIK|gMcOmNJ3MR d?inexgFԏGVi2Ћ7k[ltvVZ䝽:hF[Kgil]?}DO[IwFAcEh gcͪl/72ce}$9箩t}Z|WΑG-rY=fdcs I% +KJX> ʽK05P᧹F|@L܎ u mH6r c;S44.D xŸON˰9hpUBI褋a곫[+(%7CҫG֥֯cWS] y;׶(ݳeceV:?GY͉(YlѨDw-bSg&/4QdƍJgU^*xc܂8 ɩu]p"hsڼx7뗘vkoZ]ޖIJB' u(!ۉ}PsSR_&30a:>&wk:EJvc:xmENot`; jU_ K=`:qZI㧬w䬢nI u'nc;^ iSKTX9!)bOn}BmS|y+ Iϼ<`Qo1vwoRKU wOOvB%_Iq$mi5x<|&[xplXF{NktmGF*׍²G|9[:U`펏*:TpFJܹEw/'.i[Z?)f5Tv8mﯛld2֚)SiET;oöf3ruأya0׵;>1e(YkYo5`%gO3DE}U_Ny[Ӕ*&fNusnT[;.]~JemVvOqn2u9Z҆GZm{eo׮# ͋"ԥ HemIw  g$krHWDxț4@`Vm'WS}(+IUVJMuPs)oFS~vl⹜W-e{N+_42S,Fu[nۙYwV'Ē*J3* 4+,q`tɴ-]?}8=}}^[[bfRA{ޖ7GMVL4H_{ʬog-QMVqa-˪a(kǁT/nJ=xݟ}5bvJ 9s>tFZ] zQ,. XP\l~f=w:쏸x;ѩaC9gQEf 柨{vdognغOu=RYa+isÿ;z~=ܻ8:%wG;}ATS`hbrrM_-kPLgn'hnbrʖ>ܿ`:+=㑼zjm1eZdXUfާ>{h5gkRL3[>ۢ|y -w^ntۣ:rg L E endstream endobj 225 0 obj << /Length1 2676 /Length2 31983 /Length3 0 /Length 33517 /Filter /FlateDecode >> stream xڴeT\ͺ5Ӹqw!$8w 4Xpw4YZ U q G3#ȍ]` `gQZۛ8Yٹhh$]n6n@5@ `gGA@` Pjz;9UGW73SW)N.6Vnkp;[ ojnjg0YYXʎ` 0Z[-@]@F]EKU\XHjhj0ފ+kJ- ?5 0+f&8w58@Wm-5p? 
nnNllVn.VNӴqx:.@{?¸,rYUḿ W$i9Rv @6u'WQUU`jrLA@7S7wW?67Ђ_Iw=6K]2C{_S1S_A6nXwg6lJrR,(9y[Ev^?7.HTY:e3_ ?G|N-A %x$9iK^H4@S{ oʎ.qZڸ[KL/#e] _>XہBcشd$%'2wY8yx..HYrϰXAn?xlMz`{ؤl'MA;wP p?קnq x&n@K?v5v񀋙;ڃw-ܿ-rd xMJ`r9X)zUw_!VA0I?Y{;YAEm6A0Kۿ XQ Xق9N_~z';?np1'S?+.p/'?ٿCy8#`u`uҎ,/x6^:ڛZU0[7k_^_ AAF8/._I6ahהPW)xG?_^,\N~tGſk4GZh.fQ UͪTMp(_W=)@kH>s2Ұz!gG=2QL=NBi IQ޸VEۼ=SمJo&ﶞƑ^ZVLS@=PϧlHRiAM*[ķlV1Hmy/jΤ*Q p2*=.RnZ kʮnB^`tt:HtS=ÂHJ# X[oڡ$4=~fZ}gD-(8.3{d;+=WVBp2$OskddyR_@h,ȓaazb4]Eq!F{%i>jU%?)ۏ!U"1~Z gBlULL_)^@RK!&KMq@{LWxYLF}LQʰcw+6Dq;~} =dD˅qhu#nI&rU۬nx7ٟߕk^zVxf^Rm AΜ%q4$99UsZp5ܼRd@?v WA눾;$k0:V,. a{p}hH)!1)suנ͙}m_prGzVM3 }okB_R&R ?[g.)D4{}{eUBT~y "@8;]_-v\ْ.]_3"[IP'_ JOo!`ߡ+]֪\`n@ae֙yҝ'> \[6hqg )/E$1Ɏ|C ^#1mYVr^@G˟Z܌kA̘ ]ܼz-(G 1SHAd<ʶߘR0Cc9jrUad$`PYV5 S~<:/|yݮW6IhR,oe3xGs{Q#U]xҹRթEvlV ),R1[*|aؙNq^2&ՁZ%C v{!8NoW[!-o?:9$\eh19bSC nsfN#y~Cܘ|{goh=(m^$c;5t.>>gd:R1I@[2_?|Ac]~S&\ BЍG4"-Pq/osQz Fw9UnOlWP7Pl:6\ SvrG7񕾡u5pG5m{U6rs9CtIs9s4(qbߗs7 hWA>_&</;g}v.b|_rcȶ]-h+=ey1^l <64:>ıBeҎ!CSѾkIt_8LmFޗ#j&."Tbl?IļD`ڝ7gKg ;2o*MqTtIb gr CkEE+/ | r_3=c6{h;z.pF$lJ d-Zcء`猒Ɣ4zX霕;Wl(֋y%h W򹨼Uv5?6T(Pf.kmI+K4,8<;M}ph0m9U20\x"ND ~Bw5(T$-.$;f+;K1y|ÿm~tybɽmX@T J"$ϋά u@{ywgyeKq Wi\nݴ=" {OI'w(e:C#ڍm3|Yc{,a# !ְ wѪQ1^$*/"ԤϐmI32k6 zLn8a7&J!b$F4.,@+` #6`U QC~o992JTyx݋M&j5?%&oRY+[wqTA¡NTw( 7(1]_gΨ GnSjƒOM*{ۼd:lY)ӏ#BۗWg~_c}>຤-߿ ~{4&V @vǎoA`&T\m= a8 BzrШ ތ"JmSufAJZ8rS\{TP_YL'* m9ŸFTZ3oE3WB,lcms'3qȁkmMtIl)9~ /;qF6i3jX0t Qw;#mj%w ]|!DՊaLRv2vۂۄPl>l쟏=r?Or8`$N ,xG~H8|p@tWAV=;POy%>6TwO\Ϩ+XN&- ?fV~ ZJf_괆 #N}o ' 8vv)\! /b^hܙ)Gz[y A;60}hrPy S(󈺈Hh+W4΢,Wnf]ہonjIG 6(Pbx!,.TJv7|9MS[W+O8<2tzQz/L7N0b З]2R)O1[LP-*gy32gq$R׏!&y-?]\" /{A4Em\RI&6[ó~NSܲIfp¡Jj\g~?#w5[ѭJO6/{D@Ť[6>tS޶j-.luuTTز ~k{IXe]s8oC^&|eͅ![okx}#WڑF&n¢L)T S5p0ch=k4>,.h~o/W뒢ua"feiu!OUjc[܃'6dU)b-+{aQv(bv%w+X]3g"Kku]0pWi钥[:9JEfR4/nޕ2~LrgaGhJkb"}, fy9ZZA|m@q Gf mޠ̆ᓨ&/ P{ =ZD%޿Sdg_u 4wו o:{#}[g[]ӛz0O|9oeӝMN* [Vϩ~%R|.m5hqS$AѰ[:@#=7^KP2BFVԒUT#J3%uRjaR"I \[bUvGn ؄(QƤ(WWɚ-E&Oh<d~uqZ9B{0y9*-$suC\3C;Ii!=|5Զ3RKez(zy4'Iu> lG9zk3Nөvo5^u[Wv3)GCCP]-4ne=kw# dpH( 듉%QJ۪s }>)!ߕOx ksg  ]F`gߠT0uk 76CImruǖ&O:ORi"Fs?t 2(,$""vӚb[3WO[pU)l<- Jr 潲­jxQN،jh mr#`^ͧŘZdr՞$@3r$aE2u''fn#5V0j s;2dh쎡ENY$%eE'KE_HmLM%vAY_DZ:vLSZ䡞L^o)_ 2>#XPt#B^%8zmU8 _hu}_ -Š&\VS\?3 jCK笓 NSfwF=PgM^(4&/s#idj5-CKJj'tS?f{=#};?.r^}qIZ?L},1uW/3Bӥ5C8ޠ|Ͳ-^ wūiJ~,ӄ(ODhω%4"nX$Bh :4fcdlzð"Hƣ0пbaTLKxXK{b,ܔ=8ٓ';jAj}bӣ2.DWfۤ#E 'mb5:`0pܶ/" OQSe~hRЅo{n{FOA(0 Qtkr_k]2aqtSTJʕ}]*`̠/{<ɆAL/זJV0 B l7g{,'W huK7hU(B]3:>lm䜆(7Yaj5llؚ9'C7rza>|xވ'rh|Rz!rve'$`Ҵ6/ C DLTo!}oz3DŽMeĎ*F|D,%7Û32)fdb25 Ic!J pM/uCCp&ՙX"ɖIn~76l<Z*F wh )*T 3\Κ:7J>rYZ&9#2D ئ_a2s`[s*,שf\0EmhbN &\ӚgYY\JG`g~\FӅB?dz23ΡS3GUN lk+þ>< ȍ^*p۶@ao9O-ZK΄XqEMQh]џ\j)تF! 
4HՙT S-I{#?MP1dSب(WF:+cEOSB}BCm -?Yer3QX2t0G`voX#'43>zA|}YW#!tМc;sYRWrczڌ;> X*O15)קBee=7/'&f"+2Z`UwGY$!Jar&Mؔj-E7l=t\息?x>!ث*߽yf^8A` #6E+F y #AD@{=eS<)gjz;S_SĠo^}PC?4 a9L<ґknʹF& 2Yz JJat% #Ɩ1 Ob<@TXLi__75Vڿ>8Sz6YWwW#h@Qs웶v-R~#TsJ^?&MۊOe)cB8UȆoS "䰘k2:s0EH2J0C9Lv[ &Oxv Ak)i1&Z}DJī淖EЎ7wƸ\F-ꟿ|,>46X2žp/ *U&u/[A\_Wn$9HF NV%ZaBy(VcleGڳN@b)zm`g2ol+ t~QKE응d\͒`,IjJ\Qg,OBڗJuR  ,#tE+ڸtrFRK-# lnj) z 3jǛf|"7ֵ~ce8osƷ۬XOF(A깨oݜ+<173ǀ˗e3!.8 z]{־JYҞ4AJٺC?1~];H-{w#?%'yVyЄ8> i'P4zDĵ3`{hf"m1ll9͝߸ٚ5Eg_d.iFR@1]~92RFcpDSbU;,(VL+9IbF̮!h5\t8981v{)MGWaEmp(z-gp8FpXT]9 X6'|$^/5硡!y~𥴺V0~ =WW[yP8ISɆVAc2%o4֛;Jo|W kVBhǜ[aK@(6@ū;i~̢ʪp..Cp;avv"[6:4'>|ƃ>#ZT<ʁaI@s݇J>h@C5~VA ,=rKjbM _Fj8'ëaSE;}3:Ŧ6!<(X& )l r5mk!8a%e?'#~~K3Gv+7ryfzqpV.:)xG͔˂#ܣ.(LYo %gd% tO<]>!5v-rH7e5qٸ`te'm~j;Uc?SЛOxp8 St<=r!嫼Kw!eETSó/X)ExaJEk&$q[xCdБ(5d17X.4PnW' kY(=Nnݠ `7Pb$- ;m;8%)sdM,jRFc 1S|&XYiO{Ȝ+*Yq ,g 쓵03/R2TaWQ4z~hF~|Axmc3}VE,o;ySM 9AbV~VV뢮L4N2gq lXܢ kSgb6xE%I<M 󱨡\n+g|I]1  A Jf, SJwQ΋kwR}|P0!Bh^?vmCmLy+93stN(?-5Ati*zIEqfs\X}þ#3~D:K r m=fQ8XdLFOOeY:;5mNdTLjJ.fXC/Ldo]EPt6rD1UtqW<\-BKgEczx?Κ2ݲ(V DB;+5ݗnaA\'ʩMNccaeO83b*Ϭ6`W@0cZKZVCvwӷ4{}ylk{>!GuB%3VG7pw^y&%]^[q6ՎOn- ? } Qghy[AO.pB %/MG^wH,`٬VY'dӰK-vS΃ï 5eg=˯w7mX, (mKVËê5*I@E|K6gBa"z)P9/SY'[Iz4O-[ wt{K(bzF^" >8ڹu'ڛ&I[`jϓ|>.L{>'Di'aL)ObH;&2}lPˢIұC9{%9-D'IB/;4lu)c ݼo"/[Hy5xBC `Btj´n Z }(r`i':,~Wm[Y\@Fc䃉KЖhL9++dESNӎor.,f5n?E;AQl"c9Wev}:WO\%}]0*N2QeD+>3e) } wx/[fۂ4h yGF'"FIAxzQ͢{Y#i)bL[؂ո{UdmaCy!D.v!Jhb/#'='|(Pqaw'@?!)Kỳ=epxOkБ*䉉a殸fi^w~v!75,Yz6SNJE 4*#c՞am#A,WS 1 wzCF{98Yvi-3> HN޻/e[BR4oϣI^CoP||/GΡޮ+,",nR_Y_3TؾK%ǟʇXKLE{D/6x?jx&[> =1kȋ~Eυ}|!yv)p(;Iypaou:8c6HWpRY=>C/ÉM Z{Dn4h8=NWKusoq,sgu-z 'aVD5;rL2\'/1Ԇ!}-`h@ƺZrkQ^'c5evb1-[i%q$`m7[o6e*tÌ.FFC&^]Ƞu"|cEklǙAኼ/^Y1o*HVKR]TPURZZvY3 `B߿~'V9"▗Z@}I W A#]ٌH)DDpt0v_yP LL+4*Ր M&=GlLў,\̏&_`sրٶk}Vc H`]| !yHW xvջN.o3 hEP9#4טB^++[Q3{җKߖ9ͣ뜾ӫYKLFc]q Y{SI1.-z"j>(ssdf#>zT{GV_͵ 8Q{Ip2Yc%F$;1*1Gjt3k~ngG SWv9!ܴǑ{b96;SZl*[C5'_uoy6g*D0n-VĽ`hT;y"|K-\3Hҹ~0ISڌȚr03-q-mLmU@>#h0§esITktK䳮/7O4vݿ'> 5AA#ur57éV=g^ҧ~-M 8 E? *:yئM4p${qh_Ք62 ܺ[ȸPj|e9 8DT/)g踀{DVd܌HΖzbb9 7W8P}%+.fI7l xّ4P}Y8GfGD4.r@ H*j/c[]k܏3)L4ތ׃~exN|r(D؏QkfPJrhP£vsҜ n?2)∟0; ix|GDIQ/rJ>C4CPvm:^i'2S /Ih:& l2V9p_N뚱vЬQSiS*3UqIglvgCTz"W8Y^^H{,|yc8QSm0Nq3/!VKSX (< ? 
s4Y+NQr~Zx=b.f/6U}(ӊ$D77[A\\3o5oʀ+AdEI=vQn,:@?̧  2z.}v0JӱԔP'{ 5=N/d8!GNq?&"EzLxd I'GU-ԈWoVote{ax‘Q-lHO&_H0 "qIMnW/f$[[L#1s體Gu:'z #7¹{b NAܳvH$T&UypK QŲb1BCrj;7H[/!Rq[6ggV{zI Sc9=/4gkbLFg85Lsƍ:FpYmK4<~ֶdq,n3*S7;ݼAxJ M@aG|7 3t?$qHEX('HI+0v@)N/2h|"06D!-ejKjd2q-T):yN2':>=̔qs5.A.JÏY;8_jо'kaEܫ!s;tŊFE{[~uf@H/  mk2]Y҉~VPNk2*~IZG`)lDf6:]k* KPE!orP\g#*Mlea$.b֡%/D@vX)X?,Q?3lCK;InR5x(Z4_%!7X7zS(?=L/.rPK>PaRYr}q]Aٗ})~6𘸱p_NnmXF28G8иr@uCvn),d 6ڐN'go ˕O9W8udi Xa"QiL(A'lf1=qZz+F?VUk98۹d)@Ib;>쫾.0^{P;^m7(킜;fHl+.c+3K)l'1OzN\=$V'9gwpml:|PԌM?ep%_:O'એv0V.0B>GڤWMѪVRϗxU}w5(bMf-E b Cۋq2/Ѝjra^C=ϱUG97c\6H(nỐlNz"k"AS<3T[Ks@q-0Gs!ٜyB/ɓfJljѤ~ϴ^_qKJ}óve>`okc8nW-olqQ*N^Y uRsTۯnIN)w29s.ZUhmm OHgI^{rOdžݜ8*55䣰!O0XIj/Pi1 jOCj{Nm {s>&܅* 7Z,_2@VL^@ԁ!;w`C-MEdlQKiEzeTwXց`J\6*Nbb>;8uj%eۂA?ts}hH qWfH<(XQUZ\+[c I'IWcKF_mqfEt*&tھڡ"(=Yhٰ"2%Y=ךoR>y) !d9"/YrKPO7)+gJ?Yny|+h#얾^m$4 ׋xG-V1촟8r/*v?} kڟ=dk3=5gȭL3I^ǖ vH[#%&QRBce1l"bbYǻQ510kx^n{jYԥκF`bgX.o$SRZ8q})f篛~J){$T .Շe \ $'܀5k2ןSǎQ~x`VY`ԕrpE}7tMMЛ3q٨D+Ѵ%^+'HߤZGI9,;6 &եdw`Wyi.SdK$W!$G-';%)kC+8;N"-1MΠm&-v$YtLdMy;OZ;]g$w*ݕJ秒)ͭ=ЛLߝm= yfz(2{vsǞ=~k*23oxoM$z:֊V&-?ۓE}Wq`5%U=6:Q G6m򽅇LTdW}a\K LON@EzHr#BZ˸f*df؝C{㱬f32\`WNka,e)+$B3qwPξEUƻ3snJ)Y.m_4^4k$Ӿ^^Tvb0'ԇÄ3PXbcYO8.ބr+ֿR'"?9/ݮm>kjp&t+3avP,6Mml"iOЈk\~J!fHd֔ZZkÙȳdޟ#z0J8p6:$ R KjztyiZ*s_s/6|@x 5d}LM$~=y QE@X: ?kV/ k+md)TY@ZNShc4lu@K4a7g=?ȹJ1S++28\dً,1W:{QW'd1+:?1AePrޅ'.3$&mIeš B#ʹt<]9ڝ%k;<.\ߚ tyD6a#jc%]gՄla\*)Հ> =r l i rhUK\} 5}__*6TGY x\s֤o9)g=SU xGA׵j+J՞-%[8ds 1x ), ]d#nJbXikKa͍|>޾5Jv3d_w|Ge77JД- 1$ߊ@mTQMmݿ/PDh a5@Ϳ\ ;nAS+Q|Os~ 긳S/ʌC2t'((9==[^] VK<ʫ.DkNB ̼)DU_#'h8Gu{7 -J?~䗶QmݐR_PB1ގ0Z5r$K9%6U`hLu5ܟ1l>׫1=d>2Zm7=QD}:ǘ'Ƌ_{/,ӛkGZ$F/me%V]4+&M.-u`lg-H(i=C e=Rl-վ3#wPg:dž9*(B'myr1}k>aSQTR {5Xgh{1.%c"1ޘM~BV(VE(3ɀ|& sAn/u[Iz"`(9gQ{ |UJ#nefK7Mw9u|YR&QYQmy^Y ?vb6qwr$k \`n:|7>}y(N4?T,7) O #D*\Fuxi2sNtm Q 71kMі&nS+C;eucSY tCה/q {kD/%Luc59[sR1<&I CP H(KCNn8KXaGŵ@xtY5G8!Xej">\@MpK e،U<2P ;U},3$mF pw})ک . {+? $UV#]MHl$^Q_daz8#' ;i(M_2p^V_4dd,"R-ڧ@.Sr/"ӑ&y{Ahvʏj. zzEo{k'ryV>~3^M޲[SXcS^7Ȓv"IJJFP?f?XJ bёk,% ZYό=ft3s&Z\KNU=ȒcD<6C*t@ _/}~vKe]l)r p3wDK7`ZJ鷰JWp&wMngߖK#N ˳E^WrgmD..G&?j?nr\@06-В)3?p2untb5DHџ*p.& >-g֭00GÜzpG5w2 (jNJG3 -lA?H$3ә|E HY֭(?)Jc@ x'vGhpxC(@W9΅zyF4`Cf㳩0<'f>D30ɔ!J-Q9_[{w"D"^ʃu܄¾spW,^Ŷu 2FOj8T&pπ$8p [& ;OQpzRALj#e͸RNPmhf: 2vLQOx;XL]fe'g mPDxF!nέEzM#72/t٢o5a)lP:+M aaŴ9@Q#B-*7|C3QDsuXZވqs6 f$"~Thv6r$ 3=,T>^/6V4K)) 6tM%@Lu0 WbwRqpy=I#r؋jspK '0Q-Pzsǹ@ : )wضP,-Amm fb50_>"W|U@ tvgem v V3dKouuY{'ѹM0s^bݬldk퀀BCD#bLmC +OH$P:eqvWh)\,Bs^ŎlJr&' 9z_Fl[T@,'`-z햃w~5iJ&"ya[u&ذQy5Ԩ'__85kM:?7G9qݢ@pZdvZ Ћ]Kg.ݸ3ˌB zNJ%#h4=>5//KJ*aY2DOB('\uk .V0]4xdS-vk)ŲGeԝ|h dQODgyHƨq-f=EJ6]@&Sxlf\'i ?rvhm̯ D?b%k.:B/5Ųb#CtL mK+Rܺn1fl:Q\KTca pA3'ތbџ*ϚY[ŀcx/Lx~C`K %tX]H2PS08$X1/ pbً)>EZۋVSܾ-O2R3 /-| XLO9"Zu'ݽmPy+. 6(1>I4cc_KL*tE-|D hP$0Cz{L2YZ,g% #Alg0lL9p&RRKR6*zCD:4aĝNV#}za1&}q8OP;zR3D9!1R1+Li1t=YqHݢЧ=yw^փm 9 W ˎNu5D THTokDNCs%@~}\lxZ*,z/q=^g (CJ5xa[Gl&GՀ;<Y0c-uTP%¿yʴ4Pg;JF7 v$D6rlVu@^nH"AJ iB@39=JPy9 FS\!WXr 1F@Y#T+VyUWUyJ)8^\ʬ~k vLl'9\:S)4mҳv9 xh΀~!D.G } Ҥ~3R$&3v!Pw_F٣ H#;$X|L:~T(f5铿mgl6)h*+}$leyP*8DJi›Ӹݑԟ8._m8:(Gާ {| '{eQ ؤ.ieq~>l*%hUa :t' -V>&0+7ohj/ aRԨz2OݫtN~FP+"?8<8\{Ś߶7%+Ro3:I^o"4rY7Ed{)RDn?8c'M֒.(N1;-󠤕uhH`4:(uYÊr{Diݼh{=2vͪ, PmxnJwsr3/f;M1le]pl >T6O.^~Yx0|Flj{ mo*cFLǝ*B$;.ԠijXj0*/*f_ƒ'ʊ^oϵQd%vM/oAL!hkA1\D9Vf52}cg|z&l}+;ikXIJjbfW:[l }@D}:ǘ&_U=+;)#Pɱ*5w w"7#B`/tThMSKmT A McXUWWqUI{\n-`Z&'7[fsO,xDlXA7*;_AمR#5/o[\̅Kn\"[҅&$ZΓǩ`e{tD1&sS[|b)gnU0M^t>1PXzi. 
o.1C$!G8<[)*')(\jAa9Fw/h-+]Щe;`4LJX` QMyph5Vclv$ܕZ'"f~+NwLjkvH΋NVٞݒ2~30^+m]i'ZZ:t"=J0{Mھ#>B' 5XSњMk1ktZ9N!~;TZ]~<,*eN(Ku?#s>!ob-:ߚԋHMlh@z}:[)W7DVP?^YV3=j)گ /opfz0{<0!%9""o 7ʓ9Y"1ojr$j⽪ @Eh(֢*AwWa@iЀr%&ߌH0I`N-,LSCTK><):yd\>58gһe *]hT1,%<17ɌhbIaHf &1EV[~*(\FLr_~?:Do`}1PJQnl 2$aJI[P(#+h P"-hTL9)XuOt#:"[|}bH[ۍ mS:oIdI}r)@cGi{ ZɋQV/v ;zf7u >=L,z-|ŧcsH\+۬L_U^nv EP4HDy4~,%mt-; oR MJElZZ#$tp:x.x3oPL9NCڨB ڈ T>TFkWm͕8Zx-x {g=?߉VfumŹ;YQW,w(;Y9¦ZԊTSƁ߁&ĵW 󡰞yb9!yC3H?7ȴʖPz!ˋJ<OJC Sߗ}cΆr&c! W:lﶅ_^a-Ӝv{*e;%nW\%ew8([%1DX'!eۤ"#o*"k68]G[l&RvIh0A=CV)i46*[| j/{z72>-=ro," 198@NVQFdJאf+32q;0/Vg9 / %QHp`diO9&tK ŀCQ-eiHA($p)#1P pO&H-9s.]3 fkmQ udi 2êA}^aиnzFݠO' Ya)\Cl#IMF'B׀Z=D=0Mkp.xM㣿N '+„l fJySs/n)>W 2>ۀhS1řZRMI=q}^<2˷iyZBSxn{yäzMf ʣ@):ʅx x87 n#=\=Iti2pheψXa 7VYFWAH7sΦw}hPyy6oO I _XQllLԒT܇lg_ImW 3v. 9ػI7Q6T߾$-Od[%]gܖ7HM|!9ge1*Ӓȟ~jb6HJApH{Λ,qzⷑMXo.e)ݭ% ƾ3udL5lŝўzb(۞[50cʸi,BXK[ p-[B8*OA<書˭ݖ7F!`e;O|>%B?oyB/$a ,gR//U9lR;V6NNKsބ*t.eUe:o)iv!9 1q:z)˳6@>Iyңeov^Glߎ;]ԥVYν$=QW*PgG%y9BFN[gbH_Rک .0{fq<- Of-q)F9Eex4$?C"S u!Sa4|6gqbEQ񱹣[|O\AD_pierT.+g#92`#z#﫵g[[!^XbTBE! Y((bqGd .->N2*v/%ηU=FriۯQ,0.](=XCy \\.M}2i+ars_ၩ82! ˮ  ,ݠe>M>ꟑ`wWjcPW"Gz.jsX[3ir)0MmyE|/9 "b?2X98Kky|/^ ȋ$2}}( NAZM'UQP9 ߪӝ㧫^ vMM 'xXпazБR+qGM_:umakF, 5 |~c4IA,OΟvw-U+jD?dLYL}z:ly\ Zmż}ĶNl sJAF`e+P»|[S֊L/ Y2ޟ`otXFtiۏ;ϫmYCI);oL|]r D6S^QEQ"?T-d Ћ7v胒7vY /}rzϼqfJ7,^B@x;'3MViO &spTW`= `Tsz:+ !݇7 !Rq1tM23 zN-ˡd㔪`Z>0CZ!*x5d˥P-0Lr4Ih:^B&%EK}HOvAPI3rQңH}VQ[({bO Rݒ[ 5yDiWm=׻bѱϓzO&NDPF 8nA_ۿF *q'G+h#"7/5>+B3֋͞2Q#IޣʭOuX.bus 3xPP=v%&GVk[f5GwԄjrZ㚝"0K}'N},i)*Wy;_{+j% mŨ wBo~&Z %+ X8p? M;'MB:#BDq6)X"of9-0ܒQr=>xe]yH3n A\m~P?=E !y*~_@ rKR$.&+uti3|J薵d++*%v@,8WQ%:Ɨ X˩^29!N䞕Am9j0k˰pm\ø\)n,j:!Jƛijlbqg0u-xV @)*wXYIQ(vxQP 3z|ӳ.lYEgsحsjbsMuXߣQhm~5)w eS ?Dҁ,IlxBWb =e vLTu1^F p}FWTʆck] D$[DR֊8HB緢 υth?ۻAnI3ez-zG}FYiZ:x&[F TV?a֥>׽\"T~՚'ݦMA?Y ܭqݲ%ͬoXϬdկI//S +)xՉ]wT42.+MbM+vOёv\sa,v}G2١sx2~|ݓa(>l>.e?yux<Ϝ;9iLWe?jky]l 4]qҾ\}c %?/j#w "6TFs~"xLyg;g?^pkS+{r"M^%Y?w73BՓErϫ%Pn%mOw5xՕ\,ot,er.@BcF)?%vp^t(߶Nq>^hw4fN@r5&ITpwXI >KLʸɋ+<7: \z3Ⱒ߼}avx[ Kf-6#$oѤ֋9rANNk1c8W{iN##(7P06QB[c\P*Ƀ~\q7%.3]x}Mʅ9*2>c4\!qM0Mn\9O`RdO:7チs Qk (yŶ^-&a! 
7DT >1ʏ6>ר)'N5yy `HGT0ВkzyFgߗ:Ie[ H/uat[-\C#6#3,_xu diZ]DGx1!5&2+!ȳo@]obuV~[h-SoKLbD #E]b^DCAeQ0[4A;GVU={"E&!{esum ==E1_>duѫ$Ԟn>ч&&73.aBx?TMeF厺xǕ G,12,qs ;3.YJ+@ [δ=Cnh^ډ}T gNL(#2͗+a y蹚uU"Ez˝A@<8ؕJ_Db#&/XBtW3pz<ߖ}wq{G>~`Qkk(4'P73m< m`)mn'}ɬ*В{:nɶK`7,x1ԓϾ>HX@lpN$io=ط%羕-qQ.Hq<0nd2ĭ<-CkRYIz7DYoLqpfV!Ue ً*`f԰HݣwSA3N};M(<5ϹۆHJ: S+Le`CQ3wuw_oe>ʌNTN_( 0OM fss%>c6 .0Ľ^hu`̢  -+_SWeSwFDr':M`|SntHfk׺'f.a|-}(p-&C oqnܮt==kSShI/8E+Z!ƖQ1r6i$zq( k^e u~l]%4E\#=LCX F&&oNo"SPJ~^\K|\ثst@ EdhFpmC9i?*5 !ܳ/x}3thVF5N49R[@*rJ3 T͎wMgQz#%uWyc &"N>PG0hf wQE.d \?r/b6Ngt=FTƩ{lkKd|?f O&#,< oؑW|&˗+I3\ۀqK|/8i *bca=dTKi2#hkL:'`kQ!Ao X-v}@M&WE~˚Wt-36y;f+=V(yj%(qAJF ӂ8:( 冷+k3R*YJ9# J.[E?]d)=iѿxA_n@dUKvžko{1I"cs]<;F,eDK` zwT71fJ Z RqA0+1sKvޡ/,r0!p$xX7{hG<@%ed% 8g<;y/F']L$8u{!ܓ{FNM `_>fQ rV /')O1M~vN!Zu0^a3^cwwQs0ң0%5G%v&9έ &>">ɜ5JK/[5zk^-䛺ɡ/0B`̧r% sf hQumon@Lqd@ E Q^o{h endstream endobj 227 0 obj << /Length1 1654 /Length2 17208 /Length3 0 /Length 18325 /Filter /FlateDecode >> stream xڴeT\ݶ-S[pwpp'X %=xpwNpwwskժV>1m-*2u&1 '3#`är7u3r"PQIMA6N ?d P6}~x!Pd@jx;,T@Lfnf##D7LřL +M(ؘ݀ethG>_7 Xp0qM?A w7?hA/@mreǝz玙:m;9ٸewlŔ䤥5>ȤGf$6>NH-$>X!mG@N,<}_o-ܝY4m\܁r!fX@ܚof `ij~\|L=;}1} ] /KTSjh Z"(9>A9ZJ@l:{xhrUrru0Bdn ~h_)@n?f`bceۇ$nnL6 `QRTh7)Gs' G+;7C\\_M[Q pv,\&'ךtuwB*?Zc5Y\6S(\mY?|hqq'/_&N^;;㇍b54G+{@/9Ҽ@mjcxiTT4IΧx6B<ɜ-rHApS'Y~À`"Pl UmB)l-f͐Aett9ķ4G$:O_B{βrZD#h'\j?b~5[1 ݉r 0ręI!yu(hjq$gVKw:Q2t}*o㢳i7YEjhO/>jݝ6~֑/bǪO}AP q_.x܍%r#U '2Kc}{bRS4Uk\ vǗdUpY)34{Բt!Y\-Zvגl5K?W'\Ŏ 'd]:˗_ 4ZVBp(dN:ʜ $(p AYC5A۶Ez?J#uE:D:CݧmU s0h}?0>iRؔ6'9%҇ tx-NfݲΎ'x5/>nnnA áfGNQ =f#|r XSN)#i&$C:hRߟgnBt& &<2+<}/?xRv8B.PZW9$ȁ!^~)M,tS\1hR!PZTn1QXc, 0H ^JQwr u#E4n;*Kg!p92wѰשׂz"W%/I(uJ Bs Yd$(c71*A4UKr`GSSS&_!5gQK]E͹c:1RهncVeƄI-F|Uq3vw3X&a@4Vhq6uoRQ`*V~7 %ȩc:܀K=*nV %KXYo2 iAOsSR?& 7"OUfǦNnjBc×وpn;7ML,?(MsR(CQCF1 `ɇ0=-.#PG7uӦa[ +.~fFa>^>q,V1zyb5{z:k(oA+edNS{nSg(lp:=Bɤ&$$ ʞPD_Qf|SJcIlq˨jSh 59Ѡ|/E|>U044:kFGر=̰ }hO,8Nm &_h7Ĥ[1~Qo!ćoD$\pPG1bI| CI5<;{T)'v:7,d OS0+ 1@"W^LISc|0*Ĩ{r^6)Lzl̹bATlHL Z ?v5!"wȻAB`\ J'FVncNLg=}|?-.ԹO,zHxCæ'+em pDs W Np^2#ũW> 8/H#tO uq?~ Z:VRlEk_Z> ˾u: F 2a  x$T'k- ygiQ'21<9-#SFp: &aC[MlRS]x퍾U7;@y!Y>' %@BClRI8֫ |k9_LZj :Zy,aa3<8锃粥-v2D73"GD]ǃ"z8k~-?FRwd4/甦T']M kf8$%ckvxQXTRS[ K[Ӫt{rz?6P=s3?%jnN fY˰g_1u;ւ3N^g B8^S#% D~6.-wWV,Ų7H,WpQ[֣B_(m]5Ke8՟kǙۤǡ}}m&Zj7_*- mm^4}Enk%,` ""4T1Gˮor@%3/̩+/ )xs=bw+cNq[Y{fSo,V6)Y20̋lɐ 7ͅdq,f7 dWB ZS\Gc'4Jh'5':n=m)pƘ1h$ G W[aRqB7VfHjc`X@n6TkHY"fܯ #fq't,3O8j@C'F4`+V[qW%0QIͻV6-e}p5?RL;%i9sʥ޷9YUh[BpԺ#.~?WB2ݻopp.IW N'a{"d#B!UʡƕUQk9PJp[kaHf.4fjVr&o\=;E$}}RL?+&XlJɁZcw.Zm Q>z k\g7S#9`GM%oaߴ(TrA91B E[$S!OoT;DIh&f":i2rQF_;G''e|Y{hjP&W}-;W0IcUK=/ꏝVg5k9sh/:CyBQ+KRX<8w{żF 5G"+%m"ޟy[ootYF#J o|уiJ[M|(zR`m\6ZQ9(LΔ f)5Ab]z3uQصy{t+bgG [~ݗP?EB=Zl-\ f(SiHwS3m lYf{{,Ey#E>ڀ1Zibt*m#?'cA#lE@ؔ٩wL"=]K:  ضc51ȅAH"TjؠrOCaq*xAYTR!Omzb P\]t.Z 0&yUHHğFYB 7'HxN.1Z5/ uQw%e9z\YSL4 :Rx^gʹ;} ~L&J/61r >&z,}C6ҽD_ɥRՅ%-v r n(ʙդ(AڌS|w= >ׄeX E|!b;=S+kh,wc1D}J#|H~[0xjG!T_[=mWn)+X`Dlkg>4"nC⿇OѪa徹;#9'=A'EkHpQqpusH&?WH/+}Q[I΅Ʌ}zt1A=,EBZJ{+Ix?Gz2oVxrP]&u.Pzķ)["ĆaΊPH"T.+P̂ NԲяVaF2Yjw*a$[? M}T)?B~ γUҜGJSh/ZQ ܦlpM^MD)l={_U{^@#/)C*gb咮b /xLX(2ivco鬏@Xi+T~3qk"_wl/M queHE)o/RZBUX^`XجPtW*!;0l'jhiD3TICFj ۔#ҸK(qV"fܮab1G219Q_7kķrXDvZ <=VUs6eR)2芌éEiߵ zj-TwpW,]Mc y"zڊ)F 6|"g`Mũq>2xQĖI3~siDLn-+_j 4;o#X/Y^l'g:?l.Gو)8 F8am公A1E_5 #u9~'o. 
hLcl|M3LyOcz{iPG&IKũ2ЫS9B@/i-KdzdX{KJBr@Z[uKU}%jتF:Ə!_mB*\Ee&*&jߌfF?G9ṗ`#xu^EhLO#ჺRng9 ,̰JQ[s%kOP OЁo'ٗT16dp")taRt lRQ~Y{/k?ZݦgLBP16'}<'+MJÅZOjBG~WoQ 2p"Z#fxY&sz#GRA.̉(JT Ŭ)[3< 2(î*y{B1j&]f0ɂ*uԂp VNiAར!m&?vfx4;*G1F8pF$[u+=j|pDjy.(0UVp}kY) e|}!BbB{GU4o3P1?Fs`-ø1r'qKp|= ¾*da#o m̥iB04 \-ךR<⊚$y>چ0>Oח֡T )U#:۴'.xUՏ+@׌>x[h>m2>=zbV=Y瓸 W0g~0YΞ5:;%d=glw4*`'77w@ޭr21=bX?z8Q*WJE^6K%alHs#\gGqZs`K4[q^!Xk3|?)h 7Fm5\lD ָJ{{\φR 'bJKe47#0dϚޭ0sWm Oѕf˸:Ws~#gy٘F:7Fc&rE-ay.[:trAISݮd)ؗ1IɖxJD%a:-`k*6\''M‹4fAYN26gm°"LET4:~R &,@T +fιOy1X(}S>k6=Ɛsjtss"ILKqxa1#HJԋ!1I7t7fJ}ldq>iIb1*1_v]6r3ֆzeK\`lVwg~M,[&]6\-+2$C"l2"9:,{kj?ſ œ7(('燢'@gmޫMp$u{ K<*F/K״-a\o! _F{bQc$O`b˅Q&T2n~wVj7Jk, ݐ8c޿*(˺awɒ6X\FE{*rA _^Y4?^^y5ڜJ8~U 4븁Rn)p2~\hIJW>Wbo)2GFČY:ざoM9MJS_4> 03+F|E|&6XO_[&_>%b@ !0FZ0"wi/^ujE޳ fu /F!n?f64l6`Ua-*}jԆwiP>KS0yk G4Azc(͍Ii/gU_?#hj,Q55 W Ѧx@ -ـCn6qnk u}=u_ T[mE,)(Y3 z-SQ6/;YFhizrNKʹJ˵\g%s[V=3((ȫPu2 5H frX Z0]wRQZM 65ؗRY/3na Z] O4낊4g#YV*?mmAˎcL1PqÊ~;H&k4:׽ܡ^#V®||0GinAĬ5\~Z/ůIux7Y6f VZ΋,ƣ>6.IWGuύ m#_ebTCYC7F(4-dB!la1 SMb_yu `bLN+c}t mpԱ-Z؇ӳ>y&v:\ q4'WL4 k7ᕖ\VOW4=Zy{Q=VgK %JoL}uO>Uͦ|_{0qQOJWu &XKBf;74#c/ MX-k>t:"\']7mt Zr^f!׽% TyHpk4zu?|s51M I N,M=X*_XU%Ռ!l5ri6YA+Ο0 wN=jR r?=+܅< M؋? w|kQG%-^ClI ^Ub7B,tb_^1W/[x^=51LPg`R~?S \iUy IlcmApp VH Чץ@cDƖd8"k&z[zN Cf+v/xGK3MC 'ML$mg·w>U*E'; G^m9dueq/rϊώA3ŭI7e5 OY;e#'A; #g-ϛ])!Xc[}2U8iGS,v(k6l8+̶|8z*ZyXgxΰ J,gyB;=pciMŵ܀Z.㫬Cy7/vj)/)"uGdߪ~אkaPt&Q5B0SM<(yv `8VR=c֊HwIr^sEV-ꩮwIQ茦]ݕ:CeKad}|ʗhjn:ƊoTǒ 72xvUcp&B?!,ɘ\rq"Th97=gTwu74;S..*^Fox%+u$V_#SQAAh`0KL- tp-0 r eC4dw=25w sn`;:k˔KTOh\^ u3"S%zIcOx 3 b6kf Ijt4z 7DSp `@Kf\,(4by ղ);ATb4I}p WbGoяG|oSi?ܭz\Pjcڽઝ_u TV6CEܶxnk{X_4 ;YNvd+/@͸%WB'9h:2x+*ƾ1Ȅ3#Ysh~Tz*)f1y-ۯN!y/]ę^e@BUyawH!x ͰD -S`E׾\J\dҏ)&lB-miB![ASZmn=o$}@^]0abv8^'TWjض f]*Kb/j{E>xOѐ~e6RTA_J)'a[I1I-)Ǟ`{Ryf  6yT #/$|_} #MCTIC qh4\XYv1"WaC-h>S\%^p`|bD>+/;AX+ZQg[ 8>VZ2ŭg仆g1ds]Q"g535)5˟k\˷(RJ% o_xt$mK6Ņhc˿B gb.)3Le50¥Mžc yq4%D֡"*jwEOݧz<ˡI;wP8c4,?763DFQٿi}?:%*n|R^5XUº)M8>3rawǩ7s/| #,<5?i,X&&Ze~d|,Ui6VR9 m <^lvJ]Ep<#e&k QPE "Wb~ܾTvhͽ9$_t}f;v(+<_OЉXvd(b~ 5*د)Wv=jVȹx6Hf|m d <<tObBN-ɿc&ДӘ7,şdY^K. P$ѴY@/=sE JrL}&V7x[n:S1YC~G%w F%kM-B閞%ecJUg}U;Co,TJ߅𥺂9p;<*2g- >!W6Et #OGOPowǤċG$]WfQ6VI PVA•/uQ+1;Bwp|4;.i>5"G䤟T$gVV޿jpxo؁yQ5"HϪ;y g$׻y`lVRBv_V?7Auܝ7ެd'erSfl%mxLPLZWrol~}mNݓ pS&,Z7`1JDBuKc㳃~YJzͨf.gn q(Of֝T[Fb+1˪vaxq8FzZJ*'IO/B}zq!j-aIl&d&_ y}1ld_ߚ&aj$H$s(/g,ŕ+mvXט+dr'hHbDq't=z0% ZBAr2/o.hMP~T/X/*~9#D3E`#1eck hq0_)*EՐ.S^`Ȩ/0шZ\oNyihl7NwۮYՃڌz5l ηVi_JT!FhAs?K d &&)ya0$8 ;b _A,{K0"J UR'x!T7A5{k1+>g}TD:=;Ubwle~wt[%᮷} qqp?]*,7#V,˫{ևsE:P:)"<&{R~[ap'XpNjsG{!"Oh}q\A馈2UHAg{K)6ѼbUk!R}uOA{29mCxҒ$_jcl714o¢>gϦ@fha$dAkc4qqI?mSGͯUOb Nq4PRš yJr $dtzWq$֝QZɤDpJ>tiձm5jDN v9t,Hw̸73&Ѹ 6>_ s5}_%hN:A`Q>B2@|Ƣ;Q% +_*c烉 hXD+c:/[9VÅn9;ch7jGFQWD|O^sMYl(?7s3tG8/v5 Έ]ɔo-$SgxiQ PF ǚ[-DiŇR(.J70KGau8hYe]˼؈Q4S6g*sωONUEy . t.ꄃ)?nąK%vlYfwQM];nMgOdNjesdT}h2ywsBe%^kWNUD. mQe_ؘXڬk2`yd1{wܳ8Mk3 eZ skʣe@ C9ukl`цFXzL27i\A5QќyacP)tLq=Jz!Q<RL]w4͵%3 a~%ZJzCnְq ڏUX 4lWLovw<@ uzKvhoDԦXaN{ 4mS;qZ>\uVit Ld~/n+WEͿs 7U0<'F-En$=4:{VO-"۪Cs_fS '}qqls^&Í\+ _)z:\t5@-(M.Dzझt>7Zyֹ+befA_&Q.sEs*˛Gw%{ q d^lkv sW-n5F㍁KȌ,">eoU9{Xpm~Н\͠~6ǒpYj \}c6ᜣ8Reo8?aeQˏ!-7 %e3iÚDpĦߦ_Jzrd3MQlaw;S_:m*0ۖ{V N) 2~e‰z.:"%E`^&lY٠3n2ۜB8Õ1~$a[K G @: 9/r~@o|9n? 
'$'4CY$~'n l \x_3hT0wVX̽`QZM-K*">l4_kJH8#Wb$9c^y"NUG?DKZg(jAf&AlʼnF䄙cZknHx[?}_ <.)dž &U kh2 FV,%Y{t> stream xڵePڲ6kgpgp%Ak,8/s9^=USCI(b`t12TYYe@ƶV6&JJ1g1^p,JwWg /%@ ht~WL< @#@cPvp1V@w1GOg+ KП쌌"e8XL LEw`458ԁ 5 U5-{`5WGG"!QT5Rj^Ձ-?y +H(K29tvQ3ڻ_ 4 #3; ق/~V.wg3Wa\ ϥL.?N+K.{!@bmp#_;c+{d ru%{ͨ&:;ɡ*_'dk7flM]\@.G̭lػ3+d "2jgϨ^{&/?D<,\V^{J؛9ٽvAS>q:=O_;{_ʛ:2k[9e]o`@SK??b?2z;:8̍m]V7oc7 ?+7ÂWt{sw&kPiߧ`4G`Vt7?s_$]mm4]3O2!Klgl_:+I+-;-FXپKޚ6@_*{9 `VӐTTR?󗙄 `llo6z1f&{л  0wpFs\f?Y߈,o `f0Kqx̪F'}i/ |wxjh 뻃?{fw@?{<9߉lMT^@&?{/rvjYDl>f~uf0;ןp+o",;X6INV@S2VjC-eL|& 6dR9K%ڗhS`۾$UOܘ)Hj2ie*,TtrLgƷ4FxۻЯR>UCϲ~rXD X=`~1Y1*qR,ErFXDzWz%hܨ'ٟ6Ke% ƍy`ށ>wWSkz-zdmi`[ZZ {T"DMtf>p^ѹ"@[cLQ8laX8_ՠF 1 TyawL4V8zPԜ_#K1!cX45e\*fDRE\X *2'ez İׁs;/eff!zR-)̂^77c}{i#fn^^[+{\`jT{rS=27E ?D?TRP-V㸉,!N0RymG*Pۡg"`xWVEI [6Z{9b9^H=P.P}Fj )Q k {n0BDͦrպ搞hCx[w$FtU=A} #r(FF܉؁-ԍ`~I[ KdpC@ /es8>3n6"nݿJazM|vb&SAH )?RC}tJb:,y+]S7$ޘ TWWcydo2HFOsC=/]w(8.$?4璚Wܘۨ -gD[yz^uk9Զ(- {cUR,8j;3`^0f^u%ph"exMA5 j y7 y@6h0~87ܾ&FQ|hu`nkV$sHf=zH-gE%ݵoj!9QW*̯V698V WAJW3-SiD(E^ Rwc-~p!\*>꧘{X a,)lGuUR .V"n_;s\6*ķF0dc?i/! `(ۤIEqO_ysK)*/KX# "7FT0r1W{Ik1|&88M;kmᢏZX_nof}f0[i~j6|.6m2M~$Ԕ*BzKdJ|}6FV.D brQ J܎wK3ӃZ\ysC5}UQU1yh=K i[`{?*1 vp0 kV] D L4;wCE빈5Zœ]KvW"LVSgt1OtEb&ɛH"tFv.9$(@O\3ԸAl)hT- w'Ž;-R)9CO DS G5Agzފ(B#b+WWofA"}-`6;^@: ?9E!:6<p,dx]e=% 2\=WGc 0 }*-}:Z0? uhE"..5E¿h 3P\A!H(:i&)F tL60gY*M. I@ y=Қ1#_$ԂK?Fؓfbe~{Y[(25zvK[g7'c]H(}BoӀ#=рrӔ,\̕&m aɠuO Gf0milCP_6#Q&'wշ%rhxQTBV}(*PDv0&q i'DlD@uY,C$A||+<jf6א[q#7OKR\3wݽsG` Z*.lxY4ޛĮ>23z{}'<{U&ݶ7eCeuogiЏ$- zIQP#*"T#⋙jtocYGv6L@[c+ YTZ}:)з;Jc7&O!@@I3j胜n9&F~& 7fU[k(=J洲$6 ćZv }r|G B޵QXClpJfd&-Ewe<ʞ}Sa6G=g7ȵ ZEu3%)ML{1թΫ_wJ<\p.I!|j)V)^geLHCLM67V߃ x?oR.tE!_hբy_$,+"1"5ѝy)׹)Nm)7MSiEz3%ԙM\r3"Z_uo7R֕&L: 6}gIңnέ˻g3BJtƷU+;aԬ{c;E ̨/\9A X8cnU"cc/1|u#Bk+q f@](o6ĜK-zßJN wnY"ͳs<6~B|NXnJM_ސ]CڮD1^fC3:(Bm1Lגo'i4}eN=?q¼J!ؒ9^u#"0͔t1>+kYYL,$g|f?bAlW%'rԲǓ$'4 P_/n9 G}1tH+3z$)iTZ~"Pq:ӓ!632L}QGHCP#[DԒRg ,Nglho0Tyn_H#f;Fuq(~vhr $q@W:H.cV,#Pr 5JqkM{>&c/OiC7R8GFV%A|3 ɗ#%:Kt3R>X6X2L "_rZ3%7EP 8#C(/2L< ֺVF&I5>hg7yzƮ'>v`Pܸdby-3`V*& ?w"Rff/瀾ğ$Q>^NfG)uj 4}<G8>y4\mV5uG2@ԋ!ߵ-L bfiXD/9 ُ C?Ri^yť8dV8V.^L<P##s/t?Xbdb0&˙jR LS/6CCvk:fD+K3yBɓ780#)<2y* yz*c[l3>dH#)Uy5%֣{Y)o)&/:SDc vC+:[!\*0YLfKG%Rp "ΐ(wq퇆d.$k⭓ҥVU5U:/2Z?3]z'5l]>_}X ikO.A[Kҡ/GïyIѩڔ auAY"a?H$Unrk0^𷀕1jro)G ?Wðfj>_,6za+.ēBQ'^Sm5&|{)cA: "߷7pU[#⳴#Sem^ &:،qoߋxeP`B$1٪TB2"81D#ZBb.-DZ"Sg nǞQ5ePj rj.Z:~,ړNSoӘf)h! tXm)#F'\.x$kVOz߇ȚB^R/BGE/Fp} = b'%iPϭV\E%0ӏ˫5[|U4.e` K4lw QNxƹAkcD&)BJ JC5qu편NFqA5dmWˁp)=JWN!(x VQ2DU!C7kiז/ Ѫz :GQ(!yB&  N~\K h"ؘk#B3XT/ ܔz˔TM=#1 < EU6SI&!J5#j)5abVlagY(ԚYF3[w1mtxcX$x=}9wdg &&^MvY=qP ZOTׯkmH{vwj %ozuuyih,wEWvH5_:89񉃤3GB̎GXO/ 5 W2e'෻#yӕ|~+ƶKy~w5'6):iL"9O4Sl;_d%]^hR 6ߓ&~"  XۯAF@!@3r896|>rvT% \QDm, Al?(A XGJh* * b岖,Ljs-T9 r:ZIeۢ7`"vLFo`kC'WYRG5\cwSu~vάyp MSu06^Rl4>HE^=j2+ss8ӸiA Ais&&kn"Ϸ(YOGxu UL)=4nvw{'P %_(JU`PG4Ch;qbA'ǩSlY},%o(&DdyJ77 -k,^T\b|zqr/W|.3&K|S$8*Ѩ5L锆b/b@eElP!04/e sk fG*.Eٲ-+"Ч^y񞓦P9#ڡt9}5izgTZs-'1y/=(k-"<\ [fEX%>~9IDlx',"?Lm?&e:V\]3qhԧ#JVLLD2*"Q-Y|Uݳ18u˄Eށjx-o9NtaĩJ>?<gˋ>d Fhjr,[DS8=r#csaPM))l%-7Ŧegk靲dd:CjU.45'^W]XwUt rG 'dP*T@ 9(e`LW>/+G,bR_ *9knK=;w: bͰD>d3>.h4/Q(|t?g xRwuQzDH R쉆{NArƷdCtlט\m9َj׶5|VC\K3ZCgΎP:=mq*diCU/[ I_]j2YRYM;^`ƍ}bL{BR^JDl_z 9o7Vazm_ljvh>M.;bw7wq>b!߇3ܘ!,lv3c{f %:HS@|kZo38yz&lT4 QU!+B#TsYj} dtPo' T}M>`2'2S-{^|3(i6̟XXOzhV:f=IUPm9C̷G;4bf2y#3{ 䜽gXF0ۉqM6MDI֯蕅lGPVɥ;NY#bq(hZXqvQ)^H&=Kh}[NaM YsQUGc2X78ؚW7c[OYRTC|$‰Fδ,~qE庝Ge^Pa.|M hg[3,#^>bV^ȀU#?j9*uc E|hH#3E*79 I ڋ:= NkRPP%H4Šș?Ec4x})<8_oI[!-}53LdZ3cv{R̰(N^O??G8,\{G5cF9=fҩ$?H YqOR!so^y=CSiK 绣 (^! 
7 `8r].^IM6 9Dhd30զ#6qmIo "2en:wZ %yXȄ2.J& n+bS3u-}H9{VS=sT[@XMvoz+RhdkHP>8YƋۂ `7 yQDo=/m5FnxzMKbsTD奵foᶃ.%ۘ"<lDZ?ڏ0pwؘ l:T3v$%b#,eL֤R- 57.r$GeS$vXMl }QBzz_ܨx>o#rE`*b/unW) HnڈKKQ(9㿆YolŞN%ʫ /8j$h@@P<{_&bĂ3wt!)5Ag7VMJv0heK6Wa3&],Fyn q-o}5`YN; :8 =8ue៸CjFd6V/D8IyY?j4k{u|6{Ԑ3땥ܞXZjzʚ+aPsᘙ`"OĻxM9l6B'kYg:-pN󤚃y~6&iQe*) {h@=CV1q&kOROc5t[KAH)U`֗G"^'`tF@0#IFō]J)0*,S }Hj'K K"/igkIsڶr_ Qu1@lӨEQ`s`^ m;˸rpܢ[OfpQgt;9LZ ^67pihI$!*HH熩kq&i6t!RܮHNJoF!(ء/YJ{GTX?4l^t#pa,f"#*%c|$Tej0^'b=x#%A<2R& (|]DF;d:̱);i$cqՑA]=,T4^Y7k_+tTU;~n(,cg?ƭZ`,~M *XBnVT+_&]qW/N%LlqrYy]Fsao`FJ$,ҧ'US6> K5 |P|F3#ܚW~~: ȷ&3_8.\&PqRY^DhZh rbJ(fCO^BJή]&zǗVYQi-P>H-fו qޘܵP+n@:@mU׈_zPdfmԚJf*kKc@eJ#ZsJ#5QV/p[ >}QKHbg jq]}"#BxBoЫA~閸uI0L\ܙ23I<[+ﱁ [Nsf^+G&׵nL'Ϙ9tC͒֟>wu||,1p"A)xNlmekaF}1GB`Зlo҄CJDpJY~t,P''gzaFϩ#Tm)s6z5AX s&"_|N< q:'s%*2PԺӸV1s{)jlSf`#V4_LL"w}ؾv1b 7w1:uUܫ8k=Dfάer t `G6 ǃShXRvߔ H*Cz u0f B:`@x,QCfy햖I]]p-[W(uE"lHTi@pͩ-^m`A4(:ǹO s%X*f=6.ǬfJ`d?(bop҉Jm[(qQBsQbJ Z畍1}>?PJdžl%hAʦjI X&;ɇZ-1ng(1Ϭy"<<ܞC > wwmC0G7#9HS_^dii&55-IKC+!&#vu{r(DW0?72Bz_ptY IxFL jȎ?;|^oXN_].X}NI K`rYߒ{rC p'm$t>)Zx-א1f&lU{ZT}ܭ kuЛ BnX"o-Y!Wd_py]>9U/w}"ҢCxk1W#|!As:JMwYfjtj,ho7C%Σgxg<ҟtx"iā+8ډ?+L@|%(+{B'UpwF2o+='x86Ml]H/}D\*lg?m;84<`n/Shc^KÚ:Pؗ.j`TK;dyNO.Ji\Y$OWF0lbR7R4ڶ$ }.[塹. < Ӯ!͉In%L8V#fET5kvIi.oH0[\"TJZB#;npZ}a.NT ag/7n`.:K?"ݪK*%#:Kh`3Ji3H5fP5XljK$xgWx|iG YExJ2'wFGHʩsS(ZydI*iiq_ONO.muB$5t M*_=DTb| q^ ߟZxJKU[E:%Yq:8;aBq5707j^v>xwJ3Ƞ}CE k â̏د֛嬝Sv93*Me֢U]Xj3~+Z#m\6tl}4;8pˮ=L9s[R980H7cbmq-}-0<5 К> stream xڵTyl LMLpzzNoj -HEOGo֢@0dzʴg+;YsT.#Žj*6C`4Es{lg%{'-";\/ $l)TUc£,¼|yM KYȽ :w`^\]Y[>GVNneZ!;лNxYuIn%u=򣇳;'UvZ<>(-c"߿Y؝MqsvMD. ('?њЅ:%}ԡAjӰTcC.ٻl@P ht%j1W+E-/M^4:&,u=rӤkjYw\{?V'~`8Sө'Qw7mN#}ca]@_ztSȴ[R6AzEm[ϓ[9)U- 7 Z=r4K6'5t_8^OlSlNˬlb 'I_"e_v'[KEΉԒt:TE(-1'<ҋ A@YJKm /ءL!#}]R]҉<ً bvKK{Zw A%r1 'p]/BSvv\]QLv~ǃ*ۖ(k~p7B[j]ez˂WzD{\IF5U?/ڮ_ۍ7r[z^F=?%7`櫹^Q5>kf|ڱ)Ty\lGJ rLD]\pɒ;kKjmӄݺI n;[`i$;3c7zel6^KP//Lj}{9?t`Cq0ElK E{nʗFXWITxëJ[W+]^Y*{ЃQ '+_dAjLYlf7? !ѱeVT+1n|1]f_T!'tm>0]hlPȅrPI*[Z&cwywV]%jOH튉jopy@;#+>F7Jܑ9;0-R8Z5j'-枨-J9VnvaEqEߗ$GwzYu_2r,Nj`0,_bQ-0W.]{a[7p-~jqoNr6> stream xڴeT\[ !Xp/5 ]Cpwww\~5Nͥs:uXQ^(aoLuff1001FΖvbF@3 7<9@hM9@ePwr76rzW-. Ks g{ Vz?x0dLݜ-Fv9ۻ@eo0Z٘@M @RYAMQ==IĄU@u:OU1dNW}n]N\UXUKQW OF 7wW3_ T{> fF? ?;@v?{ZCvymjwY8;5wjwjNwmN&rx_?;G!zo<Pk0ݑwW 8쭁43rY0w+""^llzmde>j_ h`odR+?U EpZ))6Ն-C,hO(M +$°yhN1Ui+狇$.<Π._AB}$U6BP=n|bxCJ"-mY˅r>܈侄ێ4i#L3k9*݉Mu_Pҭ Tk'|uaC8R|,θA^jD25+Hf)(~ +p;-vExay &rEDVp zẐ"r^cJ:a7^{PgTZxYW=('!!G41tݎbVsfSy8Is'e.R%^҂!=`OPc><`S~Gӛ1-+F2#ʉRVſer%fyOe2^ ݍ؎Rywb=pɰRp u֍l0܂BEp3!"W~p([\Ɛ㜅f# -&BC<҄2mָ/s4OLY0>uVaq&1vT.P!"p&&h6pW鐽_pl?V D,V380!2L@-zʝAEz,BY:CLL=2I[%'-YG4శ ƚ!dxLɴ:yr" F2ۊk1UO*"&wo ^mvrPcBO5gUz"4wy:sŵҴ"f*$ Zمqdۧ;['w6oѼYr[[>u7:U@A[,?V_0! UI5[d.HMx[in:Q=LӉ^knv}jx-HləfB{EKA%i?ߗizZz7m$.x:Z3*џJ7w&,hdRT+- s~Fd8)B ᥯[0edyp% S5U$z^' _IRrf Z\P1\:¹5:nOdE.~A=]Mibp` $G'k&v56^o^Bx^uz\PQE/ʼnf?KeOaSk6|N #0SsYyVIc`y%e8ys8^@ʕwR3F(?O|]āk-:±(vXҁEVFF%Qz]X 5K4!j}!g d)v襼?L8].}wĻ6dMuD.z`_)L@/ ]L=\& Lar(ze7]v1`\QIzfB3s=L'V3g&6 fp+-s|Mkv<|o_(-Isgx-d;W6Lrj "Aj*4ӍTv \0Zf|1VY3o"L藡 I xk= $M`y#fľeSԽ{lo\>۱;iC6_h*,[f;҆k!Zm µ_F%ݫDjӋչBi}8Mɾ{}cT]-!ጰۉ"/p@XN13ij6ZdU=oHN|J%Ta䪿hRnM a ,|wY(5-&R2Kcyeb6qrNZ2 .'tAܗ3D HʅYzv}U t4ָZ57qܑD /R~dSdOהl͙eSYRi,ci9&#sΔK4Y. Aw_wgٯ8ğU wr`'__D&H S8tSH' >o7>j 05#g+@P0ձ-:3Zjp>68aXHj\he)!9g 43N"pxj/ڒI]Ҿ&lqMm}>%?l3dW)ym*PVSW\ K) TKA8՚Rw/2{o^Id4MؑN7E3 \2̟Î?)Z H$nW.x!ZqNbD>tξ?Mqa,N~"0xlRRcjBX-Ă 1uX]l^H? 
à*-pT&}S_KVK}~d&B+XA4x^*%XդRh9) >YDF@}H4]?EOjQHRǃJGln+8\trwyArjJ^?綾!4Y#\U~siR7?W&];&#i*yxfû0 }@ r‘L dx㙂G\DS*fr=W`=a} ͉2lQ}Nԭ0R&1}Z@'g !` yIvzHi  :x{iqʪmzLH+jʢ Pf-,q )xyWcgLcj:w3& z*IKx( `*Ru;4"ۃa^O2ߤbX٧$3%_r(ã|6"wHŵ{R[r3 ]7\A`Z ǏւNYK酊OCUvbnTw9D_ט?ROX$ f0?,~-0%Ym(A|R"J.yy\Bmlģt,DOUZ*(es[,T2bJ%2BLi13ꪸ =CߨtF[)< Oh+B墸Jh5^/GL ITI̴-%+åg:{^CXw0!MN@z_xiA Ә\}}Pڐ~yB/(9̗o|_ᠳl7p5zv5 Hs: gT!%/mϳtuTÁ b Y0`ETgXua|Jw}ceBUj7⣁ Lvy|] 9X>}R?9AL+I(g*x G*mk]>S %bee}F~Su/aN]>׳btm?wh=w' klX]C*p!7tΡt<~fPUۗY<띊%FG?9oA:AHR8eГcVS :DQ!Y'OF=Er"^R[Ds^=ו0gZAO_*[-`2羺<ybU=l!&5⫯ H&Q, dGH=த<yljF]V.aVЎ4 d Y~;[GLP͊kΑ8py"n]Or/"H1*ɕ7fn-I %Izi: *=ENwf: 9ge L)IRېU_g^9F1@)Ъ[!O2` #LqL ~<* b4w d.`BfM&mUYjECo᫣xf,g[FHqan&p0YsN2J#^dzPqV}f>7ДDUQE A%e%׫68,9ɀHPE՛~,v;  ~"P.VZ-{>XGj`}h>_3 mhYSE^:Yߦh{-dUUi4ԟ敘 "s]nd^76KSHv7[U~ΐꑵWcZk}DqĐD[k^E O7ӥ83f˷KRa:J58Kv./-Vae 1ys;ve7X֗J\-S&hXf9 Ff2Pio&4(TKpK[d\GIJzxtF -cz淦uEB#sʅ\"v|d(JWx63{u,aT?$F~ @b*fPعcۻp\W窿$T)MMD8!KbM*(5iͪ!'VXzϏ1;:G892>ɇ$6hԎQ jQЖ))j( %R@0P0CoEW^wVpSY.~[y$2sE4"S߇k9Mߙ`n [ܞr8i/|_v ЋEQB&CcoϬc [L~HKa0(nz_gﯣ=5[/ui (TS7ix\W2D9J̗kkw^U:גYPs -E{K} P/#F-h~FZuBE= |*5 ^;ͅɄ H.8`6;}2H,, @,kWKªak.ZHF .5 k>3\; \~%*#^0`44aJAx^迈&ތ I{HܨIġj>oKma CG^k\?-NiX 9dVyTÝFaG͢a0gEᲗ\:!u>u1U6G#ډ8 SW7=cNw8:3 YR^58H8d&םܭaݧvfo0>:,9j9[b[Og?yqj7 Nͯ+]]"á952Ԑ%j̰HvU0mr_X K\0".=/ؗNm|$bb(hSBdA ϳ^-j@%.t+4j(0Ak5J.-x&JF Žr(/彂G>g6`fGߧF `>t(4mNnZ"eZ)wsVq]l~BF)>NaCӱerkBb7%i/F}i@V] p"뾯rŏhݡC< 4x& . UJGx[X =r؉^` ݿ oZ^PeZE,-mQavN/R:.EejѵMDrԈNWmӭ@US/e2A{?UZ 0\t_8q.9~[AU}Udt>,)ëiq~ bv.4GOaE"Nq+(4BG(aPzh oZͅ CcWؕqTI@enqŞ8hW?Y=EpHT{"N[R؁/#1(-PQH{/ʺ%6{$jR"5eJAb?~`V)A붊nD-!x%*ܛzv:|7 q4{FuI۔@ֳ+h%AXAz,e;^ˑDxNRrlOss&+f1ϫZWMՅ/̒M.]JOm9t:] vgy`6kՠKvߓ ӒFRdSv֦Tmz,5kg-.7Wf>+(SPs(3U}בX(t + eW(ڑi|pu~u 'U7{R2u-3R<8DZEjtKp2sJBϬN 3{;sҩ a6P5諼ߌ~o?m|$TW=BY(ut6DƇ9sM g|}8;EIY404ZGAN0qmSs,pM>&83ėj%-مm#. Th [Ń1 4ś~g{38n{k+Wsu/7 dQy)A%NtmJ=t }0SZHжFҕh>-|ދg9żHIgd.f\86@x|<ہl M%o,S8b3U|AY̱A5^A,0K["Hw}EMQBe/1^ޓQ`5^2Q Fq2~a^I y.Ih@GR.,3{Ggq7r=̶mhK)d;}'! шR2 S){JS6r0 Sѹ_1p(_M^TK*! ђˍH>R! {QѲ f7_+/HOSwF@T|՛lJpC>Tl-X_-;.x'-.r#i:P)ٓF.Te<:v"@t1 FA}d\|&tc&.ⱳWDY@Z" !R[{%(;6u0f&ƺKbDҭ-<ҧ6V#$ݫ'_5l #-Ax̲eQoo7|}פC%)7^A$|!ǚL&r&O/ʚgW#/uH%\bMܥt=YZ\;ٿwnOĉoqeE2Y .š/{3򱴘_qiuZ#1s'/>bznĵ吣Ko 32ǴTa=Jb46evuoH['K=(/9M/L0dQHsd5k.4r~'_ŧq"29d/'yy%&P/Ӧ g9oKtSA ɿkYc沓VyB[S ݉#YyI?Ń;h Tv*-imtOV^D9U ńtYy7OlsuCrOR"+WpD{Ol;nQxȗ M^.!qGBU 5w FγvWn*9lK,ֈnş~.dU2T۝/ZEOv>ES łI_~fRS7u@׉dx9[n;.y߅_ۧ"q_?} ١զn5gho*tRe, .&em`w${:-V7dϼ*MZQ֋%] :("5:آVn0}sW"_lˋZ2 uJ3*UGtԓ56B\c&^=/+։[Em0$60 78ڢKMt,˴3ğn$OLͺ7$iG3?O5=<`g߬ `ɢmɖX^n6DVx%p?56\P*MWtPkqeՁWqȗp?L*sMCnWQC(n H}= Fp*&j]uibBaqpiA!_[f$^)q 4 3]Wnx褃V{SuBAN*Cj]g581 bIx>|#ծ.*l0x}Fʆt lf 餮{QfOHi@O N$C[63]Q"g<̜kCjwK(?3vL/[}6m*ҊI۲ w0teJY֜By9ѳsRgwFdɲOڣ+>R)6%& P?cPam`DCp5tƬQ,iQB+ȴ.ٹ{ ggq3ݏWwӜgF>Ƃ9^ӂgv03kl"ʿ}&M7>}Ust)B}qI73BxcK{`|}2ܒNkvYt3<'5:͍́GNiD]c^ 8Θ9oP<]3hHPy^z+3IBI1I>ݹSHQnj(ck&G<>Y7(O'WvfΌmotEDo~{slVE-}5qT3BXX+Cqӗ; C*lc Mh$Uǎz x}Xɴ?|1[R,8 [Aik,SW;]姺ޥ̬y4<brz,?#i%g9786He{t<1dWϿ8(x[WD0FA(=;jfPʲL}0"sBcU6sbުbvףFw ݏH欰Ϫ͌u4l॰!8< xK=:j}'gn Y|q5|6%;^01=4y:N_uΓ@lMRe&RJι4t>=64k}7Z%;ůmk&tAJ8Qx]/Vn:%_z&S\O)+~l\ 5k ?(e8s$kc_ћ偍` '!ʇp(%DlN_2 h>n Dq<]?N/ U˜ѻ*E; nᱟ9KX7dsq+ -/rS6 7ީU˔3|t󸐏'rqsw鎁pbTII(ENt\`‹oNA1b̎MЈ?=ߢ. 
5r 8U.+*DNn XTT`I,"Eٽ]BODGi8\Iϙ-㻑*T1:akA["*B{]r\sb"S *F:MnF^yVC&2vepSZM;,>D75)ɫ֬Ѳb(-v )w)k=~p*\ցk:e>E%7~d/Ԟż?>]ݤ2Nr!#jݷśK>WŚ&.mEd|rYyA)'˜GCb uj1% l]`с*J %q^Ȯ_A`ֺ&*~hvTU-OF *3.:9rzł'Ne)j>{yTWm^]L6EҿJxU~,h~I6s㑡0H[#LOz5 _x8*Tgy{kE(饁jnIY9W¡NBY%:X+2RsE{#X$ɿxg֜嗱'.Mwp $25-ݣPovB/Q ),A`d'H3hva>ڈ:PYa>CkӷX]A2aJ|Y7.%;j0%p#9DAn P*OX Xy&۬sHY\LB3[߿{nEN:P,ձ.\܅Unp ?rg82U*iH,(±ZedqDaW?NkW׉ԛܞl1!HUd:'81^5UtDf?^ސy6KsiK"kr/&;ٕT$%1 |RcܳZWk ˳u7ィRT(߃Pžd@6XNL^xyㅃ)]Lk|E8RCk]q/rr7CHZJڒPT3}*&/g[H\w fFj_}儫AE]?5xcl[M"?j ߱{N~U|>|IƔYt eH:]FT0ص;cpBY[N(+wCL1CJ4*6p0| *y[L|0P p~!`nb`qd7JCG9VnVz%CJ@Y2`_i=[ج:mk6ghʎ] Geӣ}Lt?j#?6J.C d6f#`n,PsxpHiv]dSc6qϪ+H _.<_"APǂ_mo3fT9=V)SV<i/7qT$M\#g t_@”`7? yzr]B.50(~_L뀹#fjHeffmSTWˋǓ n J{$2]O )drmF/Ic\iV Zve/ѯzASD<5yA?,؃cOL&(0XJh*3D+s ҏ ~RHld7? H$B8yEwj}A>9FFKЮ;^s@G2O|N(0%'5XV~i"Xdgnpq ,z[e3\O11fAPx|$`[_AڅxQ»1c,~Д}Nc~C &$/KQ>똡1?K겪6Ω±6ilm;Mnlfc۶mjlIv_m<ϜLG_Y9{(>*^$p\yծv5t"_<ᖻ6ӳ4+ }"=KƿźrlEr8fEE׭;]@|yEl%C,En"Y`ǍH%_Zb0#Qj\I'k1X-9V>?&W\)D(Ԫ^UI?3mO6)hJ[Lco<ŢQ\jߑ]LLo ?];EPjMǻfb~;2޸p 9 Jkz/*/B7䵔q&I+7ucV?>YeC} }j7R̨ NC__E"k۵#Ǚ 1>P0A<17 hӒ;;V^ I_vi辀TQ g˘4Y.n:~;1V.ڈ RP|3Dtl9fR7DR~ƺ<!ȅ.m!id޸j]}ZںXܥֳ.IZFjllϊ ',hdY;0tTfmGƯA9"2.ت)lQ#yJoŐRUzP{ `9ٻr/]C6k臨ivd!|eϽBɈe64/ zcxqp,fp= ɭC-!TEir;\{T u} `n}ˍVr)E 8n]z|Ddz8CII}El*Lfϩ[>yJ1`UEx5 oJodM!^ꨕ~(vG~l ^Jḻ%PTfDAy* Vf4@aecȨq YI BnQJ]rV7Z}VVKQs'%N1X(' ;RP+,殣]7-ј <6%1Z\88 &*H$36'rLGJg>.-)zv mzvx:sͤj+/\ WCRM\œJHc:%B|2=< 5\2N(*>ύVk'w98 1i'l{o\/K\f!JdN(/ƀ=oi6sXndg{SNa ¤?T5$$E"hSު CkJSlʍEKR Jҍt**wCi'#}B zH@jo1-) @DK;g6ȝIyLF=yU[L`DyҼe %gE =2T>jϜ89lP&T|r\sX}Ӊ%gwoTS*m{tZX8ʮ1RFۃ Xٸ@pde 1ZU?%\UG8!ÆZ ,_~9E@ iI$ݟj0O USqeB1^%Z<Zw}-bi{(%48cI[ ЧF\鐩F[A!$GHWXV<^| Ѫ 1|GsW@~]c@Q>1㺛x_W՚pN7}`P@(͉GJEbE ؐ7opȾKsNS&dYTpU0ƹMT-Uj:N/!"|_zJn; OӐzӥԓlީFg' U{4u?U|G[xgde_=>-:V_>N#Ş.sS6|MMѺ]Z -1K?$=]8-? #'wtjvJv6YqD2hB|ѹ A=4HBS6 ܟx 7qsH% &Dϥi?u IѨ%Hfq\A}£/';~ b3u,Օ""Y)KQӬJQ26jxjb6}͇h0_|NSY(#r4٦B|0huPӨTURn!*$%fYЎ~_=yl1*I~0&fݭ.xi>v,j57 =$"\Pؘ.H3ip/ve6 CM 3TP6CX.+8H&R^+D4ӧ9oo&-!REIb(uW]2^Hj{1o56sDx M߲8ƫ#B&DžQғ[@PY<-ZhdcG@ȤYyfv)wN"1+pƫPoGJnB mUzIXGұCe@ʔd\'In>+X1aYq4E ϝ"t uD33. ܸkr# 0W@wޮHI lI׾GOIۈZBy7ˮaK5$Q:X$$QG$ >Ma"FP!wz %6/%+aױO3y3NO,D;0I{w 1˫ApH2fAc? ̀2x$O:i+yٍp.u~6],%WH$`p?v&87z^A_]E^EiY ˊiSre9 PQ>'Ff< ,]g>h'0ƪY#3EBk]Z Sّ0dɫ]hOgXe}Uj1ZM-7Q3Њ/q,0w]YQac ]5jYu-l/sts/sKCͰ1%AfeBo\CTbiU& Yw[n82nzf#L<-;:NZ!?40M H"4>"h>;Ȟ,>PDk%H# p&4b~yzL^bK0lqp%z{MnFSŖq!C> XWn*k_v5~;x\y+t^Ā?J@K&7Jo"mߠ[J\FH<~K  IRUѬ<(rhna D{ֽ3^sHqɗ ,+mŸtS![:~i[&W4)+T"Z8›Vx }7VP}HKO D *7鲣qUO#{uEK&98p"!8$s"9!Љ]"E}(?ixaYN}U5qZqʄ|)s vU-/AzFMnzx|T44sFP{B%H&|%68Nzl}Ohxd~=S"sr d5* 7=)vCtOpEi_r鰟nP[d utXW1Ry~$ 2xV~vd#(<\û^7{/x'2IC  N<45$>)Rץb Y]L:FæZ ݬ-+ =]@H'IglXF>6aG[">y'Nޡ2wArI@HaT=ks͔.]ҵU49ۭ#e;%Ɏ'[/WB`y\`0ZAj+5 ㏆pMn2/ڊ_d*|1}ljgmu?N yl^Uyf[a\i%!uBن`qseiP{Bn.ǏjMQMk3&{VR &N}3Lfo#'SJǬ$|ݪe'h%ZEgߟ['`BbDvdnά0[gieu5ʈWDt=^)^e,+"dH4BmC~x?d.$R7 ֝~afT0~ǣi4߄o7Y6[U:1IGώ") ss2^Z#j̓nL<|Y0NM;2[yw혎XMS'eJ-{{=Q>>Kftj`evH^݅)Hal"R`z7~/Ga2i@gf1?hf[TLjI M3葏! e`p<_ 4OAy^c/izQIM`W#vr;{yqsȉkXoĵ)<J֓f6RA w1ʶpZ슕dkX _\ >T>"(M"%Vܙsv}leZmXǩ!79!xkTrO௿̩U2YtNOwϛ,~P"63Z.SU;ղ_ 1F_ei%s"LPm_~tG,"w(Veҽ&بSR%;3pɽN̊0 o.{s.AQ.6ҍ:nZ VX~v0|_9+bAM]9.AIH`0a/zL4d277~UX_:[>G8ڳʞ}hD!=.:hn ~0Z27TgyHp֙cK:m%̚ri,;C>-@X6cB:زF6Ћ8{=x):^#u+K0.3'T@"{ (U͗yq%0_ kX浆*nf~STw*8>W%ĸPؒ}C]lfEb#B6Ϯ]0a꣗z>Lv\ȴ&jt ) ٪D!} M18v˾A#QU~A@;DLzނ s9J\ioke-9DmlR02Zx+1>6V¢UhS8( QD G "d3mhn;JE =v=5Nc~[{E=n3:ɢ"%D{C 1yM\i{1"h 4.NRfWd˼l`t*MalGRTti5K3Ӄ/Wezu.7t|ɺXzޗ-.Ơkdqis^ gT9}6(oT?,}rm(ɞ‹xk=\ᏬΊ<0KCèn4,꜏S0ρpx3L޺Re5X7L-U~urZĩv dVJ61)xǖ-f='4L%stxCͻ{`y7ORUaKJP,%+t^Ǖ"*'q?<ݷ]p>+uzjkjUݦB-! 
kԒAjW>B=ݽax-kӈZc9s,kBʃC endstream endobj 235 0 obj << /Length1 3019 /Length2 31394 /Length3 0 /Length 33081 /Filter /FlateDecode >> stream xڴeX>LwHI JwKwHw#943ݍJ- ]"݂4l^\0ܫ'> 5:9( AY `Y hbg`acBp@ I (AP 66>4j t*E D3 !̦&P5di C]$N֖V_18E-319ZL@9E *ЁASl4ե2jʚ*,.`")SҐ2jA-JPIk(Ok[  s_̛8j]o$c Y!6t7bi%f%vX9}-4/gW So 06@Yd-V_FJݥ`hƪ@}?]LtKڙ[y,jS;ٛYhb 1ſo &YkHn-uryxGH3[ eRZ&̤@f`sk%`dnn;t́ @]. bpX~F<V߈*JF|V/U7bFV߈7f~#hv]7fWfWU#hv]7jFZ4#h-Z4oͮiZ;@Ư9;k;s?r._bwhh?\:AOq bx8-BeMvaD_{߭AwϜv]B<.&v@냎AٱjaaG_j˟i&@^?MXnJTfoN&xPvj;74zH&boĵ$v(]ECcbg6[ `Y\{9%@Sdk%ءa:6PԳC<7wRhmv&VPnhO+'#Csr6;I t]PiР@hV? |5C#yʯg_78?￿: l Զ6}0Q4Xw}6CP~9-.vbR e/n~_݁fh s`3`r_ Dj>υu2'[I$7"h )R`&.UE*bIihf)WSjNh&hKvp?ᜧRT4/ ̰7;?#i#l"L1PCg+YLĚZ6`Ih1ii2"XHk;IcFC^ 4 j/ɱ$5P3wrL pUuqeuuxwVpI<UCx%_LPQ TNaBH漕`PWbσLc߂F¥eg5We,~B iDcyFя=ʪWbWn9M(gנb o[I^-ܾi!J#z| zO5= v{p:^V0 ԫCHlW\AG16Wѕ~rK3:ޭ?@Hޥ9Z5;s\d~VnvBCӒPJxRpƎ@IHѪ@8^Ќ"ntaMv &1 ExĀHtD3$xil7:e Fa/'c4=jl=њQNrc4oZJZY>eizNpK^@pK5]G/AF44xfϻZ$P^- ٳ|/Rבdp˟e uNQ|]{D*_K/1(E4$W<' rOEPLשGzUPqMX)Drע^ZelMG( ݛȂ'b ]]A4UtOF*&b'z dbn_;wf*L$zÓ&W)5 iQEVܣ+: KjXdBH|PU#i R¼kAMEЦ+s]yA ]77{TS\oNb ` Dr:j9q: :[ psKx@u|Q ³K]J~ʲ&wϐ&Rً2jTw5(WbS<阮ύʑ3Ĩ;,~akZBHtK'md!:ucrDz`+#WcakcR{gS AXFnr/bE#rFяO bc*)|o{Y P9lsg'׵ 1 Lu bPgFMom A-cik妶 ̔ĘgqNއ{2mbb&Թ ӡʆp1 1;f\n[G*]X|#2*pbLOw2w7}J\_9#wHd+-hY|ֈ_H\ܔJV:H?y#n,^2>T js|ouuW 0S|:&cj79SU0yhS< ;ď/=o^tνkr$ ڐ|jyHxނ Ӹ)?vPY/cB@us :鸞^#FPoꖹZQv̤%n(g)_2з&Lqt3 t 3PgyVfv+gh3߻O/>EEx~3 ; ͵82iOY18 ޒ`0RԛGg\c s2?t3[\}7G=tMбkZ؊m~9/C6L(<E db2/Pk[UBIL8[(ev(!T%4N9]qⳄc+k%TL-EɰDz^zQ?7ȸ/fNQ9b_VjD3Oo=ii9N||K( k5e2#nê<3/q6=|hVfheNɀlxGC~bӡD645ɂ界פ%Fp ?$_}|a w`OULBȒ 脜W}Hk~dx aQ&X?'ⶾO bEVq t+60 'g|Q귈_A#"Q8zcȔFoC'4TF;G!k'|ɽn^it¥v'Ře|+kHf۪_gU{'R Z'h6J7']$]GIQ?Vd*怣,Tta mxKANOU %X Z1FTLA{"b aӱ42_In0ݩg ,Vr f!Tn$;4?`s4TE!gȥGh:yx) uI ];ΘJyZ]Wip$xMn@}F;7V/pYd QI7#0faDcc$Rpmȯkwً9. Y`r6Mהc 1:^ ~PJݼ@$:sXKG7f4y`\duyC4{xuG2YLzL5/SM֬_(<Bh<&|ژ=ݽV^tʱCMĿ+-|şgAZIba a?F?ne`Ư@ՐJ[ӢqK#,JW} Cњn);2mޓ X!.v%'IFcȂw^d:k6 *+GG18xe8A ""Rf<=LW -}g?2~ \X7S>cBӾFsw1w-,rpBVha <Ye\L_L-_]M&$N/;,c@)|8 ]ꨒN)"['tz^iiW \oFiw,kLjVDaN9O@~x\WY'>.<s۱iU5QqFz1 5i"=؏DaϼTƏfytղ uLuQ>Qc.k@J0*K&}`AXӹf:}RoSA픰c%dZRm=O \[Lj?s5իgdt3\؇c_dj,}&hJ>c5a16 t xPq5V-{c\ټOlQ"6UAe/4=` Ù,f~,OMxGXB#Kxg kĿF2(H{q9KSFb~4`oMʵYfLp^dDfWUQbXkN ie:vs3?3(.|'у?9i7er)k{aT/WBRqm"PsU!W-}j%\dkf&!Q]iy'HqM Ed_+J3=uF;R炜2k~ǔyc2VZhGQ^pgKJve5 4xPɓ=Ȩ@; k5V8^Ic_ ۜך&k}Kz դ-$:p~bP*|=:UA^p\.7unG >K,~:J2>,AERӚ$[3<ɾwYB S+'Fftz9sɎȨ5w }:'kO=4&S`;r/s"_X2I6_; 6+|֍< .D÷Y+!߅މ?~DX* mfQT@XV̎U_4 )?#??8@]ә_$"a)}2˦:nIG2 @]sn7^'**0#G*`IaGkr3g .Xǻ,rqm;5D,3h.[<s1 {~%?j,\ldtxsW0Y/-0 CVgMorqp`:89z>O_ 蠥-}/{gWviC&kĻ?v5:g8ɣkGz!$:@Iכ_, ;9BK]^cyGҰa}GgD"s2Z$2#b}X&Eꥄy|I3PNֺ@Va~4 s]א iG\*>ĥ!-vMREinƬԹ‹7a(ř0q4(9!_%fbţױ*R|/Fy s@fǸ:͞siem}LkիjeﻥҪVsNWj*B!!q> hcͯH UҤu[|;>50f ";ruhfphM[{p#dqUd , ;K WXF6 &qbE5SE~TUCȭ\7Ѐ 3ϯ>8|~=8Ģv_0b7'qEGC8`؈`>Mk4;KE1pX/ .%kj=DU^)&d݊ostNҧ3AEI&Tܑ39@*Z|^2$ @~j=]٦QwuReI7ĺK\?!-I ] ֤8* k8DžDsjow3#l)UK͒]k  f]Hs݀n;J3Kq;RE0luf=%S&«AX!]`nW+-y_ɛa-,ogi^= x,ꁂ(vœs[e-z=d +j'QׄG|_ydBr9-Be8Š7dSjctB+ +Y@N{\N+|alknn>y@Ĺ:{iԵ qǍAG%=pM_k:Ia/C v²bb (xk>>/>M^'I昤Z:0E|h޳{w<^u|9!JH)5zgǗw$6[x!4 /7Š"_GŘa'7-gTJ̴XW=ih~A٪e)z}?N Ab/, Z_>i"<d9~Q#yv L`ȼ{밻>LR(Ep /֭7ԩɂHj򦛨SLE/>o,K\f Ԕ^r $=){Qigc!CЍ~͞s6}0k ޜKoۂ2a "QwR8@nؠLXpO>hTty QȨ.r_Nk× ;͢\f^R'hw6>AvuuV,C$Vތ-c'Λ) |ፉY$B7g\e?z)P^| #?iMc?q@[ջSUj}/J؟'lwXӳ2Z:8h\fo q&[QԳ M f SΜ-P\7s̰Iі.^v*J~zl2y:1Unξŋgʲt~H0B D=jTu伓UpsIGQH-oyv;ax1sz"bv0Cklob褜+ R#=s RHb>Tщm(ci-cn' +w ff=1of8-1]Kv9qUc#U$V$-u;gi_C+sϦP L6dEf !D >$p>k*Gwr䖕oնU׍2lsCurnUpFb(B3? 
k'8k=}l0jYA3_-Q_-n3SҭnE%JҨb t<4c]8ajtqzI كU iBe) -D#I2{そLE•9i mr/[㯛ۃo(mcp~Hـr\Ǔ'^ImD9uIQ*=C>U5w kHҢm1sJ`PpjJ{کu&)5- ?1SGS9D0/ OX KT4~QG:B w2XAY;Djʙb?Q&)ޒK9隄 :eXZzkΏ/ Yi;D*&U-o@%vv $O$"05Y4.'mjΛV9yх?n$e #+ԃ'56!8`h':[K236$]V=tN.klc m^LJeH//cQ s0?=i* Mb&%nW3yP4n~oDz1J7-q~3U^/6B֡m3'Ӑ|ys<߸6n7J$_eh>EּK]<1iRkPw7x|w{iqѦ)s_5D.ʾa(|~Q'sUԊ[0M)O22d{'>R'} 7\=ΐOB2 M15`s^1#2xy۱lT̢R+GiYRQ+ M3điĶHXcmuWva.WKZa'ovYDRF0K|}$/=nF6Wcy+ "ܾ{{WM`tЉq5/+0`Z~GyCeQ.=rܒӵ08p uvy7Q h .?Xv#a쟔N6 Q [c3u^t TcX\W7b<(7/]"_~'ػ J9ǓfU r>3Ugf HJlbRТWn] |jP9lwuhF3%5xo"%KT>op<$kgaNMCfv"=YcŸ?T`̘>ܕt|Z)y 9ljt3D.G& ǘ| E;1ڋ2xfrll/y&v],oXԹf Ԑ~na{9npz_QϮ CXys؝g 뤟N], :gǘה?3: ˕^(SQ*֊8=P}g:å ^ F>r`P~~N`?DgYȅy? V Z| E 0ёޜ3֯\CRi 7'.s&[1B섍1SAS>+w_3NʣӪ1P9^M~DTAYܻx1<6`-lh!6am$3mTO?f<̝:r~EBsxeonyp4[[=ϖ󢤡eY2M)bBJÔ6ĵ %̊uaYs͕%~ʎ៓Mx H3 VPX%׉Il5){]232bcbA~F;2X rGӡ)eg[eBlt%eER>D!NY\TZKy*5<Xյp(^E?O3TO 7> Cwi_7Ɣ5;1%ϑ0edK"&xm9p/u!SuۿgK4sUr[{q[*/OlGU}߶ VX'Q ۺ9 ޛ+"-R2YZ}ZeChK$52S]C] 0AWo%ӽo`B%]Tk[z ]]pW]{-_m'(Pѫ7e 9e;{/2Oj0dF NaV 򢍁FNq EW  % n^0ݳ^~JkSiӊ[4gIS1L@L%B^df}^@4`QC% (j䊠Nv~)9+Q7zsĚU6NcBw`-Y;iiц1rGyR|D/ |-A^t߽H_Sn*R+YxHXԜlv9/`@hARxyo4bo.oOwW8#aÝZ( !dӫ#іF )7[XGA:lL,(^MIg#{cӲ]H:^*oYGǓ,ޗGuH@/aSIH“w8W w+xx@N])+оoA (q|sfЏGS6J}^S>J]S"r47 W)RE4 `EN`y+2b/_㚤\=7򸣋Hu5vB9QLɳvt7#\ٶ'W(SJ^ey%tv} ^pyFuð#fR/=w4lP/@⺮ cvs.sq X~.C2JZc7uĥ]Zxt9c}<]Q5Sj[8bf\m'rSGVzWb1 &IUEY6Ϸ=jol $V4hgOE.SWtw+7P{=V&.)Ulu9̶D2{]~jqzc `N[dүiLIHD ,pxUw-[Ы#?Ea aginw+ȴDi&-xKv0(Xy|W'ŬI7[o̞n!Ϳn_%*;R(Fєl[une*#א?Μ1Ĭ5pe?U'}L;$Sp."D5"IG.es#+r N/N>HHlj!_:L>n3ẕZI_7`dս:=iRc"o²"B&7QmE'6"%-t!,4{Tr{FsS} Q~T:59ٮ)­I5$ ` OVVA)޾`P"L8^ثӱ~<_[Kv4ƙ. 9,w8[l8m*O5aML*ĩ=\]`!eSdu6OM I'=<.Aa"zev)3 =T9)wb(ѽF.9w V_?b _FM WĢFbHP'eq_o AA`.8,Hsw @qOI8Zc` riMȽJt"%osP^G6L^䟅Tu{ :C颤C+{FvY$z %%J(.~tG2Yl?#?ݴB9ipCʆ^)39nhd4}N@SE$0ސz[]iŎq3J:t/t;›,Smh'tC˷d=Bѩ{+e@t2PW (WxOzp5{i8g(U7U,9ad nfC4j8Mfuz1 Q\nYbSC6}P_ho\]WD!Eo"Ib8D)ڎgcwc8#Ql̃IQM6eeYd\)M\W;qfa#"Lt^+7@9 `D"M>&Ii~3}68TD_:K>K#nyε>*9{N?' wګ nmÁ58!_P HUz @[Uz&#jw/B.!7P"/G&sR>%cj=J:\OkS*UhS?2׌/A?j{3ƠN6V$>XefUiEOx?PiLsݬ䀲|PF,6"%ER 4c&jWL|" (kÏczbeTYq;g(i5u|Vjr!uސ Z(~{fWqZ{aWD$xjVO,~! )Gt̗݀xIj_yvmU}2[)cMv1i Eyގёirgv{ej߸`4 9R_͞5|31[(7"6,U(S)"oP2܃8z%&ٵ'E@B.=̓;?{-}Q2L9,)ez'_>'{}1mTu},NtCz'Ք uo7abcVy_c^1iտ~Z_M;L&[`= ;Vc92V^O 3YVZ"MZOcyХxˢp$tѫMcPWoT;=__xV) ~?S 1.l**]@Z/JՓNv$_H`OwuɌd*E><-N!JUY`ލd[{<fxZI9 'ʾ8#jl 6ivr/:c&Fz r_k'1LE9I'Fv#vanRAyq._-tRn=0-Zwژ}dR X;?dMK"oDȜiJ,1=7IOD++{Bi]5’zlTJ/xG@mΊLG+xǿIr]fr)nFQ#gW cotK ŀNuxhb,vD-x|xt pS< rI& VS}xr;:S5$nDy`༰* S?'J h|q g,3 nl[C-eVbԷc%/ Mź&#407DɖfߣZ@& omYX1ZAYzS)x k-wNu[\lVq<̰ ƙljGSs$XMyxy?f6~FdA"`Nɰj.?N zP>ZAªξU eJiܐrg&9ę#!OoE}qn7-/2yȴ)43-| 衒[O=!-a7єer@̀XOF?^uZ0AD3:6Vn~s,,P$MЃ$-|ɿ܅^pȦ!Ռt1<3ڷ!NJ]cc/&G<fXPv؇aArwU= !}tymd.qb >1';VsUjE*E{bS8gG_^v1YLA\;=0j"Ⱦt޼x| gQV2U[?օӞ9S/^ld:RvPY YkN[YgW9:Kz!̗Gܗ0ak/'b:F>l Y%(rNrJ|E!ږu#;y&9ٞ/fWּRj 10WȞ G+MS [-' 8' +z2W_ <(N>Fae@N&-ڼo0_346kZCI~8%}2 pĨ]xY (OJ񽝭ߓ"U.;v q&\fMhCOphI[1B` ׫Gh[F[$^5bףDF)Y5y_;c &6L?[E 7;|~O70!o?%eO+UܚpF 0OfbD(ni̭dT(lNWAne);,W~c4gOR %jm:JP6&Xn_k($UcXwoXQIA7BʆU1#H mE"M8$C5_KG6Y [.1}jK(/UNU b1hCS,KO;t(܇|4(B٥f.2ђ*`U_/3Zf}xx`T@C#Wre&ثHM'f[jb .+R1܊5P޹0Bo?6AdFS}{1p{ߏ5RцKni~yn(&1G-bXPJ twΐ=isr(/ He%(2 GQJQz\͵ AC'Z$l?;5h{nf |5ɓtQ'y. KX. 
%cQBvRO?lOXEs-v?e$z IԧhҙQn0zNS=lBCyfڮ ΝyLt;^vCm$|PU?D_c[hqHʫ$b8 v;XE?*anl2޶paO$uU^Jw4h416 3Vi _Z +1EIZ6_CJ񗁍| z^7LJꭇ/q$`.lR' |X:a,:xh̯$|pZm| 񗁍eÜZ'f$!\ aCYj#G: F‘-YѬ2&e`Auo޶r2=LDSunŒ٭iwyfblc>ItseWE 4,.mB{J/t#I}d A~yuˋ3XxaóXS}M:9g s*%SoSsJS=Jg{tWP&[h,˒|4d7xt[^|p%B-T!Zޘ!MO3ZXhq{/+iGW 1g򨪖yN xSSk1Z%Vbj[Obn5)C6\T`\,j,EXj(1^,g\=xy|*:ɏ)pouE7 FIM6Dulh2"0q,\~ZtBPwю2 >M讫nz $ l:dKgPB%&ڗ>.{wHv]bmʐEGOЧ+fZ3A9Kg·qqYqf잞$1ڏΥ/HgvD}OH_-%6"GB1Ita>*}N k\^qa:AOnWQm.x8J'j &͂ u7/F.5h_7iX#amlۯS{o)Ƞ8\G ˯9'V,4Cݑǖz(-RKW,IRMnΆn1-!?氮MhѼXPryet= >Y8DLrj܇sF@+o[X)9ZHmJԍmrt(Ӟ `4= [~&#C0pG" L캺Q?T]Ɲf햽%nY :WGx ƘxcK)k 4kW2X\{lz ܻx~],bmC@ )-)ڪ!QK/!!g$j*Qѽ&B#|L~ $ADƢs'$ L7ӧrb]s" ]' ǰQjOOۮ>xY֌ ܒG;0w KK:qOy{&h!ˑrybK(j`B2Bct_v\םH<rcZ{f64۸ъ$_MzK^N􀘊qznvO b g5P{a_HjϒG4ITNYjP+/ҷä̘} 5~s4V4k1(4wgAZ_FWLEc6LN¡]E04!4¢%{YcFUYQb->kXc\D!wFJiiH@x'݉oO JOBEؾ@a~B՚AR2b BEz !z%s7=\>zzC""6-C:NWE9 98ğ -ljb$Zw#~0!^BH0C'X,${%K"pzD;~]ݿs,ɗB%%|HHDdch#'k!@U.k M2,41ykhFnE~o%2A1CJ7ް[Hʱ0JMQxל,%rmnрJة;v>E챙n$mar^<-LMq 2 049+ z:6VS=.ܚ5eyhqD#C<W:5i6Nj jb6Ęާ4-Td|TѝPKuܲ,0EժIBD05c Ns6.kS%"bY{ǼvT#Փî$,#yܷ?lQ~̼0GZHB#AǿSrJmϑ*ҷXb9/ЮCHUX3jvcNv4m O)BOq񙊘edq j2ڵ[RǕqn;d~:A6W3ٲP6LMg_?zK[~!0Y>Ͽ]\oȝXϢ9eW3?;#^ۭoZ.z%@t[^:)[0>ơ=F[O*{.T݀_; P˖!qtx_l~S&cvYsC ۡxHֶ=qb6 0aí; ~jéٿgfqBA DF55M[f~2,~2},Ya+;0dQ$ߍe[Gns_EO=XѽOe궱=!b=H}~֚u*S0F+'tfJ4G/8rkVaK1)ųFB~|缈\+a ulvjO[*L-jVB_V`rYWz1R{kb;J T4@<#C鸎%{H؟* zuRnNf-4Y;&HH,&dmn]nbeCܱ&4O6TmOq(_!>Ju>4=ZtJpzg:x]q8'*Mh?'5+1j#rHf܂SzF._6ĩ@ GlLx{pA&׉i6X{P)еR^up?Sh}qQ,L!WB_Q`vՕ. SAsk|FBчW4``瞔)zO%yL}Â(L,, &FeϯKOuE6|ZqK&$\B%'"O6g`0jIC?>~}(JMzjŇ N0-0gTxYY}<Ã""-VV&P#;Rt; TgdKz\$̅gDZw+Gq7mHnh+ZRn>вibb KPfIYXv> 4^XPEx D(Ą4I=ZK9K,KВsXD1Jb/RQHFfS-CЙMu&G+h/t/BR^5U/hۻ0k&,D/jD_ջl_=1ޯ0`v;dvpxOt'yI~[O'A.UI(jiXعQdB>b;.>Ep'Ħzvi&[8S,Ӊ)ZJzf 4a.ps,!a$Wh)o$̒Jlu8*x[ Ȕoo0n (Cr/UO4S!mhN}-m !TLDr7f\VTAXe ъ67#+m@ JIZwV \QbfQEy `NJ#Yq>\L֞ȵDOSK Ʌv23 E4$X_W3~wWWU5m--j W67 8lW&Ju29V 8nWE33a6`#MqDq[*ݼץ[`mb|, mm n0 Ѵ*>syD=TRL5S U;BzzTsq79‡F s;#Je?7YBDoH^ ~/.Sh ,Zͨeqs#)g8L.z:DaK?怛Hɷ>/ќ0S`bI9H DKnR'6G)} %/@m; feQ__rFB6  7̥H!\ ^E9⾠΅o?*:¶r8g8\M7;v:8WnX!lNØ]βuS?tԧқ Vy}nj7,~XǾ3wbۓuIRo8R&\ Ch> M]U ۾g޸#k19w"D7*8gcC1J%=jumL7 l80] y΢}RF1+Biߞ]#A4lfQ!/vQ! @"G U2@ctVg{Z8Dr𲑙_ G;殼njNFLʌyc1W;ϧg!.>Jx6Mǚ^3@aX]|,XdWtck}C%0VC\c/0 >I,0YNC#`JWְDD| r %')b&i$z +YqfC2ZI,NE=؞o\_K9I^ksuKeV}$ bPYjK91m#e*MHmavy@TFS\_W4#ϥ~ƪoYi理C@٭Փ)#3d]Syi:m/7KU~.GlO+; aFgBJvKbٚEs:񀻢sHw'C x%!+.EHܘq} )f@j}+IǔG"d{L4\+h׃JM,T\ncil: lKzrÖx8m o(b7* RWrd;,̆f[wXV]4)2!c8?[)?d#iyB yMX]cM%\UU*^-APvnrV:^]IԃjSjN47r,%q[45~9<$H LBc޻)]׃`_؛䆲Ftn&FNVX)G]2UBo[H.?"Lb'IEc J9mUl'zZhKqYgy\wwdP>1ŏ%* %٦ @dqz/sHP}^W nbÉZXz17C|ӳ܃!=,n(8Ȣ߮k)񴧨ۂί4*ddsΚ4:Xۏ@)ϔE5N^$f RdzT_q7WB& 8f&$BV( <' }Fi ݚq '4eYZu+8#k`v kTq2!ˑ]X֮trW8@w@O{sKƬ9],d.8]+Z.z5Ad`A}nN`郑yl՘ 'e|: #H}( D5#2̸*}>vS@ߍjҼ_GgA=\L?wqᒻm'sv.j*mB"}B @}rGHMI|tt2z=!J[9 sXy2w j?:Z<1 8rVZ_͟Ɣ:-у]]rxsK9:dxZr(dj)|(o}bL٢EIhoW@}j'D@U GX0N $I\SʓTgl;w[A_yZ Ifybz&,{!VЮRG &R%Bl;t[1Mvof#璦+ס5wl_Ev>};7Dny6eZBޕʤӬFlSYRk5GE u endstream endobj 237 0 obj << /Length1 2057 /Length2 14723 /Length3 0 /Length 16008 /Filter /FlateDecode >> stream xڵUTJӸ[pwӸ[p A p9̜{_ղ?ծ~h*2u&QsSȕr`ceu563r"PQ;M\@&@~@=ʇ@Ns@jdКT\\LM\@5H"lmiӟJŘr&f.9@Yn:@+; @TTSH)k1Vwsttp?.Ҍ Q% IP UzoPx']QRCTCWElwÍ?S-jruuga`tsqevpdv  578]'k3 'Io(ߓ&>?5\52?4ݜPmE]}ev>~&c& 7濗mrvqu"`amϞY)*JIk0) Ĥ>_J(K`}$\ ŸIXًem r s?s7wsdY;e%/݄% :fV,?f?!8:8,L\~7w O#6ߏ _eAL?uLϨ `@`Qrp}?zI)igfbom_ CV|.R֞@skW3N K; oe.߻*lA@7_.,! `ѐVWd_% 2s0Yع&&^b`Z, ώrX$F|#^?=߈b`fbvޑ埫'b`|z ~w ߐ=foGD3s|~wB^? 
;Ź,s|_?<=߉y DTܒ]}}]l_Q4qug}?lǿ>W\=sac0q/ώsW_Wǻܢ h}L &5_d 烎\.B<r/9ԟd ӂ@e:Tv7SkgnUEvM Q$E' 5sWz u9s::H|]qoU?=Ja9axtv=b%Ě~_0.ǙqڋǐQsj3>c hrk۝+,Qg6Q?1r_:INK rs^`lY$u$)[nU<ºr$w2e[WE 1-w*@aolrAvAS(98T;ZxL7e;_U}%D.lFg1lesAS.%oJo9P +<j!nnyimZlǹHp8Qaʾ!2PbK/4o[w[/%c}<JHN,6|2/b .^eʂ0~\vpV9edJشС"ϮNI2e0 *lYrr]油԰uZ_ٙ !3E|dvIdv~B(, kB#Rr{A8(L,8eTThVWv/`k_u`\|r9UBW׊#3[#z`-d+PS ~&e o+z0u4yq=z€Aeչ1ֆ6[6RғDLsc(̯t@ӯW c2آ H)^ҪPr(k= ϚܠoQ 5J; <@C-@imi9Ǐ\m!l\&AVyneOpM7/g 9_>J{ \żoI2yQBCp=F"mLj~bm0!h bwҧ :'}Vw$,E\H$*,+׵TN&HM>>iaR'xQhT|I&^B!> Ʒ6l:IS(\̣;jq}1[0PIkZMBnw(ߗIkfEz$C;e+oQK4^83'ԕx'm? =|~t '^Lp~9WUoC@'GA]*{9b#68FbEQ oqRS(ju' =C5KF.#O.;V^V 1RH{4y^v|}#(UYYAAƅ(k D_H}ZhtX51bSn׊j9^ ۷Ft qxwZu;^T)BkG5J,"%{kTjxp`nIgAaQ)@1!kn)M &O<|0Q}:Y]KxµjC2wJin'w Ӄ!r7gmqTޒ|sb ApŹ\mONY HzDÈh NMiq|@Rb 'M-:J؇%d Ǘoz)ژ&lԒDq{hEK-^[R?۟1NRNv:i а®:I|?'bVP4iA"Me8g.tx:Ճxqo.E׬I:σb60΄)c{lP{+ Aa;O>ޯ9)q!jQ81}RB}̙$]2rE>v(guyă Nc3#̝JU;Ҹc 0iEPE8#ju(:WBLqr\OpY֗ۃH c7LPD%P&kv~wY/$,aPl7?ohe},X`C4tZU\%N->[l$a‘V8 T$ۛK"o|Wt%wv-& n%)옄L}Z&A~O ;^T'B|t %zf޴E]8IKҎȵLǎ@.Ptם[Z$=!O9:?Y`#Ծ: j ir!?(1YW:b"Z53̲^:Dg֢ hKj:8s2\'':=S _^/IBi zw5dq|vGg{Ya0h8+=)ð7DCY1M* @WZX/(DR%>h>(D>J X/g<-aJlճ3'Fʶ\ߤJ"ϊo:tzLyxSBr3 G\4[F4qs[ &.>!*$|!kj]=hF\]rtlOW|X戤?Cue0JY(܇ l4W?R'shlaWWdR6׬&PcG;#"7OQ3T/ |?#8E?cn:V@~5b\# .|M6rԊH6P;E](h E>ZHk= 1Σ` 6%k<5S@DoΞufxpCuE%Y6 jxXE"]mnT%)5nU:3%ծ5=+1-Uס' g w\s!i`W&rT9!Dۉia/N?1iE֐jЊdYU8G, ȟڜ x>j{YG >/%39MŘ9l3j'[~:j+ C3c0ď@#ҿ|Ae뤕Jv\~RvKLBy`@Lࡠt\ 7{Gf׏M9~ge4Fc J/]}X8 w5!VIb$Aq wq{w""4ev{I u JHڈ%O5Dc2Lg35 1 [bȓr7Ӥ_u##Ii;n`a\\! {R+8|R/?RkN s-`ܛA I"dG@(>O/k*H^ ٖ*L$⩂ҭ""{0BIBWG]d#QfZY"%$.1.VOf>&A37X #į&iZe?;U?snç.E^6tM//[RPRSH¬Kt:z= \;u+jܷY?n!|)Aicn9"&{i/沦qƏ;>cV;ٞ>Z5sKv1SBr[3P)~2LCÛH},])(h]^׷r58j:OZ?Q_ i"^!}j,8mSɾ阖4-QFqo D^',2/9?o||=~JP8П@eΥB][ O#Tv,~zn  Yq![2>˷W$Q׆PcC1,ۑ>;}L  ,!SWR)cs [u}`J6#%y>"Pr̲ kGo/؈y޽FxBn <_A ?Ii }3&QܘܘV *TJzpdTͭFöouGŬxTqȂW_(T_~q&ICV2/WhuyߵYhJ7~@|Ra$7kKB,$Xi. wYvH/]{(=MA!~+2pi7yB\蔗hM(sϿG*ũJQ;Rw̱j_,yG>`D%F,wbklv-2Zn!כrVG6/%e=#OFxa֗լ,kX.b$͓{?8>%P[11R@f+jX4Vf잩câfWvLbd>\¬<R3ܙ2VC3-<U҂Ǻ,Fӄۭ}G+VI1@9 p|-!-#<))PvgܱM교rSz+͏^ qo"GlU%YVBzǫuuK)y08B>CgM! `yEAo~Sٍ#t-C ]oȒ`lTPײ؇CPqpR! KƌS:e#vpkvQ%/NbKӈ^hVݐW%wa֏)OZpoꔼ15~&LÛhX%KPn N4 IBE2./}khoπS%3&X*9/yXOS#Wz(PjȺN3eEaĤgjrAi4Ay~۹~r8thcjg]_N4YN7 kRT $!xF }9^xP@H/ZڨaAFfw{+ѝ:<ڿOШ`6ޓ'Ģ +AUℂ i +Gfi>-P[*u [܌CB59]ϝ3>b1ъlnLΦbf `gu(E|zvU ,Ȏ~&+mch 13햿&%<A^J^ؑČ RhDZFL>iw;q L͖~ε-B:IwYcaLbvvnswgihO2Gj?s8,Y K5ƿHrC+>8FNw"|_jP5aٕp&F.Wt 2(ɜTz& D哅h1\ڞY 3. ŏuޒV2Rl`0.MQ5: z̒lBO$ RJגu)flcg^] CPFy;u !PYԗ@ :!%m-2z'?rwrEFVZʒZRxk Ey8.)vƪn4t殗pe^ds$;!A%c^>#/e˔@nqw[SRguvt&"Kl8>>3ؔu(탿\1}!Q[ŽMq:qϤHzdcPT\lCY\=ߩRm-uǧ"BwSԆW*ʛ>8_Fn꜄3SLdOIٲjb2; &J.8X3ȣ!Ypdfߊ]D*H~=b .fLUel*EH~6ĵ.<^ &j5q"/b1K|!x.yC<D8s6Il 9{s1;Α _Y/-DPmJS  )*%|Zd5&־W})xACԏqĹP沂1nŤ/4WJ CsЅޘ&z.˞0~{MZ7j+hfTE;chN__L?HjXBCG}j2S M e T].ċz'gjzؤ@ JuChUoEC&ۂ"sh7MDc,\aRNl}_ ]xt@,Y+Y [$xo[P"jiߑarłaj6|V/p1 1CHmP'אP‚<#}-pސ0R{RP0,e2s23Kk޷,ңI5]|v۟8YqO䒖bؗ?Q>3h PbCO,+Լt6V.'կԒ4zfaVH`!{fB+N ^ƽ8]9t OP]U;9Z+eUIm<sZa)&>TA`~G ,&*d4Uuq kGu ,3:2/$< $xkdym4M=fTpW۹fe&EAXluA i;cB Mv:T`p=h|A YKUt2G6kUs.Μ1r)ϻU":  VxB:yヺr{T׻ԉy}!FIB1$~\Չ/{FxD_ h˗S0.QNZ żjzÚ&}duݐtlm)dʫ&uLv&< )Ѯbt=gǗ[hr37_,O ,:jo0vkv?(lLMjtM, "sfB|pD2fZOT}k>>~W;%dmfXks5}y}ϙ3 EqLfFKdIy 6GuSs~93V2.bm÷Yy@-@8nV٩ UhSIТKb \/T|#C(?k⊱1iNd*7’aI =C$7g䵳0h?>uE1B/.a|H^ sUJ]oI^G~qNY x:Qj~}H_>uUB[8>dg^3>xⴷgH͊bZr208ͣ +R/<\(T5OGe&'1eJFWjyQzo"7?υs")5UdXUΊ!N%G \$Cة@~f UĨ>:>< 1]νò MsG7DbĐ8qզc"yO7N. 
w>/|Ԩφ-5-iiNw0wyTJhw ٽ2mc+r{6!Dg;,"DV?#U]vݥ!09#C7-V3)l|X%.dY/_xAnqTzTL\YSbhd!~s\0r:28S_cؘ_(hz/RB=y7Pn\JҀ!!kҏf LMnP(> bu 8|KIrힴݏܡrx\ςVKZGRN I_)&t W}S&ڱ'ml`?ᗩoGT7/ln2{}*_8(׏QQ*͸GE+$$ZrXcMύ=[WkN_1;M]"^ BA(pc5+qz \N .Fؾ]D WB_,`!+?”rBPhk` IE]6'NJ%}D} #렚ͥo`sAAԃ PU8w- чb|ʎP 6J7w{#Ldb?~ $L\ѝ]`|`BрW gėъ;BzI49?c-a1CbK!6ONHzw.h[BFE.P&(ֶmO>Q"X.C"ӧׅf i <&gX NΆ d|'BjVq,2B=vYdB'9VYkvK;Ahr׽2*r7HPs嗻K5O χ|KNYgbv<6kZ?>``S8)>>()غp-Ε1 xzڮA{ vƔTH\zw*/kPi]˶_;J]^#ޏ솧ғMyg~e՟ji*2l iTi.T*С_=U) 3=*W$uuN>9$[zaj/ q&߬cD9ë9l ?,Wᨹ!.p@2RU VtBS~+D^yGw*ƛaE&~#Ri>vy-_0K,{UDC5pjW&:*@zQ[InreEH{e2HgQv[#O9.*'QQd7{մ"2U 7 ]6qϩJGr)z[$;|0(\:bo }J+!.kl +ίy9)dqT2N`'Z@fgd [zkGK BXQ҄% دb6㟗)&o>|:l _H}(YbQ3w˂8?yޡUڶb[s}Br.WI$1 8)N7r P3~7s=LG)G%EiCGn(jA?]x{iFŒknu7қ~-j.vkcn!& [Go^ԛb;cF[mCc!0WlG4OZ0muDq'5u+fWؙcW񣤅Q.l =>A}+|o`9>A{&L/ hu %'n9Hb\>nWxK&*3KPJT1{\@@⊊7Ʊv|ͦk!C 6"ny[qws%#Y!laYB|^jdzdA#`g=hխ.h !qH O9}IFC]͊aIj *:FA]lLE5o!m&xʱǵaQRn5`E {6"L|f "mⱮPlH".W$ o,ڬ?K!`Df7I'P6bLBdt.Fm ?Gyoip(Xe377Unň0ssg*fNj$|y'虺~?~C(12!*`"b%<[Acn{R&.U#7uj'`X y ̻_6?4I]Ԥ٪Oz*}"S+Qr6=5lAZMOJfm_>IxN:S[JN)s,xdl[Wdb}dKoHyj!".ҁ!Vf6%U8l$em c{RSPlQ'G?3Ԓ~ZYZ^UEN}n?uy *wq̑$ c-3yO,iaaWVI( %h|^a]h#]l SyW\gŠZm<"h+}RۙZT( 27 /*c$"Z# [6(͐(@Q̀xpakA/vOyacPXO]{6'vC#6 LS%gNR؁rxg/Peą֘;/y!#+d"f)EH)͘Y+>{_ȱ{]yZǖkGYV6rFR+ⷧBG M_u`VHKGTl.f* WB/Txk#TxГi*YC7(ٽw&}'JFfg*t)^حE)bxFiCCn jlW/X;5#6{8H&腲TBȷ,^tݷghH[;}fsr<GV (:f5XOV|xeOyPL$%FEk7#\]ѝmT307W,`;āRAX^ uc,}K >Gg9 Wqn7x( νɣC69a-9 FftigF5bSOſf 9" x`qMEA2U"V"1.=ђ90d[k6}^+aI$':NBG0Ov{STF!n6@ak 0:?U8|h0Vtиgޑ"(ց=+ lh|4'L aޛ QUXE8]}+H3|su 2 OmaV* zDO[ݺM|ݥNci]z87y4tޅf"zg-xVz5iXu`8ɹfUڰ dNIKae+$L"xp>mC޲~Y>Dء KH'_q$ݩ '/  9 hAH_C*qyJzjP)PU,PE@& endstream endobj 250 0 obj << /Author()/Title()/Subject()/Creator(LaTeX with hyperref package)/Producer(pdfTeX-1.40.16)/Keywords() /CreationDate (D:20160408210533-04'00') /ModDate (D:20160408210533-04'00') /Trapped /False /PTEX.Fullbanner (This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015/Debian) kpathsea version 6.2.1) >> endobj 156 0 obj << /Type /ObjStm /N 99 /First 894 /Length 4089 /Filter /FlateDecode >> stream x[ms6_X ^ dnƱ/i8M4DldQ4_E9\;cSrX>XP2IUb,>49g&e]I-qkxbçN21n&K$:DBTQC&\٨KN.x("[h UZY0qؓy"8ǘHLD("dhB:e6;O- v&i VɝMW94,5d"U BṲ_F*Rd t&+uT2`Z:j&(Kѐ#$S B$sBBLkiR$&J@H 0M=4BSj9Ľ r/B_a)zpe0-1yB9ȧpNh܇B:"SӄE8ېG)@N'KCJAZD)5y C 2)Ҥ4.5f8T?DUH4Vga:0hѲrO`Ǐ؏EžV思ϯKD~^ 'y]O~Χ'dt7 ѕ]1X4 ?` !O$={~|LCD^vYm99Iy}]̊ɠ8-':q:b\\7bêu],X|0o 6(gĚr<,m>Uv5+.`PL6,1D]֬?aqf a9-f>87U$raȓ=E3NŻ G hRGON^ogJpv'3`}e*wyC{=B'#=6_A"A^ͪyS $ zv `!<*뢮3r%eEO:'է= 'spzc3ϊ)Iɣ!5"ONug?HulOoA͛I bFOe_^"R! 3[,Khaa0}\sOP>8r66e*\K& #hk+!]˽BJ+V0CL Wzɾc1nP\"(QTgJD=>Uݨ(.[o;z؏8v޼~Nnf7Ʈʪnէkh ?Ǹa^!4c<돫ɨoP:.ԗP?ꏊIu[`j?nE2k;R)ˋ:QݯwQHZmJombcA5tVOEw/JRXisKݡT6+ +UCp,.8;[!wx[_g"8|gۥɼYև`@ewIAӶ e`XUھH>,ɬJO/RruG NJcoїL@2ؤqhT8E+HeSzcGS(/^F( WI/Q&%MÙC*ҙv`4rFoCRais^K+JIK/:2^R m5PRQ2FB`yiםzU|io. d(c-5D)w')on[^rDXʰޤp@aU0g'w_=>ؿBe;u1ޒa^ VV]Q4pJ'_jp*@9 @Caq nIS4N𪿚 U 2z I988RzYQn0Ma0ޘvNrc2s5Z,h;C:!C5 f:Gr@7gJʤy.i][Uh^CPXG_җC6=Χ&Ҁ;=oq98E q"v!&;b'{ُ,g6dflnXMXŦW6c5kX[#>OK 'ヘI.ޝs4on<=|]|4R2>(S:qkw搫iLڱ] ۜMW(37?yv|z S2}bnu~n/svG ;`%;^{$ao+v5%.3 q5f"qX<]_}#X\rRaS6w[~8 MϲF$GO !fV->#jՋ>^|yw /f_pD襣1\vbۉde/: :옝q6Njs'HDO=,/ߞ}&I]spA2SnZ1UP%g>IlSO삽[X'C׃ _v.( * 1l8ֈ0o A_]- ɽVx^{84,ojz'K(ݙ{]DaEv+.NՏ3zp/T.<3YuteqnB [i{MR& liQ mȨq"#$60B'$aAQ$m)GU DŽ+G ɂ$7[ۥzSD(/|*ޓiH!K!٥ȞYv0h˶Xxzק0pI+CꮞugVkԎٮ{EN]}ɰ"B.Yx㮑0lR˵WLפ'5anbCh]VEL)㧊:~jXhR&AXcbXf` <1168C3F8D470E441EB1B6187E9602651>] /Length 656 /Filter /FlateDecode >> stream x%=lq?RoTTʭz{ZTUJI ɂH0̝A"& Ił.M9=ܜBX!`qfZ "8U˵TYX}\;ChhW@>m9] h(mBZEjP!8 ! 
DESeq2/inst/script/testsuite.Rmd0000644000175400017540000001162713201671732017602 0ustar00biocbuildbiocbuild# Test suite

```{r functionDefs}
# bin genes by mean expression (deciles) and plot the median percent
# shrinkage of the log2 fold changes (MLE vs. shrunken) per bin
medianPercentShrinkage <- function(res,...) {
  idx <- res$baseMean > 10
  baseMean <- res$baseMean[idx]
  qs <- quantile(baseMean, 0:10/10)
  nms <- unname( round( (qs[-1] + qs[-length(qs)])/2 ) )
  f <- cut(baseMean, qs)
  delta <- 100 * ( (res$lfcMLE - res$log2FoldChange) / res$lfcMLE )[ idx ]
  barplot(tapply(delta, f, median, na.rm=TRUE), las=2, names=nms,
          xlab="mean expression", ylab="median percent LFC shrinkage", ...)
}

# print basic facts about a DESeq() run and draw diagnostic plots
summarizeDESeqRun <- function(x, Wald=TRUE) {
  name <- x$name
  time <- x$time
  res <- x$res
  dds <- x$dds
  cat(name,"\n")
  cat(paste(as.character(design(dds)),collapse=""),"\n")
  cat(paste(nrow(dds),"genes",ncol(dds),"samples \n"))
  cat(paste(round(unname(time[3])),"seconds \n"))
  summary(res)
  if (Wald) {
    par(mfrow=c(2,2))
    yext <- max(abs(res$log2FoldChange),na.rm=TRUE)
    plotMA(res,ylim=c(-yext,yext),main=name)
    medianPercentShrinkage(res,main=name)
  } else {
    par(mfrow=c(1,2))
  }
  plotDispEsts(dds,main=name)
  hist(res$pvalue[res$baseMean > 10],col="grey",
       main="p-values | base-mean > 10",xlab="")
}

library("GenomicRanges")

# download a recount ExpressionSet (if not already cached locally)
# and convert it to a SummarizedExperiment
recount2SE <- function(name) {
  filename <- paste0(name,"_eset.RData")
  if (!file.exists(filename)) download.file(paste0(
    "http://bowtie-bio.sourceforge.net/recount/ExpressionSets/",
    filename), filename)
  load(filename)
  e <- get(paste0(name,".eset"))
  se <- SummarizedExperiment(SimpleList(counts=exprs(e)),
                             colData=DataFrame(pData(e)))
  se
}
```
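
The decile binning inside `medianPercentShrinkage` is the least obvious step above, so here is a minimal toy illustration of the `cut`/`tapply` logic. This chunk is an added sketch and not part of the original suite; the simulated `baseMean` and `delta` values are assumptions for illustration only.

```{r binningToy, eval=FALSE}
# hypothetical data, for illustration only
baseMean <- rexp(1000, rate=1/100)   # made-up expression strengths
delta <- rnorm(1000)                 # made-up per-gene values
qs <- quantile(baseMean, 0:10/10)    # decile breakpoints
f <- cut(baseMean, qs)               # assign each gene to a decile bin
tapply(delta, f, median, na.rm=TRUE) # one median per bin, as barplotted above
```

Note that `cut` with explicit breaks leaves the single smallest value unbinned (an `NA` level) unless `include.lowest=TRUE` is given; for this plot that boundary case is immaterial.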
```{r runAirway, cache=TRUE}
library("airway")
data(airway)
dds <- DESeqDataSet(airway, ~ cell + dex)
time <- system.time({ dds <- DESeq(dds) })
res <- results(dds, addMLE=TRUE)
airwayRes <- list(name="airway", time=time, res=res, dds=dds)
rm(time, res, dds)
```

```{r runPasilla, cache=TRUE}
library("pasilla")
library("Biobase")
data("pasillaGenes")
countData <- counts(pasillaGenes)
colData <- pData(pasillaGenes)[,c("condition","type")]
dds <- DESeqDataSetFromMatrix(countData = countData,
                              colData = colData,
                              design = ~ type + condition)
dds$condition <- relevel(dds$condition, "untreated")
time <- system.time({ dds <- DESeq(dds) })
res <- results(dds, addMLE=TRUE)
pasillaRes <- list(name="pasilla", time=time, res=res, dds=dds)
rm(time, res, dds)
```

```{r runHammer, cache=TRUE}
se <- recount2SE("hammer")
se$Time[4] <- "2 months"
se$Time <- droplevels(se$Time)
dds <- DESeqDataSet(se, ~ Time + protocol)
dds$protocol <- relevel(dds$protocol, "control")
time <- system.time({ dds <- DESeq(dds) })
res <- results(dds, addMLE=TRUE)
hammerRes <- list(name="hammer", time=time, res=res, dds=dds)
rm(time, res, dds)
```

```{r runBottomly, cache=TRUE}
se <- recount2SE("bottomly")
dds <- DESeqDataSet(se, ~ strain)
time <- system.time({ dds <- DESeq(dds) })
res <- results(dds, addMLE=TRUE)
bottomlyRes <- list(name="bottomly", time=time, res=res, dds=dds)
rm(time, res, dds)
```

```{r runParathyroid, cache=TRUE}
library("DESeq2")
library("parathyroidSE")
data(parathyroidGenesSE)
se <- parathyroidGenesSE
dds0 <- DESeqDataSet(se, ~ patient + treatment)
dds0 <- dds0[,dds0$treatment != "OHT" & dds0$time == "48h"]
dds <- collapseReplicates(dds0, groupby = dds0$sample, run = dds0$run)
dds$treatment <- factor(dds$treatment, levels=c("Control","DPN"))
time <- system.time({ dds <- DESeq(dds) })
res <- results(dds, addMLE=TRUE)
parathyroidRes <- list(name="parathyroid", time=time, res=res, dds=dds)
rm(time, res, dds)
```

```{r runFission, cache=TRUE}
library("fission")
data(fission)
dds <- DESeqDataSet(fission, ~ strain + minute + strain:minute)
time <- system.time({
  dds <- DESeq(dds, test="LRT", reduced= ~ strain + minute)
})
res <- results(dds)
fissionRes <- list(name="fission", time=time, res=res, dds=dds)
rm(time, res, dds)
```

```{r plotAirway, fig.width=9, fig.height=9}
summarizeDESeqRun(airwayRes)
```

```{r plotPasilla, fig.width=9, fig.height=9}
summarizeDESeqRun(pasillaRes)
```

```{r plotHammer, fig.width=9, fig.height=9}
summarizeDESeqRun(hammerRes)
```

```{r plotBottomly, fig.width=9, fig.height=9}
summarizeDESeqRun(bottomlyRes)
```

```{r plotParathyroid, fig.width=9, fig.height=9}
summarizeDESeqRun(parathyroidRes)
```

```{r plotFission, fig.width=9, fig.height=4}
summarizeDESeqRun(fissionRes, Wald=FALSE)
```

```{r, fig.width=5, fig.height=5}
gene <- rownames(fissionRes$res)[which.min(fissionRes$res$pvalue)]
data <- plotCounts(fissionRes$dds, gene, intgroup=c("minute","strain"),
                   returnData=TRUE, transform=TRUE)
library("ggplot2")
ggplot(data, aes(minute, count, color=strain, group=strain)) +
  ylab("log2 count") + geom_point() +
  geom_smooth(se=FALSE, method="loess") + ggtitle(gene)
```

```{r}
sapply(list(airway=airwayRes, pasilla=pasillaRes, hammer=hammerRes,
            bottomly=bottomlyRes, parathyroid=parathyroidRes,
            fission=fissionRes), function(z) unname(z$time[3]))
```

```{r}
sessionInfo()
```
DESeq2/inst/script/vst.nb0000644000175400017540000010475613201671732016250 0ustar00biocbuildbiocbuild(* Content-type: application/mathematica *)

Variance-stabilizing transformation for DESeq

For parametrized dispersion fit

This file describes the variance-stabilizing transformation (VST) used by
DESeq when the dispersion is fitted parametrically. This is a Mathematica
notebook; the file vst.pdf is produced from vst.nb.
When using estimateDispersions with fitType="parametric", we parametrize the
relation between mean mu and dispersion alpha with two constants a0 and a1
as follows:

In:  alpha = a0 + a1/mu
Out: a0 + a1/mu

In the package, a0 is called the asymptotic dispersion and a1 the
extra-Poisson factor.

The variance is hence

In:  v = mu + alpha mu^2 // Expand
Out: mu + a0 mu^2 + a1 mu

A variance-stabilizing transformation (VST) is a transformation u such that,
if X is a random variable with variance-mean relation v, i.e.,
Var(X) = v(E(X)), then u(X) has stabilized variance, i.e., is homoskedastic.
A VST u can be derived from a variance-mean relation v by

    u(x) = Integral^x dmu / Sqrt[v(mu)]
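Why this integral stabilizes the variance is a one-line delta-method
argument, which the notebook leaves implicit; in LaTeX notation:

    \operatorname{Var} u(X) \approx u'(\mu)^2 \operatorname{Var}(X)
      = u'(\mu)^2 \, v(\mu),
    \quad\text{constant in } \mu
    \iff u'(\mu) \propto \frac{1}{\sqrt{v(\mu)}}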
Hence, we can get a general VST with

In:  u0 = Integrate[1/Sqrt[v], {mu, 0, x},
       Assumptions -> {a0 > 0, a1 > 0, x > 0}]
Out: Log[(1 + 2 x a0 + a1 + 2 Sqrt[x a0 (1 + x a0 + a1)])/(1 + a1)] / Sqrt[a0]
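As a quick check on the closed form (not part of the notebook),
differentiating u0 recovers the integrand; writing
s = Sqrt[x a0 (1 + x a0 + a1)]:

    u_0'(x) = \frac{1}{\sqrt{a_0}} \cdot
      \frac{2a_0 + a_0(1 + 2a_0 x + a_1)/s}{1 + 2a_0 x + a_1 + 2s}
      = \frac{\sqrt{a_0}}{s},
    \qquad
    s = \sqrt{a_0 (x + a_0 x^2 + a_1 x)} = \sqrt{a_0\, v(x)}
    \;\Rightarrow\; u_0'(x) = \frac{1}{\sqrt{v(x)}}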
Let us first look at the asymptotic ratio of the two \ transformations:" }], "Text", CellChangeTimes->{{3.5406237859608927`*^9, 3.540623835015697*^9}, { 3.540623912031002*^9, 3.5406239291035957`*^9}}], Cell[CellGroupData[{ Cell[BoxData[ RowBox[{"Limit", "[", RowBox[{ RowBox[{"u", "/", RowBox[{"Log", "[", RowBox[{"2", ",", "x"}], "]"}]}], ",", RowBox[{"x", "\[Rule]", "\[Infinity]"}], ",", RowBox[{"Assumptions", "\[Rule]", RowBox[{"{", RowBox[{ RowBox[{ SubscriptBox["a", "0"], ">", "0"}], ",", RowBox[{ SubscriptBox["a", "1"], ">", "0"}], ",", RowBox[{"x", ">", "0"}]}], "}"}]}]}], "]"}]], "Input", CellChangeTimes->{{3.540623840409006*^9, 3.540623897293412*^9}}], Cell[BoxData[ FractionBox[ RowBox[{"\[Eta]", " ", RowBox[{"Log", "[", "2", "]"}]}], SqrtBox[ SubscriptBox["a", "0"]]]], "Output", CellChangeTimes->{ 3.5406238532935*^9, {3.540623884009026*^9, 3.540623898620083*^9}, 3.540623932367114*^9, 3.540624018553113*^9, 3.540706350438925*^9}] }, Open ]], Cell["\<\ Hence, if we set \[Eta] as follows, both tranformations have asymptotically \ the ratio 1.\ \>", "Text", CellChangeTimes->{{3.540623941645524*^9, 3.540623964772687*^9}}], Cell[CellGroupData[{ Cell[BoxData[ RowBox[{"\[Eta]", "=", FractionBox[ SqrtBox[ SubscriptBox["a", "0"]], RowBox[{"Log", "[", "2", "]"}]]}]], "Input", CellChangeTimes->{{3.5406239683083763`*^9, 3.540623985663389*^9}}], Cell[BoxData[ FractionBox[ SqrtBox[ SubscriptBox["a", "0"]], RowBox[{"Log", "[", "2", "]"}]]], "Output", CellChangeTimes->{3.54062402054701*^9, 3.5407063538642073`*^9}] }, Open ]], Cell["We also want the difference to vanish for large values:", "Text", CellChangeTimes->{{3.5406240570203876`*^9, 3.540624066501199*^9}}], Cell[CellGroupData[{ Cell[BoxData[ RowBox[{"Limit", "[", RowBox[{ RowBox[{"u", "-", RowBox[{"Log", "[", RowBox[{"2", ",", "x"}], "]"}]}], ",", RowBox[{"x", "\[Rule]", "\[Infinity]"}], ",", RowBox[{"Assumptions", "\[Rule]", RowBox[{"{", RowBox[{ RowBox[{ SubscriptBox["a", "0"], ">", "0"}], ",", RowBox[{ SubscriptBox["a", "1"], ">", "0"}], ",", RowBox[{"x", ">", "0"}]}], "}"}]}]}], "]"}]], "Input", CellChangeTimes->{{3.540624078076254*^9, 3.5406240782935953`*^9}}], Cell[BoxData[ RowBox[{"\[Xi]", "+", FractionBox[ RowBox[{"Log", "[", FractionBox[ RowBox[{"4", " ", SubscriptBox["a", "0"]}], RowBox[{"1", "+", SubscriptBox["a", "1"]}]], "]"}], RowBox[{"Log", "[", "2", "]"}]]}]], "Output", CellChangeTimes->{3.540624079366681*^9, 3.5407063578189287`*^9}] }, Open ]], Cell["So, we set", "Text", CellChangeTimes->{{3.540624088556891*^9, 3.540624089891953*^9}}], Cell[CellGroupData[{ Cell[BoxData[ RowBox[{"\[Xi]", "=", RowBox[{"-", FractionBox[ RowBox[{"Log", "[", FractionBox[ RowBox[{"4", " ", SubscriptBox["a", "0"]}], RowBox[{"1", "+", SubscriptBox["a", "1"]}]], "]"}], RowBox[{"Log", "[", "2", "]"}]]}]}]], "Input", CellChangeTimes->{{3.540624101066662*^9, 3.54062410215302*^9}}], Cell[BoxData[ RowBox[{"-", FractionBox[ RowBox[{"Log", "[", FractionBox[ RowBox[{"4", " ", SubscriptBox["a", "0"]}], RowBox[{"1", "+", SubscriptBox["a", "1"]}]], "]"}], RowBox[{"Log", "[", "2", "]"}]]}]], "Output", CellChangeTimes->{3.5406241034473743`*^9, 3.5407063616117477`*^9}] }, Open ]], Cell["Check that both limits are now correct:", "Text", CellChangeTimes->{{3.540624108213401*^9, 3.5406241294046183`*^9}}], Cell[CellGroupData[{ Cell[BoxData[ RowBox[{"Limit", "[", RowBox[{ RowBox[{"u", "/", RowBox[{"Log", "[", RowBox[{"2", ",", "x"}], "]"}]}], ",", RowBox[{"x", "\[Rule]", "\[Infinity]"}], ",", RowBox[{"Assumptions", "\[Rule]", RowBox[{"{", RowBox[{ RowBox[{ 
Check that both limits are now correct:

In:  Limit[u/Log[2, x], x -> Infinity, Assumptions -> {a0 > 0, a1 > 0, x > 0}]
Out: 1

In:  Limit[u - Log[2, x], x -> Infinity, Assumptions -> {a0 > 0, a1 > 0, x > 0}]
Out: 0

Hence, we arrive at this VST:

In:  FullSimplify[u, Assumptions -> {a0 > 0, a1 > 0, x > 0}]
Out: Log[(1 + 2 x a0 + a1 + 2 Sqrt[x a0 (1 + x a0 + a1)])/(4 a0)] / Log[2]
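In conventional notation, the Out line above is equivalent to:

    u(x) = \log_2\!\left(
      \frac{1 + a_1 + 2a_0 x + 2\sqrt{a_0 x\,(1 + a_1 + a_0 x)}}{4a_0}
    \right)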
This VST (red) now behaves asymptotically as log2 (blue), shown here for
typical values of a0 and a1.

In:  Plot[{u /. {a0 -> .01, a1 -> 3}, Log[2, x]}, {x, 0, 10000},
       PlotStyle -> {Red, Blue}]

[Figure: the VST (red) and log2(x) (blue) for a0 = 0.01, a1 = 3 over
x from 0 to 10000; the two curves coincide for large x.]
For small values, however, the VST (red) compresses the dynamics much more
dramatically than the logarithm (blue) and the identity (green). This
reflects that the strong Poisson noise makes differences uninformative for
small values.

In:  Plot[{u /. {a0 -> .01, a1 -> 3}, Log[2, x], x}, {x, 0, 100},
       PlotStyle -> {Red, Blue, Green}, PlotRange -> {0, 20}]

[Figure: the VST (red), log2(x) (blue), and the identity (green) for
a0 = 0.01, a1 = 3 over x from 0 to 100, vertical range 0 to 20.]

A template for the R code in the function:

In:  CForm[FullSimplify[u, Assumptions -> {a0 > 0, a1 > 0, x > 0}]] /.
       {a0 -> asymptDisp, a1 -> extraPois, x -> q}
Out: Log((1 + extraPois + 2*asymptDisp*q +
       2*Sqrt(asymptDisp*q*(1 + extraPois + asymptDisp*q)))/
       (4.*asymptDisp))/Log(2)
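Transcribed into R syntax, the template above is a one-liner; a minimal
sketch (the function name vstParametric is illustrative, not the package's
internal name):

vstParametric <- function(q, asymptDisp, extraPois) {
  ## closed-form VST for the parametric fit alpha = asymptDisp + extraPois/mu;
  ## q is a normalized count value
  log2((1 + extraPois + 2*asymptDisp*q +
        2*sqrt(asymptDisp*q*(1 + extraPois + asymptDisp*q))) /
       (4*asymptDisp))
}
## for large q this approaches log2(q), e.g.
## vstParametric(1e4, asymptDisp=0.01, extraPois=3) is close to log2(1e4)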
For local dispersion fit

In case of a local dispersion fit, the variance-stabilizing transformation
u(x) = Integral^x dmu / Sqrt[v(mu)] is obtained by numerical integration of
the fitted mean-dispersion relation v(mu): the integrand is summed up along
an asinh-spaced grid and a spline is fitted to the result. Then, the scaling
parameters eta and xi (see above) are chosen such that the VST equals log2
at two large normalized count values (the 95th and the 99.9th percentile of
the sample-averaged normalized count values are used).
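A minimal R sketch of that numerical scheme, under stated assumptions:
dispFit stands in for the fitted mean-dispersion function, and the grid
size and helper name vstNumeric are illustrative; the actual package code
differs in detail:

vstNumeric <- function(dispFit, qMax, n = 1000) {
  ## asinh-spaced grid of mean values from (near) 0 to qMax
  grid <- sinh(seq(0, asinh(qMax), length.out = n))[-1]
  v <- grid + dispFit(grid) * grid^2        # v(mu) = mu + alpha(mu) mu^2
  integrand <- 1/sqrt(v)
  ## cumulative trapezoidal integral of 1/sqrt(v) along the grid
  u0 <- cumsum(c(0, diff(grid) *
                 (head(integrand, -1) + tail(integrand, -1))/2))
  splinefun(grid, u0)
}
## calibration against log2 at two large quantiles q95 < q999:
##   eta <- (log2(q999) - log2(q95)) / (u0fun(q999) - u0fun(q95))
##   xi  <- log2(q95) - eta * u0fun(q95)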
DESeq2/inst/script/vst.pdf0000644000175400017540000055376113201671732016426 0ustar00biocbuildbiocbuild[binary PDF stream: vst.pdf, the rendering of vst.nb; data omitted]
|YYT4rZj"}1S1Q¢Z`?ݚ2ӷ(a/- velƒo?0_/n> ][UM 9o]xtibeQ:8;K@$b敍Cyhu256֞I"^S# ]oylk[~Ut}-z!GB"WE\Sʫۣp"m%Y9Z&5+QFeXcoUF#nvWAvZآ[Jލ/~޽Њc.tHlNSq-aj(1۬< 蔯~ҽ[cj HdS0m,(vpwtMѷ(yO!5:S`y .gdzħo5Bu4Fi1…P+nqu%8&@j&ZgwH”| 𽷼xrDnk` ע~NF-i oeh =3Lft6c4r E+Q}sj仇_ġ3qadpǮ}<&1IJ1V?=ktw؅QE ~\,Ѧ2XLO-|{t.Y>4٬ԓ.!13FhVp18]c*g}qa j;-*>&i JNZٕkD#:ng6O rjFŝvӦaM_cz{- V^q%~nнTyn:z6WWpDo FVMԝ\APj9fopER.ѥVL(71YzGoBgӑB+_X2e>`Ҍ J@WKn/XC\.%j(*s<`,"EN$wN hyiM &#S]; n,s"bC>=f^Yt۷u_foI08t[ )S3}".;E5>ͨ8r &0k{={vTWI޾Ͽ}@%T@D`!RSG@j(ڗlH3[M7$t X "4?^zu Ll"dF>G~yi ǤgC{;!lUWL8L[2;7 r{w}SI#DB`>ᡵ_2AkP`9nĎz+찘B?}Fh6HS?ͫ==@-ZN gpez;hTxfb c Ao I%Ѧ.Ǚ Z-QKOp,L^ }Wۃ:IyuƠA( uVJ` $fx~q1[s"†B5wL%KO^q@]Ty_׸S) cG I$ v_6vUL^DzVޔ Ǘ3d^s4ՓjVUZx2nks@kyiVAZXgoً.Ë9D\'(H y,g^#O.ٜw~LVU*!MP'u ՜ȌdD(zՏ7#)[ S"w4\}V&C{|LIhvbb<1*p/&'Ahy(Qkؾ564Gd(S_!܎v5msxiQ8{ID3GɜX$Bg~۠$,D6)~j`5BLJ>}ֆֱ}ȞEpGSVwڌiTkb΍rEs%}~~bNӢPM-|Tj=% lD iCVVvk*qm&Yvl۶'&O&ٶ5q 뵮SƐ!L0 V\D)fi rU[XȑAX-jpQLnN{j}Xp<^S_!Ω@N3tzlW%%c7 Iˆ&v<xYsD Z &E8xs+S0D烋65cF"dNk9c` @ 7w؎9BCx9ޱc}u+Z iV*}] "w>%S{GZla"g@.%6zXgL8{8keltDD#5{ol"Yԁ}ms98]D‡ UҀ<Oswr3i)eq^0B㵽~Pce3y1vGxS$nҲ>}2!gK/a"M; .'|% ?ܘqjz[uXGt#}zc khFiMB𸟷Ɵi۝PĘɉme#gyCg~֦  "X+_lh+S{gP1͔X՚_{:`т>&`Ǡ} W#Nr1  t PbCK&jw)e#y [)Qc1r/ )ozj7u5;uIIX¿$ zk0:Q)rĶzƍ۳2xx.GDV"vx{}ĉ'l nGq~kJzgr/4wz!zl;ܜ7!jRu^W.aV;|,0sr{Zs9-RTܔNXBjeWix;SrE?mM95"=xVvR]>$ç^}9c#郃6:~Jk~<+8WvrW^DG"֙#7:q K4vү+Ȓbʡ-\j7ne7!&کΘâΦNT2O~T b52`i[GKIJz'yí9𰼗gt8c YF޺+%| 6 r^zp7a39j 3 ,xf0ƅ:ł ~$p+V[\<gX IPM0GtHɿPO Cн?yhz&" /cؕTW9L8=e:4iM `hNdu7u59= *M*\9YI^A6Ӕ,rE`p̯`LCahWBXY:^3}a~., t.o 뜙NV lq|)gV P7VH7h[z=ٰNk4oP+y8i5ʢ.L(-(.Y6i!>g%"Z?Ү I^رA/t`ЗINT#dՍ b6idCo# *bp$(2ya|#?B'9fE +T/W **d]IUT)?&P= !G.k\xT * Bx:AA->O M+:#LC#쌗sQ^YE#-| ܲҷafg! tl{ٽ+}PU}XgpTJ$>S,'4)ɘC.os3ID)gnYLG:xּoşRrE 18Co&RB)LO˰{,Iz>oU"4.yZoć~s_*qfYD-dE ⚋d)&m!##R4W,'JEiXSwt;$#ܴ*#<]X} ـ n6B?v.E}뿚4=ェZ줳Fk5b^اԵb;72K?mc;okʨ/"o(R<}OP0o#IzGH r(ޠa`32^qޣ kdKbfٷvHfV)u.:ۯEJX錳دmꃖߖCvfU%~q CJǏgkũ3WeV8š(8+4gI&I4cwHb9Aܠå>Ձ {U+i 2Dr9EJm*xAg\sBfaRaƆ{hΰ2o91RVr s3`>vى(y{hϔoOxGJ Xo-b٘<=e.d @w>+3+ة3Qm•_{oNh^ ui',[b:\қ1SɘwOR@v9/(unB,옔՛CU@S5vBؘFר:omXŽlUb e2Eub7Iq}Ԧa}#uR/&\aIo_imܵ y ֈv K-}xI:W^r5*rt37> LQ%%26)vE&}ô/r͜`ϩҨA24w jN`7=5ٰ?gRjQj]4oI0wmCa#󧋷\BFܝm(|Tbך V5/Iՠr1˺sr1XY[ EjtJux2|$瓳i˝Fb(Cj1=rr0\R!~ŵ%y?~$ϱ^E9cu 똒$7*:7hӕG Q>4JR,/v\gWp"`ڵ]$|./N"8x3jDDe]1g+L#Va矕ҍQTI wѐ!f}1ƪDCnˁ h;6gGu=ռUk(%o *fFio_L95 r)#܆M&ܮ_'S=E,w=@q?L{Ka5XhѳM'+}bIW%L"鸹pHɚh;G~ݝX54_f6DY_.e VG\mCM_0*H R (=~ՐZyB3C3˸bB(8G:ì]r! kc+A5bd!.F MgP] _ iuVH) хQ5qN@ڸ '0&(8_y2xR$}.ai32z>S 0gt- ΅' (Rr&uiov.`ٌs0#Ebmj l0 dT1!(tL"N }T$ TcĻZ&>s_҉T=ڠ9 HVbRvF&7zZ֪F܂,-05{`IYs6[}JCJlMM/O7x.7ל)\롼uao /m)|0J4 M[/\dgp{&I. .mn;W2 jEXǝ #eJ:QŒfKמio7yIӟ;A=o).d篽g {8rLqgy#@hgcd¹c;l}Y?5|I>'5P227Sa7m:#JyhR}Hf#Ӭv9U:tA#rW. 
XLH-Mp5Ҧ<;!WhQmoO=ڝK k⼮3YӮ#$WZе.ϡ52K7qdAC> 1z%_NX]}dLS]Mjź#E B(7y}%zJ9&2^M?ȴJa#(0]1&OՕLP@8]QL9(Y = ?^)]Ƶ?z@׿M:TAaz^67KiUKꌨV,ⷊUUw?WL̍]]l}]]]{ / endstream endobj 22 0 obj 24390 endobj 23 0 obj <> endobj 24 0 obj <> endobj 25 0 obj [ 644 250 644 250 316 500 500 500 500 500 500 500 500 500 500 277 277 644 645 644 443 644 724 639 726 636 627 637 592 729 351 495 733 715 882 727 687 727 687 563 571 604 684 473 748 615 617 634 333 629 333 711 580 312 592 507 556 464 390 549 518 495 284 568 434 514 535 459 478 524 438 507 679 475 524 821 690 437 ] endobj 26 0 obj <> stream xyU\5ҝ0 )H7"1PСH)Jt %!ҍp] 8}:_irHۙl N.a lc\\ IF`v2&p0@L^a.2b% q 8L+n` 30$ma`#'w[ l}[a`KK905hV`[fb wA,C3CVHHa?C`G >@֌ ]\\8]^ƈ ~-!W18a܈8!QB@ s5C`nksY@l-61wjB 2D@81K0% `W3+fZp#aog0: `/Gg0s{y w\ops!fp)bku[ ;/a}🨰 Bkq5B U1X]Ll P_͢ 68_nIZB@lYzm؂!W W bfm vtp&Q\lh_77Frm_)`g>Y[3;s%`\%PE[ `e #xJP/zj@}3'8_@v>mPIw Բ^ _x[@ۃ @%M4wSB4y~ lV&Pd#R_ ut vH0/Ꮏ; ~ fg 55A6['U6NP8<1PCw Fjq5_(j5Ra^D®_dc i^(?b:"+6_Pic*dQC-xA@?pՠHA^Qo|-lWFB~+Լ^@0w[j;n|\@-߽|WY87"NmAyMDՁ|C]/ T  ptd|m A~D8.ۈ(%Qv@-|EO,6[7?&#AW7"xȀȣ0Y{G\W\"o|ܿ"&s(!CpAݴ#{{$J&6H//~PA~yj1Cn/4~g/ ﯙ&eAG$ D~d/#B7@VTY7 0 x{U9AJY/ |  ^6!ATB+:^2 B{ծ^//P*|k+#X .Ai"%*Mτ!X)@μL o"S`M4!h$!/B܄P!89y@: w0 RHBPƟ"iE!\^d]H]B@YK[!oKM0P"{OQ̉@Mkzk /PG#o([=BpW10$!T2!T "EPzXZ^ޒ\Bl$PTs/G~M@Bx_=HA9Ok,BP{u@ HG8k?g!]bn6wt@:_R#W>!^ A:\_?L@P?":[ĻwCNP0r5q#l:j:AM'Q"(Q?P~y׶C\BdoR x_l,@ֿҗ%$AQGQXPSB@ fgoшBbXX <@)C"SB@!uD+q6m/U(ٹ/B!DBr!~E궖-`A h_븒:0;!?zAx#!<@DYÑ34 ]}@<*0!03(F(@ kcwD)BDh"⽎('zmpsuinP1F;H` (?ӄmZ\ˆ[/0\`K-e$qv~U"d +̐͝ R:q ~M G\WT쐧 `#i7 ?a+?K\llB(Z @<푇lL,!HwegF!D#<E/Rߨ2M$ BhB]#B*7Kr!z^Im/|(oxRD?E(B4Wx*D@4Wx Dt="_L aWLsDE" AT( )?]|bw ));WĻ(+ċxx]oC+9%OH0lmt#{XΚfGNi>JGbz.=R^OYYlߍY+zbU>e5`ˌ6 ='r+4'Έo0'[QhTsE'E>N )GG>-Q[B.  _`즇~cFXK;pk2<\gpʣxxSҮ=MlxJڣJ W\J4?'%3ud93*lʛӭy;ڢcÚWk djK- [\Z5>w^Y?zz3_2gyv !vF:al{Klcm&`}uvA%{@`l \yAw  /KN 9AvBZ0.)tVS,T6~PeK(k#{#k\?K#gёT)ZOt]1!S_ڣ%T~n Rz7WJ{"7~RC?uOڐ5_c/.- YV#ysN<6les y19e^0ˎJI]4ԗ1׬Ô;ǥZd[I{7u}c>{IS=z͟V1#N/yZTCWC. O{E7W8o4+nFMl_,%j-0ifzrU[lWFsHmN']\wK ;g]wvRjoTɞO3l)5 T# ,Ww\y-i֢omwfl:GTk>5g}߈q tr54} ڣ1RHʷ&VAɫw}pN\"ʾ/렠IƪAڴy(MTqX,,Kݰfi(ٸ8!C|D=4IIPK?%W-G`uul]445\Œa1}loE^j:׎=X~uӆyxdPwDC&ozZ;qVI|c<ފM/ by_,-~hNSX 8}e5e2 J?V/G恽_= yaÚ_I{ހa1%oVE8)į c; 226gM > ߐk_{~\KR]=~yzy7ve^S*JՀuZJ@"҇V nCuDl?$ Pe&H250ae:!AIR:ь \p(v,UTYT=}[\`ܝw&5YKc"v{ ՗XW})9ts}~ $'u״F i˛7E%Aq`:;$t4sC2@òj[X8\! 
ص ˨ˑPg=Cl11,SǞ;"}蘌¨9)$oHѾ7TsĊ0.ξ|Ɛߑ`aV,kRrԫ&B ~Qk'Ǘ<`>cy [,7#Ayݬ "{~E]O:EBCyCHΒw?%7E(ykH7v״Cg?իu"(U4AfPu\lbprbQ,&Z=J5bFčjª'6}Fj2l~>bհ`|/tĸR,މK%7[n<;7togQ?Eվjٜz,6뱠W!\E {))}7>xDjX?XWHxQ/P>=z$p'{<؇m'W=P~rK3sL3t&$;.2bX7;B?"OmmY@ջ\ot9ȗ(ǵV<~̊`OQ-՚۳;вdasdAO=f PGud7ªJW-,|:ǔl#6:3E6;ou"eGމ铫y&s2sD> 3AQ_wAZBDc|7:2Iܛw812j~JG(g>ёdSY}|/ݓiw٥LK-m_vCWoZxԏ@u 7} ]92Y3oR~!w845 LO=OnZ[.mPLf"8byl'DƋ]?YN!'jM}FM#.mkGŻbVJJ6Oѿ*NgzvzM㔡oCkȖ&-TTgigOmje،^J?W`Nij$Xoem8a‘@psz)@B}/F|_haے1q=Nƽb I 0SM%FzS235-sprX֥;/K3"u{&#cAקN*}-}yԳՎMZK\)pn(R9#d3nb{oU7P`7R.Ubcڼ8x/yXw}h㏏H|eᤧ55~j/r`w2_ o{5+U:@_8+ .ח$w?ckc bX=@T?R_vn2Ȕ}yy d- 7%ɴcߠnkև6N'sň&ޖ>[M[5+j(yD> 1b[Goգn~޼!F^ KT(`Q,!2E2)y 듊<_MQ@wcr >UȞmJm.> Cp%{%rQc'yaε7EmN)Qr虊:=O- $T/ ՘n0 G9 |N]Zē>v܀R 1j}M犬B=U5sִ>Iw>טvBo|]R;!oGR{ r[- iS|A-Zd"yW ︉j2ݟv>K Ĵ[9;JMвV4v;7q1ܚmaM]$IN+V;?/}(54{t^1,يɠϿ&d/cB~De+\Q|/~!0 Sz On驒_n> _>L;qoxBPVI~M@O}4K>Sj\mV/>z(yD\lkc= lE CeT\ֺ'T;u"a-6=-&wUrjH|ymgE{ӃpE&]GaOCr$ȕPz6/Βma5gOY/by-j}nzQ+U"匽rϗ& ]ͺWNJ~CNܕRe4廹\?x4ZqsVL.h_"m=1~K*ҍ}:]XzN fWkYS=fٽ =9_ ~ ך:ߨ%Cm%fꖖ40nb1x6NGc3s\ 1H( J Ŕ(JC!B\ppR/E8 ˝q9a7=4heH9Nd-?5\BU*zCs8mQxGlL'[fś'{ltNwӺ W;UR̽O&?.Pu|jS pbJFhY֜3c*%&'e죸k?y-`o_ilr kܹC@qA6,L㬂J/ 4hc) dd7P" D)-3dwJR?G$EfOe,EVt6tﯘ`|AN +Eޗ٧ :F:¹[+8 ]Nb?K;~%Ίq͜{9?jP1ųEQN>ފZx=L>gT}w^r{;򥶍嵾[P}*h9oFRR6Ci6{dF߽F(RE``uaYm Dcs aȓҨUo怜V.tcMϱa[n?#OQr@m%Q Hi<*}*U+Ѡ|OTn4sh.^:V!-CK1Qü2\#+btG>~O[veqbn[W(P~fo4) Fdi"oH|`YI.Y5)y\g!{9 }s+ (Ǿ#:5sąa_X"בbo9SLCQhv\f6q2EAKJhDq)gVc$97o.P]FxEy o7Dn;ؘi$c^scKB%HlkTRGeFFa,[*APE W)>i$3HcKRT<)gL2mI IBݬfle89o`| kQsjWǨ=5YOk{?@{]&=c$wh){ylr0|ٲ&tXq$Oޔ/_e\jGOTU]dkEaM^GU<ѧ\3ĵ+}mVa^?KtYCf$`R3Ū8CPd(@ E7?K溜[ i ]hdNl.:is mҩ`,~Z&aG$8=h ~$(h ҥOOV21 z= za{G@`@V&'I<L0IiƳW=e|a7ϢTZjV]'sY}1+R$W p9F;*ZaݦxrW @ҋK2ZQFEtk4]W7(ۨXF޾*['ݾIM?F{e onz}xI!\W74f5kP'WAϭ4*>j(ۊlC!znda ɕ9b~9vMB܆i6޻14Qu wkU֑5=Oci"qge];U~_u9I]EB֌7V4K ɗ=ĶJ4n2mC81] L9j\hD7W$aGT5l־ 6ń4C1nl.g8&aݤObF7<&kt4̫/ GhѰFZv{eϱ/ [ L;g@(*{Ge7Tf@=}qjY[]ʾbF\;Ý([!jªe|z؎QGAϞv{)'0:Zu`@y? kP'HB\]bV)IeBCY' &vڈ{wt' ԱTރ Uh/V=!ֵXZ^ga&uzR0X<&E0&V;>,13o37RxpU2G?U=*'ȯ߰Qn8m,SבBo t&07ݵKOpnKPgf8^ͣžw|'F {gzxcO{}.1})ސUy2*Y'v؛YUoM}(Syܺ aWA=MAp2xRprpyKx].V=Un|iEEbdԓO8\>./b-u/doW$n(q˳nyi8en3G: arY:x&80;$r1ZĦɟ oፚn;ͩ9o?;Ȏa1-%V4I'LMY!+tQ$YA`cb9)Ox7IFmd=+ȍ2T~zC ~h@uf<+!ɟJto5R垬QJ/Ck" >? jreu${|ʓ$$4K}{W=Qs.>e#%O![V-|{cu}{Wۈx,[>(*zn8GR"pn$-O C oDTFU<_!W;yXr$:f;,%e]V7;Ϲwx1Ɲ?B sn`4[Se0lYhWiOG{ ͉;Qv^MLD&ʴCA?Mԟ$'Tq{Q- u}d,|RIJ-lgI##.{%jsR#"dKP}K.(wwwʉ@T^0my/Mθ$ 4tQDk{ .u%Xnn^(cX;34;Ťc]q@~cf#w{V5y>XWaMMna5/f1Ε1-xU{;&rQHX`NrF:f;CyT% v-Zbx;<^e=]_Lj<9x6!jC)P,S_#$|eMB&NRG3ɢ :h7v?ߦo.7ɳ;w;EЃpI3 :~y8JiÄxLT+b=F :^Ta G,Vo}NHp䩵cW ?^}ѨdQ 21ܶqjGs./:-3rloVi8uX6UE4Ywe A5]u< k0MݩD$Ɵ?̳=xG٪ S<6Bp.YtdL1fI",Q&͸(>ؖy+=:Sݳt鯺T>*nKY/ c1f/'r0k~.+;}+OvZD nT!?0orOrvoOSڎBUT1+uoQPKt&M%AhqMa #co(+~zi=!w3bo0|}3kB<{%{!d*lK;9qWu^Ɉ{bl|( ?Ĝa%WwH2+kl{K=(7>| gN[S@]T!YiIw&P3JpKo`,}Ȓa-AR'a.cդm~*M7`wږ}rMRU_/8m:}GVc8]fQٰgdaIѯ-2kX_ONX=~~0a- |5k%O"\#G |4ʹ͵&LPOI/et^թ/%IR!^ZW^/?aiҤdSz>V̹̽Tb`&ѐϵ-g߷?k5*R UmM^Х7jꂬޑ"3Gÿa|}1|K'Vc$o곢Ͳ[twNq`Ksph%ae\zM˓xu^hv\Hxv8kM&E|,l28|7֩ ;X'kRJ8(711t„]W GEcr7](PTƝp %R(VT_xBx=y:63>ԅcߔ$vVU\25K{Z>fFִO]1"}w)Vt_Zk.4(57j>ݣ(jj1>շ ?+{x51ǵNOn%p >j~dna0ci>M( &kK]5BONZX{\tg%3'%?XW?7H.زjSUFk2 948r8p+N6L{1ㆶbj7bg=<k=i`΁Ϭȩ#/Cن%s,Quk1`縊8 E!;t @|<~ {󫕇:حpFnwk[A=-nPўo)14;kSЯSD+ ^ ٲw8Ln8ʭO:G^rncgG5  e^aϐ\ᶀD^oÄ4M̙l?bb~wDY^?-֓gBy|d0Y`m5"X 4!m=pay\c&7H:>FТ^=j[Fk%C L|)x<ي󓪈 p|Ӵm0-|ltQ(d6ڦfgwi@ M^* |l_"nGIn2}c;t ?M?9cxcl(RaKwh9A5UOhHUGdTu+z+h\煝J}X2ǰ(g!5 ĉGWp^>L䝥'+}Qj=^ǽVxc:vގýd2'*EL_{^zEҀ=Skt{9ǐhCd>BfcE= L#O:[}~ Y'ư&v;ӞRW)s/Y_J*X| +oh;p(2QH~V ~irsq{d)}qv']$> KOfF-Q(?>rw U{ui㚹+skApØ`^=刅/3I6S9Y*9ElWq?kWq'3E!H};FZbE^pCx pD&r.l>*4\Ϟ=[GD#Q^jR"˜t Qi_T߫/88 ui<*ȓ;20 o"ëP-)>z4K(VG,6Y\ 6g@+BU F 7:lcɲu.. 
;hH5mQ y 7RQl.$Ù&ߧ [ QN'ڡ3) -cKY/ؚ?7갔٥a>,&'dN); ~>29hK^bE"Y4ƕTJ$ ڕ?.0/yXy؀/6mx${1Ue w)WTw)P؜⿩+ff.N7hx8Ql_2x, -`,*~s/G- *iLvbiΓXJ eXi3$a)7ZؘbM3l4ޟnE+ֆG4t2'IEV=*q''<5~M4+Vi?4? #֝?x0l#n䔖0t%tQ=S %wu~ [q;=O/q.OŜH,sEv/vCݟPo_ j2zSv`]"o˄[.۵}#+^xo8k奇=g:}~Ҙu}sAA6Б|lsijKN/GQZV/p!I\j$[ީӹߎ5wrEw4B|>:msFs~?85DGyc~4yyU>~v gHI6Tz͆o(lƺC9_Enӛ Ȉ|٘#=>5Vb;f,?:oԤ.jNZcfI44?sG S]JpVNg2JCK.o6y@s=j&,N:R2y]_l.yIo+.^΋DIRQ5}uŽ\d4{{@ и(7x}m8kC8BO%8r0̝erO;YYe9; R<\ԚL MGPLpHQNzD܈?m|K3- ɠݞp3e.c=4+STVZyHnT]Dd*.RuQ7ŚUǹD<X> >a&/6zq/>#.Ag#%K{xG圁PsPͩa1Zt,wE8ݤ)?3kk[z~H`fo~BK+ȳ{3qS0XSB sM&-x{:qn+ts2_Ǧd ^@Pp|aiX %rJ&@tuOYF8w3wB4.p6">Bl=/N1!(χn0}X,}Ϩ r*ft;0śU*EʚbcV5 0-h~48;R6&xd+]!Fmԣu:@u,G 3'_F}ҟd;<|hG՗ĶywKS5Sk5蟓[lRshۏ;165-<}=V'rLŸ= ybc V>>M?-mq܌e<]tΨcc8\TnI4E;#p[)6zEMD@ȴ)Iq[]]ğ{SF-eDALzpй{Lsnf:]vEO8E3u]vKKîNI+_*2Y^!V`(yTib*YRaQ$j]K ;.6 )@hv@qbH!ި|@ͷ* S5z'oN0uI%d0!? KQc>t8\ eJȌ6}<|)iUJR N +jά{TըsQX2wKz^#ED$CK a\&t0S_ׁpp|2E'U]b*}ɇF~q. g7z}&K&`X8 DbKJ/rg13^;̽\z븵sU|IS\4:/T$^k9Dء oK0@U7Hq<|aHR6z5 И脋lG[2BV{q/k Xƺ3]dEHqi:F4pKl>`ѩJ*̡ z&]ы֫|eD|MU}0c0Ia"q7t2ۭ /Ǹzys p#J攵uO)?c 0t;inȎ? ݔG!IoЋ۽fztVΰpa<0krn/ʜo&wD-eVYCxx'٧6db$M|VY#* <'p=63bF7WܝaVab,$%0)I|M-Sƪǻn8AxV] KaŲ9hR&`6D3fi'b~3UIC)V,hVN +ԫ)y? ]Yt1}ğ8C,^pO5ATI9{ܠxϽb؄d;|^=Jt-x^ug>&5kD,͗i(e$@hI'iDYL{IO~IX MTǠ ɐ#y0ڇ~hn0VL!֤\tS {j--M{^,3\Bt\_{\ $7mvnl۶m۶mضm'jHS룑焗2 .Xnz; LgC,]"ao0`t- UR8c]4kjՓa*09x]+mݭipkE//DRR?&‚ 'oe8L%d"qebm;Wn|eeF. \N}ӳ׆1Zz6{Ȱ @\x{탗BBs#¾耋*hCNddb7g9І<NnggWG8taܵXtdyƑz23՞]45(1B9ԭf 6eDj2\@FZ3Q2kE9IΈayc|uMTrh{zvQ`K_6s?]E"dBJ"ayvUAVIODa$szLI$'H6 3C&w|Ԯ@ĺx(!"R9˨Ur68߉{C~0F#jN V|Y(ktʛ :$0cl!{vVs%[q z4*h,xXl*=.ipwա`s:&vDw+[84zwaDk?:ܵc2 lI|'E]|"=wB~+jHbT>^ Lð5-0gʷ2uHY~3{yGtqYKо9 nA< -TJIJ-]5["Eh\iftI݋}hb@2E+R$#񄪁Cn+ QL!{^(y`-,7WJpx'1ɄvpLR;##a~EXS g1y%w1mcݍ^ K۫YQ#e7B%c  k2vStSE S|l@b3s/VuzgO׭ K5Y;U8dE/N1aܫ9OKv GЂ0lڳv16!jYkl|FQ>_;ve  SO,yp/9hSF#*ZE׼wtz{:vs>b@l7˹`@{R C߭o;&~ 9֟@xNk V5(JM XaH#tS T`DZ8:=!񔢞MiƻZj1.榽N/` Ji!x > PIQ,!|e]/oԁ^K tI;(asIn5i~[udx۔HRLha':O/`NevT[2`UbwD#D~)›b`(`*bWJ@G^[PkTc䙳)Q#Kn~Tuqt"..~)w%__9azR $B/M۲.`1߁PzȗOVWDa>gPSB%2c!̾N%)#f(Oا{U "oyP~URÖY/ӟ[.%W fEsGH N rXB3Cz$#Bk0XҦ^eCAz#יn]0%#pbg–Y#oh <w uet^%i"Ԇd`'#RȨ!UnW Ɵ1,l0GɈ,6zחPH7ȥZ1wEã/i J?;$;%`at|:vG!'1aмR5_+:LKthos:\TGBn3]%AKV)B-Ma9ܽx<;S:M;ܹPN:1f`3{7@F 浀WXNۈ^Kvqi]Y6s6&@E`u:Vsy:Xl*SY?>tfr#Y [ӹE{csRwH0 :].kw8a}(/NY䚍MzӤޅ:yFΑȓ3|Ba#8e<$a͔{EoBc8vYws)umG>|mFwe)2s rW}_ʏpkD&`<3 LC\蜔r'=X HNRo_y5+2[CVΥr 3 <6$иq "r/Duqy0;wמra#0w ,=Jhr7Z!#]_`ZA||St͏n&Ex%`AO/**F SoPhhg?4@6] vrE\YT[@'i(V Z&m?z.iFz6:m&N#^N0+4K_)ue(4iq>@[ه~%nŔ|o S _Bg`I."ȸkVEγ]8m@fWG8G7/:ċvv&\+l YiG2IQ5΅^Ndiޱg{S^`JtsF XCT=h<P>;VV9L:q #Cd1RIVNp zTGe3(H*G2HaẒ Sj(7u2E/ Ic{ǫ/F$MÐc}}ݐOejnQfbfMp%7ӆA˪W\FAy~OOaHXz'ϖwc050k/i?>%2݇^߶&JGZ=tfRFDLi#5*"խ}f$OWng c7Cv -4@)|fb"&]:w O;2a 6:s&hjH|/A/Av~Fgp۠c ) ۨwȑJ*Nxf+h]yvm!Gh!o>DĎZ_Y-bqd/wF9)yP4Ɉ򫲹iYr8lĜ.`sE.3gtLJVonLv]B_-CVB!ۘM#VMFvOW7UCh!OrK*Wܒ+c.2u*m;`U;aߪg"mh M~+b$ę }:CinG.BK_Ђ@me 2<]Y4-&]Ref>n+R@4=f@LŽlqcvkP9ҖJ >of=C"e&,~Bp[5x[/jYrz0Ktdn٩sMĀhg)51\dM߭]ut& XpdUOoA7y sɫ_?"S(>cBP 6Ԍʑ0*qRpЄ{3aq<"ߏC L?9SuD^xd9كHdV&fskTn%d|0̹|Z]9?%!8޾R3֣qE ET~mBN8P};4' Vj²jjqzmӜ5Ծ7!<rϘ;{f=yDzVdJs3=;'+1Xf6a7.׼CxR+{t&@,nAlkާu9;}620WX)a]/NZb- 䐘m4~ڦj6Ga-Լ[m2T֟ cd'PX~a4.qׄ=K-}A I18_tR0Y$ϵ~ rѓe|F>ys똬=,9A=F:8LV_d 6ts9VuAqԁ~FSH6|3-,MZ~E5H 0G1F6B \j;oD 6W{ve rCtZ<}.y{Qy|"/Om‰xXTDP|~'p_R֊dJ /LGҜ夤Hb8!֔?]Tvw #^+)y7ۦo#r\&\ria&7U49 Ff~OЙgJUrӿҗIƫ= !.Ę4eJGh͹k%3qBےOZivhcShF"bBsZ3 ,,ix6,3lZaU6w u6w ucs|9(˱DZr1bz@5Jim>Gޑ)3ÔdÛoAr}7D l!H0s 1Xڝ1 KT?qUu7mfOfQ@^BYbJz;oT(И (^lܪv XA +pŦ ךŋVjI 5{pU0W ]k"[zHJoӟh1y,Vȕ Zl1GR)j% M-xJ/L}4؁żSs{5\fɉ*XbP`0\ %k !!~g#]Wjm֪>6b@=l]% I)'v"? }ZÀE]3.<4F0өӽmϤR ǰtd{)ddB/4Lv]Ͷ$,q<6iG}'L#?шEo/Rv I8%2srK? 
%>^}Q Ic[C̪T|r (U?6͇bn1ZB%E/Bk TMf]1">VR>H`?$g;ZQ9 nZ=d=#y{?0jIù5o=|A@،DdbgS: 7KbNûnZXٮu:cjyR\Y}rpCoF-R`\-NjHJ4` hOreeZlC;kj$y s&flH!v&%1BGJICpűXR\;󴦏&3y34zOfJ+>GT4J!JmU^')` 9kR^ :$"r7qѹdZt!Y4s NĞ("&}5Y;72Gs:s\^&NN\ zͿ1ϪޡT`dag7(mD jR]w|ZL(a/ {DmOWZE[+l]̻X;, Ed6[)حRhQ ?cW[FG_/-ߑE߽IW,> [d0\7'I8Wix,(PK*hP*_1Ҫbt'lB[K~]2`wapq3hgvˬ&ۡ۶6B~XM|Ѹ{O S^ tE}AmodEѪ"6ٯ][$À2Z#xOe"h⪋̪fza `[4m|\g0TիB\Ac #%N!za?<$'ChCVY@x֍y?na"WN..Bݸt 1|:Kj(CRO9nm83@J+3pWYtճTk~"K?aဗ.X_ <_^En#E ? Xϕ~];lnjL8/;V_*̡ 0i}~Y,]9L(6LJfrG{me#E2Zq8*;.ڿ9g13廮֥kYJ"w(h,qS43]\L7$P;O||Ub"mXv|cxHp^ƥ7M0+u<5U!<\,e""|ͽmM`q#lZ `["u y@ N߾\;1Ռ۱6ǪB;~5?Gn&$!ͱRWkZݫ3# D~G< ~>gk DQ.Tɔf25G+ǚ2Z-ILy q+6yP'MK@<&jcl?`Or"7`ұL'?m҅V?;8tlHH\R> 2ӉT*" :Un D"MfK5=|v7Iw݄`a˨ހ*p\3!Bଡ଼l?ҌC* T݅ʂpo!;K$5+zN{@mPBz]6I"kHBQ6w@K0T5xP+$ݨ: g qVt"(ngv78-'vR¼Nr'MD`r4*-͟t0=f WQMl_~54Ys8NLtT@DL.<^77!,TF4c siOY+<-LΓgWb{GV#YW(jΜ Ke35D$h" 7-hE 2'_˝ɒ \z)h >U1"v͉191}A/FmED͛Ya5 tQ X*+Y#֣aY_1+aޒbҔ )\AfQQ7N%E횾)qr$TM<`T9Ռen6X"d4MˏOq^pgж&((ɄT >'Fz(r%K9t 79_R)dio N~~Z!5 RD̚%񠣬w2-Hpp-d3u̼=T/`;IU*@ dv:t ă ZVC@#RL4x"DwI5Fp}\c<3B~g0"CؽOfJƐsK{R%7&7 BHuH4PmԒ܎@"|tKޙ߭EMlyhA{M0Yv.\gM7~-=OMmCXgZchXiSm }#QX3$%e~x%B5w-o<'p31-l-JaSK:Hd#5OMԨ܎ʘ JH<Ҋx`dVJ0&ҍີ,YxD?&"¨qk ~,sd怘_GsW/]p- qF 7xy7# `T8AI ߵsEne5ba䐥+U6>5l+Q(~u!Y+y@)0pq.ywݧnڠXJ3=L' :l/(>Qqi뻌kΗnG3 Trм=+Qw"s򈧢1D0P'x33{Y3Ěe@꿲 B\ZBn6&=#&r8dd|%+%f2sb F`xsgdEù#*nIwXt:imF%6˲!N.,qX "`^%cDx+:w,GK65zDIH`hГQCU\R\pvʀ.x ,#s̰xGS^T{P]-Kԓ}_r/[8|d0x5Ep/XSnR};S!Mh: h׾4x\  PI\[s;[@g.R47MoQadJ :ܠ=Y>%/*J~>B84цrL1c[G/=8Nzְ9vFtvt2IĖہQm"]L7iQE^{IGĽ1X}kS+$55mx)fuci0 }(T'ׇwːXiNbJx:-#O N@>G۩tl1<`HhX,"Ǹ9C\rWNgIx|p q; ozVmkU!ԗa!ѝ&g4{`HVo!t}H)_"& Ȁ@Oc^jڲ+6_@'uu@+ߨ ;af_ OO2 endstream endobj 27 0 obj 29661 endobj 28 0 obj <> endobj 29 0 obj <> endobj 30 0 obj <> endobj 31 0 obj <> endobj 32 0 obj <> endobj 33 0 obj [ 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 0 600 600 600 600 600 600 600 600 600 600 600 600 299 299 299 917 600 ] endobj 34 0 obj <> stream xԙ_UKNaK nA{! )%R"JKHwIxdޏ?|{fu)) ZD2z& K=S=&k+k:Ak CXd`@Ek+a=d`92:2z` Q@8ڀ @C@ `ehH9YفzV1= ==ck{?mB6v&'FNN6:&0*Fvzy=H ae@0qpBo`mEes dg+ $!ww+w}p2r'x7w?̿+@x L _":/ $g㐁'x&20d' YXqM02]!/F?AkkKߛ8^&2w/g73#thF+ӌ`"w&DT?S $lLG&ET}lU3`"wNE?S v*jWL#sgt[hL_јIEcRY!տ²A[꯰_a9!‚4A2G:w叚w;~Hئ`_AR 6)GW<; e~a`? C{!;NB ; x];4, 7#{m;{;gz Ay""s`[!@䎍+;ǟP[cs EE,J:+2A َ "hǐc{1fx^?Ac ~ ǟ)rޭ{Rg\?HG^~#'O='lA l[g)P d'djg;4qzff l~B&͝ ~V/fݡL`k&e3ѿ7?vԷ [;@(?E;S?7H+ p8?쬝!0~o L`?۰O;E`hivkrvֆ>^ Ӥ?e!6i?!f5,=hr`-d5ǿ#k:5~ǰ!1 d-YCdÐfKrxؿԳPA}U@l٬bv?vM- !) <@l~ 36OM Gkp\  ?ße~{_ !@S%kGk9=ӟ0}Qf# d=;S ZwBPŝ|X9ߦlߏo o>?~Y[62_ iQl!9q͌B];:w n2|GZ ge(`U鄞(L7VQΏRlސ}B/"eM}tp18L}dꙢ FQw̄nEGǞFcYN1gsQk||+ 7ٙx^"d=D^BJ1eTZvi!i^N@=JB=atHNg^qXBanwa#K5GL!>BxQvEop`ϿB,Ӓ/(@Oef 7(y^%aYt(ӟv([iwUV)M*v͙5VY)V@k-Q{Wz>G4j2.SCYH^"dWTU=OO`0(?HF )s T߿4\RE^ã|uLӉ_|+UD-Ibf)@'!K%l9* oǷhok_ L3LW]nѬ/6EI"rd)f*Nu'c]L+CGtkДL&m#+]V]q\rӭU%M&U]e~FzlP9N⤜RQw[&8c"b8=nmCjNG5y:碨>!/&aܗ9F@+f\O|>BSz\,7D0n̂09 IIi[I/̇=x]ͨ޵ csD"ڍM?c^B+,N~ᐦGD>y5qOǽDʮ1X KwKDJe1uMsblm/bS?]=g R_6O|:=n{XhՂx9XsMnUX֚+@#AeJ);bvyXO_.1O,@h- V^{Mw9!,d V !xR/zQjwu+dzBJ *|Gw6B>.#n 6U]nÁǑ1q]", / -z;$ 󰡿"qn&iZZ? Mۅ_F1VIkꖆ+w{rd2Il{FNW6@=_H65l3ht,ƾH/aЗ@֡oՀMӃ뾱9xbO\:7$#TF{ek4yDÉ*W kQ-ng-o;7'WNVVTylE( ? 
%%&r:1xb,q7S)AizױzI{S&g[9%B .J|mUt+8< c*{G`.md[PAEW<5ӹL䋰R򿞘ղg_9Ű̊M1Os7w8_v&4vXyr34Po ex 8 <j(DF H>k,uXAlt*ұT5 {0Zʂ\`i%Uٲks·QP FtǠ_|8\Bt5QG׏xM=_fxXx g,IV٨'zCh a"l-:h6;V,mX24|6:ߩ:*(Wg&s5rBR)#c坺i\^V)Ki]ܓcU,uǧO5REn4+(r(hҭ)k:ͳzг/͏pEl充0pphYwX@Adw^A~t p˃79 -35Ef]׉q5udZ̀q<+9JlC:ԪH\@X"aM[K{Rb! Up%,Z FMsIn =vxm#=_7W˲xsUuH/mM|:Tiտ1{55lI%RMa,I{Kp$ԌsҸ5j+C[F ?336yRMѓԧFs"niP|+D5|slX#,@k`=OH`.#zfObHWm| 3mOIm5#e:>e5>HOe[Nj.&گ ?iC^6peRثΡ;J/Tb%#>r|KO]&Tde+Gm38aBH< $ELOi/=,]c61;~8 ?ĪO^lY|nIOFxؖ7h]!Qҥ›Lوe[TT[~ŀg%t%aAuz"L1\Qш90Ϲvg;|0Q%F"vXt? yu.ӂ mOQkq%'P$>\qjJ;a9bHg>#S.eλyW^ =\ 2_ n"^ x[?mƥzN.܂abc MJT7+L;[>UhIM[5ۮoFoK1{2\pm[2 2fC>ҕm_ ܯVpUr֬=#beU6h#z`>Ff`"χWPZ)2Ye . •t\S*&A?u&!-ƣRAcqj{$[,)kOxeWE֩0S:ifNl _K3lo n \} "[TI]q܉ƕ' e ʬKyʃ]7޹oˆl]H-9* ts XS.krHZr]YD_J>[5OmFkM =>Q'X(I疁Aq=):TX5ɵ4xyf_>α'lSK9nӉn*_5"rupab\7"n2OfCRI82dMp |wg+E;^lpиonp P!֔}2M YҚ_Z*?As"aYJG<`@d>JetOqT]1f|"pIdZϘb$BZ80uG<DW[TlEB7%I#+9 NbeMft6q/>8vcRڣ< (`WA&+&2L˪.07+g%;ޘfn|p%jV^V/5#gmh`jܪp'Dod\67Dw1ԩ@Tu5#+IMTo7/Zq5jZ *$)R0EL$}`X])4o2a,Y( %ّG$ ƒT4j`/:׬|j-JMȱd>*9ɳIߢ*Baq(.ܡRU51꜏/c\2OJzBjWJD!xf8e/DչJ}AVi1aLў3f'#ҏ?]wNI ]L ."f8WC\ite˧11"*qUv Lf\<åeV%pyob7\ezY/H B=|5Q^Q̏fbZ/%"Q 7|N'g۵Ÿ]~fB;N%q;L3-q<3c횾Xvz;u۠7T(VO1e\v3\  Zs ݩR)<[">Nolgm,ԙV.~+i}ʼ$9MXr SYD79įGKjtrkF'@kPϽ9@U״,7iVTW͏|<4]Ї XakЪ<_=8i=)b0Q8jk2 w 5#1 ICO{έBsNHB_vFs;1w{D2xzmcGR6Wã,.?ݵX+o S|Vm5/*V-OcBh)]O/`X\B1)QRUì[8X32suʛ[F"teF/^g+PxlK[* '{R@Wk[n [grI%bCZ]GE|̹ 91=mT%~dS`23~ۅ]mp ָK3YSHyxS0xOHE>MPEX4}LtSP4΢ 1ld~Š)D{.RGĄF7ÚMv;%|;&VYZs|H̿Hr.=\q{Mt4hu(E*T4[ ?+0epAqrVV̖RCq*2 A=Aϛ}-K_'|1uɡ[O8N~yiy<)2rꒇTt%L+daKD/cuȉ8,> ȩ}䜝6]9v&W4$4ʚϚa-QKqPg$-(t8=[/MڎR IbM2o#|k,gqv$=N)x3({|kNr5Z8lq6}m*<* rMO=ӟVd#5ض6c6=- B6@gytG9"`CTKg>52n k`%['XNDBdȁ'*d ,G؜Oo*ë}^gǸ~$L#`dAzihr"^$q mTsLJy-3Aw'aYLA0NTY>>,\";L`QxγƐ R2K|Dv "1sXb'S3Z)b\p %4wVkpVmrX򊕝3P.(yLq>OTsxeutRaevL u6 <=XL렐\ȟv;@\3!mD!P'sEad%M U-( 3Iį/fk&%}4GӘ.&ck5Fʘ2Sc(pE#ۊʀTQC\$}ozee'¾h4UyX1WPʴ+> -RC&vcUheD3f[`h_UB;3B"7:L?ҳU9x@, ,WܴLx?Z K>sꇭ˧$ԯMT"-KkEA bGМ{5eAM'0IiU0ks6zgo"J(+#ce>ӯ¥.A DDumtGh5MY2> 7H $]bz*hPcNäI 2VevA`Xbx Fjȥx/Z$VD+O)Ew$p[f+;Xɽ’U8cM]PQeL/7uӟgCyJ&3[Veg֋[e->|mcqE]!iEշ=!rKNYt=7GFFyA2}4MXpa<;Inijwiغ.CIK\nmSs8a]Ų2 T?+.>Rݷala_hܡQQvRp)Ƚ)tlj\NABc}Sݛ~1, 3 k\EOS[ uþe-T\Qp]+4V=2'c< ed^7h.ܓ#BHO\:)+"㪦y6T5V99 q(%ɂ5=% ʃQ4IcD7sfu݃)h9BfrGb&o{;3-LCa‹! -|9XIoU ;i n-/nHUuof@Qn?7gz]cRZ[dfH7ov?0O7=cA 9#sdRg8(Y1y Dh5=짳Q QQob uG -DRA](Iz٫V_1TI5xF=yRz.Ea )j(rT66CUV3&o觃mVy3"{(Θ Ů8au4G_gd)_F::7{{\:|{a|Pgͽ|př 繁J 9" $EQ6}"||b RGG7ЎRޫ.nE,vd;.Tk>|ɐRz|n]ҤeCYEWG{b! 
/]0ŦPJT clzrR|za8sdKetMsWշĈznC:,cWK=ʽr*އ_qijQy mEt&t4v Yaf[/ Lk$ oqRI]v ZQbv.vn."8)ȯH<Ph.e[_yOdE"Sс`jxNW}QԂE' kp]Ԥnt7#%&t^sb݌D/RF݈<#vJbn@aۘWtm5i ݴQ8 &tu36ZiJc5^®aAz |n6+{B*Gy5I-^"‰Xͩ]bnVZo3FNF#k-xdKFEp[?Nv Z$D5< ُĄ!<Rpģ'0jNLV>;&[:#SdR:4<Ce:!<;tcvWG/7FN.OdEEiFt6EEV%zjՑ>L_]o$ze?x.r#ua՜-ۦ=N12"2xąe'5Fೠ|zn:IoaVZT8c+)իoRz։Y›m]s%DgRv4^={ȏ_aF6 oIiR Zim|w:)j\ z`N3M&c_̄WɞM4YӨ*ZvN7?qxUk>ntLaɂwo 0-s &B[KrSG*wjfIѽ0<3m)J:vÀ̛~꼷oRbkC4Mg^icG=o w#Dh&gvFNC`./gfʥSg52KihG5YݙsE[0hImJMjZt<Pmi%R\$CGor| +3utIa m+t,QćYީKS#00% enc F(|P?mMpqe6P [22}|" <I1 R EȴJӓc(3ODŽ HĬED;~G[FpGn/z]evՖ}%8 t/LxmL8|H&?6F`|ܸ?鍞 =iQaˇAi ?EOXYկT+<:>,p1~0"ӠZ2K۷‘!n]Y9&2fSL@2;_99e}J<L`>oV+x_"-wfQIqbCn^ 3Ay=%}dSAUJ0Djpg/!"ҷL(*6~z"`B[GRZQ>?~ud85G禾5 C-UNiyT1Ӷ8`0ZΈj>˵ Aҝ&dw o >p5jÕ " dڡKu]N12\saşcEm-|ݹZA5x'unC(E]"(,*`|\a:{y#U4}PB]!k C  rnnL4ZmLcUKisH>jcLtb¿ ԡHAY(r0ņFٍ^޵?{o ܜe"OIȚ tURXòA,bq!>d骊<3˖t٨(-H, 2R]Q=3>b ѡR^^*]7d]cL)uj}t $Ξ}{jo.(=~&SFMP^DG>SǍ6w^A\\A 9'[Wmaп`6 ]%iZgT% MųnEeȑi@}7j f -(g[Gx.fry45L79Վ K$ MԖ0Uf.N"}]۬N #pn]!SnXXzi)ɏzfD8..ˌ2#?RnVKߊKj>|ix~:fE,!8_`%8 *FñM ;#)ANxA~%x'>@sY7񈊇AϚV 6hl"wif}z<7uI?O4DQc }J܍/z.=No˨kIhpɾFBxsC4)aPjb-w"䜠IA,Nh}1!ȭhE9Gb1P^ ta_hs#䱠=YĘ ߰0I@trO./?u(+]ȴ6!gu6gZ<o3g-}]\0DhLf,ޟӯ fQfܥAltV!>b a i#֝p{:',VCفB*Sr)3nAxwr?ٶkGņ]َ I')i$xVnފށCP2\8\)xyK2!)t(eimO>^Ķ/k+WV f sۭqZaqjqe.>ئ?/SfPVڽe-L{-^>! @G.0.@7NDcR#Y4;qPzed ؊>FPIP,SHSDŽEi}lӕ_,zȸ@? |o{(Hv'2n4!m)CH?VhoYPQ/NȦCv:|dL˳]+9"Iج&ͫ QK$t;3(d88w6ɵ81s]+HՃ A9nc^vNt'ߟ}?Ii׬i:PD 3F?E/ڊiꓛoUDFȑUOLS-Xe\U4䈒{vq'>;%? ̺1o3|Dž)iSк6+'~9Sc/aVGz;χ<o^Bt'Bn,^g~BQrﵮC"Yy94 ]ŇACiJ#(rY#v=9yT9zM ^щdzCg+Da\#^,Ht' m綠zXVr1`AE2 &_Jܳ(&qJhRg.o#3FQ?&{:OjVoJhVk+p;hFM -<*rlΖng#e=* #rfV׀G<tĕZw^'_cB] O n{| {cB*XdH^0ayoFmL\\>5<7'^SEl<$莳`^fbWI,6<&ȿ- y7۞F / Iu(fOd)bv¸Sȋ58(4*Z'`]ڋyn:>pkRGLɁDsr oB<|`S4hU1I1Ai7%:,Ϲm;1=: ʮK9a@+6kזּ$ -5CqZ EKn%bn|s9g&8!5Lnɷ>ͻ+{,B:+ ]UʈߐWJY:͂e$k{v;t*} SCkwj yfZp;@MfS&?Ӝ12n;sDzMˮ 4ktX1JZNe_kIBlwu4s&B,29W] #mJVQ%[V *#+up5Vɥ} 67 `L *=$Jّz8|`oc>I`0+GE^NTŽIlElAb옳R ѡW +ϑ~/g λL8(&T5u%`)qF}D^/}^vj5p>f4*CeEmRlf1\F*q贔XwG>'{QD4Ъ+rʇ[I"#L>B~9VЌ͋3~1m0+KgnayQ?j #.O50mZ{+lvO_~ Ǚ5>[ C aSsHVs-ePu?+K Q1>[J.6:Smo0rAEĻڮ4$drlޞv'$H =w/)_? ")YF3Q# %6}ϐO>$+óRGiChb̲tOI5b%19Ϧx1č1v}L* Ue{[[2u73DҘwBs+c,[<F_38 qs sly $HN0f+g?+uagY(#:IIylAhz022Nx/6nNUU>$DBSHΨenϿwiw*X!y*l͹gj.P]>"S޻ -;OHG JE^\ 褐%# q:$ )@KǙNdujIT@L2D)4\W5q%_ME/$ PnRt&18kj0|2XQQ]i_,U>Z}@ctl}~  >f"bHE[&-)- a:SXA6ridl-u*]kFNpsDXDZrߠlD8=rH*ICc._'+3P:8bNʔC^HTKRcJS8$k[bsE>IJ,%pf{*qQ ~FJH'װ[kkDܲv3# RKV ~ d+%q|(\W5@^H/A,GfOנW hъ@p?=:S0nY!SuJwF49n=iN2;lTg^<˼`Ӆ$$!ZEhc8e9:󒇃WRE 3sܤZǏ"DCC&Ug5:zm-)̔p R+pRn>10d}da[w/;]YGDrrI'z ڧp|*ovg}X  0tI/v(lF=F $s(N AM =,U7iijfP /0 6lOlt>8R+=Xdm.M~*))whrhѝ1kLHPgHߪQ8Ա}[NgᕲV7SjޏƄ??!b@a=EcC C$"mzD3ȅ\iY>*Rr"t;V16 |HWx=ɬVڟ=ilA5U]{ͫڈll(<2 S]+.]s[cm`U< 79O<†䗾`7E(RI}!ZO_TsU;8(CWXg9QQ"bB xÓ5g7lPw8*0†Ĭ7AEŋ66|f6R*;° bfSCe-%~plxR_;[ Id(1z&fw.0\;vn+bPi4#ku)e~X;Y7x^I:B&[ JAa: %Dtmvb҃NMGqK8PkqdX"}~6Bu֝/~ r@8~edUnDJ@p\d7g4J`OK-C _֛Wlj_){7y_^@kBn3h[p0+=Rq,nn<|(@_wϲ[2e؆3.}G Z>Lqi* +SӬ*n֎Q(oHٝ,լ#ff!Ʀ4ݿ4SKp$P,?aݒ_iR3 F9<>[hE]fl,]iY^X<,&]wm:FyN^bP 4B>F +)E=ÓۡasD|K_e]$UpCs˥eCwxE}(^i{WVltn7$u.>aM up1Ngcޚ|{9,e ڠup֌8mԟjp~uST%ȹi,)XzۣeRe yI&9j}dI :l,./LQ Gߊ}_FA]Ł];TF9W LR鷴׎lZr,=Cf9F09J`x"o)Wzc,]̝_ Giw /c7kZFoQSr:p]Vth $L+?~H"!z3d1!lȫXm~}!M!%R24(_`zy#dSpr._BOgUF'f`+zd;"PE+h+v)o[`גȕт\Z md&CA qFF3fe;l3QdFMk߷j]Òj $DQS(!*7]ז̎:ّBO^/c>/W~Ѧ2rs Ip(K5XnU N[`hosj/+*;o|ˇ- ?8,zcKiU6ƍ;#nHGF+YqrN 䫴S3'jmJ")m(fhcxbnEh I)&F[DL\Lj:Q#)k.9͙ +*(-(FSidg ˮS?Mnsm[{xv|ul!|-ߪ8iO3ҿp?*&]t>G1YqUWn2Te/BhVVMJhߥynKt!5ӆz{S87bb]y1EJ.+{zvZgɼsCm[; Ʋ>][kV,ۅ Cyhs1 *AqIx ni/m_?AsnY)E>ij+!_ 7/nʸm-SƟ jfD 7+rsXNojm8ӿ =fZns'ߕƦ. 
gm J!%z{hCs9܉D,sʬkmZy1'Ӎ#!H/$ oUJ j\E>96/K5'% %_gtF$ΰGA|LG˦p^sL#^ƹr񉸫umXgb@> endobj 37 0 obj <> endobj 38 0 obj [ 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 600 ] endobj 39 0 obj <> stream xxe\5! %tȐR2tt0#0C *J7! -H(҈4{b8}:_&:MR0s< %Pm 8ĭ\\ I aPY, 8OK Frsx&Up ;Cs PqB@N`[j Pف@0g_ N0nІlP> d 9x8Amc cV}h ' Ԃ`d;nnnn[fo Z D 䌈_R"A/*FrPK= wƹ! q[aN sz@-6tqB!.`%F@81k0%/ ;6Mu F9} V`3 ;};px 8l Z] jp # _;:,VD@0⬭FȻ٩? {h}gQ[B\ݯA,v` YԄ-l =PK ք9Ctl P>u*Q\xr@$zptUWE~eԯ3w!ꗃZ,!Pk9.* C:ہ|@?Po@|,\!u@.7P{5n : ^ߞ"Upn !<ˆ&p!|@l \A. '87{:8 0$ο;  oXhCAHuAZUhb8 #jinquZ<@gʹ[؁qoNWTw7;i_U?b:"^A&W 9_?0PdgZw:"Eq ނԆ#Ei>>~@ka2m Pz] Nmaw:n{qD]~ "!"ȏ@A6@ґ# \Ἲi#Dj>V•A:^C|@,W@x \a`8$x@~#BW,y& AvB D+}(Y\y%2H ';drp@~ 7Dy: {%u+~9e;~9(8$7~RMHpP2lq|tZan;GyUv[|W?R) %(/=a"i z 0|\AP2zqs!p!:P,^][x-J'7"%71Ց?HččMTFTF&rq#h$Fp_.! %7C҇Í kÍ_Fz`'4ҹs#(SaCH吴"sn/brp.aB8elA6.ѴCeC,܋p?qD"V3'7Qm|@@*ԾheCǹ*kU̍T'p#T2FЩ,'@?+ڱ , &sC v0I!ѡ Z9E#& Bs𯞿`$U ws ZGE߫Ee! {H78k;89BH%5}0;=~u40 Gr>F0pD%0A레8 \DjbA6Mx# :|D X@g"x_O q !I}Y5F.!W!UAE?:7Bh:`NW6q\@@v++$(aHdʍWr"4"4@JA\Av`(RN0wㅀrvvD"!"vu[ W9A>ҽW*ϵ.N}7G<n[AY-p&1i ثO$ֵk9e Eۥɢ1J%.L)_̽Pd˸ѱ!7 vˆhgK򣟢߽6dZg%TkҮ)EDӷDXf|̑o<R-)w`GP'jG[妤@}"qհq[BdNIlWjF߄9AcwT>CqijK.-T&uCČ嵤E㱎4nf"R}Wb#؎}P{.MHh\9l+u(-C)i:gɓ::W!~Ó1Du]<;\>wTu_8&]p`)䒱|_6l{^LtfUps{ SƵ,o"=^m0x3lSyygHyjDp4rUɉAOX3J+C:ʍɳ%~%cg?v_c= 0mrmxoQqc쳹Imni1.GET K6@49ْwU-MhwPLn b-ALY$Khly}nI :qQwR+|΁b| I$tOMs$qO݄}a2ZO6S)w~طjq)MVҕ/?19y{'[3$.qꮹ =; =hXJ|~SaK:Ik }mm2$J|]*t&#w,NiK~ ^MūyqFٷdIemIh9|E4ʝ\!{6ƚV{L^έB]8<#*dps6P;򿉙O#ni"k0AJUg> <- ß{gDOȝ;lNf.w/ۧԺ0me™rzF2{iݧ&xY ~gA+ A~pOaDK$Ay*%^[9FaE]gbWOƧ=ID;JwRpbwՀK0[d;gA.,NITeb+ Ùf D̿'m{SC:7[Yw,鎱)4YX<(rs*A-;>G |y̶U5Փslx킎ດh-7ϕ74?ڳ*]TiCC:#ZDmqD zl7}w>.=R `D-HI54Aq`*IrQoY=lX7Ӣ٦ / XV[x!zxa;'#U3^jjphõTmkkա֔|R, V[2wܭ۟vPEbyDYy.`j:2-h3SSw>JӋ,GDtwqlB#;+h 93̾WO,Wf~Wy|H HvBu~椃vJos;I!jM7NOjR^eFh`%v~Ͽ}.xHޭL\GiNtENcO GmK.< 1}.5PZE﹅փIrDZӿ&zոPSzHXg@Wz2nٍ/hg}'~i=s*Ђ|]WO^g~7&7G|K- Q= ,>8BBjCfݼǹs6✚#{T mmNe usUh\ XIߩb-d,KUN9d <_$L? Џ:Szoz%Se/8A+Bfܳٲܥx]q8+yCQJc(PCR>XY星$.V%5d_:uQQml7^цܣ,8ɳf1z]e=6>\3]0aLP!T)h@t]W){u?0ɝ=Yo4ȒEfX݁-l%voEKxE A zfF[ڷNu]PL]Wh8111s7z})NH%`}`?OߺV|9 3î#=x3Ws-;ELwzYꪌgITo]fs/˙0jWxd iMI< vW\VJ=9$Sf'XWN<Qmh>dN֔{}>8mgƳE%E:T@+ϗ[K sm~r%,y1L;lMRw;D#ˤ䀻e:V fD''4J0arieMvE*gH` kGPnnքW/طQb\jQ ѫbzO q{-S 3o4Qcuj\\ySΓɼg~#mԪ-d<9"{Ž EF?F\/Rw)['cŹdhzj5tGPL‘[0Ӯw!)Y^Xj?y!U0i(g;,圞N7 gd|hfJ}>%2@p)zX_vXB]td h Ȃ^T5}| %ydwON;9>Ģ'nzo\$|1hmVNG:F3AcsUk,^ieûtkQTy=7 u(%Q>fQǀ/||sF^ܑ똳jeHO:q7F;l~dҜjKm|{Pvߥk7S?dص5̋ G~ZP(Q2~I ? @SV*CQ`#&Ѯl^;!$RFA jI- }R<TFF>Ui <>OF oLZn6rmR;A >Vҷ"ģݭ'-(o0QhtQ,chxev[8~PX.w0e{gZyʆ| )w@%Ö>=\Ք[[ǦH reQJf@n=}lY/A#)^i!]D+:!k_{H|^?:&w;W\ro`4H/r*^Av-[1ZhiQg231NS X[ȋع8.ah!dZqr MO(VI{LBCV.%ݏɡ QMAj{,c)vZCL df򵩵w 39Tu4.C]z7BM+uRY>/+(7 KK#rઐ8Պ ~9f $1ymz!m?̥|κXl ,v!Bb0j5~mykJ#`hR Y?*cLWl{.P;4!>qvntWe]ĉJ~˷5\CXt&ѿ_uҸL=h0 s; kKjn~fSU9zT^A?_& 간 %iHɲ1K?b9ӯ^].;,0}EՎ0#^]t:_W1RM9 ~~}H04җlߏyrAmFc {b&n@2g7 CElh`>us$tT[㛜V䥵d-8GW:JXTN0?S\kuiVH!sKdrI[u8nU B1:317ϏfʛedKAP,$T\:%OީcjB* I^z8'\X1TnybX:[gr98E2'逺Ҏ.y_j k,q~VI x;)OsIxC$IV9bvB Qu;$*k#M(3L~x* bf|DT1I3EFnQ?F}( p1m9U?(ېV낮0WU{^.֧/zpMrcoqb2$R˿| )pT%,{_k$_kvCܪZOyCB- hDq" TF{,߫~ <>d] WJU<8zT|Acڲӓ >zfUXzQ=DJ46- =/U:X+C C(Jȱk=ӷ`c 5wsh;ki7* 5bSWbЦ1ٖL͓pޜCm|/R΁=*J-K`0r [ 1 ~ڋL 7Esb^5ØBEӳ9!-|엳<%oӰ)1Dm_#.b=3sH cfKC)J)gKJB-gz/,C/T@ _Wi>!J+k^HR^ҍ&}Ӏ/%.-癮ZI_je=Ns=/D6!JSs"bO;L@Rj \,ᝎnJ-|JK18ш҈734I؅|OqSV*)SzS^ǏÀ+)з8[ L? 
r(`hoi;qyf"agTc/YzemP̊X/zo-L?L3ϬQ1wHɅf  |u~9k>p4JZxlL=li}ߥ1 .l1k @4v-'bqpWuwO'<aio@[)x^N.-T+k:sg6| c1&l.tSݷd "im $ž>+v_R;IzO7n9}|q_㥕 ~(~ hSܕrDv|2yosIɐyR ۾ QwFQdZ#x0s*?ke@4gl(Zy;nGL?6(E{:X|+&U[SrɎB@o=ntGΕ̽6UtE󳘑|E7iĥ5 Xw,~>rtl8 ͏fמԉ.O*I$?Y4'sG2̴Sƫd츚?bRyאou -fZ)=Z]y*b =_S.dL{$C6n5z咳$"%!:$>l ?}~E\q:vhyV:oaqD\hU2C!Ə FQ M[ׄKbw o/㔼h9%Rj9`,`F2å>'Ex6j^}۫9ws&I+Eo~h_6-?UmK06nxwT`pmcfZ{)U |c1ۣцM7k?EU$6U62V-1-Uss“ VNdž|C%Q~v-2R_a[M)ir'(O8Bq&6P׍ _Ɋ.IzYm\{fsG/z;57_`,[38 T?N]fߘg̋ݨ&6Lltj=uH'qri ?mqu`_PՏi/R:/;]l+D]k\8h{Z{'ޅ?z^{H߬(;z_s0N݊Mo,|FG4X[htZ>W¨S%6 `QgF}+ǺiUvl'Ƿ}_<#:nqzPϺcA17" :ef`TPq$8# k1LՉqKc{` !s8fxR}`PT#﬽vwxQ۰!ȑȋ{wdhLˆwaNOv>WπFB3N8o&6j5K0tc)Z4|FE޴;|Iso$,g&5pH!;.| _ Z"d_P<9ILXYrEeLeowUև yу4't4{m:e69p~?rߒ9l~Õ^|pEL!U2@ L~PVΒ\7{Ex N8feFvgg>=挱Օwu.l%S;{2*dn9'V-DM^0$Jpl8OBpSPM4s[[(B>I>qO=ܰȍuɣgi/ ijIZX)U{]YG:7e\洞,'d18Ⱦ$L~j7uQyʺK]*y $˒^5@!%p Xľ楞T~Egsm K-tw)+c.ZbX&o󣌸oN2GI~O{(*L%"=\ Տ= 5ou.]rwx8i祿oq;h#Rl)?w65&-25m7װ\g0F. 7O͆A/yA{'i n~@VjBuzPGQrEK]sW 2PrKHZCRpa]˃H@h "'JJ02(=37sFOmnw E􈩟@oZ[++apom3|p{/N5۽QTȚe&.qT@yq+'VEh(™Ii+tY&هA:7 ָ߶qgOyR7F8l~Do gP)[Wxf[ R!8OFdww>sg~_9l hu!,=@cٻSC]նehf5LI/ݜ2S&r Gם2`//ݼ7@3{w^=Re}j*lme6b?7;o/fU,8%p* s% tf QOgD})t~ ǂT[>b_2( u |s U|h:1odr5{N[o7q{ңld-+&8_GZTӇ0b?I0SHq-FSńO >w P1%fl`Z9ښ&9$FdZ?rZ.jEw >I^ZQV1p_,7l?0$q=~ՐFlĽ91|JAtO=OQ_/cu]~ꞢDJ?g9mRGl/k^jI[=*#zZBӆ:fyk+O4#*S#ã_qs"(;6TEcM 87QJ[yS~R^޽$&Q!Rخ%?>t;g%%*V^yg ڧs=mMPSyAAuOI K'f|A)/B~O *>hȽxN\=c<10Ҋ&t]IU[fђn?5X!vc>D[ vڑw/>XK=,0 cֽƥ͠搈s^s3dVA;*O 5MߨK!)u7" 5C< #oN|O9lboӕ"!i2V,]eyAsZk$4XS*|xc'l4Wzwº1jD]T[8pLc.qm7`-9D5>B'?|F/z iǣiB:僢e:-gwXJjI.?-9I򺐓9C6DV斆Q?*_ XRL27d=#};IF1A3 $wBr{ *Θ ug%IPsK&!܊yky y1pYN#|' um3+YAû%˦ jb;J* xMe h-}bUx{1a$=;!tHdG̯Íh_m@}"6EBS%U+;?AN%M:9C /:6֌ѽL2A̓HiC7&OF%BkўjED~x~I ]HrIfDv^<cQn5_4oÃ" KJ-9m_پ0Eośy߿/?Q'|̒8p+}}2ژcy]A]Q*IxbI_嗇bs槺ywmj\'$ygWBpYc!Gi`Iܗwj mB.\9/>k,Vq@~$Fraw:p[30bmZ!͠~ ]C+cjuMeWonsJ1~v ֩gYYzRu6vMkּ/9ؾ llAQnuGb) R]8;(c2BY/Kp7k*jDG͐ӈex[jR5#.K{=w0㦢X(_0D 맪쏭MDŽ7x "4}ݫ $.ƁJv2yh~GV =)g[䛌xbnY~=<'͐4uxcY7?=Z)ӹ2լ`GӽVmfa$7&ú7PЍ+P\_Ko=BxI0b(C9Ї3[Ru_m39a]V%iʲμW=U|ȂVvkrpU;l{Of>"T?d4P.>[t'kx퍍9sh-LW>oOBآK,/E,rԟm:v|*ȋlk(~~kUũQ _dOOףbeI& *~Z;9CX[_`qZcw7D+ps:btRu90Q"*V%Q\|irf7mo59*6P+̌U<.oZ҇=LX0$.AӘ@|W BVRY榲A¡(lMkڏ\rеn? ߐBwX{$ޱR mF̜/9.BZtϋyH@>dQ)OY:1($A"_ڟsDCWftT^| Ο5`[hVsr(sZPmS[`]c+&Chl6OX[5kSϖgE6=wTf}"ttpn]Y6 ]l[yM<T EG6@j)Y#LS;$쓌NhW,/e?Ιٮf؂̚V1s5 IW-xg{jP&{Ƌ I~1{@{$3JyX;` [EuFg6XiŴo؅b6Y4,*ޅ x0o4!m۾C іT=.&3Ћ 0 3x'nCTTg-/4J4_ӆ OޕUКFNz?8-|eH^ɅX_m>]d)*'72¾ep1(Uv?[iBGYV=ꔸ x֦hwתl4 ϴdÜ~ , n`=j""0%Ű$c7zp&,7(^.:A0W۰w&-P}US`#{ko"^cJfxA>t{[Q V/\҅ZU']1g[N~,i;;xht]S .[s'ۃFqHXS7x#f:d-68KVk0{ oTWٰ6||W^t?'qM`,q#j:2;u:*Ĩ=a^\xn5{~w2.eI_HmK!lN.RӺyaδZ)}x9dx/B~9ʝX-ra$cdmM,,l{ xݺDnDJS/2ecZJ7$[tv3 nUrpY^rm ӎ)/?4M T e?ayb~HPBVms[5GY5Uw^\b?rE!LǛ*;ۻPE22>%qsj'YJNXCGе0h@T Bv) o[l!o}ZXi6&Y,ePĘ.r"s/,?Ž5C6UO6#%y;y6?MqgbvêN,FDh&NA : j$?Wzu,|")Ǘo}VvX:oè վ}!*e~Sn-HIwN;wOēyF;D.&?I"FT~twtA*fM[()Yh27ss]%p%% 9)>kE#mwU@- լdѬlgպ9K[!ӏ&墌>~Q܅4V)hs/%.9GN|>)vTT gߪbJ`CB'zBzx}#&U5^GLm7|]|xR|&ЌT6ÚZ묶bO=%z{O_!7cB8Im#yҾ,.p`%Nj'LR|t47{[Jm(ȩ└H1:Xgc_츆433ݭ[Ζ7tOŢ,j&΢e_}^s|aimbGz'R fOl{EnL:0'qhV=H$cMmXFγ.U9J @K=kT~kөy2[jvXpwL98F@y/VC_PTB.\[gRdC0{<a{ 2<@{??mަֲ3e`y74s-]VZ$W~vz i7ڹq>zF[7wp0YzN`ˉ,nEBya5Bv 3YkmA]ŷ &P~vx2~/A^“Y*y*]o3ANh;$-Z>vĬV? \6h,}̤$Um#hO. lΩAⱍFc5iƶmFv۶mfcXgoΙ3C9:[> ou׋t ),AJj(1f`wtrCpTPOiBL[cLnEbFOn{ ܂%붚 yawDdnW2(;,ŔmA8cJ$S=R"YX! ]!93UjYȠIY]x׉&Ƛ.O#/gQB ;&3no iPwK-AuA ^ $94q *%!^g:McExZ,l13Q$X7BR\!AD'Nnfd??&9`8/6U_]un_8Flb*sSa'i)]Dꫲ c!D1[]=IY<0GgWv -,OI{ן5ӸZÙ|*UG۴辀- XT&.P 7sErxw QQ2; Ա J. 
]t+[P:Th 2<^AW-CaBJلdDBL_Yίn1UߗFf$+So'f`I=`WBEڟ3r ۇoy64Ӱ4*&(<}md"hpr1s5v_TByOMoվ=kb in-`Zc4[ M/uL2ܷW膩B{y!N" ǯd6Uqwm1)灬75Rc5ӻU[ދ{{SF'=NBfz3?Pqvi"v#liT'> >Lz,R{5gRw3:ų< 7d-3Y0nzGx00䛜aȥ5@4gpEaKs j/<jvgd#qbRN_?`\m^5,T' ?in:&1!v l(5gZKNX'bXp3i-~9P>iB()[V|HDyv̕ {]skQΌd(cyka1rR8͠ߌbn-Gc q]$v24gͲ+m AЀz87mI7B GƗ6Ik~Bla!uؐTy J[?J.-=o++j3,ɹ>KaϼF"P/ ET#ޟFoILA)OP\_YYtI{bԭﰡo8=n'26>"`/GS7?t'j4{o/ `;Ygrkk'CGM$ȋ1_ECs9QVg)݉#R(.4cDž,Et@{?Kno7_rL5Q}~eMH GApMm3 ro ># ebrEլBeD{!ROA吋B~ze,FSWyn1MX- ~+]<:S9#a*.0nSn8"T=@j}(Ici#MWo\pD/2gzfc?qYP'')A:UN@)HZ é"ޣJ{*8O"6Sy(19cD]A4'CL0xXW\g ;7sn;$ Wk{ScMP*6 & {.QZ4޷ I7S$$t*'Yn aWو4L9ċLYSZP wNOT CAZY-45̉{N/Ca?Ɲ*FJInM=0YZ &&4 {JI$B7/S"nHo2+y~Sj{Jm8WH &59mQ͘4no(4遳o Cg.:ߔ`o20' 7 sm:@XCu}ҿD7/9m/Rm1 4tp$V˸?_ջYYIDx}nMO. ٌ{&b* Nx֚axirR?#)a[Ս[/"r_,PPH_0b1L~@_*2K~cD wm(͉yd@ =5lt6#rH~43G(.]#!_3FEi_()뽿lEeld,e. ki5Ɩh@dU 8@IUP&?˷݊%x-I{MYW[?&9 (sa) T@F e;v- _s%pP ݷ|F+5T(uXr.TokwC?Ϫ;wfy!3Bh@iœ?(%< 4wOR.bS%)Q-\ّHUb8 I9._+BǸ_ WVqO2!(1 {\z}qշiMku /ϫ5OBTZzͲU)=蕚pvu )1>9fVR2h- dW >>j O# #`z^/^ QBlnm\ g (2WqXw̟EUuQjkԝћ'!x<B`_Yʄxp<?0fDIj>{իˁKk國 ni0=(jSl,N?ޞa"qT'~E."3^>3}^| ~`X,GaΉfاK;Qc ,Twd6N(v]"uem 2W'Ϙ:sI)$'4_-DZ+c梙m=Geɝ^׊ ♆$%GlCp,vB |}]vs%S%EP=@",gX|Ⓤ>ԟd+o ; ."WiSn6s8UmrAbpt*4lxMOgW[Bp_&>\כs'qi,LT#xt>sʹ"u$ϥVDv08ҕ f8Vn5a?c'CZFWIO!/X6F3nCߏopd3ig-~ȇWB|W%BzSC\ \/c-lH=NjsDF1"XRv٪DhIJZu h,oWgr-N2+am=;P٤ԇ2)t)J|)Y+3BJx1(GL>md콴T@ًw_'SK1&`a~!J|rƪs쒘tG}ן̪a봦pM.?'-pvYi_XS 5 DszszTt y=: |m4!XQiL^L8΋~@8s9.JT3Q=M%\؎3/i.yb0Ph/-D o7~~ղ܌&NBSױwCZ6 Oh`YKcT#-i?XXPߝƒshy#и>xOS}q-pK^ʗnpo%DCEdXrX=h+Ҕ3p&•tI,S?;O p~;5>?uub/4vlup;FH;,AJ~:SFU!_INmp0B+IMA^{OZl \GB#{G{Ů(anʣkP/|4ӌ _0&,1HK{l;hL}Mc<u#šfI1]`sc+c:>/ʫz]yv >o;:1Ѵ PqA#+5 %SS#'$1jey6 Brx*q~eH $!۩SbZj˨W?X*r[F)Cp45W [ ꣨g8/JdC_)/,\Qe&x fO|BWĄIVui CrH^ے"`;;!r}ߊ&Fum(7g#H=ɫս3d|6C W}ub HАg :Q,֕k\\TPM YNYhkh0E 9RML^sOoAwM)K"*\W{CPnp2"@;kF\Sαe̤nj XU;6HRH2Uzj Mc.%9P'^W@D8_&u͏aDҨ-7 qE~ʴ<YFGGaÒPg{d^T̈Vh<(`,iõk별p~rVJ!'ycΒ }xV=~l} 色bTv)Wau> &և|np4uε)'Ug\_#'lR`0Iڜo/hFPamQn2ڵpYUC,Ga GB~/D\u04AuGF BjZp*gԈV`#U v~mn~cpO:EKl59ʧ5ŰZ~QN_]Ze:1u^).xJ?Y4nTtgd!~,Vi7@f }@e (^gN4  " POW5dlT&NuP~GG83(d$CQp:?H`Ɉ_ nGާPU`ホ5C<ƤK>p绗J510ʩ\USTS?ɂYv3BvK ?q7owG'xIX~5{Yy،=v&女PQ] ٺ 頁jނMB#HȒc[`yKY|=BΕ5}#ZX"yߖw8/"F(hDdguSm"VYπ!aōfM/[4&g&&Co]Ta;Z*ggaE{I 8I(݇j+  \E߽ ו>X3&/1.r־Rv$vWsA"c꧶o`_kF[J kqBF-jX.Mml8CVWObΌ)M003v)M"5|j# G$!$OHy#h7CPaWl .cp r*{)t=B+f%B[6CH^Ѽ;=OT}|I>A#_$<]oW ۑIX_CIj3ҿD/MVK,L@BƲ5;gD{uQ>3>,/.|*xӁםWh7ġapEu^fCHﲻddwINx1Oٟ-= E3pnS|abT/p@Wn9-A _ *&16|ITF򆊀h\I{E"nWަ'w\hc7* eVKK1=M!c{Qs.5x=4=3@s]g\T(Jzg1Aw>T1$gsY|Hz|jltP7Еa*0/K! 
DESeq2/man/0000755000175400017540000000000013201671732013400 5ustar00biocbuildbiocbuildDESeq2/man/DESeq.Rd0000644000175400017540000002071313201671732014633 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/core.R \name{DESeq} \alias{DESeq} \title{Differential expression analysis based on the Negative Binomial (a.k.a. Gamma-Poisson) distribution} \usage{ DESeq(object, test = c("Wald", "LRT"), fitType = c("parametric", "local", "mean"), betaPrior, full = design(object), reduced, quiet = FALSE, minReplicatesForReplace = 7, modelMatrixType, parallel = FALSE, BPPARAM = bpparam()) } \arguments{ \item{object}{a DESeqDataSet object, see the constructor functions \code{\link{DESeqDataSet}}, \code{\link{DESeqDataSetFromMatrix}}, \code{\link{DESeqDataSetFromHTSeqCount}}.} \item{test}{either "Wald" or "LRT", which will then use either Wald significance tests (defined by \code{\link{nbinomWaldTest}}), or the likelihood ratio test on the difference in deviance between a full and reduced model formula (defined by \code{\link{nbinomLRT}})} \item{fitType}{either "parametric", "local", or "mean" for the type of fitting of dispersions to the mean intensity. See \code{\link{estimateDispersions}} for description.} \item{betaPrior}{whether or not to put a zero-mean normal prior on the non-intercept coefficients. See \code{\link{nbinomWaldTest}} for a description of the calculation of the beta prior. In versions \code{>=1.16}, the default is set to \code{FALSE}, and shrunken LFCs are obtained afterwards using \code{\link{lfcShrink}}.} \item{full}{for \code{test="LRT"}, the full model formula, which is restricted to the formula in \code{design(object)}. Alternatively, it can be a model matrix constructed by the user. Advanced use: specifying a model matrix for full and \code{test="Wald"} is possible if \code{betaPrior=FALSE}} \item{reduced}{for \code{test="LRT"}, a reduced formula to compare against, i.e., the full formula with the term(s) of interest removed.
Alternatively, it can be a model matrix constructed by the user} \item{quiet}{whether to print messages at each step} \item{minReplicatesForReplace}{the minimum number of replicates required in order to use \code{\link{replaceOutliers}} on a sample. If there are samples with at least this many replicates, the model will be refit after replacing these outliers, flagged by Cook's distance. Set to \code{Inf} in order to never replace outliers.} \item{modelMatrixType}{either "standard" or "expanded", which describes how the model matrix, X, of the GLM formula is formed. "standard" is as created by \code{model.matrix} using the design formula. "expanded" includes an indicator variable for each level of factors in addition to an intercept. For more information see the Description of \code{\link{nbinomWaldTest}}. betaPrior must be set to TRUE in order for expanded model matrices to be fit.} \item{parallel}{if FALSE, no parallelization. If TRUE, parallel execution using \code{BiocParallel}, see next argument \code{BPPARAM}. A note on running in parallel using \code{BiocParallel}: it may be advantageous to remove large, unneeded objects from your current R environment before calling \code{DESeq}, as it is possible that R's internal garbage collection will copy these objects while running on worker nodes.} \item{BPPARAM}{an optional parameter object passed internally to \code{\link{bplapply}} when \code{parallel=TRUE}. If not specified, the parameters last registered with \code{\link{register}} will be used.} } \value{ a \code{\link{DESeqDataSet}} object with results stored as metadata columns. These results should be accessed by calling the \code{\link{results}} function. By default, this will return the log2 fold changes and p-values for the last variable in the design formula. See \code{\link{results}} for how to access results for other variables. } \description{ This function performs a default analysis through the steps: \enumerate{ \item estimation of size factors: \code{\link{estimateSizeFactors}} \item estimation of dispersion: \code{\link{estimateDispersions}} \item Negative Binomial GLM fitting and Wald statistics: \code{\link{nbinomWaldTest}} } For complete details on each step, see the manual pages of the respective functions. After the \code{DESeq} function returns a DESeqDataSet object, results tables (log2 fold changes and p-values) can be generated using the \code{\link{results}} function. See the manual page for \code{\link{results}} for information on independent filtering and p-value adjustment for multiple test correction. } \details{ The differential expression analysis uses a generalized linear model of the form: \deqn{ K_{ij} \sim \textrm{NB}( \mu_{ij}, \alpha_i) }{ K_ij ~ NB(mu_ij, alpha_i) } \deqn{ \mu_{ij} = s_j q_{ij} }{ mu_ij = s_j q_ij } \deqn{ \log_2(q_{ij}) = x_{j.} \beta_i }{ log2(q_ij) = x_j. beta_i } where counts \eqn{K_{ij}}{K_ij} for gene i, sample j are modeled using a Negative Binomial distribution with fitted mean \eqn{\mu_{ij}}{mu_ij} and a gene-specific dispersion parameter \eqn{\alpha_i}{alpha_i}. The fitted mean is composed of a sample-specific size factor \eqn{s_j}{s_j} and a parameter \eqn{q_{ij}}{q_ij} proportional to the expected true concentration of fragments for sample j. The coefficients \eqn{\beta_i}{beta_i} give the log2 fold changes for gene i for each column of the model matrix \eqn{X}{X}. The sample-specific size factors can be replaced by gene-specific normalization factors for each sample using \code{\link{normalizationFactors}}.
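As a minimal sketch of this model (an illustration only, not part of the DESeq2 API; the object names \code{x}, \code{beta}, \code{alpha}, and \code{s} below are assumptions made for this example), counts for a single gene across six samples could be simulated as:
\preformatted{
# design matrix X for two groups of three samples each
x <- model.matrix(~ factor(rep(1:2, each=3)))
beta <- c(5, 1)          # beta_i: intercept and log2 fold change
alpha <- 0.1             # alpha_i: gene-specific dispersion
s <- runif(6, 0.8, 1.2)  # s_j: sample-specific size factors
q <- 2^(x \%*\% beta)    # log2(q_ij) = x_j. beta_i
mu <- s * q              # mu_ij = s_j q_ij
k <- rnbinom(6, mu=mu, size=1/alpha)  # K_ij ~ NB(mu_ij, alpha_i)
}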
For details on the fitting of the log2 fold changes and calculation of p-values, see \code{\link{nbinomWaldTest}} if using \code{test="Wald"}, or \code{\link{nbinomLRT}} if using \code{test="LRT"}. Experiments without replicates do not allow for estimation of the dispersion of counts around the expected value for each group, which is critical for differential expression analysis. If an experimental design is supplied which does not contain the necessary degrees of freedom for differential analysis, \code{DESeq} will provide a warning to the user and follow the strategy outlined in Anders and Huber (2010) under the section 'Working without replicates', wherein all the samples are considered as replicates of a single group for the estimation of dispersion. As noted in the reference above: "Some overestimation of the variance may be expected, which will make that approach conservative." Furthermore, "while one may not want to draw strong conclusions from such an analysis, it may still be useful for exploration and hypothesis generation." We provide this approach for data exploration only, but for accurately identifying differential expression, biological replicates are required. The argument \code{minReplicatesForReplace} is used to decide which samples are eligible for automatic replacement in the case of extreme Cook's distance. By default, \code{DESeq} will replace outliers if the Cook's distance is large for a sample which has 7 or more replicates (including itself). This replacement is performed by the \code{\link{replaceOutliers}} function. This default behavior helps to prevent filtering genes based on Cook's distance when there are many degrees of freedom. See \code{\link{results}} for more information about filtering using Cook's distance, and the 'Dealing with outliers' section of the vignette. Unlike the behavior of \code{\link{replaceOutliers}}, here original counts are kept in the matrix returned by \code{\link{counts}}, original Cook's distances are kept in \code{assays(dds)[["cooks"]]}, and the replacement counts used for fitting are kept in \code{assays(dds)[["replaceCounts"]]}. Note that if a log2 fold change prior is used (betaPrior=TRUE) then expanded model matrices will be used in fitting. These are described in \code{\link{nbinomWaldTest}} and in the vignette. The \code{contrast} argument of \code{\link{results}} should be used for generating results tables. } \examples{ # see vignette for suggestions on generating # count tables from RNA-Seq data cnts <- matrix(rnbinom(n=1000, mu=100, size=1/0.5), ncol=10) cond <- factor(rep(1:2, each=5)) # object construction dds <- DESeqDataSetFromMatrix(cnts, DataFrame(cond), ~ cond) # standard analysis dds <- DESeq(dds) res <- results(dds) # moderated log2 fold changes resultsNames(dds) resLFC <- lfcShrink(dds, coef=2, res=res) # an alternate analysis: likelihood ratio test ddsLRT <- DESeq(dds, test="LRT", reduced= ~ 1) resLRT <- results(ddsLRT) } \references{ Love, M.I., Huber, W., Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15:550. 
\url{https://doi.org/10.1186/s13059-014-0550-8} } \seealso{ \code{\link{nbinomWaldTest}}, \code{\link{nbinomLRT}} } \author{ Michael Love } DESeq2/man/DESeq2-package.Rd0000644000175400017540000000243013201671732016302 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/core.R \docType{package} \name{DESeq2-package} \alias{DESeq2-package} \title{DESeq2 package for differential analysis of count data} \description{ The main functions for differential analysis are \code{\link{DESeq}} and \code{\link{results}}. See the examples at \code{\link{DESeq}} for basic analysis steps. Two transformations offered for count data are the "regularized logarithm", \code{\link{rlog}}, and \code{\link{varianceStabilizingTransformation}}. For more detailed information on usage, see the package vignette, by typing \code{vignette("DESeq2")}, or the workflow linked to on the first page of the vignette. All support questions should be posted to the Bioconductor support site: \url{http://support.bioconductor.org}. } \references{ DESeq2 reference: Love, M.I., Huber, W., Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15:550. \url{http://dx.doi.org/10.1186/s13059-014-0550-8} DESeq reference: Simon Anders, Wolfgang Huber (2010) Differential expression analysis for sequence count data. Genome Biology, 11:106. \url{http://dx.doi.org/10.1186/gb-2010-11-10-r106} } \author{ Michael Love, Wolfgang Huber, Simon Anders } \keyword{package} DESeq2/man/DESeqDataSet.Rd0000644000175400017540000001067613201671732016110 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/AllClasses.R \docType{class} \name{DESeqDataSet-class} \alias{DESeqDataSet-class} \alias{DESeqDataSet} \alias{DESeqDataSet-class} \alias{DESeqDataSetFromMatrix} \alias{DESeqDataSetFromHTSeqCount} \alias{DESeqDataSetFromMatrix} \alias{DESeqDataSetFromHTSeqCount} \alias{DESeqDataSetFromTximport} \title{DESeqDataSet object and constructors} \usage{ DESeqDataSet(se, design, ignoreRank = FALSE) DESeqDataSetFromMatrix(countData, colData, design, tidy = FALSE, ignoreRank = FALSE, ...) DESeqDataSetFromHTSeqCount(sampleTable, directory = ".", design, ignoreRank = FALSE, ...) DESeqDataSetFromTximport(txi, colData, design, ...) } \arguments{ \item{se}{a \code{RangedSummarizedExperiment} with columns of variables indicating sample information in \code{colData}, and the counts as the first element in the assays list, which will be renamed "counts". A \code{RangedSummarizedExperiment} object can be generated by the function \code{summarizeOverlaps} in the GenomicAlignments package.} \item{design}{a \code{formula} which expresses how the counts for each gene depend on the variables in \code{colData}. Many R \code{formula} objects are valid, including designs with multiple variables, e.g., \code{~ group + condition}, and designs with interactions, e.g., \code{~ genotype + treatment + genotype:treatment}. See \code{\link{results}} for a variety of designs and how to extract results tables. By default, the functions in this package will use the last variable in the formula for building results tables and plotting. \code{~ 1} can be used for no design, although users need to remember to switch to another design for differential testing.} \item{ignoreRank}{use of this argument is reserved for DEXSeq developers only.
Users will immediately encounter an error upon trying to estimate dispersion using a design with a model matrix which is not full rank.} \item{countData}{for matrix input: a matrix of non-negative integers} \item{colData}{for matrix input: a \code{DataFrame} or \code{data.frame} with at least a single column. Rows of colData correspond to columns of countData} \item{tidy}{for matrix input: whether the first column of countData is the rownames for the count matrix} \item{...}{arguments provided to \code{SummarizedExperiment} including rowRanges and metadata. Note that for Bioconductor 3.1, rowRanges must be a GRanges or GRangesList, with potential metadata columns as a DataFrame accessed and stored with \code{mcols}. If a user wants to store metadata columns about the rows of the countData, but does not have GRanges or GRangesList information, first construct the DESeqDataSet without rowRanges and then add the DataFrame with \code{mcols(dds)}.} \item{sampleTable}{for htseq-count: a \code{data.frame} with three or more columns. Each row describes one sample. The first column is the sample name, the second column the file name of the count file generated by htseq-count, and the remaining columns are sample metadata which will be stored in \code{colData}} \item{directory}{for htseq-count: the directory relative to which the filenames are specified. Defaults to the current directory} \item{txi}{for tximport: the simple list output of the \code{tximport} function} } \value{ A DESeqDataSet object. } \description{ \code{DESeqDataSet} is a subclass of \code{RangedSummarizedExperiment}, used to store the input values, intermediate calculations and results of an analysis of differential expression. The \code{DESeqDataSet} class enforces non-negative integer values in the "counts" matrix stored as the first element in the assay list. In addition, a formula which specifies the design of the experiment must be provided. The constructor functions create a DESeqDataSet object from various types of input: a RangedSummarizedExperiment, a matrix, count files generated by the Python package HTSeq, or a list from the tximport function in the tximport package. See the vignette for examples of construction from different types. } \details{ Note on the error message "assay colnames() must be NULL or equal colData rownames()": this means that the colnames of countData are different from the rownames of colData. Fix this with: \code{colnames(countData) <- NULL} } \examples{ countData <- matrix(1:100,ncol=4) condition <- factor(c("A","A","B","B")) dds <- DESeqDataSetFromMatrix(countData, DataFrame(condition), ~ condition) } \references{ See \url{http://www-huber.embl.de/users/anders/HTSeq} for htseq-count } DESeq2/man/DESeqResults.Rd0000644000175400017540000000152713201671732016217 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/AllClasses.R \docType{class} \name{DESeqResults-class} \alias{DESeqResults-class} \alias{DESeqResults} \alias{DESeqResults-class} \title{DESeqResults object and constructor} \usage{ DESeqResults(DataFrame, priorInfo = list()) } \arguments{ \item{DataFrame}{a DataFrame of results, standard column names are: baseMean, log2FoldChange, lfcSE, stat, pvalue, padj.} \item{priorInfo}{a list giving information on the log fold change prior} } \value{ a DESeqResults object } \description{ This constructor function would not typically be used by "end users".
This simple class extends the DataFrame class of the IRanges package to allow other packages to write methods for results objects from the DESeq2 package. It is used by \code{\link{results}} to wrap up the results table. } DESeq2/man/DESeqTransform.Rd0000644000175400017540000000141113201671732016521 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/AllClasses.R \docType{class} \name{DESeqTransform-class} \alias{DESeqTransform-class} \alias{DESeqTransform} \alias{DESeqTransform-class} \title{DESeqTransform object and constructor} \usage{ DESeqTransform(SummarizedExperiment) } \arguments{ \item{SummarizedExperiment}{a RangedSummarizedExperiment} } \value{ a DESeqTransform object } \description{ This constructor function would not typically be used by "end users". This simple class extends the RangedSummarizedExperiment class of the SummarizedExperiment package. It is used by \code{\link{rlog}} and \code{\link{varianceStabilizingTransformation}} to wrap up the results into a class for downstream methods, such as \code{\link{plotPCA}}. } DESeq2/man/coef.Rd0000644000175400017540000000237013201671732014605 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/methods.R \docType{methods} \name{coef} \alias{coef} \alias{coef.DESeqDataSet} \title{Extract a matrix of model coefficients/standard errors} \usage{ \method{coef}{DESeqDataSet}(object, SE = FALSE, ...) } \arguments{ \item{object}{a DESeqDataSet returned by \code{\link{DESeq}}, \code{\link{nbinomWaldTest}}, or \code{\link{nbinomLRT}}.} \item{SE}{whether to give the standard errors instead of coefficients. Defaults to FALSE so that the coefficients are given.} \item{...}{additional arguments} } \description{ \strong{Note:} results tables with log2 fold change, p-values, adjusted p-values, etc. for each gene are best generated using the \code{\link{results}} function. The \code{coef} function is designed for advanced users who wish to inspect all model coefficients at once. } \details{ Estimated model coefficients or estimated standard errors are provided in matrix form (number of genes by number of parameters), on the log2 scale. The columns correspond to columns of the model matrix for final GLM fitting, i.e., \code{attr(dds, "modelMatrix")}. } \examples{ dds <- makeExampleDESeqDataSet(m=4) dds <- DESeq(dds) coef(dds)[1,] coef(dds, SE=TRUE)[1,] } \author{ Michael Love } DESeq2/man/collapseReplicates.Rd0000644000175400017540000000444113201671732017510 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/helper.R \name{collapseReplicates} \alias{collapseReplicates} \title{Collapse technical replicates in a RangedSummarizedExperiment or DESeqDataSet} \usage{ collapseReplicates(object, groupby, run, renameCols = TRUE) } \arguments{ \item{object}{A \code{RangedSummarizedExperiment} or \code{DESeqDataSet}} \item{groupby}{a grouping factor, as long as the columns of object} \item{run}{optional, the names of each unique column in object. If provided, a new column \code{runsCollapsed} will be added to the \code{colData} which pastes together the names of \code{run}} \item{renameCols}{whether to rename the columns of the returned object using the levels of the grouping factor} } \value{ the \code{object} with as many columns as levels in \code{groupby}.
This object has assay/count data which is summed from the various columns which are grouped together, and the \code{colData} is subset using the first column for each group in \code{groupby}. } \description{ Collapses the columns in \code{object} by summing within levels of a grouping factor \code{groupby}. The purpose of this function is to sum up read counts from technical replicates to create an object with a single column of read counts for each sample. Note: by "technical replicates", we mean multiple sequencing runs of the same library, in contrast to "biological replicates" in which multiple libraries are prepared from separate biological units. Optionally renames the columns of the returned object with the levels of the grouping factor. Note: this function is written very simply and can be easily altered to produce other behavior by examining the source code. } \examples{ dds <- makeExampleDESeqDataSet(m=12) # make data with two technical replicates for three samples dds$sample <- factor(sample(paste0("sample",rep(1:9, c(2,1,1,2,1,1,2,1,1))))) dds$run <- paste0("run",1:12) ddsColl <- collapseReplicates(dds, dds$sample, dds$run) # examine the colData and column names of the collapsed data colData(ddsColl) colnames(ddsColl) # check that the sum of the counts for "sample1" is the same # as the counts in the "sample1" column in ddsColl matchFirstLevel <- dds$sample == levels(dds$sample)[1] stopifnot(all(rowSums(counts(dds[,matchFirstLevel])) == counts(ddsColl[,1]))) } DESeq2/man/counts.Rd0000644000175400017540000000264013201671732015204 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/methods.R \docType{methods} \name{counts} \alias{counts} \alias{counts,DESeqDataSet-method} \alias{counts<-,DESeqDataSet,matrix-method} \alias{counts} \title{Accessors for the 'counts' slot of a DESeqDataSet object.} \usage{ \S4method{counts}{DESeqDataSet}(object, normalized = FALSE, replaced = FALSE) \S4method{counts}{DESeqDataSet,matrix}(object) <- value } \arguments{ \item{object}{a \code{DESeqDataSet} object.} \item{normalized}{logical indicating whether or not to divide the counts by the size factors or normalization factors before returning (normalization factors always preempt size factors)} \item{replaced}{after a \code{DESeq} call, this argument will return the counts with outliers replaced instead of the original counts, and optionally \code{normalized}. The replaced counts are stored by \code{DESeq} in \code{assays(object)[['replaceCounts']]}.} \item{value}{an integer matrix} } \description{ The counts slot holds the count data as a matrix of non-negative integer count values, one row for each observational unit (gene or the like), and one column for each sample.
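As a rough sketch of the default behavior (assuming size factors, rather than gene-specific normalization factors, are in use), \code{counts(object, normalized=TRUE)} divides each column of the count matrix by its size factor:
\preformatted{
# a sketch of normalized=TRUE, valid only when size factors
# (not normalization factors) have been estimated
t( t(counts(dds)) / sizeFactors(dds) )
}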
} \examples{ dds <- makeExampleDESeqDataSet(m=4) head(counts(dds)) dds <- estimateSizeFactors(dds) # run this or DESeq() first head(counts(dds, normalized=TRUE)) } \seealso{ \code{\link{sizeFactors}}, \code{\link{normalizationFactors}} } \author{ Simon Anders } DESeq2/man/design.Rd0000644000175400017540000000145313201671732015143 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/methods.R \docType{methods} \name{design} \alias{design} \alias{design,DESeqDataSet-method} \alias{design<-,DESeqDataSet,formula-method} \alias{design} \title{Accessors for the 'design' slot of a DESeqDataSet object.} \usage{ \S4method{design}{DESeqDataSet}(object) \S4method{design}{DESeqDataSet,formula}(object) <- value } \arguments{ \item{object}{a \code{DESeqDataSet} object} \item{value}{a \code{formula} used for estimating dispersion and fitting Negative Binomial GLMs} } \description{ The design holds the R \code{formula} which expresses how the counts depend on the variables in \code{colData}. See \code{\link{DESeqDataSet}} for details. } \examples{ dds <- makeExampleDESeqDataSet(m=4) design(dds) <- formula(~ 1) } DESeq2/man/dispersionFunction.Rd0000644000175400017540000000310213201671732017550 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/AllGenerics.R, R/methods.R \docType{methods} \name{dispersionFunction} \alias{dispersionFunction} \alias{dispersionFunction<-} \alias{dispersionFunction} \alias{dispersionFunction,DESeqDataSet-method} \alias{dispersionFunction<-,DESeqDataSet,function-method} \alias{dispersionFunction} \title{Accessors for the 'dispersionFunction' slot of a DESeqDataSet object.} \usage{ dispersionFunction(object, ...) dispersionFunction(object, ...) <- value \S4method{dispersionFunction}{DESeqDataSet}(object) \S4method{dispersionFunction}{DESeqDataSet,`function`}(object, estimateVar = TRUE) <- value } \arguments{ \item{object}{a \code{DESeqDataSet} object.} \item{...}{additional arguments} \item{value}{a \code{function}} \item{estimateVar}{whether to estimate the variance of dispersion residuals. setting to FALSE is needed, e.g. within \code{estimateDispersionsMAP} when called on a subset of the full dataset in parallel execution.} } \description{ The dispersion function is calculated by \code{\link{estimateDispersions}} and used by \code{\link{varianceStabilizingTransformation}}. Parametric dispersion fits store the coefficients of the fit as attributes in this slot. } \details{ Setting this will also overwrite \code{mcols(object)$dispFit} and estimate the variance of dispersion residuals, see \code{estimateVar} below. } \examples{ dds <- makeExampleDESeqDataSet(m=4) dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) dispersionFunction(dds) } \seealso{ \code{\link{estimateDispersions}} } DESeq2/man/dispersions.Rd0000644000175400017540000000165513201671732016240 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/AllGenerics.R, R/methods.R \docType{methods} \name{dispersions} \alias{dispersions} \alias{dispersions<-} \alias{dispersions} \alias{dispersions,DESeqDataSet-method} \alias{dispersions<-,DESeqDataSet,numeric-method} \alias{dispersions} \title{Accessor functions for the dispersion estimates in a DESeqDataSet object.} \usage{ dispersions(object, ...) dispersions(object, ...)
<- value \S4method{dispersions}{DESeqDataSet}(object) \S4method{dispersions}{DESeqDataSet,numeric}(object) <- value } \arguments{ \item{object}{a \code{DESeqDataSet} object.} \item{...}{additional arguments} \item{value}{the dispersions to use for the Negative Binomial modeling} } \description{ The dispersions for each row of the DESeqDataSet. Generally, these are set by \code{\link{estimateDispersions}}. } \seealso{ \code{\link{estimateDispersions}} } \author{ Simon Anders } DESeq2/man/estimateBetaPriorVar.Rd0000644000175400017540000000351213201671732017764 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/core.R \name{estimateBetaPriorVar} \alias{estimateBetaPriorVar} \alias{estimateMLEForBetaPriorVar} \alias{estimateMLEForBetaPriorVar} \title{Steps for estimating the beta prior variance} \usage{ estimateBetaPriorVar(object, betaPriorMethod = c("weighted", "quantile"), upperQuantile = 0.05) estimateMLEForBetaPriorVar(object, maxit = 100, useOptim = TRUE, useQR = TRUE, modelMatrixType = NULL) } \arguments{ \item{object}{a DESeqDataSet} \item{betaPriorMethod}{the method for calculating the beta prior variance, either "quantile" or "weighted": "quantile" matches a normal distribution using the upper quantile of the finite MLE betas. "weighted" matches a normal distribution using the upper quantile, but weighting by the variance of the MLE betas.} \item{upperQuantile}{the upper quantile to be used for the "quantile" or "weighted" method of beta prior variance estimation} \item{maxit}{as defined in \code{\link{nbinomWaldTest}}} \item{useOptim}{as defined in \code{\link{nbinomWaldTest}}} \item{useQR}{as defined in \code{\link{nbinomWaldTest}}} \item{modelMatrixType}{an optional override for the type which is set internally} } \value{ for \code{estimateMLEForBetaPriorVar}, a DESeqDataSet, with the necessary information stored in order to calculate the prior variance. for \code{estimateBetaPriorVar}, the vector of variances for the prior on the betas in the \code{\link{DESeq}} GLM } \description{ These lower-level functions are called within \code{\link{DESeq}} or \code{\link{nbinomWaldTest}}. End users should use those higher-level functions instead. NOTE: \code{estimateBetaPriorVar} returns a numeric vector, not a DESeqDataSet! For advanced users: to use these functions, first run \code{estimateMLEForBetaPriorVar} and then run \code{estimateBetaPriorVar}. } DESeq2/man/estimateDispersions.Rd0000644000175400017540000001154513201671732017733 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/methods.R \docType{methods} \name{estimateDispersions} \alias{estimateDispersions} \alias{estimateDispersions,DESeqDataSet-method} \title{Estimate the dispersions for a DESeqDataSet} \usage{ \S4method{estimateDispersions}{DESeqDataSet}(object, fitType = c("parametric", "local", "mean"), maxit = 100, quiet = FALSE, modelMatrix = NULL) } \arguments{ \item{object}{a DESeqDataSet} \item{fitType}{either "parametric", "local", or "mean" for the type of fitting of dispersions to the mean intensity. \itemize{ \item parametric - fit a dispersion-mean relation of the form: \deqn{dispersion = asymptDisp + extraPois / mean} via a robust gamma-family GLM. The coefficients \code{asymptDisp} and \code{extraPois} are given in the attribute \code{coefficients} of the \code{\link{dispersionFunction}} of the object.
\item local - use the locfit package to fit a local regression of log dispersions over log base mean (normal scale means and dispersions are input and output for \code{\link{dispersionFunction}}). The points are weighted by normalized mean count in the local regression. \item mean - use the mean of gene-wise dispersion estimates. }} \item{maxit}{control parameter: maximum number of iterations to allow for convergence} \item{quiet}{whether to print messages at each step} \item{modelMatrix}{an optional matrix which will be used for fitting the expected counts. by default, the model matrix is constructed from \code{design(object)}} } \value{ The DESeqDataSet passed as parameters, with the dispersion information filled in as metadata columns, accessible via \code{mcols}, or the final dispersions accessible via \code{\link{dispersions}}. } \description{ This function obtains dispersion estimates for Negative Binomial distributed data. } \details{ Typically the function is called with the idiom: \code{dds <- estimateDispersions(dds)} The fitting proceeds as follows: for each gene, an estimate of the dispersion is found which maximizes the Cox Reid-adjusted profile likelihood (the methods of Cox Reid-adjusted profile likelihood maximization for estimation of dispersion in RNA-Seq data were developed by McCarthy, et al. (2012), first implemented in the edgeR package in 2010); a trend line capturing the dispersion-mean relationship is fit to the maximum likelihood estimates; a normal prior is determined for the log dispersion estimates centered on the predicted value from the trended fit with variance equal to the difference between the observed variance of the log dispersion estimates and the expected sampling variance; finally maximum a posteriori dispersion estimates are returned. This final dispersion parameter is used in subsequent tests. The final dispersion estimates can be accessed from an object using \code{\link{dispersions}}. The fitted dispersion-mean relationship is also used in \code{\link{varianceStabilizingTransformation}}. All of the intermediate values (gene-wise dispersion estimates, fitted dispersion estimates from the trended fit, etc.) are stored in \code{mcols(dds)}, with information about these columns in \code{mcols(mcols(dds))}. The log normal prior on the dispersion parameter has been proposed by Wu, et al. (2012) and is also implemented in the DSS package. In DESeq2, the dispersion estimation procedure described above replaces the different methods of dispersion estimation from the previous version of the DESeq package. \code{estimateDispersions} checks for the case of an analysis with as many samples as the number of coefficients to fit, and will temporarily substitute a design formula \code{~ 1} for the purposes of dispersion estimation. This treats the samples as replicates for the purpose of dispersion estimation. As mentioned in the DESeq paper: "While one may not want to draw strong conclusions from such an analysis, it may still be useful for exploration and hypothesis generation." The lower-level functions called by \code{estimateDispersions} are: \code{\link{estimateDispersionsGeneEst}}, \code{\link{estimateDispersionsFit}}, and \code{\link{estimateDispersionsMAP}}. } \examples{ dds <- makeExampleDESeqDataSet() dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) head(dispersions(dds)) } \references{ \itemize{ \item Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data.
Genome Biology 11 (2010) R106, \url{http://dx.doi.org/10.1186/gb-2010-11-10-r106} \item McCarthy, DJ, Chen, Y, Smyth, GK: Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40 (2012), 4288-4297, \url{http://dx.doi.org/10.1093/nar/gks042} \item Wu, H., Wang, C. & Wu, Z. A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics (2012). \url{http://dx.doi.org/10.1093/biostatistics/kxs033} } } DESeq2/man/estimateDispersionsGeneEst.Rd0000644000175400017540000000766413201671732021205 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/core.R \name{estimateDispersionsGeneEst} \alias{estimateDispersionsGeneEst} \alias{estimateDispersionsFit} \alias{estimateDispersionsMAP} \alias{estimateDispersionsPriorVar} \alias{estimateDispersionsFit} \alias{estimateDispersionsMAP} \alias{estimateDispersionsPriorVar} \title{Low-level functions to fit dispersion estimates} \usage{ estimateDispersionsGeneEst(object, minDisp = 1e-08, kappa_0 = 1, dispTol = 1e-06, maxit = 100, quiet = FALSE, modelMatrix = NULL, niter = 1, linearMu = NULL, minmu = 0.5) estimateDispersionsFit(object, fitType = c("parametric", "local", "mean"), minDisp = 1e-08, quiet = FALSE) estimateDispersionsMAP(object, outlierSD = 2, dispPriorVar, minDisp = 1e-08, kappa_0 = 1, dispTol = 1e-06, maxit = 100, modelMatrix = NULL, quiet = FALSE) estimateDispersionsPriorVar(object, minDisp = 1e-08, modelMatrix = NULL) } \arguments{ \item{object}{a DESeqDataSet} \item{minDisp}{small value for the minimum dispersion, to allow for calculations in log scale, one order of magnitude above this value is used as a test for inclusion in mean-dispersion fitting} \item{kappa_0}{control parameter used in setting the initial proposal in backtracking search, higher kappa_0 results in larger steps} \item{dispTol}{control parameter to test for convergence of log dispersion, stop when increase in log posterior is less than dispTol} \item{maxit}{control parameter: maximum number of iterations to allow for convergence} \item{quiet}{whether to print messages at each step} \item{modelMatrix}{for advanced use only, a substitute model matrix for gene-wise and MAP dispersion estimation} \item{niter}{number of times to iterate between estimation of means and estimation of dispersion} \item{linearMu}{estimate the expected counts matrix using a linear model, default is NULL, in which case a linear model is used if the number of groups defined by the model matrix is equal to the number of columns of the model matrix} \item{minmu}{lower bound on the estimated count for fitting gene-wise dispersion} \item{fitType}{either "parametric", "local", or "mean" for the type of fitting of dispersions to the mean intensity. See \code{\link{estimateDispersions}} for description.} \item{outlierSD}{the number of standard deviations of log gene-wise estimates above the prior mean (fitted value), above which dispersion estimates will be labelled outliers. Outliers will keep their original value and not be shrunk using the prior.} \item{dispPriorVar}{the variance of the normal prior on the log dispersions.
If not supplied, this is calculated as the difference between the mean squared residuals of gene-wise estimates to the fitted dispersion and the expected sampling variance of the log dispersion} } \value{ a DESeqDataSet with gene-wise, fitted, or final MAP dispersion estimates in the metadata columns of the object. \code{estimateDispersionsPriorVar} is called inside of \code{estimateDispersionsMAP} and stores the dispersion prior variance as an attribute of \code{dispersionFunction(dds)}, which can be manually provided to \code{estimateDispersionsMAP} for parallel execution. } \description{ Normal users should instead use \code{\link{estimateDispersions}}. These low-level functions are called by \code{\link{estimateDispersions}}, but are exported and documented for non-standard usage. For instance, it is possible to replace fitted values with a custom fit and continue with the maximum a posteriori dispersion estimation, as demonstrated in the examples below. } \examples{ dds <- makeExampleDESeqDataSet() dds <- estimateSizeFactors(dds) dds <- estimateDispersionsGeneEst(dds) dds <- estimateDispersionsFit(dds) dds <- estimateDispersionsMAP(dds) plotDispEsts(dds) # after having run estimateDispersionsFit() # the dispersion prior variance over all genes # can be obtained like so: dispPriorVar <- estimateDispersionsPriorVar(dds) } \seealso{ \code{\link{estimateDispersions}} } DESeq2/man/estimateSizeFactors.Rd0000644000175400017540000001064113201671732017661 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/methods.R \docType{methods} \name{estimateSizeFactors} \alias{estimateSizeFactors} \alias{estimateSizeFactors,DESeqDataSet-method} \title{Estimate the size factors for a \code{\link{DESeqDataSet}}} \usage{ \S4method{estimateSizeFactors}{DESeqDataSet}(object, type = c("ratio", "poscounts", "iterate"), locfunc = stats::median, geoMeans, controlGenes, normMatrix) } \arguments{ \item{object}{a DESeqDataSet} \item{type}{Method for estimation: either "ratio", "poscounts", or "iterate". "ratio" uses the standard median ratio method introduced in DESeq. The size factor is the median ratio of the sample over a "pseudosample": for each gene, the geometric mean of all samples. "poscounts" and "iterate" offer alternative estimators, which can be used even when all genes contain a sample with a zero (a problem for the default method, as the geometric mean becomes zero, and the ratio undefined). The "poscounts" estimator deals with a gene with some zeros, by calculating a modified geometric mean by taking the n-th root of the product of the non-zero counts. This evolved out of use cases with Paul McMurdie's phyloseq package for metagenomic samples. The "iterate" estimator iterates between estimating the dispersion with a design of ~1, and finding a size factor vector by numerically optimizing the likelihood of the ~1 model.} \item{locfunc}{a function to compute a location for a sample. By default, the median is used. However, especially for low counts, the \code{\link[genefilter]{shorth}} function from the genefilter package may give better results.} \item{geoMeans}{by default this is not provided and the geometric means of the counts are calculated within the function. A vector of geometric means from another count matrix can be provided for a "frozen" size factor calculation} \item{controlGenes}{optional, numeric or logical index vector specifying those genes to use for size factor estimation (e.g. 
housekeeping or spike-in genes)} \item{normMatrix}{optional, a matrix of normalization factors which do not yet control for library size. Note that this argument should not be used (and will be ignored) if the \code{dds} object was created using \code{tximport}. In this case, the information in \code{assays(dds)[["avgTxLength"]]} is automatically used to create appropriate normalization factors. Providing \code{normMatrix} will estimate size factors on the count matrix divided by \code{normMatrix} and store the product of the size factors and \code{normMatrix} as \code{\link{normalizationFactors}}. It is recommended to divide out the row-wise geometric mean of \code{normMatrix} so the rows are roughly centered on 1.} } \value{ The DESeqDataSet passed as parameters, with the size factors filled in. } \description{ This function estimates the size factors using the "median ratio method" described by Equation 5 in Anders and Huber (2010). The estimated size factors can be accessed using the accessor function \code{\link{sizeFactors}}. Alternative library size estimators can also be supplied using the assignment function \code{\link{sizeFactors<-}}. } \details{ Typically, the function is called with the idiom: \code{dds <- estimateSizeFactors(dds)} See \code{\link{DESeq}} for a description of the use of size factors in the GLM. One should call this function after \code{\link{DESeqDataSet}} unless size factors are manually specified with \code{\link{sizeFactors}}. Alternatively, gene-specific normalization factors for each sample can be provided using \code{\link{normalizationFactors}} which will always preempt \code{\link{sizeFactors}} in calculations. Internally, the function calls \code{\link{estimateSizeFactorsForMatrix}}, which provides more details on the calculation. } \examples{ dds <- makeExampleDESeqDataSet(n=1000, m=4) dds <- estimateSizeFactors(dds) sizeFactors(dds) dds <- estimateSizeFactors(dds, controlGenes=1:200) m <- matrix(runif(1000 * 4, .5, 1.5), ncol=4) dds <- estimateSizeFactors(dds, normMatrix=m) normalizationFactors(dds)[1:3,] geoMeans <- exp(rowMeans(log(counts(dds)))) dds <- estimateSizeFactors(dds,geoMeans=geoMeans) sizeFactors(dds) } \references{ Reference for the median ratio method: Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 2010, 11:R106. \url{http://dx.doi.org/10.1186/gb-2010-11-10-r106} } \seealso{ \code{\link{estimateSizeFactorsForMatrix}} } \author{ Simon Anders } DESeq2/man/estimateSizeFactorsForMatrix.Rd0000644000175400017540000000343113201671732021514 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/core.R \name{estimateSizeFactorsForMatrix} \alias{estimateSizeFactorsForMatrix} \title{Low-level function to estimate size factors with robust regression.} \usage{ estimateSizeFactorsForMatrix(counts, locfunc = stats::median, geoMeans, controlGenes) } \arguments{ \item{counts}{a matrix or data frame of counts, i.e., non-negative integer values} \item{locfunc}{a function to compute a location for a sample. By default, the median is used. However, especially for low counts, the \code{\link[genefilter]{shorth}} function from genefilter may give better results.} \item{geoMeans}{by default this is not provided, and the geometric means of the counts are calculated within the function.
A vector of geometric means from another count matrix can be provided for a "frozen" size factor calculation} \item{controlGenes}{optional, numeric or logical index vector specifying those genes to use for size factor estimation (e.g. housekeeping or spike-in genes)} } \value{ a vector with the estimated size factors, one element per column } \description{ Given a matrix or data frame of count data, this function estimates the size factors as follows: Each column is divided by the geometric means of the rows. The median (or, if requested, another location estimator) of these ratios (skipping the genes with a geometric mean of zero) is used as the size factor for this column. Typically, one will not call this function directly, but use \code{\link{estimateSizeFactors}}. } \examples{ dds <- makeExampleDESeqDataSet() estimateSizeFactorsForMatrix(counts(dds)) geoMeans <- exp(rowMeans(log(counts(dds)))) estimateSizeFactorsForMatrix(counts(dds),geoMeans=geoMeans) } \seealso{ \code{\link{estimateSizeFactors}} } \author{ Simon Anders } DESeq2/man/fpkm.Rd0000644000175400017540000000566713201671732014636 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/helper.R \docType{methods} \name{fpkm} \alias{fpkm} \title{FPKM: fragments per kilobase per million mapped fragments} \usage{ fpkm(object, robust = TRUE) } \arguments{ \item{object}{a \code{DESeqDataSet}} \item{robust}{whether to use size factors to normalize rather than taking the column sums of the raw counts, using the \code{\link{fpm}} function.} } \value{ a matrix which is normalized per kilobase of the union of basepairs in the \code{GRangesList} or \code{GRanges} of the mcols(object), and per million of mapped fragments, either using the robust median ratio method (robust=TRUE, default) or using raw counts (robust=FALSE). Defining a column \code{mcols(object)$basepairs} takes precedence over internal calculation of the kilobases for each row. } \description{ The following function returns fragment counts normalized per kilobase of feature length per million mapped fragments (by default using a robust estimate of the library size, as in \code{\link{estimateSizeFactors}}). } \details{ The length of the features (e.g. genes) is calculated one of two ways: (1) If there is a matrix named "avgTxLength" in \code{assays(dds)}, this will take precedence in the length normalization. This occurs when using the tximport-DESeq2 pipeline. (2) Otherwise, feature length is calculated from the \code{rowRanges} of the dds object, if a column \code{basepairs} is not present in \code{mcols(dds)}. The calculated length is the number of basepairs in the union of all \code{GRanges} assigned to a given row of \code{object}, e.g., the union of all basepairs of exons of a given gene. Note that the second approach over-estimates the gene length (average transcript length, weighted by abundance is a more appropriate normalization for gene counts), and so the FPKM will be an underestimate of the true value. Note that, when the read/fragment counting has inter-feature dependencies, a strict normalization would not incorporate the basepairs of a feature which overlap another feature. This inter-feature dependence is not taken into consideration in the internal union basepair calculation. } \examples{ # create a matrix with 1 million counts for the # 2nd and 3rd column, the 1st and 4th have # half and double the counts, respectively.
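# (a sketch of the expected relationship, assuming the default
#  robust=TRUE: fpkm() should equal fpm() divided by the feature
#  length in kilobases, e.g. fpkm(dds)[3,] is expected to be
#  fpm(dds)[3,] / 2 for the 2kb gene defined below)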
m <- matrix(1e6 * rep(c(.125, .25, .25, .5), each=4), ncol=4, dimnames=list(1:4,1:4)) mode(m) <- "integer" se <- SummarizedExperiment(list(counts=m), colData=DataFrame(sample=1:4)) dds <- DESeqDataSet(se, ~ 1) # create 4 GRanges with lengths: 1, 1, 2, 0.5 Kb gr1 <- GRanges("chr1",IRanges(1,1000)) # 1kb gr2 <- GRanges("chr1",IRanges(c(1,1001),c( 500,1500))) # 1kb gr3 <- GRanges("chr1",IRanges(c(1,1001),c(1000,2000))) # 2kb gr4 <- GRanges("chr1",IRanges(c(1,1001),c(200,1300))) # 500bp rowRanges(dds) <- GRangesList(gr1,gr2,gr3,gr4) # the raw counts counts(dds) # the FPM values fpm(dds) # the FPKM values fpkm(dds) } \seealso{ \code{\link{fpm}} } DESeq2/man/fpm.Rd0000644000175400017540000000353513201671732014457 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/helper.R \docType{methods} \name{fpm} \alias{fpm} \title{FPM: fragments per million mapped fragments} \usage{ fpm(object, robust = TRUE) } \arguments{ \item{object}{a \code{DESeqDataSet}} \item{robust}{whether to use size factors to normalize rather than taking the column sums of the raw counts. If TRUE, the size factors and the geometric mean of column sums are multiplied to create a robust library size estimate. Robust normalization is not used if average transcript lengths are present.} } \value{ a matrix which is normalized per million of mapped fragments, either using the robust median ratio method (robust=TRUE, default) or using raw counts (robust=FALSE). } \description{ Calculates either a robust version (default) or the traditional matrix of fragments/counts per million mapped fragments (FPM/CPM). Note: this function is written very simply and can be easily altered to produce other behavior by examining the source code. } \examples{ # generate a dataset with size factors: .5, 1, 1, 2 dds <- makeExampleDESeqDataSet(m = 4, n = 1000, interceptMean=log2(1e3), interceptSD=0, sizeFactors=c(.5,1,1,2), dispMeanRel=function(x) .01) # add a few rows with very high count counts(dds)[4:10,] <- 2e5L # in this robust version, the counts are comparable across samples round(head(fpm(dds), 3)) # in this column sum version, the counts are still skewed: # sample1 < sample2 & 3 < sample 4 round(head(fpm(dds, robust=FALSE), 3)) # the column sums of the robust version # are not equal to 1e6, but the # column sums of the non-robust version # are equal to 1e6 by definition colSums(fpm(dds))/1e6 colSums(fpm(dds, robust=FALSE))/1e6 } \seealso{ \code{\link{fpkm}} } DESeq2/man/lfcShrink.Rd0000644000175400017540000001063613201671732015620 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/lfcShrink.R \name{lfcShrink} \alias{lfcShrink} \title{Shrink log2 fold changes} \usage{ lfcShrink(dds, coef, contrast, res, type = c("normal", "apeglm", "ashr"), svalue = FALSE, returnList = FALSE, apeAdapt = TRUE, parallel = FALSE, BPPARAM = bpparam(), ...) } \arguments{ \item{dds}{a DESeqDataSet object, after running \code{\link{DESeq}}} \item{coef}{the name or number of the coefficient (LFC) to shrink, consult \code{resultsNames(dds)} after running \code{DESeq(dds)}. note: only \code{coef} or \code{contrast} can be specified, not both. \code{type="apeglm"} requires use of \code{coef}.} \item{contrast}{see argument description in \code{\link{results}}. only \code{coef} or \code{contrast} can be specified, not both.} \item{res}{a DESeqResults object. Results table produced by the default pipeline, i.e. \code{DESeq} followed by \code{results}.
If not provided, it will be generated internally using \code{coef} or \code{contrast}} \item{type}{\code{"normal"} is the original DESeq2 shrinkage estimator; \code{"apeglm"} is the adaptive t prior shrinkage estimator from the 'apeglm' package; \code{"ashr"} is the adaptive shrinkage estimator from the 'ashr' package, using a fitted mixture of normals prior - see the Stephens (2016) reference below for citation} \item{svalue}{logical, should p-values and adjusted p-values be replaced with s-values when using \code{apeglm} or \code{ashr}. See Stephens (2016) reference on s-values.} \item{returnList}{logical, should \code{lfcShrink} return a list, where the first element is the results table, and the second element is the output of \code{apeglm} or \code{ashr}} \item{apeAdapt}{logical, should \code{apeglm} use the MLE estimates of LFC to adapt the prior, or use default or specified \code{prior.control}} \item{parallel}{if FALSE, no parallelization. if TRUE, parallel execution using \code{BiocParallel}, see same argument of \code{\link{DESeq}} parallelization only used with \code{normal} or \code{apeglm}} \item{BPPARAM}{see same argument of \code{\link{DESeq}}} \item{...}{arguments passed to \code{apeglm} and \code{ashr}} } \value{ a DESeqResults object with the \code{log2FoldChange} and \code{lfcSE} columns replaced with shrunken LFC and SE. \code{priorInfo(res)} contains information about the shrinkage procedure, relevant to the various methods specified by \code{type}. } \description{ Adds shrunken log2 fold changes (LFC) and SE to a results table from \code{DESeq} run without LFC shrinkage. Three shrinkage estimators for LFC are available via \code{type}. } \details{ As of DESeq2 version 1.18, \code{type="apeglm"} and \code{type="ashr"} are new features, and still under development. Specifying \code{type="apeglm"} passes along DESeq2 MLE log2 fold changes and standard errors to the \code{apeglm} function in the apeglm package, and re-estimates posterior LFCs for the coefficient specified by \code{coef}. Specifying \code{type="ashr"} passes along DESeq2 MLE log2 fold changes and standard errors to the \code{ash} function in the ashr package, with arguments \code{mixcompdist="normal"} and \code{method="shrink"} (\code{coef} and \code{contrast} ignored). See vignette for a comparison of shrinkage estimators on an example dataset. For all shrinkage methods, details on the prior are included in \code{priorInfo(res)}, including the \code{fitted_g} mixture for ashr. The integration of shrinkage methods from external packages will likely evolve over time. We will likely incorporate an \code{lfcThreshold} argument which can be passed to apeglm to specify regions of the posterior at an arbitrary threshold. For \code{type="normal"}, shrinkage cannot be applied to coefficients in a model with interaction terms. } \examples{ set.seed(1) dds <- makeExampleDESeqDataSet(n=500,betaSD=1) dds <- DESeq(dds) res <- results(dds) res.shr <- lfcShrink(dds=dds, coef=2) res.shr <- lfcShrink(dds=dds, contrast=c("condition","B","A")) res.ape <- lfcShrink(dds=dds, coef=2, type="apeglm") res.ash <- lfcShrink(dds=dds, coef=2, type="ashr") } \references{ \code{type="normal"}: Love, M.I., Huber, W., Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15:550. \url{https://doi.org/10.1186/s13059-014-0550-8} \code{type="ashr"}: Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2.
\url{https://doi.org/10.1093/biostatistics/kxw041} } DESeq2/man/makeExampleDESeqDataSet.Rd0000644000175400017540000000235513201671732020255 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/core.R \name{makeExampleDESeqDataSet} \alias{makeExampleDESeqDataSet} \title{Make a simulated DESeqDataSet} \usage{ makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 0, interceptMean = 4, interceptSD = 2, dispMeanRel = function(x) 4/x + 0.1, sizeFactors = rep(1, m)) } \arguments{ \item{n}{number of rows} \item{m}{number of columns} \item{betaSD}{the standard deviation for non-intercept betas, i.e. beta ~ N(0,betaSD)} \item{interceptMean}{the mean of the intercept betas (log2 scale)} \item{interceptSD}{the standard deviation of the intercept betas (log2 scale)} \item{dispMeanRel}{a function specifying the relationship of the dispersions on \code{2^trueIntercept}} \item{sizeFactors}{multiplicative factors for each sample} } \value{ a \code{\link{DESeqDataSet}} with true dispersion, intercept and beta values in the metadata columns. Note that the true betas are provided on the log2 scale. } \description{ Constructs a simulated dataset of Negative Binomial data from two conditions. By default, there are no fold changes between the two conditions, but this can be adjusted with the \code{betaSD} argument. } \examples{ dds <- makeExampleDESeqDataSet() dds } DESeq2/man/nbinomLRT.Rd0000644000175400017540000000375513201671732015545 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/core.R \name{nbinomLRT} \alias{nbinomLRT} \title{Likelihood ratio test (chi-squared test) for GLMs} \usage{ nbinomLRT(object, full = design(object), reduced, betaTol = 1e-08, maxit = 100, useOptim = TRUE, quiet = FALSE, useQR = TRUE) } \arguments{ \item{object}{a DESeqDataSet} \item{full}{the full model formula, this should be the formula in \code{design(object)}. alternatively, can be a matrix} \item{reduced}{a reduced formula to compare against, e.g. the full model with a term or terms of interest removed. alternatively, can be a matrix} \item{betaTol}{control parameter defining convergence} \item{maxit}{the maximum number of iterations to allow for convergence of the coefficient vector} \item{useOptim}{whether to use the native optim function on rows which do not converge within maxit} \item{quiet}{whether to print messages at each step} \item{useQR}{whether to use the QR decomposition on the design matrix X while fitting the GLM} } \value{ a DESeqDataSet with new results columns accessible with the \code{\link{results}} function. The coefficients and standard errors are reported on a log2 scale. } \description{ This function tests for significance of change in deviance between a full and reduced model which are provided as \code{formula}. Fitting uses previously calculated \code{\link{sizeFactors}} (or \code{\link{normalizationFactors}}) and dispersion estimates. } \details{ The difference in deviance is compared to a chi-squared distribution with df = (reduced residual degrees of freedom - full residual degrees of freedom). This function is comparable to the \code{nbinomGLMTest} of the previous version of DESeq and an alternative to the default \code{\link{nbinomWaldTest}}. 
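For example (a worked sketch of the df calculation), with \code{m} samples, a full model \code{~ condition} having two condition levels (two coefficients, hence residual degrees of freedom \code{m - 2}) and a reduced model \code{~ 1} (one coefficient, residual degrees of freedom \code{m - 1}), the difference in deviance is compared to a chi-squared distribution with df = (m - 1) - (m - 2) = 1.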
} \examples{ dds <- makeExampleDESeqDataSet() dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) dds <- nbinomLRT(dds, reduced = ~ 1) res <- results(dds) } \seealso{ \code{\link{DESeq}}, \code{\link{nbinomWaldTest}} } DESeq2/man/nbinomWaldTest.Rd0000644000175400017540000001270213201671732016623 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/core.R \name{nbinomWaldTest} \alias{nbinomWaldTest} \title{Wald test for the GLM coefficients} \usage{ nbinomWaldTest(object, betaPrior = FALSE, betaPriorVar, modelMatrix = NULL, modelMatrixType, betaTol = 1e-08, maxit = 100, useOptim = TRUE, quiet = FALSE, useT = FALSE, df, useQR = TRUE) } \arguments{ \item{object}{a DESeqDataSet} \item{betaPrior}{whether or not to put a zero-mean normal prior on the non-intercept coefficients} \item{betaPriorVar}{a vector with length equal to the number of model terms including the intercept. betaPriorVar gives the variance of the prior on the sample betas on the log2 scale. if missing (default) this is estimated from the data} \item{modelMatrix}{an optional matrix, typically this is set to NULL and created within the function. only can be supplied if betaPrior=FALSE} \item{modelMatrixType}{either "standard" or "expanded", which describe how the model matrix, X of the formula in \code{\link{DESeq}}, is formed. "standard" is as created by \code{model.matrix} using the design formula. "expanded" includes an indicator variable for each level of factors in addition to an intercept. betaPrior must be set to TRUE in order for expanded model matrices to be fit.} \item{betaTol}{control parameter defining convergence} \item{maxit}{the maximum number of iterations to allow for convergence of the coefficient vector} \item{useOptim}{whether to use the native optim function on rows which do not converge within maxit} \item{quiet}{whether to print messages at each step} \item{useT}{whether to use a t-distribution as a null distribution, for significance testing of the Wald statistics. If FALSE, a standard normal null distribution is used.} \item{df}{the degrees of freedom for the t-distribution} \item{useQR}{whether to use the QR decomposition on the design matrix X while fitting the GLM} } \value{ a DESeqDataSet with results columns accessible with the \code{\link{results}} function. The coefficients and standard errors are reported on a log2 scale. } \description{ This function tests for significance of coefficients in a Negative Binomial GLM, using previously calculated \code{\link{sizeFactors}} (or \code{\link{normalizationFactors}}) and dispersion estimates. See \code{\link{DESeq}} for the GLM formula. } \details{ The fitting proceeds as follows: standard maximum likelihood estimates for GLM coefficients (synonymous with "beta", "log2 fold change", "effect size") are calculated. Then, optionally, a zero-centered Normal prior distribution (\code{betaPrior}) is assumed for the coefficients other than the intercept. Note that this posterior log2 fold change estimation is no longer the default setting for \code{nbinomWaldTest}, as the standard workflow for coefficient shrinkage has moved to an additional function \code{\link{lfcShrink}}. For calculating Wald test p-values, the coefficients are scaled by their standard errors and then compared to a standard Normal distribution. The \code{\link{results}} function without any arguments will automatically perform a contrast of the last level of the last variable in the design formula over the first level.
The \code{contrast} argument of the \code{\link{results}} function can be used to generate other comparisons. The Wald test can be replaced with the \code{\link{nbinomLRT}} for an alternative test of significance. Notes on the log2 fold change prior: The variance of the prior distribution for each non-intercept coefficient is calculated using the observed distribution of the maximum likelihood coefficients. The final coefficients are then maximum a posteriori estimates using this prior (Tikhonov/ridge regularization). See below for details on the prior variance and the Methods section of the DESeq2 manuscript for more detail. The use of a prior has little effect on genes with high counts and helps to moderate the large spread in coefficients for genes with low counts. The prior variance is calculated by matching the 0.05 upper quantile of the observed MLE coefficients to a zero-centered Normal distribution. In a change of methods since the 2014 paper, the weighted upper quantile is calculated using the \code{wtd.quantile} function from the Hmisc package. The weights are the inverse of the expected variance of log counts, so the inverse of \eqn{1/\bar{\mu} + \alpha_{tr}}{1/mu-bar + alpha_tr} using the mean of normalized counts and the trended dispersion fit. The weighting ensures that noisy estimates of log fold changes from small count genes do not overly influence the calculation of the prior variance. See \code{\link{estimateBetaPriorVar}}. The final prior variance for a factor level is the average of the estimated prior variance over all contrasts of all levels of the factor. When a log2 fold change prior is used (betaPrior=TRUE), then \code{nbinomWaldTest} will by default use expanded model matrices, as described in the \code{modelMatrixType} argument, unless this argument is used to override the default behavior. This ensures that log2 fold changes will be independent of the choice of reference level. In this case, the beta prior variance for each factor is calculated as the average of the mean squared maximum likelihood estimates for each level and every possible contrast. } \examples{ dds <- makeExampleDESeqDataSet() dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) dds <- nbinomWaldTest(dds) res <- results(dds) } \seealso{ \code{\link{DESeq}}, \code{\link{nbinomLRT}} } DESeq2/man/normTransform.Rd0000644000175400017540000000113213201671732016533 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/helper.R \name{normTransform} \alias{normTransform} \title{Normalized counts transformation} \usage{ normTransform(object, f = log2, pc = 1) } \arguments{ \item{object}{a DESeqDataSet object} \item{f}{a function to apply to normalized counts} \item{pc}{a pseudocount to add to normalized counts} } \description{ A simple function for creating a \code{\link{DESeqTransform}} object after applying: \code{f(counts(dds, normalized=TRUE) + pc)}.
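As a sketch of the equivalent manual steps (assuming the default \code{f = log2} and \code{pc = 1}), the returned object corresponds roughly to:
\preformatted{
# shifted log of normalized counts, wrapped as a DESeqTransform
dds <- estimateSizeFactors(dds)
se <- SummarizedExperiment(
  log2(counts(dds, normalized=TRUE) + 1),
  colData=colData(dds))
dt <- DESeqTransform(se)
}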
} \seealso{ \code{\link{varianceStabilizingTransformation}}, \code{\link{rlog}} } DESeq2/man/normalizationFactors.Rd0000644000175400017540000000466213201671732020107 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/AllGenerics.R, R/methods.R \docType{methods} \name{normalizationFactors} \alias{normalizationFactors} \alias{normalizationFactors<-} \alias{normalizationFactors} \alias{normalizationFactors,DESeqDataSet-method} \alias{normalizationFactors<-,DESeqDataSet,matrix-method} \alias{normalizationFactors} \title{Accessor functions for the normalization factors in a DESeqDataSet object.} \usage{ normalizationFactors(object, ...) normalizationFactors(object, ...) <- value \S4method{normalizationFactors}{DESeqDataSet}(object) \S4method{normalizationFactors}{DESeqDataSet,matrix}(object) <- value } \arguments{ \item{object}{a \code{DESeqDataSet} object.} \item{...}{additional arguments} \item{value}{the matrix of normalization factors} } \description{ Gene-specific normalization factors for each sample can be provided as a matrix, which will preempt \code{\link{sizeFactors}}. In some experiments, counts for each sample have varying dependence on covariates, e.g. on GC-content for sequencing data run on different days, and in this case it makes sense to provide gene-specific factors for each sample rather than a single size factor. } \details{ Normalization factors alter the model of \code{\link{DESeq}} in the following way, for counts \eqn{K_{ij}}{K_ij} and normalization factors \eqn{NF_{ij}}{NF_ij} for gene i and sample j: \deqn{ K_{ij} \sim \textrm{NB}( \mu_{ij}, \alpha_i) }{ K_ij ~ NB(mu_ij, alpha_i) } \deqn{ \mu_{ij} = NF_{ij} q_{ij} }{ mu_ij = NF_ij q_ij } } \note{ Normalization factors are on the scale of the counts (similar to \code{\link{sizeFactors}}) and unlike offsets, which are typically on the scale of the predictors (in this case, log counts). Normalization factors should include library size normalization. They should have row-wise geometric mean near 1, as is the case with size factors, such that the mean of normalized counts is close to the mean of unnormalized counts. See example code below. } \examples{ dds <- makeExampleDESeqDataSet(n=100, m=4) normFactors <- matrix(runif(nrow(dds)*ncol(dds),0.5,1.5), ncol=ncol(dds),nrow=nrow(dds), dimnames=list(1:nrow(dds),1:ncol(dds))) # the normalization factors matrix should not have 0's in it # it should have geometric mean near 1 for each row normFactors <- normFactors / exp(rowMeans(log(normFactors))) normalizationFactors(dds) <- normFactors dds <- DESeq(dds) } DESeq2/man/normalizeGeneLength.Rd0000644000175400017540000000072213201671732017631 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/helper.R \name{normalizeGeneLength} \alias{normalizeGeneLength} \title{Normalize for gene length} \usage{ normalizeGeneLength(...) } \arguments{ \item{...}{...} } \description{ Normalize for gene length using the output of transcript abundance estimators } \details{ This function is deprecated and moved to a new general purpose package, tximport, which will be added to Bioconductor. 
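A minimal sketch of that replacement workflow (assuming hypothetical objects: a named vector \code{files} of quantification files, a \code{tx2gene} data.frame, and a \code{samples} data.frame with a \code{condition} column):
\preformatted{
# gene-level import of transcript abundance estimates,
# then construction of a DESeqDataSet
library(tximport)
txi <- tximport(files, type="salmon", tx2gene=tx2gene)
dds <- DESeqDataSetFromTximport(txi, colData=samples,
                                design=~ condition)
}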
} DESeq2/man/plotCounts.Rd0000644000175400017540000000230113201671732016035 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/plots.R \name{plotCounts} \alias{plotCounts} \title{Plot of normalized counts for a single gene} \usage{ plotCounts(dds, gene, intgroup = "condition", normalized = TRUE, transform = TRUE, main, xlab = "group", returnData = FALSE, replaced = FALSE, pc, ...) } \arguments{ \item{dds}{a \code{DESeqDataSet}} \item{gene}{a character, specifying the name of the gene to plot} \item{intgroup}{interesting groups: a character vector of names in \code{colData(x)} to use for grouping} \item{normalized}{whether the counts should be normalized by size factor (default is TRUE)} \item{transform}{whether to have log scale y-axis or not. defaults to TRUE} \item{main}{as in 'plot'} \item{xlab}{as in 'plot'} \item{returnData}{should the function only return the data.frame of counts and covariates for custom plotting (default is FALSE)} \item{replaced}{use the outlier-replaced counts if they exist} \item{pc}{pseudocount for log transform} \item{...}{arguments passed to plot} } \description{ Normalized counts plus a pseudocount of 0.5 are shown by default. } \examples{ dds <- makeExampleDESeqDataSet() plotCounts(dds, "gene1") } DESeq2/man/plotDispEsts.Rd0000644000175400017540000000277113201671732016333 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/plots.R \docType{methods} \name{plotDispEsts} \alias{plotDispEsts} \alias{plotDispEsts,DESeqDataSet-method} \title{Plot dispersion estimates} \usage{ \S4method{plotDispEsts}{DESeqDataSet}(object, ymin, CV = FALSE, genecol = "black", fitcol = "red", finalcol = "dodgerblue", legend = TRUE, xlab, ylab, log = "xy", cex = 0.45, ...) } \arguments{ \item{object}{a DESeqDataSet, with dispersions estimated} \item{ymin}{the lower bound for points on the plot, points beyond this are drawn as triangles at ymin} \item{CV}{logical, whether to plot the asymptotic or biological coefficient of variation (the square root of dispersion) on the y-axis. As the mean grows to infinity, the square root of dispersion gives the coefficient of variation for the counts. Default is \code{FALSE}, plotting dispersion.} \item{genecol}{the color for gene-wise dispersion estimates} \item{fitcol}{the color of the fitted estimates} \item{finalcol}{the color of the final estimates used for testing} \item{legend}{logical, whether to draw a legend} \item{xlab}{xlab} \item{ylab}{ylab} \item{log}{log} \item{cex}{cex} \item{...}{further arguments to \code{plot}} } \description{ A simple helper function that plots the per-gene dispersion estimates together with the fitted mean-dispersion relationship. } \examples{ dds <- makeExampleDESeqDataSet() dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) plotDispEsts(dds) } \author{ Simon Anders } DESeq2/man/plotMA.Rd0000644000175400017540000000430113201671732015061 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/plots.R \docType{methods} \name{plotMA} \alias{plotMA} \alias{plotMA,DESeqDataSet-method} \alias{plotMA,DESeqResults-method} \alias{plotMA} \title{MA-plot from base means and log fold changes} \usage{ \S4method{plotMA}{DESeqDataSet}(object, alpha = 0.1, main = "", xlab = "mean of normalized counts", ylim, MLE = FALSE, ...) \S4method{plotMA}{DESeqResults}(object, alpha, main = "", xlab = "mean of normalized counts", ylim, MLE = FALSE, ...) 
} \arguments{ \item{object}{a \code{DESeqResults} object produced by \code{\link{results}}; or a \code{DESeqDataSet} processed by \code{\link{DESeq}}, or the individual functions \code{\link{nbinomWaldTest}} or \code{\link{nbinomLRT}}} \item{alpha}{the significance level for thresholding adjusted p-values} \item{main}{optional title for the plot} \item{xlab}{optional, defaults to "mean of normalized counts"} \item{ylim}{optional y limits} \item{MLE}{if \code{betaPrior=TRUE} was used, whether to plot the MLE (unshrunken estimates), defaults to FALSE. Requires that \code{\link{results}} was run with \code{addMLE=TRUE}. Note that the MLE will be plotted regardless of this argument, if DESeq() was run with \code{betaPrior=FALSE}. See \code{\link{lfcShrink}} for examples on how to plot shrunken log2 fold changes.} \item{...}{further arguments passed to \code{plotMA} if object is \code{DESeqResults} or to \code{\link{results}} if object is \code{DESeqDataSet}} } \description{ A simple helper function that makes a so-called "MA-plot", i.e. a scatter plot of log2 fold changes (on the y-axis) versus the mean of normalized counts (on the x-axis). } \details{ This function is essentially two lines of code: building a \code{data.frame} and passing this to the \code{plotMA} method for \code{data.frame} from the geneplotter package. The code of this function can be seen with: \code{getMethod("plotMA","DESeqDataSet")} If users wish to modify the graphical parameters of the plot, it is recommended to build the data.frame in the same manner and call \code{plotMA}. } \examples{ dds <- makeExampleDESeqDataSet() dds <- DESeq(dds) plotMA(dds) res <- results(dds) plotMA(res) } \author{ Michael Love } DESeq2/man/plotPCA.Rd0000644000175400017540000000363213201671732015175 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/plots.R \docType{methods} \name{plotPCA} \alias{plotPCA} \alias{plotPCA,DESeqTransform-method} \title{Sample PCA plot for transformed data} \usage{ \S4method{plotPCA}{DESeqTransform}(object, intgroup = "condition", ntop = 500, returnData = FALSE) } \arguments{ \item{object}{a \code{\link{DESeqTransform}} object, with data in \code{assay(x)}, produced for example by either \code{\link{rlog}} or \code{\link{varianceStabilizingTransformation}}.} \item{intgroup}{interesting groups: a character vector of names in \code{colData(x)} to use for grouping} \item{ntop}{number of top genes to use for principal components, selected by highest row variance} \item{returnData}{should the function only return the data.frame of PC1 and PC2 with intgroup covariates for custom plotting (default is FALSE)} } \value{ An object created by \code{ggplot}, which can be assigned and further customized. } \description{ This plot helps to check for batch effects and the like. } \note{ See the vignette for an example of variance stabilization and PCA plots. Note that the source code of \code{plotPCA} is very simple. The source can be found by typing \code{DESeq2:::plotPCA.DESeqTransform} or \code{getMethod("plotPCA","DESeqTransform")}, or browsed on github at \url{https://github.com/Bioconductor-mirror/DESeq2/blob/master/R/plots.R} Users should find it easy to customize this function.
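Because the return value is a ggplot object, it can also be customized after the fact, e.g. (a sketch, assuming \code{rld} as created in the examples below):
\preformatted{
library(ggplot2)
p <- plotPCA(rld, intgroup="condition")
p + ggtitle("PCA of rlog-transformed samples")
}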
} \examples{ # using rlog transformed data: dds <- makeExampleDESeqDataSet(betaSD=1) rld <- rlog(dds) plotPCA(rld) # also possible to perform custom transformation: dds <- estimateSizeFactors(dds) # shifted log of normalized counts se <- SummarizedExperiment(log2(counts(dds, normalized=TRUE) + 1), colData=colData(dds)) # the call to DESeqTransform() is needed to # trigger our plotPCA method. plotPCA( DESeqTransform( se ) ) } \author{ Wolfgang Huber } DESeq2/man/plotSparsity.Rd0000644000175400017540000000153613201671732016411 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/plots.R \name{plotSparsity} \alias{plotSparsity} \title{Sparsity plot} \usage{ plotSparsity(x, normalized = TRUE, ...) } \arguments{ \item{x}{a matrix or DESeqDataSet} \item{normalized}{whether to normalize the counts from a DESeqDataSet} \item{...}{passed to \code{plot}} } \description{ A simple plot of the concentration of counts in a single sample over the sum of counts per gene. Not technically the same as "sparsity", but this plot is a useful diagnostic for datasets which might not fit a negative binomial assumption: genes with many zeros and individual very large counts are difficult to model with the negative binomial distribution. } \examples{ dds <- makeExampleDESeqDataSet(n=1000,m=4,dispMeanRel=function(x) .5) dds <- estimateSizeFactors(dds) plotSparsity(dds) } DESeq2/man/priorInfo.Rd0000644000175400017540000000134713201671732015643 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/AllGenerics.R, R/methods.R \docType{methods} \name{priorInfo} \alias{priorInfo} \alias{priorInfo<-} \alias{priorInfo} \alias{priorInfo,DESeqResults-method} \alias{priorInfo<-,DESeqResults,list-method} \alias{priorInfo} \title{Accessors for the 'priorInfo' slot of a DESeqResults object.} \usage{ priorInfo(object, ...) priorInfo(object, ...) <- value \S4method{priorInfo}{DESeqResults}(object) \S4method{priorInfo}{DESeqResults,list}(object) <- value } \arguments{ \item{object}{a \code{DESeqResults} object} \item{...}{additional arguments} \item{value}{a \code{list}} } \description{ The priorInfo slot contains details about the prior on log fold changes } DESeq2/man/replaceOutliers.Rd0000644000175400017540000000651613201671732017041 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/core.R \name{replaceOutliers} \alias{replaceOutliers} \alias{replaceOutliersWithTrimmedMean} \alias{replaceOutliersWithTrimmedMean} \title{Replace outliers with trimmed mean} \usage{ replaceOutliers(object, trim = 0.2, cooksCutoff, minReplicates = 7, whichSamples) replaceOutliersWithTrimmedMean(object, trim = 0.2, cooksCutoff, minReplicates = 7, whichSamples) } \arguments{ \item{object}{a DESeqDataSet object, which has already been processed by either DESeq, nbinomWaldTest or nbinomLRT, and therefore contains a matrix contained in \code{assays(dds)[["cooks"]]}. These are the Cook's distances which will be used to define outlier counts.} \item{trim}{the fraction (0 to 0.5) of observations to be trimmed from each end of the normalized counts for a gene before the mean is computed} \item{cooksCutoff}{the threshold for defining an outlier to be replaced.
Defaults to the .99 quantile of the F(p, m - p) distribution, where p is the number of parameters and m is the number of samples.} \item{minReplicates}{the minimum number of replicate samples necessary to consider a sample eligible for replacement (including itself). Outlier counts will not be replaced if the sample is in a cell which has fewer than minReplicates replicates.} \item{whichSamples}{optional, a numeric or logical index to specify which samples should have outliers replaced. if missing, this is determined using minReplicates.} } \value{ a DESeqDataSet with replaced counts in the slot returned by \code{\link{counts}} and the original counts preserved in \code{assays(dds)[["originalCounts"]]} } \description{ Note that this function is called within \code{\link{DESeq}}, so it is not necessary to call on top of a \code{DESeq} call. See the \code{minReplicatesForReplace} argument documented in \code{\link{DESeq}}. } \details{ This function replaces outlier counts flagged by extreme Cook's distances, as calculated by \code{\link{DESeq}}, \code{\link{nbinomWaldTest}} or \code{\link{nbinomLRT}}, with values predicted by the trimmed mean over all samples (and adjusted by size factor or normalization factor). This function replaces the counts in the matrix returned by \code{counts(dds)} and the Cook's distances in \code{assays(dds)[["cooks"]]}. Original counts are preserved in \code{assays(dds)[["originalCounts"]]}. The \code{\link{DESeq}} function calculates a diagnostic measure called Cook's distance for every gene and every sample. The \code{\link{results}} function then sets the p-values to \code{NA} for genes which contain an outlying count as defined by a Cook's distance above a threshold. With many degrees of freedom, i.e. many more samples than number of parameters to be estimated, it might be undesirable to remove entire genes from the analysis just because their data include a single count outlier. An alternate strategy is to replace the outlier counts with the trimmed mean over all samples, adjusted by the size factor or normalization factor for that sample. The following simple function performs this replacement for the user, for samples which have at least \code{minReplicates} number of replicates (including that sample). For more information on Cook's distance, please see the two sections of the vignette: 'Dealing with count outliers' and 'Count outlier detection'. } \seealso{ \code{\link{DESeq}} } DESeq2/man/results.Rd0000644000175400017540000003615413201671732015375 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/results.R \name{results} \alias{results} \alias{resultsNames} \alias{removeResults} \alias{resultsNames} \alias{removeResults} \title{Extract results from a DESeq analysis} \usage{ results(object, contrast, name, lfcThreshold = 0, altHypothesis = c("greaterAbs", "lessAbs", "greater", "less"), listValues = c(1, -1), cooksCutoff, independentFiltering = TRUE, alpha = 0.1, filter, theta, pAdjustMethod = "BH", filterFun, format = c("DataFrame", "GRanges", "GRangesList"), test, addMLE = FALSE, tidy = FALSE, parallel = FALSE, BPPARAM = bpparam(), ...) resultsNames(object) removeResults(object) } \arguments{ \item{object}{a DESeqDataSet, on which one of the following functions has already been called: \code{\link{DESeq}}, \code{\link{nbinomWaldTest}}, or \code{\link{nbinomLRT}}} \item{contrast}{this argument specifies what comparison to extract from the \code{object} to build a results table.
one of either: \itemize{ \item a character vector with exactly three elements: the name of a factor in the design formula, the name of the numerator level for the fold change, and the name of the denominator level for the fold change (simplest case) \item a list of 2 character vectors: the names of the fold changes for the numerator, and the names of the fold changes for the denominator. these names should be elements of \code{resultsNames(object)}. if the list is length 1, a second element is added which is the empty character vector, \code{character()}. (more general case, can be used to combine interaction terms and main effects) \item a numeric contrast vector with one element for each element in \code{resultsNames(object)} (most general case) } If specified, the \code{name} argument is ignored.} \item{name}{the name of the individual effect (coefficient) for building a results table. Use this argument rather than \code{contrast} for continuous variables, individual effects or for individual interaction terms. The value provided to \code{name} must be an element of \code{resultsNames(object)}.} \item{lfcThreshold}{a non-negative value which specifies a log2 fold change threshold. The default value is 0, corresponding to a test that the log2 fold changes are equal to zero. The user can specify the alternative hypothesis using the \code{altHypothesis} argument, which defaults to testing for log2 fold changes greater in absolute value than a given threshold. If \code{lfcThreshold} is specified, the results are for Wald tests, and LRT p-values will be overwritten.} \item{altHypothesis}{character which specifies the alternative hypothesis, i.e. those values of log2 fold change which the user is interested in finding. The complement of this set of values is the null hypothesis which will be tested. If the log2 fold change specified by \code{name} or by \code{contrast} is written as \eqn{ \beta }{ beta }, then the possible values for \code{altHypothesis} represent the following alternate hypotheses: \itemize{ \item greaterAbs: \eqn{|\beta| > \textrm{lfcThreshold} }{ |beta| > lfcThreshold }, and p-values are two-tailed \item lessAbs: \eqn{ |\beta| < \textrm{lfcThreshold} }{ |beta| < lfcThreshold }, NOTE: this requires that \code{betaPrior=FALSE} has been specified in the previous \code{\link{DESeq}} call. p-values are the maximum of the upper and lower tests. \item greater: \eqn{ \beta > \textrm{lfcThreshold} }{ beta > lfcThreshold } \item less: \eqn{ \beta < -\textrm{lfcThreshold} }{ beta < -lfcThreshold } }} \item{listValues}{only used if a list is provided to \code{contrast}: a numeric of length two: the log2 fold changes in the list are multiplied by these values. The first number should be positive and the second negative. By default this is \code{c(1,-1)}} \item{cooksCutoff}{threshold on Cook's distance, such that if one or more samples for a row have a distance higher, the p-value for the row is set to NA. The default cutoff is the .99 quantile of the F(p, m-p) distribution, where p is the number of coefficients being fitted and m is the number of samples. Set to \code{Inf} or \code{FALSE} to disable the resetting of p-values to NA. Note: this test excludes the Cook's distance of samples belonging to experimental groups with only 2 samples.} \item{independentFiltering}{logical, whether independent filtering should be applied automatically} \item{alpha}{the significance cutoff used for optimizing the independent filtering (by default 0.1).
If the adjusted p-value cutoff (FDR) will be a value other than 0.1, \code{alpha} should be set to that value.} \item{filter}{the vector of filter statistics over which the independent filtering will be optimized. By default the mean of normalized counts is used.} \item{theta}{the quantiles at which to assess the number of rejections from independent filtering} \item{pAdjustMethod}{the method to use for adjusting p-values, see \code{?p.adjust}} \item{filterFun}{an optional custom function for performing independent filtering and p-value adjustment, with arguments \code{res} (a DESeqResults object), \code{filter} (the quantity for filtering tests), \code{alpha} (the target FDR), \code{pAdjustMethod}. This function should return a DESeqResults object with a \code{padj} column.} \item{format}{character, either \code{"DataFrame"}, \code{"GRanges"}, or \code{"GRangesList"}, whether the results should be printed as a \code{\link{DESeqResults}} DataFrame, or if the results DataFrame should be attached as metadata columns to the \code{GRanges} or \code{GRangesList} \code{rowRanges} of the \code{DESeqDataSet}. If the \code{rowRanges} is a \code{GRangesList}, and \code{GRanges} is requested, the range of each gene will be returned} \item{test}{this is automatically detected internally if not provided. The one exception is after \code{nbinomLRT} has been run, when \code{test="Wald"} will generate Wald statistics and Wald test p-values.} \item{addMLE}{if \code{betaPrior=TRUE} was used, whether the "unshrunken" maximum likelihood estimates (MLE) of log2 fold change should be added as a column to the results table (default is FALSE). This argument is preserved for backward compatibility, as now the recommended pipeline is to generate shrunken MAP estimates using \code{\link{lfcShrink}}. This argument functionality is only implemented for \code{contrast} specified as three element character vectors.} \item{tidy}{whether to output the results table with rownames as a first column 'row'. The table will also be coerced to \code{data.frame}} \item{parallel}{if FALSE, no parallelization. if TRUE, parallel execution using \code{BiocParallel}, see next argument \code{BPPARAM}} \item{BPPARAM}{an optional parameter object passed internally to \code{\link{bplapply}} when \code{parallel=TRUE}. If not specified, the parameters last registered with \code{\link{register}} will be used.} \item{...}{optional arguments passed to \code{filterFun}} } \value{ For \code{results}: a \code{\link{DESeqResults}} object, which is a simple subclass of DataFrame. This object contains the results columns: \code{baseMean}, \code{log2FoldChange}, \code{lfcSE}, \code{stat}, \code{pvalue} and \code{padj}, and also includes metadata columns of variable information. The \code{lfcSE} gives the standard error of the \code{log2FoldChange}. For the Wald test, \code{stat} is the Wald statistic: the \code{log2FoldChange} divided by \code{lfcSE}, which is compared to a standard Normal distribution to generate a two-tailed \code{pvalue}. For the likelihood ratio test (LRT), \code{stat} is the difference in deviance between the reduced model and the full model, which is compared to a chi-squared distribution to generate a \code{pvalue}.
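As a quick sanity check, the Wald columns can be recomputed by hand from one another (a minimal sketch, assuming \code{res} is a results table from a Wald-test analysis):
\preformatted{
stat <- res$log2FoldChange / res$lfcSE
pval <- 2 * pnorm(abs(stat), lower.tail=FALSE)
# agrees with res$stat and res$pvalue for rows not set to NA
}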
For \code{resultsNames}: the names of the columns available as results, usually a combination of the variable name and a level For \code{removeResults}: the original \code{DESeqDataSet} with results metadata columns removed } \description{ \code{results} extracts a result table from a DESeq analysis giving base means across samples, log2 fold changes, standard errors, test statistics, p-values and adjusted p-values; \code{resultsNames} returns the names of the estimated effects (coefficients) of the model; \code{removeResults} returns a \code{DESeqDataSet} object with results columns removed. } \details{ The results table when printed will provide the information about the comparison, e.g. "log2 fold change (MAP): condition treated vs untreated", meaning that the estimates are of log2(treated / untreated), as would be returned by \code{contrast=c("condition","treated","untreated")}. Multiple results can be returned for analyses beyond a simple two group comparison, so \code{results} takes arguments \code{contrast} and \code{name} to help the user pick out the comparisons of interest for printing a results table. The use of the \code{contrast} argument is recommended for exact specification of the levels which should be compared and their order. If \code{results} is run without specifying \code{contrast} or \code{name}, it will return the comparison of the last level of the last variable in the design formula over the first level of this variable. For example, for a simple two-group comparison, this would return the log2 fold changes of the second group over the first group (the reference level). Please see examples below and in the vignette. The argument \code{contrast} can be used to generate results tables for any comparison of interest, for example, the log2 fold change between two levels of a factor, and its usage is described below. It can also accommodate more complicated numeric comparisons. The test statistic used for a contrast is: \deqn{ c^t \beta / \sqrt{c^t \Sigma c } }{ c' beta / sqrt( c' Sigma c ) } The argument \code{name} can be used to generate results tables for individual effects, which must be individual elements of \code{resultsNames(object)}. These individual effects could represent continuous covariates, effects for individual levels, or individual interaction effects. Information on the comparison which was used to build the results table, and the statistical test which was used for p-values (Wald test or likelihood ratio test) is stored within the object returned by \code{results}. This information is in the metadata columns of the results table, which is accessible by calling \code{mcols} on the \code{\link{DESeqResults}} object returned by \code{results}. On p-values: By default, independent filtering is performed to select a set of genes for multiple test correction which maximizes the number of adjusted p-values less than a given critical value \code{alpha} (by default 0.1). See the reference in this man page for details on independent filtering. The filter used for maximizing the number of rejections is the mean of normalized counts for all samples in the dataset. Several arguments from the \code{\link[genefilter]{filtered_p}} function of the genefilter package (used within the \code{results} function) are provided here to control the independent filtering behavior.
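The outcome of the filtering can be inspected after the fact (a minimal sketch, assuming \code{res} was produced by \code{results} with default independent filtering and that the chosen threshold is recorded in \code{metadata(res)}):
\preformatted{
metadata(res)$filterThreshold  # quantile and value of the filter statistic
metadata(res)$alpha            # the target FDR used for the optimization
}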
In DESeq2 version >= 1.10, the threshold that is chosen is the lowest quantile of the filter for which the number of rejections is close to the peak of a curve fit to the number of rejections over the filter quantiles. 'Close to' is defined as within 1 residual standard deviation. The adjusted p-values for the genes which do not pass the filter threshold are set to \code{NA}. By default, \code{results} assigns a p-value of \code{NA} to genes containing count outliers, as identified using Cook's distance. See the \code{cooksCutoff} argument for control of this behavior. Cook's distances for each sample are accessible as a matrix "cooks" stored in the \code{assays()} list. This measure is useful for identifying rows where the observed counts might not fit a Negative Binomial distribution. For analyses using the likelihood ratio test (using \code{\link{nbinomLRT}}), the p-values are determined solely by the difference in deviance between the full and reduced model formula. A single log2 fold change is printed in the results table for consistency with other results table outputs, however the test statistic and p-values may nevertheless involve the testing of one or more log2 fold changes. Which log2 fold change is printed in the results table can be controlled using the \code{name} argument, or by default this will be the estimated coefficient for the last element of \code{resultsNames(object)}. } \examples{ ## Example 1: two-group comparison dds <- makeExampleDESeqDataSet(m=4) dds <- DESeq(dds) res <- results(dds, contrast=c("condition","B","A")) # with more than two groups, the call would look similar, e.g.: # results(dds, contrast=c("condition","C","A")) # etc. ## Example 2: two conditions, two genotypes, with an interaction term dds <- makeExampleDESeqDataSet(n=100,m=12) dds$genotype <- factor(rep(rep(c("I","II"),each=3),2)) design(dds) <- ~ genotype + condition + genotype:condition dds <- DESeq(dds) resultsNames(dds) # Note: designs with interaction terms by default have betaPrior=FALSE # the condition effect for genotype I (the main effect) results(dds, contrast=c("condition","B","A")) # the condition effect for genotype II # this is, by definition, the main effect *plus* the interaction term # (the extra condition effect in genotype II compared to genotype I). results(dds, list( c("condition_B_vs_A","genotypeII.conditionB") )) # the interaction term, answering: is the condition effect *different* across genotypes? results(dds, name="genotypeII.conditionB") ## Example 3: two conditions, three genotypes # ~~~ Using interaction terms ~~~ dds <- makeExampleDESeqDataSet(n=100,m=18) dds$genotype <- factor(rep(rep(c("I","II","III"),each=3),2)) design(dds) <- ~ genotype + condition + genotype:condition dds <- DESeq(dds) resultsNames(dds) # the condition effect for genotype I (the main effect) results(dds, contrast=c("condition","B","A")) # the condition effect for genotype III. # this is the main effect *plus* the interaction term # (the extra condition effect in genotype III compared to genotype I). results(dds, contrast=list( c("condition_B_vs_A","genotypeIII.conditionB") )) # the interaction term for condition effect in genotype III vs genotype I. # this tests if the condition effect is different in III compared to I results(dds, name="genotypeIII.conditionB") # the interaction term for condition effect in genotype III vs genotype II.
# this tests if the condition effect is different in III compared to II results(dds, contrast=list("genotypeIII.conditionB", "genotypeII.conditionB")) # Note that a likelihood ratio test could be used to test if there are any # differences in the condition effect between the three genotypes. # ~~~ Using a grouping variable ~~~ # This is a useful construction when users just want to compare # specific groups which are combinations of variables. dds$group <- factor(paste0(dds$genotype, dds$condition)) design(dds) <- ~ group dds <- DESeq(dds) resultsNames(dds) # the condition effect for genotypeIII results(dds, contrast=c("group", "IIIB", "IIIA")) } \references{ Richard Bourgon, Robert Gentleman, Wolfgang Huber: Independent filtering increases detection power for high-throughput experiments. PNAS (2010), \url{http://dx.doi.org/10.1073/pnas.0914005107} } \seealso{ \code{\link{DESeq}}, \code{\link[genefilter]{filtered_R}} }
DESeq2/man/rlog.Rd0000644000175400017540000001421513201671732014635 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/rlog.R \name{rlog} \alias{rlog} \alias{rlogTransformation} \alias{rlogTransformation} \title{Apply a 'regularized log' transformation} \usage{ rlog(object, blind = TRUE, intercept, betaPriorVar, fitType = "parametric") rlogTransformation(object, blind = TRUE, intercept, betaPriorVar, fitType = "parametric") } \arguments{ \item{object}{a DESeqDataSet, or matrix of counts} \item{blind}{logical, whether to blind the transformation to the experimental design. blind=TRUE should be used for comparing samples in a manner unbiased by prior information on samples, for example to perform sample QA (quality assurance). blind=FALSE should be used for transforming data for downstream analysis, where the full use of the design information should be made. blind=FALSE will skip re-estimation of the dispersion trend, if this has already been calculated. If many genes have large differences in counts due to the experimental design, it is important to set blind=FALSE for downstream analysis.} \item{intercept}{by default, this is not provided and calculated automatically. if provided, this should be a vector as long as the number of rows of object, which is log2 of the mean normalized counts from a previous dataset. this will enforce the intercept for the GLM, allowing for a "frozen" rlog transformation based on a previous dataset. You will also need to provide \code{mcols(object)$dispFit}.} \item{betaPriorVar}{a single value, the variance of the prior on the sample betas, which if missing is estimated from the data} \item{fitType}{in case dispersions have not yet been estimated for \code{object}, this parameter is passed on to \code{\link{estimateDispersions}} (options described there).} } \value{ a \code{\link{DESeqTransform}} if a \code{DESeqDataSet} was provided, or a matrix if a count matrix was provided as input. Note that for \code{\link{DESeqTransform}} output, the matrix of transformed values is stored in \code{assay(rld)}. To avoid returning matrices with NA values, in the case of a row of all zeros, the rlog transformation returns zeros (essentially adding a pseudocount of 1 only to these rows). } \description{ This function transforms the count data to the log2 scale in a way which minimizes differences between samples for rows with small counts, and which normalizes with respect to library size.
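The effect is easiest to see next to a plain shifted logarithm (a minimal sketch; \code{normTransform} computes log2 of normalized counts plus a pseudocount):
\preformatted{
dds <- makeExampleDESeqDataSet(m=6)
dds <- estimateSizeFactors(dds)
ntd <- normTransform(dds)  # log2(normalized counts + 1)
rld <- rlog(dds)
# for low-count rows, assay(rld) varies less across samples than assay(ntd)
}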
The rlog transformation produces a similar variance stabilizing effect as \code{\link{varianceStabilizingTransformation}}, though \code{rlog} is more robust in the case when the size factors vary widely. The transformation is useful when checking for outliers or as input for machine learning techniques such as clustering or linear discriminant analysis. \code{rlog} takes as input a \code{\link{DESeqDataSet}} and returns a \code{\link{RangedSummarizedExperiment}} object. } \details{ Note that neither rlog transformation nor the VST are used by the differential expression estimation in \code{\link{DESeq}}, which always occurs on the raw count data, through generalized linear modeling which incorporates knowledge of the variance-mean dependence. The rlog transformation and VST are offered as separate functionality which can be used for visualization, clustering or other machine learning tasks. See the transformation section of the vignette for more details. The transformation does not require that one has already estimated size factors and dispersions. The regularization is on the log fold changes of the count for each sample over an intercept, for each gene. As nearby count values for low count genes are almost as likely as the observed count, the rlog shrinkage is greater for low counts. For high counts, the rlog shrinkage has a much weaker effect. The fitted dispersions are used rather than the MAP dispersions (so similar to the \code{\link{varianceStabilizingTransformation}}). The prior variance for the shrinkage of log fold changes is calculated as follows: a matrix is constructed of the logarithm of the counts plus a pseudocount of 0.5, the log of the row means is then subtracted, leaving an estimate of the log fold changes per sample over the fitted value using only an intercept. The prior variance is then calculated by matching the upper quantiles of the observed log fold change estimates with an upper quantile of the normal distribution. A GLM fit is then calculated using this prior. It is also possible to supply the variance of the prior. See the vignette for an example of the use and a comparison with \code{varianceStabilizingTransformation}. The transformed values, rlog(K), are equal to \eqn{rlog(K_{ij}) = \log_2(q_{ij}) = \beta_{i0} + \beta_{ij}}{rlog(K_ij) = log2(q_ij) = beta_i0 + beta_ij}, with formula terms defined in \code{\link{DESeq}}. The parameters of the rlog transformation from a previous dataset can be frozen and reapplied to new samples. See the 'Data quality assessment' section of the vignette for strategies to see if new samples are sufficiently similar to previous datasets. The frozen rlog is accomplished by saving the dispersion function, beta prior variance and the intercept from a previous dataset, and running \code{rlog} with 'blind' set to FALSE (see example below).
} \examples{ dds <- makeExampleDESeqDataSet(m=6,betaSD=1) rld <- rlog(dds) dists <- dist(t(assay(rld))) plot(hclust(dists)) # run the rlog transformation on one dataset design(dds) <- ~ 1 dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) rld <- rlog(dds, blind=FALSE) # apply the parameters to a new sample ddsNew <- makeExampleDESeqDataSet(m=1) mcols(ddsNew)$dispFit <- mcols(dds)$dispFit betaPriorVar <- attr(rld,"betaPriorVar") intercept <- mcols(rld)$rlogIntercept rldNew <- rlog(ddsNew, blind=FALSE, intercept=intercept, betaPriorVar=betaPriorVar) } \references{ Reference for regularized logarithm (rlog): Michael I Love, Wolfgang Huber, Simon Anders: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 2014, 15:550. \url{http://dx.doi.org/10.1186/s13059-014-0550-8} } \seealso{ \code{\link{plotPCA}}, \code{\link{varianceStabilizingTransformation}}, \code{\link{normTransform}} } DESeq2/man/show.Rd0000644000175400017540000000101113201671732014640 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/methods.R \docType{methods} \name{show} \alias{show} \alias{show,DESeqResults-method} \title{Show method for DESeqResults objects} \usage{ \S4method{show}{DESeqResults}(object) } \arguments{ \item{object}{a DESeqResults object} } \description{ Prints out the information from the metadata columns of the results object regarding the log2 fold changes and p-values, then shows the DataFrame using the standard method. } \author{ Michael Love } DESeq2/man/sizeFactors.Rd0000644000175400017540000000214113201671732016161 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/methods.R \docType{methods} \name{sizeFactors} \alias{sizeFactors} \alias{sizeFactors,DESeqDataSet-method} \alias{sizeFactors<-,DESeqDataSet,numeric-method} \alias{sizeFactors} \title{Accessor functions for the 'sizeFactors' information in a DESeqDataSet object.} \usage{ \S4method{sizeFactors}{DESeqDataSet}(object) \S4method{sizeFactors}{DESeqDataSet,numeric}(object) <- value } \arguments{ \item{object}{a \code{DESeqDataSet} object.} \item{value}{a numeric vector, one size factor for each column in the count data.} } \description{ The sizeFactors vector assigns to each column of the count matrix a value, the size factor, such that count values in the columns can be brought to a common scale by dividing by the corresponding size factor (as performed by \code{counts(dds, normalized=TRUE)}). See \code{\link{DESeq}} for a description of the use of size factors. If gene-specific normalization is desired for each sample, use \code{\link{normalizationFactors}}. } \seealso{ \code{\link{estimateSizeFactors}} } \author{ Simon Anders } DESeq2/man/summary.Rd0000644000175400017540000000143713201671732015371 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/methods.R \docType{methods} \name{summary} \alias{summary} \alias{summary.DESeqResults} \title{Summarize DESeq results} \usage{ \method{summary}{DESeqResults}(object, alpha, \dots) } \arguments{ \item{object}{a \code{\link{DESeqResults}} object} \item{alpha}{the adjusted p-value cutoff. If not set, this defaults to the \code{alpha} argument which was used in \code{\link{results}} to set the target FDR for independent filtering, or if independent filtering was not performed, to 0.1.} \item{...}{additional arguments} } \description{ Print a summary of the results from a DESeq analysis. 
} \examples{ dds <- makeExampleDESeqDataSet(m=4) dds <- DESeq(dds) res <- results(dds) summary(res) } \author{ Michael Love }
DESeq2/man/unmix.Rd0000644000175400017540000000315313201671732015031 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/helper.R \name{unmix} \alias{unmix} \title{Unmix samples using loss in a variance stabilized space} \usage{ unmix(x, pure, alpha, shift, loss = 1, quiet = FALSE) } \arguments{ \item{x}{normalized counts or TPMs of the samples to be unmixed} \item{pure}{normalized counts or TPMs of the "pure" samples} \item{alpha}{for normalized counts, the dispersion of the data when a negative binomial model is fit. this can be found by examining the asymptotic value of \code{dispersionFunction(dds)}, when using \code{fitType="parametric"} or the mean value when using \code{fitType="mean"}.} \item{shift}{for TPMs, the shift which approximately stabilizes the variance of log shifted TPMs. Can be assessed with \code{vsn::meanSdPlot}.} \item{loss}{either 1 (for L1) or 2 (for squared) loss function. Default is 1.} \item{quiet}{suppress progress bar. Default is FALSE, show progress bar if pbapply is installed.} } \value{ mixture components for each sample (rows), which sum to 1. } \description{ Unmixes samples in \code{x} according to \code{pure} components, using numerical optimization. The components in \code{pure} are added on the scale of gene expression (either normalized counts, or TPMs). The loss function when comparing fitted expression to the samples in \code{x} occurs in a variance stabilized space. This task is sometimes referred to as "deconvolution", and can be used, for example, to identify contributions from various tissues. Note: if the \code{pbapply} package is installed a progress bar will be displayed while mixing components are fit. }
DESeq2/man/varianceStabilizingTransformation.Rd0000644000175400017540000001435313201671732022614 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/vst.R \name{varianceStabilizingTransformation} \alias{varianceStabilizingTransformation} \alias{getVarianceStabilizedData} \alias{getVarianceStabilizedData} \title{Apply a variance stabilizing transformation (VST) to the count data} \usage{ varianceStabilizingTransformation(object, blind = TRUE, fitType = "parametric") getVarianceStabilizedData(object) } \arguments{ \item{object}{a DESeqDataSet or matrix of counts} \item{blind}{logical, whether to blind the transformation to the experimental design. blind=TRUE should be used for comparing samples in a manner unbiased by prior information on samples, for example to perform sample QA (quality assurance). blind=FALSE should be used for transforming data for downstream analysis, where the full use of the design information should be made. blind=FALSE will skip re-estimation of the dispersion trend, if this has already been calculated. If many genes have large differences in counts due to the experimental design, it is important to set blind=FALSE for downstream analysis.} \item{fitType}{in case dispersions have not yet been estimated for \code{object}, this parameter is passed on to \code{\link{estimateDispersions}} (options described there).} } \value{ \code{varianceStabilizingTransformation} returns a \code{\link{DESeqTransform}} if a \code{DESeqDataSet} was provided, or returns a matrix if a count matrix was provided.
Note that for \code{\link{DESeqTransform}} output, the matrix of transformed values is stored in \code{assay(vsd)}. \code{getVarianceStabilizedData} also returns a matrix. } \description{ This function calculates a variance stabilizing transformation (VST) from the fitted dispersion-mean relation(s) and then transforms the count data (normalized by division by the size factors or normalization factors), yielding a matrix of values which are now approximately homoskedastic (having constant variance along the range of mean values). The transformation also normalizes with respect to library size. The \code{\link{rlog}} is less sensitive to size factors, which can be an issue when size factors vary widely. These transformations are useful when checking for outliers or as input for machine learning techniques such as clustering or linear discriminant analysis. } \details{ For each sample (i.e., column of \code{counts(dds)}), the full variance function is calculated from the raw variance (by scaling according to the size factor and adding the shot noise). We recommend a blind estimation of the variance function, i.e., one ignoring conditions. This is performed by default, and can be modified using the 'blind' argument. Note that neither rlog transformation nor the VST are used by the differential expression estimation in \code{\link{DESeq}}, which always occurs on the raw count data, through generalized linear modeling which incorporates knowledge of the variance-mean dependence. The rlog transformation and VST are offered as separate functionality which can be used for visualization, clustering or other machine learning tasks. See the transformation section of the vignette for more details. The transformation does not require that one has already estimated size factors and dispersions. A typical workflow is shown in Section \emph{Variance stabilizing transformation} in the package vignette. If \code{\link{estimateDispersions}} was called with: \code{fitType="parametric"}, a closed-form expression for the variance stabilizing transformation is used on the normalized count data. The expression can be found in the file \file{vst.pdf} which is distributed with the vignette. \code{fitType="local"}, the reciprocal of the square root of the variance of the normalized counts, as derived from the dispersion fit, is then numerically integrated, and the integral (approximated by a spline function) is evaluated for each count value in the column, yielding a transformed value. \code{fitType="mean"}, a VST is applied for Negative Binomial distributed counts, 'k', with a fixed dispersion, 'a': ( 2 asinh(sqrt(a k)) - log(a) - log(4) )/log(2). In all cases, the transformation is scaled such that for large counts, it becomes asymptotically (for large values) equal to the logarithm to base 2 of normalized counts. The variance stabilizing transformation from a previous dataset can be frozen and reapplied to new samples. See the 'Data quality assessment' section of the vignette for strategies to see if new samples are sufficiently similar to previous datasets. The frozen VST is accomplished by saving the dispersion function accessible with \code{\link{dispersionFunction}}, assigning this to the \code{DESeqDataSet} with the new samples, and running varianceStabilizingTransformation with 'blind' set to FALSE (see example below). Then the dispersion function from the previous dataset will be used to transform the new sample(s). 
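The closed form quoted above for \code{fitType="mean"} can be checked numerically (a minimal sketch; \code{vstMean} is a hypothetical helper, not part of the package):
\preformatted{
vstMean <- function(k, a) (2 * asinh(sqrt(a * k)) - log(a) - log(4)) / log(2)
vstMean(1e4, a=0.1)  # ~13.29, close to log2(1e4) = 13.2877 for large counts
}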
Limitations: In order to preserve normalization, the same transformation has to be used for all samples. This results in the variance stabilization being only approximate. The more the size factors differ, the more residual dependence of the variance on the mean will be found in the transformed data. \code{\link{rlog}} is a transformation which can perform better in these cases. As shown in the vignette, the function \code{meanSdPlot} from the package \pkg{vsn} can be used to see whether this is a problem. } \examples{ dds <- makeExampleDESeqDataSet(m=6) vsd <- varianceStabilizingTransformation(dds) dists <- dist(t(assay(vsd))) plot(hclust(dists)) # learn the dispersion function of a dataset design(dds) <- ~ 1 dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) # use the previous dispersion function for a new sample ddsNew <- makeExampleDESeqDataSet(m=1) ddsNew <- estimateSizeFactors(ddsNew) dispersionFunction(ddsNew) <- dispersionFunction(dds) vsdNew <- varianceStabilizingTransformation(ddsNew, blind=FALSE) } \references{ Reference for the variance stabilizing transformation for counts with a dispersion trend: Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 2010, 11:106. \url{http://dx.doi.org/10.1186/gb-2010-11-10-r106} } \seealso{ \code{\link{plotPCA}}, \code{\link{rlog}}, \code{\link{normTransform}} } \author{ Simon Anders }
DESeq2/man/vst.Rd0000644000175400017540000000321413201671732014503 0ustar00biocbuildbiocbuild% Generated by roxygen2: do not edit by hand % Please edit documentation in R/vst.R \name{vst} \alias{vst} \title{Quickly estimate dispersion trend and apply a variance stabilizing transformation} \usage{ vst(object, blind = TRUE, nsub = 1000, fitType = "parametric") } \arguments{ \item{object}{a DESeqDataSet or a matrix of counts} \item{blind}{logical, whether to blind the transformation to the experimental design (see \code{\link{varianceStabilizingTransformation}})} \item{nsub}{the number of genes to subset to (default 1000)} \item{fitType}{for estimation of dispersions: this parameter is passed on to \code{\link{estimateDispersions}} (options described there)} } \value{ a DESeqTransform object or a matrix of transformed, normalized counts } \description{ This is a wrapper for the \code{\link{varianceStabilizingTransformation}} (VST) that provides much faster estimation of the dispersion trend used to determine the formula for the VST. The speed-up is accomplished by subsetting to a smaller number of genes in order to estimate this dispersion trend. The subset of genes is chosen deterministically, to span the range of genes' mean normalized count. This wrapper for the VST is not blind to the experimental design: the sample covariate information is used to estimate the global trend of genes' dispersion values over the genes' mean normalized count. It can be made strictly blind to experimental design by first assigning a \code{\link{design}} of \code{~1} before running this function, or by avoiding subsetting and using \code{\link{varianceStabilizingTransformation}}. } \examples{ dds <- makeExampleDESeqDataSet(n=20000, m=20) vsd <- vst(dds) }
DESeq2/src/0000755000175400017540000000000013201712502013403 5ustar00biocbuildbiocbuildDESeq2/src/DESeq2.cpp0000644000175400017540000004537613201712502015151 0ustar00biocbuildbiocbuild/* * DESeq2 C++ functions * * Author: Michael I.
Love * Last modified: February 22, 2017 * License: LGPL (>= 3) * * Note: The canonical, up-to-date DESeq2.cpp lives in * the DESeq2 library, the development branch of which * can be viewed here: * * https://github.com/Bioconductor-mirror/DESeq2/blob/master/src/DESeq2.cpp */ // include RcppArmadillo and Rcpp #include "RcppArmadillo.h" using namespace Rcpp; // [[Rcpp::depends(RcppArmadillo)]] // user includes #include #include #include // this function returns the log posterior of dispersion parameter alpha, for negative binomial variables // given the counts y, the expected means mu, the design matrix x (used for calculating the Cox-Reid adjustment), // and the parameters for the normal prior on log alpha double log_posterior(double log_alpha, Rcpp::NumericMatrix::Row y, Rcpp::NumericMatrix::Row mu, arma::mat x, double log_alpha_prior_mean, double log_alpha_prior_sigmasq, bool usePrior, Rcpp::NumericMatrix::Row weights, bool useWeights) { double prior_part; double alpha = exp(log_alpha); Rcpp::NumericVector w_diag = pow(pow(mu, -1) + alpha, -1); arma::mat w = arma::diagmat(Rcpp::as<arma::vec>(w_diag)); arma::mat b = x.t() * w * x; double cr_term = -0.5 * log(det(b)); double alpha_neg1 = R_pow_di(alpha, -1); double ll_part; if (useWeights) { ll_part = sum(weights * (lgamma(y + alpha_neg1) - Rf_lgammafn(alpha_neg1) - y * log(mu + alpha_neg1) - alpha_neg1 * log(1.0 + mu * alpha))); } else { ll_part = sum(lgamma(y + alpha_neg1) - Rf_lgammafn(alpha_neg1) - y * log(mu + alpha_neg1) - alpha_neg1 * log(1.0 + mu * alpha)); } if (usePrior) { prior_part = -0.5 * R_pow_di(log_alpha - log_alpha_prior_mean,2)/log_alpha_prior_sigmasq; } else { prior_part = 0.0; } double res = ll_part + prior_part + cr_term; return(res); } // this function returns the derivative of the log posterior with respect to the log of the // dispersion parameter alpha, given the same inputs as the previous function double dlog_posterior(double log_alpha, Rcpp::NumericMatrix::Row y, Rcpp::NumericMatrix::Row mu, arma::mat x, double log_alpha_prior_mean, double log_alpha_prior_sigmasq, bool usePrior, Rcpp::NumericMatrix::Row weights, bool useWeights) { double prior_part; double alpha = exp(log_alpha); Rcpp::NumericVector w_diag = pow(pow(mu, -1) + alpha, -1); arma::mat w = arma::diagmat(Rcpp::as<arma::vec>(w_diag)); Rcpp::NumericVector dw_diag = -1.0 * pow(pow(mu, -1) + alpha, -2); arma::mat dw = arma::diagmat(Rcpp::as<arma::vec>(dw_diag)); arma::mat b = x.t() * w * x; arma::mat db = x.t() * dw * x; double ddetb = ( det(b) * trace(b.i() * db) ); double cr_term = -0.5 * ddetb / det(b); double alpha_neg1 = R_pow_di(alpha, -1); double alpha_neg2 = R_pow_di(alpha, -2); double ll_part; if (useWeights) { ll_part = alpha_neg2 * sum(weights * (Rf_digamma(alpha_neg1) + log(1 + mu*alpha) - mu*alpha*pow(1.0 + mu*alpha, -1) - digamma(y + alpha_neg1) + y * pow(mu + alpha_neg1, -1))); } else { ll_part = alpha_neg2 * sum(Rf_digamma(alpha_neg1) + log(1 + mu*alpha) - mu*alpha*pow(1.0 + mu*alpha, -1) - digamma(y + alpha_neg1) + y * pow(mu + alpha_neg1, -1)); } // only the prior part is w.r.t log alpha if (usePrior) { prior_part = -1.0 * (log_alpha - log_alpha_prior_mean)/log_alpha_prior_sigmasq; } else { prior_part = 0.0; } // Note: return dlog_post/dalpha * alpha because we take derivatives w.r.t log alpha double res = (ll_part + cr_term) * alpha + prior_part; return(res); } // this function returns the second derivative of the log posterior with respect to the log of the // dispersion parameter alpha, given the same inputs as the previous function double d2log_posterior(double
log_alpha, Rcpp::NumericMatrix::Row y, Rcpp::NumericMatrix::Row mu, arma::mat x, double log_alpha_prior_mean, double log_alpha_prior_sigmasq, bool usePrior, Rcpp::NumericMatrix::Row weights, bool useWeights) { double prior_part; double alpha = exp(log_alpha); Rcpp::NumericVector w_diag = pow(pow(mu, -1) + alpha, -1); arma::mat w = arma::diagmat(as<arma::vec>(w_diag)); Rcpp::NumericVector dw_diag = -1 * pow(pow(mu, -1) + alpha, -2); arma::mat dw = arma::diagmat(as<arma::vec>(dw_diag)); Rcpp::NumericVector d2w_diag = 2 * pow(pow(mu, -1) + alpha, -3); arma::mat d2w = arma::diagmat(as<arma::vec>(d2w_diag)); arma::mat b = x.t() * w * x; arma::mat b_i = b.i(); arma::mat db = x.t() * dw * x; arma::mat d2b = x.t() * d2w * x; double ddetb = ( det(b) * trace(b.i() * db) ); double d2detb = ( det(b) * (R_pow_di(trace(b_i * db), 2) - trace(b_i * db * b_i * db) + trace(b_i * d2b)) ); double cr_term = 0.5 * R_pow_di(ddetb/det(b), 2) - 0.5 * d2detb / det(b); double alpha_neg1 = R_pow_di(alpha, -1); double alpha_neg2 = R_pow_di(alpha, -2); double ll_part; if (useWeights) { ll_part = -2 * R_pow_di(alpha, -3) * sum(weights * (Rf_digamma(alpha_neg1) + log(1 + mu*alpha) - mu*alpha*pow(1 + mu*alpha, -1) - digamma(y + alpha_neg1) + y * pow(mu + alpha_neg1, -1))) + alpha_neg2 * sum(weights * (-1 * alpha_neg2 * Rf_trigamma(alpha_neg1) + pow(mu, 2) * alpha * pow(1 + mu*alpha, -2) + alpha_neg2 * trigamma(y + alpha_neg1) + alpha_neg2 * y * pow(mu + alpha_neg1, -2))); } else { ll_part = -2 * R_pow_di(alpha, -3) * sum(Rf_digamma(alpha_neg1) + log(1 + mu*alpha) - mu*alpha*pow(1 + mu*alpha, -1) - digamma(y + alpha_neg1) + y * pow(mu + alpha_neg1, -1)) + alpha_neg2 * sum(-1 * alpha_neg2 * Rf_trigamma(alpha_neg1) + pow(mu, 2) * alpha * pow(1 + mu*alpha, -2) + alpha_neg2 * trigamma(y + alpha_neg1) + alpha_neg2 * y * pow(mu + alpha_neg1, -2)); } // only the prior part is w.r.t log alpha if (usePrior) { prior_part = -1.0/log_alpha_prior_sigmasq; } else { prior_part = 0.0; } // Note: return (d2log_post/dalpha2 * alpha^2 + dlog_post/dalpha * alpha) // = (d2log_post/dalpha2 * alpha^2 + dlog_post/dlogalpha) // because we take derivatives w.r.t log alpha double res = ((ll_part + cr_term) * R_pow_di(alpha, 2) + dlog_posterior(log_alpha, y, mu, x, log_alpha_prior_mean, log_alpha_prior_sigmasq, false, weights, useWeights)) + prior_part; return(res); } // Obtain the MLE or MAP dispersion estimate using line search.
// fitting occurs on the scale of log(alpha) // // [[Rcpp::export]] Rcpp::List fitDisp(SEXP ySEXP, SEXP xSEXP, SEXP mu_hatSEXP, SEXP log_alphaSEXP, SEXP log_alpha_prior_meanSEXP, SEXP log_alpha_prior_sigmasqSEXP, SEXP min_log_alphaSEXP, SEXP kappa_0SEXP, SEXP tolSEXP, SEXP maxitSEXP, SEXP usePriorSEXP, SEXP weightsSEXP, SEXP useWeightsSEXP) { Rcpp::NumericMatrix y(ySEXP); arma::mat x = Rcpp::as<arma::mat>(xSEXP); int y_n = y.nrow(); Rcpp::NumericVector log_alpha(clone(log_alphaSEXP)); Rcpp::NumericMatrix mu_hat(mu_hatSEXP); Rcpp::NumericVector log_alpha_prior_mean(log_alpha_prior_meanSEXP); double log_alpha_prior_sigmasq = Rcpp::as<double>(log_alpha_prior_sigmasqSEXP); double min_log_alpha = Rcpp::as<double>(min_log_alphaSEXP); double kappa_0 = Rcpp::as<double>(kappa_0SEXP); int maxit = Rcpp::as<int>(maxitSEXP); double epsilon = 1.0e-4; double a, a_propose, kappa, lp, lpnew, dlp, theta_kappa, theta_hat_kappa, change; // record log posterior values Rcpp::NumericVector initial_lp(y_n); Rcpp::NumericVector initial_dlp(y_n); Rcpp::NumericVector last_lp(y_n); Rcpp::NumericVector last_dlp(y_n); Rcpp::NumericVector last_d2lp(y_n); Rcpp::NumericVector last_change(y_n); Rcpp::IntegerVector iter(y_n); Rcpp::IntegerVector iter_accept(y_n); double tol = Rcpp::as<double>(tolSEXP); bool usePrior = Rcpp::as<bool>(usePriorSEXP); // observation weights Rcpp::NumericMatrix weights(weightsSEXP); bool useWeights = Rcpp::as<bool>(useWeightsSEXP); for (int i = 0; i < y_n; i++) { Rcpp::checkUserInterrupt(); Rcpp::NumericMatrix::Row yrow = y(i,_); Rcpp::NumericMatrix::Row mu_hat_row = mu_hat(i,_); // maximize the log likelihood over the variable a, the log of alpha, the dispersion parameter. // in order to express the optimization in a typical manner, // for calculating theta(kappa) we multiply the log likelihood by -1 and seek a minimum a = log_alpha(i); // we use a line search based on the Armijo rule. // define a function theta(kappa) = f(a + kappa * d), where d is the search direction.
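// (the Armijo rule, as applied below, accepts the proposed step kappa when
// theta(kappa) <= theta(0) - epsilon * kappa * dlp^2, i.e. when the negated log
// posterior decreases by at least a fixed fraction of the first-order prediction)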
// in this case the search direction is taken by the first derivative of the log likelihood lp = log_posterior(a, yrow, mu_hat_row, x, log_alpha_prior_mean(i), log_alpha_prior_sigmasq, usePrior, weights.row(i), useWeights); dlp = dlog_posterior(a, yrow, mu_hat_row, x, log_alpha_prior_mean(i), log_alpha_prior_sigmasq, usePrior, weights.row(i), useWeights); kappa = kappa_0; initial_lp(i) = lp; initial_dlp(i) = dlp; change = -1.0; last_change(i) = -1.0; for (int t = 0; t < maxit; t++) { // iter counts the number of steps taken out of maxit; iter(i)++; a_propose = a + kappa * dlp; // note: lgamma is unstable for values around 1e17, where there is a switch in lgamma.c // we limit log alpha from going lower than -30 if (a_propose < -30.0) { kappa = (-30.0 - a)/dlp; } // note: we limit log alpha from going higher than 10 if (a_propose > 10.0) { kappa = (10.0 - a)/dlp; } theta_kappa = -1.0 * log_posterior(a + kappa*dlp, yrow, mu_hat_row, x, log_alpha_prior_mean(i), log_alpha_prior_sigmasq, usePrior, weights.row(i), useWeights); theta_hat_kappa = -1.0 * lp - kappa * epsilon * R_pow_di(dlp, 2); // if this inequality is true, we have satisfied the Armijo rule and // accept the step size kappa, otherwise we halve kappa if (theta_kappa <= theta_hat_kappa) { // iter_accept counts the number of accepted proposals; iter_accept(i)++; a = a + kappa * dlp; lpnew = log_posterior(a, yrow, mu_hat_row, x, log_alpha_prior_mean(i), log_alpha_prior_sigmasq, usePrior, weights.row(i), useWeights); // look for change in log likelihood change = lpnew - lp; if (change < tol) { lp = lpnew; break; } // if log(alpha) is going to -infinity // break the loop if (a < min_log_alpha) { break; } lp = lpnew; dlp = dlog_posterior(a, yrow, mu_hat_row, x, log_alpha_prior_mean(i), log_alpha_prior_sigmasq, usePrior, weights.row(i), useWeights); // instead of resetting kappa to kappa_0 // multiply kappa by 1.1 kappa = fmin(kappa * 1.1, kappa_0); // every 5 accepts, halve kappa // to prevent slow convergence // due to overshooting if (iter_accept(i) % 5 == 0) { kappa = kappa / 2.0; } } else { kappa = kappa / 2.0; } } last_lp(i) = lp; last_dlp(i) = dlp; last_d2lp(i) = d2log_posterior(a, yrow, mu_hat_row, x, log_alpha_prior_mean(i), log_alpha_prior_sigmasq, usePrior, weights.row(i), useWeights); log_alpha(i) = a; // last change indicates the change for the final iteration last_change(i) = change; } return Rcpp::List::create(Rcpp::Named("log_alpha",log_alpha), Rcpp::Named("iter",iter), Rcpp::Named("iter_accept",iter_accept), Rcpp::Named("last_change",last_change), Rcpp::Named("initial_lp",initial_lp), Rcpp::Named("initial_dlp",initial_dlp), Rcpp::Named("last_lp",last_lp), Rcpp::Named("last_dlp",last_dlp), Rcpp::Named("last_d2lp",last_d2lp)); } // fit the Negative Binomial GLM.
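// The GLM is fit by iteratively reweighted least squares (IRLS): each iteration
// recomputes the weights w = mu / (1 + alpha * mu) and the working response
// z = log(mu / nf) + (y - mu) / mu, then solves the ridge-penalized normal
// equations (X' W X + ridge) beta = X' W z, either through a QR decomposition
// (useQR) or by direct matrix inversion.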
// note: the betas are on the natural log scale // // [[Rcpp::export]] Rcpp::List fitBeta(SEXP ySEXP, SEXP xSEXP, SEXP nfSEXP, SEXP alpha_hatSEXP, SEXP contrastSEXP, SEXP beta_matSEXP, SEXP lambdaSEXP, SEXP weightsSEXP, SEXP useWeightsSEXP, SEXP tolSEXP, SEXP maxitSEXP, SEXP useQRSEXP) { arma::mat y = Rcpp::as<arma::mat>(ySEXP); arma::mat nf = Rcpp::as<arma::mat>(nfSEXP); arma::mat x = Rcpp::as<arma::mat>(xSEXP); int y_n = y.n_rows; int y_m = y.n_cols; int x_p = x.n_cols; arma::vec alpha_hat = Rcpp::as<arma::vec>(alpha_hatSEXP); arma::mat beta_mat = Rcpp::as<arma::mat>(beta_matSEXP); arma::mat beta_var_mat = arma::zeros(beta_mat.n_rows, beta_mat.n_cols); arma::mat contrast_num = arma::zeros(beta_mat.n_rows, 1); arma::mat contrast_denom = arma::zeros(beta_mat.n_rows, 1); arma::mat hat_matrix = arma::zeros(x.n_rows, x.n_rows); arma::mat hat_diagonals = arma::zeros(y.n_rows, y.n_cols); arma::colvec lambda = Rcpp::as<arma::colvec>(lambdaSEXP); arma::colvec contrast = Rcpp::as<arma::colvec>(contrastSEXP); int maxit = Rcpp::as<int>(maxitSEXP); arma::colvec yrow, nfrow, beta_hat, mu_hat, z; arma::mat w, ridge, sigma; // observation weights arma::mat weights = Rcpp::as<arma::mat>(weightsSEXP); bool useWeights = Rcpp::as<bool>(useWeightsSEXP); // vars for QR bool useQR = Rcpp::as<bool>(useQRSEXP); arma::colvec gamma_hat, big_z; arma::vec big_w_diag; arma::mat weighted_x_ridge, q, r, big_w; // deviance, convergence and tolerance double dev, dev_old, conv_test; double tol = Rcpp::as<double>(tolSEXP); double large = 30.0; Rcpp::NumericVector iter(y_n); Rcpp::NumericVector deviance(y_n); // bound the estimated count, as weights include 1/mu double minmu = 0.5; for (int i = 0; i < y_n; i++) { Rcpp::checkUserInterrupt(); nfrow = nf.row(i).t(); yrow = y.row(i).t(); beta_hat = beta_mat.row(i).t(); mu_hat = nfrow % exp(x * beta_hat); for (int j = 0; j < y_m; j++) { mu_hat(j) = fmax(mu_hat(j), minmu); } ridge = diagmat(lambda); dev = 0.0; dev_old = 0.0; if (useQR) { // make an orthonormal design matrix including // the ridge penalty for (int t = 0; t < maxit; t++) { iter(i)++; if (useWeights) { w = diagmat(weights.row(i).t() % mu_hat/(1.0 + alpha_hat(i) * mu_hat)); } else { w = diagmat(mu_hat/(1.0 + alpha_hat(i) * mu_hat)); } // prepare matrices weighted_x_ridge = join_cols(sqrt(w) * x, sqrt(ridge)); qr(q, r, weighted_x_ridge); big_w_diag = arma::ones(y_m + x_p); big_w_diag(arma::span(0, y_m - 1)) = diagvec(w); big_w = diagmat(big_w_diag); big_z = arma::zeros(y_m + x_p); z = arma::log(mu_hat / nfrow) + (yrow - mu_hat) / mu_hat; big_z(arma::span(0,y_m - 1)) = z; // IRLS with Q matrix for X gamma_hat = q.t() * sqrt(big_w) * big_z; solve(beta_hat, r, gamma_hat); if (sum(abs(beta_hat) > large) > 0) { iter(i) = maxit; break; } mu_hat = nfrow % exp(x * beta_hat); for (int j = 0; j < y_m; j++) { mu_hat(j) = fmax(mu_hat(j), minmu); } dev = 0.0; for (int j = 0; j < y_m; j++) { // note the order for Rf_dnbinom_mu: x, sz, mu, lg if (useWeights) { dev = dev + -2.0 * weights(i,j) * Rf_dnbinom_mu(yrow(j), 1.0/alpha_hat(i), mu_hat(j), 1); } else { dev = dev + -2.0 * Rf_dnbinom_mu(yrow(j), 1.0/alpha_hat(i), mu_hat(j), 1); } } conv_test = fabs(dev - dev_old)/(fabs(dev) + 0.1); if (std::isnan(conv_test)) { iter(i) = maxit; break; } if ((t > 0) & (conv_test < tol)) { break; } dev_old = dev; } } else { // use the standard design matrix x // and matrix inversion for (int t = 0; t < maxit; t++) { iter(i)++; if (useWeights) { w = diagmat(weights.row(i).t() % mu_hat/(1.0 + alpha_hat(i) * mu_hat)); } else { w = diagmat(mu_hat/(1.0 + alpha_hat(i) * mu_hat)); } z = arma::log(mu_hat / nfrow) + (yrow - mu_hat) / mu_hat; solve(beta_hat, x.t() * w * x +
ridge, x.t() * w * z); if (sum(abs(beta_hat) > large) > 0) { iter(i) = maxit; break; } mu_hat = nfrow % exp(x * beta_hat); for (int j = 0; j < y_m; j++) { mu_hat(j) = fmax(mu_hat(j), minmu); } dev = 0.0; for (int j = 0; j < y_m; j++) { // note the order for Rf_dnbinom_mu: x, sz, mu, lg if (useWeights) { dev = dev + -2.0 * weights(i,j) * Rf_dnbinom_mu(yrow(j), 1.0/alpha_hat(i), mu_hat(j), 1); } else { dev = dev + -2.0 * Rf_dnbinom_mu(yrow(j), 1.0/alpha_hat(i), mu_hat(j), 1); } } conv_test = fabs(dev - dev_old)/(fabs(dev) + 0.1); if (std::isnan(conv_test)) { iter(i) = maxit; break; } if ((t > 0) & (conv_test < tol)) { break; } dev_old = dev; } } deviance(i) = dev; beta_mat.row(i) = beta_hat.t(); // recalculate w so that this is identical if we start with beta_hat if (useWeights) { w = diagmat(weights.row(i).t() % mu_hat/(1.0 + alpha_hat(i) * mu_hat)); } else { w = diagmat(mu_hat/(1.0 + alpha_hat(i) * mu_hat)); } hat_matrix = sqrt(w) * x * (x.t() * w * x + ridge).i() * x.t() * sqrt(w); hat_diagonals.row(i) = diagvec(hat_matrix).t(); // sigma is the covariance matrix for the betas sigma = (x.t() * w * x + ridge).i() * x.t() * w * x * (x.t() * w * x + ridge).i(); contrast_num.row(i) = contrast.t() * beta_hat; contrast_denom.row(i) = sqrt(contrast.t() * sigma * contrast); beta_var_mat.row(i) = diagvec(sigma).t(); } return Rcpp::List::create(Rcpp::Named("beta_mat",beta_mat), Rcpp::Named("beta_var_mat",beta_var_mat), Rcpp::Named("iter",iter), Rcpp::Named("hat_diagonals",hat_diagonals), Rcpp::Named("contrast_num",contrast_num), Rcpp::Named("contrast_denom",contrast_denom), Rcpp::Named("deviance",deviance)); } // [[Rcpp::export]] Rcpp::List fitDispGrid(SEXP ySEXP, SEXP xSEXP, SEXP mu_hatSEXP, SEXP disp_gridSEXP, SEXP log_alpha_prior_meanSEXP, SEXP log_alpha_prior_sigmasqSEXP, SEXP usePriorSEXP, SEXP weightsSEXP, SEXP useWeightsSEXP) { Rcpp::NumericMatrix y(ySEXP); arma::mat x = Rcpp::as<arma::mat>(xSEXP); int y_n = y.nrow(); Rcpp::NumericMatrix mu_hat(mu_hatSEXP); arma::vec disp_grid = Rcpp::as<arma::vec>(disp_gridSEXP); int disp_grid_n = disp_grid.n_elem; Rcpp::NumericVector log_alpha_prior_mean(log_alpha_prior_meanSEXP); double log_alpha_prior_sigmasq = Rcpp::as<double>(log_alpha_prior_sigmasqSEXP); bool usePrior = Rcpp::as<bool>(usePriorSEXP); double a; double delta = disp_grid(1) - disp_grid(0); double a_hat; arma::vec disp_grid_fine; arma::vec logpostvec = arma::zeros(disp_grid.n_elem); arma::vec log_alpha = arma::zeros(y_n); arma::uword idxmax; // observation weights Rcpp::NumericMatrix weights(weightsSEXP); bool useWeights = Rcpp::as<bool>(useWeightsSEXP); for (int i = 0; i < y_n; i++) { Rcpp::checkUserInterrupt(); Rcpp::NumericMatrix::Row yrow = y(i,_); Rcpp::NumericMatrix::Row mu_hat_row = mu_hat(i,_); for (int t = 0; t < disp_grid_n; t++) { // maximize the log likelihood over the variable a, the log of alpha, the dispersion parameter a = disp_grid(t); logpostvec(t) = log_posterior(a, yrow, mu_hat_row, x, log_alpha_prior_mean(i), log_alpha_prior_sigmasq, usePrior, weights.row(i), useWeights); } logpostvec.max(idxmax); a_hat = disp_grid(idxmax); disp_grid_fine = arma::linspace(a_hat - delta, a_hat + delta, disp_grid_n); for (int t = 0; t < disp_grid_n; t++) { a = disp_grid_fine(t); logpostvec(t) = log_posterior(a, yrow, mu_hat_row, x, log_alpha_prior_mean(i), log_alpha_prior_sigmasq, usePrior, weights.row(i), useWeights); } logpostvec.max(idxmax); log_alpha(i) = disp_grid_fine(idxmax); } return Rcpp::List::create(Rcpp::Named("log_alpha",log_alpha)); }
DESeq2/src/Makevars0000644000175400017540000000006013201712502015073
0ustar00biocbuildbiocbuildPKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)
DESeq2/src/Makevars.win0000644000175400017540000000006113201712502015670 0ustar00biocbuildbiocbuild PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)
DESeq2/src/RcppExports.cpp0000644000175400017540000001337113201712502016405 0ustar00biocbuildbiocbuild// Generated by using Rcpp::compileAttributes() -> do not edit by hand // Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 #include <RcppArmadillo.h> #include <Rcpp.h> using namespace Rcpp; // fitDisp Rcpp::List fitDisp(SEXP ySEXP, SEXP xSEXP, SEXP mu_hatSEXP, SEXP log_alphaSEXP, SEXP log_alpha_prior_meanSEXP, SEXP log_alpha_prior_sigmasqSEXP, SEXP min_log_alphaSEXP, SEXP kappa_0SEXP, SEXP tolSEXP, SEXP maxitSEXP, SEXP usePriorSEXP, SEXP weightsSEXP, SEXP useWeightsSEXP); RcppExport SEXP _DESeq2_fitDisp(SEXP ySEXPSEXP, SEXP xSEXPSEXP, SEXP mu_hatSEXPSEXP, SEXP log_alphaSEXPSEXP, SEXP log_alpha_prior_meanSEXPSEXP, SEXP log_alpha_prior_sigmasqSEXPSEXP, SEXP min_log_alphaSEXPSEXP, SEXP kappa_0SEXPSEXP, SEXP tolSEXPSEXP, SEXP maxitSEXPSEXP, SEXP usePriorSEXPSEXP, SEXP weightsSEXPSEXP, SEXP useWeightsSEXPSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< SEXP >::type ySEXP(ySEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type xSEXP(xSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type mu_hatSEXP(mu_hatSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type log_alphaSEXP(log_alphaSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type log_alpha_prior_meanSEXP(log_alpha_prior_meanSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type log_alpha_prior_sigmasqSEXP(log_alpha_prior_sigmasqSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type min_log_alphaSEXP(min_log_alphaSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type kappa_0SEXP(kappa_0SEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type tolSEXP(tolSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type maxitSEXP(maxitSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type usePriorSEXP(usePriorSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type weightsSEXP(weightsSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type useWeightsSEXP(useWeightsSEXPSEXP); rcpp_result_gen = Rcpp::wrap(fitDisp(ySEXP, xSEXP, mu_hatSEXP, log_alphaSEXP, log_alpha_prior_meanSEXP, log_alpha_prior_sigmasqSEXP, min_log_alphaSEXP, kappa_0SEXP, tolSEXP, maxitSEXP, usePriorSEXP, weightsSEXP, useWeightsSEXP)); return rcpp_result_gen; END_RCPP } // fitBeta Rcpp::List fitBeta(SEXP ySEXP, SEXP xSEXP, SEXP nfSEXP, SEXP alpha_hatSEXP, SEXP contrastSEXP, SEXP beta_matSEXP, SEXP lambdaSEXP, SEXP weightsSEXP, SEXP useWeightsSEXP, SEXP tolSEXP, SEXP maxitSEXP, SEXP useQRSEXP); RcppExport SEXP _DESeq2_fitBeta(SEXP ySEXPSEXP, SEXP xSEXPSEXP, SEXP nfSEXPSEXP, SEXP alpha_hatSEXPSEXP, SEXP contrastSEXPSEXP, SEXP beta_matSEXPSEXP, SEXP lambdaSEXPSEXP, SEXP weightsSEXPSEXP, SEXP useWeightsSEXPSEXP, SEXP tolSEXPSEXP, SEXP maxitSEXPSEXP, SEXP useQRSEXPSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< SEXP >::type ySEXP(ySEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type xSEXP(xSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type nfSEXP(nfSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type alpha_hatSEXP(alpha_hatSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type contrastSEXP(contrastSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type beta_matSEXP(beta_matSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type lambdaSEXP(lambdaSEXPSEXP);
Rcpp::traits::input_parameter< SEXP >::type weightsSEXP(weightsSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type useWeightsSEXP(useWeightsSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type tolSEXP(tolSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type maxitSEXP(maxitSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type useQRSEXP(useQRSEXPSEXP); rcpp_result_gen = Rcpp::wrap(fitBeta(ySEXP, xSEXP, nfSEXP, alpha_hatSEXP, contrastSEXP, beta_matSEXP, lambdaSEXP, weightsSEXP, useWeightsSEXP, tolSEXP, maxitSEXP, useQRSEXP)); return rcpp_result_gen; END_RCPP } // fitDispGrid Rcpp::List fitDispGrid(SEXP ySEXP, SEXP xSEXP, SEXP mu_hatSEXP, SEXP disp_gridSEXP, SEXP log_alpha_prior_meanSEXP, SEXP log_alpha_prior_sigmasqSEXP, SEXP usePriorSEXP, SEXP weightsSEXP, SEXP useWeightsSEXP); RcppExport SEXP _DESeq2_fitDispGrid(SEXP ySEXPSEXP, SEXP xSEXPSEXP, SEXP mu_hatSEXPSEXP, SEXP disp_gridSEXPSEXP, SEXP log_alpha_prior_meanSEXPSEXP, SEXP log_alpha_prior_sigmasqSEXPSEXP, SEXP usePriorSEXPSEXP, SEXP weightsSEXPSEXP, SEXP useWeightsSEXPSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< SEXP >::type ySEXP(ySEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type xSEXP(xSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type mu_hatSEXP(mu_hatSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type disp_gridSEXP(disp_gridSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type log_alpha_prior_meanSEXP(log_alpha_prior_meanSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type log_alpha_prior_sigmasqSEXP(log_alpha_prior_sigmasqSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type usePriorSEXP(usePriorSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type weightsSEXP(weightsSEXPSEXP); Rcpp::traits::input_parameter< SEXP >::type useWeightsSEXP(useWeightsSEXPSEXP); rcpp_result_gen = Rcpp::wrap(fitDispGrid(ySEXP, xSEXP, mu_hatSEXP, disp_gridSEXP, log_alpha_prior_meanSEXP, log_alpha_prior_sigmasqSEXP, usePriorSEXP, weightsSEXP, useWeightsSEXP)); return rcpp_result_gen; END_RCPP } static const R_CallMethodDef CallEntries[] = { {"_DESeq2_fitDisp", (DL_FUNC) &_DESeq2_fitDisp, 13}, {"_DESeq2_fitBeta", (DL_FUNC) &_DESeq2_fitBeta, 12}, {"_DESeq2_fitDispGrid", (DL_FUNC) &_DESeq2_fitDispGrid, 9}, {NULL, NULL, 0} }; RcppExport void R_init_DESeq2(DllInfo *dll) { R_registerRoutines(dll, NULL, CallEntries, NULL, NULL); R_useDynamicSymbols(dll, FALSE); } DESeq2/tests/0000755000175400017540000000000013201671732013767 5ustar00biocbuildbiocbuildDESeq2/tests/testthat/0000755000175400017540000000000013201671732015627 5ustar00biocbuildbiocbuildDESeq2/tests/testthat.R0000644000175400017540000000007313201671732015752 0ustar00biocbuildbiocbuildlibrary("testthat") library("DESeq2") test_check("DESeq2") DESeq2/tests/testthat/test_1vs1.R0000644000175400017540000000024013201671732017577 0ustar00biocbuildbiocbuildcontext("1vs1") test_that("1 vs 1 gets warning", { dds <- makeExampleDESeqDataSet(n=100, m=2) expect_warning({ dds <- DESeq(dds)}) res <- results(dds) }) DESeq2/tests/testthat/test_DESeq.R0000644000175400017540000000126513201671732017756 0ustar00biocbuildbiocbuildcontext("DESeq") test_that("DESeq() gives correct errors", { dds <- makeExampleDESeqDataSet(n=100, m=8) expect_error(DESeq(dds, test="LRT")) expect_error(DESeq(dds, test="Wald", full=~condition, reduced=~1)) expect_error(DESeq(dds, full=~1)) m <- model.matrix(~ condition, colData(dds)) expect_error(DESeq(dds, test="LRT", full=m, reduced=~1)) expect_error(DESeq(dds, test="LRT", full=m, reduced=m)) 
expect_error(DESeq(dds, full=m, betaPrior=TRUE)) design(dds) <- ~ 0 + condition expect_error(DESeq(dds, betaPrior=TRUE)) dds <- makeExampleDESeqDataSet(n=100) dds$condition <- factor(rep(c("A","B","C"),each=4)) dds <- dds[,1:8] expect_error(DESeq(dds)) }) DESeq2/tests/testthat/test_LRT.R0000644000175400017540000000073313201671732017455 0ustar00biocbuildbiocbuildcontext("LRT") test_that("test='LRT' gives correct errors", { dds <- makeExampleDESeqDataSet(n=100, m=4) dds$group <- factor(c(1,2,1,2)) design(dds) <- ~ condition expect_error(DESeq(dds, test="LRT", reduced=~group)) expect_error(DESeq(dds, test="LRT", reduced=~1, modelMatrixType="expanded")) expect_error(DESeq(dds,test="LRT",reduced=~group, betaPrior=TRUE)) dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) expect_error(nbinomLRT(dds)) }) DESeq2/tests/testthat/test_QR.R0000644000175400017540000000051313201671732017332 0ustar00biocbuildbiocbuildcontext("QR") test_that("not using QR works as expected", { set.seed(1) dds <- makeExampleDESeqDataSet(n=100,betaSD=1) dds <- DESeq(dds, quiet=TRUE) ddsNoQR <- nbinomWaldTest(dds, useQR=FALSE) res <- results(dds) resNoQR <- results(ddsNoQR) expect_equal(res$log2FoldChange, resNoQR$log2FoldChange, tolerance=1e-6) }) DESeq2/tests/testthat/test_addMLE.R0000644000175400017540000000144013201671732020076 0ustar00biocbuildbiocbuildcontext("addMLE") test_that("adding MLE works as expected", { set.seed(1) dds <- makeExampleDESeqDataSet(n=200,m=12,betaSD=1) dds$condition <- factor(rep(letters[1:3],each=4)) dds <- DESeq(dds, betaPrior=TRUE) ddsNP <- nbinomWaldTest(dds, betaPrior=FALSE) res1 <- results(dds, contrast=c("condition","c","a"), addMLE=TRUE) res2 <- results(ddsNP, contrast=c("condition","c","a")) expect_equal(res1$lfcMLE, res2$log2FoldChange) res1 <- results(dds, contrast=c("condition","a","b"), addMLE=TRUE) res2 <- results(ddsNP, contrast=c("condition","a","b")) expect_equal(res1$lfcMLE, res2$log2FoldChange) res1 <- results(dds, contrast=c("condition","c","b"), addMLE=TRUE) res2 <- results(ddsNP, contrast=c("condition","c","b")) expect_equal(res1$lfcMLE, res2$log2FoldChange) }) DESeq2/tests/testthat/test_betaFitting.R0000644000175400017540000000320613201671732021252 0ustar00biocbuildbiocbuildcontext("betaFitting") test_that("estimates of beta fit from various methods are equal", { # test for equivalence of DESeq2 estimates with those # found using IRLS code and using optim m <- 10 set.seed(1) y <- rpois(m,20) sf <- rep(1,m) condition <- factor(rep(0:1,each=m/2)) x <- cbind(rep(1,m),rep(0:1,each=m/2)) lambda <- 2 alpha <- .5 dds <- DESeqDataSetFromMatrix(matrix(y,nrow=1), colData=DataFrame(condition), design= ~ condition) sizeFactors(dds) <- sf dispersions(dds) <- alpha mcols(dds)$baseMean <- mean(y) # for testing we convert beta to the natural log scale: # convert lambda from log to log2 scale by multiplying by log(2)^2 # then convert beta back from log2 to log scale by multiplying by log(2) betaDESeq <- log(2)*DESeq2:::fitNbinomGLMs(dds, lambda=c(0,lambda*log(2)^2))$betaMatrix # the IRLS algorithm betaIRLS <- c(1,1) for (t in 1:100) { mu.hat <- as.vector(sf * exp(x %*% betaIRLS)) w <- diag(1/(1/mu.hat^2 * ( mu.hat + alpha * mu.hat^2 ))) z <- log(mu.hat/sf) + (y - mu.hat)/mu.hat ridge <- diag(c(0,lambda)) betaIRLS <- as.vector(solve(t(x) %*% w %*% x + ridge) %*% t(x) %*% w %*% z) } # using optim objectiveFn <- function(p) { mu <- exp(x %*% p) logLike <- sum(dnbinom(y, mu=mu, size=1/alpha, log=TRUE)) prior <- dnorm(p[2], 0, sqrt(1/lambda),log=TRUE) -1 * (logLike + prior) }
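# a third, independent check: minimize the negative log posterior directly with optim()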
betaOptim <- optim(c(.1,.1), objectiveFn, control=list(reltol=1e-16))$par expect_equal(as.numeric(betaDESeq), betaIRLS, tolerance=1e-6) expect_equal(as.numeric(betaDESeq), betaOptim, tolerance=1e-6) }) DESeq2/tests/testthat/test_collapse.R0000644000175400017540000000052213201671732020612 0ustar00biocbuildbiocbuildcontext("collapse") test_that("collapse replicates works", { dds <- makeExampleDESeqDataSet(n=10, m=8) dds$sample <- rep(1:4, each=2) dds$run <- 1:8 dds2 <- collapseReplicates(dds, groupby=dds$sample, run=dds$run) expect_true(all(counts(dds2)[,1] == rowSums(counts(dds)[,1:2]))) expect_true(dds2$runsCollapsed[1] == "1,2") }) DESeq2/tests/testthat/test_construction_errors.R0000644000175400017540000000430413201671732023140 0ustar00biocbuildbiocbuildcontext("construction_errors") test_that("proper errors thrown in object construction", { coldata <- DataFrame(x=factor(c("A","A","B","B")), xx=factor(c("A","A","B","B ")), name=letters[1:4], ident=factor(rep("A",4)), num=1:4, missinglevels=factor(c("A","A","B","B"), levels=c("A","B","C")), notref=factor(c("control","control","abc","abc")), row.names=1:4) counts <- matrix(1:16, ncol=4) expect_message(DESeqDataSet(SummarizedExperiment(list(foo=counts), colData=coldata), ~ x)) expect_error(DESeqDataSetFromMatrix(matrix(c(1:11,-1),ncol=4), coldata, ~ x)) expect_error(DESeqDataSetFromMatrix(matrix(c(1:11,0.5),ncol=4), coldata, ~ x)) expect_error(DESeqDataSetFromMatrix(matrix(rep(0,16),ncol=4), coldata, ~ x)) expect_warning(DESeqDataSetFromMatrix(matrix(rep(1:4,4),ncol=4), coldata, ~ x)) expect_warning(DESeqDataSetFromMatrix(matrix(1:16, ncol=4, dimnames=list(c(1,2,3,3),1:4)), coldata, ~ x)) expect_error(DESeqDataSetFromMatrix(counts, coldata, ~ y)) expect_warning(DESeqDataSetFromMatrix(counts, coldata, ~ name)) expect_error(DESeqDataSetFromMatrix(counts, coldata, ~ ident)) expect_message(DESeqDataSetFromMatrix(counts, coldata, ~ num)) expect_message(DESeqDataSetFromMatrix(counts, coldata, ~ missinglevels)) expect_message(DESeqDataSetFromMatrix(counts, coldata, ~ notref)) expect_error(DESeqDataSetFromMatrix(counts, coldata, ~ident + x), "design contains") expect_message(DESeqDataSetFromMatrix(counts, coldata, ~xx), "characters other than") # same colnames but in different order: expect_error(DESeqDataSetFromMatrix(matrix(1:16, ncol=4, dimnames=list(1:4, 4:1)), coldata, ~ x)) # testing incoming metadata columns coldata <- DataFrame(x=factor(c("A","A","B","B"))) rowranges <- GRanges("1", IRanges(1 + 0:3 * 10, width=10)) se <- SummarizedExperiment(list(counts=counts), colData=coldata, rowRanges=rowranges) mcols(colData(se)) <- DataFrame(info="x is a factor") mcols(se)$id <- 1:4 mcols(mcols(se)) <- DataFrame(info="the gene id") dds <- DESeqDataSet(se, ~ x) mcols(colData(dds)) mcols(mcols(dds)) }) DESeq2/tests/testthat/test_counts_input.R0000644000175400017540000000117013201671732021542 0ustar00biocbuildbiocbuildcontext("counts_input") test_that("counts can be supplied as input (tidy or not)", { # count matrix input cnts <- matrix(rnbinom(40,mu=100,size=2),ncol=4) mode(cnts) <- "integer" coldata <- data.frame(cond=factor(c("A","A","B","B"))) dds <- DESeqDataSetFromMatrix(cnts, coldata, ~cond) # tidy data frame input gene.names <- paste0("gene",1:10) rownames(coldata) <- colnames(cnts) <- letters[1:4] tidy.counts <- cbind(gene.names, as.data.frame(cnts)) dds <- DESeqDataSetFromMatrix(tidy.counts, coldata, ~cond, tidy=TRUE) expect_true(all(rownames(dds) == gene.names)) expect_true(all(counts(dds) == cnts)) }) 
DESeq2/tests/testthat/test_custom_filt.R0000644000175400017540000000157413201671732021346 0ustar00biocbuildbiocbuildcontext("custom_filt") test_that("custom filters can be provided to results()", { # try a custom filter function set.seed(1) dds <- makeExampleDESeqDataSet(n=200, m=4, betaSD=rep(c(0,2),c(150,50))) dds <- DESeq(dds) res <- results(dds) method <- "BH" alpha <- 0.1 customFilt <- function(res, filter, alpha, method) { if (missing(filter)) { filter <- res$baseMean } theta <- 0:10/10 cutoff <- quantile(filter, theta) numRej <- sapply(cutoff, function(x) sum(p.adjust(res$pvalue[filter > x]) < alpha, na.rm=TRUE)) threshold <- theta[which(numRej == max(numRej))[1]] res$padj <- numeric(nrow(res)) idx <- filter > quantile(filter, threshold) res$padj[!idx] <- NA res$padj[idx] <- p.adjust(res$pvalue[idx], method=method) res } resCustom <- results(dds, filterFun=customFilt) plot(res$padj, resCustom$padj);abline(0,1) }) DESeq2/tests/testthat/test_disp_fit.R0000644000175400017540000000757713201671732020626 0ustar00biocbuildbiocbuildcontext("disp_fit") test_that("the fitting of dispersion gives expected values using various methods", { # test the optimization of the logarithm of dispersion (alpha) # parameter with Cox-Reid adjustment and prior distribution. # also test the derivatives of the log posterior w.r.t. log alpha m <- 10 set.seed(1) y <- rpois(m,20) sf <- rep(1,m) condition <- factor(rep(0:1,each=m/2)) x <- cbind(rep(1,m),rep(0:1,each=m/2)) colnames(x) <- c("Intercept","condition") lambda <- 2 alpha <- .5 # make a DESeqDataSet but don't use the design formula # instead we supply a model matrix below dds <- DESeqDataSetFromMatrix(matrix(y,nrow=1), colData=DataFrame(condition), design= ~ condition) sizeFactors(dds) <- sf dispersions(dds) <- alpha mcols(dds)$baseMean <- mean(y) # for testing we convert beta to the natural log scale: # convert lambda from log to log2 scale by multiplying by log(2)^2 # then convert beta back from log2 to log scale by multiplying by log(2) betaDESeq <- log(2)*DESeq2:::fitNbinomGLMs(dds, lambda=c(0,lambda*log(2)^2),modelMatrix=x)$betaMatrix log_alpha_prior_mean <- .5 log_alpha_prior_sigmasq <- 1 mu.hat <- as.numeric(exp(x %*% t(betaDESeq))) dispRes <- DESeq2:::fitDisp(ySEXP = matrix(y,nrow=1), xSEXP = x, mu_hatSEXP = matrix(mu.hat,nrow=1), log_alphaSEXP = 0, log_alpha_prior_meanSEXP = log_alpha_prior_mean, log_alpha_prior_sigmasqSEXP = log_alpha_prior_sigmasq, min_log_alphaSEXP = log(1e-8), kappa_0SEXP = 1, tolSEXP = 1e-16, maxitSEXP = 100, usePriorSEXP = TRUE, weightsSEXP=matrix(1,nrow=1,ncol=length(y)), useWeightsSEXP=FALSE) # maximum a posteriori (MAP) estimate from DESeq dispDESeq <- dispRes$log_alpha # MAP estimate using optim logPost <- function(log.alpha) { alpha <- exp(log.alpha) w <- diag(1/(1/mu.hat^2 * ( mu.hat + alpha * mu.hat^2 ))) logLike <- sum(dnbinom(y, mu=mu.hat, size=1/alpha, log=TRUE)) coxReid <- -.5*(log(det(t(x) %*% w %*% x))) logPrior <- dnorm(log.alpha, log_alpha_prior_mean, sqrt(log_alpha_prior_sigmasq), log=TRUE) (logLike + coxReid + logPrior) } dispOptim <- optim(0, function(p) -1*logPost(p), control=list(reltol=1e-16), method="Brent", lower=-10, upper=10)$par expect_equal(dispDESeq, dispOptim, tolerance=1e-6) # check derivatives: # from Ted Harding https://stat.ethz.ch/pipermail/r-help/2007-September/140013.html num.deriv <- function(f,x,h=0.001) (f(x + h/2) - f(x-h/2))/h num.2nd.deriv <- function(f,x,h=0.001) (f(x + h) - 2*f(x) + f(x - h))/h^2 # first derivative of log posterior w.r.t log alpha at start dispDerivDESeq <-
dispRes$initial_dlp dispDerivNum <- num.deriv(logPost,0) expect_equal(dispDerivDESeq, dispDerivNum, tolerance=1e-6) # second derivative at finish dispD2DESeq <- dispRes$last_d2lp dispD2Num <- num.2nd.deriv(logPost, dispRes$log_alpha) expect_equal(dispD2DESeq, dispD2Num, tolerance=1e-6) # test fit alternative dds <- makeExampleDESeqDataSet() dds <- estimateSizeFactors(dds) ddsLocal <- estimateDispersions(dds, fitType="local") ddsMean <- estimateDispersions(dds, fitType="mean") ddsMed <- estimateDispersionsGeneEst(dds) useForMedian <- mcols(ddsMed)$dispGeneEst > 1e-7 medianDisp <- median(mcols(ddsMed)$dispGeneEst[useForMedian],na.rm=TRUE) dispersionFunction(ddsMed) <- function(mu) medianDisp ddsMed <- estimateDispersionsMAP(ddsMed) # test iterative set.seed(1) dds <- makeExampleDESeqDataSet(m=50,n=100,betaSD=1,interceptMean=8) dds <- estimateSizeFactors(dds) dds <- estimateDispersionsGeneEst(dds, niter=5) with(mcols(dds)[!mcols(dds)$allZero,], expect_equal(log(trueDisp), log(dispGeneEst),tol=0.2)) }) DESeq2/tests/testthat/test_dispersions.R0000644000175400017540000000210113201671732021345 0ustar00biocbuildbiocbuildcontext("dispersions") test_that("expected errors thrown during dispersion estimation", { dds <- makeExampleDESeqDataSet(n=100, m=2) dds <- estimateSizeFactors(dds) expect_error(estimateDispersionsGeneEst(dds)) set.seed(1) dds <- makeExampleDESeqDataSet(n=100, m=4, dispMeanRel=function(x) 0.001 + x/1e3, interceptMean=8, interceptSD=2) dds <- estimateSizeFactors(dds) mcols(dds)$dispGeneEst <- rep(1e-7, 100) expect_error(estimateDispersionsFit(dds)) dds <- estimateDispersionsGeneEst(dds) expect_message(estimateDispersionsFit(dds)) dds <- makeExampleDESeqDataSet(n=100, m=4) dds <- estimateSizeFactors(dds) mcols(dds)$dispGeneEst <- rep(1e-7, 100) dispersionFunction(dds) <- function(x) 1e-6 expect_warning(estimateDispersionsMAP(dds)) dds <- makeExampleDESeqDataSet(n=100, m=4) dds <- estimateSizeFactors(dds) levels(dds$condition) <- c("A","B","C") expect_error(estimateDispersions(dds)) dds$condition <- droplevels(dds$condition) dds$group <- dds$condition design(dds) <- ~ group + condition expect_error(estimateDispersions(dds)) }) DESeq2/tests/testthat/test_edge_case.R0000644000175400017540000000242613201671732020714 0ustar00biocbuildbiocbuildcontext("edge_case") test_that("edge cases work or throw proper errors", { # one row set.seed(1) dds <- makeExampleDESeqDataSet(n=1) sizeFactors(dds) <- rep(1,ncol(dds)) dispersions(dds) <- .5 dds <- nbinomWaldTest(dds) res <- results(dds) dds <- nbinomLRT(dds, reduced=~1) res <- results(dds) # only intercept set.seed(1) dds <- makeExampleDESeqDataSet(n=100) design(dds) <- ~ 1 expect_warning({dds <- DESeq(dds)}) res <- results(dds) # metadata insertion dds <- makeExampleDESeqDataSet(n=50,m=4) dds2 <- DESeqDataSetFromMatrix( counts(dds), colData(dds), design(dds) ) mcols(dds2)$foo <- paste( "bar", 1:nrow(dds2) ) dds2 <- DESeq(dds2) results(dds2) expect_true(class(mcols(mcols(dds2))$type) == "character") dds3 <- DESeqDataSetFromMatrix( counts(dds), DataFrame(row.names=colnames(dds)), ~ 1 ) dds3$test <- 1:ncol(dds3) dds3 <- estimateSizeFactors(dds3) expect_true(class(mcols(colData(dds3))$type) == "character") # underscores dds <- makeExampleDESeqDataSet(n=50,m=4) levels(dds$condition) <- c("A_1","B_2") dds$exp_cond <- dds$condition design(dds) <- ~ exp_cond dds <- DESeq(dds) results(dds) # NA in colData dds <- makeExampleDESeqDataSet(n=50,m=4) colData(dds)$condition[4] <- NA expect_error(DESeq(dds)) }) 
DESeq2/tests/testthat/test_factors.R0000644000175400017540000000065513201671732020460 0ustar00biocbuildbiocbuildcontext("factors") test_that("bad factor in design throw errors", { dds <- makeExampleDESeqDataSet(n=100, m=6) levels(dds$condition) <- c("test-","test+") expect_error(DESeq(dds)) dds <- makeExampleDESeqDataSet(n=100, m=6) dds$condition <- factor(rep(letters[1:3], each=2), ordered=TRUE) expect_error(DESeq(dds)) mm <- model.matrix(~ condition, data=colData(dds)) dds <- DESeq(dds, full=mm) # betaPrior=FALSE }) DESeq2/tests/testthat/test_fpkm.R0000644000175400017540000000056313201671732017752 0ustar00biocbuildbiocbuildcontext("fpkm") test_that("fpkm works as expected", { dds <- DESeqDataSetFromMatrix(matrix(c(1:4,2 * 1:4), ncol=2), DataFrame(x=1:2), ~ 1) rowRanges(dds) <- GRanges("1", IRanges(start=0:3 * 10 + 1, width=10)) expect_equal(fpkm(dds)[1,1], 1e5 * 100, tolerance=.1) expect_equal(fpm(dds)[1,1], 1e5, tolerance=.1) expect_equal(fpm(dds, robust=FALSE)[1,1], 1e5) }) DESeq2/tests/testthat/test_frozen_transform.R0000644000175400017540000000201313201671732022403 0ustar00biocbuildbiocbuildcontext("frozen_transform") test_that("frozen transforms works", { set.seed(1) dds <- makeExampleDESeqDataSet(n=100) design(dds) <- ~ 1 dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) expect_warning(ddsNew <- makeExampleDESeqDataSet(m=1,n=100)) counts(ddsNew)[,1] <- counts(dds)[,1] sizeFactors(ddsNew)[1] <- sizeFactors(dds)[1] # VST vsd <- varianceStabilizingTransformation(dds, blind=FALSE) dispersionFunction(ddsNew) <- dispersionFunction(dds) vsdNew <- varianceStabilizingTransformation(ddsNew, blind=FALSE) expect_equal(assay(vsd)[,1],assay(vsdNew)[,1],tolerance=1e-3) # rlog rld <- rlogTransformation(dds, blind=FALSE) mcols(ddsNew)$dispFit <- mcols(dds)$dispFit betaPriorVar <- attr(rld,"betaPriorVar") intercept <- mcols(rld)$rlogIntercept rldNew <- rlogTransformation(ddsNew, blind=FALSE, betaPriorVar=betaPriorVar, intercept=intercept) expect_equal(assay(rld)[,1],assay(rldNew)[,1],tolerance=1e-3) }) DESeq2/tests/testthat/test_htseq.R0000644000175400017540000000063113201671732020135 0ustar00biocbuildbiocbuildcontext("htseq") test_that("htseq", { dir <- system.file(package="pasilla", "extdata") files <- grep("treated",list.files(dir),value=TRUE) sampleTable <- data.frame(id=seq_along(files), files, condition=factor(rep(c("t","u"),c(3,4)))) setwd(dir) expect_error(DESeqDataSetFromHTSeqCount(sampleTable)) dds <- DESeqDataSetFromHTSeqCount(sampleTable, design=~condition) }) DESeq2/tests/testthat/test_interactions.R0000644000175400017540000000131713201671732021515 0ustar00biocbuildbiocbuildcontext("interactions") test_that("interactions throw error", { dds <- makeExampleDESeqDataSet(n=100,m=8) colData(dds)$group <- factor(rep(c("X","Y"),times=ncol(dds)/2)) design(dds) <- ~ condition + group + condition:group dds <- DESeq(dds) expect_equal(resultsNames(dds)[4], "conditionB.groupY") # interactions error expect_error(DESeq(dds, betaPrior=TRUE), "designs with interactions") # also lfcShrink res <- results(dds, name="conditionB.groupY") expect_error(res <- lfcShrink(dds, coef=4, res=res), "not implemented") res <- results(dds, contrast=c("condition","B","A")) expect_error(res <- lfcShrink(dds, contrast=c("condition","B","A"), res=res), "not implemented") }) DESeq2/tests/testthat/test_lfcShrink.R0000644000175400017540000000454213201671732020741 0ustar00biocbuildbiocbuildcontext("lfcShrink") test_that("LFC shrinkage works", { dds <- makeExampleDESeqDataSet(betaSD=1) dds <- 
estimateSizeFactors(dds) expect_error(lfcShrink(dds, 2, 1)) dds <- estimateDispersions(dds) dds <- DESeq(dds) res <- results(dds) res.shr <- lfcShrink(dds=dds, coef=2, res=res) plotMA(res.shr) res.shr <- lfcShrink(dds=dds, contrast=c("condition","B","A"), res=res) plotMA(res.shr) # testing out various methods for LFC shrinkage set.seed(1) dds <- makeExampleDESeqDataSet(betaSD=1,n=1000,m=10) dds <- DESeq(dds) res <- results(dds, name="condition_B_vs_A") # dds and res must match expect_error(lfcShrink(dds=dds, coef=2, res=res[1:500,], type="normal"), "rownames") expect_error(lfcShrink(dds=dds, coef=2, res=res[1:500,], type="apeglm"), "rownames") # try out various types and ways of specifying coefs res.n <- lfcShrink(dds=dds, coef="condition_B_vs_A", res=res, type="normal") res.n <- lfcShrink(dds=dds, coef=2, res=res, type="normal") res.n <- lfcShrink(dds=dds, coef=2, type="normal") res.ape <- lfcShrink(dds=dds, coef=2, type="apeglm") res.ash <- lfcShrink(dds=dds, res=res, type="ashr") # prior info str(priorInfo(res.n)) str(priorInfo(res.ape)) str(priorInfo(res.ash)) # plot against true par(mfrow=c(1,3)) plot(mcols(dds)$trueBeta, res.n$log2FoldChange); abline(0,1,col="red") plot(mcols(dds)$trueBeta, res.ape$log2FoldChange); abline(0,1,col="red") plot(mcols(dds)$trueBeta, res.ash$log2FoldChange); abline(0,1,col="red") # s-value returned res.ape <- lfcShrink(dds=dds, coef=2, type="apeglm", svalue=TRUE) expect_true("svalue" %in% names(res.ape)) res.ash <- lfcShrink(dds=dds, res=res, type="ashr", svalue=TRUE) expect_true("svalue" %in% names(res.ash)) # TODO add tests of new plotMA() with svalue # list returned res.ape <- lfcShrink(dds=dds, coef=2, type="apeglm", returnList=TRUE) names(res.ape) res.ash <- lfcShrink(dds=dds, res=res, type="ashr", returnList=TRUE) names(res.ash) # test wrong coef specified resInt <- results(dds, name="Intercept") expect_error(lfcShrink(dds=dds, coef=2, res=resInt, type="apeglm")) # test supplied model.matrix full <- model.matrix(~condition, colData(dds)) dds <- DESeq(dds, full=full) res <- results(dds) res.ape <- lfcShrink(dds=dds, coef=2, res=res, type="apeglm") }) DESeq2/tests/testthat/test_linear_mu.R0000644000175400017540000000151713201671732020770 0ustar00biocbuildbiocbuildcontext("linear_mu") test_that("the use of linear model for fitting mu works as expected", { set.seed(1) dds <- makeExampleDESeqDataSet(n=100, m=4, interceptMean=10, interceptSD=3, dispMeanRel=function(x) 0.5, sizeFactors=c(.5,1,1,2)) dds <- estimateSizeFactors(dds) dds1 <- estimateDispersionsGeneEst(dds, linearMu=FALSE) dds2 <- estimateDispersionsGeneEst(dds, linearMu=TRUE) mu1 <- assays(dds1)[["mu"]] mu2 <- assays(dds2)[["mu"]] par(mfrow=c(2,2),mar=c(3,3,1,1)) for (i in 1:4) { plot(mu1[,i], mu2[,i], xlab="", ylab="", log="xy") abline(0,1) } cors <- diag(cor(mu1, mu2, use="complete")) expect_true(all(cors > 1 - 1e-6)) # dds2 <- estimateDispersionsFit(dds2, fitType="mean") dds2 <- estimateDispersionsMAP(dds2) dds2 <- nbinomWaldTest(dds2) res <- results(dds2) }) DESeq2/tests/testthat/test_methods.R0000644000175400017540000000070613201671732020457 0ustar00biocbuildbiocbuildcontext("methods") test_that("methods throw errors", { coldata <- DataFrame(x=factor(c("A","A","B","B"))) counts <- matrix(1:16, ncol=4) dds <- DESeqDataSetFromMatrix(counts, coldata, ~ x) expect_warning(counts(dds, replace=TRUE)) expect_error(counts(dds, normalized=TRUE)) expect_error(sizeFactors(dds) <- c(-1, -1, -1, -1)) expect_error(normalizationFactors(dds) <- matrix(-1, ncol=4, nrow=4)) 
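# no size factors were ever successfully assigned above, so dispersion estimation must fail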
expect_error(estimateDispersions(dds)) }) DESeq2/tests/testthat/test_model_matrix.R0000644000175400017540000000235713201671732021504 0ustar00biocbuildbiocbuildcontext("model_matrix") test_that("supplying custom model matrix works", { dds <- makeExampleDESeqDataSet(n=100, m=18) dds$group <- factor(rep(1:3,each=6)) dds$condition <- factor(rep(rep(c("A","B","C"),each=2),3)) # note: design is not used design(dds) <- ~ 1 dds <- dds[,-c(17,18)] m1 <- model.matrix(~ group*condition, colData(dds)) m1 <- m1[,-9] m0 <- model.matrix(~ group + condition, colData(dds)) dds <- DESeq(dds, full=m1, reduced=m0, test="LRT") results(dds)[1,] results(dds, name="group2.conditionC", test="Wald")[1,] dds <- removeResults(dds) dds <- DESeq(dds, full=m1, test="Wald", betaPrior=FALSE) results(dds)[1,] # test better error than "error: inv(): matrix seems singular" coldata <- data.frame(group=factor(rep(1:3,each=6)), group2=factor(rep(1:3,each=6)), condition=factor(rep(1:6,3))) counts <- matrix(rpois(180, 100), ncol=18) m1 <- model.matrix(~ group + group2, coldata) m2 <- model.matrix(~ condition + group, coldata) dds <- DESeqDataSetFromMatrix(counts, coldata, ~group) expect_error(dds <- DESeq(dds, full=m1, fitType="mean"), "full rank") expect_error(dds <- DESeq(dds, full=m2, reduced=m1, test="LRT", fitType="mean"), "full rank") }) DESeq2/tests/testthat/test_nbinomWald.R0000644000175400017540000000260513201671732021106 0ustar00biocbuildbiocbuildcontext("nbinomWald") test_that("nbinomWald throws various errors and works with edge cases",{ dds <- makeExampleDESeqDataSet(n=100, m=4) expect_error(nbinomWaldTest(dds)) expect_error(nbinomLRT(dds)) dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) mm <- model.matrix(~ condition, colData(dds)) mm0 <- model.matrix(~ 1, colData(dds)) expect_error(nbinomWaldTest(dds, betaPrior=TRUE, modelMatrix=mm)) expect_error(nbinomLRT(dds, betaPrior=TRUE, full=mm, reduced=mm0)) expect_error(nbinomWaldTest(dds, betaPrior=FALSE, modelMatrixType="expanded")) expect_error(nbinomLRT(dds, betaPrior=FALSE, modelMatrixType="expanded")) dds2 <- estimateMLEForBetaPriorVar(dds) estimateBetaPriorVar(dds2, betaPriorMethod="quantile") dds <- nbinomWaldTest(dds, modelMatrixType="standard") covarianceMatrix(dds, 1) # changing 'df' dds <- makeExampleDESeqDataSet(n=100, m=4) counts(dds)[1:4,] <- rep(0L, 16) dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) dds <- nbinomWaldTest(dds) round(head(results(dds)$pvalue,8),3) dds <- nbinomWaldTest(dds, useT=TRUE, df=rep(1,100)) round(head(results(dds)$pvalue,8),3) # try nbinom after no fitted dispersions dds <- makeExampleDESeqDataSet(n=100, m=4) dds <- estimateSizeFactors(dds) dds <- estimateDispersionsGeneEst(dds) dispersions(dds) <- mcols(dds)$dispGeneEst dds <- nbinomWaldTest(dds) }) DESeq2/tests/testthat/test_optim.R0000644000175400017540000000361313201671732020144 0ustar00biocbuildbiocbuildcontext("optim") test_that("optim gives same results", { set.seed(1) dds <- makeExampleDESeqDataSet(n=100,interceptMean=10,interceptSD=3) dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) # make a large predictor to test scaling colData(dds)$condition <- rnorm(ncol(dds),0,1000) modelMatrix <- model.matrix(~ condition, as.data.frame(colData(dds))) fit <- DESeq2:::fitNbinomGLMs(dds, modelMatrix=modelMatrix, modelFormula = ~ condition, alpha_hat = dispersions(dds), lambda = c(2,2), renameCols=TRUE, betaTol=1e-8, maxit=100, useOptim=TRUE, useQR=TRUE, forceOptim=FALSE) fitOptim <- DESeq2:::fitNbinomGLMs(dds, modelMatrix=modelMatrix, 
modelFormula = ~ condition, alpha_hat = dispersions(dds), lambda = c(2,2), renameCols=TRUE, betaTol=1e-8, maxit=100, useOptim=TRUE, useQR=TRUE, forceOptim=TRUE) #plot(fit$betaMatrix[,2], fitOptim$betaMatrix[,2]) #abline(0,1,col="red") expect_equal(fit$betaMatrix, fitOptim$betaMatrix,tolerance=1e-6) expect_equal(fit$betaSE, fitOptim$betaSE,tolerance=1e-6) # test optim gives same lfcSE set.seed(1) dds <- makeExampleDESeqDataSet(n=100, m=10) counts(dds)[1,] <- c(rep(0L,5),c(1000L,1000L,0L,0L,0L)) dds <- DESeq(dds, betaPrior=FALSE) # beta iter = 100 implies optim used for fitting expect_equal(mcols(dds)$betaIter[1], 100) res1 <- results(dds, contrast=c("condition","B","A")) res2 <- results(dds, contrast=c(0,1)) expect_true(all.equal(res1$lfcSE, res2$lfcSE)) expect_true(all.equal(res1$pvalue, res2$pvalue)) }) DESeq2/tests/testthat/test_outlier.R0000644000175400017540000000516313201671732020501 0ustar00biocbuildbiocbuildcontext("outlier") test_that("outlier filtering and replacement works as expected", { # test filtering and replacement set.seed(1) dds <- makeExampleDESeqDataSet(n=100, m=12, dispMeanRel = function(x) 4/x + .5) counts(dds)[1,] <- rep(0L, 12) counts(dds)[2,] <- c(100000L, rep(10L, 11)) counts(dds)[3,] <- c(100000L, rep(0L, 11)) dds0 <- DESeq(dds, minReplicatesForReplace=Inf) dds1 <- DESeq(dds, minReplicatesForReplace=6) pval0 <- results(dds0)[1:3,"pvalue"] pval <- results(dds1)[1:3,"pvalue"] LFC0 <- results(dds0)[1:3,"log2FoldChange"] LFC <- results(dds1)[1:3,"log2FoldChange"] # filtered expect_true(all(is.na(pval0))) # not filtered expect_true(all(!is.na(pval[2:3]))) # counts still the same expect_true(all(counts(dds1)==counts(dds))) # first is NA expect_true(is.na(LFC[1])) # replaced, reduced LFC expect_true(abs(LFC[2]) < abs(LFC0[2])) # replaced, LFC now zero expect_true(LFC[3] == 0) idx <- which(!mcols(dds1)$replace) # the pvalue for those not replaced is equal expect_equal(results(dds1)$pvalue[idx], results(dds0)$pvalue[idx]) # check that outlier filtering catches throughout range of mu beta0 <- seq(from=1,to=16,length=100) idx <- rep(rep(c(TRUE,FALSE),c(1,9)),10) set.seed(1) par(mfrow=c(2,3)) for (disp0 in c(.01,.1)) { for (m in c(10,20,80)) { dds <- makeExampleDESeqDataSet(n=100, m=m, interceptMean=beta0, interceptSD=0, dispMeanRel=function(x) disp0) counts(dds)[idx,1] <- as.integer(1000 * 2^beta0[idx]) dds <- DESeq(dds, minReplicatesForReplace=Inf, quiet=TRUE, fitType="mean") res <- results(dds) cutoff <- qf(.99, 2, m-2) outlierCooks <- assays(dds)[["cooks"]][idx,1] > cutoff nonoutlierCooks <- mcols(dds)$maxCooks[!idx] < cutoff expect_true(all(is.na(res$pvalue[idx]))) expect_true(all(outlierCooks)) expect_true(all(nonoutlierCooks)) col <- rep("black", 100) col[idx] <- "blue" plot(2^beta0, mcols(dds)$maxCooks, col=col, log="xy", main=paste(m,"-",disp0), xlab="mean", ylab="cooks") abline(h=qf(.99,2,m-2)) } } dds <- makeExampleDESeqDataSet(n=100) counts(dds)[1,1] <- 1000000L dds <- DESeq(dds, test="LRT", reduced=~1, minReplicatesForReplace=6) # test replace function dds <- makeExampleDESeqDataSet(n=100,m=4) expect_error(replaceOutliers(dds)) dds <- DESeq(dds) expect_error(replaceOutliers(dds, minReplicates=2)) # check model matrix standard bug set.seed(1) dds <- makeExampleDESeqDataSet(n=100, m=20) counts(dds)[1,] <- c(100000L, rep(0L, 19)) dds <- DESeq(dds, modelMatrixType="standard") }) DESeq2/tests/testthat/test_parallel.R0000644000175400017540000000611013201671732020603 0ustar00biocbuildbiocbuildcontext("parallel") test_that("parallel execution works as expected", { 
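# simulate counts with a noisy dispersion-mean relation; rows 51-60 are zeroed out below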
dispMeanRel <- function(x) (4/x + .1) * exp(rnorm(length(x),0,sqrt(.5))) set.seed(1) dds0 <- makeExampleDESeqDataSet(n=100,dispMeanRel=dispMeanRel) counts(dds0)[51:60,] <- 0L # the following is an example of a simple parallelizable DESeq() run # without outlier replacement. see DESeq2:::DESeqParallel for the code # which is actually used in DESeq() nworkers <- 3 idx <- factor(sort(rep(seq_len(nworkers),length=nrow(dds0)))) ### BEGINNING ### dds <- estimateSizeFactors(dds0) dds <- do.call(rbind, lapply(levels(idx), function(l) { estimateDispersionsGeneEst(dds[idx == l,,drop=FALSE]) })) dds <- estimateDispersionsFit(dds) dispPriorVar <- estimateDispersionsPriorVar(dds) dds <- do.call(rbind, lapply(levels(idx), function(l) { ddsSub <- estimateDispersionsMAP(dds[idx == l,,drop=FALSE], dispPriorVar=dispPriorVar) estimateMLEForBetaPriorVar(ddsSub) })) betaPriorVar <- estimateBetaPriorVar(dds) dds <- do.call(rbind, lapply(levels(idx), function(l) { nbinomWaldTest(dds[idx == l,,drop=FALSE], betaPrior=TRUE, betaPriorVar=betaPriorVar) })) ### END ### res1 <- results(dds) dds2 <- DESeq(dds0, betaPrior=TRUE, minRep=Inf) res2 <- results(dds2) expect_equal(mcols(dds)$dispGeneEst, mcols(dds2)$dispGeneEst) expect_equal(mcols(dds)$dispFit, mcols(dds2)$dispFit) expect_equal(mcols(dds)$dispMAP, mcols(dds2)$dispMAP) expect_equal(mcols(dds)$dispersion, mcols(dds2)$dispersion) expect_equal(attr(dispersionFunction(dds), "dispPriorVar"), attr(dispersionFunction(dds2), "dispPriorVar")) expect_equal(attr(dispersionFunction(dds), "varLogDispEsts"), attr(dispersionFunction(dds2), "varLogDispEsts")) expect_equal(mcols(dds)$MLE_condition_B_vs_A, mcols(dds2)$MLE_condition_B_vs_A) expect_equal(attr(dds, "betaPriorVar"), attr(dds2, "betaPriorVar")) expect_equal(mcols(dds)$conditionB, mcols(dds2)$conditionB) expect_equal(res1$log2FoldChange, res2$log2FoldChange) expect_equal(res1$pvalue, res2$pvalue) library("BiocParallel") register(SerialParam()) dds3 <- DESeq(dds0, betaPrior=TRUE, parallel=TRUE) res3 <- results(dds3, parallel=TRUE) res4 <- results(dds3) expect_equal(res2$pvalue, res3$pvalue) expect_equal(res3$pvalue, res4$pvalue) expect_equal(res2$log2FoldChange, res3$log2FoldChange) expect_equal(res3$log2FoldChange, res4$log2FoldChange) dds <- makeExampleDESeqDataSet(n=100,m=8) dds <- DESeq(dds, parallel=TRUE, test="LRT", reduced=~1) # lfcShrink parallel test dds <- makeExampleDESeqDataSet(n=100) dds <- DESeq(dds) res <- results(dds) res <- lfcShrink(dds, coef=2) res2 <- lfcShrink(dds, coef=2, parallel=TRUE) expect_equal(res$log2FoldChange, res2$log2FoldChange) res <- lfcShrink(dds, coef=2, type="apeglm", svalue=TRUE) res2 <- lfcShrink(dds, coef=2, type="apeglm", parallel=TRUE, svalue=TRUE) expect_equal(res$log2FoldChange, res2$log2FoldChange) # this should be checked with number of workers > 1 expect_equal(res$svalue, res2$svalue) }) DESeq2/tests/testthat/test_plots.R0000644000175400017540000000153213201671732020153 0ustar00biocbuildbiocbuildcontext("plots") test_that("plots work", { # test plots dds <- makeExampleDESeqDataSet(n=100,m=8) dds$group <- factor(rep(c(1,2,1,2),each=2)) dds <- DESeq(dds) res <- results(dds) plotDispEsts(dds) plotDispEsts(dds, CV=TRUE) plotMA(dds) plotMA(dds, ylim=c(-1,1)) plotCounts(dds, 1) plotCounts(dds, 1, intgroup=c("condition","group")) plotCounts(dds, 1, transform=TRUE) expect_error(plotCounts(dds, 1, intgroup="foo")) vsd <- varianceStabilizingTransformation(dds, blind=FALSE) plotPCA(vsd) dat <- plotPCA(vsd, returnData=TRUE) plotPCA(vsd, intgroup=c("condition","group")) 
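# 'intgroup' must name columns of colData, so an unknown column should error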
expect_error(plotPCA(vsd, intgroup="foo")) plotSparsity(dds) # plotMA MLE dds <- DESeq(dds, betaPrior=TRUE) res <- results(dds) expect_error(plotMA(res, MLE=TRUE)) res <- results(dds, addMLE=TRUE) plotMA(res, MLE=TRUE) dev.off() }) DESeq2/tests/testthat/test_results.R0000644000175400017540000001522413201671732020516 0ustar00biocbuildbiocbuildcontext("results") test_that("results works as expected and throws errors", { ## test contrasts set.seed(1) dds <- makeExampleDESeqDataSet(n=200,m=12) dds$condition <- factor(rep(1:3,each=4)) dds$group <- factor(rep(1:2,length=ncol(dds))) counts(dds)[1,] <- rep(c(100L,200L,800L),each=4) design(dds) <- ~ group + condition # calling results too early expect_error(results(dds)) sizeFactors(dds) <- rep(1, ncol(dds)) dds <- DESeq(dds) head(coef(dds)) res <- results(dds) show.res <- capture.output(show(res)) summary.res <- capture.output(summary(res)) # various results error checking expect_error(results(dds, test="LRT")) expect_error(results(dds, altHypothesis="lessAbs")) expect_error(results(dds, name=c("Intercept","group1"))) expect_error(results(dds, contrast=c("foo","B","A"))) expect_error(results(dds, contrast=c("condition","4","1"))) expect_error(results(dds, test="foo")) expect_error(results(dds, contrast=FALSE)) expect_error(results(dds, contrast=letters[1:4])) expect_error(results(dds, contrast=c("condition","1","1"))) results(dds, independentFiltering=FALSE) results(dds, contrast=list("condition_2_vs_1")) expect_error(results(dds, contrast=list("condition_2_vs_1","condition_3_vs_1","condition_3_vs_1"))) expect_error(results(dds, contrast=list("condition_2_vs_1",1))) expect_error(results(dds, contrast=list("condition_2_vs_1","foo"))) expect_error(results(dds, contrast=list("condition_2_vs_1","condition_2_vs_1"))) expect_error(results(dds, contrast=list(character(), character()))) expect_error(results(dds, contrast=rep(0, 6))) # check to see if the contrasts with expanded model matrix # are close to expected (although shrunk due to the beta prior). 
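# gene 1 has counts 100/200/800 across the three condition groups, so the true LFCs are 1 (2 vs 1), 2 (3 vs 2) and 3 (3 vs 1)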
# lfcShrink() here calls results() lfc31 <- lfcShrink(dds,contrast=c("condition","3","1"))[1,"log2FoldChange"] lfc21 <- lfcShrink(dds,contrast=c("condition","2","1"))[1,"log2FoldChange"] lfc32 <- lfcShrink(dds,contrast=c("condition","3","2"))[1,"log2FoldChange"] expect_equal(lfc31, 3, tolerance=.1) expect_equal(lfc21, 1, tolerance=.1) expect_equal(lfc32, 2, tolerance=.1) expect_equal(results(dds,contrast=c("condition","1","3"))[1,2], -3, tolerance=.1) expect_equal(results(dds,contrast=c("condition","1","2"))[1,2], -1, tolerance=.1) expect_equal(results(dds,contrast=c("condition","2","3"))[1,2], -2, tolerance=.1) # check that results are not changed by releveling dds2 <- dds colData(dds2)$condition <- relevel(colData(dds2)$condition, "2") dds2 <- DESeq(dds2) expect_equal(lfcShrink(dds2,contrast=c("condition","3","1"))[1,"log2FoldChange"], lfc31, tolerance=1e-6) expect_equal(lfcShrink(dds2,contrast=c("condition","2","1"))[1,"log2FoldChange"], lfc21, tolerance=1e-6) expect_equal(lfcShrink(dds2,contrast=c("condition","3","2"))[1,"log2FoldChange"], lfc32, tolerance=1e-6) # test a number of contrast as list options expect_equal(results(dds, contrast=list("condition_3_vs_1","condition_2_vs_1"))[1,2], 2, tolerance=1e-6) results(dds, contrast=list("condition_3_vs_1","condition_2_vs_1"), listValues=c(.5,-.5)) results(dds, contrast=list("condition_3_vs_1",character())) results(dds, contrast=list("condition_3_vs_1",character()), listValues=c(.5,-.5)) results(dds, contrast=list(character(),"condition_2_vs_1")) results(dds, contrast=list(character(),"condition_2_vs_1"), listValues=c(.5,-.5)) # test no prior on intercept expect_equivalent(attr(dds,"betaPriorVar"), rep(1e6, 4)) # test thresholding results(dds, lfcThreshold=1) results(dds, lfcThreshold=1, altHypothesis="lessAbs") results(dds, lfcThreshold=1, altHypothesis="greater") results(dds, lfcThreshold=1, altHypothesis="less") dds3 <- DESeq(dds, betaPrior=TRUE) expect_error(results(dds3, lfcThreshold=1, altHypothesis="lessAbs")) }) test_that("results: designs with zero intercept", { # test some special cases for results() # using designs with +0 set.seed(1) dds <- makeExampleDESeqDataSet(n=100,m=12) dds$condition <- factor(rep(1:3,each=4)) dds$group <- factor(rep(1:2,length=ncol(dds))) counts(dds)[1,] <- rep(c(100L,200L,400L),each=4) design(dds) <- ~ condition + 0 dds <- DESeq(dds, betaPrior=FALSE) expect_equal(results(dds)[1,2], 2, tolerance=.1) expect_equal(results(dds, contrast=c("condition","2","1"))[1,2], 1, tolerance=.1) expect_equal(results(dds, contrast=c("condition","3","2"))[1,2], 1, tolerance=.1) expect_equal(results(dds, contrast=c("condition","1","3"))[1,2], -2, tolerance=.1) expect_equal(results(dds, contrast=c("condition","1","2"))[1,2], -1, tolerance=.1) expect_equal(results(dds, contrast=c("condition","2","3"))[1,2], -1, tolerance=.1) expect_error(results(dds, contrast=c("condition","4","1"))) design(dds) <- ~ group + condition + 0 dds <- DESeq(dds, betaPrior=FALSE) expect_equal(results(dds)[1,2], 2, tolerance=.1) expect_equal(results(dds, contrast=c("condition","2","1"))[1,2], 1, tolerance=.1) expect_equal(results(dds, contrast=c("condition","3","2"))[1,2], 1, tolerance=.1) expect_equal(results(dds, contrast=c("condition","1","3"))[1,2], -2, tolerance=.1) expect_equal(results(dds, contrast=c("condition","1","2"))[1,2], -1, tolerance=.1) expect_equal(results(dds, contrast=c("condition","2","3"))[1,2], -1, tolerance=.1) }) test_that("results: likelihood ratio test", { set.seed(1) dds <- makeExampleDESeqDataSet(n=100) dds$group <- 
factor(rep(1:2,6)) design(dds) <- ~ group + condition dds <- DESeq(dds, test="LRT", reduced=~group) expect_true(!all(results(dds,name="condition_B_vs_A")$stat == results(dds,name="condition_B_vs_A",test="Wald")$stat)) # LFC are already MLE expect_error(results(dds, addMLE=TRUE)) expect_error(results(dds, lfcThreshold=1, test="LRT")) expect_true(all(results(dds, test="LRT", contrast=c("group","1","2"))$log2FoldChange == -1 * results(dds, test="LRT", contrast=c("group","2","1"))$log2FoldChange)) }) test_that("results basics regarding format, tidy, MLE, remove are working", { dds <- makeExampleDESeqDataSet(n=100) dds <- DESeq(dds) res <- results(dds, format="GRanges") expect_warning(results(dds, format="GRangesList")) rowRanges(dds) <- as(rowRanges(dds), "GRangesList") dds <- DESeq(dds) expect_message(results(dds, format="GRanges")) # check tidy-ness res <- results(dds, tidy=TRUE) expect_true(colnames(res)[1] == "row") expect_true(is(res, "data.frame")) # test MLE and 'name' dds2 <- DESeq(dds, betaPrior=TRUE) results(dds2, addMLE=TRUE) expect_error(results(dds, name="condition_B_vs_A", addMLE=TRUE)) # test remove results dds <- removeResults(dds) expect_true(!any(mcols(mcols(dds))$type == "results")) }) DESeq2/tests/testthat/test_rlog.R0000644000175400017540000000144013201671732017753 0ustar00biocbuildbiocbuildcontext("rlog") test_that("rlog works", { # expect warning on sparsity and large counts dds <- makeExampleDESeqDataSet(n=100, m=20) idx <- sample(ncol(dds), nrow(dds)/2, TRUE) counts(dds)[cbind(1:(nrow(dds)/2), idx)] <- 10000L mcols(dds)$dispFit <- .5 expect_warning({ rld <- rlog(dds, blind=FALSE) }) # test rlog basics/errors dds <- makeExampleDESeqDataSet(n=20, m=4) colnames(dds) <- NULL rlog(dds) head(rlog(assay(dds))) expect_error(rlog(dds, intercept=rep(1,10))) mcols(dds)$dispFit <- rep(.5, 20) rlog(dds, blind=FALSE, intercept=rep(1,20)) expect_error(rlogData(dds)) expect_error(rlogData(dds, intercept=rep(1,10))) # test normTransform dds <- makeExampleDESeqDataSet(n=50, m=10) nt <- normTransform(dds) plotPCA(nt) rld <- rlog(counts(dds)) }) DESeq2/tests/testthat/test_size_factor.R0000644000175400017540000000307413201671732021325 0ustar00biocbuildbiocbuildcontext("size_factor") test_that("size factor works", { # size factor error checking m <- matrix(1:16, ncol=4) expect_error(estimateSizeFactorsForMatrix(m, geoMeans=1:5)) expect_error(estimateSizeFactorsForMatrix(m, geoMeans=rep(0,4))) expect_error(estimateSizeFactorsForMatrix(m, controlGenes="foo")) estimateSizeFactorsForMatrix(m, geoMeans=1:4) estimateSizeFactorsForMatrix(m, controlGenes=1:2) # norm matrix works nm <- m / exp(rowMeans(log(m))) # divide out the geometric mean true.sf <- c(2,1,1,.5) counts <- sweep(2*m, 2, true.sf, "*") dds <- DESeqDataSetFromMatrix(counts, data.frame(x=1:4), ~1) dds <- estimateSizeFactors(dds, normMatrix=nm) expect_equal((normalizationFactors(dds)/nm)[1,], true.sf) # make some counts with zeros set.seed(1) true.sf <- 2^(rep(c(-2,-1,0,0,1,2),each=2)) dmr <- function(x) 0.01 dds <- makeExampleDESeqDataSet(sizeFactors=true.sf, n=100, dispMeanRel=dmr) cts <- counts(dds) idx <- cbind(seq_len(nrow(cts)), sample(ncol(dds), nrow(cts), replace=TRUE)) cts[idx] <- 0L cts[1,1] <- 1000000L # an outlier counts(dds) <- cts # positive counts method dds <- estimateSizeFactors(dds, type="poscounts") sf <- sizeFactors(dds) plot(true.sf, sf);abline(0,1) coefs <- coef(lm(sf ~ true.sf)) expect_true(abs(coefs[1]) < .1) expect_true(abs(coefs[2] - 1) < .1) # iterate method dds <- estimateSizeFactors(dds, type="iterate") sf
<- sizeFactors(dds) plot(true.sf, sf);abline(0,1) coefs <- coef(lm(sf ~ true.sf)) expect_true(abs(coefs[1]) < .1) expect_true(abs(coefs[2] - 1) < .1) }) DESeq2/tests/testthat/test_tximport.R0000644000175400017540000000147413201671732020705 0ustar00biocbuildbiocbuildcontext("tximport") test_that("tximport works", { library("tximport") library("tximportData") library("readr") dir <- system.file("extdata", package="tximportData") samples <- read.table(file.path(dir,"samples.txt"), header=TRUE) files <- file.path(dir,"salmon", samples$run, "quant.sf") names(files) <- paste0("sample",1:6) tx2gene <- read.csv(file.path(dir, "tx2gene.csv")) txi <- tximport(files, type="salmon", tx2gene=tx2gene) dds <- DESeqDataSetFromTximport(txi, samples, ~1) # test library size correction taking into account # the average transcript lengths dds <- estimateSizeFactors(dds) # test fpkm exprs <- fpm(dds) exprs <- fpkm(dds) # test length of 0 txi2 <- txi txi2$length[1,1] <- 0 expect_error(dds2 <- DESeqDataSetFromTximport(txi2, samples, ~1), "lengths") }) DESeq2/tests/testthat/test_unmix_samples.R0000644000175400017540000000323513201671732021700 0ustar00biocbuildbiocbuildcontext("unmix samples") test_that("unmixing samples works", { set.seed(1) n <- 2000 a <- runif(n) b <- runif(n) c <- runif(n) counts <- matrix(nrow=n, ncol=8) disp <- 0.01 counts[,1] <- rnbinom(n, mu=1e4 * a, size=1/disp) counts[,2] <- rnbinom(n, mu=1e4 * b, size=1/disp) counts[,3] <- rnbinom(n, mu=1e4 * c, size=1/disp) counts[,4] <- rnbinom(n, mu=1e4 * (.75*a + .25*b), size=1/disp) counts[,5] <- rnbinom(n, mu=1e4 * (.5*a + .5*b), size=1/disp) counts[,6] <- rnbinom(n, mu=1e4 * (.25*a + .75*b), size=1/disp) counts[,7] <- rnbinom(n, mu=1e4 * (.33*a + .33*b + .33*c), size=1/disp) counts[,8] <- rnbinom(n, mu=1e4 * (.25*a + .25*b + .5*c), size=1/disp) coldata <- data.frame(a=c(1,0,0,.75,.5,.25,.33,.25), b=c(0,1,0,.25,.5,.75,.33,.25), c=c(0,0,1, 0, 0, 0,.33,.5)) dds <- DESeqDataSetFromMatrix(counts, coldata, ~1) dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds, fitType="mean") vsd <- varianceStabilizingTransformation(dds, blind=FALSE) #library(ggplot2) #plotPCA(vsd, intgroup=c("a","b","c")) + geom_point(size=5) pure <- matrix(rnbinom(3*n,mu=1e4*c(a,b,c),size=1/disp),ncol=3) colnames(pure) <- c("a","b","c") x <- counts alpha <- attr(dispersionFunction(dds),"mean") mix <- unmix(counts, pure=pure, alpha=alpha, quiet=TRUE) max(abs(dds$a - mix[,1])) max(abs(dds$b - mix[,2])) max(abs(dds$c - mix[,3])) expect_lt(max(abs(dds$a - mix[,1])), .01) expect_lt(max(abs(dds$b - mix[,2])), .01) expect_lt(max(abs(dds$c - mix[,3])), .01) # test the shifted log (designed for TPMs) mix2 <- unmix(counts, pure=pure, shift=0.5, quiet=TRUE) }) DESeq2/tests/testthat/test_vst.R0000644000175400017540000000231213201671732017623 0ustar00biocbuildbiocbuildcontext("vst") test_that("vst works", { dds <- makeExampleDESeqDataSet(n=100, m=4) design(dds) <- ~ 1 dds <- estimateSizeFactors(dds) dds <- estimateDispersionsGeneEst(dds) dds <- estimateDispersionsFit(dds, fitType="parametric") vsd <- varianceStabilizingTransformation(dds, blind=FALSE) dds <- estimateDispersionsFit(dds, fitType="local") vsd <- varianceStabilizingTransformation(dds, blind=FALSE) dds <- estimateDispersionsFit(dds, fitType="mean") vsd <- varianceStabilizingTransformation(dds, blind=FALSE) # test VST basics/errors dds <- makeExampleDESeqDataSet(n=20, m=4) colnames(dds) <- NULL varianceStabilizingTransformation(dds) head(varianceStabilizingTransformation(assay(dds))) 
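# the original dds never had dispersions estimated, so extracting variance-stabilized data from it should error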
expect_error(getVarianceStabilizedData(dds)) # test just matrix vsd <- varianceStabilizingTransformation(counts(dds)) # test fast VST based on subsampling dds <- makeExampleDESeqDataSet(n=20000, m=10) vsd <- vst(dds) vsd <- vst(counts(dds)) # test VST and normalization factors dds <- makeExampleDESeqDataSet(n=100, m=10, betaSD=1.5) nf <- matrix(exp(rnorm(1000,0,.2)),ncol=10) normalizationFactors(dds) <- nf vsd <- varianceStabilizingTransformation(dds, fitType="local") }) DESeq2/tests/testthat/test_weights.R0000644000175400017540000000632713201671732020473 0ustar00biocbuildbiocbuildcontext("weights") test_that("weights work", { set.seed(1) dds <- makeExampleDESeqDataSet(n=10) dds <- DESeq(dds, quiet=TRUE) dds2 <- dds w <- matrix(1, nrow=nrow(dds), ncol=12) w[1,1] <- 0 assays(dds2)[["weights"]] <- w dds2 <- nbinomWaldTest(dds2) dds3 <- dds[,-1] dds3 <- nbinomWaldTest(dds3) expect_equal(results(dds2)$log2FoldChange[1], results(dds3)$log2FoldChange[1]) expect_equal(results(dds2)$lfcSE[1], results(dds3)$lfcSE[1]) expect_equal(mcols(dds2)[1,"deviance"],mcols(dds3)[1,"deviance"]) nf <- matrix(sizeFactors(dds),nrow=nrow(dds),ncol=ncol(dds),byrow=TRUE) o <- fitNbinomGLMsOptim(object=dds, modelMatrix=model.matrix(design(dds), colData(dds)), lambda=rep(1e-6, 2), rowsForOptim=1, rowStable=TRUE, normalizationFactors=nf, alpha_hat=dispersions(dds), weights=w, useWeights=TRUE, betaMatrix=matrix(0,nrow=nrow(dds),ncol=2), betaSE=matrix(0,nrow=nrow(dds),ncol=2), betaConv=rep(FALSE,nrow(dds)), beta_mat=matrix(0,nrow=nrow(dds),ncol=2), mu=matrix(0,nrow=nrow(dds),ncol=ncol(dds)), logLike=rep(0,nrow(dds))) expect_equal(results(dds3)$log2FoldChange[1], o$betaMatrix[1,2], tolerance=1e-4) set.seed(1) dds <- makeExampleDESeqDataSet(n=10) w <- matrix(1, nrow=nrow(dds), ncol=12) w[1,1] <- 0 assays(dds)[["weights"]] <- w dds <- DESeq(dds, betaPrior=TRUE, quiet=TRUE) design(dds) <- ~1 suppressWarnings({ dds <- DESeq(dds, quiet=TRUE) }) dds2 <- dds assays(dds2)[["weights"]] <- w dds2 <- nbinomWaldTest(dds2) dds3 <- dds[,-1] dds3 <- nbinomWaldTest(dds3) expect_equal(results(dds2)$log2FoldChange[1], results(dds3)$log2FoldChange[1]) expect_equal(results(dds2)$lfcSE[1], results(dds3)$lfcSE[1]) expect_equal(mcols(dds2)[1,"deviance"],mcols(dds3)[1,"deviance"]) set.seed(1) dds <- makeExampleDESeqDataSet(n=10) counts(dds)[1,1] <- 100L sizeFactors(dds) <- rep(1,12) dds <- estimateDispersions(dds) dds2 <- dds w <- matrix(1, nrow=nrow(dds), ncol=12) w[1,1] <- 0 assays(dds2)[["weights"]] <- w dds2 <- estimateDispersions(dds2) dds3 <- dds[,-1] dds3 <- estimateDispersions(dds3) expect_equal(mcols(dds2)[1,"dispGeneEst"],mcols(dds3)[1,"dispGeneEst"],tolerance=1e-3) # MAP estimates won't be equal because of different dispersion prior widths... 
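# so we only check the direction: zero weight on the outlier count should lower the MAP dispersion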
expect_true(mcols(dds)[1,"dispMAP"] > mcols(dds2)[1,"dispMAP"]) # test grid of weights ## set.seed(1) ## dds <- makeExampleDESeqDataSet(n=10, dispMeanRel=function(x) 0.01) ## counts(dds)[1,1] <- 100L ## sizeFactors(dds) <- rep(1,12) ## dds <- DESeq(dds, quiet=TRUE, fitType="mean") ## dds2 <- dds ## w <- matrix(1, nrow=nrow(dds), ncol=12) ## lfc <- sapply(1:11, function(i) { ## w[1,1] <- (i-1)/10 ## assays(dds2)[["weights"]] <- w ## dds2 <- DESeq(dds2, quiet=TRUE, fitType="mean") ## results(dds2)$log2FoldChange[1] ## }) ## plot((1:11-1)/10, lfc, type="b") ## abline(h=results(dds)$log2FoldChange[1]) }) DESeq2/vignettes/0000755000175400017540000000000013201712502014624 5ustar00biocbuildbiocbuildDESeq2/vignettes/DESeq2.Rmd0000644000175400017540000031311313201671732016326 0ustar00biocbuildbiocbuild--- title: "Analyzing RNA-seq data with DESeq2" author: "Michael I. Love, Simon Anders, and Wolfgang Huber" date: "`r BiocStyle::doc_date()`" package: "`r BiocStyle::pkg_ver('DESeq2')`" abstract: > A basic task in the analysis of count data from RNA-seq is the detection of differentially expressed genes. The count data are presented as a table which reports, for each sample, the number of sequence fragments that have been assigned to each gene. Analogous data also arise for other assay types, including comparative ChIP-Seq, HiC, shRNA screening, and mass spectrometry. An important analysis question is the quantification and statistical inference of systematic changes between conditions, as compared to within-condition variability. The package DESeq2 provides methods to test for differential expression by use of negative binomial generalized linear models; the estimates of dispersion and logarithmic fold changes incorporate data-driven prior distributions. This vignette explains the use of the package and demonstrates typical workflows. [An RNA-seq workflow](http://www.bioconductor.org/help/workflows/rnaseqGene/) on the Bioconductor website covers similar material to this vignette but at a slower pace, including the generation of count matrices from FASTQ files. DESeq2 package version: `r packageVersion("DESeq2")` output: rmarkdown::html_document: highlight: pygments toc: true fig_width: 5 bibliography: library.bib vignette: > %\VignetteIndexEntry{Analyzing RNA-seq data with DESeq2} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding[utf8]{inputenc} --- ```{r setup, echo=FALSE, results="hide"} knitr::opts_chunk$set(tidy=FALSE, cache=TRUE, dev="png", message=FALSE, error=FALSE, warning=TRUE) ``` # Standard workflow **Note:** if you use DESeq2 in published research, please cite: > Love, M.I., Huber, W., Anders, S. (2014) > Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. > *Genome Biology*, **15**:550. > [10.1186/s13059-014-0550-8](http://dx.doi.org/10.1186/s13059-014-0550-8) Other Bioconductor packages with similar aims are [edgeR](http://bioconductor.org/packages/edgeR), [limma](http://bioconductor.org/packages/limma), [DSS](http://bioconductor.org/packages/DSS), [EBSeq](http://bioconductor.org/packages/EBSeq), and [baySeq](http://bioconductor.org/packages/baySeq). ## Quick start Here we show the most basic steps for a differential expression analysis. There are a variety of steps upstream of DESeq2 that result in the generation of counts or estimated counts for each sample, which we will discuss in the sections below. This code chunk assumes that you have a count matrix called `cts` and a table of sample information called `coldata`.
The `design` indicates how to model the samples, here, that we want to measure the effect of the condition, controlling for batch differences. The two factor variables `batch` and `condition` should be columns of `coldata`. ```{r quickStart, eval=FALSE} dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design= ~ batch + condition) dds <- DESeq(dds) res <- results(dds, contrast=c("condition","treat","ctrl")) resultsNames(dds) res <- lfcShrink(dds, coef=2) ``` The following starting functions will be explained below: * If you have transcript quantification files, as produced by *Salmon*, *Sailfish*, or *kallisto*, you would use *DESeqDataSetFromTximport*. * If you have *htseq-count* files, the first line would use *DESeqDataSetFromHTSeqCount*. * If you have a *RangedSummarizedExperiment*, the first line would use *DESeqDataSet*. ## How to get help for DESeq2 Any and all DESeq2 questions should be posted to the **Bioconductor support site**, which serves as a searchable knowledge base of questions and answers: <https://support.bioconductor.org> Posting a question and tagging with "DESeq2" will automatically send an alert to the package authors to respond on the support site. See the first question in the list of [Frequently Asked Questions](#FAQ) (FAQ) for information about how to construct an informative post. You should **not** email your question to the package authors, as we will just reply that the question should be posted to the **Bioconductor support site**. ## Input data ### Why un-normalized counts? As input, the DESeq2 package expects count data as obtained, e.g., from RNA-seq or another high-throughput sequencing experiment, in the form of a matrix of integer values. The value in the *i*-th row and the *j*-th column of the matrix tells how many reads can be assigned to gene *i* in sample *j*. Analogously, for other types of assays, the rows of the matrix might correspond e.g. to binding regions (with ChIP-Seq) or peptide sequences (with quantitative mass spectrometry). We will list methods for obtaining count matrices in sections below. The values in the matrix should be un-normalized counts or estimated counts of sequencing reads (for single-end RNA-seq) or fragments (for paired-end RNA-seq). The [RNA-seq workflow](http://www.bioconductor.org/help/workflows/rnaseqGene/) describes multiple techniques for preparing such count matrices. It is important to provide count matrices as input for DESeq2's statistical model [@Love2014] to hold, as only the count values allow assessing the measurement precision correctly. The DESeq2 model internally corrects for library size, so transformed or normalized values such as counts scaled by library size should not be used as input. ### The DESeqDataSet The object class used by the DESeq2 package to store the read counts and the intermediate estimated quantities during statistical analysis is the *DESeqDataSet*, which will usually be represented in the code here as an object `dds`. A technical detail is that the *DESeqDataSet* class extends the *RangedSummarizedExperiment* class of the [SummarizedExperiment](http://bioconductor.org/packages/SummarizedExperiment) package. The "Ranged" part refers to the fact that the rows of the assay data (here, the counts) can be associated with genomic ranges (the exons of genes). This association facilitates downstream exploration of results, making use of other Bioconductor packages' range-based functionality (e.g. find the closest ChIP-seq peaks to the differentially expressed genes).
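For example, range-based operations from the *GenomicRanges* infrastructure apply directly to such an object (a minimal sketch, assuming a *DESeqDataSet* `dds` whose rows carry ranges; `roi` is a hypothetical region of interest):

```{r rangesSketch, eval=FALSE}
library("GenomicRanges")
gr <- rowRanges(dds) # the genomic ranges of the features (e.g. genes)
roi <- GRanges("chr1", IRanges(1e6, 2e6)) # hypothetical region of interest
ddsSub <- subsetByOverlaps(dds, roi) # keep features overlapping the region
```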
A *DESeqDataSet* object must have an associated *design formula*. The design formula expresses the variables which will be used in modeling. The formula should be a tilde (~) followed by the variables with plus signs between them (it will be coerced into a *formula* if it is not already). The design can be changed later, however then all differential analysis steps should be repeated, as the design formula is used to estimate the dispersions and to estimate the log2 fold changes of the model. *Note*: In order to benefit from the default settings of the package, you should put the variable of interest at the end of the formula and make sure the control level is the first level. We will now show 4 ways of constructing a *DESeqDataSet*, depending on what pipeline was used upstream of DESeq2 to generate counts or estimated counts: 1) From [transcript abundance files and tximport](#tximport) 2) From a [count matrix](#countmat) 3) From [htseq-count files](#htseq) 4) From a [SummarizedExperiment](#se) object ### Transcript abundance files and *tximport* input A newer and recommended pipeline is to use fast transcript abundance quantifiers upstream of DESeq2, and then to create gene-level count matrices for use with DESeq2 by importing the quantification data using the [tximport](http://bioconductor.org/packages/tximport) package. This workflow allows users to import transcript abundance estimates from a variety of external software, including the following methods: * [Salmon](http://combine-lab.github.io/salmon/) [@Patro2017Salmon] * [Sailfish](http://www.cs.cmu.edu/~ckingsf/software/sailfish/) [@Patro2014Sailfish] * [kallisto](https://pachterlab.github.io/kallisto/about.html) [@Bray2016Near] * [RSEM](http://deweylab.github.io/RSEM/) [@Li2011RSEM] Some advantages of using the above methods for transcript abundance estimation are: (i) this approach corrects for potential changes in gene length across samples (e.g. from differential isoform usage) [@Trapnell2013Differential], (ii) some of these methods (*Salmon*, *Sailfish*, *kallisto*) are substantially faster and require less memory and disk usage compared to alignment-based methods that require creation and storage of BAM files, and (iii) it is possible to avoid discarding those fragments that can align to multiple genes with homologous sequence, thus increasing sensitivity [@Robert2015Errors]. Full details on the motivation and methods for importing transcript level abundance and count estimates, summarizing to gene-level count matrices and producing an offset which corrects for potential changes in average transcript length across samples are described in [@Soneson2015]. Note that the tximport-to-DESeq2 approach uses *estimated* gene counts from the transcript abundance quantifiers, but not *normalized* counts. A tutorial on how to use the *Salmon* software for quantifying transcript abundance can be found [here](https://combine-lab.github.io/salmon/getting_started/). We recommend using the `--gcBias` [flag](http://salmon.readthedocs.io/en/latest/salmon.html#gcbias) which estimates a correction factor for systematic biases commonly present in RNA-seq data [@Love2016Modeling; @Patro2017Salmon], unless you are certain that your data do not contain such bias. Here, we demonstrate how to import transcript abundances and construct a gene-level *DESeqDataSet* object from *Salmon* `quant.sf` files, which are stored in the [tximportData](http://bioconductor.org/packages/tximportData) package.
You do not need the `tximportData` package for your analysis; it is only used here for demonstration. Note that, instead of locating `dir` using *system.file*, a user would typically just provide a path, e.g. `/path/to/quant/files`. For a typical use, the `condition` information should already be present as a column of the sample table `samples`, while here we construct artificial condition labels for demonstration.

```{r txiSetup}
library("tximport")
library("readr")
library("tximportData")
dir <- system.file("extdata", package="tximportData")
samples <- read.table(file.path(dir,"samples.txt"), header=TRUE)
samples$condition <- factor(rep(c("A","B"),each=3))
rownames(samples) <- samples$run
samples[,c("pop","center","run","condition")]
```

Next we specify the path to the files using the appropriate columns of `samples`, and we read in a table that links transcripts to genes for this dataset.

```{r txiFiles}
files <- file.path(dir,"salmon", samples$run, "quant.sf")
names(files) <- samples$run
tx2gene <- read.csv(file.path(dir, "tx2gene.csv"))
```

We import the necessary quantification data for DESeq2 using the *tximport* function. For further details on use of *tximport*, including the construction of the `tx2gene` table for linking transcripts to genes in your dataset, please refer to the [tximport](http://bioconductor.org/packages/tximport) package vignette.

```{r tximport, results="hide"}
txi <- tximport(files, type="salmon", tx2gene=tx2gene)
```

Finally, we can construct a *DESeqDataSet* from the `txi` object and sample information in `samples`.

```{r txi2dds, results="hide"}
library("DESeq2")
ddsTxi <- DESeqDataSetFromTximport(txi,
                                   colData = samples,
                                   design = ~ condition)
```

The `ddsTxi` object here can then be used as `dds` in the following analysis steps.

### Count matrix input

Alternatively, the function *DESeqDataSetFromMatrix* can be used if you already have a matrix of read counts prepared from another source. Another method for quickly producing count matrices from alignment files is the *featureCounts* function [@Liao2013feature] in the [Rsubread](http://bioconductor.org/packages/Rsubread) package. To use *DESeqDataSetFromMatrix*, the user should provide the counts matrix, the information about the samples (the columns of the count matrix) as a *DataFrame* or *data.frame*, and the design formula.

To demonstrate the use of *DESeqDataSetFromMatrix*, we will read in count data from the [pasilla](http://bioconductor.org/packages/pasilla) package. We read in a count matrix, which we will name `cts`, and the sample information table, which we will name `coldata`. Further below we describe how to extract these objects from, e.g., *featureCounts* output.

```{r loadPasilla}
library("pasilla")
pasCts <- system.file("extdata",
                      "pasilla_gene_counts.tsv",
                      package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata",
                       "pasilla_sample_annotation.csv",
                       package="pasilla", mustWork=TRUE)
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
coldata <- read.csv(pasAnno, row.names=1)
coldata <- coldata[,c("condition","type")]
```

We examine the count matrix and column data to see if they are consistent in terms of sample order.

```{r showPasilla}
head(cts,2)
coldata
```

Note that these are not in the same order with respect to samples! It is absolutely critical that the columns of the count matrix and the rows of the column data (information about samples) are in the same order.
DESeq2 will not make guesses as to which column of the count matrix belongs to which row of the column data; these must be provided to DESeq2 already in consistent order.

As they are not in the correct order as given, we need to re-arrange one or the other so that they are consistent in terms of sample order (if we do not, later functions would produce an error). We additionally need to chop off the `"fb"` of the row names of `coldata`, so the naming is consistent.

```{r reorderPasila}
rownames(coldata) <- sub("fb", "", rownames(coldata))
all(rownames(coldata) %in% colnames(cts))
all(rownames(coldata) == colnames(cts))
cts <- cts[, rownames(coldata)]
all(rownames(coldata) == colnames(cts))
```

If you have used the *featureCounts* function [@Liao2013feature] in the [Rsubread](http://bioconductor.org/packages/Rsubread) package, the matrix of read counts can be directly provided from the `"counts"` element in the list output. The count matrix and column data can typically be read into R from flat files using base R functions such as *read.csv* or *read.delim*. For *htseq-count* files, see the dedicated input function below.

With the count matrix, `cts`, and the sample information, `coldata`, we can construct a *DESeqDataSet*:

```{r matrixInput}
library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)
dds
```

If you have additional feature data, it can be added to the *DESeqDataSet* by adding to the metadata columns of a newly constructed object. (Here we add redundant data just for demonstration, as the gene names are already the rownames of the `dds`.)

```{r addFeatureData}
featureData <- data.frame(gene=rownames(cts))
mcols(dds) <- DataFrame(mcols(dds), featureData)
mcols(dds)
```

### *htseq-count* input

You can use the function *DESeqDataSetFromHTSeqCount* if you have used *htseq-count* from the [HTSeq](http://www-huber.embl.de/users/anders/HTSeq) Python package [@Anders:2014:htseq]. For an example of using the Python scripts, see the [pasilla](http://bioconductor.org/packages/pasilla) data package. First you will want to specify a variable which points to the directory in which the *htseq-count* output files are located.

```{r htseqDirI, eval=FALSE}
directory <- "/path/to/your/files/"
```

However, for demonstration purposes only, the following line of code points to the directory for the demo *htseq-count* output files packaged with the [pasilla](http://bioconductor.org/packages/pasilla) package.

```{r htseqDirII}
directory <- system.file("extdata", package="pasilla",
                         mustWork=TRUE)
```

We specify which files to read in using *list.files*, and select those files which contain the string `"treated"` using *grep*. The *sub* function is used to chop up the sample filename to obtain the condition status, or you might alternatively read in a phenotypic table using *read.table*.
```{r htseqInput}
sampleFiles <- grep("treated",list.files(directory),value=TRUE)
sampleCondition <- sub("(.*treated).*","\\1",sampleFiles)
sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName = sampleFiles,
                          condition = sampleCondition)
```

Then we build the *DESeqDataSet* using the following function:

```{r htseqDds}
library("DESeq2")
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                       directory = directory,
                                       design= ~ condition)
ddsHTSeq
```

### *SummarizedExperiment* input

An example of the steps to produce a *RangedSummarizedExperiment* can be found in the [RNA-seq workflow](http://www.bioconductor.org/help/workflows/rnaseqGene/) and in the vignette for the data package [airway](http://bioconductor.org/packages/airway). Here we load the *RangedSummarizedExperiment* from that package in order to build a *DESeqDataSet*.

```{r loadSumExp}
library("airway")
data("airway")
se <- airway
```

The constructor function below shows the generation of a *DESeqDataSet* from a *RangedSummarizedExperiment* `se`.

```{r sumExpInput}
library("DESeq2")
ddsSE <- DESeqDataSet(se, design = ~ cell + dex)
ddsSE
```

### Pre-filtering

While it is not necessary to pre-filter low count genes before running the DESeq2 functions, there are two reasons which make pre-filtering useful: by removing rows in which there are very few reads, we reduce the memory size of the `dds` data object, and we increase the speed of the transformation and testing functions within DESeq2. Here we perform a minimal pre-filtering to keep only rows that have at least 10 reads total. Note that more strict filtering to increase power is *automatically* applied via [independent filtering](#indfilt) on the mean of normalized counts within the *results* function.

```{r prefilter}
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
```

### Note on factor levels

By default, R will choose a *reference level* for factors based on alphabetical order. Then, if you never tell the DESeq2 functions which level you want to compare against (e.g. which level represents the control group), the comparisons will be based on the alphabetical order of the levels. There are two solutions: you can either explicitly tell *results* which comparison to make using the `contrast` argument (this will be shown later), or you can explicitly set the factor levels. You should only change the factor levels of variables in the design **before** running the DESeq2 analysis, not during or afterward. Setting the factor levels can be done in two ways, either using *factor*:

```{r factorlvl}
dds$condition <- factor(dds$condition, levels = c("untreated","treated"))
```

...or using *relevel*, just specifying the reference level:

```{r relevel}
dds$condition <- relevel(dds$condition, ref = "untreated")
```

If you need to subset the columns of a *DESeqDataSet*, i.e., when removing certain samples from the analysis, it is possible that all the samples for one or more levels of a variable in the design formula would be removed. In this case, the *droplevels* function can be used to remove those levels which do not have samples in the current *DESeqDataSet*:

```{r droplevels}
dds$condition <- droplevels(dds$condition)
```

### Collapsing technical replicates

DESeq2 provides a function *collapseReplicates* which can assist in combining the counts from technical replicates into single columns of the count matrix. The term *technical replicate* implies multiple sequencing runs of the same library. You should not collapse biological replicates using this function.
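As a minimal sketch (not run here), assuming hypothetical colData columns `sample`, identifying the biological sample, and `run`, identifying the sequencing run:

```{r collapseSketch, eval=FALSE}
# columns of dds sharing the same value of dds$sample are summed
# into a single column of the count matrix (illustrative only)
ddsColl <- collapseReplicates(dds, groupby = dds$sample, run = dds$run)
colData(ddsColl)
```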
See the manual page for a full example of the use of *collapseReplicates*.

### About the pasilla dataset

We continue with the [pasilla](http://bioconductor.org/packages/pasilla) data constructed from the count matrix method above. This data set comes from an experiment on *Drosophila melanogaster* cell cultures which investigated the effect of RNAi knock-down of the splicing factor *pasilla* [@Brooks2010]. A detailed account of the production of the [pasilla](http://bioconductor.org/packages/pasilla) data is provided in the vignette of the data package [pasilla](http://bioconductor.org/packages/pasilla).

## Differential expression analysis

The standard differential expression analysis steps are wrapped into a single function, *DESeq*. The estimation steps performed by this function are described [below](#theory), in the manual page for `?DESeq` and in the Methods section of the DESeq2 publication [@Love2014].

Results tables are generated using the function *results*, which extracts a results table with log2 fold changes, *p* values and adjusted *p* values. With no additional arguments to *results*, the log2 fold change and Wald test *p* value will be for the last variable in the design formula, and if this is a factor, the comparison will be the last level of this variable over the first level. However, the order of the variables of the design does not matter so long as the user specifies the comparison using the `name` or `contrast` arguments of *results* (described later and in `?results`).

Details about the comparison are printed to the console, above the results table. The text, `condition treated vs untreated`, tells you that the estimates are of the logarithmic fold change log2(treated/untreated).

```{r deseq}
dds <- DESeq(dds)
res <- results(dds)
res
```

In previous versions of DESeq2, the *DESeq* function by default would produce moderated, or shrunken, log2 fold changes through the use of the `betaPrior` argument. In version 1.16 and higher, we have split the moderation of log2 fold changes into a separate function, *lfcShrink*, for reasons described in the [changes section](#changes) below.

Here we provide the `dds` object and the number of the coefficient we want to moderate. It is also possible to specify a `contrast`, instead of `coef`, which works the same as the `contrast` argument of the *results* function. If a results object is provided, the `log2FoldChange` column will be swapped out; otherwise *lfcShrink* returns a vector of shrunken log2 fold changes.

```{r lfcShrink}
resultsNames(dds)
resLFC <- lfcShrink(dds, coef=2)
resLFC
```

The above steps should take less than 30 seconds for most analyses. For experiments with many samples (e.g. 100 samples), one can take advantage of parallelized computation. Parallelizing `DESeq`, `results`, and `lfcShrink` can be easily accomplished by loading the BiocParallel package, and then setting the following arguments: `parallel=TRUE` and `BPPARAM=MulticoreParam(4)`, for example, splitting the job over 4 cores. Note that `results` for coefficients or contrasts listed in `resultsNames(dds)` is fast and will not need parallelization. As an alternative to `BPPARAM`, one can `register` cores at the beginning of an analysis, and then just specify `parallel=TRUE` to the functions when called.
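For concreteness, a sketch of the per-call style (the `register` alternative is shown in the next chunk):

```{r parallelSketch, eval=FALSE}
library("BiocParallel")
# run the fitting and testing steps across 4 cores
dds <- DESeq(dds, parallel=TRUE, BPPARAM=MulticoreParam(4))
```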
```{r parallel, eval=FALSE}
library("BiocParallel")
register(MulticoreParam(4))
```

We can order our results table by the smallest *p* value:

```{r resOrder}
resOrdered <- res[order(res$pvalue),]
```

We can summarize some basic tallies using the *summary* function.

```{r sumRes}
summary(res)
```

How many adjusted p-values were less than 0.1?

```{r sumRes01}
sum(res$padj < 0.1, na.rm=TRUE)
```

The *results* function contains a number of arguments to customize the results table which is generated. You can read about these arguments by looking up `?results`. Note that the *results* function automatically performs independent filtering based on the mean of normalized counts for each gene, optimizing the number of genes which will have an adjusted *p* value below a given FDR cutoff, `alpha`. Independent filtering is further discussed [below](#indfilt). By default the argument `alpha` is set to $0.1$. If the adjusted *p* value cutoff will be a value other than $0.1$, `alpha` should be set to that value:

```{r resAlpha05}
res05 <- results(dds, alpha=0.05)
summary(res05)
sum(res05$padj < 0.05, na.rm=TRUE)
```

A generalization of the idea of *p* value filtering is to *weight* hypotheses to optimize power. A Bioconductor package, [IHW](http://bioconductor.org/packages/IHW), is available that implements the method of *Independent Hypothesis Weighting* [@Ignatiadis2016]. Here we show the use of *IHW* for *p* value adjustment of DESeq2 results. For more details, please see the vignette of the [IHW](http://bioconductor.org/packages/IHW) package. The *IHW* result object is stored in the metadata.

**Note:** If the results of independent hypothesis weighting are used in published research, please cite:

> Ignatiadis, N., Klaus, B., Zaugg, J.B., Huber, W. (2016)
> Data-driven hypothesis weighting increases detection power in genome-scale multiple testing.
> *Nature Methods*, **13**:7.
> [10.1038/nmeth.3885](http://dx.doi.org/10.1038/nmeth.3885)

```{r IHW}
library("IHW")
resIHW <- results(dds, filterFun=ihw)
summary(resIHW)
sum(resIHW$padj < 0.1, na.rm=TRUE)
metadata(resIHW)$ihwResult
```

If a multi-factor design is used, or if the variable in the design formula has more than two levels, the `contrast` argument of *results* can be used to extract different comparisons from the *DESeqDataSet* returned by *DESeq*. The use of the `contrast` argument is further discussed [below](#contrasts).

For advanced users, note that all the values calculated by the DESeq2 package are stored in the *DESeqDataSet* object, and access to these values is discussed [below](#access).

## Exploring and exporting results

### MA-plot

In DESeq2, the function *plotMA* shows the log2 fold changes attributable to a given variable over the mean of normalized counts for all the samples in the *DESeqDataSet*. Points will be colored red if the adjusted *p* value is less than 0.1. Points which fall out of the window are plotted as open triangles pointing either up or down.

```{r MA}
plotMA(res, ylim=c(-2,2))
```

It is more useful to visualize the MA-plot for the shrunken log2 fold changes, which remove the noise associated with log2 fold changes from low count genes without requiring arbitrary filtering thresholds.

```{r shrunkMA}
plotMA(resLFC, ylim=c(-2,2))
```

After calling *plotMA*, one can use the function *identify* to interactively detect the row number of individual genes by clicking on the plot.
One can then recover the gene identifiers by saving the resulting indices:

```{r MAidentify, eval=FALSE}
idx <- identify(res$baseMean, res$log2FoldChange)
rownames(res)[idx]
```

### Alternative shrinkage estimators

The moderated log fold changes proposed by @Love2014 use a normal prior distribution, centered on zero and with a scale that is fit to the data. The shrunken log fold changes are useful for ranking and visualization, without the need for arbitrary filters on low count genes. The normal prior can sometimes produce overly strong shrinkage for certain datasets. In DESeq2 version 1.18, we include two additional adaptive shrinkage estimators, available via the `type` argument of `lfcShrink`. For more details, see `?lfcShrink`.

The options for `type` are:

* `normal` is the original DESeq2 shrinkage estimator, an adaptive normal prior
* `apeglm` is the adaptive t prior shrinkage estimator from the [apeglm](http://bioconductor.org/packages/apeglm) package
* `ashr` is the adaptive shrinkage estimator from the [ashr](https://github.com/stephens999/ashr) package [@Stephens2016]. Here DESeq2 uses the ashr option to fit a mixture of normal distributions to form the prior, with `method="shrink"`.

**Note:** if the shrinkage estimator `type="ashr"` is used in published research, please cite:

> Stephens, M. (2016)
> False discovery rates: a new deal. *Biostatistics*, **18**:2.
> [10.1093/biostatistics/kxw041](https://doi.org/10.1093/biostatistics/kxw041)

```{r warning=FALSE}
resApe <- lfcShrink(dds, coef=2, type="apeglm")
resAsh <- lfcShrink(dds, coef=2, type="ashr")
```

```{r fig.width=8, fig.height=3}
par(mfrow=c(1,3), mar=c(4,4,2,1))
xlim <- c(1,1e5); ylim <- c(-3,3)
plotMA(resLFC, xlim=xlim, ylim=ylim, main="normal")
plotMA(resApe, xlim=xlim, ylim=ylim, main="apeglm")
plotMA(resAsh, xlim=xlim, ylim=ylim, main="ashr")
```

**Note:** due to the nature of the statistical model and optimization approach, `apeglm` is usually a factor of ~5 slower than `normal`. For example, with 10,000 genes and 10 samples, `normal` may take ~3 seconds, while `apeglm` takes ~15 seconds (on a laptop). However, `apeglm` can be more than an order of magnitude slower when there are many coefficients, e.g. 10 or more coefficients in `resultsNames(dds)`. The method `ashr` is fairly fast and does not depend on the number of coefficients, as it uses only the estimated MLE coefficients and their standard errors.

A solution for speeding up `normal` and `apeglm` is to use multiple cores. This can be easily accomplished by loading the BiocParallel package, and then setting the following arguments of `lfcShrink`: `parallel=TRUE` and `BPPARAM=MulticoreParam(4)`, for example, splitting the job over 4 cores. This approach can also be used with `DESeq` and `results`, as mentioned [above](#parallel).

**Note:** If there is unwanted variation present in the data (e.g. batch effects) it is always recommended to correct for this, which can be accommodated in DESeq2 by including in the design any known batch variables or by using functions/packages such as `svaseq` in [sva](http://bioconductor.org/packages/sva) [@Leek2014] or the `RUV` functions in [RUVSeq](http://bioconductor.org/packages/RUVSeq) [@Risso2014] to estimate variables that capture the unwanted variation. In addition, the ashr developers have a [specific method](https://github.com/dcgerard/vicar) for accounting for unwanted variation in combination with ashr [@Gerard2017].
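As a minimal sketch of the first option (assuming a known batch variable `batch` is already a column of `colData(dds)`; this chunk is illustrative and not run):

```{r knownBatchSketch, eval=FALSE}
# include the known batch variable in the design, keeping the variable
# of interest last, then re-run the analysis
design(dds) <- ~ batch + condition
dds <- DESeq(dds)
```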
### Plot counts

It can also be useful to examine the counts of reads for a single gene across the groups. A simple function for making this plot is *plotCounts*, which normalizes counts by sequencing depth and adds a pseudocount of 1/2 to allow for log scale plotting. The counts are grouped by the variables in `intgroup`, where more than one variable can be specified. Here we specify the gene which had the smallest *p* value from the results table created above. You can select the gene to plot by rowname or by numeric index.

```{r plotCounts}
plotCounts(dds, gene=which.min(res$padj), intgroup="condition")
```

For customized plotting, an argument `returnData` specifies that the function should only return a *data.frame* for plotting with *ggplot*.

```{r plotCountsAdv}
d <- plotCounts(dds, gene=which.min(res$padj), intgroup="condition",
                returnData=TRUE)
library("ggplot2")
ggplot(d, aes(x=condition, y=count)) +
  geom_point(position=position_jitter(w=0.1,h=0)) +
  scale_y_log10(breaks=c(25,100,400))
```

### More information on results columns

Information about which variables and tests were used can be found by calling the function *mcols* on the results object.

```{r metadata}
mcols(res)$description
```

For a particular gene, a log2 fold change of -1 for `condition treated vs untreated` means that the treatment induces a multiplicative change in observed gene expression level of $2^{-1} = 0.5$ compared to the untreated condition. If the variable of interest is continuous-valued, then the reported log2 fold change is per unit of change of that variable.

**Note on p-values set to NA**: some values in the results table can be set to `NA` for one of the following reasons:

* If within a row, all samples have zero counts, the `baseMean` column will be zero, and the log2 fold change estimates, *p* value and adjusted *p* value will all be set to `NA`.
* If a row contains a sample with an extreme count outlier then the *p* value and adjusted *p* value will be set to `NA`. These outlier counts are detected by Cook's distance. Customization of this outlier filtering and description of functionality for replacement of outlier counts and refitting is described [below](#outlier).
* If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted *p* value will be set to `NA`. Description and customization of independent filtering is described [below](#indfilt).

### Rich visualization and reporting of results

**ReportingTools.** An HTML report of the results with plots and sortable/filterable columns can be generated using the [ReportingTools](http://bioconductor.org/packages/ReportingTools) package on a *DESeqDataSet* that has been processed by the *DESeq* function. For a code example, see the *RNA-seq differential expression* vignette at the [ReportingTools](http://bioconductor.org/packages/ReportingTools) page, or the manual page for the *publish* method for the *DESeqDataSet* class.

**regionReport.** An HTML and PDF summary of the results with plots can also be generated using the [regionReport](http://bioconductor.org/packages/regionReport) package. The *DESeq2Report* function should be run on a *DESeqDataSet* that has been processed by the *DESeq* function. For more details see the manual page for *DESeq2Report* and an example vignette in the [regionReport](http://bioconductor.org/packages/regionReport) package.
**Glimma.** Interactive visualization of DESeq2 output, including MA-plots (also called MD-plot) can be generated using the [Glimma](http://bioconductor.org/packages/Glimma) package. See the manual page for *glMDPlot.DESeqResults*.

**pcaExplorer.** Interactive visualization of DESeq2 output, including PCA plots, boxplots of counts and other useful summaries can be generated using the [pcaExplorer](http://bioconductor.org/packages/pcaExplorer) package. See the *Launching the application* section of the package vignette.

### Exporting results to CSV files

A plain-text file of the results can be exported using the base R functions *write.csv* or *write.table*. We suggest using a descriptive file name indicating the variable and levels which were tested.

```{r export, eval=FALSE}
write.csv(as.data.frame(resOrdered),
          file="condition_treated_results.csv")
```

Exporting only the results which pass an adjusted *p* value threshold can be accomplished with the *subset* function, followed by the *write.csv* function.

```{r subset}
resSig <- subset(resOrdered, padj < 0.1)
resSig
```

## Multi-factor designs

Experiments with more than one factor influencing the counts can be analyzed using design formulas that include the additional variables. In fact, DESeq2 can analyze any possible experimental design that can be expressed with fixed effects terms (multiple factors, designs with interactions, designs with continuous variables, splines, and so on are all possible).

By adding variables to the design, one can control for additional variation in the counts. For example, if the condition samples are balanced across experimental batches, by including the `batch` factor in the design, one can increase the sensitivity for finding differences due to `condition`. There are multiple ways to analyze experiments when the additional variables are of interest and not just controlling factors (see [section on interactions](#interactions)).

The data in the [pasilla](http://bioconductor.org/packages/pasilla) package have a condition of interest (the column `condition`), as well as information on the type of sequencing which was performed (the column `type`), as we can see below:

```{r multifactor}
colData(dds)
```

We create a copy of the *DESeqDataSet*, so that we can rerun the analysis using a multi-factor design.

```{r copyMultifactor}
ddsMF <- dds
```

We change the levels of `type` so it only contains letters (numbers, underscore and period are also allowed in design factor levels). Be careful when changing level names to use the same order as the current levels.

```{r fixLevels}
levels(ddsMF$type)
levels(ddsMF$type) <- sub("-.*", "", levels(ddsMF$type))
levels(ddsMF$type)
```

We can account for the different types of sequencing, and get a clearer picture of the differences attributable to the treatment. As `condition` is the variable of interest, we put it at the end of the formula. Thus the *results* function will by default pull the `condition` results unless `contrast` or `name` arguments are specified. Then we can re-run *DESeq*:

```{r replaceDesign}
design(ddsMF) <- formula(~ type + condition)
ddsMF <- DESeq(ddsMF)
```

Again, we access the results using the *results* function.

```{r multiResults}
resMF <- results(ddsMF)
head(resMF)
```

It is also possible to retrieve the log2 fold changes, *p* values and adjusted *p* values of the `type` variable.
The `contrast` argument of the function *results* takes a character vector of length three: the name of the variable, the name of the factor level for the numerator of the log2 ratio, and the name of the factor level for the denominator. The `contrast` argument can also take other forms, as described in the help page for *results* and [below](#contrasts).

```{r multiTypeResults}
resMFType <- results(ddsMF,
                     contrast=c("type", "single", "paired"))
head(resMFType)
```

If the variable is continuous or an interaction term (see [section on interactions](#interactions)) then the results can be extracted using the `name` argument to *results*, where the name is one of the elements returned by `resultsNames(dds)`.

# Data transformations and visualization

## Count data transformations

In order to test for differential expression, we operate on raw counts and use discrete distributions as described in the previous section on differential expression. However, for other downstream analyses -- e.g. for visualization or clustering -- it might be useful to work with transformed versions of the count data.

Perhaps the most obvious choice of transformation is the logarithm. Since count values for a gene can be zero in some conditions (and non-zero in others), some advocate the use of *pseudocounts*, i.e. transformations of the form:

$$ y = \log_2(n + n_0) $$

where *n* represents the count values and $n_0$ is a positive constant.

In this section, we discuss two alternative approaches that offer more theoretical justification and a rational way of choosing parameters equivalent to $n_0$ above. One makes use of the concept of variance stabilizing transformations (VST) [@Tibshirani1988; @sagmb2003; @Anders:2010:GB], and the other is the *regularized logarithm* or *rlog*, which incorporates a prior on the sample differences [@Love2014]. Both transformations produce transformed data on the log2 scale which has been normalized with respect to library size or other normalization factors.

The point of these two transformations, the VST and the *rlog*, is to remove the dependence of the variance on the mean, particularly the high variance of the logarithm of count data when the mean is low. Both VST and *rlog* use the experiment-wide trend of variance over mean, in order to transform the data to remove the experiment-wide trend. Note that we do not require or desire that all the genes have *exactly* the same variance after transformation. Indeed, in a figure below, you will see that after the transformations the genes with the same mean do not have exactly the same standard deviations, but that the experiment-wide trend has flattened. It is those genes with row variance above the trend which will allow us to cluster samples into interesting groups.

**Note on running time:** if you have many samples (e.g. 100s), the *rlog* function might take too long, and so the *vst* function will be a faster choice. The rlog and VST have similar properties, but the rlog requires fitting a shrinkage term for each sample and each gene which takes time. See the DESeq2 paper for more discussion on the differences [@Love2014].

### Blind dispersion estimation

The two functions, *vst* and *rlog*, have an argument `blind`, for whether the transformation should be blind to the sample information specified by the design formula. When `blind` equals `TRUE` (the default), the functions will re-estimate the dispersions using only an intercept.
This setting should be used in order to compare samples in a manner wholly unbiased by the information about experimental groups, for example to perform sample QA (quality assurance) as demonstrated below. However, blind dispersion estimation is not the appropriate choice if one expects that many or the majority of genes (rows) will have large differences in counts which are explainable by the experimental design, and one wishes to transform the data for downstream analysis. In this case, using blind dispersion estimation will lead to large estimates of dispersion, as it treats differences due to the experimental design as unwanted *noise*, and will result in overly shrinking the transformed values towards each other. By setting `blind` to `FALSE`, the dispersions already estimated will be used to perform transformations, or if not present, they will be estimated using the current design formula. Note that only the fitted dispersion estimates from the mean-dispersion trend line are used in the transformation (the global dependence of dispersion on mean for the entire experiment). So setting `blind` to `FALSE` is still for the most part not using the information about which samples were in which experimental group in applying the transformation.

### Extracting transformed values

These transformation functions return an object of class *DESeqTransform* which is a subclass of *RangedSummarizedExperiment*. For ~20 samples, running on a newly created `DESeqDataSet`, *rlog* may take 30 seconds, while *vst* takes less than 1 second. The running times are shorter when using `blind=FALSE` and if the function *DESeq* has already been run, because then it is not necessary to re-estimate the dispersion values. The *assay* function is used to extract the matrix of normalized values.

```{r rlogAndVST}
vsd <- vst(dds, blind=FALSE)
rld <- rlog(dds, blind=FALSE)
head(assay(vsd), 3)
```

### Variance stabilizing transformation

Above, we used a parametric fit for the dispersion. In this case, the closed-form expression for the variance stabilizing transformation is used by the *vst* function. If a local fit is used (option `fitType="locfit"` to *estimateDispersions*) a numerical integration is used instead. The transformed data should be approximately variance stabilized and also include correction for size factors or normalization factors. The transformed data is on the log2 scale for large counts.

### Regularized log transformation

The function *rlog* stands for *regularized log*; it transforms the original count data to the log2 scale by fitting a model with a term for each sample and a prior distribution on the coefficients which is estimated from the data. This is the same kind of shrinkage (sometimes referred to as regularization, or moderation) of log fold changes used by *DESeq* and *nbinomWaldTest*. The resulting data contains elements defined as:

$$ \log_2(q_{ij}) = \beta_{i0} + \beta_{ij} $$

where $q_{ij}$ is a parameter proportional to the expected true concentration of fragments for gene *i* and sample *j* (see formula [below](#theory)), $\beta_{i0}$ is an intercept which does not undergo shrinkage, and $\beta_{ij}$ is the sample-specific effect which is shrunk toward zero based on the dispersion-mean trend over the entire dataset. The trend typically captures high dispersions for low counts, and therefore these genes exhibit higher shrinkage from the *rlog*.
Note that, as $q_{ij}$ represents the part of the mean value $\mu_{ij}$ after the size factor $s_j$ has been divided out, it is clear that the rlog transformation inherently accounts for differences in sequencing depth. Without priors, this design matrix would lead to a non-unique solution, however the addition of a prior on non-intercept betas allows for a unique solution to be found.

### Effects of transformations on the variance

The figure below plots the standard deviation of the transformed data, across samples, against the mean, using the shifted logarithm transformation, the regularized log transformation and the variance stabilizing transformation. The shifted logarithm has elevated standard deviation in the lower count range, and the regularized log to a lesser extent, while for the variance stabilized data the standard deviation is roughly constant along the whole dynamic range.

Note that the vertical axis in such plots is the square root of the variance over all samples, thus including the variance due to the experimental conditions. While a flat curve of the square root of variance over the mean may seem like the goal of such transformations, this may be unreasonable in the case of datasets with many true differences due to the experimental conditions.

```{r meansd}
# this gives log2(n + 1)
ntd <- normTransform(dds)
library("vsn")
meanSdPlot(assay(ntd))
meanSdPlot(assay(vsd))
meanSdPlot(assay(rld))
```

## Data quality assessment by sample clustering and visualization

Data quality assessment and quality control (i.e. the removal of insufficiently good data) are essential steps of any data analysis. These steps should typically be performed very early in the analysis of a new data set, preceding or in parallel to the differential expression testing.

We define the term *quality* as *fitness for purpose*. Our purpose is the detection of differentially expressed genes, and we are looking in particular for samples whose experimental treatment suffered from an abnormality that renders the data points obtained from these particular samples detrimental to our purpose.

### Heatmap of the count matrix

To explore a count matrix, it is often instructive to look at it as a heatmap. Below we show how to produce such a heatmap for various transformations of the data.

```{r heatmap}
library("pheatmap")
select <- order(rowMeans(counts(dds,normalized=TRUE)),
                decreasing=TRUE)[1:20]
df <- as.data.frame(colData(dds)[,c("condition","type")])
pheatmap(assay(ntd)[select,], cluster_rows=FALSE, show_rownames=FALSE,
         cluster_cols=FALSE, annotation_col=df)
pheatmap(assay(vsd)[select,], cluster_rows=FALSE, show_rownames=FALSE,
         cluster_cols=FALSE, annotation_col=df)
pheatmap(assay(rld)[select,], cluster_rows=FALSE, show_rownames=FALSE,
         cluster_cols=FALSE, annotation_col=df)
```

### Heatmap of the sample-to-sample distances

Another use of the transformed data is sample clustering. Here, we apply the *dist* function to the transpose of the transformed count matrix to get sample-to-sample distances.

```{r sampleClust}
sampleDists <- dist(t(assay(vsd)))
```

A heatmap of this distance matrix gives us an overview of similarities and dissimilarities between samples. We have to provide a hierarchical clustering `hc` to the heatmap function based on the sample distances, or else the heatmap function would calculate a clustering based on the distances between the rows/columns of the distance matrix.
```{r figHeatmapSamples, fig.height=4, fig.width=6}
library("RColorBrewer")
sampleDistMatrix <- as.matrix(sampleDists)
rownames(sampleDistMatrix) <- paste(vsd$condition, vsd$type, sep="-")
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
         clustering_distance_rows=sampleDists,
         clustering_distance_cols=sampleDists,
         col=colors)
```

### Principal component plot of the samples

Related to the distance matrix is the PCA plot, which shows the samples in the 2D plane spanned by their first two principal components. This type of plot is useful for visualizing the overall effect of experimental covariates and batch effects.

```{r figPCA}
plotPCA(vsd, intgroup=c("condition", "type"))
```

It is also possible to customize the PCA plot using the *ggplot* function.

```{r figPCA2}
pcaData <- plotPCA(vsd, intgroup=c("condition", "type"), returnData=TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color=condition, shape=type)) +
  geom_point(size=3) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) +
  coord_fixed()
```

# Variations to the standard workflow

## Wald test individual steps

The function *DESeq* runs the following functions in order:

```{r WaldTest, eval=FALSE}
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
```

## Contrasts

A contrast is a linear combination of estimated log2 fold changes, which can be used to test if differences between groups are equal to zero. The simplest use case for contrasts is an experimental design containing a factor with three levels, say A, B and C. Contrasts enable the user to generate results for all 3 possible differences: log2 fold change of B vs A, of C vs A, and of C vs B. The `contrast` argument of the *results* function is used to extract test results of log2 fold changes of interest, for example:

```{r simpleContrast, eval=FALSE}
results(dds, contrast=c("condition","C","B"))
```

Log2 fold changes can also be added and subtracted by providing a `list` to the `contrast` argument which has two elements: the names of the log2 fold changes to add, and the names of the log2 fold changes to subtract. The names used in the list should come from `resultsNames(dds)`. Alternatively, a numeric vector of the length of `resultsNames(dds)` can be provided, for manually specifying the linear combination of terms. Demonstrations of the use of contrasts for various designs can be found in the examples section of the help page for the *results* function. The mathematical formula that is used to generate the contrasts can be found [below](#theory).

## Interactions

Interaction terms can be added to the design formula, in order to test, for example, if the log2 fold change attributable to a given condition is *different* based on another factor, for example if the condition effect differs across genotype.

Many users begin to add interaction terms to the design formula, when in fact a much simpler approach would give all the results tables that are desired. We will explain this approach first, because it is much simpler to perform.
If the comparisons of interest are, for example, the effect of a condition for different sets of samples, a simpler approach than adding interaction terms explicitly to the design formula is to perform the following steps:

* combine the factors of interest into a single factor with all combinations of the original factors
* change the design to include just this factor, e.g. `~ group`

Using this design is similar to adding an interaction term, in that it models multiple condition effects which can be easily extracted with *results*. Suppose we have two factors `genotype` (with values I, II, and III) and `condition` (with values A and B), and we want to extract the condition effect specifically for each genotype. We could use the following approach to obtain, e.g. the condition effect for genotype I:

```{r combineFactors, eval=FALSE}
dds$group <- factor(paste0(dds$genotype, dds$condition))
design(dds) <- ~ group
dds <- DESeq(dds)
resultsNames(dds)
results(dds, contrast=c("group", "IB", "IA"))
```

The following two plots diagram hypothetical genotype-specific condition effects, which could be modeled with interaction terms by using a design of `~genotype + condition + genotype:condition`.

In the first plot (Gene 1), note that the condition effect is consistent across genotypes. Although condition A has a different baseline for I, II, and III, the condition effect is a log2 fold change of about 2 for each genotype. Using a model with an interaction term `genotype:condition`, the interaction terms for genotype II and genotype III will be nearly 0.

Here, the y-axis represents log2(n+1), and each group has 20 samples (black dots). A red line connects the mean of the groups within each genotype.

```{r interFig, echo=FALSE, results="hide", fig.height=3}
npg <- 20
mu <- 2^c(8,10,9,11,10,12)
cond <- rep(rep(c("A","B"),each=npg),3)
geno <- rep(c("I","II","III"),each=2*npg)
table(cond, geno)
counts <- rnbinom(6*npg, mu=rep(mu,each=npg), size=1/.01)
d <- data.frame(log2c=log2(counts+1), cond, geno)
library("ggplot2")
plotit <- function(d, title) {
  ggplot(d, aes(x=cond, y=log2c, group=geno)) +
    geom_jitter(size=1.5, position = position_jitter(width=.15)) +
    facet_wrap(~ geno) +
    stat_summary(fun.y=mean, geom="line", colour="red", size=0.8) +
    xlab("condition") + ylab("log2(counts+1)") + ggtitle(title)
}
plotit(d, "Gene 1") + ylim(7,13)
lm(log2c ~ cond + geno + geno:cond, data=d)
```

In the second plot (Gene 2), we can see that the condition effect is not consistent across genotype. Here the main condition effect (the effect for the reference genotype I) is again 2. However, this time the interaction terms will be around 1 for genotype II and -4 for genotype III. This is because the condition effect is higher by 1 for genotype II compared to genotype I, and lower by 4 for genotype III compared to genotype I. The condition effect for genotype II (or III) is obtained by adding the main condition effect and the interaction term for that genotype. Such a plot can be made using the *plotCounts* function as shown above.

```{r interFig2, echo=FALSE, results="hide", fig.height=3}
mu[4] <- 2^12
mu[6] <- 2^8
counts <- rnbinom(6*npg, mu=rep(mu,each=npg), size=1/.01)
d2 <- data.frame(log2c=log2(counts + 1), cond, geno)
plotit(d2, "Gene 2") + ylim(7,13)
lm(log2c ~ cond + geno + geno:cond, data=d2)
```

Now we will continue to explain the use of interactions in order to test for *differences* in condition effects. We continue with the example of condition effects across three genotypes (I, II, and III).
The key point to remember about designs with interaction terms is that, unlike for a design `~genotype + condition`, where the condition effect represents the *overall* effect controlling for differences due to genotype, by adding `genotype:condition`, the main condition effect only represents the effect of condition for the *reference level* of genotype (I, or whichever level was defined by the user as the reference level). The interaction terms `genotypeII.conditionB` and `genotypeIII.conditionB` give the *difference* between the condition effect for a given genotype and the condition effect for the reference genotype.

This genotype-condition interaction example is examined in further detail in Example 3 in the help page for *results*, which can be found by typing `?results`. In particular, we show how to test for differences in the condition effect across genotype, and we show how to obtain the condition effect for non-reference genotypes.

## Time-series experiments

There are a number of ways to analyze time-series experiments, depending on the biological question of interest. In order to test for any differences over multiple time points, one can use a design including the time factor, and then test using the likelihood ratio test as described in the following section, where the time factor is removed in the reduced formula. For a control and treatment time series, one can use a design formula containing the condition factor, the time factor, and the interaction of the two. In this case, using the likelihood ratio test with a reduced model which does not contain the interaction terms will test whether the condition induces a change in gene expression at any time point after the reference level time point (time 0). An example of the latter analysis is provided in our [RNA-seq workflow](http://www.bioconductor.org/help/workflows/rnaseqGene).

## Likelihood ratio test

DESeq2 offers two kinds of hypothesis tests: the Wald test, where we use the estimated standard error of a log2 fold change to test if it is equal to zero, and the likelihood ratio test (LRT). The LRT examines two models for the counts, a *full* model with a certain number of terms and a *reduced* model, in which some of the terms of the *full* model are removed. The test determines if the increased likelihood of the data using the extra terms in the *full* model is more than expected if those extra terms are truly zero.

The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables. The LRT for count data is conceptually similar to an analysis of variance (ANOVA) calculation in linear regression, except that in the case of the Negative Binomial GLM, we use an analysis of deviance (ANODEV), where the *deviance* captures the difference in likelihood between a full and a reduced model.

The likelihood ratio test can be performed by specifying `test="LRT"` when using the *DESeq* function, and providing a reduced design formula, e.g. one in which a number of terms from `design(dds)` are removed. The degrees of freedom for the test are obtained from the difference between the number of parameters in the two models. A simple likelihood ratio test, if the full design was `~condition`, would look like:

```{r simpleLRT, eval=FALSE}
dds <- DESeq(dds, test="LRT", reduced=~1)
res <- results(dds)
```

If the full design contained other variables, such as a batch variable, e.g.
`~batch + condition` then the likelihood ratio test would look like:

```{r simpleLRT2, eval=FALSE}
dds <- DESeq(dds, test="LRT", reduced=~batch)
res <- results(dds)
```

## Approach to count outliers

RNA-seq data sometimes contain isolated instances of very large counts that are apparently unrelated to the experimental or study design, and which may be considered outliers. There are many reasons why outliers can arise, including rare technical or experimental artifacts, read mapping problems in the case of genetically differing samples, and genuine, but rare biological events. In many cases, users appear primarily interested in genes that show a consistent behavior, and this is the reason why by default, genes that are affected by such outliers are set aside by DESeq2, or if there are sufficient samples, outlier counts are replaced for model fitting. These two behaviors are described below.

The *DESeq* function calculates, for every gene and for every sample, a diagnostic test for outliers called *Cook's distance*. Cook's distance is a measure of how much a single sample is influencing the fitted coefficients for a gene, and a large value of Cook's distance is intended to indicate an outlier count. The Cook's distances are stored as a matrix available in `assays(dds)[["cooks"]]`.

The *results* function automatically flags genes which contain a Cook's distance above a cutoff for samples which have 3 or more replicates. The *p* values and adjusted *p* values for these genes are set to `NA`. At least 3 replicates are required for flagging, as it is difficult to judge which sample might be an outlier with only 2 replicates. This filtering can be turned off with `results(dds, cooksCutoff=FALSE)`.

With many degrees of freedom -- i.e., many more samples than number of parameters to be estimated -- it is undesirable to remove entire genes from the analysis just because their data include a single count outlier. When there are 7 or more replicates for a given sample, the *DESeq* function will automatically replace counts with large Cook's distance with the trimmed mean over all samples, scaled up by the size factor or normalization factor for that sample. This approach is conservative; it will not lead to false positives, as it replaces the outlier value with the value predicted by the null hypothesis. This outlier replacement only occurs when there are 7 or more replicates, and can be turned off with `DESeq(dds, minReplicatesForReplace=Inf)`.

The default Cook's distance cutoff for the two behaviors described above depends on the sample size and number of parameters to be estimated. The default is to use the 99% quantile of the F(p,m-p) distribution (with *p* the number of parameters including the intercept and *m* the number of samples). The default for gene flagging can be modified using the `cooksCutoff` argument to the *results* function. For outlier replacement, *DESeq* preserves the original counts in `counts(dds)`, saving the replacement counts as a matrix named `replaceCounts` in `assays(dds)`.

Note that with continuous variables in the design, outlier detection and replacement is not automatically performed, as our current methods involve a robust estimation of within-group variance which does not extend easily to continuous covariates. However, users can examine the Cook's distances in `assays(dds)[["cooks"]]`, in order to perform manual visualization and filtering if necessary.

**Note on many outliers:** if there are very many outliers (e.g.
many hundreds or thousands) reported by `summary(res)`, one might consider further exploration to see if a single sample or a few samples should be removed due to low quality. The automatic outlier filtering/replacement is most useful in situations in which the number of outliers is limited. When there are thousands of reported outliers, it might make more sense to turn off the outlier filtering/replacement (*DESeq* with `minReplicatesForReplace=Inf` and *results* with `cooksCutoff=FALSE`) and perform manual inspection: first, it would be advantageous to make a PCA plot as described above to spot individual sample outliers; second, one can make a boxplot of the Cook's distances to see if one sample is consistently higher than others (here this is not the case):

```{r boxplotCooks}
par(mar=c(8,5,2,2))
boxplot(log10(assays(dds)[["cooks"]]), range=0, las=2)
```

## Dispersion plot and fitting alternatives

Plotting the dispersion estimates is a useful diagnostic. The dispersion plot below is typical, with the final estimates shrunk from the gene-wise estimates towards the fitted estimates. Some gene-wise estimates are flagged as outliers and not shrunk towards the fitted value (this outlier detection is described in the manual page for *estimateDispersionsMAP*). The amount of shrinkage can be more or less than seen here, depending on the sample size, the number of coefficients, the row mean and the variability of the gene-wise estimates.

```{r dispFit}
plotDispEsts(dds)
```

### Local or mean dispersion fit

A local smoothed dispersion fit is automatically substituted in the case that the parametric curve doesn't fit the observed dispersion-mean relationship. This can be prespecified by providing the argument `fitType="local"` to either *DESeq* or *estimateDispersions*. Additionally, using the mean of gene-wise dispersion estimates as the fitted value can be specified by providing the argument `fitType="mean"`.

### Supply a custom dispersion fit

Any fitted values can be provided during dispersion estimation, using the lower-level functions described in the manual page for *estimateDispersionsGeneEst*. In the code chunk below, we store the gene-wise estimates which were already calculated and saved in the metadata column `dispGeneEst`. Then we calculate the median value of the dispersion estimates above a threshold, and save these values as the fitted dispersions, using the replacement function for *dispersionFunction*. In the last line, the function *estimateDispersionsMAP* uses the fitted dispersions to generate maximum *a posteriori* (MAP) estimates of dispersion.

```{r dispFitCustom}
ddsCustom <- dds
useForMedian <- mcols(ddsCustom)$dispGeneEst > 1e-7
medianDisp <- median(mcols(ddsCustom)$dispGeneEst[useForMedian],
                     na.rm=TRUE)
dispersionFunction(ddsCustom) <- function(mu) medianDisp
ddsCustom <- estimateDispersionsMAP(ddsCustom)
```

## Independent filtering of results

The *results* function of the DESeq2 package performs independent filtering by default using the mean of normalized counts as a filter statistic. A threshold on the filter statistic is found which optimizes the number of adjusted *p* values lower than a significance level `alpha` (we use the standard variable name for significance level, though it is unrelated to the dispersion parameter $\alpha$). The theory behind independent filtering is discussed in greater detail [below](#indfilttheory). The adjusted *p* values for the genes which do not pass the filter threshold are set to `NA`.
The default independent filtering is performed using the *filtered_p* function of the [genefilter](http://bioconductor.org/packages/genefilter) package, and all of the arguments of *filtered_p* can be passed to the *results* function. The filter threshold value and the number of rejections at each quantile of the filter statistic are available as metadata of the object returned by *results*.

For example, we can visualize the optimization by plotting the `filterNumRej` attribute of the results object. The *results* function maximizes the number of rejections (adjusted *p* value less than a significance level), over the quantiles of a filter statistic (the mean of normalized counts). The threshold chosen (vertical line) is the lowest quantile of the filter for which the number of rejections is within 1 residual standard deviation of the peak of a curve fit to the number of rejections over the filter quantiles:

```{r filtByMean}
metadata(res)$alpha
metadata(res)$filterThreshold
plot(metadata(res)$filterNumRej,
     type="b", ylab="number of rejections",
     xlab="quantiles of filter")
lines(metadata(res)$lo.fit, col="red")
abline(v=metadata(res)$filterTheta)
```

Independent filtering can be turned off by setting `independentFiltering` to `FALSE`.

```{r noFilt}
resNoFilt <- results(dds, independentFiltering=FALSE)
addmargins(table(filtering=(res$padj < .1),
                 noFiltering=(resNoFilt$padj < .1)))
```

## Tests of log2 fold change above or below a threshold

It is also possible to provide thresholds for constructing Wald tests of significance. Two arguments to the *results* function allow for threshold-based Wald tests: `lfcThreshold`, which takes a non-negative numeric threshold value, and `altHypothesis`, which specifies the kind of test. Note that the *alternative hypothesis* is specified by the user, i.e. those genes which the user is interested in finding, and the test provides *p* values for the null hypothesis, the complement of the set defined by the alternative. The `altHypothesis` argument can take one of the following four values, where $\beta$ is the log2 fold change specified by the `name` argument, and $x$ is the `lfcThreshold`.

* `greaterAbs` - $|\beta| > x$ - tests are two-tailed
* `lessAbs` - $|\beta| < x$ - *p* values are the maximum of the upper and lower tests
* `greater` - $\beta > x$
* `less` - $\beta < -x$

The four possible values of `altHypothesis` are demonstrated in the following code and visually by MA-plots in the following figures.

```{r lfcThresh}
par(mfrow=c(2,2),mar=c(2,2,1,1))
ylim <- c(-2.5,2.5)
resGA <- results(dds, lfcThreshold=.5, altHypothesis="greaterAbs")
resLA <- results(dds, lfcThreshold=.5, altHypothesis="lessAbs")
resG <- results(dds, lfcThreshold=.5, altHypothesis="greater")
resL <- results(dds, lfcThreshold=.5, altHypothesis="less")
drawLines <- function() abline(h=c(-.5,.5),col="dodgerblue",lwd=2)
plotMA(resGA, ylim=ylim); drawLines()
plotMA(resLA, ylim=ylim); drawLines()
plotMA(resG, ylim=ylim); drawLines()
plotMA(resL, ylim=ylim); drawLines()
```

## Access to all calculated values

All row-wise calculated values (intermediate dispersion calculations, coefficients, standard errors, etc.) are stored in the *DESeqDataSet* object, e.g. `dds` in this vignette. These values are accessible by calling *mcols* on `dds`. Descriptions of the columns are accessible by two calls to *mcols*. Note that the call to `substr` below is only for display purposes.
## Tests of log2 fold change above or below a threshold

It is also possible to provide thresholds for constructing Wald tests of significance. Two arguments to the *results* function allow for threshold-based Wald tests: `lfcThreshold`, which takes a non-negative numeric threshold value, and `altHypothesis`, which specifies the kind of test. Note that the *alternative hypothesis* is specified by the user, i.e. those genes which the user is interested in finding, and the test provides *p* values for the null hypothesis, the complement of the set defined by the alternative. The `altHypothesis` argument can take one of the following four values, where $\beta$ is the log2 fold change specified by the `name` argument, and $x$ is the `lfcThreshold`.

* `greaterAbs` - $|\beta| > x$ - tests are two-tailed
* `lessAbs` - $|\beta| < x$ - *p* values are the maximum of the upper and lower tests
* `greater` - $\beta > x$
* `less` - $\beta < -x$

The four possible values of `altHypothesis` are demonstrated in the following code and visually by MA-plots in the following figures.

```{r lfcThresh}
par(mfrow=c(2,2),mar=c(2,2,1,1))
ylim <- c(-2.5,2.5)
resGA <- results(dds, lfcThreshold=.5, altHypothesis="greaterAbs")
resLA <- results(dds, lfcThreshold=.5, altHypothesis="lessAbs")
resG <- results(dds, lfcThreshold=.5, altHypothesis="greater")
resL <- results(dds, lfcThreshold=.5, altHypothesis="less")
drawLines <- function() abline(h=c(-.5,.5),col="dodgerblue",lwd=2)
plotMA(resGA, ylim=ylim); drawLines()
plotMA(resLA, ylim=ylim); drawLines()
plotMA(resG, ylim=ylim); drawLines()
plotMA(resL, ylim=ylim); drawLines()
```

## Access to all calculated values {#access}

All row-wise calculated values (intermediate dispersion calculations, coefficients, standard errors, etc.) are stored in the *DESeqDataSet* object, e.g. `dds` in this vignette. These values are accessible by calling *mcols* on `dds`. Descriptions of the columns are accessible by two calls to *mcols*. Note that the call to `substr` below is only for display purposes.

```{r mcols}
mcols(dds,use.names=TRUE)[1:4,1:4]
substr(names(mcols(dds)),1,10)
mcols(mcols(dds), use.names=TRUE)[1:4,]
```

The mean values $\mu_{ij} = s_j q_{ij}$ and the Cook's distances for each gene and sample are stored as matrices in the assays slot:

```{r muAndCooks}
head(assays(dds)[["mu"]])
head(assays(dds)[["cooks"]])
```

The dispersions $\alpha_i$ can be accessed with the *dispersions* function.

```{r dispersions}
head(dispersions(dds))
head(mcols(dds)$dispersion)
```

The size factors $s_j$ are accessible via *sizeFactors*:

```{r sizefactors}
sizeFactors(dds)
```

For advanced users, we also include a convenience function *coef* for extracting the matrix $[\beta_{ir}]$ for all genes *i* and model coefficients $r$. This function can also return a matrix of standard errors, see `?coef`. The columns of this matrix correspond to the effects returned by *resultsNames*. Note that the *results* function is best for building results tables with *p* values and adjusted *p* values.

```{r coef}
head(coef(dds))
```

The beta prior variance $\sigma_r^2$ is stored as an attribute of the *DESeqDataSet*:

```{r betaPriorVar}
attr(dds, "betaPriorVar")
```

General information about the prior used for log fold change shrinkage is also stored in a slot of the *DESeqResults* object. This would also contain information about what other packages were used for log2 fold change shrinkage.

```{r priorInfo}
priorInfo(resLFC)
priorInfo(resApe)
priorInfo(resAsh)
```

The dispersion prior variance $\sigma_d^2$ is stored as an attribute of the dispersion function:

```{r dispPriorVar}
dispersionFunction(dds)
attr(dispersionFunction(dds), "dispPriorVar")
```

The version of DESeq2 which was used to construct the *DESeqDataSet* object, or the version used when *DESeq* was run, is stored here:

```{r versionNum}
metadata(dds)[["version"]]
```

## Sample-/gene-dependent normalization factors

In some experiments, there might be gene-specific biases which vary across samples. For instance, GC-content bias or length bias might vary across samples coming from different labs or processed at different times. We use the terms *normalization factors* for a gene x sample matrix, and *size factors* for a single number per sample. Incorporating normalization factors, the mean parameter $\mu_{ij}$ becomes:

$$ \mu_{ij} = NF_{ij} q_{ij} $$

with normalization factor matrix *NF* having the same dimensions as the counts matrix *K*. This matrix can be incorporated as shown below. We recommend providing a matrix with row-wise geometric means of 1, so that the mean of normalized counts for a gene is close to the mean of the unnormalized counts. This can be accomplished by dividing out the current row geometric means. (The chunk below assumes a pre-existing gene x sample matrix `normFactors`.)

```{r normFactors, eval=FALSE}
normFactors <- normFactors / exp(rowMeans(log(normFactors)))
normalizationFactors(dds) <- normFactors
```

These steps then replace *estimateSizeFactors*, which occurs within the *DESeq* function. The *DESeq* function will look for pre-existing normalization factors and use these in the place of size factors (and a message will be printed confirming this).

The methods provided by the [cqn](http://bioconductor.org/packages/cqn) or [EDASeq](http://bioconductor.org/packages/EDASeq) packages can help correct for GC or length biases. They both describe in their vignettes how to create matrices which can be used by DESeq2.
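To make the construction above concrete, here is a self-contained toy sketch; the random factors are purely illustrative, standing in for bias estimates that would in practice come from a method such as cqn or EDASeq:

```{r toyNormFactors, eval=FALSE}
# a toy gene x sample matrix of positive normalization factors
normFactors <- matrix(runif(nrow(dds)*ncol(dds), 0.5, 1.5),
                      nrow=nrow(dds), ncol=ncol(dds))
# divide out row-wise geometric means, then assign to the object
normFactors <- normFactors / exp(rowMeans(log(normFactors)))
normalizationFactors(dds) <- normFactors
```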
From the formula above, we see that normalization factors should be on the scale of the counts, like size factors, and unlike offsets, which are typically on the scale of the predictors (i.e. the logarithmic scale for the negative binomial GLM). At the time of writing, the transformation from the matrices provided by cqn or EDASeq should be:

```{r offsetTransform, eval=FALSE}
cqnOffset <- cqnObject$glm.offset
cqnNormFactors <- exp(cqnOffset)
EDASeqNormFactors <- exp(-1 * EDASeqOffset)
```

## "Model matrix not full rank"

While most experimental designs run easily using a design formula, some design formulas can cause problems and result in the *DESeq* function returning an error with the text: "the model matrix is not full rank, so the model cannot be fit as specified." There are two main reasons for this problem: either one or more columns in the model matrix are linear combinations of other columns, or there are levels of factors or combinations of levels of multiple factors which are missing samples. We address these two problems below and discuss possible solutions:

### Linear combinations

The simplest case is the linear combination, or linear dependency problem, when two variables contain exactly the same information, such as in the following sample table. The software cannot fit an effect for `batch` and `condition`, because they produce identical columns in the model matrix. This is also referred to as *perfect confounding*. A unique solution of coefficients (the $\beta_i$ in the formula [below](#theory)) is not possible.

```{r lineardep, echo=FALSE}
DataFrame(batch=factor(c(1,1,2,2)), condition=factor(c("A","A","B","B")))
```

Another situation which will cause problems is when the variables are not identical, but one variable can be formed by the combination of other factor levels. In the following example, the effect of batch 2 vs 1 cannot be fit because it is identical to a column in the model matrix which represents the condition C vs A effect.

```{r lineardep2, echo=FALSE}
DataFrame(batch=factor(c(1,1,1,1,2,2)), condition=factor(c("A","A","B","B","C","C")))
```

In both of these cases above, the batch effect cannot be fit and must be removed from the model formula. There is just no way to tell apart the condition effects and the batch effects. The options are either to assume there is no batch effect (which we know is highly unlikely given the literature on batch effects in sequencing datasets) or to repeat the experiment and properly balance the conditions across batches. A balanced design would look like:

```{r lineardep3, echo=FALSE}
DataFrame(batch=factor(c(1,1,1,2,2,2)), condition=factor(c("A","B","C","A","B","C")))
```
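If in doubt, the rank of a candidate model matrix can be checked directly before running *DESeq*. This is a minimal sketch using base R functions and the first (confounded) sample table above; `coldataConf` and `mm` are illustrative names:

```{r rankSketch, eval=FALSE}
coldataConf <- data.frame(batch=factor(c(1,1,2,2)),
                          condition=factor(c("A","A","B","B")))
mm <- model.matrix(~ batch + condition, coldataConf)
# FALSE indicates the model matrix is not full rank
qr(mm)$rank == ncol(mm)
```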
### Group-specific condition effects, individuals nested within groups

Finally, there is a case where we *can* in fact perform inference, but we may need to re-arrange terms to do so. Consider an experiment with grouped individuals, where we seek to test the group-specific effect of a condition or treatment, while controlling for individual effects. The individuals are nested within the groups: an individual can only be in one of the groups, although each individual has one or more observations across condition. An example of such an experiment is below:

```{r groupeffect}
coldata <- DataFrame(grp=factor(rep(c("X","Y"),each=6)),
                     ind=factor(rep(1:6,each=2)),
                     cnd=factor(rep(c("A","B"),6)))
coldata
```

Note that individual (`ind`) is a *factor*, not a numeric. This is very important.

To make R display all the rows, we can do:

```{r}
as.data.frame(coldata)
```

We have two groups of samples X and Y, each with three distinct individuals (labeled here 1-6). For each individual, we have conditions A and B (for example, this could be control and treated). This design can be analyzed by DESeq2 but requires a bit of refactoring in order to fit the model terms. Here we will use a trick described in the [edgeR](http://bioconductor.org/packages/edgeR) user guide, from the section *Comparisons Both Between and Within Subjects*. If we try to analyze with a formula such as `~ ind + grp*cnd`, we will obtain an error, because the effect for group is a linear combination of the individuals.

However, the following steps allow for an analysis of group-specific condition effects, while controlling for differences in individual. For object construction, you can use a simple design, such as `~ ind + cnd`, as long as you remember to replace it before running *DESeq*. Then add a column `ind.n` which distinguishes the individuals nested within a group. Here, we add this column to coldata, but in practice you would add this column to `dds`.

```{r groupeffect2}
coldata$ind.n <- factor(rep(rep(1:3,each=2),2))
as.data.frame(coldata)
```

Now we can reassign our *DESeqDataSet* a design of `~ grp + grp:ind.n + grp:cnd`, before we call *DESeq*. This new design will result in the following model matrix:

```{r groupeffect3}
model.matrix(~ grp + grp:ind.n + grp:cnd, coldata)
```

Note that, if you have unbalanced numbers of individuals in the two groups, you will have zeros for some of the interactions between `grp` and `ind.n`. You can remove these columns manually from the model matrix and pass the corrected model matrix to the `full` argument of the *DESeq* function. See example code in the next section.

Above, the terms `grpX.cndB` and `grpY.cndB` give the group-specific condition effects, in other words, the condition B vs A effect for group X samples, and likewise for group Y samples. These terms control for all of the six individual effects. These group-specific condition effects can be extracted using *results* with the `name` argument. Furthermore, `grpX.cndB` and `grpY.cndB` can be contrasted using the `contrast` argument, in order to test if the condition effect is different across group:

```{r groupeffect4, eval=FALSE}
results(dds, contrast=list("grpY.cndB","grpX.cndB"))
```
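A single group-specific effect can likewise be pulled out with `name`; below is a minimal sketch, where the coefficient name is one of those listed by *resultsNames* for this design:

```{r groupeffectName, eval=FALSE}
resultsNames(dds)
# the condition B vs A effect within group X
results(dds, name="grpX.cndB")
```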
### Levels without samples

The base R function for creating model matrices will produce a column of zeros if a level is missing from a factor or a combination of levels is missing from an interaction of factors. The solution to the first case is to call *droplevels* on the column, which will remove levels without samples. This was shown in the beginning of this vignette. The second case is also solvable, by manually editing the model matrix, and then providing this to *DESeq*. Here we construct an example dataset to illustrate:

```{r missingcombo}
group <- factor(rep(1:3,each=6))
condition <- factor(rep(rep(c("A","B","C"),each=2),3))
d <- DataFrame(group, condition)[-c(17,18),]
as.data.frame(d)
```

Note that if we try to estimate all interaction terms, we introduce a column with all zeros, as there are no condition C samples for group 3. (Here, *unname* is used to display the matrix concisely.)

```{r missingcombo2}
m1 <- model.matrix(~ condition*group, d)
colnames(m1)
unname(m1)
all.zero <- apply(m1, 2, function(x) all(x==0))
all.zero
```

We can remove this column like so:

```{r missingcombo3}
idx <- which(all.zero)
m1 <- m1[,-idx]
unname(m1)
```

Now this matrix `m1` can be provided to the `full` argument of *DESeq*. For a likelihood ratio test of interactions, a model matrix using a reduced design such as `~ condition + group` can be given to the `reduced` argument. Wald tests can also be generated instead of the likelihood ratio test, but for user-supplied model matrices, the argument `betaPrior` must be set to `FALSE`.

# Theory behind DESeq2 {#theory}

## The DESeq2 model

The DESeq2 model and all the steps taken in the software are described in detail in our publication [@Love2014], and we include the formula and descriptions in this section as well. The differential expression analysis in DESeq2 uses a generalized linear model of the form:

$$ K_{ij} \sim \textrm{NB}(\mu_{ij}, \alpha_i) $$

$$ \mu_{ij} = s_j q_{ij} $$

$$ \log_2(q_{ij}) = x_{j.} \beta_i $$

where counts $K_{ij}$ for gene *i*, sample *j* are modeled using a negative binomial distribution with fitted mean $\mu_{ij}$ and a gene-specific dispersion parameter $\alpha_i$. The fitted mean is composed of a sample-specific size factor $s_j$ and a parameter $q_{ij}$ proportional to the expected true concentration of fragments for sample *j*. The coefficients $\beta_i$ give the log2 fold changes for gene *i* for each column of the model matrix $X$. Note that the model can be generalized to use sample- and gene-dependent normalization factors $s_{ij}$.

The dispersion parameter $\alpha_i$ defines the relationship between the variance of the observed count and its mean value. In other words, it describes how far we expect the observed count to be from the mean value, which depends both on the size factor $s_j$ and the covariate-dependent part $q_{ij}$ as defined above.

$$ \textrm{Var}(K_{ij}) = E[ (K_{ij} - \mu_{ij})^2 ] = \mu_{ij} + \alpha_i \mu_{ij}^2 $$
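To give a sense of scale for the quadratic term: with a fitted mean of $\mu_{ij} = 100$ and a dispersion of $\alpha_i = 0.1$ (illustrative values), the model gives

$$ \textrm{Var}(K_{ij}) = 100 + 0.1 \cdot 100^2 = 1100, $$

a standard deviation of about 33, compared to the value of 10 expected under a Poisson model with the same mean.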
An option in DESeq2 is to provide maximum *a posteriori* estimates of the log2 fold changes in $\beta_i$ after incorporating a zero-centered Normal prior (`betaPrior`). While previously these moderated, or shrunken, estimates were generated by the *DESeq* or *nbinomWaldTest* functions, they are now produced by the *lfcShrink* function. Dispersions are estimated using expected mean values from the maximum likelihood estimate of log2 fold changes, and optimizing the Cox-Reid adjusted profile likelihood, as first implemented for RNA-seq data in [edgeR](http://bioconductor.org/packages/edgeR) [@CR; @edgeR_GLM].

The steps performed by the *DESeq* function are documented in its manual page `?DESeq`; briefly, they are:

1) estimation of size factors $s_j$ by *estimateSizeFactors*
2) estimation of dispersion $\alpha_i$ by *estimateDispersions*
3) negative binomial GLM fitting for $\beta_i$ and Wald statistics by *nbinomWaldTest*

For access to all the values calculated during these steps, see the section [above](#access).

## Changes compared to DESeq

The main changes in the package *DESeq2*, compared to the (older) version *DESeq*, are as follows:

* *RangedSummarizedExperiment* is used as the superclass for storage of input data, intermediate calculations and results.
* Optional, maximum *a posteriori* estimation of GLM coefficients incorporating a zero-centered Normal prior with variance estimated from data (equivalent to Tikhonov/ridge regularization). This adjustment has little effect on genes with high counts, yet it helps to moderate the otherwise large variance in log2 fold change estimates for genes with low counts or highly variable counts. These estimates are now provided by the *lfcShrink* function.
* Maximum *a posteriori* estimation of dispersion replaces the `sharingMode` options `fit-only` or `maximum` of the previous version of the package. This is similar to the dispersion estimation methods of DSS [@Wu2012New].
* All estimation and inference is based on the generalized linear model, which includes the two-condition case (previously the *exact test* was used).
* The Wald test for significance of GLM coefficients is provided as the default inference method, with the likelihood ratio test of the previous version still available.
* It is possible to provide a matrix of sample-/gene-dependent normalization factors.
* Automatic independent filtering on the mean of normalized counts.
* Automatic outlier detection and handling.

## Methods changes since the 2014 DESeq2 paper

* In version 1.18 (November 2017), we added two [alternative shrinkage estimators](#alternative-shrinkage-estimators), which can be used via `lfcShrink`: an estimator using a t prior from the apeglm package, and an estimator with a fitted mixture of normals prior from the ashr package.
* In version 1.16 (November 2016), the log2 fold change shrinkage is no longer default for the *DESeq* and *nbinomWaldTest* functions, by setting the defaults of these to `betaPrior=FALSE`, and by introducing a separate function *lfcShrink*, which performs log2 fold change shrinkage for visualization and ranking of genes. While for the majority of bulk RNA-seq experiments the LFC shrinkage did not affect statistical testing, DESeq2 has come to be used as an inference engine by a wider community, and certain sequencing datasets show better performance with the testing separated from the use of the LFC prior. Also, the separation of LFC shrinkage to a separate function `lfcShrink` allows for easier methods development of alternative effect size estimators.
* A small change to the independent filtering routine: instead of taking the quantile of the filter (the mean of normalized counts) which directly *maximizes* the number of rejections, the threshold chosen is the lowest quantile of the filter for which the number of rejections is close to the peak of a curve fit to the number of rejections over the filter quantiles. "Close to" is defined as within 1 residual standard deviation. This change was introduced in version 1.10 (October 2015).
* For the calculation of the beta prior variance, instead of matching the empirical quantile to the quantile of a Normal distribution, DESeq2 now uses the weighted quantile function of the Hmisc package. The weighting is described in the manual page for *nbinomWaldTest*. The weights are the inverse of the expected variance of log counts (as used in the diagonals of the matrix $W$ in the GLM). The effect of the change is that the estimated prior variance is robust against noisy estimates of log fold change from genes with very small counts. This change was introduced in version 1.6 (October 2014).

For a list of all changes since version 1.0.0, see the `NEWS` file included in the package.

## Count outlier detection

DESeq2 relies on the negative binomial distribution to make estimates and perform statistical inference on differences.
While the negative binomial is versatile in having a mean and dispersion parameter, extreme counts in individual samples might not fit well to the negative binomial. For this reason, we perform automatic detection of count outliers. We use Cook's distance, which is a measure of how much the fitted coefficients would change if an individual sample were removed [@Cook1977Detection]. For more on the implementation of Cook's distance see the manual page for the *results* function. Below we plot the maximum value of Cook's distance for each row over the rank of the test statistic to justify its use as a filtering criterion.

```{r cooksPlot}
W <- res$stat
maxCooks <- apply(assays(dds)[["cooks"]],1,max)
idx <- !is.na(W)
plot(rank(W[idx]), maxCooks[idx], xlab="rank of Wald statistic",
     ylab="maximum Cook's distance per gene",
     ylim=c(0,5), cex=.4, col=rgb(0,0,0,.3))
m <- ncol(dds)  # number of samples
p <- 3          # number of model parameters
abline(h=qf(.99, p, m - p))
```

## Contrasts

Contrasts can be calculated for a *DESeqDataSet* object for which the GLM coefficients have already been fit using the Wald test steps (*DESeq* with `test="Wald"` or using *nbinomWaldTest*). The vector of coefficients $\beta$ is left multiplied by the contrast vector $c$ to form the numerator of the test statistic. The denominator is formed by multiplying the covariance matrix $\Sigma$ for the coefficients on either side by the contrast vector $c$. The square root of this product is an estimate of the standard error for the contrast. The contrast statistic is then compared to a standard normal distribution, as are the Wald statistics for the DESeq2 package.

$$ W = \frac{c^t \beta}{\sqrt{c^t \Sigma c}} $$
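For example (an illustrative case, not tied to a particular dataset): in a design whose coefficients are an intercept, a B vs A effect, and a C vs A effect, the contrast vector $c = (0, -1, 1)$ gives the numerator

$$ c^t \beta = \beta_{\textrm{C vs A}} - \beta_{\textrm{B vs A}}, $$

which is the C vs B log2 fold change, while the denominator is its estimated standard error.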
## Expanded model matrices

For the specific combination of `lfcShrink` with the type `normal` and using `contrast`, DESeq2 uses *expanded model matrices* to produce shrunken log2 fold change estimates where the shrinkage is independent of the choice of reference level. In all other cases, DESeq2 uses standard model matrices, as produced by `model.matrix`. The expanded model matrices differ from the standard model matrices in that they have an indicator column (and therefore a coefficient) for each level of factors in the design formula, in addition to an intercept. This is described in the DESeq2 paper, but the DESeq2 software package has moved away from this approach, with more support for shrinkage of individual coefficients (although the expanded model matrix approach is still supported using the above combination of functions and arguments).

## Independent filtering and multiple testing {#indfilttheory}

### Filtering criteria

The goal of independent filtering is to filter out those tests from the procedure that have little or no chance of showing significant evidence, without even looking at their test statistic. Typically, this results in increased detection power at the same experiment-wide type I error. Here, we measure experiment-wide type I error in terms of the false discovery rate.

A good choice for a filtering criterion is one that

1) is statistically independent from the test statistic under the null hypothesis,
2) is correlated with the test statistic under the alternative, and
3) does not notably change the dependence structure -- if there is any -- between the tests that pass the filter, compared to the dependence structure between the tests before filtering.

The benefit from filtering relies on property (2), and we will explore it further below. Its statistical validity relies on property (1) -- which is simple to formally prove for many combinations of filter criteria with test statistics -- and (3), which is less easy to establish theoretically from first principles, but rarely a problem in practice. We refer to [@Bourgon:2010:PNAS] for further discussion of this topic.

A simple filtering criterion readily available in the results object is the mean of normalized counts irrespective of biological condition, and so this is the criterion which is used automatically by the *results* function to perform independent filtering. Genes with very low counts are unlikely to show significant differences, typically due to high dispersion. For example, we can plot the $-\log_{10}$ *p* values from all genes over the normalized mean counts:

```{r indFilt}
plot(res$baseMean+1, -log10(res$pvalue),
     log="x", xlab="mean of normalized counts",
     ylab=expression(-log[10](pvalue)),
     ylim=c(0,30), cex=.4, col=rgb(0,0,0,.3))
```

### Why does it work?

Consider the *p* value histogram below. It shows how the filtering ameliorates the multiple testing problem -- and thus the severity of a multiple testing adjustment -- by removing a background set of hypotheses whose *p* values are distributed more or less uniformly in [0,1].

```{r histindepfilt}
use <- res$baseMean > metadata(res)$filterThreshold
h1 <- hist(res$pvalue[!use], breaks=0:50/50, plot=FALSE)
h2 <- hist(res$pvalue[use], breaks=0:50/50, plot=FALSE)
colori <- c(`do not pass`="khaki", `pass`="powderblue")
```

Histogram of p values for all tests. The area shaded in blue indicates the subset of those that pass the filtering, the area in khaki those that do not pass:

```{r fighistindepfilt}
barplot(height = rbind(h1$counts, h2$counts), beside = FALSE,
        col = colori, space = 0, main = "", ylab="frequency")
text(x = c(0, length(h1$counts)), y = 0, label = paste(c(0,1)),
     adj = c(0.5,1.7), xpd=NA)
legend("topright", fill=rev(colori), legend=rev(names(colori)))
```

# Frequently asked questions

## How can I get support for DESeq2?

We welcome questions about our software, and want to ensure that we eliminate issues if and when they appear. We have a few requests to optimize the process:

* all questions should take place on the Bioconductor support site, <https://support.bioconductor.org>, which serves as a repository of questions and answers. This helps to save the developers' time in responding to similar questions. Make sure to tag your post with `deseq2`. It is often very helpful in addition to describe the aim of your experiment.
* before posting, first search the Bioconductor support site mentioned above for past threads which might have answered your question.
* if you have a question about the behavior of a function, read the sections of the manual page for this function by typing a question mark and the function name, e.g. `?results`. We spend a lot of time documenting individual functions and the exact steps that the software is performing.
* include all of your R code, especially the creation of the *DESeqDataSet* and the design formula. Include complete warning or error messages, and conclude your message with the full output of `sessionInfo()`.
* if possible, include the output of `as.data.frame(colData(dds))`, so that we can have a sense of the experimental setup. If this contains confidential information, you can replace the levels of those factors using *levels()*.

## Why are some *p* values set to NA?

See the details [above](#pvaluesNA).

## How can I get unfiltered DESeq2 results?

Users can obtain unfiltered GLM results, i.e. without outlier removal or independent filtering, with the following call:
```{r vanillaDESeq, eval=FALSE}
dds <- DESeq(dds, minReplicatesForReplace=Inf)
res <- results(dds, cooksCutoff=FALSE, independentFiltering=FALSE)
```

In this case, the only *p* values set to `NA` are those from genes with all counts equal to zero.

## How do I use VST or rlog data for differential testing?

The variance stabilizing and rlog transformations are provided for applications other than differential testing, for example clustering of samples or other machine learning applications. For differential testing we recommend the *DESeq* function applied to raw counts, as outlined [above](#de).

## Can I use DESeq2 to analyze paired samples?

Yes, you should use a multi-factor design which includes the sample information as a term in the design formula. This will account for differences between the samples while estimating the effect due to the condition. The condition of interest should go at the end of the design formula, e.g. `~ subject + condition`.

## If I have multiple groups, should I run all together or split into pairs of groups?

Typically, we recommend users to run samples from all groups together, and then use the `contrast` argument of the *results* function to extract comparisons of interest after fitting the model using *DESeq*. The model fit by *DESeq* estimates a single dispersion parameter for each gene, which defines how far we expect the observed count for a sample to be from the mean value from the model, given its size factor and its condition group. See the section [above](#theory) and the DESeq2 paper for full details. Having a single dispersion parameter for each gene is usually sufficient for analyzing multi-group data, as the final dispersion value will incorporate the within-group variability across all groups.

However, for some datasets, exploratory data analysis (EDA) plots could reveal that one or more groups has much higher within-group variability than the others. A simulated example of such a set of samples is shown below. This is a case where comparing groups A and B separately -- subsetting a *DESeqDataSet* to only samples from those two groups and then running *DESeq* on this subset -- will be more sensitive than a model including all samples together. It should be noted that such an extreme range of within-group variability is not common, although it could arise if certain treatments produce an extreme reaction (e.g. cell death). Again, this can be easily detected from the EDA plots such as PCA described in this vignette.

Here we diagram an extreme range of within-group variability with a simulated dataset. Typically, it is recommended to run *DESeq* across samples from all groups, for datasets with multiple groups. However, this simulated dataset shows a case where it would be preferable to compare groups A and B by creating a smaller dataset without the C samples. Group C has much higher within-group variability, which would inflate the per-gene dispersion estimate for groups A and B as well:

```{r varGroup, echo=FALSE}
set.seed(3)
dds1 <- makeExampleDESeqDataSet(n=1000,m=12,betaSD=.3,dispMeanRel=function(x) 0.01)
dds2 <- makeExampleDESeqDataSet(n=1000,m=12,
                                betaSD=.3,
                                interceptMean=mcols(dds1)$trueIntercept,
                                interceptSD=0,
                                dispMeanRel=function(x) 0.2)
dds2 <- dds2[,7:12]
dds2$condition <- rep("C",6)
mcols(dds2) <- NULL
dds12 <- cbind(dds1, dds2)
rld <- rlog(dds12, blind=FALSE, fitType="mean")
plotPCA(rld)
```
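A minimal sketch of the subsetting approach, using the simulated `dds12` object constructed (not echoed) in the chunk above; the `factor` call hedges against the condition column not being a factor after the manipulation above:

```{r subsetSketch, eval=FALSE}
# keep only the A and B samples, drop the unused level, and re-run DESeq
ddsAB <- dds12[, dds12$condition %in% c("A","B")]
ddsAB$condition <- droplevels(factor(ddsAB$condition))
ddsAB <- DESeq(ddsAB)
```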
## Can I run DESeq2 to contrast the levels of many groups?

DESeq2 will work with any kind of design specified using the R formula. We encourage users to consider exploratory data analysis such as principal components analysis rather than performing statistical testing of all pairs of many groups of samples. Statistical testing is one of many ways of describing differences between samples.

Regarding multiple test correction, if a user is planning to contrast all pairs of many levels, and then selectively report the results of only a *subset* of those pairs, one needs to perform multiple testing correction across *contrasts* as well as genes to control for this additional form of multiple testing. This can be done by using the `p.adjust` function across a long vector of *p* values from all pairs of contrasts, then re-assigning these adjusted *p* values to the appropriate results table.

As a speed concern with fitting very large models, note that each additional level of a factor in the design formula adds another parameter to the GLM which is fit by DESeq2. Users might consider first removing genes with very few reads, e.g. genes with row sum of 1, as this will speed up the fitting procedure.

## Can I use DESeq2 to analyze a dataset without replicates?

If a *DESeqDataSet* is provided with an experimental design without replicates, a warning is printed that the samples are treated as replicates for estimation of dispersion. This kind of analysis is only useful for exploring the data, but cannot provide proper statistical inference on differences between groups. Without biological replicates, it is not possible to estimate the biological variability of each gene. More details can be found in the manual page for `?DESeq`.

## How can I include a continuous covariate in the design formula?

Continuous covariates can be included in the design formula in exactly the same manner as factorial covariates, and then *results* for the continuous covariate can be extracted by specifying `name`. Continuous covariates might make sense in certain experiments, where a constant fold change might be expected for each unit of the covariate. However, in many cases, more meaningful results can be obtained by cutting continuous covariates into a factor defined over a small number of bins (e.g. 3-5). In this way, the average effect of each group is controlled for, regardless of the trend over the continuous covariates. In R, *numeric* vectors can be converted into *factors* using the function *cut*.
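For example, a hypothetical continuous column `age` in the colData could be binned like this (a sketch; the column name and the number of bins are illustrative only):

```{r cutSketch, eval=FALSE}
dds$ageBin <- cut(dds$age, breaks=3)  # 3 bins across the range of 'age'
design(dds) <- ~ ageBin + condition
```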
## I ran a likelihood ratio test, but results() only gives me one comparison.

"... How do I get the *p* values for all of the variables/levels that were removed in the reduced design?"

This is explained in the help page for `?results` in the section about likelihood ratio test p-values, but we will restate the answer here. When one performs a likelihood ratio test, the *p* values and the test statistic (the `stat` column) are values for the test that removes all of the variables which are present in the full design and not in the reduced design. This tests the null hypothesis that all the coefficients from these variables and levels of these factors are equal to zero. The likelihood ratio test *p* values therefore represent a test of *all the variables and all the levels of factors* which are among these variables. However, the results table only has space for one column of log fold change, so a single variable and a single comparison is shown (among the potentially multiple log fold changes which were tested in the likelihood ratio test). This is indicated at the top of the results table with the text, e.g., "log2 fold change (MLE): condition C vs A", followed by "LRT p-value: '~ batch + condition' vs '~ batch'". This indicates that the *p* value is for the likelihood ratio test of *all the variables and all the levels*, while the log fold change is a single comparison from among those variables and levels. See the help page for *results* for more details.

## What are the exact steps performed by DESeq()?

See the manual page for *DESeq*, which links to the subfunctions which are called in order, where complete details are listed. Also you can read the three steps listed in the [DESeq2 model](#theory) in this document.

## Is there an official Galaxy tool for DESeq2?

Yes. The DESeq2 tool is maintained in the Galaxy Project's tool repository, and it can be found in the Galaxy Tool Shed.

## I want to benchmark DESeq2 comparing to other DE tools.

One aspect which can cause problems for comparison is that, by default, DESeq2 outputs `NA` values for adjusted *p* values based on independent filtering of genes which have low counts. This is a way for DESeq2 to give extra information on why the adjusted *p* value for this gene is not small. Additionally, *p* values can be set to `NA` based on extreme count outlier detection. These `NA` values should be considered *negatives* for purposes of estimating sensitivity and specificity. The easiest way to work with the adjusted *p* values in a benchmarking context is probably to convert these `NA` values to 1:

```{r convertNA, eval=FALSE}
res$padj <- ifelse(is.na(res$padj), 1, res$padj)
```

## I have trouble installing DESeq2 on Ubuntu/Linux...

"*I try to install DESeq2 using biocLite(), but I get an error trying to install the R packages XML and/or RCurl:*"

`ERROR: configuration failed for package XML`

`ERROR: configuration failed for package RCurl`

You need to install the following development ("-dev") versions of packages using your standard package manager, e.g. `sudo apt-get install` or `sudo apt install`:

* libxml2-dev
* libcurl4-openssl-dev

# Acknowledgments

We have benefited in the development of DESeq2 from the help and feedback of many individuals, including but not limited to: the Bioconductor Core Team, Alejandro Reyes, Andrzej Oles, Aleksandra Pekowska, Felix Klein, Nikolaos Ignatiadis (IHW), Anqi Zhu (apeglm), Joseph Ibrahim (apeglm), Vince Carey, Owen Solberg, Ruping Sun, Devon Ryan, Steve Lianoglou, Jessica Larson, Christina Chaivorapol, Pan Du, Richard Bourgon, Willem Talloen, Elin Videvall, Hanneke van Deutekom, Todd Burwell, Jesse Rowley, Igor Dolgalev, Stephen Turner, Ryan C Thompson, Tyr Wiesner-Hanks, Konrad Rudolph, David Robinson, Mingxiang Teng, Mathias Lesche, Sonali Arora, Jordan Ramilowski, Ian Dworkin, Bjorn Gruning, Ryan McMinds, Paul Gordon, Leonardo Collado Torres, Enrico Ferrero, Peter Langfelder.
# Session info

```{r sessionInfo}
sessionInfo()
```

# References

DESeq2/vignettes/library.bib

@article{Leek2014, author = {Leek, Jeffrey T.}, journal = {Nucleic Acids Research}, title = {{svaseq: removing batch effects and other unwanted noise from sequencing data}}, url = {http://dx.doi.org/10.1093/nar/gku864}, year = 2014, volume = 42, issue = 21, }
@article{Risso2014, author = {Risso, Davide and Ngai, John and Speed, Terence P and Dudoit, Sandrine}, journal = {Nature Biotechnology}, title = {{Normalization of RNA-seq data using factor analysis of control genes or samples}}, url = {http://dx.doi.org/10.1038/nbt.2931}, year = 2014, volume = 32, issue = 9, }
@article{Gerard2017, author = {Gerard, David and Stephens, Matthew}, journal = {arXiv}, title = {{Empirical Bayes Shrinkage and False Discovery Rate Estimation, Allowing For Unwanted Variation}}, url = {https://arxiv.org/abs/1709.10066}, year = 2017 }
@article{Stephens2016, author = {Stephens, Matthew}, journal = {Biostatistics}, title = {False discovery rates: a new deal}, url = {https://doi.org/10.1093/biostatistics/kxw041}, year = 2016, volume = 18, issue = 2, }
@article{Love2016Modeling, author = {Love, Michael I. and Hogenesch, John B. and Irizarry, Rafael A.}, journal = {Nature Biotechnology}, title = {Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation}, url = {http://dx.doi.org/10.1038/nbt.3682}, volume = 34, issue = 12, pages = {1287--1291}, year = 2016 }
@article{Ignatiadis2016, author = {Ignatiadis, Nikolaos and Klaus, Bernd and Zaugg, Judith and Huber, Wolfgang}, journal = {Nature Methods}, title = {Data-driven hypothesis weighting increases detection power in genome-scale multiple testing}, url = {http://dx.doi.org/10.1038/nmeth.3885}, year = 2016 }
@article{Love2014, url = {http://dx.doi.org/10.1186/s13059-014-0550-8}, author = {Love, Michael I. and Huber, Wolfgang and Anders, Simon}, title = {{Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2}}, journal = {Genome Biology}, year = 2014, Volume = 15, Issue = 12, Pages = 550, }
@article{Soneson2015, url = {http://dx.doi.org/10.12688/f1000research.7563.1}, author = {Soneson, Charlotte and Love, Michael I. and Robinson, Mark}, title = {{Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences}}, journal = {F1000Research}, year = 2015, Volume = 4, Issue = 1521 }
@article{Anders:2010:GB, url = {http://genomebiology.com/2010/11/10/R106}, author = {Anders, Simon and Huber, Wolfgang}, Title = {Differential expression analysis for sequence count data}, Journal = {Genome Biology}, Year = 2010, Volume = 11, Pages = {R106}, }
@article{Anders:2014:htseq, url = {http://dx.doi.org/10.1093/bioinformatics/btu638}, author = {Anders, Simon and Pyl, Paul Theodor and Huber, Wolfgang}, title = {{HTSeq -- A Python framework to work with high-throughput sequencing data}}, journal = {Bioinformatics}, year = 2014, }
@article{BH:1995, author = {Benjamini, Yoav and Hochberg, Yosef}, title = {Controlling the false discovery rate: a practical and powerful approach to multiple testing}, journal = {Journal of the Royal Statistical Society B}, year = 1995, volume = 57, pages = {289--300} }
@article{Bourgon:2010:PNAS, url = {http://www.pnas.org/content/107/21/9546.long}, author = {Bourgon, Richard and Gentleman, Robert and Huber, Wolfgang}, Title = {Independent filtering increases detection power for high-throughput experiments}, journal = {PNAS}, Year = 2010, volume = 107, number = 21, pages = {9546--9551}, }
@article{Brooks2010, author = {Brooks, A. N. and Yang, L. and Duff, M. O. and Hansen, K. D. and Park, J. W. and Dudoit, S. and Brenner, S. E. and Graveley, B. R.}, doi = {10.1101/gr.108662.110}, issn = {1088-9051}, journal = {Genome Research}, pages = {193--202}, title = {{Conservation of an RNA regulatory map between Drosophila and mammals}}, url = {http://genome.cshlp.org/cgi/doi/10.1101/gr.108662.110}, year = 2011 }
@article{Tibshirani1988, author = {Tibshirani, Robert}, title = {Estimating transformations for regression via additivity and variance stabilization}, journal = {Journal of the American Statistical Association}, year = 1988, volume = 83, pages = {394--405} }
@misc{htseq, author = {Anders, Simon}, title = {{HTSeq: Analysing high-throughput sequencing data with Python}}, year = 2011, howpublished = {\url{http://www-huber.embl.de/users/anders/HTSeq/}} }
@article{sagmb2003, title = {Parameter estimation for the calibration and variance stabilization of microarray data}, author = {Huber, Wolfgang and von Heydebreck, Anja and {S\"ultmann}, Holger and Poustka, Annemarie and Vingron, Martin}, journal = {Statistical Applications in Genetics and Molecular Biology}, year = 2003, volume = 2, number = 1, pages = {Article 3} }
@misc{summarizeOverlaps, author = {Valerie Obenchain}, title = {Counting with \texttt{summarizeOverlaps}}, year = 2011, howpublished = {Vignette, distributed as part of the Bioconductor package \emph{GenomicAlignments}, as file \emph{summarizeOverlaps.pdf}} }
@article{Anders:2012:GR, author = {Anders, Simon and Reyes, Alejandro and Huber, Wolfgang}, title = {Detecting differential usage of exons from {RNA-seq} data}, year = {2012}, journal = {Genome Research}, doi = {10.1101/gr.133744.111}, }
@article{CR, author = {Cox, D. R. and Reid, N.}, journal = {Journal of the Royal Statistical Society, Series B}, keywords = {CML,Cox-Reid,ML,dispersion}, number = {1}, pages = {1--39}, title = {{Parameter orthogonality and approximate conditional inference}}, url = {http://www.jstor.org/stable/2345476}, volume = {49}, year = {1987} }
@article{edgeR_GLM, author = {McCarthy, Davis J and Chen, Yunshun and Smyth, Gordon K}, doi = {10.1093/nar/gks042}, issn = {1362-4962}, journal = {Nucleic Acids Research}, keywords = {edgeR}, month = jan, pmid = {22287627}, title = {{Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation}}, url = {http://www.ncbi.nlm.nih.gov/pubmed/22287627}, year = {2012}, volume = {40}, pages = {4288--4297} }
@article{SchwederSpjotvoll1982, author = {Schweder, T. and Spj{\o}tvoll, E.}, title = {Plots of {P-values} to evaluate many tests simultaneously}, journal = {Biometrika}, year = {1982}, volume = 69, pages = {493--502}, doi = {10.1093/biomet/69.3.493} }
@article{Wu2012New, author = {Wu, Hao and Wang, Chi and Wu, Zhijin}, day = {22}, doi = {10.1093/biostatistics/kxs033}, issn = {1468-4357}, journal = {Biostatistics}, month = sep, pmid = {23001152}, publisher = {Oxford University Press}, title = {{A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data}}, url = {http://dx.doi.org/10.1093/biostatistics/kxs033}, year = {2012} }
@article{Cook1977Detection, author = {R. Dennis Cook}, journal = {Technometrics}, month = feb, year = {1977}, title = {{Detection of Influential Observation in Linear Regression}} }
@article{Bickel2010Subsampling, author = {Bickel, Peter J. and Boley, Nathan and Brown, James B. and Huang, Haiyan and Zhang, Nancy R.}, day = {5}, doi = {10.1214/10-aoas363}, eprint = {1101.0947}, issn = {1932-6157}, journal = {The Annals of Applied Statistics}, month = dec, number = {4}, pages = {1660--1697}, title = {{Subsampling methods for genomic inference}}, url = {http://dx.doi.org/10.1214/10-aoas363}, volume = {4}, year = {2010} }
@article{Li2009SAMtools, author = {Li, Heng and Handsaker, Bob and Wysoker, Alec and Fennell, Tim and Ruan, Jue and Homer, Nils and Marth, Gabor and Abecasis, Goncalo and Durbin, Richard and 1000 Genome Project Data Processing Subgroup}, title = {{The Sequence Alignment/Map format and SAMtools}}, volume = {25}, number = {16}, pages = {2078--2079}, year = {2009}, doi = {10.1093/bioinformatics/btp352}, url = {http://bioinformatics.oxfordjournals.org/content/25/16/2078.abstract}, journal = {Bioinformatics} }
@article{Kim2013TopHat2, author = {Kim, Daehwan and Pertea, Geo and Trapnell, Cole and Pimentel, Harold and Kelley, Ryan and Salzberg, Steven}, doi = {10.1186/gb-2013-14-4-r36}, issn = {1465-6906}, journal = {Genome Biology}, number = {4}, pages = {R36+}, pmid = {23618408}, title = {{TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions}}, url = {http://dx.doi.org/10.1186/gb-2013-14-4-r36}, volume = {14}, year = {2013} }
@Article{Delhomme2012easy, Author={Delhomme, N. and Padioleau, I. and Furlong, E. E. and Steinmetz, L. M.}, Title={{easyRNASeq: a Bioconductor package for processing RNA-Seq data}}, Journal={Bioinformatics}, Year={2012}, Volume={28}, Number={19}, Pages={2532--2533}, Month={Oct} }
@Article{Liao2013feature, Author={Liao, Y. and Smyth, G. K. and Shi, W.}, Title={{featureCounts: an efficient general purpose program for assigning sequence reads to genomic features}}, Journal={Bioinformatics}, Year={2013}, Month={Nov} }
@article{Li2011RSEM, author = {Li, Bo and Dewey, Colin N.}, doi = {10.1186/1471-2105-12-323}, journal = {BMC Bioinformatics}, pages = {323+}, title = {{RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome}}, url = {http://dx.doi.org/10.1186/1471-2105-12-323}, volume = {12}, year = {2011} }
@article{Patro2014Sailfish, author = {Patro, Rob and Mount, Stephen M. and Kingsford, Carl}, journal = {Nature Biotechnology}, pages = {462--464}, title = {{Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms}}, url = {http://dx.doi.org/10.1038/nbt.2862}, volume = {32}, year = {2014} }
@article{Patro2017Salmon, author = {Patro, Rob and Duggal, Geet and Love, Michael I. and Irizarry, Rafael A. and Kingsford, Carl}, journal = {Nature Methods}, title = {Salmon provides fast and bias-aware quantification of transcript expression}, url = {http://dx.doi.org/10.1038/nmeth.4197}, year = 2017 }
@article{Bray2016Near, author = {Bray, Nicolas and Pimentel, Harold and Melsted, Pall and Pachter, Lior}, journal = {Nature Biotechnology}, pages = {525--527}, title = {Near-optimal probabilistic RNA-seq quantification}, volume = {34}, url = {http://dx.doi.org/10.1038/nbt.3519}, year = 2016 }
@article{Robert2015Errors, author = {Robert, Christelle and Watson, Mick}, doi = {10.1186/s13059-015-0734-x}, journal = {Genome Biology}, title = {{Errors in RNA-Seq quantification affect genes of relevance to human disease}}, url = {http://dx.doi.org/10.1186/s13059-015-0734-x}, year = {2015} }
@article{Trapnell2013Differential, author = {Trapnell, Cole and Hendrickson, David G and Sauvageau, Martin and Goff, Loyal and Rinn, John L and Pachter, Lior}, doi = {10.1038/nbt.2450}, journal = {Nature Biotechnology}, title = {{Differential analysis of gene regulation at transcript resolution with RNA-seq}}, url = {http://dx.doi.org/10.1038/nbt.2450}, year = {2013} }

DESeq2/vignettes/sed_call

sed -e 's/rmarkdown::html_document:/BiocStyle::pdf_document2:/' -e '/highlight: pygments/d' -e 's/Analyzing RNA-seq data with DESeq2/Analyzing RNA-seq data with DESeq2 (PDF)/g' -e 's/This is the source document/This is a derived document, DO NOT EDIT/' -e '/DESeq2 package version: `r packageVersion("DESeq2")`/d' DESeq2.Rmd > DESeq2_pdf.Rmd