statmod/0000755000176200001440000000000012654111506011726 5ustar liggesusersstatmod/inst/0000755000176200001440000000000012654044513012706 5ustar liggesusersstatmod/inst/NEWS0000644000176200001440000002747012654044513013417 0ustar liggesusers 2 February 2016: statmod 1.4.24 - speedup for rinvgauss() by replacing rchisq() with rnorm() and rbinom() with runif(). - speedup for qinvgauss() by using qgamma as starting approximation for very small right tail probabilities, and inverse chisq as starting approximation for very small left tail probabilities. - qinvgauss() now computes Newton step using log probabilities and a Taylor series expansion for small steps. This improves accuracy in extreme cases. The stopping criterion for the Newton iteration has been revised. - Bug fix to dinvgauss(), pinvgauss() and qinvgauss() which were not preserving attributes of the first argument. 30 December 2015: statmod 1.4.23 - qinvgauss() has been improved to return best achievable machine accuracy. It now checks for backtracking of the Newton iteration. - dinvgauss() and pinvgauss() now check for a wider range of special cases. This allows them to give valid results in some cases for infinite or missing parameter values and for x outside the support of the distribution. 26 October 2015: statmod 1.4.22 - Functions needed from the stats and graphics packages are now explicitly imported into the package NAMESPACE. 30 March 2015: statmod 1.4.21 - qinvgauss() now treats input arguments of different lengths or NA parameter values more carefully. - elda() now gracefully removes structural zeros, i.e., rows where the number of cells or the number of assays is zero. - S3 print and plot methods for "limdil" class now registered. - Use of require("tweedie") in the qres.tweedie() code replaced by requireNameSpace("tweedie"). 30 May 2014: statmod 1.4.20 - Considerable work on the inverse Gaussian functions dinvgauss(), pinvgauss(), qinvgauss() and rinvgauss(). The parameter arguments are changed to mean, shape and dispersion instead of mu and lambda. The functions now include arguments lower.tail and log.p, meaning that right-tailed probabilities can be used and probabilities can be specified on the log-scale. Good numerical precision is maintained in these cases. The functions now respect attributes, so that a matrix argument for example will produce a matrix result. Checking is now done for missing values and invalid parameter values on an element-wise basis. A technical report has been written to describe the methodology behind qinvgauss(). - This file has been renamed to NEWS instead of changelog.txt. - The introductory help page previously called 1.Introduction is now named statmod-package. 13 April 2014: statmod 1.4.19 - qinvgauss() now uses a globally convergent Newton iteration, which produces accurate values for a greater range of parameter values. - glmnb.fit() now supports weights. 27 September 2013: statmod 1.4.18 - Update reference for permp(). - bug fix to elda() so that it returns NA for the tests instead of giving an error when the Fisher information for the slope isNA. - Exact roles of authors specified in DESCRIPTION file. - All usage lines in help files wrapped at 90 characters to ensure that code is not truncated in pdf manual. 30 January 2013: statmod 1.4.17 - new function eldaOneGroup() to conduct limiting dilution analysis when there is only one treatment group. 
This function implements a globally convergent Newton iteration to avoid occasional problems with lack of convergence of the usual glm calculations. - elda() (aka limdil) now call eldaOneGroup() to get confidence intervals and score tests. This improves the numerical reliability of the function. - more detail added to the elda() help page about the interpretations of the goodness of fit tests. - new function forward() for forward selection in multiple regression with an unlimited number of candidate covariates. - license upgraded to GPL from LGPL. 28 September 2012: statmod 1.4.16 - gauss.quad() and gauss.quad.prob() now use Fortran to solve tridiagonal eigenvalue problem, with considerable gain in speed. Updates to references for the same two functions. - Formatting of gauss.quad test output to ensure agreement between Unix and Windows. - mixedModel2 test code no longer prints REML residuals, because these are not reproducible between Windows and Unix. 6 August 2012: statmod 1.4.15 - improvements to glmnb.fit() to make it more numerically robust. - use of lgamma() in gauss.quad() to avoid floating overflows with kind="jacobi". 19 November 2011: statmod 1.4.14 - power.fisher.test() now accepts a new argument alternative which indicates the rejection region. 25 October 2011: statmod 1.4.13 - bug fix to glmnb.fit() when dispersion is a vector argument. 18 October 2011: statmod 1.4.12 - glmnb.fit() now accepts vector dispersion argument. - change to residual returned by tweedie glms when var.power=2 and y==0. In this case the theoretical residual is -Inf. The value returned by the tweedie family is finite, but smaller than previous. 29 June 2011: statmod 1.4.11 - updates to help page for sage.test 21 April 2011: statmod 1.4.10 - bug fix to glmnb.fit(). 9 March 2011: statmod 1.4.9 - bug fix to glmnb.fit(). - bug correction to sage.test() when library sizes are equal. The p-values returned change slightly. 3 November 2010: statmod 1.4.8 - new function glmnb.fit(), which implements Levenberg-modified Fisher scoring to fit a negative binomial generalized linear model with log-link. 28 May 2010: statmod 1.4.7 - permp() now has two new arguments 'method' and 'twosided'. The function now provides both exact and approximate methods for computing permutation p-values. 19 April 2010: statmod 1.4.6 - psi.hampel() and rho.hampel() renamed to .psi.hampel and .rho.hampel and removed from export function list. 16 April 2010: statmod 1.4.5 - new function mscale() which is the robust estimation of a scale parameter using Hampel's redescending psi function. - new function psi.hampel() which is the Hampel's redescending psi function. - new function rho.hampel() which is the integral of Hampel's redescending psi function. 30 March 2010: statmod 1.4.4 - remlscore() now returns a component iter giving the number of iterations used. 18 February 2010: statmod 1.4.3 - new function permp() which calculates exact p-values for permutation tests when permutations are sampled with replacement. 5 January 2010: statmod 1.4.2 - new argument 'dispersion' for glm.scoretest(), allowing the user to set a known value for the dispersion parameter. - ensure chisq values from limdil() remain positive, even when small. - correct the format of chisq value in print.limdil(). 29 Sep 2009: statmod 1.4.1 - fixes to documentation links to other packages - bug fix to glmgam.fit() when there are exact zeros in the data or fitted values. - add more goodness of fit tests to elda(). - improvements to print.limdil method. 
- argument log added to dinvgauss(), giving the option of returning the density on the log-scale. 6 May 2009: statmod 1.4.0 - new function glm.scoretest() to compute likelihood score tests for terms in generalized linear models. - Improvements to elda() and print.limdil() to avoid glm fits in extreme data situations with 0% or 100% positive response, improving speed and avoiding warnings. - Improvements to print.limdil method. - New function .limdil.allpos(). It calculates lower bound of the limdil confidence interval when all tests respond by using a globally convergent Newton interation. - Modify limdil() on lower bound of the confidence interval when all tests respond. - New methods print.limdil and plot.limdil for limdil objects. - The output from limdil() is now a limdil class object. - Added \eqn{} markup to equations in pinvgauss.Rd remlscor.Rd and remlscorgamma.Rd. - Elso et al (2004) reference added to compareGrowthCurves help page. 18 November 2008: statmod 1.3.8 - qres.nbinom now works in more situations. It now accepts a model fit from MASS::glm.nb or a model fit using MASS:negative.binomial() when the theta argument is unnamed. Previously the theta argument had to be named, as in negative.binomial(theta=2). 20 July 2008: statmod 1.3.7 - reference added to help page for compareGrowthCurves() - the saved output from the automatic tests updated for R 2.7.1 07 April 2008: statmod 1.3.6 - fixes to limdil() on estimate and upper bound of the confidence interval when all cells respond to all tests. - bug fix in limdil() which produced wrong calculation of the upper bound and lower bound of the confidence interval when there are more than one group and no cells responds or all cells respond to all tests in one of the groups. 24 March 2008: statmod 1.3.5 - The function remlscoregamma(), removed in 2004, restored to the package with updated references. 11 February 2008: statmod 1.3.4 - bug fix in limdil() which produced error when calculating the confidence intervals of multiple groups and all cells respond in one of the groups. 12 January 2008: statmod 1.3.3 - the limdil function now has the capability to handle and compare multiple experiments or groups. 24 September 2007: statmod 1.3.1 - non-ascii European characters removed from Rd files - Shackleton reference added to limdil.Rd - fixed some non-matched brackets in other Rd files 15 October 2006: statmod 1.3.0 - package now has a NAMESPACE which simply exports all objects - new function fitNBP() - new function plotGrowthCurves() 4 January 2006: statmod 1.2.4 - fixes to gauss.quad.prob when n=1 or n=2 12 December 2005: statmod 1.2.3 - remlscore() was failing when rank of X was only 1, now fixed. 
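As a minimal illustration of gauss.quad.prob() mentioned in the entries above (the particular call and check are illustrative only, not taken from the package tests): the function returns quadrature nodes and weights with respect to a probability distribution, so a weighted sum of a function of the nodes approximates its expectation.

  library(statmod)
  # nodes and weights for the standard normal distribution
  g <- gauss.quad.prob(8, dist = "normal")
  # an 8-point rule is exact for polynomials of low degree,
  # so this recovers E[X^2] = 1 for X ~ N(0,1)
  sum(g$weights * g$nodes^2)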
20 October 2005: statmod 1.2.2 - mixedModel2Fit() now outputs REML residuals - randomizedBlock() & randomizedBlockFit() renamed to mixedModel2() & mixedModel2Fit() 4 July 2005: statmod 1.2.1 - remlscore() now outputs covariance matrices for estimated coefficients - redundant copy of randomizedBlockFit() removed 22 June 2005: statmod 1.2.0 - /inst/doc/index.html created - change log (this file) moved to /inst/doc directory of package - new function limdil() 14 June 2005: statmod 1.1.1 - change to rinvgauss() to avoid numerical problems with subtractive cancellation when lambda< library(statmod) > > set.seed(0); u <- runif(100) > > ### fitNBP > > y <- matrix(rnbinom(2*4,mu=4,size=1.5),2,4) > lib.size <- rep(50000,4) > group <- c(1,1,2,2) > fitNBP(y,group=group,lib.size=lib.size) $coefficients 1 2 [1,] -10.414313 -10.81978 [2,] -9.315701 -10.41431 $fitted.values [,1] [,2] [,3] [,4] [1,] 1.5 1.5 1.0 1.0 [2,] 4.5 4.5 1.5 1.5 $dispersion [1] 0.9886071 > > ### glmgam.fit > > glmgam.fit(1,1) $coefficients [1] 1 $fitted.values [1] 1 $deviance [1] 0 $iter [1] 1 > glmgam.fit(c(1,1),c(0,4)) $coefficients [1] 2 $fitted.values [1] 2 2 $deviance [1] Inf $iter [1] 1 > glmgam.fit(X=cbind(1,c(1,0.5,0.5,0,0)),y=rchisq(5,df=1)) $coefficients [1] 0.1873533 0.6578903 $fitted.values [1] 0.8452436 0.5162985 0.5162985 0.1873533 0.1873533 $deviance [1] 10.7196 $iter [1] 12 > > ### glmnb.fit > y <- rnbinom(5,mu=10,size=10) > glmnb.fit(X=cbind(1,c(1,0.5,0.5,0,0)),y=y,dispersion=0.1) $coefficients [1] 2.3042476 -0.2210662 $fitted.values [1] 8.029975 8.968465 8.968465 10.016639 10.016639 $deviance [1] 0.5750191 $iter [1] 3 $convergence [1] "converged" > glmnb.fit(X=cbind(1,c(1,0.5,0.5,0,0)),y=y,dispersion=runif(6)) $coefficients [1] 2.2854591 -0.2049791 $fitted.values [1] 8.008312 8.872615 8.872615 9.830198 9.830198 $deviance [1] 0.150322 $iter [1] 3 $convergence [1] "converged" > glmnb.fit(X=cbind(1,c(1,1,0,0,0)),y=c(0,0,6,2,9),dispersion=0.1) $coefficients [1] 1.734601 -17.510821 $fitted.values [1] 1.407586e-07 1.407586e-07 5.666667e+00 5.666667e+00 5.666667e+00 $deviance [1] 3.242349 $iter [1] 17 $convergence [1] "converged" > > ### mixedModel2 > > y <- rnorm(6) > x <- rnorm(6) > z <- c(1,1,2,2,3,3) > m <- mixedModel2(y~x,random=z) > m$reml.residuals <- m$qr <- NULL > m $varcomp Residual Block 0.4541410 -0.1830805 $se.varcomp [1] 0.3764600 0.1997047 $coefficients (Intercept) x -0.1622120 -0.1635664 $residuals [1] 0.18728174 -0.23383906 0.04655731 0.38053667 0.93837501 0.56278279 $fitted.values [1] 0.42428865 0.43509063 0.06831659 -0.08278783 0.09651970 0.07022039 $effects (Intercept) x -1.8057315 -1.0181717 -0.8221429 0.4683771 1.5047294 0.9167959 $weights [1] 11.36623 11.36623 11.36623 2.20196 2.20196 2.20196 $rank [1] 2 $assign NULL $df.residual [1] 4 $se.coefficients [1] 0.1331802 0.1606472 > > ### mixedModel2Fit > > y <- c(-1,1,-2,2,0.5,1.7,-0.1) > X <- matrix(1,7,1) > Z <- model.matrix(~0+factor(c(1,1,2,2,3,3,4))) > m <- mixedModel2Fit(y,X,Z) > m$reml.residuals <- m$qr <- NULL > m $varcomp Residual Block 2.923462 -1.098564 $se.varcomp [1] 2.195145 1.177909 $coefficients x1 0.3376358 $residuals [1] 0.4774892 -1.0781457 0.4774892 0.4376358 1.5213203 2.7213203 -1.0000000 $fitted.values [1] -0.4774892 -0.4774892 -0.4774892 -0.3376358 0.0000000 0.0000000 0.0000000 $effects x1 1.0020814 -1.4659700 0.3593523 0.2343331 0.8897582 1.5915892 -0.5848592 $weights [1] 1.3767775 1.3767775 1.3767775 0.5479759 0.3420603 0.3420603 0.3420603 $rank [1] 1 $assign NULL $df.residual [1] 6 $se.coefficients [1] 0.3369346 > > ### 
qresiduals > > y <- rnorm(6) > fit <- glm(y~1) > residuals(fit) 1 2 3 4 5 6 0.1570945 -1.0198715 0.6872330 -1.1702352 1.7359615 -0.3901824 > qresiduals(fit) 1 2 3 4 5 6 0.1425500 -0.9254473 0.6236059 -1.0618896 1.5752385 -0.3540575 > qresiduals(fit,dispersion=1) 1 2 3 4 5 6 0.1570945 -1.0198715 0.6872330 -1.1702352 1.7359615 -0.3901824 > > if(require("MASS")) { + fit <- glm(Days~Age,family=negative.binomial(2),data=quine) + print(summary(qresiduals(fit))) + fit <- glm.nb(Days~Age,link=log,data = quine) + print(summary(qresiduals(fit))) + } Loading required package: MASS Min. 1st Qu. Median Mean 3rd Qu. Max. -3.2820 -0.8242 -0.2252 -0.1500 0.7333 3.0360 Min. 1st Qu. Median Mean 3rd Qu. Max. -2.90900 -0.52600 -0.02938 -0.01204 0.67880 2.45600 > > ### gauss.quad > > options(digits=10) > g <- gauss.quad(5,"legendre") > zapsmall(data.frame(g),digits=15) nodes weights 1 -0.9061798459 0.2369268851 2 -0.5384693101 0.4786286705 3 0.0000000000 0.5688888889 4 0.5384693101 0.4786286705 5 0.9061798459 0.2369268851 > g <- gauss.quad(5,"chebyshev1") > zapsmall(data.frame(g),digits=15) nodes weights 1 -0.9510565163 0.6283185307 2 -0.5877852523 0.6283185307 3 0.0000000000 0.6283185307 4 0.5877852523 0.6283185307 5 0.9510565163 0.6283185307 > g <- gauss.quad(5,"chebyshev2") > zapsmall(data.frame(g),digits=15) nodes weights 1 -0.8660254038 0.1308996939 2 -0.5000000000 0.3926990817 3 0.0000000000 0.5235987756 4 0.5000000000 0.3926990817 5 0.8660254038 0.1308996939 > g <- gauss.quad(5,"hermite") > zapsmall(data.frame(g),digits=15) nodes weights 1 -2.0201828705 0.01995324206 2 -0.9585724646 0.39361932315 3 0.0000000000 0.94530872048 4 0.9585724646 0.39361932315 5 2.0201828705 0.01995324206 > g <- gauss.quad(5,"laguerre",alpha=5) > zapsmall(data.frame(g),digits=15) nodes weights 1 2.510558565 18.05274373485 2 5.115656536 63.52567706777 3 8.635874626 34.74331388323 4 13.417467882 3.63334627180 5 20.320442391 0.04491904235 > g <- gauss.quad(5,"jacobi",alpha=5,beta=1.1) > zapsmall(data.frame(g),digits=15) nodes weights 1 -0.8844049819 0.40981005618 2 -0.6382606000 1.16318993548 3 -0.2943950347 0.93716413992 4 0.1024254205 0.26378902100 5 0.5034550719 0.01840428809 > g <- gauss.quad.prob(5,dist="uniform") > zapsmall(data.frame(g),digits=15) nodes weights 1 0.04691007703 0.1184634425 2 0.23076534495 0.2393143352 3 0.50000000000 0.2844444444 4 0.76923465505 0.2393143352 5 0.95308992297 0.1184634425 > g <- gauss.quad.prob(5,dist="normal") > zapsmall(data.frame(g),digits=15) nodes weights 1 -2.856970014 0.01125741133 2 -1.355626180 0.22207592201 3 0.000000000 0.53333333333 4 1.355626180 0.22207592201 5 2.856970014 0.01125741133 > g <- gauss.quad.prob(5,dist="beta") > zapsmall(data.frame(g),digits=15) nodes weights 1 0.04691007703 0.1184634425 2 0.23076534495 0.2393143352 3 0.50000000000 0.2844444444 4 0.76923465505 0.2393143352 5 0.95308992297 0.1184634425 > g <- gauss.quad.prob(5,dist="gamma") > zapsmall(data.frame(g),digits=15) nodes weights 1 0.2635603197 5.217556106e-01 2 1.4134030591 3.986668111e-01 3 3.5964257710 7.594244968e-02 4 7.0858100059 3.611758680e-03 5 12.6408008443 2.336997239e-05 > > ### invgauss > > pinvgauss(c(0,0.1,1,2.3,3.1,NA),mean=c(1,2,3,0,1,2),dispersion=0.5) [1] 0.000000000e+00 2.057306477e-05 2.854596328e-01 1.000000000e+00 [5] 9.812161963e-01 NA > pinvgauss(c(0,0.1,1,2.3,3.1,NA),mean=c(1,2,3,0,1,2),dispersion=0.5,log.p=TRUE) [1] -Inf -10.79152787332 -1.25365465102 0.00000000000 [5] -0.01896246007 NA > 
pinvgauss(c(0,0.1,1,2.3,3.1,NA),mean=c(1,2,3,0,1,2),dispersion=0.5,lower.tail=FALSE,log.p=TRUE) [1] 0.0000000000000 -0.0000205732764 -0.3361157861191 -Inf [5] -3.9747602878610 NA > pinvgauss(1,mean=c(1,2,NA)) [1] 0.6681020012 0.4901383399 NA > p <- c(0,0.001,0.5,0.999,1) > qinvgauss(p,mean=1.3,dispersion=0.6) [1] 0.0000000000 0.1271035164 0.9446753861 9.2602074131 Inf > qinvgauss(p,mean=1.3,dispersion=0.6,lower.tail=FALSE) [1] Inf 9.2602074131 0.9446753861 0.1271035164 0.0000000000 > qinvgauss(0.5,mean=c(1,2,NA)) [1] 0.6758413057 1.0284597846 NA > qinvgauss(log(p),mean=1.3,dispersion=0.6,log.p=TRUE) [1] 0.0000000000 0.1271035164 0.9446753861 9.2602074131 Inf > qinvgauss(log(p),mean=1.3,dispersion=0.6,lower.tail=FALSE,log.p=TRUE) [1] Inf 9.2602074131 0.9446753861 0.1271035164 0.0000000000 > > > ### extra tests done only locally > > #GKSTest <- Sys.getenv("GKSTest") > #if(GKSTest=="on") { > #print("hello") > #} > > proc.time() user system elapsed 0.15 0.03 0.21 statmod/src/0000755000176200001440000000000012654044513012520 5ustar liggesusersstatmod/src/gaussq.f0000644000176200001440000003231412654044513014175 0ustar liggesusersC This was file downloaded from http://www.netlib.org/go/gaussq.f C on 7 August 2012. C Two lines have been commented out, with a capital C, and rewritten C by Gordon Smyth. The file is unchanged apart from those two changes C and this preamble of 6 lines. c To get dgamma, "send dgamma from fnlib". c To get d1mach, mail netlib c send d1mach from core c C subroutine gaussq(kind, n, alpha, beta, kpts, endpts, b, t, w) c c this set of routines computes the nodes t(j) and weights c w(j) for gaussian-type quadrature rules with pre-assigned c nodes. these are used when one wishes to approximate c c integral (from a to b) f(x) w(x) dx c c n c by sum w f(t ) c j=1 j j c c (note w(x) and w(j) have no connection with each other.) c here w(x) is one of six possible non-negative weight c functions (listed below), and f(x) is the c function to be integrated. gaussian quadrature is particularly c useful on infinite intervals (with appropriate weight c functions), since then other techniques often fail. c c associated with each weight function w(x) is a set of c orthogonal polynomials. the nodes t(j) are just the zeroes c of the proper n-th degree polynomial. c c input parameters (all real numbers are in double precision) c c kind an integer between 1 and 6 giving the type of c quadrature rule: c c kind = 1: legendre quadrature, w(x) = 1 on (-1, 1) c kind = 2: chebyshev quadrature of the first kind c w(x) = 1/sqrt(1 - x*x) on (-1, +1) c kind = 3: chebyshev quadrature of the second kind c w(x) = sqrt(1 - x*x) on (-1, 1) c kind = 4: hermite quadrature, w(x) = exp(-x*x) on c (-infinity, +infinity) c kind = 5: jacobi quadrature, w(x) = (1-x)**alpha * (1+x)** c beta on (-1, 1), alpha, beta .gt. -1. c note: kind=2 and 3 are a special case of this. c kind = 6: generalized laguerre quadrature, w(x) = exp(-x)* c x**alpha on (0, +infinity), alpha .gt. -1 c c n the number of points used for the quadrature rule c alpha real parameter used only for gauss-jacobi and gauss- c laguerre quadrature (otherwise use 0.d0). c beta real parameter used only for gauss-jacobi quadrature-- c (otherwise use 0.d0) c kpts (integer) normally 0, unless the left or right end- c point (or both) of the interval is required to be a c node (this is called gauss-radau or gauss-lobatto c quadrature). then kpts is the number of fixed c endpoints (1 or 2). c endpts real array of length 2. 
contains the values of c any fixed endpoints, if kpts = 1 or 2. c b real scratch array of length n c c output parameters (both double precision arrays of length n) c c t will contain the desired nodes. c w will contain the desired weights w(j). c c underflow may sometimes occur, but is harmless. c c references c 1. golub, g. h., and welsch, j. h., "calculation of gaussian c quadrature rules," mathematics of computation 23 (april, c 1969), pp. 221-230. c 2. golub, g. h., "some modified matrix eigenvalue problems," c siam review 15 (april, 1973), pp. 318-334 (section 7). c 3. stroud and secrest, gaussian quadrature formulas, prentice- c hall, englewood cliffs, n.j., 1966. c c original version 20 jan 1975 from stanford c modified 21 dec 1983 by eric grosse c imtql2 => gausq2 c hex constant => d1mach (from core library) c compute pi using datan c removed accuracy claims, description of method c added single precision version c C double precision b(n), t(n), w(n), endpts(2), muzero, t1, C x gam, solve, dsqrt, alpha, beta c C call class (kind, n, alpha, beta, b, t, muzero) c c the matrix of coefficients is assumed to be symmetric. c the array t contains the diagonal elements, the array c b the off-diagonal elements. c make appropriate changes in the lower right 2 by 2 c submatrix. c C if (kpts.eq.0) go to 100 C if (kpts.eq.2) go to 50 c c if kpts=1, only t(n) must be changed c C t(n) = solve(endpts(1), n, t, b)*b(n-1)**2 + endpts(1) C go to 100 c c if kpts=2, t(n) and b(n-1) must be recomputed c C 50 gam = solve(endpts(1), n, t, b) C t1 = ((endpts(1) - endpts(2))/(solve(endpts(2), n, t, b) - gam)) C b(n-1) = dsqrt(t1) C t(n) = endpts(1) + gam*t1 c c note that the indices of the elements of b run from 1 to n-1 c and thus the value of b(n) is arbitrary. c now compute the eigenvalues of the symmetric tridiagonal c matrix, which has been modified as necessary. c the method used is a ql-type method with origin shifting c C 100 w(1) = 1.0d0 C do 105 i = 2, n C 105 w(i) = 0.0d0 c C call gausq2 (n, t, b, w, ierr) C do 110 i = 1, n C 110 w(i) = muzero * w(i) * w(i) c C return C end c c c C double precision function solve(shift, n, a, b) c c this procedure performs elimination to solve for the c n-th component of the solution delta to the equation c c (jn - shift*identity) * delta = en, c c where en is the vector of all zeroes except for 1 in c the n-th position. c c the matrix jn is symmetric tridiagonal, with diagonal c elements a(i), off-diagonal elements b(i). this equation c must be solved to obtain the appropriate changes in the lower c 2 by 2 submatrix of coefficients for orthogonal polynomials. c c C double precision shift, a(n), b(n), alpha c C alpha = a(1) - shift C nm1 = n - 1 C do 10 i = 2, nm1 C 10 alpha = a(i) - shift - b(i-1)**2/alpha C solve = 1.0d0/alpha C return C end c c c C subroutine class(kind, n, alpha, beta, b, a, muzero) c c this procedure supplies the coefficients a(j), b(j) of the c recurrence relation c c b p (x) = (x - a ) p (x) - b p (x) c j j j j-1 j-1 j-2 c c for the various classical (normalized) orthogonal polynomials, c and the zero-th moment c c muzero = integral w(x) dx c c of the given polynomial's weight function w(x). since the c polynomials are orthonormalized, the tridiagonal matrix is c guaranteed to be symmetric. c c the input parameter alpha is used only for laguerre and c jacobi polynomials, and the parameter beta is used only for c jacobi polynomials. the laguerre and jacobi polynomials c require the gamma function. 
c C double precision a(n), b(n), muzero, alpha, beta C double precision abi, a2b2, dgamma, pi, dsqrt, ab c C pi = 4.0d0 * datan(1.0d0) C nm1 = n - 1 C go to (10, 20, 30, 40, 50, 60), kind c c kind = 1: legendre polynomials p(x) c on (-1, +1), w(x) = 1. c C 10 muzero = 2.0d0 C do 11 i = 1, nm1 C a(i) = 0.0d0 C abi = i C 11 b(i) = abi/dsqrt(4*abi*abi - 1.0d0) C a(n) = 0.0d0 C return c c kind = 2: chebyshev polynomials of the first kind t(x) c on (-1, +1), w(x) = 1 / sqrt(1 - x*x) c C 20 muzero = pi C do 21 i = 1, nm1 C a(i) = 0.0d0 C 21 b(i) = 0.5d0 C b(1) = dsqrt(0.5d0) C a(n) = 0.0d0 C return c c kind = 3: chebyshev polynomials of the second kind u(x) c on (-1, +1), w(x) = sqrt(1 - x*x) c C 30 muzero = pi/2.0d0 C do 31 i = 1, nm1 C a(i) = 0.0d0 C 31 b(i) = 0.5d0 C a(n) = 0.0d0 C return c c kind = 4: hermite polynomials h(x) on (-infinity, c +infinity), w(x) = exp(-x**2) c C 40 muzero = dsqrt(pi) C do 41 i = 1, nm1 C a(i) = 0.0d0 C 41 b(i) = dsqrt(i/2.0d0) C a(n) = 0.0d0 C return c c kind = 5: jacobi polynomials p(alpha, beta)(x) on c (-1, +1), w(x) = (1-x)**alpha + (1+x)**beta, alpha and c beta greater than -1 c C 50 ab = alpha + beta C abi = 2.0d0 + ab C muzero = 2.0d0 ** (ab + 1.0d0) * dgamma(alpha + 1.0d0) * dgamma( C x beta + 1.0d0) / dgamma(abi) C a(1) = (beta - alpha)/abi C b(1) = dsqrt(4.0d0*(1.0d0 + alpha)*(1.0d0 + beta)/((abi + 1.0d0)* C 1 abi*abi)) C a2b2 = beta*beta - alpha*alpha C do 51 i = 2, nm1 C abi = 2.0d0*i + ab C a(i) = a2b2/((abi - 2.0d0)*abi) C 51 b(i) = dsqrt (4.0d0*i*(i + alpha)*(i + beta)*(i + ab)/ C 1 ((abi*abi - 1)*abi*abi)) C abi = 2.0d0*n + ab C a(n) = a2b2/((abi - 2.0d0)*abi) C return c c kind = 6: laguerre polynomials l(alpha)(x) on c (0, +infinity), w(x) = exp(-x) * x**alpha, alpha greater c than -1. c C 60 muzero = dgamma(alpha + 1.0d0) C do 61 i = 1, nm1 C a(i) = 2.0d0*i - 1.0d0 + alpha C 61 b(i) = dsqrt(i*(i + alpha)) C a(n) = 2.0d0*n - 1 + alpha C return C end c c subroutine gausq2(n, d, e, z, ierr) c c this subroutine is a translation of an algol procedure, c num. math. 12, 377-383(1968) by martin and wilkinson, c as modified in num. math. 15, 450(1970) by dubrulle. c handbook for auto. comp., vol.ii-linear algebra, 241-248(1971). c this is a modified version of the 'eispack' routine imtql2. c c this subroutine finds the eigenvalues and first components of the c eigenvectors of a symmetric tridiagonal matrix by the implicit ql c method. c c on input: c c n is the order of the matrix; c c d contains the diagonal elements of the input matrix; c c e contains the subdiagonal elements of the input matrix c in its first n-1 positions. e(n) is arbitrary; c c z contains the first row of the identity matrix. c c on output: c c d contains the eigenvalues in ascending order. if an c error exit is made, the eigenvalues are correct but c unordered for indices 1, 2, ..., ierr-1; c c e has been destroyed; c c z contains the first components of the orthonormal eigenvectors c of the symmetric tridiagonal matrix. if an error exit is c made, z contains the eigenvectors associated with the stored c eigenvalues; c c ierr is set to c zero for normal return, c j if the j-th eigenvalue has not been c determined after 30 iterations. c c ------------------------------------------------------------------ c integer i, j, k, l, m, n, ii, mml, ierr real*8 d(n), e(n), z(n), b, c, f, g, p, r, s, machep C real*8 dsqrt, dabs, dsign, d1mach real*8 dsqrt, dabs, dsign c C machep=d1mach(4) machep = 2.0d0**(-52.0d0) c ierr = 0 if (n .eq. 
1) go to 1001 c e(n) = 0.0d0 do 240 l = 1, n j = 0 c :::::::::: look for small sub-diagonal element :::::::::: 105 do 110 m = l, n if (m .eq. n) go to 120 if (dabs(e(m)) .le. machep * (dabs(d(m)) + dabs(d(m+1)))) x go to 120 110 continue c 120 p = d(l) if (m .eq. l) go to 240 if (j .eq. 30) go to 1000 j = j + 1 c :::::::::: form shift :::::::::: g = (d(l+1) - p) / (2.0d0 * e(l)) r = dsqrt(g*g+1.0d0) g = d(m) - p + e(l) / (g + dsign(r, g)) s = 1.0d0 c = 1.0d0 p = 0.0d0 mml = m - l c c :::::::::: for i=m-1 step -1 until l do -- :::::::::: do 200 ii = 1, mml i = m - ii f = s * e(i) b = c * e(i) if (dabs(f) .lt. dabs(g)) go to 150 c = g / f r = dsqrt(c*c+1.0d0) e(i+1) = f * r s = 1.0d0 / r c = c * s go to 160 150 s = f / g r = dsqrt(s*s+1.0d0) e(i+1) = g * r c = 1.0d0 / r s = s * c 160 g = d(i+1) - p r = (d(i) - g) * s + 2.0d0 * c * b p = s * r d(i+1) = g + p g = c * r - b c :::::::::: form first component of vector :::::::::: f = z(i+1) z(i+1) = s * z(i) + c * f 200 z(i) = c * z(i) - s * f c d(l) = d(l) - p e(l) = g e(m) = 0.0d0 go to 105 240 continue c c :::::::::: order eigenvalues and eigenvectors :::::::::: do 300 ii = 2, n i = ii - 1 k = i p = d(i) c do 260 j = ii, n if (d(j) .ge. p) go to 260 k = j p = d(j) 260 continue c if (k .eq. i) go to 300 d(k) = d(i) d(i) = p p = z(i) z(i) = z(k) z(k) = p 300 continue c go to 1001 c :::::::::: set error -- no convergence to an c eigenvalue after 30 iterations :::::::::: 1000 ierr = l 1001 return c :::::::::: last card of gausq2 :::::::::: end statmod/NAMESPACE0000644000176200001440000000134612654044513013154 0ustar liggesusers# Calling the dynamic library useDynLib(statmod) # All functions exported other than those starting with "." exportPattern("^[^\\.]") importFrom("graphics", "abline", "legend", "lines", "plot", "points") importFrom("stats", "Gamma", "binomial", "chisq.test", "dbinom", "fisher.test", "fitted", "glm", "glm.fit", "lm.fit", "lm.wfit", "make.link", "median", "model.matrix", "model.response", "model.weights", "p.adjust", "pbeta", "pbinom", "pchisq", "pgamma", "pnorm", "ppois", "printCoefmat", "qgamma", "qnorm", "quantile", "quasi", "rbinom", "rchisq", "rnorm", "residuals", "runif", "var", "weighted.mean") S3method(print,limdil) S3method(plot,limdil) statmod/data/0000755000176200001440000000000012654044513012642 5ustar liggesusersstatmod/data/welding.rdata0000644000176200001440000000234412654044513015313 0ustar liggesusersRDX2 X  welding  ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ@EÙ™™™™š@D™™™™š@E333333@FY™™™™š@E333333@Fó33333@E™™™™š@DLÌÌÌÌÍ@E333333@FÀ@EÌÌÌÌÌÍ@DLÌÌÌÌÍ@F@D™™™™š@E@@G@ names  Rods Drying Material Thickness Angle Opening Current Method Preheating Strength class data.frame row.names 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16þþstatmod/R/0000755000176200001440000000000012654044513012132 5ustar liggesusersstatmod/R/mscale.R0000644000176200001440000000315712654044513013527 0ustar liggesusersmscale <- function(u, na.rm=FALSE) # Scale M-estimator with 50% breakdown # Yohai (1987) Annals, Stromberg (1993) JASA. 
# # GKS 2 June 1999 # Revised 17 April 2010 { isna <- is.na(u) if(any(isna)) { if(na.rm) { if(any(!isna)) u <- u[!isna] else return(NA) } else { return(NA) } } if(mean(u==0) >= 0.5) return(0) U <- abs(u) s <- median(U)/0.6744898 iter <- 0 repeat { iter <- iter+1 z <- u/0.212/s d1 <- mean(.rho.hampel(z))-3.75 d2 <- mean(z*.psi.hampel(z)) s <- s*(1+d1/d2) if(iter > 50) { warning("Max iterations exceeded") break } if(abs(d1/d2) < 1e-13) break } s } .rho.hampel <- function(u, a = 1.5, b = 3.5, c = 8) { # Integral of Hampel's redescending psi function (Hampel, Ronchetti, # Rousseeuw and Stahel, 1986, Robust Statistics, Wiley, page 150). # Default values are as in Stromberg (1993) JASA. # # GKS 31 May 99 # U <- abs(u) A <- (U <= a) #increasing B <- (U > a) & (U <= b) #flat C <- (U > b) & (U <= c) #descending D <- (U > c) # zero rho <- U rho[A] <- (U[A] * U[A])/2 rho[B] <- a * (U[B] - a/2) rho[C] <- a * (b - a/2) + a * (U[C] - b) * (1 - (U[C] - b)/(c - b)/2) rho[D] <- (a * (b - a + c))/2 rho } .psi.hampel <- function(u, a = 1.5, b = 3.5, c = 8) { # Hampel's redescending psi function (Hampel, Ronchetti, # Rousseeuw and Stahel, 1986, Robust Statistics, Wiley, page 150). # Default values are as in Stromberg (1993) JASA. # # GKS 2 June 99 # U <- abs(u) B <- (U > a) & (U <= b) #flat C <- (U > b) & (U <= c) #descending D <- (U > c) # zero psi <- u psi[B] <- sign(u[B]) * a psi[C] <- sign(u[C]) * a * (c - U[C])/(c - b) psi[D] <- 0 psi }statmod/R/mixedmodel.R0000644000176200001440000000707512654044513014415 0ustar liggesusers# MIXEDMODEL.R randomizedBlock <- mixedModel2 <- function(formula, random, weights=NULL, only.varcomp=FALSE, data=list(), subset=NULL, contrasts=NULL, tol=1e-6, maxit=50, trace=FALSE) # REML for mixed linear models with 2 variance components # Gordon Smyth, Walter and Eliza Hall Institute # 28 Jan 2003. Last revised 20 October 2005. { # Extract model from formula cl <- match.call() mf <- match.call(expand.dots = FALSE) mf$only.varcomp <- mf$tol <- mf$tol <- mf$maxit <- NULL mf$drop.unused.levels <- TRUE mf[[1]] <- as.name("model.frame") mf <- eval(mf, parent.frame()) mt <- attr(mf, "terms") xvars <- as.character(attr(mt, "variables"))[-1] if((yvar <- attr(mt,"response")) > 0) xvars <- xvars[-yvar] xlev <- if(length(xvars) > 0) { xlev <- lapply(mf[xvars], levels) xlev[!sapply(xlev, is.null)] } y <- model.response(mf, "numeric") w <- model.weights(mf) x <- model.matrix(mt, mf, contrasts) random <- mf[["(random)"]] # Missing values not allowed if(any(is.na(y)) || any(is.na(x)) || any(is.na(random))) stop("Missing values not allowed") if(!is.null(weights)) if(any(is.na(weights))) stop("Missing values not allowed") # Design matrix for random effects lev <- unique.default(random) z <- 0 + (matrix(random,length(random),length(lev)) == t(matrix(lev,length(lev),length(random)))) mixedModel2Fit(y,x,z,w=w,only.varcomp=only.varcomp,tol=tol,maxit=maxit,trace=trace) } randomizedBlockFit <- mixedModel2Fit <- function(y,X,Z,w=NULL,only.varcomp=FALSE,tol=1e-6,maxit=50,trace=FALSE) # REML for mixed linear models with 2 variance components # Fits the model Y = X*BETA + Z*U + E where BETA is fixed # and U is random. # # GAMMA holds the variance components. The errors E and # random effects U are assumed to have covariance matrices # EYE*GAMMA(1) and EYE*GAMMA(2) respectively. # Gordon Smyth, Walter and Eliza Hall Institute # Matlab version 19 Feb 94. Converted to R, 28 Jan 2003. 
# Last revised 20 Oct 2005 { # Prior weights if(!is.null(w)) { sw <- sqrt(w) y <- sw * y X <- sw * X } # Find null space Q of X X <- as.matrix(X) Z <- as.matrix(Z) mx <- nrow(X) nx <- ncol(X) nz <- ncol(Z) fit <- lm.fit(X,cbind(Z,y)) r <- fit$rank QtZ <- fit$effects[(r+1):mx,1:nz] # Apply Q to Z and transform to independent observations mq <- mx-r if(mq == 0) return(list(varcomp=c(NA,NA))) s <- La.svd(QtZ,nu=mq,nv=0) uqy <- crossprod(s$u,fit$effects[(r+1):mx,nz+1]) d <- rep(0,mq) d[1:length(s$d)] <- s$d^2 dx <- cbind(Residual=1,Block=d) dy <- uqy^2 # Try unweighted starting values dfit <- lm.fit(dx,dy) varcomp <- dfit$coefficients dfitted.values <- dfit$fitted.values # Main fit if(mq > 2 && sum(abs(d)>1e-15)>1 && var(d)>1e-15) { if(all(dfitted.values >= 0)) start <- dfit$coefficients else start <- c(Residual=mean(dy),Block=0) # fit gamma glm identity link to dy with dx as covariates dfit <- glmgam.fit(dx,dy,coef.start=start,tol=tol,maxit=maxit,trace=trace) varcomp <- dfit$coefficients dfitted.values <- dfit$fitted.values } out <- list(varcomp=dfit$coef) out$reml.residuals <- uqy/sqrt(dfitted.values) if(only.varcomp) return(out) # Standard errors for variance components dinfo <- crossprod(dx,vecmat(1/dfitted.values^2,dx)) out$se.varcomp=sqrt(2*diag(chol2inv(chol(dinfo)))) # fixed effect estimates s <- La.svd(Z,nu=mx,nv=0) d <- rep(0,mx) d[1:length(s$d)] <- s$d^2 v <- drop( cbind(Residual=1,Block=d) %*% varcomp ) mfit <- lm.wfit(x=crossprod(s$u,X),y=crossprod(s$u,y),w=1/v) out <- c(out,mfit) out$se.coefficients <- sqrt(diag(chol2inv(mfit$qr$qr))) out } statmod/R/matvec.R0000644000176200001440000000104712654044513013536 0ustar liggesusersmatvec <- function(M,v) { # Multiply the columns of matrix by the elements of a vector, # i.e., compute M %*% diag(v) # Gordon Smyth # 5 July 1999 # v <- as.vector(v) M <- as.matrix(M) if(length(v)!=dim(M)[2]) stop("matvec: Dimensions do not match") t(v * t(M)) } vecmat <- function(v,M) { # Multiply the rows of matrix by the elements of a vector, # i.e., compute diag(v) %*% M # Gordon Smyth # 5 July 1999 # v <- as.vector(v) M <- as.matrix(M) if(length(v)!=dim(M)[1]) stop("vecmat: Dimensions do not match") v * M } statmod/R/power.R0000644000176200001440000000074212654044513013414 0ustar liggesuserspower.fisher.test <- function(p1,p2,n1,n2,alpha=0.05,nsim=100,alternative="two.sided") { # Calculation of power for Fisher's exact test for # comparing two proportions # Gordon smyth # 3 June 2003. Revised 19 Nov 2011. y1 <- rbinom(nsim,size=n1,prob=p1) y2 <- rbinom(nsim,size=n2,prob=p2) y <- cbind(y1,n1-y1,y2,n2-y2) p.value <- rep(0,nsim) for (i in 1:nsim) p.value[i] <- fisher.test(matrix(y[i,],2,2),alternative=alternative)$p.value mean(p.value < alpha) } statmod/R/tweedief.R0000644000176200001440000000311212654044513014046 0ustar liggesusers## TWEEDIEF.R tweedie <- function(var.power=0, link.power=1-var.power) { # Tweedie generalized linear model family # Gordon Smyth # 22 Oct 2002. Last modified 2 Sep 2011. 
lambda <- link.power if(lambda==0) { linkfun <- function(mu) log(mu) linkinv <- function(eta) pmax(.Machine$double.eps, exp(eta)) mu.eta <- function(eta) pmax(.Machine$double.eps, exp(eta)) valideta <- function(eta) TRUE } else { linkfun <- function(mu) mu^lambda linkinv <- function(eta) eta^(1/lambda) mu.eta <- function(eta) (1/lambda) * eta^(1/lambda - 1) valideta <- function(eta) TRUE } p <- var.power variance <- function(mu) mu^p if(p == 0) validmu <- function(mu) TRUE else if(p > 0) validmu <- function(mu) all(mu >= 0) else validmu <- function(mu) all(mu > 0) dev.resids <- function(y, mu, wt) { y1 <- y + 0.1*(y == 0) if (p == 1) theta <- log(y1/mu) else theta <- ( y1^(1-p) - mu^(1-p) ) / (1-p) if (p == 2) # Returns a finite somewhat arbitrary residual for y==0, although theoretical value is -Inf kappa <- log(y1/mu) else kappa <- ( y^(2-p) - mu^(2-p) ) / (2-p) 2 * wt * (y*theta - kappa) } initialize <- expression({ n <- rep(1, nobs) mustart <- y + 0.1 * (y == 0) }) aic <- function(y, n, mu, wt, dev) NA structure(list( family = "Tweedie", variance = variance, dev.resids = dev.resids, aic = aic, link = paste("mu^",as.character(lambda),sep=""), linkfun = linkfun, linkinv = linkinv, mu.eta = mu.eta, initialize = initialize, validmu = validmu, valideta = valideta), class = "family") } statmod/R/digamma.R0000644000176200001440000000147012654044513013656 0ustar liggesusers# SPECIAL FUNCTIONS logmdigamma <- function(x) { # log(x) - digamma(x) # Saves computation of log(x) and avoids subtractive cancellation in digamma(x) when x is large # Gordon Smyth, smyth@wehi.edu.au # 19 Jan 98. Last revised 9 Dec 2002. # z <- x if(any(omit <- is.na(z) | Re(z) <= 0)) { ps <- z ps[omit] <- NA if(any(!omit)) ps[!omit] <- Recall(z[!omit]) return(ps) } if(any(small <- Mod(z) < 5)) { ps <- z x <- z[small] ps[small] <- log(x/(x+5)) + Recall(x+5) + 1/x + 1/(x+1) + 1/(x+2) + 1/(x+3) + 1/(x+4) if(any(!small)) ps[!small] <- Recall(z[!small]) return(ps) } x <- 1/z^2 tail <- ((x * (-1/12 + ((x * (1/120 + ((x * (-1/252 + (( x * (1/240 + ((x * (-1/132 + ((x * (691/32760 + ( (x * (-1/12 + (3617 * x)/8160))))))))))))))))))))) 1/(2 * z) - tail } statmod/R/glmnb.R0000644000176200001440000001176512654044513013366 0ustar liggesusersglmnb.fit <- function(X,y,dispersion,weights=NULL,offset=0,coef.start=NULL,start.method="mean",tol=1e-6,maxit=50,trace=FALSE) # Fit negative binomial generalized linear model with log link # by Levenberg damped Fisher scoring # Yunshun Chen and Gordon Smyth # 2 November 2010. Last modified 20 November 2012. 
{ # Check input values for y y <- as.vector(y) if(any(y < 0)) stop("y must be non-negative") if(!all(is.finite(y))) stop("All y values must be finite and non-missing") ymax <- max(y) n <- length(y) # Handle zero length y as special case if(n == 0) stop("y has length zero") # Check input values for X X <- as.matrix(X) if(n != nrow(X)) stop("length(y) not equal to nrow(X)") if(!all(is.finite(X))) stop("All X values must be finite and non-missing") p <- ncol(X) if(p > n) stop("More columns than rows in X") # Handle y all zero as special case if(ymax==0) return(list(coefficients=rep(0,p),fitted.values=rep(0,n),deviance=0,iter=0,convergence="converged")) # Check input values for dispersion if(any(dispersion<0)) stop("dispersion values must be non-negative") phi <- rep(dispersion,length.out=n) # Check input values for offset offset <- rep(offset,length=n) # Check input values for weights if(is.null(weights)) weights <- rep.int(1,n) if(any(weights <= 0)) stop("All weights must be positive") # Starting values delta <- min(ymax,1/6) y1 <- pmax(y,delta) if(is.null(coef.start)) { start.method <- match.arg(start.method,c("log(y)","mean")) if(start.method=="log(y)") { fit <- lm.wfit(X,log(y1)-offset,weights) beta <- fit$coefficients mu <- exp(fit$fitted.values+offset) } else { N <- exp(offset) rate <- y/N w <- weights*N/(1+phi*N) beta.mean <- log(sum(w*rate)/sum(w)) beta <- qr.coef(qr(X),rep(beta.mean,length=n)) mu <- drop(exp(X %*% beta + offset)) } } else { beta <- coef.start mu <- drop(exp(X %*% beta + offset)) } unit.dev.poissonlimit <- function(y,mu,phi) { b <- y-mu b2 <- 0.5*b^2*phi*(1+phi*(2/3*b-y)) 2 * ( y*log(y/mu) - b - b2 ) } unit.dev.gamma <- function(y,mu) { 2 * ( (y-mu)/mu - log(y/mu)) } unit.dev.negbin <- function(y,mu,phi) { 2 * ( y*log(y/mu) - (y+1/phi)*log((1+y*phi)/(1+mu*phi)) ) } total.deviance <- function(y,mu,phi,w) { if(any(is.infinite(mu))) return(Inf) poisson.like <- (phi < 1e-4) gamma.like <- (phi*mu > 1e6) negbin <- !(poisson.like | gamma.like) y <- y+1e-8 mu <- mu+1e-8 unit.dev <- y if(any(poisson.like)) unit.dev[poisson.like] <- unit.dev.poissonlimit(y[poisson.like],mu[poisson.like],phi[poisson.like]) if(any(gamma.like)) { m <- mu[gamma.like] alpha <- m/(1+phi[gamma.like]*m) unit.dev[gamma.like] <- unit.dev.gamma(y[gamma.like],m)*alpha } if(any(negbin)) unit.dev[negbin] <- unit.dev.negbin(y[negbin],mu[negbin],phi[negbin]) sum(w*unit.dev) } dev <- total.deviance(y,mu,phi,weights) # Scoring iteration with Levenberg damping iter <- 0 if(trace) cat("Iter =",iter,", Dev =",dev," Beta",beta,"\n") repeat { iter <- iter+1 # test for iteration limit if(iter > maxit) break # information matrix v.div.mu <- 1+phi*mu XVX <- crossprod(X,(weights*mu/v.div.mu)*X) maxinfo <- max(diag(XVX)) if(iter==1) { lambda <- maxinfo * 1e-6 lambda <- max(lambda,1e-13) lambdaceiling <- maxinfo * 1e13 lambdabig <- FALSE I <- diag(p) } # score vector dl <- crossprod(X,weights*(y-mu)/v.div.mu) # Levenberg damping dbeta <- beta lev <- 0 repeat { lev <- lev+1 # trial step R <- chol(XVX + lambda*I, pivot=TRUE) while(attr(R,"rank") lambdaceiling) if(lambdabig) { warning("Too much damping - convergence tolerance not achievable") break } } # iteration output if(trace) cat("Iter =",iter,", Dev =",dev," Beta",beta,"\n") # keep exiting if too much damping if(lambdabig) break # test for convergence scoresquare <- crossprod(dl,dbeta) if(trace) cat("Convergence criterion",scoresquare,dl,dbeta,"\n") if( scoresquare < tol || dev/ymax < 1e-12) break # decrease damping if successful at first try if(lev==1) lambda <- 
lambda/10 } beta <- drop(beta) names(beta) <- colnames(X) convergence <- "converged" if(lambdabig) convergence <- "lambdabig" if(iter>maxit) convergence <- "maxit" list(coefficients=beta,fitted.values=as.vector(mu),deviance=dev,iter=iter,convergence=convergence) } statmod/R/forward.R0000644000176200001440000000260012654044513013717 0ustar liggesusersforward <- function(y,x,xkept=NULL,intercept=TRUE,nvar=ncol(x)) # Forward selection for linear regression # 30 Jan 2013 { # Check y y <- as.numeric(y) n <- length(y) # Check x x <- as.matrix(x) if(nrow(x) != n) stop("nrow of x must match length of y") # Check xkept if(!is.null(xkept)) { xkept <- as.matrix(xkept) if(nrow(xkept) != n) stop("nrow of xkept must match length of y") } # Add intercept if(intercept) xkept <- cbind(rep.int(1,n),xkept) # Sweep out xkept columns if(is.null(xkept)) { rank.xkept <- 0 } else { QR <- qr(xkept) y <- qr.resid(QR,y) x <- qr.resid(QR,x) rank.xkept <- QR$rank } # Check nvar nvar <- min(nvar,ncol(x),n-rank.xkept) if(nvar <= 0) return(numeric(0)) orderin <- rep.int(0,nvar) candidates <- 1:ncol(x) for (nin in 1:nvar) { if(ncol(x)==1) { orderin[nin] <- candidates break } # Standardize y <- y/sqrt(sum(y^2)) x <- t(t(x)/sqrt(colSums(x^2))) # Next to add b.y.x <- crossprod(x,y) bestj <- which.max(abs(b.y.x)) bestx <- x[,bestj] # Record and remove best covariate orderin[nin] <- candidates[bestj] candidates <- candidates[-bestj] x <- x[,-bestj,drop=FALSE] # Orthogonalize remaining wrt best covariate y <- y - b.y.x[bestj]*bestx b.x.x <- crossprod(x,bestx) x <- x - matrix(bestx,ncol=1) %*% matrix(b.x.x,nrow=1) } orderin } statmod/R/remlscor.R0000644000176200001440000000604112654044513014104 0ustar liggesusersremlscore <- function(y,X,Z,trace=FALSE,tol=1e-5,maxit=40) # Mean-variance fit by REML scoring # Fit normal(mu,phi) model to y with # mu=X%*%beta and log(phi)=Z%*%gam # # Gordon Smyth # Created 11 Sept 2000. Last modified 30 March 2010. 
{ n <- length(y) p <- dim(X)[2] q <- dim(Z)[2] const <- n*log(2*pi) # initial residuals from unweighted regression fitm <- lm.fit(X,y) if(fitm$qr$rank < p) stop("X is of not of full column rank") Q <- qr.Q(fitm$qr) h <- as.vector(Q^2 %*% array(1, c(p, 1))) d <- fitm$residuals^2 # starting values # use of weights guarantee that regression can be computed even if 1-h = 0 wd <- 1-h zd <- log( d/(1-h) )+1.27 fitd <- lm.wfit(Z,zd,wd) gam <- ifelse(is.na(fitd$coef),0,fitd$coef) g <- fitd$fitted.values phi <- exp(g) wm <- 1/phi fitm <- lm.wfit(X,y,wm) d <- fitm$residuals^2 dev <- sum(d/phi)+sum(log(phi))+const+2*log(prod(abs(diag(fitm$qr$qr)))) # reml scoring iter <- 0 if(trace) cat("Iter =",iter,", Dev =",dev," Gamma",gam,"\n") Q2 <- array(0,c(n,p*(p+1)/2)) repeat { iter <- iter+1 # information matrix and leverages Q <- qr.qy(fitm$qr, diag(1, nrow = n, ncol = p)) j0 <- 0 for(k in 0:(p-1)) { Q2[ ,(j0+1):(j0+p-k)] <- Q[ ,1:(p-k)] * Q[ ,(k+1):p] j0 <- j0+p-k } if(p>1) Q2[ ,(p+1):(p*(p+1)/2)] <- sqrt(2) * Q2[ ,(p+1):(p*(p+1)/2)] h <- drop( Q2[ ,1:p] %*% array(1,c(p,1)) ) Q2Z <- t(Q2) %*% Z ZVZ <- ( t(Z) %*% vecmat(1-2*h,Z) + t(Q2Z) %*% Q2Z )/2 maxinfo <- max(diag(ZVZ)) if(iter==1) { lambda <- abs(mean(diag(ZVZ)))/q I <- diag(q) } # score vector zd <- ( d - (1-h)*phi ) / phi dl <- crossprod(Z,zd)/2 # Levenberg damping gamold <- gam devold <- dev lev <- 0 repeat { lev <- lev+1 # trial step R <- chol(ZVZ + lambda*I) dgam <- backsolve(R,backsolve(R,dl,transpose=TRUE)) gam <- gamold + dgam phi <- as.vector(exp( Z %*% gam )) wm <- 1/phi fitm <- lm.wfit(X,y,wm) d <- fitm$residuals^2 dev <- sum(d/phi)+sum(log(phi))+const+2*log(prod(abs(diag(fitm$qr$qr)))) if(dev < devold - 1e-15) break # exit if too much damping if(lambda/maxinfo > 1e15) { gam <- gamold warning("Too much damping - convergence tolerance not achievable") break } # step not successful so increase damping lambda <- 2*lambda if(trace) cat("Damping increased to",lambda,"\n") } # iteration output if(trace) cat("Iter =",iter,", Dev =",dev," Gamma",gam,"\n") # keep exiting if too much damping if(lambda/maxinfo > 1e15) break # decrease damping if successful at first try if(lev==1) lambda <- lambda/10 # test for convergence if( crossprod(dl,dgam) < tol ) break # test for iteration limit if(iter > maxit) { warning("reml: Max iterations exceeded") break } } # Nominal standard errors cov.gam <- chol2inv(chol(ZVZ)) se.gam <- sqrt(diag(cov.gam)) cov.beta <- chol2inv(qr.R(fitm$qr)) se.beta <- sqrt(diag(cov.beta)) list(beta=fitm$coef,se.beta=se.beta,gamma=gam,se.gam=se.gam,mu=fitm$fitted,phi=phi,deviance=dev,h=h, cov.beta=cov.beta,cov.gam=cov.gam,iter=iter) } statmod/R/permp.R0000644000176200001440000000214312654044513013400 0ustar liggesuserspermp <- function(x,nperm,n1,n2,total.nperm=NULL,method="auto",twosided=TRUE) # Exact permutation p-values # Belinda Phipson and Gordon Smyth # 16 February 2010. Last modified 27 May 2010. 
{ if(any(x<0)) stop("negative x values") if(any(x>nperm)) stop("x cannot exceed nperm") if(is.null(total.nperm)) { total.nperm <- choose((n1+n2),n1) if(n1==n2 & twosided==TRUE) total.nperm <- total.nperm/2 } method <- match.arg(method,c("auto","exact","approximate")) if(method=="auto") method <- ifelse(total.nperm>10000,"approximate","exact") # exact p-value by summation if(method=="exact") { p <- (1:total.nperm)/total.nperm prob <- rep(p,length(x)) x2 <- rep(x,each=total.nperm) Y <- matrix(pbinom(x2,prob=prob,size=nperm),total.nperm,length(x)) x[] <- colSums(Y)/total.nperm } # integral approximation else { z <- gauss.quad.prob(128,l=0,u=0.5/total.nperm) prob <- rep(z$nodes,length(x)) x2 <- rep(x,each=128) Y <- matrix(pbinom(x2,prob=prob,size=nperm),128,length(x)) int <- 0.5/total.nperm*colSums(z$weights*Y) x[] <- (x+1)/(nperm+1)-int } x } statmod/R/fitNBP.R0000644000176200001440000000455412654044513013407 0ustar liggesusers## fitNBP.R fitNBP <- function(y,group=NULL,lib.size=colSums(y),tol=1e-5,maxit=40,verbose=FALSE) # Fit multi-group negative-binomial model to SAGE data # with Pearson estimation of common overdispersion # Gordon Smyth # 8 July 2006. Last modified 13 July 2009. { # Argument checking y <- as.matrix(y) if(is.null(group)) group <- rep(1,ncol(y)) group <- as.factor(group) if(length(group) != ncol(y)) stop("length(group) must agree with ncol(y)") # Derived quantities ngenes <- nrow(y) nlib <- ncol(y) ngroups <- length(levels(group)) res.df <- ncol(y)-ngroups ind <- matrix(FALSE,nlib,ngroups) for (i in 1:ngroups) ind[,i] <- group==levels(group)[i] # Starting values offset <- matrix(1,ngenes,1) %*% log(lib.size) mu <- pmax(y,0.5) phi <- 0 w <- mu z <- w*(log(mu)-offset) beta <- matrix(0,ngenes,ngroups) eta <- offset for (i in 1:ngroups) { beta[,i] <- rowSums(z[,ind[,i],drop=FALSE])/rowSums(w[,ind[,i],drop=FALSE]) eta[,ind[,i]] <- eta[,ind[,i]]+beta[,i] } if(verbose) cat("mean coef",colMeans(beta),"\n") mu <- exp(eta) # Alternating iterations iter <- 0 repeat{ # Update phi iter <- iter+1 if(iter > maxit) { warning("maxit exceeded") break } e2 <- (y-mu)^2 dV <- mu*mu # Need to ensure phi is converging from below inneriter <- 0 repeat { inneriter <- inneriter+1 if(inneriter > 10) stop("problem with inner iteration") V <- mu*(1+phi*mu) X2 <- sum(e2/V)/res.df-ngenes if(X2 >= 0) { low <- phi break } else { if(phi==0) break if(inneriter > 4) phi <- 0.9*phi else phi <- (low+phi)/2 if(verbose) cat("mean disp",phi,"\n") } } if(X2<0) break dX2 <- sum(e2/V/V*dV)/res.df step.phi <- X2/pmax(dX2,1e-6) phi <- phi+step.phi conv.crit <- step.phi/(phi+1) if(verbose) cat("Conv criterion",conv.crit,"\n") if(conv.crit < tol) break # Update mu w <- mu/(1+phi*mu) z <- (y-mu)/V*mu eta <- offset for (i in 1:ngroups) { beta[,i] <- beta[,i]+rowSums(z[,ind[,i],drop=FALSE])/rowSums(w[,ind[,i],drop=FALSE]) eta[,ind[,i]] <- eta[,ind[,i]]+beta[,i] } if(verbose) cat("mean coef",colMeans(beta),"\n") if(verbose) cat("disp",phi,"\n") mu <- exp(eta) } colnames(beta) <- levels(group) dimnames(mu) <- dimnames(y) list(coefficients=beta,fitted.values=mu,dispersion=phi) } statmod/R/hommel.R0000644000176200001440000000105012654044513013532 0ustar liggesusershommel.test <- # Multiple testing from Hommel (1988). # Similar but very slightly more powerful that Hochberg (1988). # Controls Family-Wise Error rate for hypotheses which are independent or # which satisfy the free-association condition of Simes (1986). 
# Gordon Smyth, Walter and Eliza Hall Institute, smyth@wehi.edu.au # 29 Aug 2002 function(p,alpha=0.05) { n <- length(p) i <- 1:n po <- sort(p) j <- n repeat { k <- 1:j if(all( po[n - j + k] > k * alpha / j )) break j <- j-1 if(j == 0) break } p >= alpha/j } statmod/R/elda.R0000644000176200001440000002470512654044513013172 0ustar liggesusers# LIMDIL.R elda <- limdil <- function(response, dose, tested = rep(1, length(response)), group=rep(1,length(response)), observed = FALSE, confidence = 0.95, test.unit.slope = FALSE) # Limiting dilution analysis # Gordon Smyth, Yifang Hu # 21 June 2005. Last revised 18 August 2015. { n <- length(response) if(n==0) stop("No data") if(length(dose) != n) stop("length(dose) doesn't match length(response)") if(length(tested) != n) { if(length(tested)==1) tested <- rep_len(tested,n) else stop("length(tested) doesn't match length(response)") } # Allow for structural zeros SZ <- response==0 & (dose==0 | tested==0) if(any(SZ)) { i <- !SZ out <- Recall(response=response[i],dose=dose[i],tested=tested[i],group=group[i],observed=observed,confidence=confidence,test.unit.slope=test.unit.slope) out$response <- response out$dose <- dose out$tested <- tested return(out) } # Check valid data y <- response/tested if (any(y < 0)) stop("Negative values for response or tested") if (any(y > 1)) stop("The response cannot be greater than the number tested") if (any(dose <= 0)) stop("dose must be positive") size <- 1 - confidence out <- list() f <- binomial(link = "cloglog") f$aic <- quasi()$aic group <- factor(group) num.group <- length(levels(group)) groupLevel <- levels(group) out$response <- response out$tested <- tested out$dose <- dose out$group <- group out$num.group <- num.group class(out) <- "limdil" out$CI <- matrix(nrow=num.group,ncol=3) colnames(out$CI) <- c("Lower","Estimate","Upper") rownames(out$CI) <- paste("Group",levels(group)) # Groupwise frequency estimates deviance0 <- dloglik.logdose <- FisherInfo.logdose <- dloglik.dose <- FisherInfo.dose <- 0 for(i in 1:num.group) { index <- (group == groupLevel[i]) fit0 <- eldaOneGroup(response=response[index],dose=dose[index],tested=tested[index],observed=observed,confidence=confidence,trace=FALSE) deviance0 <- deviance0 + fit0$deviance dloglik.logdose <- dloglik.logdose + fit0$dloglik.logdose FisherInfo.logdose <- FisherInfo.logdose + fit0$FisherInfo.logdose dloglik.dose <- dloglik.dose + fit0$dloglik.dose FisherInfo.dose <- FisherInfo.dose + fit0$FisherInfo.dose out$CI[i,] <- pmax(fit0$CI.frequency,1) } # Test for difference between groups if(num.group>1) { fitequal <- eldaOneGroup(response=response,dose=dose,tested=tested,observed=observed,confidence=confidence,trace=FALSE) dev.g <- pmax(fitequal$deviance - deviance0, 0) group.p <- pchisq(dev.g, df=num.group-1, lower.tail=FALSE) out$test.difference <- c(Chisq=dev.g, P.value=group.p, df=num.group-1) } # Test for unit slope if(test.unit.slope) { if(is.na(FisherInfo.logdose)) FisherInfo.logdose <- 0 if(FisherInfo.logdose > 1e-15) { # Wald test if(num.group>1) fit.slope <- suppressWarnings(glm(y~group+log(dose), family=f, weights=tested)) else fit.slope <- suppressWarnings(glm(y~log(dose), family=f, weights=tested)) s.slope <- summary(fit.slope) est.slope <- s.slope$coef["log(dose)","Estimate"] se.slope <- s.slope$coef["log(dose)", "Std. Error"] z.wald <- (est.slope-1)/se.slope p.wald <- 2*pnorm(-abs(z.wald)) out$test.slope.wald <- c("Estimate"=est.slope, "Std. 
Error"=se.slope, "z value"=z.wald, "Pr(>|z|)"=p.wald) # Likelihood ratio test dev <- pmax(deviance0 - fit.slope$deviance,0) z.lr <- sqrt(dev)*sign(z.wald) p.lr <- pchisq(dev, df = 1, lower.tail = FALSE) out$test.slope.lr <- c("Estimate"=NA, "Std. Error"=NA, "z value"=z.lr, "Pr(>|z|)"=p.lr) # Score tests for log(dose) and dose z.score.logdose <- dloglik.logdose / sqrt(FisherInfo.logdose) p.score.logdose <- 2*pnorm(-abs(z.score.logdose)) z.score.dose <- dloglik.dose / sqrt(FisherInfo.dose) p.score.dose <- 2*pnorm(-abs(z.score.dose)) out$test.slope.score.logdose <- c("Estimate"= NA, "Std. Error"=NA, "z value"=z.score.logdose,"Pr(>|z|)"=p.score.logdose) out$test.slope.score.dose <- c("Estimate"= NA, "Std. Error"=NA, "z value"=z.score.dose,"Pr(>|z|)"=p.score.dose) } else { out$test.slope.wald <- out$test.slope.lr <- out$test.slope.score.logdose <- out$test.slope.score.dose <- c("Estimate"=NA, "Std. Error"=NA, "z value"=NA, "Pr(>|z|)"=1) } } out } print.limdil <- function(x, ...) # Print method for limdil objects # Yifang Hu and Gordon Smyth # 20 February 2009. Last revised 31 January 2013. { cat("Confidence intervals for frequency:\n\n") print(x$CI) if(!is.null(x$test.difference)) { difference <- x$test.difference cat("\nDifferences between groups:\n") cat("Chisq",difference[1], "on", difference[3], "DF, p-value:", format.pval(difference[2],4), "\n") } if(!is.null(x$test.slope.wald)) { a <- rbind(x$test.slope.wald, x$test.slope.lr, x$test.slope.score.logdose, x$test.slope.score.dose) a <- data.frame(a, check.names=FALSE) rownames(a) <- c("Wald test", "LR test", "Score test: log(Dose)", "Score test: Dose") cat("\nGoodness of fit (test log-Dose slope equals 1):\n") suppressWarnings(printCoefmat(a,tst.ind=1,has.Pvalue=TRUE,P.values=TRUE)) } } plot.limdil <- function(x, col.group=NULL, cex=1, lwd=1, legend.pos="bottomleft", ...) # Plot method for limdil objects # Yifang Hu and Gordon Smyth # 20 February 2009. Last revised 6 February 2013. { x$group <- factor(x$group) num.group <- nlevels(x$group) if(is.null(col.group)) col.group <- 1:num.group else col.group <- rep(col.group,num.group) col <- x$group levels(col) <- col.group col <- as.character(col) dose <- x$dose maxx <- max(dose) i <- x$response==x$tested x$response[i] <- x$response[i]-0.5 nonres <- log(1-x$response/x$tested) if(num.group>1 && any(i)) nonres <- pmin(0,jitter(nonres)) miny <- min(nonres) plot(x=1,y=1,xlim=c(0,maxx),ylim=c(min(miny,-0.5),0),xlab="dose (number of cells)",ylab="log fraction nonresponding",type="n",...) points(dose[!i],nonres[!i],pch=1,col=col[!i],cex=cex) points(dose[i],nonres[i],pch=6,col=col[i],cex=cex) for(g in 1:num.group) { abline(a=0,b=-1/x$CI[g,2],col=col.group[g],lty=1,lwd=lwd) abline(a=0,b=-1/x$CI[g,1],col=col.group[g],lty=2,lwd=lwd) abline(a=0,b=-1/x$CI[g,3],col=col.group[g],lty=2,lwd=lwd) } if(num.group>1) legend(legend.pos,legend=paste("Group",levels(x$group)),text.col=col.group,cex=0.6*cex) invisible(list(x=dose,y=nonres,group=x$group)) } .limdil.allpos <- function(tested, dose, confidence, observed) # One-sided confidence interval when all assays are positive # Uses globally convergent Newton iteration # Yifang Hu. # Created 18 March 2009. Last modified 18 Dec 2012. 
{ alpha <- 1 - confidence dosem <- min(dose) tested.group <- tested tested.sum <- sum(tested.group[dose == dosem]) beta <- log(-log(1 - alpha^(1/tested.sum))) - log(dosem) # Starting value lambda <- exp(beta) if(observed) lambda <- -expm1(lambda) # Newton-iteration repeat { if(observed) f <- sum(tested*log(1-(1-lambda)^dose))-log(alpha) else f <- sum(tested*log(1-exp(-lambda*dose)))-log(alpha) if(observed) deriv <- sum(tested*(-dose)*(1-lambda)^(dose-1)/(1-(1-lambda)^dose)) else deriv <- sum(tested*dose*exp(-dose*lambda)/(1-exp(-dose*lambda))) step <- f/deriv lambda <- lambda-step if(-step < 1e-6) break } lambda } eldaOneGroup <- function(response,dose,tested,observed=FALSE,confidence=0.95,tol=1e-8,maxit=100,trace=FALSE) # Estimate active cell frequency from LDA data # using globally convergent Newton iteration # Gordon Smyth # 5 Dec 2012. Last modified 30 Jan 2013. { y <- response n <- tested d <- dose phat <- y/n size <- 1-confidence # Special case of all negative responses if(all(y < 1e-14)) { N <- sum(dose*tested) if (observed) U <- 1 - size^(1/N) else U <- -log(size)/N out <- list() out$CI.frequency <- c(Lower = Inf, Estimate = Inf, Upper = 1/U) out$deviance <- out$dloglik.logdose <- out$FisherInfo.logdose <- out$dloglik.dose <- out$FisherInfo.dose <- 0 return(out) } # Special case of all positive responses if(all(phat > 1-1e-14)) { U <- .limdil.allpos(tested=tested,dose=dose,confidence=confidence,observed=observed) out <- list() out$CI.frequency <- c(Lower = 1/U, Estimate = 1, Upper = 1) out$deviance <- out$dloglik.logdose <- out$FisherInfo.logdose <- out$dloglik.dose <- out$FisherInfo.dose <- 0 return(out) } # Starting value guaranteed to be left of the solution pmean <- mean(y)/mean(n) lambda <- -log1p(-pmean) / max(d) if(trace) cat(0,lambda,1/lambda,"\n") # Globally convergent Newton iteration iter <- 0 repeat{ iter <- iter+1 if(iter > maxit) { warning("max iterations exceeded") break } p <- -expm1(-lambda*d) onemp <- exp(-lambda*d) # First derivative dloglik.lambda <- mean(n*d*(phat-p)/p) # Second derivative d2loglik.lambda <- -mean(n*phat*d*d*onemp/p/p) # Newton step step <- dloglik.lambda / d2loglik.lambda lambda <- lambda - step if(trace) cat(iter,lambda,1/lambda,step,"\n") if(abs(step) < tol) break } # Wald confidence interval for alpha alpha <- log(lambda) p <- -expm1(-lambda*d) onemp <- exp(-lambda*d) FisherInfo.alpha <- sum(n*d*d*onemp/p)*lambda^2 SE.alpha <- 1/sqrt(FisherInfo.alpha) z <- qnorm( (1-confidence)/2, lower.tail=FALSE ) CI.alpha <- c(Lower=alpha-z*SE.alpha,Estimate=alpha,Upper=alpha+z*SE.alpha) # Wald confidence interval for frequency if(observed) CI.frequency <- -1/expm1(-exp(CI.alpha)) else CI.frequency <- exp(-CI.alpha) # Deviance f <- binomial(link="cloglog") deviance <- sum(f$dev.resid(phat,p,n)) # Score test for log(dose) unit slope v <- p*onemp/n x <- log(d) eta <- alpha+x mu.eta <- f$mu.eta(eta) info.alpha <- mu.eta^2/v xmean <- sum(x*info.alpha)/sum(info.alpha) mu.beta <- (x-xmean)*mu.eta dloglik.beta <- sum(mu.beta*(phat-p)/v) FisherInfo.beta <- sum(mu.beta^2/v) z.scoretest <- dloglik.beta/sqrt(FisherInfo.beta) # Score test for dose x <- d xmean <- sum(x*info.alpha)/sum(info.alpha) mu.beta <- (x-xmean)*mu.eta dloglik.beta.dose <- sum(mu.beta*(phat-p)/v) FisherInfo.beta.dose <- sum(mu.beta^2/v) z.scoretest.dose <- dloglik.beta.dose/sqrt(FisherInfo.beta.dose) 
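# Return the frequency estimate and confidence interval together with the deviance and the score test components for the log(dose) and dose slope tests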
list(p=p,lambda=lambda,alpha=alpha,CI.alpha=CI.alpha,CI.frequency=CI.frequency,deviance=deviance,iter=iter,z.scoretest=z.scoretest,z.scoretest.dose=z.scoretest.dose,dloglik.logdose=dloglik.beta,FisherInfo.logdose=FisherInfo.beta,dloglik.dose=dloglik.beta.dose,FisherInfo.dose=FisherInfo.beta.dose) } statmod/R/glmscoretest.R0000644000176200001440000000135312654044513014772 0ustar liggesusers## glmscore.R glm.scoretest <- function(fit, x2, dispersion=NULL) # Score test for new covariate in glm # Gordon Smyth # 27 March 2009. Last modified 20 Mar 2010. { w <- fit$weights r <- fit$residuals if(any(w <= 0)) { r <- r[w>0] x2 <- x2[w>0] w <- w[w>0] } if (is.null(dispersion)) { fixed.dispersion <- (fit$family$family %in% c("poisson","binomial")) if(fixed.dispersion) dispersion <- 1 else if(fit$df.residual > 0) { dispersion <- sum(w*r^2)/fit$df.residual } else { stop("No residual df available to estimate dispersion") } } ws <- sqrt(w) x2.1w <- qr.resid(fit$qr,ws*x2) zw <- ws*r colSums(as.matrix(x2.1w*zw))/sqrt(colSums(as.matrix(x2.1w * x2.1w)))/sqrt(dispersion) } statmod/R/qres.R0000644000176200001440000000747512654044513013244 0ustar liggesusers## QRES.R qresiduals <- qresid <- function(glm.obj, dispersion=NULL) # Wrapper function for quantile residuals # Peter K Dunn # 28 Sep 2004. Last modified 5 Oct 2004. { glm.family <- glm.obj$family$family if(substr(glm.family,1,17)=="Negative Binomial") glm.family <- "nbinom" switch(glm.family, binomial = qres.binom( glm.obj), poisson = qres.pois(glm.obj), Gamma = qres.gamma(glm.obj, dispersion), inverse.gaussian = qres.invgauss(glm.obj, dispersion), Tweedie = qres.tweedie(glm.obj, dispersion), nbinom = qres.nbinom(glm.obj), qres.default(glm.obj, dispersion) )} qres.binom <- function(glm.obj) # Randomized quantile residuals for binomial glm # Gordon Smyth # 20 Oct 96. Last modified 25 Jan 02. { p <- fitted(glm.obj) y <- glm.obj$y if(!is.null(glm.obj$prior.weights)) n <- glm.obj$prior.weights else n <- rep(1,length(y)) y <- n * y a <- pbinom(y - 1, n, p) b <- pbinom(y, n, p) u <- runif(n = length(y), min = a, max = b) qnorm(u) } qres.pois <- function(glm.obj) # Quantile residuals for Poisson glm # Gordon Smyth # 28 Dec 96 { y <- glm.obj$y mu <- fitted(glm.obj) a <- ppois(y - 1, mu) b <- ppois(y, mu) u <- runif(n = length(y), min = a, max = b) qnorm(u) } qres.gamma <- function(glm.obj, dispersion = NULL) # Quantile residuals for gamma glm # Gordon Smyth # 28 Dec 96. Last modified 10 Jan 97 { mu <- fitted(glm.obj) y <- glm.obj$y df <- glm.obj$df.residual w <- glm.obj$prior.weights if(is.null(w)) w <- 1 if(is.null(dispersion)) dispersion <- sum(w * ((y - mu)/mu)^2)/df u <- pgamma((w * y)/mu/dispersion, w/dispersion) qnorm(u) } qres.invgauss <- function(glm.obj, dispersion = NULL) # Quantile residuals for inverse Gaussian glm # Gordon Smyth # Created 15 Jan 98. Last modified 31 May 2014. { mu <- fitted(glm.obj) y <- glm.obj$y df <- glm.obj$df.residual w <- glm.obj$prior.weights if(is.null(w)) w <- 1 if(is.null(dispersion)) dispersion <- sum(w * (y - mu)^2 / (mu^2*y)) / df up <- y>mu down <- y 0, pbeta(p, size, pmax(y, 1)), 0) b <- pbeta(p, size, y + 1) u <- runif(n = length(y), min = a, max = b) qnorm(u) } qres.tweedie <- function(glm.obj, dispersion = NULL) # Quantile residuals for Tweedie glms # Gordon Smyth # Created 29 April 1998. Last modified 30 March 2015. 
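# Requires the tweedie package for ptweedie; when 1<p<2, quantile residuals for exact zeros are randomized between 0 and the fitted probability of a zero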
{ requireNamespace("tweedie") mu <- fitted(glm.obj) y <- glm.obj$y df <- glm.obj$df.residual w <- glm.obj$prior.weights if(is.null(w)) w <- 1 p <- get("p",envir=environment(glm.obj$family$variance)) if(is.null(dispersion)) dispersion <- sum((w * (y - mu)^2)/mu^p)/df u <- tweedie::ptweedie(q=y, power=p, mu=fitted(glm.obj), phi=dispersion/w) if(p>1&&p<2) u[y == 0] <- runif(sum(y == 0), min = 0, max = u[y == 0]) qnorm(u) } qres.default <- function(glm.obj, dispersion=NULL) # Quantile residuals for Gaussian and default glms # Gordon Smyth # 5 Oct 2004. { r <- residuals(glm.obj, type="deviance") if(is.null(dispersion)) { df.r <- glm.obj$df.residual if(df.r > 0) { if(any(glm.obj$weights==0)) warning("observations with zero weight ", "not used for calculating dispersion") dispersion <- sum(glm.obj$weights*glm.obj$residuals^2)/df.r } else dispersion <- 1 } r/sqrt(dispersion) } statmod/R/glmgam.R0000644000176200001440000000660012654044513013523 0ustar liggesusers# GLMGAM.R glmgam.fit <- function(X,y,coef.start=NULL,tol=1e-6,maxit=50,trace=FALSE) { # Fit gamma generalized linear model with identity link # by Levenberg damped Fisher scoring # Gordon Smyth # 12 Mar 2003. Last revised 3 November 2010. # check input X <- as.matrix(X) n <- nrow(X) p <- ncol(X) if(p > n) stop("More columns than rows in X") y <- as.vector(y) if(n != length(y)) stop("length(y) not equal to nrow(X)") if(n == 0) return(list(coefficients=numeric(0),fitted.values=numeric(0),deviance=numeric(0))) if(!(all(is.finite(y)) || all(is.finite(X)))) stop("All values must be finite and non-missing") if(any(y < 0)) stop("y must be non-negative") maxy <- max(y) if(maxy==0) return(list(coefficients=rep(0,p),fitted.values=rep(0,n),deviance=NA)) y1 <- pmax(y,maxy*1e-3) # starting values if(is.null(coef.start)) { fit <- lm.fit(X,y) beta <- fit$coefficients mu <- fit$fitted.values if(any(mu < 0)) { fit <- lm.wfit(X,y,1/y1^2) beta <- fit$coefficients mu <- fit$fitted.values } if(any(mu < 0)) { fit <- lm.fit(X,rep(mean(y),n)) beta <- fit$coefficients mu <- fit$fitted.values } if(any(mu < 0)) { samesign <- apply(X>0,2,all) | apply(X<0,2,all) if(any(samesign)) { i <- (1:p)[samesign][1] beta <- rep(0,p) beta[i] <- lm.wfit(X[,i,drop=FALSE],y,1/y1^2)$coef mu <- X[,i] * beta[i] } else return(list(coefficients=rep(0,p),fitted.values=rep(0,n),deviance=Inf)) } } else { beta <- coef.start mu <- X %*% beta } if(any(mu<0)) stop("Starting values give negative fitted values") deviance.gamma <- function(y,mu) { if(any(mu<0)) return(Inf) o <- (y < 1e-15) & (mu < 1e-15) if(any(o)) { if(all(o)) { dev <- 0 } else { y1 <- y[!o] mu1 <- mu[!o] dev <- 2*sum( (y1-mu1)/mu1 - log(y1/mu1) ) } } else { dev <- 2*sum( (y-mu)/mu - log(y/mu) ) } } dev <- deviance.gamma(y,mu) # Scoring iteration with Levenberg damping iter <- 0 if(trace) cat("Iter =",iter,", Dev =",dev," Beta",beta,"\n") repeat { iter <- iter+1 # information matrix v <- mu^2 v <- pmax(v,max(v)/10^3) XVX <- crossprod(X,vecmat(1/v,X)) maxinfo <- max(diag(XVX)) if(iter==1) { lambda <- abs(mean(diag(XVX)))/p I <- diag(p) } # score vector dl <- crossprod(X,(y-mu)/v) # Levenberg damping betaold <- beta devold <- dev lev <- 0 repeat { lev <- lev+1 # trial step R <- chol(XVX + lambda*I) dbeta <- backsolve(R,backsolve(R,dl,transpose=TRUE)) beta <- betaold + dbeta mu <- X %*% beta dev <- deviance.gamma(y,mu) if(dev <= devold || dev/max(mu) < 1e-15) break # exit if too much damping if(lambda/maxinfo > 1e15) { beta <- betaold warning("Too much damping - convergence tolerance not achievable") break } # step not successful so 
increase damping lambda <- 2*lambda if(trace) cat("Damping increased to",lambda,"\n") } # iteration output if(trace) cat("Iter =",iter,", Dev =",dev," Beta",beta,"\n") # keep exiting if too much damping if(lambda/maxinfo > 1e15) break # decrease damping if successful at first try if(lev==1) lambda <- lambda/10 # test for convergence if( crossprod(dl,dbeta) < tol || dev/max(mu) < 1e-15) break # test for iteration limit if(iter > maxit) break } beta <- drop(beta) names(beta) <- colnames(X) list(coefficients=beta,fitted.values=as.vector(mu),deviance=dev,iter=iter) } statmod/R/gaussquad.R0000644000176200001440000000715212654044513014257 0ustar liggesusers# NUMERICAL INTEGRATION gauss.quad <- function(n,kind="legendre",alpha=0,beta=0) # Calculate nodes and weights for Gaussian quadrature. # Adapted from Netlib routine gaussq.f # Gordon Smyth, Walter and Eliza Hall Institute # Suggestion from Stephane Laurent 6 Aug 2012 # 4 Sept 2002. Last modified 7 Aug 2012. { n <- as.integer(n) if(n<0) stop("need non-negative number of nodes") if(n==0) return(list(nodes=numeric(0), weights=numeric(0))) kind <- match.arg(kind,c("legendre","chebyshev1","chebyshev2","hermite","jacobi","laguerre")) i <- 1:n i1 <- i[-n] # 1:(n-1) switch(kind, legendre={ lnmuzero <- log(2) a <- rep(0,n) b <- i1/sqrt(4*i1^2-1) }, chebyshev1={ lnmuzero <- log(pi) a <- rep(0,n) b <- rep(0.5,n-1) b[1] <- sqrt(0.5) }, chebyshev2={ lnmuzero <- log(pi/2) a <- rep(0,n) b <- rep(0.5,n-1) }, hermite={ lnmuzero <- log(pi)/2 a <- rep(0,n) b <- sqrt(i1/2) }, jacobi={ ab <- alpha+beta # muzero <- 2^(ab+1) * gamma(alpha+1) * gamma(beta+1) / gamma(ab+2) lnmuzero <- (ab+1)*log(2) + lgamma(alpha+1) + lgamma(beta+1) - lgamma(ab+2) a <- i a[1] <- (beta-alpha)/(ab+2) i2 <- 2:n abi <- ab+2*i2 a[i2] <- (beta^2-alpha^2)/(abi-2)/abi b <- i1 b[1] <- sqrt(4*(alpha+1)*(beta+1)/(ab+2)^2/(ab+3)) i2 <- i1[-1] # 2:(n-1) abi <- ab+2*i2 b[i2] <- sqrt(4*i2*(i2+alpha)*(i2+beta)*(i2+ab)/(abi^2-1)/abi^2) }, laguerre={ a <- 2*i-1+alpha b <- sqrt(i1*(i1+alpha)) lnmuzero <- lgamma(alpha+1) }) b <- c(b,0) z <- rep.int(0,n) z[1] <- 1 ierr <- 0L out <- .Fortran("gausq2",n,as.double(a),as.double(b),as.double(z),ierr,PACKAGE="statmod") x <- out[[2]] w <- out[[4]] w <- exp(lnmuzero + 2*log(abs(w))) list(nodes=x,weights=w) } gauss.quad.prob <- function(n,dist="uniform",l=0,u=1,mu=0,sigma=1,alpha=1,beta=1) # Calculate nodes and weights for Guassian quadrature using probability densities. # Adapted from Netlib routine gaussq.f # Gordon Smyth, Walter and Eliza Hall Institute # Corrections for n=1 and n=2 by Spencer Graves, 28 Dec 2005 # 4 Sept 2002. Last modified 7 Aug 2012. 
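# The three-term recurrence coefficients for the requested distribution are set up below and passed to the Fortran routine gausq2, which computes the nodes and weights from the corresponding symmetric tridiagonal eigenvalue problem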
{ n <- as.integer(n) if(n<0) stop("need non-negative number of nodes") if(n==0) return(list(nodes=numeric(0), weights=numeric(0))) dist <- match.arg(dist,c("uniform","beta1","beta2","normal","beta","gamma")) if(n==1){ switch(dist, uniform={x <- (l+u)/2}, beta1=,beta2=,beta={x <- alpha/(alpha+beta)}, normal={x <- mu}, gamma={x <- alpha*beta} ) return(list(nodes=x, weights=1)) } if(dist=="beta" && alpha==0.5 && beta==0.5) dist <- "beta1" if(dist=="beta" && alpha==1.5 && beta==1.5) dist <- "beta2" i <- 1:n i1 <- 1:(n-1) switch(dist, uniform={ a <- rep(0,n) b <- i1/sqrt(4*i1^2-1) }, beta1={ a <- rep(0,n) b <- rep(0.5,n-1) b[1] <- sqrt(0.5) }, beta2={ a <- rep(0,n) b <- rep(0.5,n-1) }, normal={ a <- rep(0,n) b <- sqrt(i1/2) }, beta={ ab <- alpha+beta a <- i a[1] <- (alpha-beta)/ab i2 <- 2:n abi <- ab-2+2*i2 a[i2] <- ((alpha-1)^2-(beta-1)^2)/(abi-2)/abi b <- i1 b[1] <- sqrt(4*alpha*beta/ab^2/(ab+1)) i2 <- i1[-1] # 2:(n-1) abi <- ab-2+2*i2 b[i2] <- sqrt(4*i2*(i2+alpha-1)*(i2+beta-1)*(i2+ab-2)/(abi^2-1)/abi^2) }, gamma={ a <- 2*i+alpha-2 b <- sqrt(i1*(i1+alpha-1)) }) b <- c(b,0) z <- rep.int(0,n) z[1] <- 1 ierr <- 0L out <- .Fortran("gausq2",n,as.double(a),as.double(b),as.double(z),ierr,PACKAGE="statmod") x <- out[[2]] w <- out[[4]]^2 switch(dist, uniform = x <- l+(u-l)*(x+1)/2, beta1=,beta2=,beta = x <- (x+1)/2, normal = x <- mu + sqrt(2)*sigma*x, gamma = x <- beta*x) list(nodes=x,weights=w) } statmod/R/growthcurve.R0000644000176200001440000000656512654044513014650 0ustar liggesusersmeanT <- function(y1,y2) { # Mean t-statistic difference between two groups of growth curves # Columns are time points, rows are individuals # Gordon Smyth # 14 Feb 2003 if(is.null(dim(y1)) || is.null(dim(y2))) return(NA) y1 <- as.matrix(y1) y2 <- as.matrix(y2) if(ncol(y1) != ncol(y2)) stop("Number of time points must match") m1 <- colMeans(y1,na.rm=TRUE) m2 <- colMeans(y2,na.rm=TRUE) v1 <- apply(y1,2,var,na.rm=TRUE) v2 <- apply(y2,2,var,na.rm=TRUE) n1 <- apply(!is.na(y1),2,sum) n2 <- apply(!is.na(y2),2,sum) s <- ( (n1-1)*v1 + (n2-1)*v2 ) / (n1+n2-2) t.stat <- (m1-m2) / sqrt(s*(1/n1+1/n2)) weighted.mean(t.stat,w=(n1+n2-2)/(n1+n2),na.rm=TRUE) } compareTwoGrowthCurves <- function(group,y,nsim=100,fun=meanT) { # Permutation test between two groups of growth curves # Columns are time points, rows are individuals # Gordon Smyth # 14 Feb 2003 group <- as.vector(group) g <- unique(group) if(length(g) != 2) stop("Must be exactly 2 groups") stat.obs <- fun(y[group==g[1],,drop=FALSE], y[group==g[2],,drop=FALSE]) asbig <- 0 for (i in 1:nsim) { pgroup <- sample(group) stat <- fun(y[pgroup==g[1],,drop=FALSE], y[pgroup==g[2],,drop=FALSE]) if(abs(stat) >= abs(stat.obs)) asbig <- asbig+1 } list(stat=stat.obs, p.value=asbig/nsim) } compareGrowthCurves <- function(group,y,levels=NULL,nsim=100,fun=meanT,times=NULL,verbose=TRUE,adjust="holm") { # All pairwise permutation tests between groups of growth curves # Columns of y are time points, rows are individuals # Gordon Smyth # 14 Feb 2003. Last modified 17 Nov 2003. 
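# Each pair of groups is compared with compareTwoGrowthCurves and the resulting p-values are adjusted for multiple testing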
group <- as.character(group) if(is.null(levels)) { tab <- table(group) tab <- tab[tab >= 2] lev <- names(tab) } else lev <- as.character(levels) nlev <- length(lev) if(nlev < 2) stop("Less than 2 groups to compare") if(is.null(dim(y))) stop("y must be matrix-like") y <- as.matrix(y) if(!is.null(times)) y <- y[,times,drop=FALSE] g1 <- g2 <- rep("",nlev*(nlev-1)/2) stat <- pvalue <- rep(0,nlev*(nlev-1)/2) pair <- 0 for (i in 1:(nlev-1)) { for (j in (i+1):nlev) { if(verbose) cat(lev[i],lev[j]) pair <- pair+1 sel <- group %in% c(lev[i],lev[j]) out <- compareTwoGrowthCurves(group[sel],y[sel,,drop=FALSE],nsim=nsim,fun=fun) if(verbose) cat("\ ",round(out$stat,2),"\n") g1[pair] <- lev[i] g2[pair] <- lev[j] stat[pair] <- out$stat pvalue[pair] <- out$p.value } } tab <- data.frame(Group1=g1,Group2=g2,Stat=stat,P.Value=pvalue) tab$adj.P.Value <- p.adjust(pvalue,method=adjust) tab } plotGrowthCurves <- function(group,y,levels=sort(unique(group)),times=NULL,col=NULL,...) { # Plot growth curves with colors for groups # Columns of y are time points, rows are individuals # Gordon Smyth # 30 May 2006. Last modified 8 July 2006. group <- as.character(group) if(!is.null(levels)) levels <- as.character(levels) nlev <- length(levels) if(nlev < 2) stop("Less than 2 groups to compare") if(is.null(dim(y))) stop("y must be matrix-like") y <- as.matrix(y) if(!is.null(times)) y <- y[,times,drop=FALSE] if(is.null(col)) col <- 1:nlev group.col <- col[match(group,levels)] plot(col(y),y,type="n",xlab="Time",ylab="Response",...) x <- 1:ncol(y) for (i in 1:nrow(y)) { lines(x,y[i,],col=group.col[i]) } yr <- range(y,na.rm=TRUE) legend(1,yr[2]-diff(yr)/40,legend=levels,col=col,lty=1) invisible() } statmod/R/remlscorgamma.R0000644000176200001440000000634112654044513015112 0ustar liggesusersremlscoregamma <- function(y,X,Z,mlink="log",dlink="log",trace=FALSE,tol=1e-5,maxit=40) { # # Mean-dispersion fit by REML scoring for gamma responses # Fit ED(mu,phi) model to y with # g(mu)=X%*%beta and f(phi)=Z%*%gam # # Gordon Smyth, Walter and Eliza Hall Institute # 16 Dec 2002. 
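# The fit alternates between glm.fit for the mean coefficients beta, holding the dispersions phi fixed, and REML scoring updates for the dispersion coefficients gam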
n <- length(y) X <- as.matrix(X) if(is.null(colnames(X))) colnames(X) <- paste("X",as.character(1:ncol(X)),sep="") Z <- as.matrix(Z) if(is.null(colnames(Z))) colnames(Z) <- paste("Z",as.character(1:ncol(Z)),sep="") q <- dim(Z)[2] const <- 2*sum(log(y)) # Link functions mli <- make.link(mlink) dli <- make.link(dlink) # Mean family f <- Gamma() f$linkfun <- mli$linkfun f$linkinv <- mli$linkinv f$mu.eta <- mli$mu.eta f$valideta <- mli$valideta # initial residuals and leverages assuming constant dispersion fitm <- glm.fit(X,y,family=f) mu <- fitted(fitm) d <- 2*( (y-mu)/mu - log(y/mu) ) p <- fitm$rank # start from constant dispersion phi <- -1/canonic.digamma(mean(d))*n/(n-p) phi <- rep(phi,n) fitd <- lm.fit(Z,dli$linkfun(phi)) gam <- ifelse(is.na(fitd$coef),0,fitd$coef) if( mean(abs(fitd$residuals))/phi[1] > 1e-12 ) { # intercept is not in span of Z phi <- drop(dli$linkinv( Z %*% gam )) fitm <- glm.fit(X,y,weights=1/phi,mustart=mu,family=f) mu <- fitted(fitm) d <- 2*( (y-mu)/mu - log(y/mu) ) } else fitm <- glm.fit(X,y,weights=1/phi,mustart=mu,family=f) dev <- const+sum(2*(lgamma(1/phi)+(1+log(phi))/phi)+d/phi)+const+2*log(prod(abs(diag(fitm$qr$qr)[1:p]))) # reml scoring iter <- 0 if(trace) cat("Iter =",iter,", Dev =",format(dev,digits=13)," Gamma",gam,"\n") Q2 <- array(0,c(n,p*(p+1)/2)) repeat { iter <- iter+1 # gradient matrix eta <- dli$linkfun(phi) phidot <- dli$mu.eta(eta) * Z Z2 <- phidot / phi / sqrt(2) # information matrix and leverages Q <- qr.qy(fitm$qr, diag(1, nrow = n, ncol = p)) j0 <- 0 for(k in 0:(p-1)) { Q2[ ,(j0+1):(j0+p-k)] <- Q[ ,1:(p-k)] * Q[ ,(k+1):p] j0 <- j0+p-k } if(p>1) Q2[ ,(p+1):(p*(p+1)/2)] <- sqrt(2) * Q2[ ,(p+1):(p*(p+1)/2)] h <- drop( Q2[ ,1:p] %*% array(1,c(p,1)) ) Q2Z <- crossprod(Q2,Z2) extradisp <- 2*( trigamma(1/phi) - trigamma(1/phi/h)/h )/phi^2 - (1-h) info <- crossprod(Z2,(extradisp+1-2*h)*Z2) + crossprod(Q2Z) # score vector deltah <- 2*(digamma(1/h/phi)+log(h)-digamma(1/phi)) dl <- crossprod(phidot, (d - deltah)/(2*phi^2)) # scoring step R <- chol(info) dgam <- backsolve(R,backsolve(R,dl,transpose=TRUE)) gam <- gam + dgam # evaluate modified profile likelihood phi <- drop(dli$linkinv( Z %*% gam )) fitm <- glm.fit(X,y,weights=1/phi,mustart=mu,family=f) mu <- fitted(fitm) d <- 2*( (y-mu)/mu - log(y/mu) ) dev <- const+sum(2*(lgamma(1/phi)+(1+log(phi))/phi)+d/phi)+const+2*log(prod(abs(diag(fitm$qr$qr)[1:p]))) # iteration output if(trace) cat("Iter =",iter,", Dev =",format(dev,digits=13)," Gamma",gam,"\n") # test for convergence if( crossprod(dl,dgam) < tol ) break # test for iteration limit if(iter > maxit) { warning("Max iterations exceeded") break } } # Standard errors se.gam <- sqrt(diag(chol2inv(chol(info)))) se.beta <- sqrt(diag(chol2inv(qr.R(fitm$qr)))) list(beta=fitm$coef,se.beta=se.beta,gamma=gam,se.gam=se.gam,mu=mu,phi=phi,deviance=dev,h=h) } statmod/R/sagetest.R0000644000176200001440000000234312654044513014076 0ustar liggesusers# SAGE.R sage.test <- function(x, y, n1=sum(x), n2=sum(y)) # Exact binomial probabilities for comparing SAGE libraries # Gordon Smyth # 15 Nov 2003. Last modified 20 July 2012. 
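# When the library sizes are equal, the two-sided p-value doubles the smaller binomial tail; otherwise probabilities of outcomes no more likely than that observed are summed, with a chisquare approximation for very large counts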
{ if(any(is.na(x)) || any(is.na(y))) stop("missing values not allowed") x <- round(x) y <- round(y) if(any(x<0) || any(y<0)) stop("x and y must be non-negative") if(length(x) != length(y)) stop("x and y must have same length") n1 <- round(n1) n2 <- round(n2) if(!missing(n1) && any(x>n1)) stop("x cannot be greater than n1") if(!missing(n2) && any(y>n2)) stop("y cannot be greater than n2") size <- x+y p.value <- rep(1,length(x)) if(n1==n2) { i <- (size>0) if(any(i)) { x <- pmin(x[i],y[i]) size <- size[i] p.value[i] <- pmin(2*pbinom(x,size=size,prob=0.5),1) } return(p.value) } prob <- n1/(n1+n2) if(any(big <- size>10000)) { ibig <- (1:length(x))[big] for (i in ibig) p.value[i] <- chisq.test(matrix(c(x[i],y[i],n1-x[i],n2-y[i]),2,2))$p.value } size0 <- size[size>0 & !big] if(length(size0)) for (isize in unique(size0)) { i <- (size==isize) p <- dbinom(0:isize,prob=prob,size=isize) o <- order(p) cumsump <- cumsum(p[o])[order(o)] p.value[i] <- cumsump[x[i]+1] } p.value } statmod/R/digammaf.R0000644000176200001440000000467112654044513014032 0ustar liggesusersDigamma <- function(link = "log") { # Digamma generalized linear model family # Gordon Smyth, smyth@wehi.edu.au # 3 July 1998. Last revised 9 Dec 2002. # # improve on the link deparsing code in quasi() linkarg <- substitute(link) if (is.expression(linkarg) || is.call(linkarg)) { linkname <- deparse(linkarg) } else if(is.character(linkarg)) { linkname <- linkarg link <- make.link(linkarg) } else if(is.numeric(linkarg)) { linkname <- paste("power(",linkarg,")",sep="") link <- make.link(linkarg) } else { linkname <- deparse(linkarg) link <- make.link(linkname) } validmu <- function(mu) all(mu>0) dev.resids <- function(y, mu, wt) wt * unitdeviance.digamma(y,mu) initialize <- expression({ if (any(y <= 0)) stop(paste("Non-positive values not", "allowed for the Digamma family")) n <- rep(1, nobs) mustart <- y }) aic <- function(y, n, mu, wt, dev) NA structure(list( family = "Digamma", variance = varfun.digamma, dev.resids = dev.resids, aic = aic, link = linkname, linkfun = link$linkfun, linkinv = link$linkinv, mu.eta = link$mu.eta, valideta = link$valideta, validmu = validmu, initialize = initialize, class = "family")) } cumulant.digamma <- function(theta) # Cumulant function for the Digamma family # GKS 3 July 98 2*( theta*(log(-theta)-1) + lgamma(-theta) ) meanval.digamma <- function(theta) # Mean value function for the Digamma family # GKS 3 July 98 2*( log(-theta) - digamma(-theta) ) d2cumulant.digamma <- function(theta) # 2nd derivative of cumulant function for Digamma family # GKS 3 July 98 2*( 1/theta + trigamma(-theta) ) canonic.digamma <- function(mu) { # Canonical mapping for Digamma family # Solve meanval.digamma(theta) = mu for theta # GKS 3 July 98 # # Starting value from -log(-theta) =~ log(mu) mlmt <- log(mu) theta <- -exp(-mlmt) for (i in 1:3) { mu1 <- meanval.digamma(theta) v <- d2cumulant.digamma(theta) deriv <- -v/mu1*theta mlmt <- mlmt - log(mu1/mu)/deriv theta <- -exp(-mlmt) } theta } varfun.digamma <- function(mu) { # Variance function for Digamma family # GKS 3 July 98 # theta <- canonic.digamma(mu) 2*( 1/theta + trigamma(-theta) ) } unitdeviance.digamma <- function(y,mu) { # Unit deviance for Digamma family # GKS 3 July 98 # thetay <- canonic.digamma(y) theta <- canonic.digamma(mu) 2*( y*(thetay-theta) - (cumulant.digamma(thetay)-cumulant.digamma(theta)) ) } statmod/R/invgauss.R0000644000176200001440000002326512654044513014124 0ustar liggesusersdinvgauss <- function(x, mean=1, shape=NULL, dispersion=1, log=FALSE) # Probability 
density function of inverse Gaussian distribution # Gordon Smyth # Created 15 Jan 1998. Last revised 2 Feb 2016. { # Dispersion is reciprocal of shape if(!is.null(shape)) dispersion <- 1/shape # Check for special cases spec.x <- any(!is.finite(x) | x<=0) spec.mean <- any(!is.finite(mean) | mean<=0) spec.disp <- any(!is.finite(dispersion) | dispersion<=0) any.special <- spec.x | spec.mean | spec.disp # If any parameter has length 0, return result of length 0 r <- range(length(x),length(mean),length(dispersion)) if(r[1L]==0L) return(numeric(0L)) # Make arguments same length n <- r[2L] if(length(x)0 & phimu & (mu==0 | phi==0)) | x==Inf | (x>0 & phi==Inf) spike <- (x==mu & (mu==0 | phi==0)) | (x==0 & phi==Inf) invchisq <- mu==Inf & !(left.limit | right.limit | spike) NA.cases <- is.na(x) | is.na(mu) | is.na(phi) | mu<0 | phi<0 left.limit[NA.cases] <- FALSE right.limit[NA.cases] <- FALSE spike[NA.cases] <- FALSE invchisq[NA.cases] <- FALSE logd[left.limit] <- -Inf logd[right.limit] <- -Inf logd[spike] <- Inf logd[invchisq] <- .dinvgaussInfMean(x=x[invchisq],dispersion=phi[invchisq]) logd[NA.cases] <- NA ok <- !(left.limit | right.limit | spike | invchisq | NA.cases) logd[ok] <- .dinvgauss(x[ok],mean=mu[ok],dispersion=phi[ok],log=TRUE) } else { logd[] <- .dinvgauss(x,mean=mu,dispersion=phi,log=TRUE) } if(log) logd else exp(logd) } .dinvgauss <- function(x, mean=NULL, dispersion=1, log=FALSE) # Probability density function of inverse Gaussian distribution # with no argument checking and assuming mean=1 { notnullmean <- !is.null(mean) if(notnullmean) { x <- x/mean dispersion <- dispersion*mean } d <- (-log(dispersion)-log(2*pi)-3*log(x) - (x-1)^2/dispersion/x)/2 if(notnullmean) d <- d-log(mean) if(log) d else exp(d) } .dinvgaussInfMean <- function(x, dispersion=1) { (-log(dispersion) - log(2*pi) - 3*log(x) - 1/dispersion/x) / 2 } pinvgauss <- function(q, mean=1, shape=NULL, dispersion=1, lower.tail=TRUE, log.p=FALSE) # Cumulative distribution function of inverse Gaussian distribution # Gordon Smyth # Created 15 Jan 1998. Last revised 2 Feb 2016. 
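# Special cases (zero, infinite or missing parameter values) are handled explicitly; ordinary cases are passed to .pinvgauss, which works on the log-scale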
{ # Dispersion is reciprocal of shape if(!is.null(shape)) dispersion <- 1/shape # Check for special cases spec.q <- any(!is.finite(q) | q<=0) spec.mean <- any(!is.finite(mean) | mean<=0) spec.disp <- any(!is.finite(dispersion) | dispersion<=0) any.special <- spec.q | spec.mean | spec.disp # If any parameter has length 0, return result of length 0 r <- range(length(q),length(mean),length(dispersion)) if(r[1L]==0L) return(numeric(0L)) # Make arguments same length n <- r[2L] if(length(q)0 & phimu & (mu==0 | phi==0)) | q==Inf | (q>0 & phi==Inf) spike <- (q==mu & (mu==0 | phi==0)) | (q==0 & phi==Inf) invchisq <- mu==Inf & !(left.limit | right.limit | spike) NA.cases <- is.na(q) | is.na(mu) | is.na(phi) | mu<0 | phi<0 left.limit[NA.cases] <- FALSE right.limit[NA.cases] <- FALSE spike[NA.cases] <- FALSE invchisq[NA.cases] <- FALSE ok <- !(left.limit | right.limit | spike | invchisq | NA.cases) if(lower.tail) { logp[left.limit] <- -Inf logp[right.limit] <- 0 } else { logp[left.limit] <- 0 logp[right.limit] <- -Inf } logp[spike] <- 0 logp[invchisq] <- pchisq(1/q[invchisq]/phi[invchisq],df=1,lower.tail=!lower.tail,log.p=TRUE) logp[NA.cases] <- NA logp[ok] <- .pinvgauss(q[ok],mean=mu[ok],dispersion=phi[ok],lower.tail=lower.tail,log.p=TRUE) } else { logp <- .pinvgauss(q,mean=mu,dispersion=phi,lower.tail=lower.tail,log.p=TRUE) } if(log.p) logp else(exp(logp)) } .pinvgauss <- function(q, mean=NULL, dispersion=1, lower.tail=TRUE, log.p=FALSE) # Cumulative distribution function of inverse Gaussian distribution # without argument checking { if(!is.null(mean)) { q <- q/mean dispersion <- dispersion*mean } pq <- sqrt(dispersion*q) a <- pnorm((q-1)/pq,lower.tail=lower.tail,log.p=TRUE) b <- 2/dispersion + pnorm(-(q+1)/pq,log.p=TRUE) if(lower.tail) b <- exp(b-a) else b <- -exp(b-a) logp <- a+log1p(b) if(log.p) logp else exp(logp) } rinvgauss <- function(n, mean=1, shape=NULL, dispersion=1) # Random variates from inverse Gaussian distribution # Gordon Smyth (with a correction by Trevor Park 14 June 2005) # Created 15 Jan 1998. Last revised 12 Jan 2016. { # Dispersion is reciprocal of shape if(!is.null(shape)) dispersion <- 1/shape # Check n if(length(n)>1L) n <- length(n) else n <- as.integer(n) if(n<0L) stop("n can't be negative") if(n==0L || length(mean)==0L || length(dispersion)==0L) return(numeric(0L)) # Make arguments same length mu <- rep_len(mean,n) phi <- rep_len(dispersion,n) # Setup output vector r <- rep_len(0,n) # Non-positive parameters give NA i <- (mu > 0 & phi > 0) i[is.na(i)] <- FALSE if(!all(i)) { r[!i] <- NA n <- sum(i) } # Divide out mu phi[i] <- phi[i]*mu[i] Y <- rnorm(n)^2 X1 <- 1 + phi[i]/2 * (Y - sqrt(4*Y/phi[i]+Y^2)) firstroot <- (runif(n) < 1/(1+X1)) r[i][firstroot] <- X1[firstroot] r[i][!firstroot] <- 1/X1[!firstroot] mu*r } qinvgauss <- function(p, mean=1, shape=NULL, dispersion=1, lower.tail=TRUE, log.p=FALSE, maxit=200L, tol=1e-14, trace=FALSE) # Quantiles of the inverse Gaussian distribution # using globally convergent Newton iteration. # Gordon Smyth # Created 12 May 2014. Last revised 2 Feb 2016. 
# # Replaced an earlier function by Paul Bagshaw of 23 Dec 1998 { # Dispersion is reciprocal of shape if(!is.null(shape)) dispersion <- 1/shape # Make sure that p is exp(logp) if(log.p) logp <- p else { p[p<0] <- NA p[p>1] <- NA logp <- log(p) } p <- exp(logp) # Make arguments same length r <- range(length(p),length(mean),length(dispersion)) if(r[1L]==0L) return(numeric(0L)) n <- r[2L] if(length(p)1e3 k1 <- 1/2/kappa[bigcv] if(length(k1)) x[bigcv] <- k1*(1-k1^2) if(trace) { if(n < 6L) cat("mode ",x,"\n") else cat("quantile(mode) ",quantile(x),"\n") } # Identify cases with very small tail probabilities if(lower.tail) { small.left <- (logp < -11.51) small.right <- (logp > -1e-5) } else { small.left <- (logp > -1e-5) small.right <- (logp < -11.51) } # For small left tail prob, use inverse chisq as starting value if(any(small.left)) x[small.left] <- 1/phi[small.left]/qnorm(logp[small.left],lower.tail=lower.tail,log.p=TRUE)^2 # For small right tail prob, use qgamma with same mean and var as starting value if(any(small.right)) { alpha <- 1/phi[small.right] q.gam <- qgamma(logp[small.right],shape=alpha,rate=alpha,lower.tail=lower.tail,log.p=TRUE) x[small.right] <- q.gam } step <- function(x,p,logp,phi) { logF <- .pinvgauss(x,dispersion=phi,lower.tail=lower.tail,log.p=TRUE) dp <- dlogp <- logp-logF smallstep <- abs(dlogp) < 1e-5 dp[smallstep] <- exp(logp[smallstep]+log1p(-dlogp[smallstep]/2)) * dlogp[smallstep] dp[!smallstep] <- p[!smallstep]-exp(logF[!smallstep]) dp / .dinvgauss(x,dispersion=phi) } # First Newton step iter <- 0 dx <- step(x,p,logp,phi) sdx <- sign(dx) if(lower.tail) x <- x + dx else x <- x - dx i <- (abs(dx) > tol) if(trace) { cat("Iter=",iter,"Still converging=",sum(i),"\n") if(n < 6L) cat("x ",x,"\ndx ",dx,"\n") else cat("quantile(x) ",quantile(x),"\nMax dx ",max(abs(dx)),"\n") } # Newton iteration is monotonically convergent from point of inflexion while(any(i)) { iter <- iter+1 if(iter > maxit) { warning("max iterations exceeded") break } dx <- step(x[i],p[i],logp[i],phi[i]) # Change of sign indicates that machine precision has been overstepped dx[dx * sdx[i] < 0] <- 0 if(lower.tail) x[i] <- x[i] + dx else x[i] <- x[i] - dx i[i] <- (abs(dx)/pmax(x[i],1) > tol) if(trace) { cat("Iter=",iter,"Still converging=",sum(i),"\n") if(n < 6L) cat("x ",x,"\ndx ",dx,"\n") else cat("quantile(x) ",quantile(x),"\nMax dx ",max(abs(dx)),"\n") } } # Mu scales the distribution q[ok] <- x*mu[ok] q } statmod/MD50000644000176200001440000000517412654111506012245 0ustar liggesusersf53c0a39fc8299d5fdeebfa079105dbd *DESCRIPTION 6e35872c226ec1e6c151cf5b2ea1c179 *NAMESPACE cabcedc80916e132755147508dc1808b *R/digamma.R b67d4c729d215c0952dc4594882da6fa *R/digammaf.R 2ca5d21ceedcedb7a85e9f46800187c7 *R/elda.R f48cdb780b1cc86bfeef8c4722912c2f *R/fitNBP.R 27838c3de5b6c7ff37ea6e92870903cc *R/forward.R d269c19fdeee45694c0553cef42c4633 *R/gaussquad.R 7871d84512b208401cec51e401c990d9 *R/glmgam.R 99187037b66bc90438229911a23f1f2d *R/glmnb.R 87e2551f59f03a67cf4b0a9bf6479972 *R/glmscoretest.R 98e67ef55b5ac6c79e18608e6e6e2b91 *R/growthcurve.R 1747893341300d18e4af7032e4f91fc0 *R/hommel.R 4a373f050963e1ed4a9d56766ec0a4ec *R/invgauss.R c85f7bc07419ec6489a122fd6c5d9d41 *R/matvec.R 95567b5481bab1b1b23cde0c9ce8369a *R/mixedmodel.R aead17fe81a6209b2c9cc7414ab7aa53 *R/mscale.R 8f269da5958ce55decf5133db3628f1e *R/permp.R e5134cb238e761646fb22e32e95ec08e *R/power.R 4cfc011dee689a2fe3846d867db5fddc *R/qres.R cb4885f03160dcbca66990cd8db7ae0a *R/remlscor.R 4843a9f5f3be5605ac7a96d5772bab26 *R/remlscorgamma.R 
a3185c8c817e40182c9c31a41970df90 *R/sagetest.R afad3149539912e729cb6d0aea8e53df *R/tweedief.R e600a474fedf570b1913b278022d46bf *data/welding.rdata 98a59632d75755a46f4e616d7dc5cb3b *inst/NEWS d37cd00f740faf6738e9a0c992f54eda *man/digammaf.Rd 1ee58cb70d97c57b1aeb31e2f94258a9 *man/elda.Rd 2f065b8fef25cc88b80bd996eb24b0f1 *man/fitNBP.Rd 3508ac5839f19813b7894387b7a21bb7 *man/forward.Rd 9e4d97cd65ba17014c30878b39a9a351 *man/gauss.quad.Rd 6fdf80d724c6e363280eecb45419977d *man/gauss.quad.prob.Rd 809ccf2ff24e804a2a5bb17baa1c80b9 *man/glmgam.Rd 552561fc0f9300d1564fb3acb0ab0d84 *man/glmscoretest.Rd 26d687ea8e66ede996ecf62873dc59ad *man/growthcurve.Rd 0f8f3ac69241aedf8664fbdf9d50f085 *man/hommel.test.Rd 57123b1d48029196db5178df31642306 *man/invgauss.Rd ee394dd22edb184f213f2e3e754cc980 *man/logmdigamma.Rd ae5d97dda72b73d6688128ac79753757 *man/matvec.Rd e9426cfaf3b6bd604658f95c3e57a032 *man/meanT.Rd 3ba27a33d97129bc9376bbf89f56ab2a *man/mixedmodel.Rd ac05feb7d63c4dd33f0f0d79e8903bff *man/mscale.Rd 33777e7ee45e51e80f62c7f5b42203bd *man/permp.Rd 03485b19a3f73e050a942b597d9addc8 *man/plot.limdil.Rd 8e7ed19b1b289c9ce8fa54a482399d40 *man/power.Rd 4d7a3c0d5daf408d50668b33e7b164e7 *man/qresiduals.Rd 185faf67649fe467b802b8e2fa77fc27 *man/remlscor.Rd 5e69ae5b142d559261b5f18a838e463a *man/remlscorgamma.Rd 396eb7592074f5d79ec1d7e6d9559378 *man/sage.test.Rd 542b2dac5e344bf19914046c1fb85d57 *man/statmod.Rd 3d0552637454f6b32b1ad801ce684dfe *man/tweedie.Rd d8a58e7e91ba0dbf41668a06d0bc34dc *man/welding.Rd bb605bd4a6615e3cd151df9bd01b88bd *src/gaussq.f 858e7cd614337b896cdba7b531453dc8 *tests/statmod-Tests.R 1c27b6296006bb145bc707035ba46fb9 *tests/statmod-Tests.Rout.save statmod/DESCRIPTION0000644000176200001440000000161612654111506013440 0ustar liggesusersPackage: statmod Version: 1.4.24 Date: 2016-02-01 Title: Statistical Modeling Author: Gordon Smyth [cre, aut], Yifang Hu [ctb], Peter Dunn [ctb], Belinda Phipson [ctb], Yunshun Chen [ctb] Maintainer: Gordon Smyth Depends: R (>= 1.6.1) Imports: stats, graphics Suggests: MASS, tweedie Description: A collection of algorithms and functions to aid statistical modeling. Includes growth curve comparisons, limiting dilution analysis (aka ELDA), mixed linear models, heteroscedastic regression, inverse-Gaussian probability calculations, Gauss quadrature and a secure convergence algorithm for nonlinear models. Includes advanced generalized linear model functions that implement secure convergence, dispersion modeling and Tweedie power-law families. License: GPL-2 | GPL-3 NeedsCompilation: yes Packaged: 2016-02-02 06:14:35 UTC; smyth Repository: CRAN Date/Publication: 2016-02-02 12:30:14 statmod/man/0000755000176200001440000000000012654044513012504 5ustar liggesusersstatmod/man/fitNBP.Rd0000644000176200001440000000532212654044513014117 0ustar liggesusers\name{fitNBP} \alias{fitNBP} \title{Negative Binomial Model for SAGE Libraries with Pearson Estimation of Dispersion} \description{ Fit a multi-group negative-binomial model to SAGE data, with Pearson estimation of the common overdispersion parameter. } \usage{ fitNBP(y, group=NULL, lib.size=colSums(y), tol=1e-5, maxit=40, verbose=FALSE) } \arguments{ \item{y}{numeric matrix giving counts. Rows correspond to tags (genes) and columns to SAGE libraries.} \item{group}{factor indicating which library belongs to each group. 
If \code{NULL} then one group is assumed.} \item{lib.size}{vector giving total number of tags in each library.} \item{tol}{small positive numeric tolerance to judge convergence} \item{maxit}{maximum number of iterations permitted} \item{verbose}{logical, if \code{TRUE} then iteration progress information is output.} } \details{ The overdispersion parameter is estimated by equating the Pearson goodness of fit to its expectation. The variance is assumed to be of the form Var(y)=mu*(1+phi*mu) where E(y)=mu and phi is the dispersion parameter. All tags are assumed to share the same dispersion. For given dispersion, the model for each tag is a negative-binomial generalized linear model with log-link and \code{log(lib.size)} as offset. The coefficient parametrization used is that corresponding to the formula \code{~0+group+offset(log(lib.size))}. Except for the dispersion being common rather than genewise, the model fitted by this function is equivalent to that proposed by Lu et al (2005). The numeric algorithm used is that of alternating iterations (Smyth, 1996) using Newton's method as the outer iteration for the dispersion parameter starting at phi=0. This iteration is monotonically convergent for the dispersion. } \value{ List with components \item{coefficients}{numeric matrix of rates for each tag (gene) and each group} \item{fitted.values}{numeric matrix of fitted values} \item{dispersion}{estimated dispersion parameter} } \author{Gordon Smyth} \references{ Lu, J, Tomfohr, JK, Kepler, TB (2005). Identifying differential expression in multiple SAGE libraries: an overdispersed log-linear model approach. \emph{BMC Bioinformatics} 6,165. Smyth, G. K. (1996). Partitioned algorithms for maximum likelihood and other nonlinear estimation. \emph{Statistics and Computing}, 6, 201-216. } \seealso{ \code{\link[statmod]{sage.test}} } \examples{ # True value for dispersion is 1/size=2/3 # Note the Pearson method tends to under-estimate the dispersion y <- matrix(rnbinom(10*4,mu=4,size=1.5),10,4) lib.size <- rep(50000,4) group <- c(1,1,2,2) fit <- fitNBP(y,group=group,lib.size=lib.size) logratio <- fit$coef \%*\% c(-1,1) } \keyword{regression} statmod/man/forward.Rd0000644000176200001440000000211712654044513014440 0ustar liggesusers\name{forward} \alias{forward} \title{Forward Selection of Covariates for Multiple Regression} \description{ Perform forward selection of covariates to add to a multiple linear regression, adding one covariate at a time. } \usage{ forward(y, x, xkept=NULL, intercept=TRUE, nvar=ncol(x)) } \arguments{ \item{y}{numeric response vector.} \item{x}{numeric matrix of covariates, candidates to be added to the regression.} \item{xkept}{numeric matrix of covariates to be included in the starting regression.} \item{intercept}{logical, should an intercept be added to \code{xkept}?} \item{nvar}{integer, number of covariates from \code{x} to add to the regression.} } \details{ This function has the advantage that \code{x} can have many more columns than the length of \code{y}. } \value{ Integer vector of length \code{nvar}, giving the order in which columns of \code{x} are added to the regression.
} \author{Gordon Smyth} \seealso{ \code{\link{step}} } \examples{ y <- rnorm(10) x <- matrix(rnorm(10*5),10,5) forward(y,x) } \keyword{regression} statmod/man/digammaf.Rd0000644000176200001440000000653712654044513014553 0ustar liggesusers\name{Digamma} \alias{Digamma} \alias{canonic.digamma} \alias{d2cumulant.digamma} \alias{unitdeviance.digamma} \alias{cumulant.digamma} \alias{meanval.digamma} \alias{varfun.digamma} \title{Digamma generalized linear model family} \description{ Produces a Digamma generalized linear model family object. The Digamma distribution is the distribution of the unit deviance for a gamma response. } \usage{ Digamma(link = "log") unitdeviance.digamma(y, mu) cumulant.digamma(theta) meanval.digamma(theta) d2cumulant.digamma(theta) varfun.digamma(mu) canonic.digamma(mu) } \arguments{ \item{link}{character string, number or expressing specifying the link function. See \code{quasi} for specification of this argument.} \item{y}{numeric vector of (positive) response values} \item{mu}{numeric vector of (positive) fitted values} \item{theta}{numeric vector of values of the canonical variable, equal to \eqn{-1/\phi} where \eqn{\phi} is the dispersion parameter of the gamma distribution} } \value{ \code{Digamma} produces a glm family object, which is a list of functions and expressions used by \code{glm} in its iteratively reweighted least-squares algorithm. See \code{family} for details. The other functions take vector arguments and produce vector values of the same length and called by \code{Digamma}. \code{unitdeviance.digamma} gives the unit deviances of the family, equal to the squared deviance residuals. \code{cumulant.digamma} is the cumulant function. If the dispersion is unity, then successive derivatives of the cumulant function give successive cumulants of the Digamma distribution. \code{meanvalue.digamma} gives the first derivative, which is the expected value. \code{d2cumulant.digamma} gives the second derivative, which is the variance. \code{canonic.digamma} is the inverse of \code{meanvalue.digamma} and gives the canonical parameter as a function of the mean parameter. \code{varfun.digamma} is the variance function of the Digamma family, the variance as a function of the mean. } \details{ This family is useful for dispersion modelling with gamma generalized linear models. The Digamma distribution describes the distribution of the unit deviances for a gamma family, in the same way that the gamma distribution itself describes the distribution of the unit deviances for Gaussian or inverse Gaussian families. The Digamma distribution is so named because it is dual to the gamma distribution in the above sense, and because the \code{digamma function} appears in its mean function. Suppose that \eqn{y} follows a gamma distribution with mean \eqn{\mu} and dispersion parameter \eqn{\phi}, so the variance of \eqn{y} is \eqn{\phi \mu^2}. Write \eqn{d(y,\mu)} for the gamma distribution unit deviance. Then \code{meanval.digamma(-1/phi)} gives the mean of \eqn{d(y,\mu)} and \code{2*d2cumulant.digamma(-1/phi)} gives the variance. } \author{Gordon Smyth} \references{ Smyth, G. K. (1989). Generalized linear models with varying dispersion. \emph{J. R. Statist. Soc. B}, \bold{51}, 47-61. 
} \examples{ # Test for log-linear dispersion trend in gamma regression y <- rchisq(20,df=1) x <- 1:20 out.gam <- glm(y~x,family=Gamma(link="log")) d <- residuals(out.gam)^2 out.dig <- glm(d~x,family=Digamma(link="log")) summary(out.dig,dispersion=2) } \seealso{ \code{\link{quasi}}, \code{\link{make.link}} } \keyword{models} statmod/man/matvec.Rd0000644000176200001440000000161112654044513014251 0ustar liggesusers\name{matvec} \alias{matvec} \alias{vecmat} \title{Multiply a Matrix by a Vector} \description{Multiply the rows or columns of a matrix by the elements of a vector.} \usage{ matvec(M, v) vecmat(v, M) } \arguments{ \item{M}{numeric matrix, or object which can be coerced to a matrix.} \item{v}{numeric vector, or object which can be coerced to a vector. Length should match the number of columns of \code{M} (for \code{matvec}) or the number of rows of \code{M} (for \code{vecmat})} } \value{A matrix of the same dimensions as \code{M}.} \details{ \code{matvec(M,v)} is equivalent to \code{M \%*\% diag(v)} but is faster to execute. Similarly \code{vecmat(v,M)} is equivalent to \code{diag(v) \%*\% M} but is faster to execute. } \examples{ A <- matrix(1:12,3,4) A matvec(A,c(1,2,3,4)) vecmat(c(1,2,3),A) } \author{Gordon Smyth} \keyword{array} \keyword{algebra} statmod/man/statmod.Rd0000644000176200001440000000552712654044513014457 0ustar liggesusers\name{statmod-package} \alias{statmod} \alias{statmod-package} \docType{package} \title{Introduction to the StatMod Package} \description{ This package includes a variety of functions for numerical analysis and statistical modelling. The functions are briefly summarized by type of application below. } \section{Generalized Linear Models}{ The function \code{\link{tweedie}} defines a large class of generalized linear model families with power variance functions. It is used in conjunction with the glm function, and widens the class of families that can be fitted. \code{\link{qresiduals}} implements randomized quantile residuals for generalized linear models. The functions \code{canonic.digamma}, \code{unitdeviance.digamma}, \code{varfun.digamma}, \code{cumulant.digamma}, \code{d2cumulant.digamma}, \code{meanval.digamma} and \code{logmdigamma} are used to fit double-generalized models, in which a link-linear model is fitted to the dispersion as well as to the mean. Specifically they are used to fit the dispersion submodel associated with a gamma glm. } \section{Growth Curves}{ \code{\link{compareGrowthCurves}}, \code{compareTwoGrowthCurves} and \code{meanT} are functions to test for differences between growth curves with repeated measurements on subjects. } \section{Limiting Dilution Analysis}{ The \code{\link{limdil}} function is used in the analysis of stem cell frequencies. It implements limiting dilution analysis using complementary log-log binomial generalized linear model regression, with some improvements on previous programs. } \section{Probability Distributions}{ The functions \code{\link{qinvgauss}}, \code{\link{dinvgauss}}, \code{\link{pinvgauss}} and \code{\link{rinvgauss}} provide probability calculations for the inverse Gaussian distribution. \code{\link{gauss.quad}} and \code{gauss.quad.prob} compute nodes and weights for Gaussian quadrature, either with classical weight functions or with respect to probability distributions. } \section{Tests}{ \code{\link{hommel.test}} performs Hommel's multiple comparison tests. \code{\link{power.fisher.test}} computes the power of Fisher's Exact Test for comparing proportions.
\code{\link{sage.test}} is a fast approximation to Fisher's exact test for each tag for comparing two Serial Analysis of Gene Expression (SAGE) libraries. \code{\link{permp}} computes p-values for permutation tests when the permutations are randomly drawn. } \section{Variance Models}{ \code{\link{mixedModel2}}, \code{\link{mixedModel2Fit}} and \code{\link{glmgam.fit}} fit mixed linear models. \code{\link{remlscore}} and \code{\link{remlscoregamma}} fit heteroscedastic and varying dispersion models by REML. \code{\link{welding}} is an example data set. } \section{Matrix Computations}{ \code{\link{matvec}} and \code{\link{vecmat}} facilitate multiplying matrices by vectors. } \author{Gordon Smyth} \keyword{documentation} statmod/man/permp.Rd0000644000176200001440000000537312654044513014126 0ustar liggesusers\name{permp} \alias{permp} \title{Exact permutation p-values} \description{ Calculates exact p-values for permutation tests when permutations are randomly drawn with replacement. } \usage{ permp(x, nperm, n1, n2, total.nperm=NULL, method="auto", twosided=TRUE) } \arguments{ \item{x}{number of permutations that yielded test statistics at least as extreme as the observed data. May be a vector or an array of values.} \item{nperm}{total number of permutations performed.} \item{n1}{sample size of group 1. Not required if \code{total.nperm} is supplied.} \item{n2}{sample size of group 2. Not required if \code{total.nperm} is supplied.} \item{total.nperm}{total number of permutations allowable from the design of the experiment.} \item{method}{character string indicating computation method. Possible values are \code{"exact"}, \code{"approximate"} or \code{"auto"}.} \item{twosided}{logical, is the test two-sided and symmetric between the two groups?} } \details{ This function can be used for calculating exact p-values for permutation tests where permutations are sampled with replacement, using theory and methods developed by Phipson and Smyth (2010). The input values are the total number of permutations done (\code{nperm}) and the number of these that were considered at least as extreme as the observed data (\code{x}). \code{total.nperm} is the total number of distinct values of the test statistic that are possible. This is generally equal to the number of possible permutations, unless a two-sided test is conducted with equal sample sizes, in which case \code{total.nperm} is half the number of permutations, because the test statistic must then be symmetric in the two groups. Usually \code{total.nperm} is computed automatically from \code{n1} and \code{n2}, but can also be supplied directly by the user. When \code{method="exact"}, the p-values are computed to full machine precision by summing a series terms. When \code{method="approximate"}, an approximation is used that is faster and uses less memory. If \code{method="auto"}, the exact calculation is used when \code{total.nperm} is less than or equal to 10,000 and the approximation is used otherwise. } \value{ vector or array of p-values, of same dimensions as \code{x} } \author{Belinda Phipson and Gordon Smyth} \references{ Phipson B, and Smyth GK (2010). Permutation p-values should never be zero: calculating exact p-values when permutations are randomly drawn. \emph{Statistical Applications in Genetics and Molecular Biology}, Volume 9, Article 39. 
\url{http://www.statsci.org/smyth/pubs/PermPValuesPreprint.pdf} } \examples{ x <- 0:5 # Both calls give same results permp(x=x, nperm=99, n1=6, n2=6) permp(x=x, nperm=99, total.nperm=462) } \keyword{htest} statmod/man/invgauss.Rd0000644000176200001440000001205512654044513014635 0ustar liggesusers\name{invgauss} \alias{InverseGaussian} \alias{dinvgauss} \alias{pinvgauss} \alias{qinvgauss} \alias{rinvgauss} \title{Inverse Gaussian Distribution} \description{ Density, cumulative probability, quantiles and random generation for the inverse Gaussian distribution. } \usage{ dinvgauss(x, mean=1, shape=NULL, dispersion=1, log=FALSE) pinvgauss(q, mean=1, shape=NULL, dispersion=1, lower.tail=TRUE, log.p=FALSE) qinvgauss(p, mean=1, shape=NULL, dispersion=1, lower.tail=TRUE, log.p=FALSE, maxit=200L, tol=1e-14, trace=FALSE) rinvgauss(n, mean=1, shape=NULL, dispersion=1) } \arguments{ \item{x,q}{vector of quantiles.} \item{p}{vector of probabilities.} \item{n}{sample size. If \code{length(n)} is larger than 1, then \code{length(n)} random values are returned.} \item{mean}{vector of (positive) means.} \item{shape}{vector of (positive) shape parameters.} \item{dispersion}{vector of (positive) dispersion parameters. Ignored if \code{shape} is not \code{NULL}, in which case \code{dispersion=1/shape}.} \item{lower.tail}{logical; if \code{TRUE}, probabilities are P(X<q), otherwise P(X>q).} \item{log}{logical; if \code{TRUE}, the log-density is returned.} \item{log.p}{logical; if \code{TRUE}, probabilities are on the log-scale.} \item{maxit}{maximum number of Newton iterations used to find \code{q}.} \item{tol}{small positive numeric value giving the convergence tolerance for the quantile.} \item{trace}{logical, if \code{TRUE} then the working estimate for \code{q} from each iteration will be output.} } \value{ Output values give density (\code{dinvgauss}), probability (\code{pinvgauss}), quantile (\code{qinvgauss}) or random sample (\code{rinvgauss}) for the inverse Gaussian distribution with mean \code{mean} and dispersion \code{dispersion}. Output is a vector of length equal to the maximum length of any of the arguments \code{x}, \code{q}, \code{mean}, \code{shape} or \code{dispersion}. If the first argument is the longest, then all the attributes of the input argument are preserved on output, for example, a matrix \code{x} will give a matrix on output. Elements of input vectors that are missing will cause the corresponding elements of the result to be missing, as will non-positive values for \code{mean} or \code{dispersion}. } \details{ The inverse Gaussian distribution takes values on the positive real line (Tweedie, 1957; Chhikara and Folks, 1989). It is somewhat more right-skewed than the gamma distribution, with variance given by \code{dispersion*mean^3}. The distribution has applications in reliability and survival analysis, and is one of the response distributions used in generalized linear models. The shape and dispersion parameters are alternative parametrizations for the variability, with \code{dispersion=1/shape}. Only one of these two arguments needs to be specified. If both are set, then \code{shape} takes precedence. These functions implement algorithms described by Giner and Smyth (2016). \code{pinvgauss} uses a result from Chhikara and Folks (1974), with enhancements for right tails and log-probabilities. \code{rinvgauss} uses an algorithm proposed by Michael et al (1976). \code{qinvgauss} uses a monotonically convergent Newton iteration developed by Giner and Smyth (2016).
All internal computations are undertaken on the log-scale as far as possible. The \code{pinvgauss} and \code{qinvgauss} functions make use of Taylor series expansions to achieve full floating point accuracy for small tail probabilities. } \references{ Chhikara, R. S., and Folks, J. L., (1989). \emph{The inverse Gaussian distribution: Theory, methodology and applications}. Marcel Dekker, New York. Chhikara, R. S., and Folks, J. L., (1974). Estimation of the inverse Gaussian distribution function. \emph{Journal of the American Statistical Association} 69, 250-254. Giner, G., and Smyth, G. K. (2016). statmod: Probability Calculations for the Inverse Gaussian Distribution. \url{http://www.statsci.org/smyth/pubs/qinvgaussPreprint.pdf} Michael, J. R., Schucany, W. R., and Haas, R. W. (1976). Generating random variates using transformations with multiple roots. \emph{The American Statistician}, 30, 88--90. Tweedie, M. C. (1957). Statistical Properties of Inverse Gaussian Distributions I. \emph{Annals of Mathematical Statistics} 28, 362-377. } \author{Gordon Smyth with suggestions from Paul Bagshaw, Centre National d'Etudes des Telecommunications, France, and Trevor Park, Department of Statistics, University of Florida} \seealso{ \code{dinvGauss}, \code{pinvGauss}, \code{qinvGauss} and \code{rinvGauss} in the SuppDists package. } \examples{ q <- rinvgauss(10, mean=1, disp=0.5) # generate vector of 10 random numbers p <- pinvgauss(q, mean=1, disp=0.5) # p should be uniformly distributed # Quantile for small right tail probability: qinvgauss(1e-20, mean=1.5, disp=0.7, lower.tail=FALSE) # Same quantile, but represented in terms of left tail probability on log-scale qinvgauss(-1e-20, mean=1.5, disp=0.7, lower.tail=TRUE, log.p=TRUE) } \keyword{distribution} statmod/man/mixedmodel.Rd0000644000176200001440000001147712654044513015134 0ustar liggesusers\name{mixedModel2} \alias{mixedModel2} \alias{mixedModel2Fit} \alias{randomizedBlock} \alias{randomizedBlockFit} \title{Fit Mixed Linear Model with 2 Error Components} \description{ Fits a mixed linear model by REML. The linear model contains one random factor apart from the unit errors. } \usage{ mixedModel2(formula, random, weights=NULL, only.varcomp=FALSE, data=list(), subset=NULL, contrasts=NULL, tol=1e-6, maxit=50, trace=FALSE) mixedModel2Fit(y, X, Z, w=NULL, only.varcomp=FALSE, tol=1e-6, maxit=50, trace=FALSE) randomizedBlock(formula, random, weights=NULL, only.varcomp=FALSE, data=list(), subset=NULL, contrasts=NULL, tol=1e-6, maxit=50, trace=FALSE) randomizedBlockFit(y, X, Z, w=NULL, only.varcomp=FALSE, tol=1e-6, maxit=50, trace=FALSE) } \arguments{ The arguments \code{formula}, \code{weights}, \code{data}, \code{subset} and \code{contrasts} have the same meaning as in \code{lm}. The arguments \code{y}, \code{X} and \code{w} have the same meaning as in \code{lm.wfit}. \item{formula}{formula specifying the fixed model.} \item{random}{vector or factor specifying the blocks corresponding to random effects.} \item{weights}{optional vector of prior weights.} \item{only.varcomp}{logical value, if \code{TRUE} computation of standard errors and fixed effect coefficients will be skipped} \item{data}{an optional data frame containing the variables in the model.} \item{subset}{an optional vector specifying a subset of observations to be used in the fitting process.} \item{contrasts}{an optional list. 
See the \code{contrasts.arg} argument of \code{model.matrix.default}.} \item{tol}{small positive numeric tolerance, passed to \code{glmgam.fit}} \item{maxit}{maximum number of iterations permitted, passed to \code{glmgam.fit}} \item{trace}{logical value, passed to \code{glmgam.fit}. If \code{TRUE} then working estimates will be printed at each iteration.} \item{y}{numeric response vector} \item{X}{numeric design matrix for fixed model} \item{Z}{numeric design matrix for random effects} \item{w}{optional vector of prior weights} } \details{ Note that \code{randomizedBlock} and \code{mixedModel2} are alternative names for the same function. This function fits the model \eqn{y=Xb+Zu+e} where \eqn{b} is a vector of fixed coefficients and \eqn{u} is a vector of random effects. Write \eqn{n} for the length of \eqn{y} and \eqn{q} for the length of \eqn{u}. The random effect vector \eqn{u} is assumed to be normal, mean zero, with covariance matrix \eqn{\sigma^2_uI_q} while \eqn{e} is normal, mean zero, with covariance matrix \eqn{\sigma^2I_n}. If \eqn{Z} is an indicator matrix, then this model corresponds to a randomized block experiment. The model is fitted using an eigenvalue decomposition which transforms the problem into a Gamma generalized linear model. Note that the block variance component \code{varcomp[2]} is not constrained to be non-negative. It may take negative values corresponding to negative intra-block correlations. However the correlation \code{varcomp[2]/sum(varcomp)} must lie between \code{-1} and \code{1}. Missing values in the data are not allowed. This function is equivalent to \code{lme(fixed=formula,random=~1|random)}, except that the block variance component is not constrained to be non-negative, but is faster and more accurate for small to moderate size data sets. It is slower than \code{lme} when the number of observations is large. This function tends to be fast and reliable, compared to competitor functions which fit randomized block models, when the number of observations is small, say no more than 200. However it becomes quadratically slow as the number of observations increases because of the need to do two eigenvalue decompositions of order nearly equal to the number of observations. So it is a good choice when fitting large numbers of small data sets, but not a good choice for fitting large data sets. } \value{ A list with the components: \item{varcomp}{vector of length two containing the residual and block components of variance.} \item{se.varcomp}{standard errors for the components of variance.} \item{reml.residuals}{standardized residuals in the null space of the design matrix.} If \code{only.varcomp=FALSE} then the components from the diagonalized weighted least squares fit are also returned. } \author{Gordon Smyth} \references{ Venables, W., and Ripley, B. (2002). \emph{Modern Applied Statistics with S-Plus}, Springer. } \seealso{ \code{\link{glmgam.fit}}, \code{\link[nlme:lme]{lme}}, \code{\link{lm}}, \code{\link{lm.fit}} } \examples{ # Compare with first data example from Venables and Ripley (2002), # Chapter 10, "Linear Models" library(MASS) data(petrol) out <- mixedModel2(Y~SG+VP+V10+EP, random=No, data=petrol) cbind(varcomp=out$varcomp,se=out$se.varcomp) } \keyword{regression} statmod/man/mscale.Rd0000644000176200001440000000234412654044513014242 0ustar liggesusers\name{mscale} \alias{mscale} \title{M Scale Estimation} \description{ Robust estimation of a scale parameter using Hampel's redescending psi function.
} \usage{ mscale(u, na.rm=FALSE) } \arguments{ \item{u}{numeric vector of residuals.} \item{na.rm}{logical. Should missing values be removed?} } \value{numeric constant giving the estimated scale.} \details{ Estimates a scale parameter or standard deviation using an M-estimator with 50\% breakdown. This means the estimator is highly robust to outliers. If the input residuals \code{u} are a normal sample, then \code{mscale(u)} should be equal to the standard deviation. } \author{Gordon Smyth} \references{ Yohai, V. J. (1987). High breakdown point and high efficiency robust estimates for regression. \emph{Ann. Statist.} 15, 642-656. Stromberg, A. J. (1993). Computation of high breakdown nonlinear regression parameters. \emph{J. Amer. Statist. Assoc.} 88, 237-244. Smyth, G. K., and Hawkins, D. M. (2000). Robust frequency estimation using elemental sets. \emph{Journal of Computational and Graphical Statistics} 9, 196-214. } %\seealso{ %\code{\link{rho.hampel}}, \code{\link{psi.hampel}} %} \examples{ u <- rnorm(100) sd(u) mscale(u) } statmod/man/welding.Rd0000644000176200001440000000301412654044513014422 0ustar liggesusers\name{welding} \alias{welding} \title{Data: Tensile Strength of Welds} \description{ This is a highly fractionated two-level factorial design employed as a screening design in an off-line welding experiment performed by the National Railway Corporation of Japan. There were 16 runs and 9 experimental factors. The response variable is the observed tensile strength of the weld, one of several quality characteristics measured. All other variables are at plus and minus levels. } \usage{data(welding)} \format{ A data frame containing the following variables. All the explanatory variables are numeric with two levels, \code{-1} and \code{1}. \tabular{lll}{ \tab \bold{Variable} \tab \bold{Description}\cr \tab Rods \tab Kind of welding rods\cr \tab Drying \tab Period of drying\cr \tab Material \tab Welded material\cr \tab Thickness \tab Thickness\cr \tab Angle \tab Angle\cr \tab Opening \tab Opening\cr \tab Current \tab Current\cr \tab Method \tab Welding method\cr \tab Preheating \tab Preheating\cr \tab Strength \tab Tensile strength of the weld in kg/mm. The response variable.\cr } } \source{ \url{http://www.statsci.org/data/general/welding.html} } \references{ Smyth, G. K., Huele, F., and Verbyla, A. P. (2001). Exact and approximate REML for heteroscedastic regression. \emph{Statistical Modelling} \bold{1}, 161-175. Smyth, G. K. (2002). An efficient algorithm for REML in heteroscedastic regression. \emph{Journal of Computational and Graphical Statistics} \bold{11}, 1-12. } \keyword{datasets} statmod/man/elda.Rd0000644000176200001440000001637412654044513013713 0ustar liggesusers\name{elda} \alias{elda} \alias{limdil} \alias{limdil.class} \alias{eldaOneGroup} \title{Extreme Limiting Dilution Analysis} \description{ Fit single-hit model to a dilution series using complementary log-log binomial regression. } \usage{ elda(response, dose, tested=rep(1,length(response)), group=rep(1,length(response)), observed=FALSE, confidence=0.95, test.unit.slope=FALSE) limdil(response, dose, tested=rep(1,length(response)), group=rep(1,length(response)), observed=FALSE, confidence=0.95, test.unit.slope=FALSE) eldaOneGroup(response, dose, tested, observed=FALSE, confidence=0.95, tol=1e-8, maxit=100, trace=FALSE) } \arguments{ \item{response}{numeric vector giving number of positive cases out of \code{tested} trials. 
Should take non-negative integer values.} \item{dose}{numeric vector of expected number of cells in assay. Values must be positive.} \item{tested}{numeric vector giving number of trials at each dose. Should take integer values.} \item{group}{vector or factor giving group to which the response belongs.} \item{observed}{logical, is the actual number of cells observed?} \item{confidence}{numeric level for confidence interval. Should be strictly between 0 and 1.} \item{test.unit.slope}{logical, should the adequacy of the single-hit model be tested?} \item{tol}{convergence tolerance.} \item{maxit}{maximum number of Newton iterations to perform.} \item{trace}{logical, if \code{TRUE} then interim results are output at each iteration.} } \details{ \code{elda} and \code{limdil} are alternative names for the same function. (\code{limdil} was the older name before the 2009 paper by Hu and Smyth.) \code{eldaOneGroup} is a lower-level function that does the computations when there is just one group, using a globally convergent Newton iteration. It is called by the other functions. These functions implement maximum likelihood analysis of limiting dilution data using methods proposed by Hu and Smyth (2009). The functions gracefully accommodate situations where 0\% or 100\% of the assays give positive results, which is why the method is called "extreme" limiting dilution analysis. The functions provide the ability to test for differences in stem cell frequencies between groups, and to test goodness of fit in a number of ways. The methodology has been applied to the analysis of stem cell assays (Shackleton et al, 2006). The statistical method is explained by Hu and Smyth (2009). A binomial generalized linear model is fitted for each group with cloglog link and offset \code{log(dose)}. If \code{observed=FALSE}, a classic Poisson single-hit model is assumed, and the Poisson frequency of the stem cells is the \code{exp} of the intercept. If \code{observed=TRUE}, the values of \code{dose} are treated as actual cell numbers rather than expected values. This doesn't change the generalized linear model fit, but it does change how the frequencies are extracted from the estimated model coefficient (Hu and Smyth, 2009). The confidence interval is a Wald confidence interval, unless the responses are all negative or all positive, in which case Clopper-Pearson intervals are computed. If \code{group} takes several values, then separate confidence intervals are computed for each group. In this case a likelihood ratio test is conducted for differences in active cell frequencies between the groups. These functions compute a number of different tests of goodness of fit. One test is based on the coefficient for \code{log(dose)} in the generalized linear model. The nominal slope is 1. A slope greater than one suggests a multi-hit model in which two or more cells are synergistically required to produce a positive response. A slope less than 1 suggests some sort of cell interference. Slopes less than 1 can also be due to heterogeneity of the stem cell frequency between assays. \code{elda} conducts likelihood ratio and score tests that the slope is equal to one. Another test is based on the coefficient for \code{dose}. This test is motivated by Gart and Weiss (1967), who suggest that heterogeneity effects are more likely to be linear in \code{dose} than in \code{log(dose)}. These functions conduct score tests that the coefficient for \code{dose} is non-zero. Negative values for this test suggest heterogeneity.
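For illustration only, the one-group fit with \code{observed=FALSE} corresponds roughly to an ordinary binomial regression with complementary log-log link and \code{log(dose)} offset. The sketch below is not the exact computation performed by these functions, which use their own globally convergent Newton iteration and handle the all-negative and all-positive cases specially:
\preformatted{
# Illustrative sketch only, not the internal elda() code
fit <- glm(cbind(response, tested - response) ~ 1,
           family = binomial(link = "cloglog"),
           offset = log(dose))
exp(coef(fit))  # Poisson frequency of active cells (exp of the intercept)
}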
These functions produce objects of class \code{"limdil"}. There are \code{\link[=print.limdil]{print}} and \code{\link[=plot.limdil]{plot}} methods for \code{"limdil"} objects. } \value{ \code{elda} and \code{limdil} produce an object of class \code{"limdil"}. This is a list with the following components: \item{CI}{numeric matrix giving estimated stem cell frequency and lower and upper limits of Wald confidence interval for each group} \item{test.difference}{numeric vector giving chisquare likelihood ratio test statistic and p-value for testing the difference between groups} \item{test.slope.wald}{numeric vector giving wald test statistics and p-value for testing the slope of the offset equal to one} \item{test.slope.lr}{numeric vector giving chisquare likelihood ratio test statistics and p-value for testing the slope of the offset equal to one} \item{test.slope.score.logdose}{numeric vector giving score test statistics and p-value for testing multi-hit alternatives} \item{test.slope.score.dose}{numeric vector giving score test statistics and p-value for testing heterogeneity} \item{response}{numeric of integer counts of positive cases, out of \code{tested} trials} \item{tested}{numeric vector giving number of trials at each dose} \item{dose}{numeric vector of expected number of cells in assay} \item{group}{vector or factor giving group to which the response belongs} \item{num.group}{number of groups} } \author{Yifang Hu and Gordon Smyth} \references{ Hu, Y, and Smyth, GK (2009). ELDA: Extreme limiting dilution analysis for comparing depleted and enriched populations in stem cell and other assays. \emph{Journal of Immunological Methods} 347, 70-78. \url{http://dx.doi.org/10.1016/j.jim.2009.06.008} \url{http://www.statsci.org/smyth/pubs/ELDAPreprint.pdf} Shackleton, M., Vaillant, F., Simpson, K. J., Stingl, J., Smyth, G. K., Asselin-Labat, M.-L., Wu, L., Lindeman, G. J., and Visvader, J. E. (2006). Generation of a functional mammary gland from a single stem cell. \emph{Nature} 439, 84-88. \url{http://www.nature.com/nature/journal/v439/n7072/abs/nature04372.html} Gart, JJ, and Weiss, GH (1967). Graphically oriented tests for host variability in dilution experiments. \emph{Biometrics} 23, 269-284. } \seealso{ \code{\link{plot.limdil}} and \code{\link{print.limdil}} are methods for \code{limdil} class objects. A webpage interface to this function is available at \url{http://bioinf.wehi.edu.au/software/elda}. } \examples{ # When there is one group Dose <- c(50,100,200,400,800) Responses <- c(2,6,9,15,21) Tested <- c(24,24,24,24,24) out <- elda(Responses,Dose,Tested,test.unit.slope=TRUE) out plot(out) # When there are four groups Dose <- c(30000,20000,4000,500,30000,20000,4000,500,30000,20000,4000,500,30000,20000,4000,500) Responses <- c(2,3,2,1,6,5,6,1,2,3,4,2,6,6,6,1) Tested <- c(6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6) Group <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4) elda(Responses,Dose,Tested,Group,test.unit.slope=TRUE) } \keyword{regression} statmod/man/tweedie.Rd0000644000176200001440000000650312654044513014425 0ustar liggesusers\name{tweedie} \alias{tweedie} \title{Tweedie Generalized Linear Models} \description{ Produces a generalized linear model family object with any power variance function and any power link. Includes the Gaussian, Poisson, gamma and inverse-Gaussian families as special cases. } \usage{ tweedie(var.power=0, link.power=1-var.power) } \arguments{ \item{var.power}{index of power variance function} \item{link.power}{index of power link function. 
\code{link.power=0} produces a log-link. Defaults to the canonical link, which is \code{1-var.power}.} } \value{ A family object, which is a list of functions and expressions used by glm and gam in their iteratively reweighted least-squares algorithms. See \code{family} and \code{glm} in the R base help for details. } \details{ This function provides access to a range of generalized linear model response distributions which are not otherwise provided by R, or any other package for that matter. It is also useful for accessing distribution/link combinations which are disallowed by the R \code{glm} function. Let \eqn{\mu_i = E(y_i)} be the expectation of the \eqn{i}th response. We assume that \deqn{\mu_i^q = x_i^Tb, var(y_i) = \phi \mu_i^p} where \eqn{x_i} is a vector of covariates and b is a vector of regression cofficients, for some \eqn{\phi}, \eqn{p} and \eqn{q}. This family is specified by \code{var.power = p} and \code{link.power = q}. A value of zero for \eqn{q} is interpreted as \eqn{\log(\mu_i) = x_i^Tb}. The variance power \eqn{p} characterizes the distribution of the responses \eqn{y}. The following are some special cases: \tabular{cl}{ \bold{p} \tab \bold{Response distribution}\cr 0 \tab Normal\cr 1 \tab Poisson\cr (1, 2) \tab Compound Poisson, non-negative with mass at zero\cr 2 \tab Gamma\cr 3 \tab Inverse-Gaussian\cr > 2 \tab Stable, with support on the positive reals } The name Tweedie has been associated with this family by Joergensen (1987) in honour of M. C. K. Tweedie. } \references{ Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families. In \emph{Statistics: Applications and New Directions}. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference. (Eds. J. K. Ghosh and J. Roy), pp. 579-604. Calcutta: Indian Statistical Institute. Joergensen, B. (1987). Exponential dispersion models. \emph{J. R. Statist. Soc.} B \bold{49}, 127-162. Smyth, G. K. (1996). Regression modelling of quantity data with exact zeroes. Proceedings of the Second Australia-Japan Workshop on Stochastic Models in Engineering, Technology and Management. Technology Management Centre, University of Queensland, pp. 572-580. Joergensen, B. (1997). \emph{Theory of Dispersion Models}, Chapman and Hall, London. Smyth, G. K., and Verbyla, A. P., (1999). Adjusted likelihood methods for modelling dispersion in generalized linear models. \emph{Environmetrics} \bold{10}, 695-709. } \author{Gordon Smyth} \seealso{\code{\link{glm}}, \code{\link{family}}, \code{\link[tweedie]{dtweedie}}} \examples{ y <- rgamma(20,shape=5) x <- 1:20 # Fit a poisson generalized linear model with identity link glm(y~x,family=tweedie(var.power=1,link.power=1)) # Fit an inverse-Gaussion glm with log-link glm(y~x,family=tweedie(var.power=3,link.power=0)) } \keyword{regression} statmod/man/sage.test.Rd0000644000176200001440000000575112654044513014700 0ustar liggesusers\name{sage.test} \alias{sage.test} \title{Exact Binomial Tests For Comparing Two SAGE Libraries (Obsolete)} \description{ This function is kept here so as not to break code that depends on it, but has been replaced by \code{binomTest} in the edgeR Bioconductor package and is no longer updated. It may be removed in a later release of this package. Computes p-values for differential abundance for each tag between two digital libraries, conditioning on the total count for each tag. The counts in each group as a proportion of the whole are assumed to follow a binomial distribution. 
} \usage{ sage.test(x, y, n1=sum(x), n2=sum(y)) } \arguments{ \item{x}{integer vector giving counts in first library. Non-integer values are rounded to the nearest integer.} \item{y}{integer vector giving counts in second library. Non-integer values are rounded to the nearest integer.} \item{n1}{total number of tags in first library. Non-integer values are rounded to the nearest integer.} \item{n2}{total number of tags in second library. Non-integer values are rounded to the nearest integer.} } \details{ This function was originally written for comparing SAGE libraries (a method for counting the frequency of sequence tags in samples of RNA). It can however be used for comparing any two digital libraries from RNA-Seq, ChIP-Seq or other technologies with respect to technical variation. An exact two-sided binomial test is computed for each tag. This test is closely related to Fisher's exact test for 2x2 contingency tables but, unlike Fisher's test, it conditions on the total number of counts for each tag. The null hypothesis is that the expected counts are in the same proportions as the library sizes, i.e., that the binomial probability for the first library is \code{n1/(n1+n2)}. The two-sided rejection region is chosen analogously to Fisher's test. Specifically, the rejection region consists of those values with smallest probabilities under the null hypothesis. When the counts are reasonably large, the binomial test, Fisher's test and Pearson's chisquare all give the same results. When the counts are smaller, the binomial test is usually to be preferred in this context. This function is a later version of the earlier \code{sage.test} function in the sagenhaft Bioconductor package. This function has been made obsolete by \code{binomTest} in the edgeR package. } \value{ Numeric vector of p-values. } \references{ \url{http://en.wikipedia.org/wiki/Binomial_test} \url{http://en.wikipedia.org/wiki/Fisher's_exact_test} \url{http://en.wikipedia.org/wiki/Serial_analysis_of_gene_expression} http://en.wikipedia.org/wiki/RNA-Seq } \author{Gordon Smyth} \seealso{ \code{binomTest} (edgeR package), \code{\link{binom.test}} (stats package) } \examples{ sage.test(c(0,5,10),c(0,30,50),n1=10000,n2=15000) # Univariate equivalents: binom.test(5,5+30,p=10000/(10000+15000))$p.value binom.test(10,10+50,p=10000/(10000+15000))$p.value } \keyword{htest} statmod/man/hommel.test.Rd0000644000176200001440000000223112654044513015230 0ustar liggesusers\name{hommel.test} \alias{hommel.test} \title{Test Multiple Comparisons Using Hommel's Method} \description{Given a set of p-values and a test level, returns vector of test results for each hypothesis.} \usage{hommel.test(p, alpha=0.05) } \arguments{ \item{p}{numeric vector of p-values} \item{alpha}{numeric value, desired significance level} } \value{logical vector indicating whether each hypothesis is accepted} \details{ This function implements the multiple testing procedure of Hommel (1988). Hommel's method is also implemented as an adjusted p-value method in the function \code{p.adjust} but the accept/reject approach used here is faster. } \references{ Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. \emph{Biometrika}, \bold{75}, 383-386. Shaffer, J. P. (1995). Multiple hypothesis testing. \emph{Annual Review of Psychology} \bold{46}, 561-576. (An excellent review of the area.) 
} \author{Gordon Smyth} \seealso{ \code{\link[stats:p.adjust]{p.adjust}} } \examples{ p <- sort(runif(100))[1:10] cbind(p,p.adjust(p,"hommel"),hommel.test(p)) } \keyword{htest} statmod/man/growthcurve.Rd0000644000176200001440000000667412654044513015363 0ustar liggesusers\name{growthcurve} \alias{compareGrowthCurves} \alias{compareTwoGrowthCurves} \alias{plotGrowthCurves} \title{Compare Groups of Growth Curves} \description{ Do all pairwise comparisons between groups of growth curves using a permutation test. } \usage{ compareGrowthCurves(group, y, levels=NULL, nsim=100, fun=meanT, times=NULL, verbose=TRUE, adjust="holm") compareTwoGrowthCurves(group, y, nsim=100, fun=meanT) plotGrowthCurves(group, y, levels=sort(unique(group)), times=NULL, col=NULL,...) } \arguments{ \item{group}{vector or factor indicating group membership. Missing values are allowed in \code{compareGrowthCurves} but not in \code{compareTwoGrowthCurves}.} \item{y}{matrix of response values with rows for individuals and columns for times. The number of rows must agree with the length of \code{group}. Missing values are allowed.} \item{levels}{a character vector containing the identifiers of the groups to be compared. By default all groups with two or more members will be compared.} \item{nsim}{number of permutations used to estimate p-values.} \item{fun}{a function defining the statistic used to measure the distance between two groups of growth curves. Defaults to \code{\link{meanT}}.} \item{times}{a numeric vector containing the column numbers on which the groups should be compared. By default all the columns are used.} \item{verbose}{should progress results be printed?} \item{adjust}{method used to adjust for multiple testing, see \code{p.adjust}.} \item{col}{vector of colors corresponding to distinct groups} \item{...}{other arguments passed to \code{plot()}} } \details{ \code{compareTwoGrowthCurves} performs a permutation test of the difference between two groups of growth curves. \code{compareGrowthCurves} does all pairwise comparisons between two or more groups of growth curves. Accurate p-values can be obtained by setting \code{nsim} to some large value, \code{nsim=10000} say. } \value{ \code{compareTwoGrowthCurves} returns a list with two components, \code{stat} and \code{p.value}, containing the observed statistics and the estimated p-value. \code{compareGrowthCurves} returns a data frame with components \item{Group1}{name of first group in a comparison} \item{Group2}{name of second group in a comparison} \item{Stat}{observed value of the statistic} \item{P.Value}{estimated p-value} \item{adj.P.Value}{p-value adjusted for multiple testing} } \author{Gordon Smyth} \references{ Elso, C. M., Roberts, L. J., Smyth, G. K., Thomson, R. J., Baldwin, T. M., Foote, S. J., and Handman, E. (2004). Leishmaniasis host response loci (lmr13) modify disease severity through a Th1/Th2-independent pathway. \emph{Genes and Immunity} 5, 93-100. \url{http://www.nature.com/gene/journal/v5/n2/full/6364042a.html} Baldwin, T., Sakthianandeswaren, A., Curtis, J., Kumar, B., Smyth, G. K., Foote, S., and Handman, E. (2007). Wound healing response is a major contributor to the severity of cutaneous leishmaniasis in the ear model of infection. \emph{Parasite Immunology} 29, 501-513.
\url{http://www.blackwell-synergy.com/doi/abs/10.1111/j.1365-3024.2007.00969.x} } \seealso{ \code{\link{meanT}}, \code{\link{compareGrowthCurves}}, \code{\link{compareTwoGrowthCurves}} } \examples{ # A example with only one time data(PlantGrowth) compareGrowthCurves(PlantGrowth$group,as.matrix(PlantGrowth$weight)) # Can make p-values more accurate by nsim=10000 } \keyword{regression} statmod/man/gauss.quad.prob.Rd0000644000176200001440000000522312654044513016011 0ustar liggesusers\name{gauss.quad.prob} \alias{gauss.quad.prob} \title{Gaussian Quadrature with Probability Distributions} \description{Calculate nodes and weights for Gaussian quadrature in terms of probability distributions.} \usage{gauss.quad.prob(n,dist="uniform",l=0,u=1,mu=0,sigma=1,alpha=1,beta=1)} \arguments{ \item{n}{number of nodes and weights} \item{dist}{distribution that Gaussian quadrature is based on, one of \code{"uniform"}, \code{"normal"}, \code{"beta"} or \code{"gamma"}} \item{l}{lower limit of uniform distribution} \item{u}{upper limit of uniform distribution} \item{mu}{mean of normal distribution} \item{sigma}{standard deviation of normal distribution} \item{alpha}{positive shape parameter for gamma distribution or first shape parameter for beta distribution} \item{beta}{positive scale parameter for gamma distribution or second shape parameter for beta distribution} } \value{A list containing the components \item{nodes}{vector of values at which to evaluate the function} \item{weights}{vector of weights to give the function values} } \details{ This is a rewriting and simplification of \code{gauss.quad} in terms of probability distributions. The probability interpretation is explained by Smyth (1998). For details on the underlying quadrature rules, see \code{\link[statmod:gauss.quad]{gauss.quad}}. The expected value of \code{f(X)} is approximated by \code{sum(w*f(x))} where \code{x} is the vector of nodes and \code{w} is the vector of weights. The approximation is exact if \code{f(x)} is a polynomial of order no more than \code{2n-1}. The possible choices for the distribution of \code{X} are as follows: Uniform on \code{(l,u)}. Normal with mean \code{mu} and standard deviation \code{sigma}. Beta with density \code{x^(alpha-1)*(1-x)^(beta-1)/B(alpha,beta)} on \code{(0,1)}. Gamma with density \code{x^(alpha-1)*exp(-x/beta)/beta^alpha/gamma(alpha)}. } \references{ Smyth, G. K. (1998). Polynomial approximation. In: \emph{Encyclopedia of Biostatistics}, P. Armitage and T. Colton (eds.), Wiley, London, pages 3425-3429. \url{http://www.statsci.org/smyth/pubs/PolyApprox-Preprint.pdf} } \author{Gordon Smyth, using Netlib Fortran code \url{http://www.netlib.org/go/gaussq.f}, and including corrections suggested by Spencer Graves} \seealso{ \code{\link{gauss.quad}}, \code{\link{integrate}} } \examples{ # the 4th moment of the standard normal is 3 out <- gauss.quad.prob(10,"normal") sum(out$weights * out$nodes^4) # the expected value of log(X) where X is gamma is digamma(alpha) out <- gauss.quad.prob(32,"gamma",alpha=5) sum(out$weights * log(out$nodes)) } \keyword{math} statmod/man/meanT.Rd0000644000176200001440000000226512654044513014044 0ustar liggesusers\name{meanT} \alias{meanT} \title{Mean t-Statistic Between Two Groups of Growth Curves} \description{ The mean-t statistic of the distance between two groups of growth curves. } \usage{ meanT(y1, y2) } \arguments{ \item{y1}{matrix of response values for the first group, with a row for each individual and a column for each time. 
Missing values are allowed.} \item{y2}{matrix of response values for the second group. Must have the same number of columns as \code{y1}. Missing values are allowed.} } \details{ This function computes the pooled two-sample t-statistic between the response values at each time, and returns the mean of these values weighted by the degrees of freedom. This function is used by \code{compareGrowthCurves}. } \value{numeric vector of length one containing the mean t-statistic.} \author{Gordon Smyth} \seealso{ \code{\link{compareGrowthCurves}}, \code{\link{compareTwoGrowthCurves}} } \examples{ y1 <- matrix(rnorm(4*3),4,3) y2 <- matrix(rnorm(4*3),4,3) meanT(y1,y2) data(PlantGrowth) compareGrowthCurves(PlantGrowth$group,as.matrix(PlantGrowth$weight)) # Can make p-values more accurate by nsim=10000 } \keyword{regression} statmod/man/glmscoretest.Rd0000644000176200001440000000437412654044513015514 0ustar liggesusers\name{glm.scoretest} \alias{glm.scoretest} \title{Score Test for Adding a Covariate to a GLM} \description{ Computes score test statistics (z-statistics) for adding covariates to a generalized linear model. } \usage{ glm.scoretest(fit, x2, dispersion=NULL) } \arguments{ \item{fit}{generalized linear model fit object, of class \code{\link{glm}}.} \item{x2}{vector or matrix with each column a covariate to be added.} \item{dispersion}{the dispersion for the generalized linear model family.} } \details{ This function computes Rao's score test statistic for adding a covariate to a generalized linear model. The score test is the locally most powerful test against a one-sided alternative. It is asymptotically equivalent to likelihood ratio tests, but is convenient for one-sided tests. This function computes a score test statistic for adding each covariate individually. The dispersion parameter is treated as for \code{\link{summary.glm}}. If \code{NULL}, the Pearson estimator is used, except for the binomial and Poisson families, for which the dispersion is one. } \value{numeric vector containing the z-statistics, one for each covariate.} \author{Gordon Smyth} \seealso{ \code{\link{glm}}, \code{\link{add1}} } \references{ Lovison, G (2005). On Rao score and Pearson \eqn{X^2} statistics in generalized linear models. \emph{Statistical Papers}, 46, 555-574. Pregibon, D (1982). Score tests in GLIM with applications. In \emph{GLIM82: Proceedings of the International Conference on Generalized Linear Models}, R Gilchrist (ed.), Lecture Notes in Statistics, Volume 14, Springer, New York, pages 87-97. Smyth, G. K. (2003). Pearson's goodness of fit statistic as a score test statistic. In: \emph{Science and Statistics: A Festschrift for Terry Speed}, D. R. Goldstein (ed.), IMS Lecture Notes - Monograph Series, Volume 40, Institute of Mathematical Statistics, Beachwood, Ohio, pages 115-126. \url{http://www.statsci.org/smyth/pubs/goodness.pdf} } \examples{ # Pearson's chisquare test for independence # in a contingency table is a score test.
# First the usual test y <- c(20,40,40,30) chisq.test(matrix(y,2,2),correct=FALSE) # Now same test using glm.scoretest a <- gl(2,1,4) b <- gl(2,2,4) fit <- glm(y~a+b,family=poisson) x2 <- c(0,0,0,1) z <- glm.scoretest(fit,x2) z^2 } \keyword{regression} statmod/man/gauss.quad.Rd0000644000176200001440000000560012654044513015047 0ustar liggesusers\name{gauss.quad} \alias{gauss.quad} \title{Gaussian Quadrature} \description{Calculate nodes and weights for Gaussian quadrature.} \usage{gauss.quad(n,kind="legendre",alpha=0,beta=0)} \arguments{ \item{n}{number of nodes and weights} \item{kind}{kind of Gaussian quadrature, one of \code{"legendre"}, \code{"chebyshev1"}, \code{"chebyshev2"}, \code{"hermite"}, \code{"jacobi"} or \code{"laguerre"}} \item{alpha}{parameter for Jacobi or Laguerre quadrature, must be greater than -1} \item{beta}{parameter for Jacobi quadrature, must be greater than -1} } \value{A list containing the components \item{nodes}{vector of values at which to evaluate the function} \item{weights}{vector of weights to give the function values} } \details{ The integral from \code{a} to \code{b} of \code{w(x)*f(x)} is approximated by \code{sum(w*f(x))} where \code{x} is the vector of nodes and \code{w} is the vector of weights. The approximation is exact if \code{f(x)} is a polynomial of order no more than \code{2n-1}. The possible choices for \code{w(x)}, \code{a} and \code{b} are as follows: Legendre quadrature: \code{w(x)=1} on \code{(-1,1)}. Chebyshev quadrature of the 1st kind: \code{w(x)=1/sqrt(1-x^2)} on \code{(-1,1)}. Chebyshev quadrature of the 2nd kind: \code{w(x)=sqrt(1-x^2)} on \code{(-1,1)}. Hermite quadrature: \code{w(x)=exp(-x^2)} on \code{(-Inf,Inf)}. Jacobi quadrature: \code{w(x)=(1-x)^alpha*(1+x)^beta} on \code{(-1,1)}. Note that Chebyshev quadrature is a special case of this. Laguerre quadrature: \code{w(x)=x^alpha*exp(-x)} on \code{(0,Inf)}. The algorithm used to generated the nodes and weights is explained in Golub and Welsch (1969). } \references{ Golub, G. H., and Welsch, J. H. (1969). Calculation of Gaussian quadrature rules. \emph{Mathematics of Computation} \bold{23}, 221-230. Golub, G. H. (1973). Some modified matrix eigenvalue problems. \emph{Siam Review} \bold{15}, 318-334. Smyth, G. K. (1998). Numerical integration. In: \emph{Encyclopedia of Biostatistics}, P. Armitage and T. Colton (eds.), Wiley, London, pages 3088-3095. \url{http://www.statsci.org/smyth/pubs/NumericalIntegration-Preprint.pdf} Smyth, G. K. (1998). Polynomial approximation. In: \emph{Encyclopedia of Biostatistics}, P. Armitage and T. Colton (eds.), Wiley, London, pages 3425-3429. \url{http://www.statsci.org/smyth/pubs/PolyApprox-Preprint.pdf} Stroud, AH, and Secrest, D (1966). \emph{Gaussian Quadrature Formulas}. Prentice-Hall, Englewood Cliffs, N.J. 
} \author{Gordon Smyth, using Netlib Fortran code \url{http://www.netlib.org/go/gaussq.f}, and including a suggestion from Stephane Laurent} \seealso{ \code{\link[statmod:gauss.quad.prob]{gauss.quad.prob}}, \code{\link{integrate}} } \examples{ # mean of gamma distribution with alpha=6 out <- gauss.quad(10,"laguerre",alpha=5) sum(out$weights * out$nodes) / gamma(6) } \keyword{math} statmod/man/qresiduals.Rd0000644000176200001440000000543412654044513015155 0ustar liggesusers\name{qresiduals} \alias{qresiduals} \alias{qresid} \alias{qres.binom} \alias{qres.pois} \alias{qres.nbinom} \alias{qres.gamma} \alias{qres.invgauss} \alias{qres.tweedie} \alias{qres.default} \title{Randomized Quantile Residuals} \description{ Compute randomized quantile residuals for generalized linear models.} \usage{ qresiduals(glm.obj,dispersion=NULL) qresid(glm.obj,dispersion=NULL) qres.binom(glm.obj) qres.pois(glm.obj) qres.nbinom(glm.obj) qres.gamma(glm.obj,dispersion=NULL) qres.invgauss(glm.obj,dispersion=NULL) qres.tweedie(glm.obj,dispersion=NULL) qres.default(glm.obj,dispersion=NULL) } \arguments{ \item{glm.obj}{Object of class \code{glm}. The generalized linear model family is assumed to be binomial for \code{qres.binom}, poisson for \code{qres.pois}, negative binomial for \code{qres.nbinom}, Gamma for \code{qres.gamma}, inverse Gaussian for \code{qres.invgauss} or tweedie for \code{qres.tweedie}.} \item{dispersion}{a positive real number. Specifies the value of the dispersion parameter for a Gamma or inverse Gaussian generalized linear model if known. If \code{NULL}, the dispersion will be estimated by its Pearson estimator.} } \value{Numeric vector of standard normal quantile residuals.} \details{ Quantile residuals are based on the idea of inverting the estimated distribution function for each observation to obtain exactly standard normal residuals. In the case of discrete distributions, such as the binomial and Poisson, some randomization is introduced to produce continuous normal residuals. Quantile residuals are the residuals of choice for generalized linear models in large dispersion situations when the deviance and Pearson residuals can be grossly non-normal. Quantile residuals are the only useful residuals for binomial or Poisson data when the response takes on only a small number of distinct values. } \references{ Dunn, K. P., and Smyth, G. K. (1996). Randomized quantile residuals. \emph{Journal of Computational and Graphical Statistics} \bold{5}, 1-10. \url{http://www.statsci.org/smyth/pubs/residual.html} } \author{Gordon Smyth} \seealso{ \code{\link{residuals.glm}} } \examples{ # Poisson example: quantile residuals show no granularity y <- rpois(20,lambda=4) x <- 1:20 fit <- glm(y~x, family=poisson) qr <- qresiduals(fit) qqnorm(qr) abline(0,1) # Gamma example: # Quantile residuals are nearly normal while usual resids are not y <- rchisq(20, df=1) fit <- glm(y~1, family=Gamma) qr <- qresiduals(fit, dispersion=2) qqnorm(qr) abline(0,1) # Negative binomial example: if(require("MASS")) { fit <- glm(Days~Age,family=negative.binomial(2),data=quine) summary(qresiduals(fit)) fit <- glm.nb(Days~Age,link=log,data = quine) summary(qresiduals(fit)) } } \keyword{regression} statmod/man/remlscor.Rd0000644000176200001440000000447112654044513014627 0ustar liggesusers\name{remlscore} \alias{remlscore} \title{REML for Heteroscedastic Regression} \description{ Fits a heteroscedastic regression model using residual maximum likelihood (REML). 
} \usage{ remlscore(y, X, Z, trace=FALSE, tol=1e-5, maxit=40) } \arguments{ \item{y}{numeric vector of responses} \item{X}{design matrix for predicting the mean} \item{Z}{design matrix for predicting the variance} \item{trace}{Logical variable. If true then output diagnostic information at each iteration.} \item{tol}{Convergence tolerance} \item{maxit}{Maximum number of iterations allowed} } \value{ List with the following components: \item{beta}{vector of regression coefficients for predicting the mean} \item{se.beta}{vector of standard errors for beta} \item{gamma}{vector of regression coefficients for predicting the variance} \item{se.gam}{vector of standard errors for gamma} \item{mu}{estimated means} \item{phi}{estimated variances} \item{deviance}{minus twice the REML log-likelihood} \item{h}{numeric vector of leverages} \item{cov.beta}{estimated covariance matrix for beta} \item{cov.gam}{estimated covariance matrix for gamma} \item{iter}{number of iterations used} } \details{ Write \eqn{\mu_i=E(y_i)} for the expectation of the \eqn{i}th response and \eqn{\sigma^2_i=var(y_i)} for its variance. We assume the heteroscedastic regression model \deqn{\mu_i=\bold{x}_i^T\bold{\beta}} \deqn{\log(\sigma^2_i)=\bold{z}_i^T\bold{\gamma},} where \eqn{\bold{x}_i} and \eqn{\bold{z}_i} are vectors of covariates, and \eqn{\bold{\beta}} and \eqn{\bold{\gamma}} are vectors of regression coefficients affecting the mean and variance respectively. Parameters are estimated by maximizing the REML likelihood using REML scoring as described in Smyth (2002). } \references{ Smyth, G. K. (2002). An efficient algorithm for REML in heteroscedastic regression. \emph{Journal of Computational and Graphical Statistics} \bold{11}, 836-847. } \author{Gordon Smyth} \examples{ data(welding) attach(welding) y <- Strength # Reproduce results from Table 1 of Smyth (2002) X <- cbind(1,(Drying+1)/2,(Material+1)/2) colnames(X) <- c("1","B","C") Z <- cbind(1,(Material+1)/2,(Method+1)/2,(Preheating+1)/2) colnames(Z) <- c("1","C","H","I") out <- remlscore(y,X,Z) cbind(Estimate=out$gamma,SE=out$se.gam) } \keyword{regression} statmod/man/logmdigamma.Rd0000644000176200001440000000145712654044513015260 0ustar liggesusers\name{logmdigamma} \alias{logmdigamma} \title{Log Minus Digamma Function} \description{ The difference between the \code{log} and \code{digamma} functions. } \usage{ logmdigamma(x) } \arguments{ \item{x}{numeric vector or array of positive values. Negative or zero values will return \code{NA}.} } \details{ \code{digamma(x)} is asymptotically equivalent to \code{log(x)}. \code{logmdigamma(x)} computes \code{log(x) - digamma(x)} without subtractive cancellation for large \code{x}. } \author{Gordon Smyth} \references{ Abramowitz, M., and Stegun, I. A. (1970). \emph{Handbook of mathematical functions.} Dover, New York. } \seealso{ \code{\link{digamma}} } \examples{ log(10^15) - digamma(10^15) # returns 0 logmdigamma(10^15) # returns value correct to 15 figures } \keyword{math} statmod/man/remlscorgamma.Rd0000644000176200001440000000567712654044513015637 0ustar liggesusers\name{remlscoregamma} \alias{remlscoregamma} \title{Approximate REML for gamma regression with structured dispersion} \description{ Estimates structured dispersion effects using approximate REML with gamma responses.
} \usage{ remlscoregamma(y,X,Z,mlink="log",dlink="log",trace=FALSE,tol=1e-5,maxit=40) } \arguments{ \item{y}{numeric vector of responses} \item{X}{design matrix for predicting the mean} \item{Z}{design matrix for predicting the variance} \item{mlink}{character string or numeric value specifying link for mean model} \item{dlink}{character string or numeric value specifying link for dispersion model} \item{trace}{Logical variable. If true then output diagnostic information at each iteration.} \item{tol}{Convergence tolerance} \item{maxit}{Maximum number of iterations allowed} } \value{ List with the following components: \item{beta}{Vector of regression coefficients for predicting the mean} \item{se.beta}{