robustbase/TODO
 -*- org -*-   C-c C-o follows link   [MM: grep-r -e '\(FIXME\|TODO\)']
* Before next release
** DONE print() and print(summary()): 'method' almost twice; show iBest for "deterministic"
** DONE covMcd(): allow 'hsets.ini' argument to covMcd(), and *return* them (optionally) as 'Hsubsets'
** DONE export rankMM, classSVD, .signflip   [MM: repmat should not be needed]
** DONE 'scalefn' argument for covMcd() to be used for "detMCD"
** DONE adjOutlyingness():
*** DONE change defaults for clower & cupper (e-mail from P. Segaert).
*** DONE But there is more: the +/- swap ==> results are not back compatible
** DONE colMedians() -> ask Henrik about "License: Artistic-2.0"
** TODO nlrob()
*** TODO summary(nlrob(*)) fails for the new methods; give a better error message or make it *work*
**** TODO for "MM" we are close to done; ideally we want '$ rweights' (robustness weights) for all methods
*** TODO residuals(nlrob(), type = "...") should provide the types "as in the literature"
*** DONE nlrob(*, method = "...") should call methods "tau", "CM", "MTL", "MM" by Eduardo Conceicao
**** DONE shouldn't we rename jde() to jdeopt() or even jdeoptim(), jDEoptim(), or JDEoptim()?
     R users already know optim() etc., so such a name seems more logical for them.
* Short Term
** TODO estimethod(): also for lmrob() and glmrob() models
** TODO VT: implement .detMcd() in C
** TODO r6pack(milk, ..): *return* the singularity message but do not signal an error
** TODO nlrob(*):
*** TODO for the "vector parameter" biomass example in tests/nlrob-tst.R: method = "MM"
    As we do want the formula to work ==> we *must* allow 'lower' & 'upper' as list()s.
    In R/nlregrob.R, we have 14 matches for "eval *( *formula\[\[3L?
((and *org* shows the `[[3L].]` (no ".") as underscored 3L)) :
      123:  y.hat <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
      127:  y.hat <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
      141:  y.hat <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
      175:  res   <- y - eval( formula[[3L]], c(data, initial$par) )
      193:  fit   <- eval( formula[[3L]], c(data, coef) )
      254:  fit   <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
      300:  fit   <- eval( formula[[3L]], c(data, coef) )
      355:  fit   <- eval( formula[[3L]], c(data, par) )
      361:  fit   <- eval( formula[[3L]], c(data, par) )
      366:  fit   <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
      390:  fit   <- eval( formula[[3L]], c(data, coef) )
      434:  fit   <- eval( formula[[3L]], c(data, par) )
      442:  fit   <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
      468:  fit   <- eval( formula[[3L]], c(data, coef) )
    These are the same as in R/nlrob.R, where we had eval(.., c(data, coef)) but now eval(.., c(data, start)).
*** TODO nlrob(*, method=.) -- try at least *one* optimizer other than JDEoptim(), since we already provide most of the needed "hooks".
*** DONE nlrob(*, method="M"): allow a "fixed initial sigma" (--> inference for "MM")
*** TODO confint.nlrob(): "Wald" works; naming follows confint.lmer() from lme4
**** TODO method = "profile" based on (TODO) profile.nlrob()
**** DONE method = "Wald" works
** TODO simulate(): implement for "nlrob" {and lmrob()? and ..}
** glmrob
*** BYlogreg() [ R/BYlogreg.R ]
    --> more tests in ./tests/glmrob-1.R
    --> glm.fit() instead of glm()
    --> vcov() instead of just std.err. {is already there}
*** glmrob(*, weights.on.x = "robCov") uses MASS::cov.rob(), i.e. "MVE", and Andreas had a comment that "mcd" is worse.  "covMcd" has been available for a while; it is now used via robXweights() in ./R/glmrobMqle.R.
    HOWEVER: we need something better when 'X' has (binary!) factors!
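The eval(formula[[3L]], ...) idiom listed above -- evaluating the right-hand side of the model formula in an environment built from the data plus a named parameter vector -- can be illustrated with a minimal base-R sketch (the logistic-growth model and all names here are made up for illustration, not taken from robustbase's code):

```r
## A made-up nonlinear model; formula[[3L]] is the RHS of the formula.
formula <- y ~ Asym / (1 + exp((xmid - x) / scal))
data    <- list(x = c(1, 2, 3, 4), y = c(0.3, 0.9, 1.6, 1.9))
pnames  <- c("Asym", "xmid", "scal")   # parameter names
par     <- c(2, 2, 1)                  # current parameter values
## Evaluate the RHS with the data and the named parameters combined:
y.hat <- eval(formula[[3L]], c(data, setNames(par, pnames)))
```

This is why 'lower' and 'upper' as list()s matter: the combined c(data, setNames(par, pnames)) must supply one binding per parameter name appearing in the formula.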
"hat" +- works, but needs more work *** We now allow weights.on.x to be an arbitrary general wts(X, intercept) function _or_ a list containing a robMcd()-like function. Definitely need *testing* this situation! *** glmrob(): anova() has three variants: "V1", "Eva1", "Andreas1" --> ./R/glmrobMqle-DQD.R - gives warning every time {-> easy to fix} - Default is "V1" is that a good idea? *** glmrob() needs a bit more tests in ./tests/ [also consider those from man/glmrob.Rd] take those from Martin's old 'robGLM1' package (need more!) *** --> first test already shows that Martin's tests for "huberC == Inf" were *not* yet moved from robGLM1 to glmrob()... (in other words: glmrob() should work *** also, ni = 0 does not work quite as it should ( ./tests/binom-ni-small.R ) *** obj $ df ... maybe should be defined -- for "glm" methods to be applicable --> e.g. for predict(, interval="..") ! *** summary.glmrob() should be better documented; we should decide if the current return value is fine. *** Eva's code (and MM's) also computed & returned the "asymptotic efficiency"! *** anova.glmrob(): More modularization, allowing to provide own 'test' function. Test if Huber's C are different. Need theory to compare different C's and same model (which includes classical vs robust). *** add1() and/or drop1() would be nice ** TODO scaleTau2(): Also do a cheap finite-sample correction [MM] ! [DONE partly; but undocumented, since bound to change --> file:~/R/MM/STATISTICS/robust/1d-scale.R , 1d-scale-sim.R, etc --- unfinished!! 
** TODO Psi/Rho/Chi/Wgt Functions
   We have quite a few "partial" collections of rho/psi functions; some are "sync"ed now, some not yet:
*** TODO 1) have the nice S4 class psi_func + psiFunc() and .defDwgt() functions in file:R/psi-rho-funs.R, with further explorations and ideas in file:misc/experi-psi-rho-funs.R
**** TODO print/show of such a psi_func should show more; at least the psi function
**** TODO str.psi_func() should be a bit nicer than the current default str()
**** TODO nlrob(): also allow psi to be a 'psiFunc' --> ./R/nlrob.R ; consider even more *real* checks; now in tests/nlrob-tst.R
*** DONE 2) deprecated: the "old" tukeyChi() & tukeyPsi1(), originally called from lmrob(), in ./R/biweight-funs.R
*** DONE 3) psi.*(...., rho = FALSE/TRUE) functions from Andreas ([[file:.R/psi-funs-AR.R]]) replaced by using the new psi_func objects
**** DONE nlrob() changed: uses psi = .Mwgt.psi1("huber", cc=1.345) as default
*** TODO 4) have (C-based) functions Mpsi(), Mchi(), Mwgt(), etc., used from lmrob(), in ./R/lmrob.MM.R
**** TODO provide Mpsi(psi = "t") etc.; tuning parameter 'nu' => MLE-t_\nu psi and psi'
**** TODO provide '1)'-style, i.e. psi_func, versions of Mpsi() etc.
**** TODO Mpsi(*, "GGW") etc. have no (??) easy way to directly specify the (a,b,c) tuning parameters
**** TODO a *new* Mwgt(*, deriv=1) would correspond to Dwgt in psiFunc(), which Manuel needs
**** DONE now exported and documented in man/M.psi.Rd
     Further files, illustrating features, differences, etc.:
     ./vignettes/psi_functions.Rnw   -- with quite a few FIXME
     ./inst/xtraR/plot-psiFun.R      chkPsiDeriv() {and plot utils}
     ./tests/psi-rho-etc.R           compute asymptotic efficiency and breakdown point!
     ./tests/lmrob-psifns.R          plot and lmrob()-test them
**** DONE Deprecate the "2)" functions, tukeyChi() etc., making them call the M.*fun() versions
* Mid Term
** TODO New lmrob() features ("M-S" as option?):
*** Function names with "." (which are exported) are frowned upon, e.g. lmrob.split()
*** checking .vcov.avar1() and its "posdefify" options??
*** TODO lmrob.mar() [file:inst/doc/estimating.functions.R]: Maronna & Yohai (2010) should become part of robustbase, maybe under a better name, e.g. via lmrob( ... control ..) or directly.  It is much used in the simulations of Koller & Stahel (2011).
*** TODO Provide a "simple M" estimator [so that MASS::rlm() is entirely superseded].
    Consider lmrob(*, method = "M") --> default init = "ls" (least squares, as in MASS:::rlm.default), which calls lmrob..M..fit(); that is already documented as a "simple" M-estimator (though the *scale* is kept fixed, i.e., no 'proposal 2').
** TODO glmrob(), glmrobMqle(), etc.: the expression()s and eval() are no longer "satisfactory"; e.g., see the FIXME in ./R/glmrobMqle.R
** TODO covMcd(): pass k3 as an argument; default = current ==> allow a "formula" k3 = k3(n,p)!!
** covOGK(): The argument name 'weight.fn' is pretty ugly, and the default function name 'hard.rejection()' is just awful (we need a globally available function as a 'role model').
   - Could allow 'n.iter = 0' to simply compute Cov()_{ij} = rcov(X_i, X_j)
** rrcov etc.
*** rrcov.control() NEEDS a name change!  Probably use mcd.control() and lts.control(), or forget about *control() completely, since there are only a few options in each?
*** TODO tolellipse() --> renamed to tolEllipsePlot()
**** maybe use cluster::ellipsoidPoints()
**** allow other percentiles than just 97.5%
**** maybe *return* something
*** plot(mcd.) [ R/covPlot.R ]: should show the call.  The default for 'ask' should be smarter: depend on prod(par("mfrow")) < #{plots} (which depends on 'classic' and p=2).
*** ltsReg(): has an undocumented '$resid' in addition to '$residuals' and '$raw.residuals'; drop it or document it!
** More lmrob() considerations
*** DONE more tests in tests/
*** fully implement and test the multivariate case (y = matrix with > 1 column)
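The "simple M" estimator discussed above (fixed scale, IRLS with a psi function, least-squares start) can be sketched in a few lines of base R.  This is an illustration under stated assumptions -- Huber weights with the usual cc = 1.345, MAD scale kept fixed -- and simpleM is an invented name, not robustbase's lmrob..M..fit():

```r
## Minimal IRLS sketch of a "simple M" regression estimator with
## Huber weights and *fixed* scale; illustration only.
simpleM <- function(x, y, cc = 1.345, maxit = 50, tol = 1e-8) {
    beta <- qr.coef(qr(x), y)          # init = "ls": least-squares start
    s <- mad(y - drop(x %*% beta))     # scale estimate, then kept fixed
    for (i in seq_len(maxit)) {
        r <- drop(y - x %*% beta) / s
        w <- ifelse(abs(r) <= cc, 1, cc / abs(r))  # Huber psi(r)/r weights
        beta.new <- qr.coef(qr(x * sqrt(w)), y * sqrt(w))
        if (max(abs(beta.new - beta)) < tol * max(abs(beta))) break
        beta <- beta.new
    }
    beta
}
```

Since the scale is not re-estimated inside the loop, this is *not* Huber's 'proposal 2', matching the caveat above.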
*** src/lmrob.c: does its own median(), MAD() instead of using R's sort() routines
* Long Term / Maybe
** inst/doc/lmrob_simulation.Rnw:
*** use hyperlinks {e.g. using the jss document class!}
*** consider making parts available in (new) ./demo/lmrob...R
*** tau_i (p.5) is not clear for Joe Average.
** Generalizing 'wgt.himedian': We'd want a C API on which R builds.  There are pure R implementations:
   - 'weighted.median()' in limma, and I have generalized it ---> file:inst/xtraR/ex-funs.R
   - more general code (different 'tie' strategies; weighted *quantile*s) in file:/u/maechler/R/MM/STATISTICS/robust/weighted-median.R
   - the 'Hmisc' package has wtd.quantile()
** Miscellaneous
*** An alternative version of covOGK() for correlation-only, using Huber's correlation formula, which ensures the [-1,1] range --> ~/R/MM/Pkg-ex/robustbase/robcorgroesser1.R and ~/R/MM/STATISTICS/robust/pairwise-new.R
*** package 'riv' (author @ epfl.ch!) has 'slc()' ~= cov.S(.) -- in pure R code.  Doesn't Valentin have a version too?  Otherwise: test this, and ask the author for a "donation" to robustbase.
*** adjOutlyingness(): the typo-bug is corrected, and I have made it prettier.  Still a bit problematic when the denominator = 0; currently we leave away all the c/0 = Inf and 0/0 = NaN values.  MM: Maybe it's the fact that coef = 1.5 should really depend on the sample size n and will be too large for small n (??)
--> should ask Mia and maybe Guy Brys
*** Add data sets from the MMY-book -- mostly done {do we have them *all*?}
*** Data Sets --- Valentin Todorov has several of Rousseeuw's in the 'rrcov' package (and promised me "the rest" when needed).
    I don't like the *.x, *.y sub-datasets: they shouldn't be needed when one uses a *formula*.
    In his lts tests, he uses these "data sets from the literature" (note that 'stackloss' is already in "datasets"):
      heart.x, heart.y,          data(heart)
      stars.x, stars.y,          data(stars)
      phosphor.x, phosphor.y,    data(phosphor)
      stack.x, stack.loss,       data(stackloss)
      coleman.x, coleman.y,      data(coleman)
      salinity.x, salinity.y,    data(salinity)
      aircraft.x, aircraft.y,    data(aircraft)
      delivery.x, delivery.y,    data(delivery)
      wood.x, wood.y,            data(wood)
      hbk.x, hbk.y,              data(hbk)

robustbase/po/update-me.sh
#!/bin/sh
#
#__>> Keep in sync with ~/R/Pkgs/Matrix/po/update-me.sh <<__
#
## Script for updating package-specific *.pot files
## written such that it should work for any package
#
R=${R:-R}
thisdir=`dirname $0` ; cd $thisdir; thisdir=`pwd`
echo "R = '$R' (`$R --version | head -1`) preliminary thisdir='$thisdir'"
pkgDIR=`dirname $thisdir`
pkg=`basename $pkgDIR`
echo ' --> pkgDIR='$pkgDIR' ; pkg='$pkg
echo "require('tools'); update_pkg_po('$pkgDIR')" | $R --slave 2>&1 | tee update.log
## -------------------------------- as of R 3.0.0
echo 'end{make pkg-update}' ; echo ''
echo 'Test with (e.g.)'
echo ' LANGUAGE=de R --no-environ --no-save' ; echo ''
echo 'and then something like'
echo ' ...
[TO BE ADDED] ...'; echo ''
echo 'Commit with something like'
echo " svn ci -m'translation updates' po inst/po"; echo ''

robustbase/po/R-robustbase.pot
msgid ""
msgstr ""
"Project-Id-Version: robustbase 0.92-4\n"
"POT-Creation-Date: 2015-06-03 11:16\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

msgid "some terms will have NAs due to the limits of the method"
msgstr ""
msgid "y is not onedimensional"
msgstr ""
msgid "Number of observations in x and y not equal"
msgstr ""
msgid "no intercept in the model"
msgstr ""
msgid "All observations have missing values!"
msgstr ""
msgid "Implosion: sigma1=%g became too small"
msgstr ""
msgid "Convergence Achieved"
msgstr ""
msgid "No convergence in"
msgstr ""
msgid "steps."
msgstr ""
msgid "maximum number of \"kstep\" iterations must be > 0"
msgstr ""
msgid "maximal number of *inner* step halvings must be > 0"
msgstr ""
msgid "value of the tuning constant c ('const') must be > 0"
msgstr ""
msgid "non-trivial prior 'weights' are not yet implemented for \"BY\""
msgstr ""
msgid "'start' cannot yet be passed to glmrobBY()"
msgstr ""
msgid "'offset' is not yet implemented for \"BY\""
msgstr ""
msgid "'weights.on.x' ="
msgstr ""
msgid "is not implemented"
msgstr ""
msgid "glm(.) {inner subsample} error:"
msgstr ""
msgid "betaExacto glm(.)\nerror:"
msgstr ""
msgid "value of acc must be > 0"
msgstr ""
msgid "number of subsamples must be > 0"
msgstr ""
msgid "maximum number of iterations must be > 0"
msgstr ""
msgid "value of the tuning constant c (cw) must be > 0"
msgstr ""
msgid "Currently, only family 'poisson' is supported for the \"MT\" estimator"
msgstr ""
msgid "All weights must be positive"
msgstr ""
msgid "non-trivial 'offset' is not yet implemented"
msgstr ""
msgid "NAs in V(mu)"
msgstr ""
msgid "0s in V(mu)"
msgstr ""
msgid "'start' must be an initial estimate of beta, of length %d"
msgstr ""
msgid "optim(.) non-convergence:"
msgstr ""
msgid "'X' must have at least two columns"
msgstr ""
msgid "invalid first argument"
msgstr ""
msgid "'formula' missing or incorrect"
msgstr ""
msgid "'coef' must not be negative"
msgstr ""
msgid "More dimensions than observations, currently not implemented"
msgstr ""
msgid "More dimensions than observations: not yet implemented"
msgstr ""
msgid "**** sampling iterations were not sufficient. Please report"
msgstr ""
msgid "the following arguments to 'anova.glmrob' are invalid and"
msgstr ""
msgid "dropped:"
msgstr ""
msgid "anova.glmrob() only works for 'glmrob' objects"
msgstr ""
msgid "'Anova Table' for a single model object not yet implemented"
msgstr ""
msgid "Not the same response used in the fitted models"
msgstr ""
msgid "models were not all fitted to the same size of dataset"
msgstr ""
msgid "Not the same method used for fitting the models"
msgstr ""
msgid "Not the same tuning constant c used in the robust fits"
msgstr ""
msgid "Models are not nested!"
msgstr "" msgid "Models are not strictly nested" msgstr "" msgid "This family is not implemented" msgstr "" msgid "some eigenvalues are negative" msgstr "" msgid "non-implemented test method:" msgstr "" msgid "for fitting method" msgstr "" msgid "the following arguments to 'anova.lmrob' are invalid and" msgstr "" msgid "For test = 'Deviance', the estimator chain has to end with 'M'" msgstr "" msgid "The first object does not contain the largest model" msgstr "" msgid "All models are refitted except the largest one" msgstr "" msgid "anova.lmrob() only works for 'lmrob' objects" msgstr "" msgid "'Anova Table' for a single model not yet implemented" msgstr "" msgid "invalid 'test'" msgstr "" msgid "Please fit the nested models by lmrob" msgstr "" msgid "NA coefs in full and reduced model do not match" msgstr "" msgid "test" msgstr "" msgid "not yet implemented" msgstr "" msgid "deriv must be in {0,1,2}" msgstr "" msgid "deriv must be in {-1,0,1,2}" msgstr "" msgid "'x' must be a numeric matrix" msgstr "" msgid "The sample size must be greater than 1 for svd" msgstr "" msgid "'wgtFUN' must be a function or a string specifying such a function" msgstr "" msgid "invalid 'seed'. Must be compatible with .Random.seed !" msgstr "" msgid "Invalid number of trials nsamp =" msgstr "" msgid "!" msgstr "" msgid "n <= p -- you can't be serious!" msgstr "" msgid "n == p+1 is too small sample size for MCD" msgstr "" msgid "n < 2 * p, i.e., possibly too small sample size" msgstr "" msgid "Sample size n < h(alpha; n,p) := size of \"good\" subsample" msgstr "" msgid "subsample size\t h < n/2 may be too small" msgstr "" msgid "'wgtFUN' must be a function or one of the strings %s." msgstr "" msgid "Unexpected 'exactfit' code" msgstr "" msgid ". Please report!" msgstr "" msgid "illegal 'singularity$kind'" msgstr "" msgid "Options 'best' and 'exact' not allowed for n greater than 2*nmini-1 =" msgstr "" msgid ".\nUsing default." 
msgstr "" msgid "'nsamp = \"best\"' allows maximally" msgstr "" msgid "subsets;\ncomputing these" msgstr "" msgid "This may take a" msgstr "" msgid "very" msgstr "" msgid "long time!" msgstr "" msgid "Invalid number of trials nsamp=%s. Using default nsamp=%d." msgstr "" msgid "nsamp > i_max := maximal integer -- not allowed;" msgstr "" msgid "set to i_max =" msgstr "" msgid "Use only with 'mcd' objects" msgstr "" msgid "x is not a numeric dataframe or matrix." msgstr "" msgid "argument 'm.cov' must have numeric components 'center' and 'cov'" msgstr "" msgid "Data set and provided center have different dimensions!" msgstr "" msgid "The covariance matrix is singular!" msgstr "" msgid "id.n" msgstr "" msgid "must be in {1,..," msgstr "" msgid "}" msgstr "" msgid "For tolerance ellipses the dimension 'p' must be 2!" msgstr "" msgid "'hsets.init' must be a h' x L matrix (h' >= h) of observation indices" msgstr "" msgid "'full.h' is true, but 'hsets.init' has less than n rows" msgstr "" msgid "'hsets.init' must be in {1,2,...,n}; n =" msgstr "" msgid "More than h of the observations lie on a hyperplane." msgstr "" msgid "original detmcd() wrongly declared c-step convergence (obj=%g, i=%d, j=%d)" msgstr "" msgid "Invalid scalefn='%s': must be function or a valid string" msgstr "" msgid "'%s' must be a function, numeric vector of length p, or NULL" msgstr "" msgid "provide better scale; must be all positive" msgstr "" msgid "More than half of the observations lie on a hyperplane." 
msgstr "" msgid "'%s' is not a valid family (see ?family)" msgstr "" msgid "Robust GLM fitting not yet implemented for family %s" msgstr "" msgid "'weights' must be non-negative" msgstr "" msgid "Number of offsets is %d, should rather equal %d (number of observations)" msgstr "" msgid "'weights.on.x' must be a string, function, list or numeric n-vector" msgstr "" msgid "'start' must be a numeric vector, NULL, or a character string" msgstr "" msgid "invalid 'start' string" msgstr "" msgid "For method 'cubif', use glmRob() from package 'robust'" msgstr "" msgid "method='%s' is only applicable for binomial family, but family=\"\"" msgstr "" msgid "invalid 'method':" msgstr "" msgid "setting 'y = FALSE' has no longer any effect" msgstr "" msgid "No weights defined for this object. Use type=\"robustness\" argument to get robustness weights." msgstr "" msgid "need non-robust working residuals for this model type" msgstr "" msgid "invalid 'variant':" msgstr "" msgid "Weighting method" msgstr "" msgid "if a list, weights.on.x must contain a covariance function such as covMcd()" msgstr "" msgid "weights.on.x needs %d none-negative values" msgstr "" msgid "All weights.on.x must be none negative" msgstr "" msgid "'offset' not fully implemented" msgstr "" msgid "illegal 'family' argument" msgstr "" msgid "start 'theta' has still NA's .. 
badly singular x" msgstr "" msgid "Cannot find valid starting values: You need help" msgstr "" msgid "family '%s' not yet implemented" msgstr "" msgid "NAs in d(mu)/d(eta)" msgstr "" msgid "Non-finite coefficients at iteration" msgstr "" msgid "Algorithm did not converge" msgstr "" msgid "fitted probabilities numerically 0 or 1 occurred" msgstr "" msgid "fitted rates numerically 0 occurred" msgstr "" msgid "Only 'test.acc = \"coef\"' is currently implemented" msgstr "" msgid "value of the tuning constant c (tcc) must be > 0" msgstr "" msgid "Std.error computation not yet available for the case of 'weights'" msgstr "" msgid "negative scale 's'" msgstr "" msgid "scale 's' is zero -- returning initial 'mu'" msgstr "" msgid "scale MAD is zero for this sample" msgstr "" msgid "calculations stopped prematurely in rllarsbi" msgstr "" msgid "(probably because of rounding errors)." msgstr "" msgid "unknown split type" msgstr "" msgid "No categorical variables found in model. Reverting to S-estimator." msgstr "" msgid "No continuous variables found in model. Reverting to L1-estimator." msgstr "" msgid "'psi' should be one of %s" msgstr "" msgid "invalid 'psi'=%s; possibly use .regularize.Mpsi(%s)" msgstr "" msgid "Unknown setting '" msgstr "" msgid "'. Using defaults." msgstr "" msgid "Initial estimator '%s' not supported; using S-estimator instead" msgstr "" msgid ".vcov.avar1 can only be used when initial estimator is S; using .vcov.w instead" msgstr "" msgid "only M and D are steps supported after \"init\" computation" msgstr "" msgid "%s-step did NOT converge. 
Returning unconverged %s-estimate" msgstr "" msgid ":.vcov.w: cov.hubercorr must be logical (or NULL)" msgstr "" msgid ":.vcov.w: cov.corrfact must be one of" msgstr "" msgid ":.vcov.w: cov.dfcorr has to be one of -1:3" msgstr "" msgid ":.vcov.w: cov.resid must be one of" msgstr "" msgid ":.vcov.w: cov.xwx must be logical (or NULL)" msgstr "" msgid "parameter psi is not defined" msgstr "" msgid "parameter tuning.psi is not numeric" msgstr "" msgid ":.vcov.w: ignoring cov.resid == final since est != final" msgstr "" msgid ":.vcov.w: scale missing, using D scale" msgstr "" msgid "option hcorr is ignored for cov.corrfact = asympt" msgstr "" msgid ":.vcov.w: unsupported psi function" msgstr "" msgid "(tau / hybrid / tauold): tau not found in 'obj'" msgstr "" msgid ":.vcov.w: Caution. Some psiprime are NA" msgstr "" msgid ".vcov.avar1() supports only SM or MM estimates" msgstr "" msgid "X'WX is almost singular. Consider rather using cov = \".vcov.w\"" msgstr "" msgid "X'WX is singular. Rather use cov = \".vcov.w\"" msgstr "" msgid "fixing" msgstr "" msgid "negative eigen([" msgstr "" msgid "])values" msgstr "" msgid "invalid 'posdef.meth':" msgstr "" msgid "'control$n.group' must be larger than 'p' for 'large_n' algorithm" msgstr "" msgid "'groups * n.group' must be smaller than 'n' for 'large_n' algorithm" msgstr "" msgid "'control$n.group' is not much larger than 'p', probably too small" msgstr "" msgid "C function R_lmrob_S() exited prematurely" msgstr "" msgid "S-estimated scale == 0: Probably exact fit; check your data" msgstr "" msgid "lmrob..D..fit: control is missing" msgstr "" msgid "lmrob..D..fit: prior estimator did not converge, stopping" msgstr "" msgid "lmrob..D..fit: robustness weights undefined" msgstr "" msgid "lmrob..D..fit: residuals undefined" msgstr "" msgid "lmrob..D..fit: parameter psi is not defined" msgstr "" msgid "lmrob..D..fit: parameter tuning.psi is not numeric" msgstr "" msgid "control is missing" msgstr "" msgid "'control' is missing" 
msgstr "" msgid "internal logic error in psi() function name:" msgstr "" msgid "Please report!" msgstr "" msgid "argument 'psi' must be a string (denoting a psi function)" msgstr "" msgid "tuning constant 'cc' is not numeric" msgstr "" msgid "Coefficients for" msgstr "" msgid "function incorrectly specified." msgstr "" msgid "Use c({0, } minimal slope, b, efficiency, breakdown point)" msgstr "" msgid "Use c(minimal slope, b, efficiency, breakdown point) or k[1:3]" msgstr "" msgid "Coef. for Hampel psi function not of length 3" msgstr "" msgid "Coef. for psi function" msgstr "" msgid "not of length 1" msgstr "" msgid "tuning constants for ggw psi: both eff and bp specified, ignoring bp" msgstr "" msgid "Error: neither breakdown point nor efficiency specified" msgstr "" msgid "tuning constants for lqq psi: both eff and bp specified, ignoring bp" msgstr "" msgid ".psi.lqq.findc: unable to find constants for psi function" msgstr "" msgid "method for psi function" msgstr "" msgid "not implemented" msgstr "" msgid "tuning parameter (chi/psi) is not numeric" msgstr "" msgid "need non-negative number of nodes" msgstr "" msgid "unknown setting for parameter ss" msgstr "" msgid "'epsw' must be numeric(1) or a function of nobs(obj.) which returns a numeric(1)" msgstr "" msgid "number of rows in 'x' and length of 'object$rweights' must be the same" msgstr "" msgid "'epsx' must be numeric(1) or a function of max(abs(x)) which returns a numeric(1)" msgstr "" msgid "Detected possible local breakdown of" msgstr "" msgid "-estimate in" msgstr "" msgid "coefficients" msgstr "" msgid "coefficient" msgstr "" msgid "'" msgstr "" msgid "," msgstr "" msgid "." msgstr "" msgid "KS2014" msgstr "" msgid "Use lmrob argument 'setting=\"KS2014\"' to avoid this problem." 
msgstr "" msgid "'weights' must be a numeric vector" msgstr "" msgid "number of offsets is %d, should equal %d (number of observations)" msgstr "" msgid "Methods argument set by method is different from method in control" msgstr "" msgid "Using the former, method =" msgstr "" msgid "incompatible dimensions" msgstr "" msgid "missing or negative weights not allowed" msgstr "" msgid "singular fit encountered" msgstr "" msgid "init must be \"S\", \"M-S\", function or list" msgstr "" msgid "Length of initial coefficients vector does not match rank of singular design matrix x" msgstr "" msgid "unknown init argument" msgstr "" msgid "arguments" msgstr "" msgid ")$" msgstr "" msgid "^list\\(" msgstr "" msgid "are disregarded in" msgstr "" msgid "Failed to compute robust Mahalanobis distances, reverting to robust leverages." msgstr "" msgid "lmrob object does not have a proper 'qr' component. Rank zero?" msgstr "" msgid "lmrob object does not have a proper 'qr' component. Rank must be zero" msgstr "" msgid "invalid 'lmrob' object: no terms component" msgstr "" msgid ":summary.lmrob: unsupported psi function" msgstr "" msgid "weights should not be both close to 0 and close to 1!" msgstr "" msgid "You should use different 'eps' and/or 'eps1'" msgstr "" msgid "calling predict.lm() ..." 
msgstr "" msgid "prediction from a rank-deficient fit may be misleading" msgstr "" msgid "Predictions on current data refer to _future_ responses" msgstr "" msgid "Assuming prediction variance inversely proportional to weights used for fitting" msgstr "" msgid "Assuming constant prediction variance even though model fit is weighted" msgstr "" msgid "'weights' as formula should be one-sided" msgstr "" msgid "Use only with 'lts' objects" msgstr "" msgid "Index plot of standardized residuals is not avalable if scale = 0" msgstr "" msgid "Standardized residuals vs Fitted values plot is not avalable if scale = 0" msgstr "" msgid "Diagnostic plot is not available for univar\niate location and scale estimation" msgstr "" msgid "Regression Diagnostic plot is not avalable if scale = 0" msgstr "" msgid "Regression Diagnostic plot is not avalable: option mcd=F was set in ltsReg()." msgstr "" msgid "The MCD covariance matrix was singular." msgstr "" msgid "'id.n' must be in {1,..," msgstr "" msgid "alpha not inside [1/2, 1]" msgstr "" msgid "unknown 'wgtFUN' specification:" msgstr "" msgid "'wgtFUN' must be a function or a string specifying one" msgstr "" msgid "y is not a numeric" msgstr "" msgid "There is at least one constant column. Remove it and set intercept=TRUE" msgstr "" msgid "'qr.out = TRUE' for univariate location is disregarded" msgstr "" msgid "Need more than twice as many observations as variables." msgstr "" msgid "x is singular" msgstr "" msgid "no valid subsample found in LTS - set 'nsamp' or rather use lmrob.S()" msgstr "" msgid "NA coefficient (at %s) from \"best\" subset" msgstr "" msgid "Invalid number of trials nsamp=" msgstr "" msgid "! Using default." msgstr "" msgid "'nsamp' options 'best' and 'exact' not allowed for n greater than" msgstr "" msgid ". Will use default." msgstr "" msgid "Maximum 5000 subsets allowed for option 'best'." 
msgstr "" msgid "Computing 5000 subsets of size" msgstr "" msgid "out of" msgstr "" msgid "mc(): not 'converged'" msgstr "" msgid "in" msgstr "" msgid "iter" msgstr "" msgid "iterations" msgstr "" msgid "*and*" msgstr "" msgid "'reflect part' in" msgstr "" msgid "iter2" msgstr "" msgid "optimx" msgstr "" msgid "package must be loaded in order to" msgstr "" msgid "use" msgstr "" msgid "optimizer=\"optimx\"" msgstr "" msgid "couldn't find optimizer function" msgstr "" msgid "non-function specified as optimizer" msgstr "" msgid "optimizer function must use (at least) formal parameters" msgstr "" msgid "Either specify 'pnames' or provide 'upper' or 'lower' with names()" msgstr "" msgid "'pnames' must be a character vector" msgstr "" msgid "specifying 'pnames' is deprecated; rather 'lower' or 'upper' should have names()" msgstr "" msgid "parameter names must appear in 'formula'" msgstr "" msgid "lower must be either of length %d, or length 1" msgstr "" msgid "upper must be either of length %d, or length 1" msgstr "" msgid "'%s' must be character string or function, but is \"%s\"" msgstr "" msgid "Psi function '%s' not supported yet" msgstr "" msgid "Initialization 'init = \"%s\"' not supported (yet)" msgstr "" msgid "unable to find constants for psi function" msgstr "" msgid "As \"sigma\" is in 'pnames', do not use it as variable or parameter name in 'formula'" msgstr "" msgid "Method" msgstr "" msgid "not correctly supported yet" msgstr "" msgid "'formula' should be a formula of the type 'y ~ f(x, alpha)'" msgstr "" msgid "specifying both 'acc' and 'tol' is invalid" msgstr "" msgid "The argument 'acc' has been renamed to 'tol'; do adapt your code." 
msgstr "" msgid "specifying 'weights' is not yet supported for method" msgstr "" msgid "For method = \"%s\", currently 'psi' must be specified via 'control'" msgstr "" msgid "'start' must be fully named (list or numeric vector)" msgstr "" msgid "'start' must be a named list or numeric vector" msgstr "" msgid "'scale' must be NULL or a positive number" msgstr "" msgid "Do not use '%s' as a variable name or as a parameter name" msgstr "" msgid "'weights' must be nonnegative and not contain NAs" msgstr "" msgid "could not compute scale of residuals" msgstr "" msgid "failed to converge in" msgstr "" msgid "steps" msgstr "" msgid "type 'pearson' is not yet implemented" msgstr "" msgid "invalid 'type'" msgstr "" msgid "'cov' must be a p x p matrix" msgstr "" msgid "for method='" msgstr "" msgid "', 'parm' must be specified as an integer" msgstr "" msgid "profile() method not yet implemented for \"nlrob\" objects.\n Use method = \"Wald\"." msgstr "" msgid "Computing profile confidence intervals ..." msgstr "" msgid "\"boot\" method not yet implemented for \"nlrob\" objects.\n Use confint(*, method = \"Wald\")." msgstr "" msgid "Use only with 'lmrob' objects" msgstr "" msgid "'which' must be in 1:5" msgstr "" msgid "'id.n' must be in {1,..,%d}" msgstr "" msgid "recomputing robust Mahalanobis distances" msgstr "" msgid "need 'model' or 'x' component for robust Mahalanobis distances" msgstr "" msgid "saving the robust distances 'MD' as part of" msgstr "" msgid "Tolerance range must be between 0% to 100%" msgstr "" msgid "arguments of function '" msgstr "" msgid "' are (" msgstr "" msgid ") but should be (" msgstr "" msgid ")." msgstr "" msgid "invalid tuning parameter names:" msgstr "" msgid "instead of" msgstr "" msgid "You cannot specify both 'main' and the deprecated 'shortMain'" msgstr "" msgid "'shortMain' is deprecated and will get defunct." 
msgstr "" msgid "Use 'main = \"short\"' instead of 'shortMain = TRUE'" msgstr "" msgid "'weights' must have same length as 'x'" msgstr "" msgid "Dimension {= ncol(x)} must be 2!" msgstr "" msgid "Initial set" msgid_plural "Initial sets" msgstr[0] "" msgstr[1] "" msgid "For method = \"%s\", argument %s is not made use of" msgid_plural "For method = \"%s\", arguments %s are not made use of" msgstr[0] "" msgstr[1] "" robustbase/inst/0000755000176200001440000000000012553432152013402 5ustar liggesusersrobustbase/inst/po/0000755000176200001440000000000012553432152014020 5ustar liggesusersrobustbase/inst/po/en@quot/0000755000176200001440000000000012553432152015433 5ustar liggesusersrobustbase/inst/po/en@quot/LC_MESSAGES/0000755000176200001440000000000012553432152017220 5ustar liggesusersrobustbase/inst/po/en@quot/LC_MESSAGES/R-robustbase.mo0000644000176200001440000010525012533543274022136 0ustar liggesusers
!! Using default."boot" method not yet implemented for "nlrob" objects. Use confint(*, method = "Wald").%s-step did NOT converge. Returning unconverged %s-estimate'' are ('%s' is not a valid family (see ?family)'%s' must be a function, numeric vector of length p, or NULL'%s' must be character string or function, but is "%s"', 'parm' must be specified as an integer'. Using defaults.'Anova Table' for a single model not yet implemented'Anova Table' for a single model object not yet implemented'X' must have at least two columns'coef' must not be negative'control$n.group' is not much larger than 'p', probably too small'control$n.group' must be larger than 'p' for 'large_n' algorithm'control' is missing'cov' must be a p x p matrix'epsw' must be numeric(1) or a function of nobs(obj.)
which returns a numeric(1)'epsx' must be numeric(1) or a function of max(abs(x)) which returns a numeric(1)'formula' missing or incorrect'formula' should be a formula of the type 'y ~ f(x, alpha)''full.h' is true, but 'hsets.init' has less than n rows'groups * n.group' must be smaller than 'n' for 'large_n' algorithm'hsets.init' must be a h' x L matrix (h' >= h) of observation indices'hsets.init' must be in {1,2,...,n}; n ='id.n' must be in {1,..,'id.n' must be in {1,..,%d}'nsamp = "best"' allows maximally'nsamp' options 'best' and 'exact' not allowed for n greater than'offset' is not yet implemented for "BY"'offset' not fully implemented'pnames' must be a character vector'psi' should be one of %s'qr.out = TRUE' for univariate location is disregarded'reflect part' in'scale' must be NULL or a positive number'shortMain' is deprecated and will get defunct.'start' cannot yet be passed to glmrobBY()'start' must be a named list or numeric vector'start' must be a numeric vector, NULL, or a character string'start' must be an initial estimate of beta, of length %d'start' must be fully named (list or numeric vector)'weights' as formula should be one-sided'weights' must be a numeric vector'weights' must be non-negative'weights' must be nonnegative and not contain NAs'weights' must have same length as 'x''weights.on.x' ='weights.on.x' must be a string, function, list or numeric n-vector'wgtFUN' must be a function or a string specifying one'wgtFUN' must be a function or a string specifying such a function'wgtFUN' must be a function or one of the strings %s.'which' must be in 1:5'x' must be a numeric matrix(probably because of rounding errors).(tau / hybrid / tauold): tau not found in 'obj') but should be ()$).**** sampling iterations were not sufficient. Please report*and*,-estimate in.. Using default.. Please report!. 
Will use default..psi.lqq.findc: unable to find constants for psi function.vcov.avar1 can only be used when initial estimator is S; using .vcov.w instead.vcov.avar1() supports only SM or MM estimates0s in V(mu):.vcov.w: Caution. Some psiprime are NA:.vcov.w: cov.corrfact must be one of:.vcov.w: cov.dfcorr has to be one of -1:3:.vcov.w: cov.hubercorr must be logical (or NULL):.vcov.w: cov.resid must be one of:.vcov.w: cov.xwx must be logical (or NULL):.vcov.w: ignoring cov.resid == final since est != final:.vcov.w: scale missing, using D scale:.vcov.w: unsupported psi function:summary.lmrob: unsupported psi functionAlgorithm did not convergeAll models are refitted except the largest oneAll observations have missing values!All weights must be positiveAll weights.on.x must be none negativeAs "sigma" is in 'pnames', do not use it as variable or parameter name in 'formula'Assuming constant prediction variance even though model fit is weightedAssuming prediction variance inversely proportional to weights used for fittingC function R_lmrob_S() exited prematurelyCannot find valid starting values: You need helpCoef. for Hampel psi function not of length 3Coef. 
for psi functionCoefficients forComputing 5000 subsets of sizeComputing profile confidence intervals ...Convergence AchievedCurrently, only family 'poisson' is supported for the "MT" estimatorData set and provided center have different dimensions!Detected possible local breakdown ofDiagnostic plot is not available for univar iate location and scale estimationDimension {= ncol(x)} must be 2!Do not use '%s' as a variable name or as a parameter nameEither specify 'pnames' or provide 'upper' or 'lower' with names()Error: neither breakdown point nor efficiency specifiedFailed to compute robust Mahalanobis distances, reverting to robust leverages.For method 'cubif', use glmRob() from package 'robust'For method = "%s", argument %s is not made use ofFor method = "%s", arguments %s are not made use ofFor method = "%s", currently 'psi' must be specified via 'control'For test = 'Deviance', the estimator chain has to end with 'M'For tolerance ellipses the dimension 'p' must be 2!Implosion: sigma1=%g became too smallIndex plot of standardized residuals is not avalable if scale = 0Initial estimator '%s' not supported; using S-estimator insteadInitial setInitial setsInitialization 'init = "%s"' not supported (yet)Invalid scalefn='%s': must be function or a valid stringInvalid number of trials nsamp =Invalid number of trials nsamp=Invalid number of trials nsamp=%s. 
Using default nsamp=%d.KS2014Length of initial coefficients vector does not match rank of singular design matrix xMaximum 5000 subsets allowed for option 'best'.MethodMethods argument set by method is different from method in controlModels are not nested!Models are not strictly nestedMore dimensions than observations, currently not implementedMore dimensions than observations: not yet implementedMore than h of the observations lie on a hyperplane.More than half of the observations lie on a hyperplane.NA coefficient (at %s) from "best" subsetNA coefs in full and reduced model do not matchNAs in V(mu)NAs in d(mu)/d(eta)Need more than twice as many observations as variables.No categorical variables found in model. Reverting to S-estimator.No continuous variables found in model. Reverting to L1-estimator.No convergence inNo weights defined for this object. Use type="robustness" argument to get robustness weights.Non-finite coefficients at iterationNot the same method used for fitting the modelsNot the same response used in the fitted modelsNot the same tuning constant c used in the robust fitsNumber of observations in x and y not equalNumber of offsets is %d, should rather equal %d (number of observations)Only 'test.acc = "coef"' is currently implementedOptions 'best' and 'exact' not allowed for n greater than 2*nmini-1 =Please fit the nested models by lmrobPlease report!Predictions on current data refer to _future_ responsesPsi function '%s' not supported yetRegression Diagnostic plot is not avalable if scale = 0Regression Diagnostic plot is not avalable: option mcd=F was set in ltsReg().Robust GLM fitting not yet implemented for family %sS-estimated scale == 0: Probably exact fit; check your dataSample size n < h(alpha; n,p) := size of "good" subsampleStandardized residuals vs Fitted values plot is not avalable if scale = 0Std.error computation not yet available for the case of 'weights'The MCD covariance matrix was singular.The argument 'acc' has been renamed to 
'tol'; do adapt your code.The covariance matrix is singular!The first object does not contain the largest modelThe sample size must be greater than 1 for svdThere is at least one constant column. Remove it and set intercept=TRUEThis family is not implementedThis may take aTolerance range must be between 0% to 100%Unexpected 'exactfit' codeUnknown setting 'Use 'main = "short"' instead of 'shortMain = TRUE'Use c(minimal slope, b, efficiency, breakdown point) or k[1:3]Use c({0, } minimal slope, b, efficiency, breakdown point)Use lmrob argument 'setting="KS2014"' to avoid this problem.Use only with 'lmrob' objectsUse only with 'lts' objectsUse only with 'mcd' objectsUsing the former, method =Weighting methodX'WX is almost singular. Consider rather using cov = ".vcov.w"X'WX is singular. Rather use cov = ".vcov.w"You cannot specify both 'main' and the deprecated 'shortMain'You should use different 'eps' and/or 'eps1'])values^list\(alpha not inside [1/2, 1]anova.glmrob() only works for 'glmrob' objectsanova.lmrob() only works for 'lmrob' objectsare disregarded inargument 'm.cov' must have numeric components 'center' and 'cov'argument 'psi' must be a string (denoting a psi function)argumentsarguments of function 'betaExacto glm(.) error:calculations stopped prematurely in rllarsbicalling predict.lm() ...coefficientcoefficientscontrol is missingcould not compute scale of residualscouldn't find optimizer functionderiv must be in {-1,0,1,2}deriv must be in {0,1,2}dropped:failed to converge infamily '%s' not yet implementedfitted probabilities numerically 0 or 1 occurredfitted rates numerically 0 occurredfixingfor fitting methodfor method='function incorrectly specified.glm(.) 
{inner subsample} error:id.nif a list, weights.on.x must contain a covariance function such as covMcd()illegal 'family' argumentillegal 'singularity$kind'inincompatible dimensionsinit must be "S", "M-S", function or listinstead ofinternal logic error in psi() function name:invalid 'lmrob' object: no terms componentinvalid 'method':invalid 'posdef.meth':invalid 'psi'=%s; possibly use .regularize.Mpsi(%s)invalid 'seed'. Must be compatible with .Random.seed !invalid 'start' stringinvalid 'test'invalid 'type'invalid 'variant':invalid first argumentinvalid tuning parameter names:is not implementediteriter2iterationslmrob object does not have a proper 'qr' component. Rank must be zerolmrob object does not have a proper 'qr' component. Rank zero?lmrob..D..fit: control is missinglmrob..D..fit: parameter psi is not definedlmrob..D..fit: parameter tuning.psi is not numericlmrob..D..fit: prior estimator did not converge, stoppinglmrob..D..fit: residuals undefinedlmrob..D..fit: robustness weights undefinedlong time!lower must be either of length %d, or length 1maximal number of *inner* step halvings must be > 0maximum number of "kstep" iterations must be > 0maximum number of iterations must be > 0mc(): not 'converged'method for psi functionmethod='%s' is only applicable for binomial family, but family=""missing or negative weights not allowedmodels were not all fitted to the same size of datasetmust be in {1,..,n < 2 * p, i.e., possibly too small sample sizen <= p -- you can't be serious!n == p+1 is too small sample size for MCDneed 'model' or 'x' component for robust Mahalanobis distancesneed non-negative number of nodesneed non-robust working residuals for this model typenegative eigen([negative scale 's'no intercept in the modelno valid subsample found in LTS - set 'nsamp' or rather use lmrob.S()non-function specified as optimizernon-implemented test method:non-trivial 'offset' is not yet implementednon-trivial prior 'weights' are not yet implemented for "BY"not correctly 
supported yetnot implementednot of length 1not yet implementednsamp > i_max := maximal integer -- not allowed;number of offsets is %d, should equal %d (number of observations)number of rows in 'x' and length of 'object$rweights' must be the samenumber of subsamples must be > 0only M and D are steps supported after "init" computationoptim(.) non-convergence:optimizer function must use (at least) formal parametersoptimizer="optimx"optimxoption hcorr is ignored for cov.corrfact = asymptoriginal detmcd() wrongly declared c-step convergence (obj=%g, i=%d, j=%d)out ofpackage must be loaded in order toparameter names must appear in 'formula'parameter psi is not definedparameter tuning.psi is not numericprediction from a rank-deficient fit may be misleadingprofile() method not yet implemented for "nlrob" objects. Use method = "Wald".provide better scale; must be all positiverecomputing robust Mahalanobis distancessaving the robust distances 'MD' as part ofscale 's' is zero -- returning initial 'mu'scale MAD is zero for this sampleset to i_max =setting 'y = FALSE' has no longer any effectsingular fit encounteredsome eigenvalues are negativesome terms will have NAs due to the limits of the methodspecifying 'pnames' is deprecated; rather 'lower' or 'upper' should have names()specifying 'weights' is not yet supported for methodspecifying both 'acc' and 'tol' is invalidstart 'theta' has still NA's .. 
badly singular xstepssteps.subsample size h < n/2 may be too smallsubsets; computing thesetestthe following arguments to 'anova.glmrob' are invalid andthe following arguments to 'anova.lmrob' are invalid andtuning constant 'cc' is not numerictuning constants for ggw psi: both eff and bp specified, ignoring bptuning constants for lqq psi: both eff and bp specified, ignoring bptuning parameter (chi/psi) is not numerictype 'pearson' is not yet implementedunable to find constants for psi functionunknown 'wgtFUN' specification:unknown init argumentunknown setting for parameter ssunknown split typeupper must be either of length %d, or length 1usevalue of acc must be > 0value of the tuning constant c ('const') must be > 0value of the tuning constant c (cw) must be > 0value of the tuning constant c (tcc) must be > 0veryweights should not be both close to 0 and close to 1!weights.on.x needs %d none-negative valuesx is not a numeric dataframe or matrix.x is singulary is not a numericy is not onedimensional}Project-Id-Version: robustbase 0.92-4 POT-Creation-Date: 2015-06-03 11:16 PO-Revision-Date: 2015-06-03 11:16 Last-Translator: Automatically generated Language-Team: none MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Language: en Plural-Forms: nplurals=2; plural=(n != 1); !! Using default."boot" method not yet implemented for "nlrob" objects. Use confint(*, method = "Wald").%s-step did NOT converge. Returning unconverged %s-estimate'' are (‘%s’ is not a valid family (see ?family)‘%s’ must be a function, numeric vector of length p, or NULL‘%s’ must be character string or function, but is "%s"‘, ’parm' must be specified as an integer'. 
Using defaults.‘Anova Table’ for a single model not yet implemented‘Anova Table’ for a single model object not yet implemented‘X’ must have at least two columns‘coef’ must not be negative‘control$n.group’ is not much larger than ‘p’, probably too small‘control$n.group’ must be larger than ‘p’ for ‘large_n’ algorithm‘control’ is missing‘cov’ must be a p x p matrix‘epsw’ must be numeric(1) or a function of nobs(obj.) which returns a numeric(1)‘epsx’ must be numeric(1) or a function of max(abs(x)) which returns a numeric(1)‘formula’ missing or incorrect‘formula’ should be a formula of the type ‘y ~ f(x, alpha)’‘full.h’ is true, but ‘hsets.init’ has less than n rows‘groups * n.group’ must be smaller than ‘n’ for ‘large_n’ algorithm‘hsets.init’ must be a h' x L matrix (h' >= h) of observation indices‘hsets.init’ must be in {1,2,...,n}; n =‘id.n’ must be in {1,..,‘id.n’ must be in {1,..,%d}‘nsamp = "best"’ allows maximally‘nsamp’ options ‘best’ and ‘exact’ not allowed for n greater than‘offset’ is not yet implemented for "BY"‘offset’ not fully implemented‘pnames’ must be a character vector‘psi’ should be one of %s‘qr.out = TRUE’ for univariate location is disregarded‘reflect part’ in‘scale’ must be NULL or a positive number‘shortMain’ is deprecated and will get defunct.‘start’ cannot yet be passed to glmrobBY()‘start’ must be a named list or numeric vector‘start’ must be a numeric vector, NULL, or a character string‘start’ must be an initial estimate of beta, of length %d‘start’ must be fully named (list or numeric vector)‘weights’ as formula should be one-sided‘weights’ must be a numeric vector‘weights’ must be non-negative‘weights’ must be nonnegative and not contain NAs‘weights’ must have same length as ‘x’‘weights.on.x’ =‘weights.on.x’ must be a string, function, list or numeric n-vector‘wgtFUN’ must be a function or a string specifying one‘wgtFUN’ must be a function or a string specifying such a function‘wgtFUN’ must be a function or one of the strings %s.‘which’ 
must be in 1:5‘x’ must be a numeric matrix(probably because of rounding errors).(tau / hybrid / tauold): tau not found in ‘obj’) but should be ()$).**** sampling iterations were not sufficient. Please report*and*,-estimate in.. Using default.. Please report!. Will use default..psi.lqq.findc: unable to find constants for psi function.vcov.avar1 can only be used when initial estimator is S; using .vcov.w instead.vcov.avar1() supports only SM or MM estimates0s in V(mu):.vcov.w: Caution. Some psiprime are NA:.vcov.w: cov.corrfact must be one of:.vcov.w: cov.dfcorr has to be one of -1:3:.vcov.w: cov.hubercorr must be logical (or NULL):.vcov.w: cov.resid must be one of:.vcov.w: cov.xwx must be logical (or NULL):.vcov.w: ignoring cov.resid == final since est != final:.vcov.w: scale missing, using D scale:.vcov.w: unsupported psi function:summary.lmrob: unsupported psi functionAlgorithm did not convergeAll models are refitted except the largest oneAll observations have missing values!All weights must be positiveAll weights.on.x must be none negativeAs "sigma" is in ‘pnames’, do not use it as variable or parameter name in ‘formula’Assuming constant prediction variance even though model fit is weightedAssuming prediction variance inversely proportional to weights used for fittingC function R_lmrob_S() exited prematurelyCannot find valid starting values: You need helpCoef. for Hampel psi function not of length 3Coef. 
for psi functionCoefficients forComputing 5000 subsets of sizeComputing profile confidence intervals ...Convergence AchievedCurrently, only family ‘poisson’ is supported for the "MT" estimatorData set and provided center have different dimensions!Detected possible local breakdown ofDiagnostic plot is not available for univar iate location and scale estimationDimension {= ncol(x)} must be 2!Do not use ‘%s’ as a variable name or as a parameter nameEither specify ‘pnames’ or provide ‘upper’ or ‘lower’ with names()Error: neither breakdown point nor efficiency specifiedFailed to compute robust Mahalanobis distances, reverting to robust leverages.For method ‘cubif’, use glmRob() from package ‘robust’For method = "%s", argument %s is not made use ofFor method = "%s", arguments %s are not made use ofFor method = "%s", currently ‘psi’ must be specified via ‘control’For test = ‘Deviance’, the estimator chain has to end with ‘M’For tolerance ellipses the dimension ‘p’ must be 2!Implosion: sigma1=%g became too smallIndex plot of standardized residuals is not avalable if scale = 0Initial estimator ‘%s’ not supported; using S-estimator insteadInitial setInitial setsInitialization ‘init = "%s"’ not supported (yet)Invalid scalefn=‘%s’: must be function or a valid stringInvalid number of trials nsamp =Invalid number of trials nsamp=Invalid number of trials nsamp=%s. 
Using default nsamp=%d.KS2014Length of initial coefficients vector does not match rank of singular design matrix xMaximum 5000 subsets allowed for option ‘best’.MethodMethods argument set by method is different from method in controlModels are not nested!Models are not strictly nestedMore dimensions than observations, currently not implementedMore dimensions than observations: not yet implementedMore than h of the observations lie on a hyperplane.More than half of the observations lie on a hyperplane.NA coefficient (at %s) from "best" subsetNA coefs in full and reduced model do not matchNAs in V(mu)NAs in d(mu)/d(eta)Need more than twice as many observations as variables.No categorical variables found in model. Reverting to S-estimator.No continuous variables found in model. Reverting to L1-estimator.No convergence inNo weights defined for this object. Use type="robustness" argument to get robustness weights.Non-finite coefficients at iterationNot the same method used for fitting the modelsNot the same response used in the fitted modelsNot the same tuning constant c used in the robust fitsNumber of observations in x and y not equalNumber of offsets is %d, should rather equal %d (number of observations)Only ‘test.acc = "coef"’ is currently implementedOptions ‘best’ and ‘exact’ not allowed for n greater than 2*nmini-1 =Please fit the nested models by lmrobPlease report!Predictions on current data refer to _future_ responsesPsi function ‘%s’ not supported yetRegression Diagnostic plot is not avalable if scale = 0Regression Diagnostic plot is not avalable: option mcd=F was set in ltsReg().Robust GLM fitting not yet implemented for family %sS-estimated scale == 0: Probably exact fit; check your dataSample size n < h(alpha; n,p) := size of "good" subsampleStandardized residuals vs Fitted values plot is not avalable if scale = 0Std.error computation not yet available for the case of ‘weights’The MCD covariance matrix was singular.The argument ‘acc’ has been renamed to 
‘tol’; do adapt your code.The covariance matrix is singular!The first object does not contain the largest modelThe sample size must be greater than 1 for svdThere is at least one constant column. Remove it and set intercept=TRUEThis family is not implementedThis may take aTolerance range must be between 0% to 100%Unexpected ‘exactfit’ codeUnknown setting 'Use ‘main = "short"’ instead of ‘shortMain = TRUE’Use c(minimal slope, b, efficiency, breakdown point) or k[1:3]Use c({0, } minimal slope, b, efficiency, breakdown point)Use lmrob argument ‘setting="KS2014"’ to avoid this problem.Use only with ‘lmrob’ objectsUse only with ‘lts’ objectsUse only with ‘mcd’ objectsUsing the former, method =Weighting methodX'WX is almost singular. Consider rather using cov = ".vcov.w"X'WX is singular. Rather use cov = ".vcov.w"You cannot specify both ‘main’ and the deprecated ‘shortMain’You should use different ‘eps’ and/or ‘eps1’])values^list\(alpha not inside [1/2, 1]anova.glmrob() only works for ‘glmrob’ objectsanova.lmrob() only works for ‘lmrob’ objectsare disregarded inargument ‘m.cov’ must have numeric components ‘center’ and ‘cov’argument ‘psi’ must be a string (denoting a psi function)argumentsarguments of function 'betaExacto glm(.) error:calculations stopped prematurely in rllarsbicalling predict.lm() ...coefficientcoefficientscontrol is missingcould not compute scale of residualscouldn't find optimizer functionderiv must be in {-1,0,1,2}deriv must be in {0,1,2}dropped:failed to converge infamily ‘%s’ not yet implementedfitted probabilities numerically 0 or 1 occurredfitted rates numerically 0 occurredfixingfor fitting methodfor method='function incorrectly specified.glm(.) 
{inner subsample} error:id.nif a list, weights.on.x must contain a covariance function such as covMcd()illegal ‘family’ argumentillegal ‘singularity$kind’inincompatible dimensionsinit must be "S", "M-S", function or listinstead ofinternal logic error in psi() function name:invalid ‘lmrob’ object: no terms componentinvalid ‘method’:invalid ‘posdef.meth’:invalid ‘psi’=%s; possibly use .regularize.Mpsi(%s)invalid ‘seed’. Must be compatible with .Random.seed !invalid ‘start’ stringinvalid ‘test’invalid ‘type’invalid ‘variant’:invalid first argumentinvalid tuning parameter names:is not implementediteriter2iterationslmrob object does not have a proper ‘qr’ component. Rank must be zerolmrob object does not have a proper ‘qr’ component. Rank zero?lmrob..D..fit: control is missinglmrob..D..fit: parameter psi is not definedlmrob..D..fit: parameter tuning.psi is not numericlmrob..D..fit: prior estimator did not converge, stoppinglmrob..D..fit: residuals undefinedlmrob..D..fit: robustness weights undefinedlong time!lower must be either of length %d, or length 1maximal number of *inner* step halvings must be > 0maximum number of "kstep" iterations must be > 0maximum number of iterations must be > 0mc(): not ‘converged’method for psi functionmethod=‘%s’ is only applicable for binomial family, but family=""missing or negative weights not allowedmodels were not all fitted to the same size of datasetmust be in {1,..,n < 2 * p, i.e., possibly too small sample sizen <= p -- you can't be serious!n == p+1 is too small sample size for MCDneed ‘model’ or ‘x’ component for robust Mahalanobis distancesneed non-negative number of nodesneed non-robust working residuals for this model typenegative eigen([negative scale ‘s’no intercept in the modelno valid subsample found in LTS - set ‘nsamp’ or rather use lmrob.S()non-function specified as optimizernon-implemented test method:non-trivial ‘offset’ is not yet implementednon-trivial prior ‘weights’ are not yet implemented for "BY"not correctly 
supported yetnot implementednot of length 1not yet implementednsamp > i_max := maximal integer -- not allowed;number of offsets is %d, should equal %d (number of observations)number of rows in ‘x’ and length of ‘object$rweights’ must be the samenumber of subsamples must be > 0only M and D are steps supported after "init" computationoptim(.) non-convergence:optimizer function must use (at least) formal parametersoptimizer="optimx"optimxoption hcorr is ignored for cov.corrfact = asymptoriginal detmcd() wrongly declared c-step convergence (obj=%g, i=%d, j=%d)out ofpackage must be loaded in order toparameter names must appear in ‘formula’parameter psi is not definedparameter tuning.psi is not numericprediction from a rank-deficient fit may be misleadingprofile() method not yet implemented for "nlrob" objects. Use method = "Wald".provide better scale; must be all positiverecomputing robust Mahalanobis distancessaving the robust distances ‘MD’ as part ofscale ‘s’ is zero -- returning initial ‘mu’scale MAD is zero for this sampleset to i_max =setting ‘y = FALSE’ has no longer any effectsingular fit encounteredsome eigenvalues are negativesome terms will have NAs due to the limits of the methodspecifying ‘pnames’ is deprecated; rather ‘lower’ or ‘upper’ should have names()specifying ‘weights’ is not yet supported for methodspecifying both ‘acc’ and ‘tol’ is invalidstart ‘theta’ has still NA's .. 
badly singular xstepssteps.subsample size h < n/2 may be too smallsubsets; computing thesetestthe following arguments to ‘anova.glmrob’ are invalid andthe following arguments to ‘anova.lmrob’ are invalid andtuning constant ‘cc’ is not numerictuning constants for ggw psi: both eff and bp specified, ignoring bptuning constants for lqq psi: both eff and bp specified, ignoring bptuning parameter (chi/psi) is not numerictype ‘pearson’ is not yet implementedunable to find constants for psi functionunknown ‘wgtFUN’ specification:unknown init argumentunknown setting for parameter ssunknown split typeupper must be either of length %d, or length 1usevalue of acc must be > 0value of the tuning constant c (‘const’) must be > 0value of the tuning constant c (cw) must be > 0value of the tuning constant c (tcc) must be > 0veryweights should not be both close to 0 and close to 1!weights.on.x needs %d none-negative valuesx is not a numeric dataframe or matrix.x is singulary is not a numericy is not onedimensional}robustbase/inst/CITATION0000644000176200001440000000354611721663343014553 0ustar liggesuserscitHeader("To cite robustbase in publications use:") ## R >= 2.8.0 passes package metadata to citation(). 
if(!exists("meta") || is.null(meta)) meta <- packageDescription("robustbase") year <- sub("-.*", "", meta$Date) note <- sprintf("R package version %s", meta$Version) citEntry(entry = "Manual", title = "{robustbase}: Basic Robust Statistics", author = personList(as.person("Peter Rousseeuw"), as.person("Christophe Croux"), as.person("Valentin Todorov"), as.person("Andreas Ruckstuhl"), as.person("Matias Salibian-Barrera"), as.person("Tobias Verbeke"), as.person("Manuel Koller"), as.person("Martin Maechler")), year = year, note = note, url = "http://CRAN.R-project.org/package=robustbase", textVersion = paste("Peter Rousseeuw, Christophe Croux, Valentin Todorov, Andreas Ruckstuhl, Matias Salibian-Barrera, Tobias Verbeke, Manuel Koller, Martin Maechler", sprintf("(%s).", year), "robustbase: Basic Robust Statistics.", paste(note, ".", sep = ""), "URL http://CRAN.R-project.org/package=robustbase") ) citEntry(entry = "Article", title = "An Object-Oriented Framework for Robust Multivariate Analysis", author = personList(as.person("Valentin Todorov"), as.person("Peter Filzmoser")), journal = "Journal of Statistical Software", year = "2009", volume = "32", number = "3", pages = "1--47", url = "http://www.jstatsoft.org/v32/i03/", textVersion = paste("Valentin Todorov, Peter Filzmoser (2009).", "An Object-Oriented Framework for Robust Multivariate Analysis.", "Journal of Statistical Software, 32(3), 1-47.", "URL http://www.jstatsoft.org/v32/i03/."), header = "To cite the multivariate class/methods framework use:" ) robustbase/inst/NEWS.Rd0000644000176200001440000001115612553432042014447 0ustar liggesusers% Check from R: % news(db = tools:::.build_news_db_from_package_NEWS_Rd("~/R/Pkgs/robustbase/inst/NEWS.Rd")) \name{NEWS} \title{News for \R Package \pkg{robustbase}} \encoding{UTF-8} \section{CHANGES in robustbase VERSION 0.92-5 (2015-07-21, svn r698)}{ \subsection{NEW FEATURES}{ \itemize{ \item \emph{Hidden} \code{print()} methods \code{print.summary.lmrob()} and 
\code{print.lmrob.S()} get a \code{showAlgo = TRUE} argument which the user can set to FALSE in order to suppress printing of the \dQuote{Algorithmic parameters}. \item import (remaining parts) from "base" packages. \item \code{summary()} now also prints a summary on the residuals. } } \subsection{BUG FIXES}{ \itemize{ \item \code{summary(lmrob(.))}'s variance-covariance matrix is now called \code{cov} instead of \code{cov.unscaled} (because it \emph{is} scaled). Code which has been using \code{vcov()} or \code{ $ cov}, or even \code{ $ cov.unscaled} is not affected. } } } \section{CHANGES in robustbase VERSION 0.92-4 (2015-06-04, svn r696)}{ \subsection{NEW FEATURES}{ \itemize{ \item Started this \file{NEWS.Rd} file, to eventually replace the \file{ChangeLog} \item \code{plot.lmrob()} also identifies largest residuals as \code{plot.lm()}. Also gets new argument \code{panel}, and \code{add.smooth=TRUE} behavior. \item adapt to the fact that R 3.3.0 will have its own \code{sigma()} S3 generic. \item setup for having message translations (volunteers sought!). } } \subsection{BUG FIXES}{ \itemize{ \item more careful in \file{../src/mc.c} (valgrind, thanks to Brian) \item add missing documentation, better examples for \code{predict.lmrob} \item \code{warn.limit.*} checks in \code{lmrob*()} } } } \section{CHANGES in robustbase VERSION 0.92-3 (2015-01-14, svn r679)}{ \subsection{NEW FEATURES}{ \itemize{ \item The \sQuote{Co-Median} \code{covComed()} from Maria Anna, tweaked by Valentin and modified considerably by Martin. \item Also document (and export) \code{r6pack()} utility. \item New \code{smoothWgt()} function --- \dQuote{Biweight on a Stick} --- to be used as \code{wgtFUN} option for \code{covMcd()} or \code{covComed()}. \item New utility \code{colMedians()} and \code{rowMedians}, as we use columnwise medians in so many places.
} } \subsection{BUG FIXES}{ \itemize{ \item Tweaks to \code{medcouple()}, after detecting flaws -- which may be inherent and have \emph{not} been removed. \item Improved its documentation and the \code{adjOutlyingness()} one, notably its \dQuote{central} case. } } } \section{CHANGES in robustbase VERSION 0.92-2 (2014-11-22, svn r660)}{ \subsection{BUG FIXES}{ \itemize{ \item \code{covMcd()} with new options (\code{kmini}, \code{nmini}) now ok (sometimes wrong in 0.92-1). } } } \section{CHANGES in robustbase VERSION 0.92-1 (2014-11-18)}{ \subsection{NEW FEATURES}{ \itemize{ \item The deterministic MCD, via \code{covMcd(..., nsamp="deterministic")}. } } \subsection{BUG FIXES}{ \itemize{ \item \code{adjOutlyingness()}: reverse the defaults of \code{clower} and \code{cupper} and fix an \dQuote{eternal} erroneous \eqn{\pm}{+/-} swap; see new note in \file{man/adjOutlyingness.Rd}. } } } \section{CHANGES in robustbase VERSION 0.92-0 (2014-11-18)}{ \subsection{NEW FEATURES}{ \itemize{ \item \code{nlrob()} now works with indexed vector parameters. \item new \code{outlierStats()} (Manuel). } } \subsection{BUG FIXES}{ \itemize{ \item got rid of Fortran compiler warnings about \bold{ancient} style. \item \code{nlrob(*, weights)}, fixing R-forge bug #5988. \item \code{covMcd()} fix for \dQuote{MAD = 0} case (new \code{exactfit} code 3). } } } \section{CHANGES in robustbase VERSION 0.91-1 (2014-05-01)}{ \subsection{BUG FIXES}{ \itemize{ \item . } } } \section{CHANGES in robustbase VERSION 0.91-0 (2014-04-24)}{ \subsection{NEW FEATURES}{ \itemize{ \item . } } \subsection{BUG FIXES}{ \itemize{ \item . } } } \section{CHANGES in robustbase VERSION 0.90-1 (2014-01-30)}{ \subsection{NEW FEATURES}{ \itemize{ \item . } } \subsection{BUG FIXES}{ \itemize{ \item . } } } \section{CHANGES in robustbase VERSION 0.9-8 (2013-06-14)}{ \subsection{NEW FEATURES}{ \itemize{ \item . } } \subsection{BUG FIXES}{ \itemize{ \item .
} } } robustbase/inst/xtraR/0000755000176200001440000000000012553432151014501 5ustar liggesusersrobustbase/inst/xtraR/test_LTS.R0000644000176200001440000001033311757675047016347 0ustar liggesusers#### Utility functions for testing ltsReg() #### -------------------------------------- ../tests/tlts.R repLTS <- function(form, data, nrep = 1, method = c("FASTLTS","MASS")) { if(method == "MASS") ## MASS::lqs(x,y,control=list(psamp = NA, nsamp= "best", adjust= FALSE)) for(i in 1:nrep) MASS::lqs(form, data = data, method = "lts") else ## set mcd=FALSE - we want to time only the LTS algorithm for(i in 1:nrep) ltsReg(form, data = data, mcd = FALSE) } doLTSdata <- function(nrep = 1, time = nrep >= 3, short = time, full = !short, method = c("FASTLTS", "MASS")) { ##@bdescr ## Test the function ltsReg() on the literature datasets: ## ## Call ltsReg() for "all" regression datasets available in robustbase ## and print: ## - execution time (if time) ## - objective function ## - best subsample found (if not short) ## - outliers identified (with cutoff 0.975) (if not short) ## - estimated coeficients and scale (if full) ## ##@edescr ## ##@in nrep : [integer] number of repetitions to use for estimating the ## (average) execution time ##@in time : [boolean] whether to evaluate the execution time ##@in short : [boolean] whether to do short output (i.e. only the ## objective function value). If short == FALSE, ## the best subsample and the identified outliers are ## printed. See also the parameter full below ##@in full : [boolean] whether to print the estimated coeficients and scale ##@in method : [character] select a method: one of (FASTLTS, MASS) dolts <- function(form, dname, dataset, nrep = 1) { if(missing(dataset)) { data(list = dname) dataset <- get(dname) } else if(missing(dname)) dname <- deparse(substitute(dataset)) environment(form) <- environment() ## !?! 
x <- model.matrix(form, model.frame(form, data = dataset)) dx <- dim(x) - 0:1 # not counting intercept if(method == "MASS") { lts <- MASS::lqs(form, data = dataset, method = "lts") quan <- (dx[1] + (dx[2] + 1) + 1)/2 #default: (n+p+1)/2 } else { lts <- ltsReg(form, data = dataset, mcd = FALSE) quan <- lts$quan } xres <- sprintf("%*s %3d %3d %3d %12.6f", lname, dname, dx[1], dx[2], as.integer(quan), lts$crit) if(time) { xtime <- system.time(repLTS(form, data = dataset, nrep, method))[1] xres <- sprintf("%s %10.1f", xres, 1000 * xtime / nrep) } cat(xres, "\n") if(!short) { cat("Best subsample: \n") print(lts$best) ibad <- which(lts$lts.wt == 0) names(ibad) <- NULL nbad <- length(ibad) cat("Outliers: ",nbad,"\n") if(nbad > 0) print(ibad) if(full) { cat("-------------\n") print(lts) print(summary(lts)) } cat("--------------------------------------------------------\n") } } method <- match.arg(method) data(heart) data(starsCYG) data(phosphor) data(stackloss) data(coleman) data(salinity) data(aircraft) data(delivery) data(wood) data(hbk) cll <- sys.call() cat("\nCall: ", deparse(substitute(cll)),"\n") cat("========================================================\n") cat("Data Set n p Half obj Time [ms]\n") cat("========================================================\n") ## 1 3 5 7 9.1 3 5 7 9. 123 123 lname <- 20 ## --^ dolts(clength ~ . , "heart", nrep = nrep) dolts(log.light ~ log.Te , "starsCYG", nrep = nrep) dolts(plant ~ . , "phosphor", nrep = nrep) dolts(stack.loss ~ . , "stackloss", nrep = nrep) dolts(Y ~ . , "coleman", nrep = nrep) dolts(Y ~ . , "salinity") dolts(Y ~ . , "aircraft") dolts(delTime ~ . , "delivery") dolts(y ~ . , "wood", nrep = nrep) dolts(Y ~ . 
, "hbk", nrep = nrep) cat("========================================================\n") } robustbase/inst/xtraR/m-s_fns.R0000644000176200001440000001317612174500122016167 0ustar liggesusers#### Testing M-S estimator --- self-contained utility functions --- #### ## Exercised from ../../tests/m-s-estimator.R ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~ ## Test subsampling algorithm m_s_subsample <- function(x1, x2, y, control, orthogonalize=TRUE) { storage.mode(x1) <- "double" storage.mode(x2) <- "double" storage.mode(y) <- "double" z <- .C(robustbase:::R_lmrob_M_S, x1, x2, y, res=double(length(y)), n=length(y), p1=NCOL(x1), p2=NCOL(x2), nResample=as.integer(control$nResample), max_it_scale=as.integer(control$maxit.scale), scale=double(1), b1=double(NCOL(x1)), b2=double(NCOL(x2)), tuning_chi=as.double(control$tuning.chi), ipsi=robustbase:::.psi2ipsi(control$psi), bb=as.double(control$bb), K_m_s=as.integer(control$k.m_s), max_k=as.integer(control$k.max), rel_tol=as.double(control$rel.tol), inv_tol=as.double(control$solve.tol), converged=FALSE, trace_lev=as.integer(control$trace.lev), orthogonalize=as.logical(orthogonalize), subsample=TRUE, descent=FALSE, # and hence no 'convergence' here .. 
mts = 0L, ss = 1L) z[c("b1", "b2", "scale")] } ## Test descent algorithm m_s_descent <- function(x1, x2, y, control, b1, b2, scale) { storage.mode(x1) <- "double" storage.mode(x2) <- "double" storage.mode(y) <- "double" z <- .C(robustbase:::R_lmrob_M_S, X1=x1, X2=x2, y=y, res=double(length(y)), n=length(y), p1=NCOL(x1), p2=NCOL(x2), nResample=as.integer(control$nResample), max_it_scale=as.integer(control$maxit.scale), scale=as.double(scale), b1=as.double(b1), b2=as.double(b2), tuning_chi=as.double(control$tuning.chi), ipsi=robustbase:::.psi2ipsi(control$psi), bb=as.double(control$bb), K_m_s=as.integer(control$k.m_s), max_k=as.integer(control$k.max), rel_tol=as.double(control$rel.tol), inv_tol=as.double(control$solve.tol), converged=logical(1), trace_lev=as.integer(control$trace.lev), orthogonalize=FALSE, subsample=FALSE, descent=TRUE, ## ----- mts = 0L, ss = 1L) z[c("b1", "b2", "scale", "res", "converged")] } find_scale <- function(r, s0, n, p, control) { c.chi <- robustbase:::.psi.conv.cc(control$psi, control$tuning.chi) b <- .C(robustbase:::R_lmrob_S, x = double(1), y = as.double(r), n = as.integer(n), p = as.integer(p), nResample = 0L, scale = as.double(s0), coefficients = double(p), as.double(c.chi), robustbase:::.psi2ipsi(control$psi), as.double(control$bb), best_r = 0L, groups = 0L, n.group = 0L, k.fast.s = 0L, k.iter = 0L, maxit.scale = as.integer(control$maxit.scale), refine.tol = as.double(control$refine.tol), inv.tol = as.double(control$solve.tol), converged = logical(1), trace.lev = 0L, mts = 0L, ss = 1L, fast.s.large.n = as.integer(n+1) )[c("coefficients", "scale", "k.iter", "converged")] b$scale } ## m_s_descent()--R-only--version : m_s_descent_Ronly <- function(x1, x2, y, control, b1, b2, scale) { stopifnot(is.list(control), is.numeric(control$k.max)) n <- length(y) p1 <- ncol(x1) p2 <- ncol(x2) p <- p1+p2 t2 <- b2 t1 <- b1 rs <- drop(y - x1 %*% b1 - x2 %*% b2) sc <- scale ## do refinement steps ## do maximally control$k.max iterations ## stop if 
converged ## stop after k.fast.m_s step of no improvement if (control$trace.lev > 4) cat("scale:", scale, "\n") if (control$trace.lev > 4) cat("res:", rs, "\n") n.imp <- nnoimprovement <- nref <- 0L; conv <- FALSE while((nref <- nref + 1) <= control$k.max && !conv && nnoimprovement < control$k.m_s) { ## STEP 1: UPDATE B2 y.tilde <- y - x1 %*% t1 w <- Mwgt(rs / sc, control$tuning.chi, control$psi) if (control$trace.lev > 4) cat("w:", w, "\n") z2 <- lm.wfit(x2, y.tilde, w) t2 <- z2$coef if (control$trace.lev > 4) cat("t2:", t2, "\n") rs <- y - x2 %*% t2 ## STEP 2: OBTAIN M-ESTIMATE OF B1 z1 <- lmrob.lar(x1, rs, control) t1 <- z1$coef if (control$trace.lev > 4) cat("t1:", t1, "\n") rs <- z1$resid ## STEP 3: COMPUTE THE SCALE ESTIMATE sc <- find_scale(rs, sc, n, p, control) if (control$trace.lev > 4) cat("sc:", sc, "\n") ## STEP 4: CHECK FOR CONVERGENCE ##... FIXME ## STEP 5: UPDATE BEST FIT if (sc < scale) { scale <- sc b1 <- t1 b2 <- t2 nnoimprovement <- 0L n.imp <- n.imp + 1L } else nnoimprovement <- nnoimprovement + 1L } ## STEP 6: FINISH if (nref == control$k.max) warning("M-S estimate: maximum number of refinement steps reached.") ## if we'd really check for convergence above : ## if (nnoimprovement == control$k.m_s) ## warning("M-S estimate: maximum number of no-improvements reached.") list(b1=b1, b2=b2, scale=scale, res=rs, nref=nref, n.improve = n.imp)#, converged=conv, nnoimprovement=nnoimprovement) } robustbase/inst/xtraR/plot-psiFun.R0000644000176200001440000001032612455444424017055 0ustar liggesusers## Functions to plot and check psi-functions ## used in ../../tests/lmrob-psifns.R, ## ../../tests/psi-rho-etc.R ## and ../../vignettes/psi_functions.Rnw vignette ## Original Author of functions: Martin Maechler, Date: 13 Aug 2010, 10:17 p.psiFun <- function(x, psi, par, main=FALSE, ...) 
{ m.psi <- cbind(rho = Mpsi(x, par, psi,deriv=-1), psi = Mpsi(x, par, psi,deriv= 0), Dpsi = Mpsi(x, par, psi,deriv= 1), wgt = Mwgt(x, par, psi)) robustbase:::matplotPsi(x, m.psi, psi=psi, par=par, main=main, ...) ## -> cbind(x, m.psi) } p.psiFun2 <- function(x, psi, par, main="short", ...) p.psiFun(x, psi, par, main=main, leg.loc= "bottomright", ylim = c(-2.2, 6)) ## for psi_func class objects: simply use plot() method. mids <- function(x) (x[-1]+x[-length(x)])/2 ##' is 'psi' the name of redescending psi (i.e. with *finite* rejection point) isPsi.redesc <- function(psi) { psi != "Huber" ## <- must be adapted when we introduce more } ##' @title Check consistency of psi/chi/wgt/.. functions ##' @param m.psi matrix as from p.psiFun() ##' @param tol ##' @return concatenation of \code{\link{all.equal}} results ##' @author Martin Maechler chkPsiDeriv <- function(m.psi, tol = 1e-4) { stopifnot(length(tol) > 0, tol >= 0, is.numeric(psi <- m.psi[,"psi"]), is.numeric(dx <- diff(x <- m.psi[,"x"]))) if(length(tol) < 2) tol[2] <- 10*tol[1] xn0 <- abs(x) > 1e-5 c(all.equal(mids(psi), diff(m.psi[,"rho"])/dx, tolerance=tol[1]), # rho' == psi all.equal(mids(m.psi[,"Dpsi"]), diff(psi)/dx, tolerance=tol[2]),# psi' == psip all.equal(m.psi[xn0,"wgt"], (psi/x)[xn0], tolerance= tol[1]/10))# psi/x == wgt } ##' This version "starts from scratch" instead of from p.psiFun() result: ##' ##' @title Check consistency of psi/chi/wgt/.. functions ##' @param x range or vector of abscissa values ##' @param psi psi() function spec., passed to M.psi() etc ##' @param par tuning parameter, passed to M.psi() etc ##' @param tol tolerance for equality checking of numeric derivatives ##' @return concatenation of \code{\link{all.equal}} results ##' @author Martin Maechler chkPsi.. 
<- function(x, psi, par, tol = 1e-4, doD2, quiet=FALSE) { stopifnot(length(tol) > 0, tol >= 0, is.numeric(x), is.finite(x)) is.redesc <- isPsi.redesc(psi) if(length(x) == 2) ## it is a *range* -> produce vector x <- seq(x[1], x[2], length = 1025L) dx <- diff(x) x0 <- sort(x) x <- c(-Inf, Inf, NA, NaN, x0) if(is.redesc) rho <- Mpsi(x, par, psi, deriv=-1) psix <- Mpsi(x, par, psi, deriv= 0) Dpsi <- Mpsi(x, par, psi, deriv= 1) wgt <- Mwgt(x, par, psi) chi <- Mchi(x, par, psi) if(is.redesc) { chi1 <- Mchi(x, par, psi, deriv=1) chi2 <- Mchi(x, par, psi, deriv=2) } rho.Inf <- MrhoInf(par, psi) if(is.redesc) stopifnot(all.equal(rep(rho.Inf,2), rho[1:2]), all.equal(chi, rho / rho.Inf), all.equal(chi1,psix / rho.Inf), all.equal(chi2,Dpsi / rho.Inf) ) else { ## check any here? From ../src/lmrob.c : ## chi = C-function rho(x) which is unscaled rho <- chi # for checks below } D2psi <- tryCatch(Mpsi(x, par, psi, deriv= 2), error=function(e)e) has2 <- !inherits(D2psi, "error") doD2 <- if(missing(doD2)) has2 else doD2 && has2 if(!quiet & !doD2) message("Not checking psi''() := Mpsi(*, deriv=2)") stopifnot(is.numeric(psix), ## check NA / NaN : identical5(x[3:4], chi[3:4], psix[3:4], Dpsi[3:4], wgt[3:4]), if(has2) identical(x[3:4], D2psi[3:4]) else TRUE) if(length(tol) < 2) tol[2] <- 16*tol[1] if(length(tol) < 3) tol[3] <- tol[1]/10 if(length(tol) < 4) tol[4] <- 8*tol[2] i <- 5:length(x) # leaving away the first 4 (+-Inf, NA..) 
xn0 <- is.finite(x) & abs(x) > 1e-5 c("rho' = psi" = all.equal(mids(psix[i]), diff(rho [i])/dx, tolerance=tol[1]), "psi' = psip"= all.equal(mids(Dpsi[i]), diff(psix[i])/dx, tolerance=tol[2]), "psi/x= wgt" = all.equal( wgt[xn0], (psix/x)[xn0], tolerance=tol[3]), "psi''=D2psi"= if(doD2) all.equal(mids(D2psi[i]), diff(Dpsi[i])/dx,tolerance=tol[4]) else NA) } robustbase/inst/xtraR/ex-funs.R0000644000176200001440000000432212455444424016221 0ustar liggesusers## These two fail when length(x) == 0 {but are short and nice otherwise} himed <- function(x) { n2 <- 1 + length(x) %/% 2; sort(x, partial = n2)[n2] } lomed <- function(x) { n2 <- (1+ length(x))%/% 2; sort(x, partial = n2)[n2] } ## From package 'limma' : ~/R/BioCore/madman/Rpacks/limma/R/weightedmedian.R weighted.median <- function (x, w, na.rm = FALSE, low = FALSE, high = FALSE) { ## Weighted median ## Gordon Smyth ## 30 June 2005 ## improved by MMaechler: 'low' and 'high' as with 'mad()'; 21 Nov 2005 if (missing(w)) w <- rep.int(1, length(x)) else { if(length(w) != length(x)) stop("'x' and 'w' must have the same length") if(any(is.na(w))) stop("NA weights not allowed") ## Note that sometimes the estimate would be well-defined even ## with some NA weights! if(any(w < 0)) stop("Negative weights not allowed") if(is.integer(w)) w <- as.numeric(w) } if(any(nax <- is.na(x))) { if(na.rm) { w <- w[i <- !nax] x <- x[i] } else return(NA) } if(all(w == 0)) { warning("All weights are zero") return(NA) } ## otherwise, have sum(w) > 0 if(is.unsorted(x)) { o <- order(x) x <- x[o] w <- w[o] } p <- cumsum(w)/sum(w) k <- sum(p < 0.5) + 1:1 if(p[k] > 0.5 || low) x[k] else if(high) x[k+1] else (x[k] + x[k+1])/2 } Qn0R <- function(x) { ## `R only' naive version of Qn() ==> slow and large memory for large n n <- length(x <- sort(x)) if(n == 0) return(NA) else if(n == 1) return(0.) 
k <- choose(n %/% 2 + 1, 2) m <- outer(x,x,"-")# abs not needed because of sort() sort(m[lower.tri(m)], partial = k)[k] } Sn0R <- function(x) { ## `R only' naive version of Sn() ==> slow and large memory for large n if((n <- length(x)) == 0) return(NA) else if(n == 1) return(0.) lomed(apply(abs(outer(x,x,"-")), 2, himed)) } ## Tol = 2e-7 : higher than usual is.all.equal <- function(x,y, tol = 2e-7, scale = 1) { ## scale = 1: ensures 'absolute error' in all cases ## scale = x: ensures `relative error' in all cases is.logical(r <- all.equal(x,y, tolerance = tol, scale = scale)) && r } robustbase/inst/xtraR/lmrob-trace_lev.R0000644000176200001440000000452212154325275017711 0ustar liggesusers## testing trace_lev settings require(robustbase) ## fit a model with categorical, continuous and mixed variables selDays <- c( ## days ranked according to number of outliers: "403", "407", "693", "405", "396", "453", "461", ## "476", "678", "730", "380", "406", "421", "441" ## ,"442", "454", "462", "472", "480", "488" ## some other days ## "712", "503", "666", "616", "591", "552", "624", "522", "509", "388", "606", "580", "573", "602", "686", "476", "708", "600", "567") contr <- list(julday=contr.sum) ## using this seed and the default configuration options, ## the fast_S algorithm stops with some "local exact fits", ## i.e., coefficients with std. error 0. 
set.seed(711) lseed <- .Random.seed r1 <- lmrob(LNOx ~ (LNOxEm + sqrtWS)*julday, NOxEmissions, julday %in% selDays, contrasts=contr, seed=lseed, max.it=10000, nResample=5, trace.lev=1) ## change best.r.s to 11 and it works properly ## (for this seed at least) res <- update(r1, k.max=10000, best.r.s = 3, nResample=1000, trace.lev=2) ##### ## fast_S (non-large strategy) ## test non-convergence warnings / trace output: res <- update(r1, max.it = 1) res <- update(r1, k.max = 1) ## test trace_levs: res <- update(r1, trace.lev = 0) res <- update(r1, trace.lev = 1) res <- update(r1, trace.lev = 2) res <- update(r1, trace.lev = 3) res <- update(r1, trace.lev = 4) res <- update(r1, trace.lev = 5) ##### ## M-S estimator r2 <- update(r1, init="M-S", split.type="fi", subsampling="simple", mts=10000) ## test non-convergence warnings / trace output: res <- update(r2, max.it = 1) res <- update(r2, k.m_s = 1) ## does not converge anyway ## test trace_levs: res <- update(r2, trace.lev = 0) res <- update(r2, trace.lev = 1) res <- update(r2, trace.lev = 2) res <- update(r2, trace.lev = 3) res <- update(r2, trace.lev = 4) ## this produces _a_lot_ of output: ## res <- update(r2, trace.lev = 5) ##### ## fast_S (large-n strategy) ## need to use continuous only design r3 <- update(r1, LNOx ~ LNOxEm + sqrtWS, subset=NULL, contrasts=NULL) ## test non-convergence warnings / trace output: res <- update(r3, max.it = 1) res <- update(r3, k.max = 1) ## test trace_levs: res <- update(r3, trace.lev = 0) res <- update(r3, trace.lev = 1) res <- update(r3, trace.lev = 2) res <- update(r3, trace.lev = 3) res <- update(r3, trace.lev = 4) ## (there is no level 5) robustbase/inst/xtraR/mcnaive.R0000644000176200001440000000433712115641034016251 0ustar liggesusersmcNaive <- function (x, method = c("h.use", "simple"), low = FALSE, high = FALSE) { ## Purpose: naive implementation of mc() ## ---------------------------------------------- ## (low, high) - as in mad() - for choosing the (lo/hi)-median with even 
n ## ## Author: Martin Maechler, Date: 21 Jul 2007 n <- length(x) if(n <= 2) return(0) x <- sort(x) stopifnot(is.finite(m <- median(x)))# <==> no NAs in x[] x <- x - m n1 <- length(xL <- x[x <= 0]) # both contain all (if any) median values n2 <- length(xR <- x[x >= 0]) n.n <- as.double(n1)*n2 if(n.n > 1e8)# 1e8 < .Machine$integer.max stop("\"simple\" method not sensible here: would need too much memory: n.n=", n.n) Mmedian <- { if ((low || high) && n.n %% 2 == 0) { if (low && high) stop("'low' and 'high' cannot be both TRUE") N2 <- n.n %/% 2 + as.integer(high) function(x) sort(x, partial = N2)[N2] } else median } method <- match.arg(method) switch(method, "simple" = { r <- outer(xR, xL, "+") / outer(xR, xL, "-") r[is.na(r)] <- 0 # simple -- ## ok only when the median-observations are "in the middle", ## e.g. *not* ok for c(-5, -1, 0, 0, 0, 1) Mmedian(r) }, "h.use" = { k <- sum(x == 0) ## the number of obs coinciding with median() irep <- rep.int(n1, n2) if(k > 0) { ## have some obs. == median ( == 0) h <- function(xl,xr, i,j) { ## must parallelize (!) eq <- xl == xr r <- xl xr <- xr[!eq] xl <- xl[!eq] r [eq] <- sign(i[eq]+j[eq]-1-k) r[!eq] <- (xr + xl)/(xr - xl) r } i <- integer(n1) j <- integer(n2) i[(n1-k+1):n1] <- j[1:k] <- 1:k i <- rep(i, times = n2) j <- rep(j, irep) } else { ## k == 0: h <- function(xl,xr, i,j) (xr + xl)/(xr - xl) i <- j <- NULL } ## build outer(xL, xR, FUN= h) manually, such that ## we can pass (i,j) properly : Mmedian(h(xl = rep(xL, times = n2), xr = rep(xR, irep), i, j)) }) } robustbase/inst/xtraR/subsample-fns.R0000644000176200001440000001273212271657124017416 0ustar liggesusers### Mainly used for source package checking in ../../tests/subsample.R ### however, available (for reproducible research, confirmation) as ### part of the robustbase package. ## R version of LU decomposition in subsample() in lmrob.c ## Modified from Golub G. H., Van Loan, C. 
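The kernel that mcNaive() medians over can be exercised in isolation. The following pure-R sketch (the helper name `mcTiny` is ours, not part of the package) covers only the tie-free case, i.e. the `k == 0` branch of method "h.use" above, where no observation coincides with the median:

```r
## Minimal, self-contained sketch of the medcouple kernel from mcNaive(),
## valid only when no observation equals the median (the k == 0 case):
## h(xl, xr) = (xr + xl) / (xr - xl), medianed over all (xl, xr) pairs.
mcTiny <- function(x) {
  x  <- sort(x) - median(x)
  xL <- x[x <= 0]                # left  half (values <= median)
  xR <- x[x >= 0]                # right half (values >= median)
  median(outer(xR, xL, function(xr, xl) (xr + xl) / (xr - xl)))
}
mcTiny(c(1, 2, 3, 7))   # right-skewed sample: 0.25
mcTiny(c(-2, -1, 1, 2)) # symmetric sample:    0
```

For samples with median ties, mcNaive() above replaces the resulting 0/0 kernel values by sign(i + j - 1 - k); this sketch deliberately omits that branch.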
F., Matrix Computations, 3rd edition LU.gaxpy <- function(A, pivot=TRUE, tol = 1e-7, verbose = FALSE) { A <- as.matrix(A) ## m x n matrix, n >= m >= 1 stopifnot((n <- ncol(A)) >= (m <- nrow(A)), m >= 1) ## precondition: cf0 <- max(abs(A)) A <- A / cf0 v <- double(m) ## work matrix ## these matrices will contain the results L <- diag(m) U <- matrix(0, m, m) p <- integer(m-1) ## pivots idc <- 1L:n ## which columns of A are used idr <- 1L:m ## how rows of A are permuted for(j in 1L:m) { sing <- TRUE while(sing) { if (length(idc) < j) break if (j == 1L) { v[j:m] <- A[idr[j:m], idc[j]] } else { rows <- 1L:(j-1L) z <- forwardsolve(L[rows, rows, drop=FALSE], A[idr[rows], idc[j]]) U[rows, j] <- z v[j:m] <- A[idr[j:m], idc[j]] - L[j:m, rows, drop=FALSE] %*% z if(verbose) cat("Step", j, "z=", sapply(z, function(x) sprintf("%.15f", x)), "\n v=", v, "\n") } if (j < m) { mu <- j mu <- if (pivot) which.max(abs(v[j:m])) + j - 1L else j if(verbose) ## debug possumDiv example cat(sprintf("R-Step: %i: ", j), round(abs(v[j:m]), 6), "\n", mu, v[mu], "\n") if (abs(v[mu]) >= tol) { ## singular: can stop here already p[j] <- mu if (pivot) { tmp <- v[j]; v[j] <- v[mu]; v[mu] <- tmp tmp <- idr[j]; idr[j] <- idr[mu]; idr[mu] <- tmp } L[(j+1L):m, j] <- v[(j+1L):m]/v[j] if (pivot && j > 1) { ## swap rows L[j,] <-> L[mu,] tmp <- L[j, rows]; L[j, rows] <- L[mu, rows]; L[mu, rows] <- tmp } } } U[j, j] <- v[j] if (abs(v[j]) < tol) { if(verbose) cat("* singularity detected in step ", j, "; candidate ", idc[j],"\n") idc <- idc[-j] } else sing <- FALSE }## {while} }## {for} list(L = L, U = U * cf0, p = p, idc = idc[1L:m], singular = sing) } Rsubsample <- function(x, y, mts=0, tolInverse = 1e-7) { if(!is.matrix(x)) x <- as.matrix(x) stopifnot((n <- length(y)) == nrow(x)) p <- ncol(x) storage.mode(x) <- "double" .C(robustbase:::R_subsample, x=x, y=as.double(y), n=as.integer(n), m=as.integer(p), beta=double(p), ind_space=integer(n), idc = integer(n), ## elements 1:p: chosen subsample idr = integer(n), 
lu = matrix(double(1), p,p), v=double(p), pivot = integer(p-1), Dr=double(n), Dc=double(p), rowequ=integer(1), colequ=integer(1), status=integer(1), sample = FALSE, ## set this to TRUE for sampling mts = as.integer(mts), ss = as.integer(mts == 0), tolinv = as.double(tolInverse), solve = TRUE) } ##' Simple version, just checking (non)singularity conformance tstSubsampleSing <- function(X, y) { lX <- X[sample(nrow(X)), ] ## C version zc <- Rsubsample(lX, y) ## R version zR <- LU.gaxpy(t(lX)) if (as.logical(zc$status)) { ## singularity in C detected if (!zR$singular) stop("singularity in C but not in R") } else { ## no singularity detected if (zR$singular) stop("singularity in R but not in C") } zR$singular } ##' Sophisticated version tstSubsample <- function(x, y=rnorm(n), compareMatrix = TRUE, lu.tol = 1e-7, lu.verbose=FALSE, tolInverse = lu.tol, eq.tol = .Machine$double.eps^0.5) { x0 <- x <- as.matrix(x) n <- nrow(x) p <- ncol(x) if(p <= 1) stop("wrong 'x': need at least two columns for these tests") stopifnot(length(y) == n) z <- Rsubsample(x, y, tolInverse=tolInverse) ## ---------- ## convert idc, idr and p to 1-based indexing: idr <- z$idr + 1L idc <- z$idc[1:p] + 1L pivot <- z$pivot + 1L ## get L and U L <- U <- LU <- matrix(z$lu, p, p) L[upper.tri(L, diag=TRUE)] <- 0 diag(L) <- 1 U[lower.tri(U, diag=FALSE)] <- 0 ## test solved parameter if (z$status == 0) { stopifnot(all.equal(z$beta, unname(solve(x[idc, ], y[idc])), tol=eq.tol)) } if (z$rowequ) x <- diag(z$Dr) %*% x if (z$colequ) x <- x %*% diag(z$Dc) if (z$rowequ || z$colequ) cat(sprintf("kappa before equilibration = %g, after = %g\n", kappa(x0), kappa(x))) LU. <- LU.gaxpy(t(x), tol=lu.tol, verbose=lu.verbose) ## -------- if (!isTRUE(all.equal(LU.$p, pivot, tolerance=0))) { cat("LU.gaxpy() and Rsubsample() have different pivots:\n") print(LU.$p) print(pivot) cat(" ... 
are different at indices:\n ") print(which(LU.$p != pivot)) } else { stopifnot(all.equal(LU.$L, L, tol=eq.tol), all.equal(LU.$U, U, tol=eq.tol), LU.$p == pivot, ## only compare the indices selected before stopping LU.$idc[seq_along(LU.$p)] == idc[seq_along(pivot)]) } ## compare with Matrix result if (compareMatrix && z$status == 0) { xsub <- x[idc, ] stopifnot(require("Matrix")) tmp <- lu(t(xsub)) ## idx <- upper.tri(xsub, diag=TRUE) stopifnot(all.equal(tmp@x, as.vector(z$lu), tol=eq.tol)) } invisible(z) } robustbase/inst/xtraR/test_MCD.R0000644000176200001440000003057512436421036016300 0ustar liggesusers#### Utility functions for testing covMcd() #### -------------------------------------- ../tests/tmcd.R ## "workhorse" -- by default *passed* to and called from doMCDdata(): domcd1 <- function(x, xname, nrep = 1, ## These are all got from doMCDdata() [yuck!] method = get("method", parent.frame()), # compromise time = get("time", parent.frame()), # compromise short = get("short", parent.frame()), # compromise full = get("full", parent.frame()), # compromise lname = 20) { if(short && full) stop("you should not set both 'full' and 'short' to TRUE") force(xname)# => evaluate when it is a data(<>, ..) 
call n <- dim(x)[1] p <- dim(x)[2] if(method == "MASS") { mcd <- MASS::cov.mcd(x) mcd$quan <- (n + p + 1) %/% 2 #default: floor((n+p+1)/2) } else if(method == "DetMCD") { mcd <- covMcd(x, nsamp="deterministic") # trace = FALSE } else { mcd <- covMcd(x) # trace = FALSE } if(full) { header <- get("header", parent.frame()) header(time) } xres <- sprintf("%*s %3d %3d %3d %12.6f", lname, xname, n, p, mcd$quan, mcd$crit) if(time) { xtime <- system.time(repMCD(x, nrep, method))[1]/nrep xres <- sprintf("%s %10.1f", xres, 1000 * xtime) } cat(xres, "\n") if(!short) { cat("Best subsample: \n") print(mcd$best) ibad <- which(unname(mcd$mcd.wt) == 0) nbad <- length(ibad) cat("Outliers: ",nbad, if(nbad > 0)":", "\n") if(nbad > 0) print(ibad) if(full) { cat("------------- *MCD() result: --------------------------\n") print(mcd) } cat("--------------------------------------------------------\n") } }## {domcd1} ##' Test the function covMcd() on the literature datasets: ##' ##' Call covMcd() for "all" datasets in robustbase / rrcov and print: ##' - execution time (if time is true) ##' - objective function ##' - best subsample found (if short is false) ##' - outliers identified (with cutoff 0.975) (if short is false) ##' - estimated center and covarinance matrix if full is true) ##' ##' @param nrep : [integer] number of repetitions to use for estimating the ##' (average) execution time ##' @param method : [character] select a method: one of (FASTMCD, MASS) ##' @param time : [logical] whether to evaluate the execution time ##' @param short : [logical] whether to do short output (i.e. only the ##' objective function value). If short == FALSE, ##' the best subsample and the identified outliers are ##' printed. See also the parameter full below ##' @param full : [logical] whether to print the estimated cente and covariance matrix ##' @param digits ##' @param domcd workhorse function, to be called e.g. 
as ##' @examples domcd(starsCYG, data(starsCYG), nrep) ##' @author Valentin Todorov; tweaks by Martin Maechler ##' @note Is called from ../../demo/determinMCD.R and ../../tests/tmcd.R doMCDdata <- function(nrep = 1, method = c("FASTMCD", "MASS", "DetMCD"), time = nrep >= 3, short = time, full = !short, digits=5, domcd = domcd1) { stopifnot(is.function(domcd), length(formals(domcd)) >= 3) options(digits = digits) method <- match.arg(method) # *is* then accessed from domcd(.) stopifnot(require("robustbase")) # all data() which do not specify package data(Animals, package = "MASS") brain <- Animals[c(1:24, 26:25, 27:28),] data(list = c("fish", "pottery", "rice", "un86", "wages"), package = "rrcov") tmp <- sys.call() cat("\nCall: ", deparse(substitute(tmp)),"\n") header <- function(time) { ## the string length here require 'lname <- 20' {FIXME} ## 1 2 ## 1 3 5 7 901 3 5 7 90 2 4 cat("Data Set n p h(alf) LOG(obj)",if(time)" Time [ms]","\n", "=============================================",if(time)"===========","\n", sep="") } if(full) { ## header() is called in each domcd() } else ## here header(time) domcd(bushfire, data(bushfire), nrep) domcd(heart[, 1:2], data(heart), nrep) domcd(starsCYG, data(starsCYG), nrep) domcd(stack.x, data(stackloss), nrep) domcd(data.matrix(subset(phosphor, select= -plant)),data(phosphor), nrep) domcd(data.matrix(subset(coleman, select = -Y)), data(coleman), nrep) domcd(data.matrix(subset(salinity, select = -Y)), data(salinity), nrep) domcd(data.matrix(subset(wood, select = -y)), data(wood), nrep) domcd(data.matrix(subset(hbk, select = -Y)), data(hbk), nrep) domcd(brain, "Animals", nrep) domcd(milk, data(milk), nrep) domcd(lactic, data(lactic), nrep) domcd(pension, data(pension), nrep) domcd(pilot, data(pilot), nrep) ## This is for CovMcdBig .... 
## domcd(radarImage, data(radarImage), nrep) ## domcd(NOxEmissions, data(NOxEmissions), nrep) domcd(data.matrix(subset(vaso, select = -Y)), data(vaso), nrep) domcd(data.matrix(subset(wagnerGrowth, select = -Period)), data(wagnerGrowth), nrep) ## Obs 14 has missing, column 7 is categorical domcd(fish[-14,-7], data(fish, package="rrcov"), nrep) domcd(pottery[,-7], data(pottery, package="rrcov"), nrep) domcd(rice, data(rice, package="rrcov"), nrep) domcd(un86, data(un86, package="rrcov"), nrep) ## there are missing values domcd(wages[-c(29, 31, 38),-9], data(wages, package="rrcov"), nrep) cat("========================================================\n") } ## {doMCDdata} if(FALSE){ data(mortality, package = "riv") mm <- as.data.frame(lapply(mortality, signif, 3)) for(j in c(1,2,6,7)) mm[,j] <- mm[,j] * 10 mm[,5] <- mm[,5] * 1000 mm[,8] <- mm[,8] / 100 mort3 <- mm dput(mort3) } ## which gives the equivalent of mort3 <- data.frame(MO70 = c(140, 101, 86, 102, 115, 121, 118, 76.6, 131, 112, 111, 112, 117, 118, 123, 122, 81.7, 108, 111, 109, 92.5, 83.9, 93.8, 135, 124, 126, 122, 120, 127, 115, 156, 95.1, 127, 129, 116, 82.3, 115, 106, 134, 94.9, 119, 111, 131, 85.6, 135, 126, 141, 152, 137, 151, 93.6, 84.2, 78, 50.2, 81.3, 112, 80.1, 125, 120, 143), MAGE = c(297, 277, 275, 268, 296, 327, 314, 258, 342, 278, 278, 313, 284, 272, 296, 277, 271, 296, 286, 250, 280, 270, 246, 301, 279, 287, 293, 271, 291, 295, 314, 267, 275, 307, 259, 251, 324, 285, 288, 254, 278, 287, 316, 287, 326, 309, 334, 369, 321, 311, 261, 272, 260, 244, 248, 277, 240, 295, 319, 346), CI68 = c(137, 137, 129, 129, 151, 157, 157, 157, 157, 202, 202, 202, 138, 160, 190, 191, 191, 191, 159, 159, 146, 146, 203, 203, 182, 166, 203, 203, 167, 167, 165, 153, 149, 149, 149, 157, 152, 183, 183, 183, 183, 183, 183, 111, 171, 148, 148, 148, 192, 160, 160, 172, 172, 172, 172, 101, 173, 173, 144, 181), MDOC = c(142, 80.4, 148, 167, 230, 187, 240, 149, 240, 195, 327, 377, 203, 160, 161, 68.7, 141, 120, 176, 105, 128, 
112, 98.9, 160, 209, 200, 153, 126, 157, 157, 145, 160, 158, 102, 195, 188, 250, 143, 157, 186, 114, 129, 129, 143, 186, 207, 144, 112, 157, 121, 168, 155, 144, 144, 120, 194, 93.6, 231, 185, 89.7), DENS = c(37, 37, 27, 32, 17, 13, 23, 19, 27, 29, 15, 15, 48, 34, 26, 47, 17, 10, 10, 18, 11, 13, 26, 19, 55, 17, 16, 7, 10, 17, 44, 13, 18, 26, 40, 22, 29, 7, 28, 10, 15, 1, 11, 10, 8, 13, 13, 6, 10, 26, 49, 28, 32, 18, 62, 15, 21, 18, 10, 12), NONW = c(4.22, 3.36, 0.67, 0.52, 2.51, 0.82, 4.07, 1.11, 2.86, 2.92, 2.74, 1.05, 7.23, 5.16, 3.44, 2.84, 1.84, 1.47, 0.62, 0.03, 0.96, 1.07, 1.74, 2.41, 0.45, 4.7, 4.45, 1.2, 0.64, 2.28, 4.13, 1.06, 4.02, 2.22, 5.6, 0.43, 2.34, 1.78, 2.81, 1.9, 3.09, 1.43, 2.58, 1.34, 0.78, 3.44, 2.07, 0.68, 1, 3.6, 3.92, 2.58, 2.66, 0.05, 0.86, 0.32, 3.02, 4.24, 1.26, 1.08), EDUC = c(454, 516, 601, 631, 565, 620, 661, 653, 661, 591, 568, 499, 685, 534, 539, 536, 560, 542, 680, 546, 648, 632, 601, 469, 458, 446, 521, 540, 661, 601, 480, 627, 506, 363, 551, 662, 518, 556, 484, 607, 562, 517, 521, 582, 629, 506, 534, 433, 459, 476, 492, 548, 517, 517, 468, 685, 483, 471, 678, 528), IN69 = c(86.9, 99.3, 113, 99.2, 104, 118, 113, 117, 125, 100, 104, 115, 122, 107, 135, 101, 123, 114, 114, 113, 108, 109, 100, 99.8, 102, 100, 110, 112, 111, 113, 92.7, 116, 86.3, 103, 86.4, 109, 116, 112, 104, 108, 103, 116, 99.3, 116, 114, 104, 105, 97, 102, 83.4, 101, 125, 117, 118, 90.3, 108, 92.4, 106, 126, 109)) ###'------*Generate* data for benchmarking ---------------------------------------- ##' Generates a location contaminated multivariate ##' normal sample of n observations in p dimensions ##' (1-eps) * N_p(0, I_p) + eps * N_(m,I_p) ##' where ##' m = (b,b,...,b) ##' Defaults: eps=0 and b=10 ##' @title Generate n x p location contaminated MV data ##' @param n number of observations ##' @param p number of variables ##' @param eps amount of contamination ##' @param b mean of "outliers" gendata <- function(n,p, eps=0, b=10) { if(missing(n) || missing(p)) 
stop("Please specify (n,p)") if(!is.numeric(eps) || length(eps) != 1 || eps < 0 || eps >= 0.5) stop("eps must be in [0,0.5)") X <- matrix(rnorm(n*p), n, p) nbad <- as.integer(eps * n) if(nbad > 0) { b <- rep(b, length = p) # recycle to p-vector ## = E[.] of bad obs. xind <- sample(n,nbad) X[xind,] <- X[xind, , drop=FALSE] + rep(b, each=nbad) } list(X=X, xind=if(nbad > 0) xind) } ##' Repeated calls to different MCD algorithms for timing purposes *only* repMCD <- function(x, nrep = 1, method = "FASTMCD") { stopifnot(length(nrep) == 1, nrep >= 1) switch(method, "FASTMCD" = replicate(nrep, covMcd(x)), "bestMCD" = replicate(nrep, covMcd(x, nsamp= "best")), "exactMCD" = replicate(nrep, covMcd(x, nsamp= "exact")), "DetMCD" = replicate(nrep, covMcd(x, nsamp="deterministic")), "MASS.best" = replicate(nrep, MASS::cov.mcd(x)),# uses nsamp = "best" ==> up to 5000 ## rrcov.control()$nsamp == 500 : "MASS.500" = replicate(nrep, MASS::cov.mcd(x, nsamp = 500)), ## otherwise: stop(gettextf("Method '%s' not yet implemented", method))) } repMCD.meths <- function() { switch.expr <- body(repMCD)[[3]] m <- names(switch.expr) m[m != ""] } if(FALSE) repMCD.meths() ## [1] "FASTMCD" "bestMCD" "DetMCD" "MASS.best" "MASS.500" ##' calls gendata(), repMCD() dogen <- function(nrep=1, eps=0.49, method = repMCD.meths(), ## "FASTMCD" is first p.set = c(2, 5, 10, 20, 30), n.set = c(100, 500, 1000, 10000, 50000), n.p.ratio = 5, seed = 1234) { domcd <- function(x, nrep=1){ ## system.time() *does* gc() xtime <- system.time(repMCD(x, nrep, method))[1]/nrep cat(sprintf("%6d %3d %12.2f\n", dim(x)[1], dim(x)[2], xtime)) xtime } set.seed(seed) method <- match.arg(method) mkL <- function(ch,m) paste(ch,m,sep="=") ans <- matrix(NA, length(n.set), length(p.set), dimnames = list(mkL("n",n.set), mkL("p",p.set))) cat(sprintf("Method: %-12s; nrep = %d\n", method, nrep), "------------------------------\n", " n p Time\n", "=======================\n", sep="") for(n in n.set) { n. 
<- mkL("n",n)
	for(p in p.set) {
	    if(n.p.ratio * p <= n) {
		xx <- gendata(n, p, eps)
		ans[n., mkL("p",p)] <- domcd(xx$X, nrep)
	    }
	}
    }
    cat("=======================\n")
    cat(sprintf("Total time: %11.2f\n", nrep * sum(ans, na.rm=TRUE)))
    structure(ans, nrep = nrep, method=method)
}## {dogen}

###' ------------------ These can only be used with rrcov :: CovMcd() --------------
docheck <- function(n, p, eps, ...) {
    xx <- gendata(n,p,eps)
    mcd <- CovMcd(xx$X, ...)
    check(mcd, xx$xind)
}

##' check if mcd is robust w.r.t. xind, i.e. check how many of xind
##' did not get zero weight
check <- function(mcd, xind){
    mymatch <- xind %in% which(mcd@wt == 0)
    length(xind) - length(which(mymatch))
}

---- robustbase/inst/Copyrights ----

The C code for Qn() and Sn(), src/qnsn.c, is based on Fortran code that
has been available from http://www.agoras.ua.ac.be/
(Antwerp Group On Robust & Applied Statistics) (-> Programs -> Robust)
by {at the time}

    Peter Rousseeuw (rousse@wins.uia.ac.be)
    Christophe Croux (croux@wins.uia.ac.be)
    Department of Mathematics and Computing
    Universitaire Instelling Antwerpen
    Universiteitsplein 1
    B-2610 Wilrijk (Antwerp)
    Belgium

and carries the note

    This file contains fortran functions for two new robust estimators
    of scale denoted as Qn and Sn, described in Rousseeuw and Croux (1993).
    These estimators have a high breakdown point and a bounded influence
    function.  The implementation given here is very fast (running in
    O(n log n) time) and needs little storage space.

	Rousseeuw, P.J. and Croux, C. (1993)
	"Alternatives to the Median Absolute Deviation",
	Journal of the American Statistical Association, Vol. 88, 1273-1283.

    This software may be used and copied freely for scientific
    and/or non-commercial purposes, provided reference is made to the
    above mentioned paper.

whereas Martin Maechler got explicit permission from P. Rousseeuw to
licence it under the GNU Public Licence.
----------------------------------------------------------------------------

---- robustbase/inst/doc/estimating.functions.R ----

## Called from ./lmrob_simulation.Rnw
##             ~~~~~~~~~~~~~~~~~~~~~

###########################################################################
## Prediction
###########################################################################
f.predict <- function (object, newdata = NULL, scale = sigma(object),
                       se.fit = FALSE, df = object$df.residual,
                       interval = c('none', 'confidence', 'prediction'),
                       level = 0.95, type = c('response'), terms = NULL,
                       na.action = na.pass, pred.var = res.var/weights,
                       weights = 1, cov = covariance.matrix(object), ...)
{
    ## Purpose: replace predict.lmrob from robustbase package
    ## ----------------------------------------------------------------------
    ## Arguments: See ?predict.lm
    ##            type = 'response' ('terms' is not supported)
    ##            terms argument is ignored
    ## ----------------------------------------------------------------------
    ## Author: Manuel Koller, Date: 6 Sept 2009, 12:16

    ## take as much from predict.lm as possible
    ## check arguments
    if (!missing(terms)) stop('predict.lmrob: terms argument is ignored')
    ## set data
    tt <- terms(object)
    if (missing(newdata) || is.null(newdata)) {
        mm <- X <- model.matrix(object)
        mmDone <- TRUE
        offset <- object$offset
    } else {
        Terms <- delete.response(tt)
        m <- model.frame(Terms, newdata, na.action = na.action,
                         xlev = object$xlevels)
        if (!is.null(cl <- attr(Terms, "dataClasses"))) .checkMFClasses(cl, m)
        X <- model.matrix(Terms, m, contrasts.arg = object$contrasts)
        offset <- rep(0, nrow(X))
        if (!is.null(off.num <- attr(tt, "offset")))
            for (i in off.num)
                offset <- offset + eval(attr(tt, "variables")[[i + 1]], newdata)
        if (!is.null(object$call$offset))
            offset <- offset + eval(object$call$offset, newdata)
        mmDone <- FALSE
    }
    n <-
length(object$residuals) p <- object$rank if (p < ncol(X) && !(missing(newdata) || is.null(newdata))) warning("prediction from a rank-deficient fit may be misleading") beta <- coef(object) ## ignoring piv here predictor <- drop(X %*% beta) if (!is.null(offset)) predictor <- predictor + offset interval <- match.arg(interval) type <- match.arg(type) if (se.fit || interval != "none") { res.var <- scale^2 if (type != "terms") { if (p > 0) { ## this is probably not optimal... ## cov <- covariance.matrix(object) ## set as argument ip <- diag(X %*% tcrossprod(cov, X)) } else ip <- rep(0, n) } } if (interval != "none") { tfrac <- qt((1 - level)/2, df) hwid <- tfrac * switch(interval, confidence = sqrt(ip), prediction = sqrt(ip + pred.var)) if (type != "terms") { predictor <- cbind(predictor, predictor + hwid %o% c(1, -1)) colnames(predictor) <- c("fit", "lwr", "upr") } } if (se.fit || interval != "none") se <- sqrt(ip) if (missing(newdata) && !is.null(na.act <- object$na.action)) { predictor <- napredict(na.act, predictor) if (se.fit) se <- napredict(na.act, se) } if (se.fit) list(fit = predictor, se.fit = se, df = df, residual.scale = sqrt(res.var)) else predictor } ## predict(obj, pred, interval = 'prediction') ## f.predict(obj, pred, interval = 'prediction') predict.lmRob <- function(object, newdata = NULL, scale = NULL, ...) { ## Purpose: extend predict() functionality to lmRob objects ## ---------------------------------------------------------------------- ## Arguments: ## ---------------------------------------------------------------------- ## Author: Manuel Koller, Date: 6 Sept 2009, 12:16 class(object) <- c(class(object), "lm") object$qr <- qr(sqrt(weights(object)) * model.matrix(object)) if (missing(scale)) scale <- object$scale predict.lm(object, newdata = newdata, scale = scale, ...) 
} ########################################################################### ## some helper functions ########################################################################### f.lmRob <- function(...) { ## Purpose: wrapper for lmRob ## ---------------------------------------------------------------------- ## Arguments: see ?lmRob ## ---------------------------------------------------------------------- ## Author: Manuel Koller, Date: 8 Oct 2009, 10:28 ## get arguments args <- list(...) ## update defaults: if (is.null(args$mxr)) args$mxr <- 2000 if (is.null(args$mxf)) args$mxf <- 500 if (is.null(args$mxs)) args$mxs <- 2000 ## get all arguments except the arguments of lmRob: uw <- c('formula', 'data', 'weights', 'subset', 'na.action', 'model', 'x', 'y', 'contrasts', 'nrep', 'genetic.control') ind <- if (is.null(names(args))) rep(FALSE, length(args)) else names(args) != '' & !names(args) %in% uw ## they go into control: control <- do.call("lmRob.control", args[ind]) ## now call lmRob do.call("lmRob", c(args[!ind], list(control = control))) } ## lmRob(y ~ x, d.data, control = lmRob.control(initial.alg = 'fast', efficiency = 0.95, weight = c('bisquare', 'bisquare'))) ## lmRob(y ~ x, d.data, initial.alg = 'fast', efficiency = 0.95, weight = c('bisquare', 'bisquare')) ## f.lmRob(y ~ x, d.data, initial.alg = 'fast', efficiency = 0.95, weight = c('bisquare', 'bisquare')) f.lmRob.S <- function(... 
, robust.control = lmRob.control()) { ## Purpose: call the S estimation procedure of lmRob ## ---------------------------------------------------------------------- ## Arguments: x: design matrix x ## y: vector of observations ## robust.control: control list of lmRob.control() ## ---------------------------------------------------------------------- ## Author: Manuel Koller, Date: 29 Oct 2009, 14:54 ## code: from lmRob.fit.compute, robust package version 0.3-9 robust.control$initial.alg = 'random' robust.control$estim = 'Initial' z <- lmRob(..., control = robust.control) class(z) <- 'lmrob.S' z } ## f.lmRob.S(rep(1,10), rnorm(10), lmRob.control(weight = c('bisquare', 'bisquare'))) f.eff2c.psi <- function(eff, weight='bisquare') { ## Purpose: convert lmRob efficiencies to c.psi ## ---------------------------------------------------------------------- ## Arguments: eff: lmRob efficiency ## weight: type of weight (weight argument in lmRob.control) ## ---------------------------------------------------------------------- ## Author: Manuel Koller, Date: 8 Oct 2009, 15:36 if(is.null(eff)) return(NULL) lw = casefold(weight) if (lw == 'bisquare') { if (eff == 0.95) 4.685061 else if (eff == 0.9) 3.882646 else if (eff == 0.85) 3.443689 else if (eff == 0.8) 3.136909 else NA } else if (lw == 'optimal') { if (eff == 0.95) 1.060158 else if (eff == 0.9) 0.9440982 else if (eff == 0.85) 0.8684 else if (eff == 0.8) 0.8097795 else NA } else NA } f.psi2c.chi <- function(weight) { ## Purpose: return lmRob defaults for c.chi ## ---------------------------------------------------------------------- ## Arguments: weight: type of weight ## ---------------------------------------------------------------------- ## Author: Manuel Koller, Date: 28 Jan 2010, 10:05 switch(weight, 'bisquare' = 1.5477, 'optimal' = 0.4047) } residuals.lmrob.S <- function(obj) obj$residuals robustness.weights <- function(x, ...) 
UseMethod("robustness.weights") ## Purpose: retrieve robustness weights from robust regression return ## object ## ---------------------------------------------------------------------- ## Arguments: obj: robust regression output object ## ---------------------------------------------------------------------- ## Author: Manuel Koller, Date: 6 Oct 2009, 13:42 robustness.weights.lmrob <- robustness.weights.default <- function(obj) naresid(obj$na.action, obj$w) robustness.weights.lm <- function(obj) { if (any(class(obj) %in% c('lmrob', 'f.lmrob'))) stop('Caution: returning incorrect weights') naresid(obj$na.action, rep(1, length(obj$resid))) } robustness.weights.rlm <- function(obj) naresid(obj$na.action, obj$w) robustness.weights.lmRob <- function(obj) { if (obj$robust.control$weight[2] != 'Optimal') { c.psi <- f.eff2c.psi(obj$robust.control$efficiency, obj$robust.control$weight[2]) rs <- obj$residuals / obj$scale obj$M.weights <- Mwgt(rs, c.psi, obj$robust.control$weight[2]) } naresid(obj$na.action, obj$M.weights) } ## t <- f.lmRob(y ~ x, d.data) ## t <- f.lmrob(y ~ x, d.data, method = 'SM') ## t <- f.lmRob(y ~ x, d.data, initial.alg = 'fast', efficiency = 0.95, weight = c('bisquare', 'bisquare')) ## t <- lmRob(y ~ x, d.data, control = lmRob.control(initial.alg = 'fast', efficiency = 0.95, weight = c('bisquare', 'bisquare'))) ## robustness.weights(t) robustness.weights.lmrob.S <- function(obj) { rstand <- resid(obj)/sigma(obj) Mwgt(rstand, obj$control$tuning.chi, obj$control$psi) } ## MM: Why on earth is this called covariance.matrix() ?? -- S and R standard is vcov() !! ## -- For lm, they are indeed identical; for lmrob, too ## HOWEVER, the *.rlm() method of cov..matrix() *differs* from vcov.rlm() -- why? covariance.matrix <- function(x, ...) 
UseMethod("covariance.matrix") ## Purpose: retrieve covariance matrix from robust regression return ## object ## ---------------------------------------------------------------------- ## Arguments: obj: robust regression output object ## ---------------------------------------------------------------------- ## Author: Manuel Koller, Date: 6 Oct 2009, 13:42 covariance.matrix.lmrob <- covariance.matrix.default <- function(obj) obj$cov covariance.matrix.rlm <- function(obj, method = 'XtWX') summary(obj, method)$cov covariance.matrix.lm <- function(obj) { s <- summary(obj) s$cov * s$sigma^2 } sigma <- function(x, ...) UseMethod("sigma") ## Purpose: retrieve scale estimate from robust regression return ## object ## ---------------------------------------------------------------------- ## Arguments: obj: robust regression output object ## ---------------------------------------------------------------------- ## Author: Manuel Koller, Date: 6 Oct 2009, 13:42 sigma.lmrob <- sigma.default <- function(obj) obj$scale sigma.lm <- function(obj) summary(obj)$sigma sigma.rlm <- function(obj) obj$s converged <- function(x, ...) 
UseMethod("converged") ## Purpose: check convergence status of return object ## ---------------------------------------------------------------------- ## Arguments: obj: robust regression output object ## ---------------------------------------------------------------------- ## Author: Manuel Koller, Date: 6 Oct 2009, 13:42 converged.default <- function(obj) is.list(obj) && !is.null(obj$converged) && obj$converged converged.lm <- function(obj) if (is.null(obj$converged)) TRUE else obj$converged converged.lmRob <- function(obj) is.list(obj) && !is.null(obj$est) && obj$est == 'final' ########################################################################### ## alternative estimation methods ########################################################################### lmrob.u <- function(formula, data, subset, weights, na.action, ..., start) { ## Purpose: update lmrob object if possible ## ---------------------------------------------------------------------- ## Arguments: (lmrob arguments) ## start: object to update ## ---------------------------------------------------------------------- ## Author: Manuel Koller, Date: 28 Jul 2010, 08:30 args <- as.list(match.call())[-1] args$start <- NULL if (!missing(start)) { ## if start is a string, get start from parent if (is.character(start)) start <- get(start, envir=parent.frame()) if (class(start) == 'lmrob') { ## check whether we can update start easily oldargs <- as.list(start$call)[-1] if (isTRUE(all.equal(args, oldargs))) return(start) else { ret <- start ## check method argument (ignore cov argument) if (is.null(oldargs$method)) oldargs$method <- start$control$method if (oldargs$method == 'MM') oldargs$method <- 'SM' if (is.null(args$method) || args$method == 'MM') args$method <- 'SM' rest.ok <- isTRUE(all.equal(oldargs[!names(oldargs) %in% c('method', 'cov')], args[!names(args) %in% c('method', 'cov')])) if (is.null(start$x)) stop('x matrix not found. 
Use lmrob(..., x = TRUE).') if (args$method != oldargs$method && rest.ok) { ## method is different, but the rest is the same oldsteps <- strsplit(oldargs$method, "")[[1]] steps <- strsplit(args $method, "")[[1]] ## reduce start to largest common initial estimator while(length(oldsteps) > length(steps) || any(oldsteps != steps[seq_along(oldsteps)])) { elems <- c('na.action', 'offset', 'contrasts', 'xlevels', 'terms', 'model', 'x', 'y', 'degree.freedom', 'df.residual', 'call') ret <- c(ret$init, start[elems[elems %in% names(ret)]]) class(ret) <- 'lmrob' oldsteps <- oldsteps[-length(oldsteps)] } ret$call$method <- args$method steps <- steps[- seq_along(oldsteps)] if (length(steps) > 0) { ret$cov <- NULL for (step in steps) { ret <- switch(step, D = lmrob..D..fit(ret), M = lmrob..M..fit(obj = ret), N = { y <- model.response(ret$model) ## taus are standardized because otherwise ## the resulting efficiency is lower tau <- ret$tau / mean(ret$tau) tmp <- lmrob..M..fit(x = ret$x/tau, y = y/tau, obj = ret) tmp$residuals <- y - ret$x %*% ret$coef tmp$qr <- NULL tmp }, stop("only M or D steps supported")) if (!ret$converged) { warning(step, "-step did NOT converge.") break } } } else { if (is.null(ret$qr)) ret$qr <- qr(ret$x * sqrt(ret$weights)) ret$rank <- ret$qr$rank } } ## update covariance matrix if (rest.ok) { if (is.null(args$cov)) args$cov <- lmrob.control(method=ret$control$method)$cov ret$cov <- vcov(ret, args$cov) ret$control$cov <- args$cov ret$call$cov <- args$cov return(ret) } } } } ## if we're here, update failed or there was no start cl <- match.call() cl$start <- NULL cl[[1]] <- as.symbol("lmrob") eval(cl, envir = parent.frame()) } ## lmrob.u <- function(formula, data, subset, weights, na.action, ..., start) ## { ## cl <- match.call() ## cl$start <- NULL ## cl[[1]] <- as.symbol("lmrob") ## eval(cl, envir = parent.frame()) ## ## do.call('lmrob', args, envir = parent.frame()) ## } ## set.seed(0); d.data <- data.frame(y = rnorm(10), x = 1:10) ## lres <- lmrob(y ~ 
x, d.data, method = 'SM', psi = 'lgw', cov = '.vcov.avar1') ## obj1 <- lmrob(y ~ x, d.data, method = 'SM', psi = 'lgw', cov = '.vcov.w') ## test <- lmrob.u(y ~ x, d.data, method = 'SM', psi = 'lgw', cov = '.vcov.w', ## start = 'lres') ## all.equal(obj1, test) ## obj2 <- lmrob(y ~ x, d.data, method = 'SMD', psi = 'lgw', cov = '.vcov.w') ## test <- lmrob.u(y ~ x, d.data, method = 'SMD', psi = 'lgw', ## start = 'lres') ## all.equal(obj2, test[names(obj2)], check.attr = FALSE) ## obj3 <- lmrob(y ~ x, d.data, method = 'SMDM', psi = 'lgw', cov = '.vcov.w') ## test <- lmrob.u(y ~ x, d.data, method = 'SMDM', psi = 'lgw', ## start = 'lres') ## all.equal(obj3, test[names(obj3)], check.attr = FALSE) ## test <- lmrob.u(y ~ x, d.data, method = 'SMDM', psi = 'lgw', ## start = 'obj2') ## all.equal(obj3, test[names(obj3)], check.attr = FALSE) ## test <- lmrob.u(y ~ x, d.data, method = 'SM', psi = 'lgw', cov = '.vcov.w', ## start = obj3) ## all.equal(obj1, test[names(obj1)], check.attr = FALSE) ##' Compute the MM-estimate with corrections qE or qT as in ##' Maronna, R. A., Yohai, V. J., 2010. ##' Correcting MM estimates for "fat" data sets. ##' Computational Statistics & Data Analysis 54 (12), 3168–3173. ##' @title MM-estimate with Maronna-Yohai(2010) corrections ##' @param formula ##' @param data ##' @param subset ##' @param weights ##' @param na.action ##' @param ... 
##' @param type ##' @return ##' @author Manuel Koller lmrob.mar <- function(formula, data, subset, weights, na.action, ..., type = c("qE", "qT")) { ## get call and modify it so that ## lmrob returns the appropriate S-estimate cl <- match.call() method <- if (is.null(cl$method)) { if (!is.null(cl$control)) list(...)[["control"]]$method else 'MM' } else cl$method cl$type <- NULL cl$method <- 'S' cov <- if(!is.null(cl$cov)) cl$cov else '.vcov.w' cl$cov <- 'none' cl[[1]] <- as.symbol("lmrob") ## get S-estimate obj <- eval(cl, envir = parent.frame()) ## correct S-scale estimate according to formula n <- length(obj$resid) p <- obj$rank type <- match.arg(type) ## for type qE: adjust tuning.chi (h0) to account for different delta if (type == 'qE') { if (obj$control$psi != 'bisquare')## FIXME: "tukey" should work, too stop('lmrob.mar: type qE is only available for bisquare psi') h0 <- uniroot(function(c) robustbase:::lmrob.bp('bisquare', c) - (1-p/n)/2, c(1, 3))$root ## update scale obj$scale <- obj$scale * obj$control$tuning.chi / h0 obj$control$tuning.chi <- h0 } ## calculate q q <- switch(type, "qT" = { rs <- obj$resid / obj$scale ## \hat a = \mean \rho(r/sigma)^2 ## obj$control$tuning.chi == h_0 ahat <- mean(Mpsi(rs, obj$control$tuning.chi, obj$control$psi)^2) ## \hat b = \mean \rho''(r/sigma) bhat <- mean(Mpsi(rs, obj$control$tuning.chi, obj$control$psi, 1)) ## \hat c = \mean \rho'(r/sigma) * r/sigma chat <- mean(Mpsi(rs, obj$control$tuning.chi, obj$control$psi)*rs) ## qT: 1 + p*ahat/n/2/bhat/chat }, "qE" = 1 / (1 - (1.29 - 6.02/n)*p/n) , stop("unknown type ", type)) ## update scale obj$scale.uncorrected <- obj$scale obj$scale <- q * obj$scale ## add M step if requested if (method %in% c('MM', 'SM')) { obj$control$cov <- cov obj <- lmrob..M..fit(obj = obj) ## construct a proper lmrob object elems <- c('na.action', 'offset', 'contrasts', 'xlevels', 'terms', 'model', 'x', 'y') obj <- c(obj, obj$init.S[elems[elems %in% names(obj$init.S)]]) obj$degree.freedom <- 
obj$df.residual <- n - obj$rank } else if (method != 'S') stop("lmrob.mar: Only method = S, SM and MM supported.") ## update class class(obj) <- 'lmrob' ## return obj } ## summary(lmrob(y ~ x, d.data)) ## summary(lmrob.mar(y ~ x, d.data, type = 'qE')) ## summary(tmp <- lmrob.mar(y ~ x, d.data, type = 'qT')) ## this function calculates M-estimate of scale ## with constants as used for S-estimate with maximum breakdown point lmrob.mscale <- function(e, control, p = 0L) { ret <- .C("R_lmrob_S", x = as.double(e), ## this is ignored y = as.double(e), n = as.integer(length(e)), p = as.integer(p), ## divide the sum by n - p nResample = 0L, ## find scale only scale = as.double(mad(e)), coef = double(1), as.double(control$tuning.chi), as.integer(robustbase:::.psi2ipsi(control$psi)), as.double(control$bb), ## delta best_r = as.integer(control$best.r.s), groups = as.integer(control$groups), n.group = as.integer(control$n.group), k.fast.s = as.integer(control$k.fast.s), k.max = as.integer(control$k.max), maxit.scale = as.integer(control$maxit.scale), refine.tol = as.double(control$refine.tol), inv.tol = as.double(control$solve.tol), converged = logical(1), trace.lev = as.integer(0), mts = as.integer(control$mts), ss = robustbase:::.convSs(control$subsampling), fast.s.large.n = as.integer(length(e)+1), PACKAGE = 'robustbase') ret$scale } lmrob.dscale <- function(r, control, kappa = robustbase:::lmrob.kappa(control = control)) { tau <- rep.int(1, length(r)) w <- Mwgt(r, control$tuning.psi, control$psi) scale <- sqrt(sum(w * r^2) / kappa / sum(tau^2*w)) psi <- control$psi c.psi <- robustbase:::.psi.conv.cc(psi, control$tuning.psi) ret <- .C("R_find_D_scale", r = as.double(r), kappa = as.double(kappa), tau = as.double(tau), length = as.integer(length(r)), scale = as.double(scale), c = as.double(c.psi), ipsi = robustbase:::.psi2ipsi(psi), type = 3L, ## dt1 as only remaining option rel.tol = as.double(control$rel.tol), k.max = as.integer(control$k.max), converged = logical(1), 
             PACKAGE = 'robustbase')
    ret$scale
}

## sd.trim function by Gregor Gorjanc
## from http://ggorjan.blogspot.com/2008/11/trimmed-standard-deviation.html
## with added correction factor to be unbiased at the normal
sd.trim <- function(x, trim=0, na.rm=FALSE, ...) {
    if(!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
        warning("argument is not numeric or logical: returning NA")
        return(NA_real_)
    }
    if(na.rm) x <- x[!is.na(x)]
    if(!is.numeric(trim) || length(trim) != 1)
        stop("'trim' must be numeric of length one")
    n <- length(x)
    if(trim > 0 && n > 0) {
        if(is.complex(x)) stop("trimmed sds are not defined for complex data")
        if(trim >= 0.5) return(0)
        lo <- floor(n * trim) + 1
        hi <- n + 1 - lo
        x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
    }
    corr <- if (0 < trim && trim < 0.5) {
        z <- qnorm(trim, lower.tail=FALSE)# = Phi^{-1}(1 - tr)
        sqrt(1 - 2/(1-2*trim) *z*dnorm(z))
    } else 1
    sd(x)/corr
}

---- robustbase/inst/doc/lmrob_simulation.Rnw ----

\documentclass[11pt, a4paper]{article}
\usepackage[a4paper, text={16cm,25cm}]{geometry}
%% To regenerate the vignette's plots, replace
%%     "eval=FALSE,"  with  "eval=TRUE,"
%% everywhere (without quotes but *WITH* the ",")
%\VignetteIndexEntry{Simulations for Robust Regression Inference in Small Samples}
%\VignetteDepends{robustbase, xtable, ggplot2, GGally, RColorBrewer, grid, reshape2}
\usepackage{amsmath}
\usepackage{natbib}
\usepackage[utf8]{inputenc}
\newcommand{\makeright}[2]{\ifx#1\left\right#2\else#1#2\fi}
\newcommand{\Norm}[2][\left]{\mathcal N #1( #2 \makeright{#1}{)}}
\newcommand{\norm}[1] {\| #1 \|}
\newcommand{\bld}[1]{\boldsymbol{#1}} % shortcut for bold symbol
\newcommand{\T}[1] {\texttt{#1}}
\DeclareMathOperator{\wgt}{w}
\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\diag}{diag}
\DeclareMathOperator{\median}{median}
\DeclareMathOperator{\mad}{mad}
\DeclareMathOperator{\Erw}{\mathbf{E}}
\SweaveOpts{prefix.string=plot, eps = FALSE, pdf = TRUE, strip.white=true}
\SweaveOpts{width=6, height=4} \usepackage[noae]{Sweave} \begin{document} \setkeys{Gin}{width=\textwidth} \setlength{\abovecaptionskip}{-5pt} <>= ## set options options(width=60) ## number of workers to start if(FALSE) {## good for pkg developers options(cores= max(1, parallel::detectCores() - 2)) } else { ## CRAN allows maximum of 2: options(cores= min(2, parallel::detectCores())) } ## Number of Repetitions: N <- 1000 ## get path (= ../inst/doc/ in source pkg) robustDoc <- system.file('doc', package='robustbase') robustDta <- robustDoc ## initialize (packages, data, ...): source(file.path(robustDoc, 'simulation.init.R')) ## set the amount of trimming used in calculation of average results trim <- 0.1 ## set truncation limits trunc <- c(0.02, 0.14) trunc.plot <- c(0.0185, 0.155) <>= ## load required packages for graphics stopifnot(require(ggplot2), require(GGally),# for ggpairs() which replaces ggplot2::plotmatrix() require(grid), require(reshape2)) source(file.path(robustDoc, 'graphics.functions.R')) ## set ggplot theme theme <- theme_bw(base_size = 10) theme$legend.key.size <- unit(0.9, "lines") theme$plot.margin <- unit(c(1/2, 0, 0, 0), "lines") theme_set(theme) ## set default sizes for lines and points update_geom_defaults("point", aes(size = 4/3)) update_geom_defaults("line", aes(size = 1/4)) update_geom_defaults("hline", aes(size = 1/4)) update_geom_defaults("smooth", aes(size = 1/4)) ## alpha value for plots with many points alpha.error <- 0.3 alpha.n <- 0.4 f.truncate <- function(x, up = trunc.plot[2], low = trunc.plot[1]) { x[x > up] <- up x[x < low] <- low x } ## ggplot 0.9.1 and before if (packageVersion("ggplot2") <= "0.9.1") { g.truncate.lines <- geom_hline(yintercept = trunc, color = theme$panel.border()$gp$col) g.truncate.line <- geom_hline(yintercept = trunc[2], color = theme$panel.border()$gp$col) g.truncate.areas <- annotate("rect", xmin=rep(-Inf,2), xmax=rep(Inf,2), ymin=c(0,Inf), ymax=trunc, fill = theme$panel.grid.major()$gp$col) 
g.truncate.area <- annotate("rect", xmin=-Inf, xmax=Inf, ymin=trunc[2], ymax=Inf, fill = theme$panel.grid.major()$gp$col) } else { ## ggplot 0.9.2 and after g.truncate.lines <- geom_hline(yintercept = trunc, color = theme$panel.border$colour) g.truncate.line <- geom_hline(yintercept = trunc[2], color = theme$panel.border$colour) g.truncate.areas <- annotate("rect", xmin=rep(-Inf,2), xmax=rep(Inf,2), ymin=c(0,Inf), ymax=trunc, fill = theme$panel.grid.major$colour) g.truncate.area <- annotate("rect", xmin=-Inf, xmax=Inf, ymin=trunc[2], ymax=Inf, fill = theme$panel.grid.major$colour) } legend.mod <- list(`SMD.Wtau` = expression(paste('SMD.W',tau)), `SMDM.Wtau` = expression(paste('SMDM.W',tau)), `MM.Avar1` = expression(paste('MM.',Avar[1])), `MMqT` = expression(paste('MM',q[T])), `MMqT.Wssc` = expression(paste('MM',q[T],'.Wssc')), `MMqE` = expression(paste('MM',q[E])), `MMqE.Wssc` = expression(paste('MM',q[E],'.Wssc')), `sigma_S` = expression(hat(sigma)[S]), `sigma_D` = expression(hat(sigma)[D]), `sigma_S*qE` = expression(q[E]*hat(sigma)[S]), `sigma_S*qT` = expression(q[T]*hat(sigma)[S]), `sigma_robust` = expression(hat(sigma)[robust]), `sigma_OLS` = expression(hat(sigma)[OLS]), `t1` = expression(t[1]), `t3` = expression(t[3]), `t5` = expression(t[5]), `cskt(Inf,2)` = expression(paste('cskt(',infinity,',2)')) ) @% end{graphics-setup} \title{Simulations for Sharpening Wald-type Inference in Robust Regression for Small Samples} \author{Manuel Koller} \maketitle \tableofcontents \section{Introduction} In this vignette, we recreate the simulation study of \citet{KS2011}. This vignette is supposed to complement the results presented in the above cited reference and render its results reproducible. Another goal is to provide simulation functions, that, with small changes, could also be used for other simulation studies. Additionally, in Section~\ref{sec:maximum-asymptotic-bias}, we calculate the maximum asymptotic bias curves of the $\psi$-functions used in the simulation. 
\section{Setting} The simulation setting used here is similar to the one in \citet{maronna2009correcting}. We simulate $N = \Sexpr{N}$ repetitions. To repeat the simulation, we recommend using a small value of $N$ here, since for large $n$ and $p$, computing all the replicates will take days. \subsection{Methods} We compare the methods \begin{itemize} \item MM, SMD, SMDM as described in \citet{KS2011}. These methods are available in the package \T{robustbase} (\T{lmrob}). \item MM as implemented in the package \T{robust} (\T{lmRob}). This method will be denoted as \emph{MMrobust} later on. \item MM using S-scale correction by $q_{\rm T}$ and $q_{\rm E}$ as proposed by \citet{maronna2009correcting}. $q_{\rm T}$ and $q_{\rm E}$ are defined as follows. \begin{equation*} q_{\rm E} = \frac{1}{1 - (1.29 - 6.02/n)p/n}, \end{equation*} \begin{equation*} \hat q_{\rm T} = 1 + \frac{p}{2n}\frac{\hat a}{\hat b\hat c}, \end{equation*} where \begin{equation*} \hat a = \frac{1}{n}\sum_{i=1}^n \psi\left(\frac{r_i}{\hat\sigma_{\rm S}}\right)^2, \hat b = \frac{1}{n} \sum_{i=1}^n\psi'\left(\frac{r_i}{\hat\sigma_{\rm S}}\right),%' \hat c = \frac{1}{n}\sum_{i=1}^n \psi\left(\frac{r_i}{\hat\sigma_{\rm S}}\right) \frac{r_i}{\hat\sigma_{\rm S}}, \end{equation*} with $\psi = \rho'$,%' $n$ the number of observations, $p$ the number of predictor variables, $\hat\sigma_{\rm S}$ is the S-scale estimate and $r_i$ is the residual of the $i$-th observation. When using $q_{\rm E}$ it is necessary to adjust the tuning constants of $\chi$ to account for the dependence of $\kappa$ on $p$. For $q_{\rm T}$ no change is required. This method is implemented as \T{lmrob.mar()} in the source file \T{estimating.functions.R}. \end{itemize} \subsection{$\psi$-functions} We compare \emph{bisquare}, \emph{optimal}, \emph{lqq} and \emph{Hampel} $\psi$-functions. They are illustrated in Fig.~\ref{fig:psi.functions}. The tuning constants used in the simulation are compiled in Table~\ref{tab:psi-functions}. 
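The S-scale correction factors $q_{\rm E}$ and $\hat q_{\rm T}$ defined above can be computed directly from the standardized residuals. The following is a minimal base-R sketch, not the package implementation: it uses an inline bisquare $\psi$ at the 95\%-efficiency tuning constant in place of \T{Mpsi()}, and \T{rs} is a hypothetical stand-in for $r_i/\hat\sigma_{\rm S}$.

```r
## bisquare psi and its derivative (closed forms; cc = tuning constant)
psi.bisq  <- function(x, cc) ifelse(abs(x) <= cc, x * (1 - (x/cc)^2)^2, 0)
dpsi.bisq <- function(x, cc) {             # psi'(x) = (1 - u)(1 - 5u), u = (x/cc)^2
    u <- (x/cc)^2
    ifelse(abs(x) <= cc, (1 - u) * (1 - 5 * u), 0)
}

## q_E depends only on n and p:
qE <- function(n, p) 1 / (1 - (1.29 - 6.02/n) * p/n)

## \hat q_T from the standardized residuals rs = r_i / sigma_S:
qT <- function(rs, n, p, cc = 4.685061) {
    ahat <- mean(psi.bisq(rs, cc)^2)       # \hat a
    bhat <- mean(dpsi.bisq(rs, cc))        # \hat b
    chat <- mean(psi.bisq(rs, cc) * rs)    # \hat c
    1 + p/(2*n) * ahat/(bhat*chat)
}

set.seed(27)
n <- 50; p <- 5
rs <- rnorm(n)      # hypothetical standardized residuals
qE(n, p)            # > 1, grows with p/n
qT(rs, n, p)        # > 1 for well-behaved residuals
```

Note that $q_{\rm E}$ is a function of $(n,p)$ alone, while $\hat q_{\rm T}$ adapts to the observed residuals; \T{lmrob.mar()} multiplies $\hat\sigma_{\rm S}$ by the chosen factor before the final M-step.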
Note that the \emph{Hampel} $\psi$-function is tuned to have a downward slope of $-1/3$ instead of the originally proposed $-1/2$. This was set to allow for a comparison to an even slower descending $\psi$-function. %% generate table of tuning constants used for \psi functions \begin{table}[ht] \begin{center} <>= ## get list of psi functions lst <- lapply(estlist$procedures, function(x) { if (is.null(x$args)) return(list(NULL, NULL, NULL)) if (!is.null(x$args$weight)) return(list(x$args$weight[2], round(f.psi2c.chi(x$args$weight[1]),3), round(f.eff2c.psi(x$args$efficiency, x$args$weight[2]),3))) return(list(x$args$psi, round(if (is.null(x$args$tuning.chi)) lmrob.control(psi=x$args$psi)$tuning.chi else x$args$tuning.chi,3), round(if (is.null(x$args$tuning.psi)) lmrob.control(psi=x$args$psi)$tuning.psi else x$args$tuning.psi,3))) }) lst <- unique(lst) ## because of rounding, down from 21 to 5 ! lst <- lst[sapply(lst, function(x) !is.null(x[[1]]))] # 5 --> 4 ## convert to table tbl <- do.call(rbind, lst) tbl[,2:3] <- apply(tbl[,2:3], 1:2, function(x) { gsub('\\$NA\\$', '\\\\texttt{NA}', paste('$', unlist(x), collapse=', ', '$', sep='')) }) tbl[,1] <- paste('\\texttt{', tbl[,1], '}', sep='') colnames(tbl) <- paste('\\texttt{', c('psi', 'tuning.chi', 'tuning.psi'), '}', sep='') print(xtable(tbl), sanitize.text.function=identity, include.rownames = FALSE, floating=FALSE) @ %def \vspace{15pt} \caption{Tuning constants of $\psi$-functions used in the simulation.} \label{tab:psi-functions} \end{center} \end{table} \begin{figure} \begin{center} <>= f.plot.psi <- function(x, psi) { cc <- lmrob.control(psi = psi)$tuning.psi data.frame(x=x, value=Mpsi(x, cc, psi), psi = psi) } x <- seq(0, 10, length.out = 1000) tmp <- rbind(f.plot.psi(x, 'optimal'), f.plot.psi(x, 'bisquare'), f.plot.psi(x, 'lqq'), f.plot.psi(x, 'hampel')) print(ggplot(tmp, aes(x, value, color = psi)) + geom_line() + ylab(expression(psi(x))) + scale_color_discrete(name = expression(paste(psi, '-function')))) @ 
\includegraphics{plot-fig-psi-functions}
\end{center}
\caption{$\psi$-functions used in the simulation.}
\label{fig:psi.functions}
\end{figure}
\subsection{Designs}
Two types of designs are used in the simulation: fixed and random. The
fixed design has $n=20$ observations, $p=1+3$ predictors (including an
intercept column) and strong leverage points; it is shown in
Fig.~\ref{fig:design-predict}. The other designs are random, i.e.,
regenerated for every repetition, and the corresponding models are fitted
without an intercept. We use the same distribution to generate the designs
as for the errors. The numbers of observations simulated are
$n = 25, 50, 100, 400$ and the ratios of the number of parameters to the
number of observations are $p/n = 1/20, 1/10, 1/5, 1/3, 1/2$. Where
necessary, $p$ is rounded down to the nearest integer. The random datasets
are generated using the following code.
<>=
f.gen <- function(n, p, rep, err) {
    ## get function name and parameters
    lerrfun <- f.errname(err$err)
    lerrpar <- err$args
    ## generate random predictors
    ret <- replicate(rep, matrix(do.call(lerrfun, c(n = n*p, lerrpar)), n, p),
                     simplify=FALSE)
    attr(ret[[1]], 'gen') <- f.gen
    ret
}
ratios <- c(1/20, 1/10, 1/5, 1/3, 1/2)## p/n
lsit <- expand.grid(n = c(25, 50, 100, 400), p = ratios)
lsit <- within(lsit, p <- as.integer(n*p))
.errs.normal.1 <- list(err = 'normal', args = list(mean = 0, sd = 1))
for (i in 1:NROW(lsit))
    assign(paste('rand',lsit[i,1],lsit[i,2],sep='_'),
           f.gen(lsit[i,1], lsit[i,2], rep = 1, err = .errs.normal.1)[[1]])
@
An example design is shown in Fig.~\ref{fig:example.design}.
\begin{figure}
\begin{center}
<>=
colnames(rand_25_5) <- paste0("X", 1:5) # workaround new (2014-12) change in GGally
print(ggpairs(rand_25_5, axisLabels="show", title = "rand_25_5: n=25, p=5"))
@
\includegraphics{plot-fig-example-design}
\end{center}
\caption{Example random design.}
\label{fig:example.design}
\end{figure}
\subsection{Error Distributions}
We simulate the following error distributions:
\begin{itemize}
\item standard normal distribution,
\item $t_5$, $t_3$, $t_1$,
\item centered skewed $t$ with $df = \infty, 5$ and $\gamma = 2$ (denoted
  by \emph{cskt}$(\infty,2)$ and \emph{cskt}$(5,2)$, respectively), as
  introduced by \citet{fernandez1998bayesian}, simulated using the \T{R}
  package \T{skewt},
\item contaminated normal, $\Norm{0,1}$ contaminated with $10\%$
  $\Norm{0, 10}$ (symmetric, \emph{cnorm}$(0.1,0,3.16)$) or $\Norm{4, 1}$
  (asymmetric, \emph{cnorm}$(0.1,4,1)$).
\end{itemize}
\subsection{Covariance Matrix Estimators}
For the standard MM estimator, we compare ${\rm Avar}_1$ of \citet{croux03}
and the empirical weighted covariance matrix estimate corrected by Huber's
small sample correction as described in \citet{HubPR09} (denoted by
\emph{Wssc}). The latter is also used for the variant of the MM estimate
proposed by \citet{maronna2009correcting}. For the SMD and SMDM variants we
use the covariance matrix estimate as described in \citet{KS2011}
(\emph{W$\tau$}). The covariance matrix estimate consists of three parts:
\begin{equation*}
  {\rm cov}(\hat\beta) = \sigma^2\gamma\bld V_{\bld X}^{-1}.
\end{equation*}
The SMD and SMDM methods of \T{lmrob} use the following defaults.
\begin{equation}
  \label{eq:gammatau}
  \hat\gamma = \frac{\frac{1}{n}\sum_{i=1}^n\tau_i^2
    \psi\left(\frac{r_i}{\tau_i\hat\sigma}\right)^2}
  {\frac{1}{n}\sum_{i=1}^n\psi'\left(\frac{r_i}{\tau_i\hat\sigma}\right)}
\end{equation}
where $\tau_i$ is the rescaling factor used for the D-scale estimate (see
\citet{KS2011}).
\noindent\textbf{Remark: } Equation \eqref{eq:gammatau} is a corrected
version of $\gamma$. It was changed in \texttt{robustbase} version
\texttt{0.91} (April 2014) to ensure that the equation reduces to $1$ in
the classical case ($\psi(x) = x$). If the former (incorrect) version is
needed for compatibility reasons, it can be obtained by adding the
argument \texttt{cov.corrfact = "tauold"}.
\begin{equation*}
  \bld{\widehat V}_{\bld X} =
  \frac{1}{\frac{1}{n}\sum_{i=1}^n\wgt_{ii}}\bld X^T\bld W\bld X
\end{equation*}
where $\bld W = \diag\left(\wgt\left(\frac{r_1}{\hat\sigma}\right), \dots,
\wgt\left(\frac{r_n}{\hat\sigma}\right)\right)$. The function
$\wgt(r) = \psi(r)/r$ produces the robustness weights.
\section{Simulation}
The main loop of the simulation is fairly simple. (This code is only run
if there are no aggregate results available.)
%% set eval to TRUE for chunks simulation-run and simulation-aggr
%% if you really want to run the simulations again.
%% (better fail with an error than run for weeks)
<>=
aggrResultsFile <- file.path(robustDta, "aggr_results.Rdata")
<>=
if (!file.exists(aggrResultsFile)) {
  ## load packages required only for simulation
  stopifnot(require(robust), require(skewt), require(foreach))
  if (!is.null(getOption("cores"))) {
    if (getOption("cores") == 1)
      registerDoSEQ() ## do not use parallel processing
    else {
      stopifnot(require(doParallel))
      if (.Platform$OS.type == "windows") {
        cl <- makeCluster(getOption("cores"))
        clusterExport(cl, c("N", "robustDoc"))
        clusterEvalQ(cl, slave <- TRUE)
        clusterEvalQ(cl, source(file.path(robustDoc, 'simulation.init.R')))
        registerDoParallel(cl)
      } else registerDoParallel()
    }
  } else registerDoSEQ() ## do not use parallel processing
  for (design in c("dd", ls(pattern = 'rand_\\d+_\\d+'))) {
    print(design)
    ## set design
    estlist$design <- get(design)
    estlist$use.intercept <- !grepl('^rand', design)
    ## add design.predict: pc
    estlist$design.predict <- if (is.null(attr(estlist$design, 'gen')))
      f.prediction.points(estlist$design) else
      f.prediction.points(estlist$design, max.pc = 2)
    filename <- file.path(robustDta,
                          sprintf('r.test.final.%s.Rdata',design))
    if (!file.exists(filename)) {
      ## run
      print(system.time(r.test <- f.sim(estlist, silent = TRUE)))
      ## save
      save(r.test, file=filename)
      ## delete output
      rm(r.test)
      ## run garbage collection
      gc()
    }
  }
}
@
The variable \T{estlist} is a list containing all the settings required to
run the simulation as outlined above. Most of its elements are
self-explanatory.
<<>>=
str(estlist, 1)
@
\T{errs} is a list containing all the error distributions to be simulated.
The entry for the standard normal looks as follows.
<<>>=
estlist$errs[[1]]
@
\T{err} is translated internally to the corresponding random generation or
quantile function, e.g., in this case \T{rnorm} or \T{qnorm}. \T{args} is
a list containing all the arguments required to call the function. The
errors are then generated internally with the following call.
<>=
set.seed(estlist$seed)
errs <- c(sapply(1:nrep, function(x) do.call(fun, c(n = nobs, args))))
@
All required random numbers are generated at once instead of during the
simulation. This guarantees that all the compared methods run on exactly
the same data. The entry \T{procedures} follows a similar convention.
\T{design.predict} contains the design used for the prediction of
observations and the calculation of confidence or prediction intervals.
The objects returned by the procedures are processed by the functions
contained in the \T{estlist\$output} list.
<<>>=
str(estlist$output[1:3], 2)
@
The results are stored in a 4-dimensional array. The dimensions are:
repetition number, type of value, procedure id, error id. Using \T{apply}
it is easy and fast to generate summary statistics. The raw results are
stored on disk, because executing all the procedures typically takes much
longer than calculating the summary statistics.
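The effect of drawing all random numbers up front can be illustrated with a small standalone sketch (the values of \T{nrep} and \T{nobs} are hypothetical, and \T{rnorm} stands in for whichever error distribution is being simulated):

```r
## Standalone illustration: pre-generating all errors from a fixed seed
## yields bit-identical data, no matter which method later consumes them.
nrep <- 3; nobs <- 5
set.seed(2011)
errs.method.A <- sapply(1:nrep, function(x) rnorm(nobs))
set.seed(2011)
errs.method.B <- sapply(1:nrep, function(x) rnorm(nobs))
identical(errs.method.A, errs.method.B)  # TRUE
```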
The saved variables quickly take up a lot of space, so only the necessary
data are stored: $\sigma$, $\bld\beta$ and the corresponding standard
errors. To speed up the simulation routine \T{f.sim}, the simulations are
carried out in parallel whenever possible. This is accomplished with the
help of the \T{R} package \T{foreach} and works best on a machine with
multiple processors or cores, where the \T{multicore} package provides the
backend: the worker processes are simply forked from the main \T{R}
process.

After all the methods have been simulated, the simulation output is
processed. The code is quite lengthy and thus not displayed here (check
the Sweave source file \T{lmrob\_simulation.Rnw}). The residuals,
robustness weights, leverages and $\tau$ values have to be recalculated.
Using vectorized operations and some specialized \T{C} code, this is quite
cheap. The summary statistics generated are discussed in the next section.
<>=
if (!file.exists(aggrResultsFile)) {
  files <- list.files(robustDta, pattern = 'r.test.final\\.')
  res <- foreach(file = files) %dopar% {
    ## get design, load r.test, initialize other stuff
    design <- substr(basename(file), 14, nchar(basename(file)) - 6)
    cat(design, ' ')
    load(file.path(robustDta, file))
    estlist <- attr(r.test, 'estlist')
    use.intercept <- if (!is.null(estlist$use.intercept))
      estlist$use.intercept else TRUE
    sel <- dimnames(r.test)[[3]] ## [dimnames(r.test)[[3]] != "estname=lm"]
    n.betas <- paste('beta',1:(NCOL(estlist$design)+use.intercept),sep='_')
    ## get design
    lX <- if (use.intercept) as.matrix(cbind(1, get(design)))
          else as.matrix(get(design))
    n <- NROW(lX)
    p <- NCOL(lX)
    ## prepare arrays for variable designs and leverages
    if (is.function(attr(estlist$design, 'gen'))) {
      lXs <- array(NA, c(n, NCOL(lX), dim(r.test)[c(1, 4)]),
                   list(Obs = NULL, Pred = colnames(lX), Data = NULL,
                        Errstr = dimnames(r.test)[[4]]))
    }
    ## generate errors
    lerrs <- array(NA, c(n, dim(r.test)[c(1,4)])
                   , list(Obs = NULL, Data = NULL,
                          Errstr = dimnames(r.test)[[4]]))
    for (i in 1:dim(lerrs)[3]) {
      lerrstr <- f.list2str(estlist$errs[[i]])
      lerr <- f.errs(estlist, estlist$errs[[i]],
                     gen = attr(estlist$design, 'gen'),
                     nobs = n, npar = NCOL(lX))
      lerrs[,,lerrstr] <- lerr
      if (!is.null(attr(lerr, 'designs'))) {
        ## retrieve generated designs: this returns a list of designs
        lXs[,,,i] <- unlist(attr(lerr, 'designs'))
        if (use.intercept)
          stop('intercept not implemented for random designs')
      }
      rm(lerr)
    }
    if (is.function(attr(estlist$design, 'gen'))) {
      ## calculate leverages
      lXlevs <- apply(lXs, 3:4, robustbase:::lmrob.leverages)
    }
    ## calculate fitted values from betas
    if (!is.function(attr(estlist$design, 'gen'))) {
      ## fixed design case
      lfitted <- apply(r.test[,n.betas,sel,,drop=FALSE],c(3:4),
                       function(bhat) { lX %*% t(bhat) } )
    } else {
      ## variable design case
      lfitted <- array(NA, n*prod(dim(r.test)[c(1,4)])*length(sel))
      lfitted <- .C('R_calc_fitted',
                    as.double(lXs),                              ## designs
                    as.double(r.test[,n.betas,sel,,drop=FALSE]), ## betas
                    as.double(lfitted),                          ## result
                    as.integer(n),                               ## n
                    as.integer(p),                               ## p
                    as.integer(dim(r.test)[1]),                  ## nrep
                    as.integer(length(sel)),                     ## n procstr
                    as.integer(dim(r.test)[4]),                  ## n errstr
                    DUP=FALSE, NAOK=TRUE, PACKAGE="robustbase")[[3]]
    }
    tdim <- dim(lfitted) <- c(n, dim(r.test)[1], length(sel),dim(r.test)[4])
    lfitted <- aperm(lfitted, c(1,2,4,3))
    ## calculate residuals = y - fitted.values
    lfitted <- as.vector(lerrs) - as.vector(lfitted)
    dim(lfitted) <- tdim[c(1,2,4,3)]
    lfitted <- aperm(lfitted, c(1,2,4,3))
    dimnames(lfitted) <- c(list(Obs = NULL),
                           dimnames(r.test[,,sel,,drop=FALSE])[c(1,3,4)])
    lresids <- lfitted
    rm(lfitted)
    ## calculate lm MSE and trimmed MSE of betas
    tf.MSE <- function(lbetas) {
      lnrm <- rowSums(lbetas^2)
      c(MSE=mean(lnrm,na.rm=TRUE),MSE.1=mean(lnrm,trim=trim,na.rm=TRUE))
    }
    MSEs <- apply(r.test[,n.betas,,,drop=FALSE],3:4,tf.MSE)
    li <- 1 ## so we can reconstruct where we are
    lres <- apply(lresids,3:4,f.aggregate.results <- {
function(lresid) { ## the counter li tells us, where we are ## we walk dimensions from left to right lcdn <- f.get.current.dimnames(li, dimnames(lresids), 3:4) lr <- r.test[,,lcdn[1],lcdn[2]] ## update counter li <<- li + 1 ## transpose and normalize residuals with sigma lresid <- t(lresid) / lr[,'sigma'] if (lcdn[1] != 'estname=lm') { ## convert procstr to proclst and get control list largs <- f.str2list(lcdn[1])[[1]]$args if (grepl('lm.robust', lcdn[1])) { lctrl <- list() lctrl$psi <- toupper(largs$weight2) lctrl$tuning.psi <- f.eff2c.psi(largs$efficiency, lctrl$psi) lctrl$method <- 'MM' } else { lctrl <- do.call('lmrob.control',largs) } ## calculate correction factors ## A lsp2 <- rowSums(Mpsi(lresid,lctrl$tuning.psi, lctrl$psi)^2) ## B lspp <- rowSums(lpp <- Mpsi(lresid,lctrl$tuning.psi, lctrl$psi,1)) ## calculate Huber\'s small sample correction factor lK <- 1 + rowSums((lpp - lspp/n)^2)*NCOL(lX)/lspp^2 ## 1/n cancels } else { lK <- lspp <- lsp2 <- NA } ## only calculate tau variants if possible if (grepl('args.method=\\w*(D|T)\\w*\\b', lcdn[1])) { ## SMD or SMDM ## calculate robustness weights lwgts <- Mwgt(lresid,lctrl$tuning.psi, lctrl$psi) ## function to calculate robustified leverages tfun <- if (is.function(attr(estlist$design, 'gen'))) function(i) { if (all(is.na(lwgts[i,]))) lwgts[i,] else robustbase:::lmrob.leverages(lXs[,,i,lcdn[2]],lwgts[i,]) } else function(i) { if (all(is.na(lwgts[i,]))) lwgts[i,] else robustbase:::lmrob.leverages(lX,lwgts[i,]) } llev <- sapply(1:dim(r.test)[1], tfun) ## calculate unique leverages lt <- robustbase:::lmrob.tau(list(),h=llev,control=lctrl) ## normalize residuals with tau (transpose lresid) lresid <- t(lresid) / lt ## A lsp2t <- colSums(Mpsi(lresid,lctrl$tuning.psi, lctrl$psi)^2) ## B lsppt <- colSums(Mpsi(lresid,lctrl$tuning.psi, lctrl$psi,1)) } else { lsp2t <- lsppt <- NA } ## calculate raw scales based on the errors lproc <- f.str2list(lcdn[1])[[1]] q <- NA M <- NA if (lproc$estname == 'lmrob.mar' && 
lproc$args$type == 'qE') { ## for lmrob_mar, qE variant lctrl <- lmrob.control(psi = 'bisquare', tuning.chi=uniroot(function(c) robustbase:::lmrob.bp('bisquare', c) - (1-p/n)/2, c(1, 3))$root) se <- apply(lerrs[,,lcdn[2]],2,lmrob.mscale,control=lctrl,p=p) ltmp <- se/lr[,'sigma'] q <- median(ltmp, na.rm = TRUE) M <- mad(ltmp, na.rm = TRUE) } else if (!is.null(lproc$args$method) && lproc$args$method == 'SMD') { ## for D-scales se <- apply(lerrs[,,lcdn[2]],2,lmrob.dscale,control=lctrl, kappa=robustbase:::lmrob.kappa(control=lctrl)) ltmp <- se/lr[,'sigma'] q <- median(ltmp, na.rm = TRUE) M <- mad(ltmp, na.rm = TRUE) } ## calculate empirical correct test value (to yield 5% level) t.val_2 <- t.val_1 <- quantile(abs(lr[,'beta_1']/lr[,'se_1']), 0.95, na.rm = TRUE) if (p > 1) t.val_2 <- quantile(abs(lr[,'beta_2']/lr[,'se_2']), 0.95, na.rm = TRUE) ## return output: summary statistics: c(## gamma AdB2.1 = mean(lsp2/lspp^2,trim=trim,na.rm=TRUE)*n, K2AdB2.1 = mean(lK^2*lsp2/lspp^2,trim=trim,na.rm=TRUE)*n, AdB2t.1 = mean(lsp2t/lsppt^2,trim=trim,na.rm=TRUE)*n, sdAdB2.1 = sd.trim(lsp2/lspp^2*n,trim=trim,na.rm=TRUE), sdK2AdB2.1 = sd.trim(lK^2*lsp2/lspp^2*n,trim=trim,na.rm=TRUE), sdAdB2t.1 = sd.trim(lsp2t/lsppt^2*n,trim=trim,na.rm=TRUE), ## sigma medsigma = median(lr[,'sigma'],na.rm=TRUE), madsigma = mad(lr[,'sigma'],na.rm=TRUE), meansigma.1 = mean(lr[,'sigma'],trim=trim,na.rm=TRUE), sdsigma.1 = sd.trim(lr[,'sigma'],trim=trim,na.rm=TRUE), meanlogsigma = mean(log(lr[,'sigma']),na.rm=TRUE), meanlogsigma.1 = mean(log(lr[,'sigma']),trim=trim,na.rm=TRUE), sdlogsigma = sd(log(lr[,'sigma']),na.rm=TRUE), sdlogsigma.1 = sd.trim(log(lr[,'sigma']),trim=trim,na.rm=TRUE), q = q, M = M, ## beta efficiency.1 = MSEs['MSE.1','estname=lm',lcdn[2]] / MSEs['MSE.1',lcdn[1],lcdn[2]], ## t-value: level emplev_1 = mean(abs(lr[,'beta_1']/lr[,'se_1']) > qt(0.975, n - p), na.rm = TRUE), emplev_2 = if (p>1) { mean(abs(lr[,'beta_2']/lr[,'se_2']) > qt(0.975, n - p), na.rm = TRUE) } else NA, ## t-value: power 
power_1_0.2 = mean(abs(lr[,'beta_1']-0.2)/lr[,'se_1'] > t.val_1, na.rm = TRUE), power_2_0.2 = if (p>1) { mean(abs(lr[,'beta_2']-0.2)/lr[,'se_2'] > t.val_2, na.rm = TRUE) } else NA, power_1_0.4 = mean(abs(lr[,'beta_1']-0.4)/lr[,'se_1'] > t.val_1, na.rm = TRUE), power_2_0.4 = if (p>1) { mean(abs(lr[,'beta_2']-0.4)/lr[,'se_2'] > t.val_2, na.rm = TRUE) } else NA, power_1_0.6 = mean(abs(lr[,'beta_1']-0.6)/lr[,'se_1'] > t.val_1, na.rm = TRUE), power_2_0.6 = if (p>1) { mean(abs(lr[,'beta_2']-0.6)/lr[,'se_2'] > t.val_2, na.rm = TRUE) } else NA, power_1_0.8 = mean(abs(lr[,'beta_1']-0.8)/lr[,'se_1'] > t.val_1, na.rm = TRUE), power_2_0.8 = if (p>1) { mean(abs(lr[,'beta_2']-0.8)/lr[,'se_2'] > t.val_2, na.rm = TRUE) } else NA, power_1_1 = mean(abs(lr[,'beta_1']-1)/lr[,'se_1'] > t.val_1, na.rm = TRUE), power_2_1 = if (p>1) { mean(abs(lr[,'beta_2']-1)/lr[,'se_2'] > t.val_2, na.rm = TRUE) } else NA, ## coverage probability: calculate empirically ## the evaluation points are constant, but the designs change ## therefore this makes only sense for fixed designs cpr_1 = mean(lr[,'upr_1'] < 0 | lr[,'lwr_1'] > 0, na.rm = TRUE), cpr_2 = mean(lr[,'upr_2'] < 0 | lr[,'lwr_2'] > 0, na.rm = TRUE), cpr_3 = mean(lr[,'upr_3'] < 0 | lr[,'lwr_3'] > 0, na.rm = TRUE), cpr_4 = mean(lr[,'upr_4'] < 0 | lr[,'lwr_4'] > 0, na.rm = TRUE), cpr_5 = if (any(colnames(lr) == 'upr_5')) { mean(lr[,'upr_5'] < 0 | lr[,'lwr_5'] > 0, na.rm = TRUE) } else NA, cpr_6 = if (any(colnames(lr) == 'upr_6')) { mean(lr[,'upr_6'] < 0 | lr[,'lwr_6'] > 0, na.rm = TRUE) } else NA, cpr_7 = if (any(colnames(lr) == 'upr_7')) { mean(lr[,'upr_7'] < 0 | lr[,'lwr_7'] > 0, na.rm = TRUE) } else NA ) }}) ## convert to data.frame lres <- f.a2df.2(lres, split = '___NO___') ## add additional info lres$n <- NROW(lX) lres$p <- NCOL(lX) lres$nmpdn <- with(lres, (n-p)/n) lres$Design <- design ## clean up rm(r.test, lXs, lXlevs, lresids, lerrs) gc() ## return lres lres } save(res, trim, file = aggrResultsFile) ## stop cluster if (exists("cl")) 
stopCluster(cl) } <>= load(aggrResultsFile) ## this will fail if the file is not found (for a reason) ## set eval to TRUE for chunks simulation-run and simulation-aggr ## if you really want to run the simulations again. ## (better fail with an error than run for weeks) ## combine list elements to data.frame test.1 <- do.call('rbind', res) test.1 <- within(test.1, { Method[Method == "SM"] <- "MM" Method <- Method[, drop = TRUE] Estimator <- interaction(Method, D.type, drop = TRUE) Estimator <- f.rename.level(Estimator, 'MM.S', 'MM') Estimator <- f.rename.level(Estimator, 'SMD.D', 'SMD') Estimator <- f.rename.level(Estimator, 'SMDM.D', 'SMDM') Estimator <- f.rename.level(Estimator, 'MM.qT', 'MMqT') Estimator <- f.rename.level(Estimator, 'MM.qE', 'MMqE') Estimator <- f.rename.level(Estimator, 'MM.rob', 'MMrobust') Estimator <- f.rename.level(Estimator, 'lsq.lm', 'OLS') Est.Scale <- f.rename.level(Estimator, 'MM', 'sigma_S') Est.Scale <- f.rename.level(Est.Scale, 'MMrobust', 'sigma_robust') Est.Scale <- f.rename.level(Est.Scale, 'MMqE', 'sigma_S*qE') Est.Scale <- f.rename.level(Est.Scale, 'MMqT', 'sigma_S*qT') Est.Scale <- f.rename.level(Est.Scale, 'SMDM', 'sigma_D') Est.Scale <- f.rename.level(Est.Scale, 'SMD', 'sigma_D') Est.Scale <- f.rename.level(Est.Scale, 'OLS', 'sigma_OLS') Psi <- f.rename.level(Psi, 'hampel', 'Hampel') }) ## add interaction of Method and Cov test.1 <- within(test.1, { method.cov <- interaction(Estimator, Cov, drop=TRUE) levels(method.cov) <- sub('\\.+vcov\\.(a?)[wacrv1]*', '\\1', levels(method.cov)) method.cov <- f.rename.level(method.cov, "MMa", "MM.Avar1") method.cov <- f.rename.level(method.cov, "MMrobust.Default", "MMrobust.Wssc") method.cov <- f.rename.level(method.cov, "MM", "MM.Wssc") method.cov <- f.rename.level(method.cov, "SMD", "SMD.Wtau") method.cov <- f.rename.level(method.cov, "SMDM", "SMDM.Wtau") method.cov <- f.rename.level(method.cov, "MMqT", "MMqT.Wssc") method.cov <- f.rename.level(method.cov, "MMqE", "MMqE.Wssc") method.cov 
<- f.rename.level(method.cov, "OLS.Default", "OLS") }) ## add desired ratios: test.1$ratio <- ratios[apply(abs(as.matrix(1/ratios) %*% t(as.matrix(test.1$p / test.1$n)) - 1), 2, which.min)] ## calculate expected values of psi^2 and psi' test.1$Ep2 <- test.1$Epp <- NA for(Procstr in levels(test.1$Procstr)) { args <- f.str2list(Procstr)[[1]]$args if (is.null(args)) next lctrl <- do.call('lmrob.control',args) test.1$Ep2[test.1$Procstr == Procstr] <- robustbase:::lmrob.E(psi(r)^2, lctrl, use.integrate = TRUE) test.1$Epp[test.1$Procstr == Procstr] <- robustbase:::lmrob.E(psi(r,1), lctrl, use.integrate = TRUE) } ## drop some observations, separate fixed and random designs test.fixed <- droplevels(subset(test.1, n == 20)) test.1 <- droplevels(subset(test.1, n != 20)) test.lm <- droplevels(subset(test.1, Function == 'lm')) test.lm$Psi <- NULL test.lm.2 <- droplevels(subset(test.lm, Error == 'N(0,1)')) test.1 <- droplevels(subset(test.1, Function != 'lm')) test.2 <- droplevels(subset(test.1, Error == 'N(0,1)' & Function != 'lm')) test.3 <- droplevels(subset(test.2, Method != 'SMDM')) test.4 <- droplevels(subset(test.1, Method != 'SMDM')) @ \section{Simulation Results} \subsection{Criteria} The simulated methods are compared using the following criteria. \textbf{Scale estimates.} The criteria for scale estimates are all calculated on the log-scale. The bias of the estimators is measured by the $\Sexpr{trim*100}\%$ trimmed mean. To recover a meaningful scale, the results are exponentiated before plotting. It is easy to see that this is equivalent to calculating geometric means. Since the methods are all tuned at the central model, ${\mathcal N}(0,1)$, a meaningful comparison of biases can only be made for ${\mathcal N}(0,1)$ distributed errors. The variability of the estimators, on the other hand, can be compared over all simulated error distributions. 
It is measured by the $\Sexpr{trim*100}\%$ trimmed standard deviation,
rescaled by the square root of the number of observations.
For completeness, the statistics used to compare scale estimates in
\citet{maronna2009correcting} are also calculated. They are defined as
\begin{equation}
  \label{eq:def.q.and.M}
  q = \median\left(\frac{S(\bld e)}{\hat\sigma_S}\right), \quad
  M = \mad\left(\frac{S(\bld e)}{\hat\sigma_S}\right),
\end{equation}
where $S(\bld e)$ stands for the S-scale estimate evaluated for the actual
errors $\bld e$. For the D-scale estimate, the definition is analogous.
Since there is no design to correct for, we set $\tau_i = 1\ \forall i$.

\textbf{Coefficients.}
The efficiency of estimated regression coefficients $\bld{\hat\beta}$ is
characterized by their mean squared error (\emph{MSE}). Since we simulate
under $H_0: \bld\beta = 0$, this is determined by the covariance matrix of
$\bld{\hat\beta}$. We use
$\Erw\left[\norm{\bld{\hat\beta}}_2^2\right] = \sum_{j=1}^p
\var(\hat\beta_j)$ as a summary. Comparing to the MSE of the ordinary
least squares estimate (\emph{OLS}) gives the efficiency, which, by the
choice of tuning constants of $\psi$, should yield
\begin{equation*}
  \frac{{\rm MSE}(\bld{\hat\beta}_{\rm OLS})}{{\rm MSE}(\bld{\hat\beta})}
  \approx 0.95
\end{equation*}
for standard normally distributed errors. The simulation mean of
$\sum_{j=1}^p \var(\hat\beta_j)$ is calculated with $\Sexpr{trim*100}\%$
trimming. For other error distributions, this ratio should be larger than
$1$, since robust procedures are expected to gain efficiency relative to
the least squares estimate at such distributions.

$\bld\gamma$\textbf{.}
We compare the behavior of the various estimators of $\gamma$ by
calculating the trimmed mean and the trimmed standard deviation for
standard normally distributed errors.

\textbf{Covariance matrix estimate.}
The covariance matrix estimates are compared indirectly via the
performance of the resulting test statistics.
We compare the empirical level of the hypothesis tests $H_0: \beta_j = 0$
for some $j \in \{1,\dots, p\}$. The power of the tests is compared by
testing $H_0: \beta_j = b$ for several values of $b>0$. The formal power
of a more liberal test is generally higher. Therefore, in order for this
comparison to be meaningful, the critical value of each test statistic was
corrected such that all tests have the same simulated level of $5\%$.

The simple hypothesis tests give only limited insight. To investigate the
effects of other error distributions, e.g., asymmetric ones, we compare
the confidence intervals for the prediction at some fixed points. Since it
was not clear how to assess the quality of prediction intervals, either at
the central or the simulated model, we do not calculate them here.

A small number of prediction points suffices if they are chosen properly.
We chose seven points lying on the first two principal components, spaced
evenly from the center of the design to its extended range. The principal
components were calculated robustly (using \T{covMcd} of the
\T{robustbase} package) and the range was extended by a fraction of $0.5$.
An example is shown in Figure~\ref{fig:design-predict}.

\subsection{Results}
The results are given here as plots (Fig.~\ref{fig:meanscale-1} to
Fig.~\ref{fig:cpr}). For a complete discussion of the results, we refer to
\citet{KS2011}. The different $\psi$-functions are each plotted in a
separate facet, except for Fig.~\ref{fig:qscale-all},
Fig.~\ref{fig:Mscale-all} and Fig.~\ref{fig:lqq-level}, where the facets
show the results for the various error distributions. The plots are
augmented with auxiliary lines to ease the comparison of the methods. The
lines connect the median values over the values of $n$ for each simulated
ratio $p/n$. In many plots the y-axis has been truncated; points in the
grey shaded area represent truncated values on a different scale.
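The construction of such prediction points can be sketched as follows. This is a hedged, self-contained illustration, not the vignette's actual helper (\T{f.prediction.points()} in the simulation sources); only \T{covMcd()} is the real \T{robustbase} function, the toy design and the restriction to the first principal direction are assumptions:

```r
library(robustbase)

## toy stand-in for a fixed design (n = 20, p = 2)
set.seed(27)
X <- matrix(rnorm(40), 20, 2)

mcd <- covMcd(X)                    # robust center and scatter
pc1 <- eigen(mcd$cov)$vectors[, 1]  # first (robust) principal direction

## scores along PC1, with the range extended by a fraction of 0.5
sc <- drop(scale(X, center = mcd$center, scale = FALSE) %*% pc1)
rg <- extendrange(sc, f = 0.5)

## seven evenly spaced prediction points along PC1
pred <- t(mcd$center + outer(pc1, seq(rg[1], rg[2], length.out = 7)))
dim(pred)  # 7 x 2
```

The same construction, repeated for the second principal direction, yields the full set of points shown in Figure~\ref{fig:design-predict}.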
\begin{figure} \begin{center} <>= ## ## exp(mean(log(sigma))): this looks almost identical to mean(sigma) print(ggplot(test.3, aes(p/n, exp(meanlogsigma.1), color = Est.Scale)) + stat_summary(aes(x=ratio), fun.y=median, geom='line') + geom_point(aes(shape = factor(n)), alpha = alpha.n) + geom_hline(yintercept = 1) + g.scale_y_log10_1() + facet_wrap(~ Psi) + ylab(expression(paste('geometric ',mean(hat(sigma))))) + scale_shape_discrete(expression(n)) + scale_colour_discrete("Scale Est.", labels=lab(test.3$Est.Scale))) @ \includegraphics{plot-fig-meanscale} \end{center} \caption{Mean of scale estimates for normal errors. The mean is calculated with $\Sexpr{trim*100}\%$ trimming. The lines connect the median values for each simulated ratio $p/n$. Results for random designs only. } \label{fig:meanscale-1} \end{figure} \begin{figure} \begin{center} <>= print(ggplot(test.3, aes(p/n, sdlogsigma.1*sqrt(n), color = Est.Scale)) + stat_summary(aes(x=ratio), fun.y=median, geom='line') + geom_point(aes(shape = factor(n)), alpha = alpha.n) + ylab(expression(sd(log(hat(sigma)))*sqrt(n))) + facet_wrap(~ Psi) + geom_point(data=test.lm.2, alpha=alpha.n, aes(color = Est.Scale)) + stat_summary(data=test.lm.2, aes(x=ratio, color = Est.Scale), fun.y=median, geom='line') + scale_shape_discrete(expression(n)) + scale_colour_discrete("Scale Est.", labels=lab(test.3$Est.Scale, test.lm.2$Est.Scale))) @ \includegraphics{plot-fig-sdscale-1} \end{center} \caption{Variability of the scale estimates for normal errors. The standard deviation is calculated with $\Sexpr{trim*100}\%$ trimming. 
} \label{fig:sdscale-1} \end{figure} \begin{figure} \begin{center} <>= print(ggplot(test.4, aes(p/n, sdlogsigma.1*sqrt(n), color = Est.Scale)) + stat_summary(aes(x=ratio), fun.y=median, geom='line') + geom_point(aes(shape = Error), alpha = alpha.error) + ylab(expression(sd(log(hat(sigma)))*sqrt(n))) + facet_wrap(~ Psi) + geom_point(data=test.lm, alpha=alpha.n, aes(color = Est.Scale)) + stat_summary(data=test.lm, aes(x=ratio, color = Est.Scale), fun.y=median, geom='line') + ylim(with(test.4, range(sdlogsigma.1*sqrt(n)))) + g.scale_shape(labels=lab(test.4$Error)) + scale_colour_discrete("Scale Est.", labels=lab(test.4$Est.Scale, test.lm$Est.Scale))) @ \includegraphics{plot-fig-sdscale-all} \end{center} \caption{Variability of the scale estimates for all simulated error distributions. } \label{fig:sdscale-all} \end{figure} \begin{figure} \begin{center} <>= print(ggplot(tmp <- droplevels(subset(test.3, Estimator %in% c("SMD", "MMqE"))), aes(p/n, q, color = Est.Scale)) + stat_summary(aes(x=ratio), fun.y=median, geom='line') + geom_point(aes(shape = factor(n)), alpha = alpha.n) + geom_hline(yintercept = 1) + g.scale_y_log10_1() + facet_wrap(~ Psi) + ylab(expression(q)) + scale_shape_discrete(expression(n)) + scale_colour_discrete("Scale Est.", labels=lab(tmp$Est.Scale))) @ \includegraphics{plot-fig-qscale} \end{center} \caption{$q$ statistic for normal errors. $q$ is defined in \eqref{eq:def.q.and.M}. } \label{fig:qscale-1} \end{figure} \begin{figure} \begin{center} <>= print(ggplot(tmp <- droplevels(subset(test.3, Estimator %in% c("SMD", "MMqE"))), aes(p/n, M/q, color = Est.Scale)) + stat_summary(aes(x=ratio), fun.y=median, geom='line') + geom_point(aes(shape = factor(n)), alpha = alpha.n) + g.scale_y_log10_0.05() + facet_wrap(~ Psi) + ylab(expression(M/q)) + scale_shape_discrete(expression(n)) + scale_colour_discrete("Scale Est.", labels=lab(tmp$Est.Scale))) @ \includegraphics{plot-fig-Mscale} \end{center} \caption{$M/q$ statistic for normal errors. 
$M$ and $q$ are defined in \eqref{eq:def.q.and.M}. } \label{fig:Mscale-1} \end{figure} \begin{figure} \begin{center} <>= print(ggplot(tmp <- droplevels(subset(test.1, Estimator %in% c("SMD", "MMqE") & Psi == 'bisquare')), aes(p/n, q, color = Est.Scale)) + stat_summary(aes(x=ratio), fun.y=median, geom='line') + geom_point(aes(shape = factor(n)), alpha = alpha.n) + geom_hline(yintercept = 1) + g.scale_y_log10_1() + facet_wrap(~ Error) + ## labeller missing! ylab(expression(q)) + scale_shape_discrete(expression(n)) + scale_colour_discrete("Scale Est.", labels=lab(tmp$Est.Scale)), legend.mod = legend.mod) @ \includegraphics{plot-fig-qscale-all} \end{center} \caption{$q$ statistic for \emph{bisquare} $\psi$. } \label{fig:qscale-all} \end{figure} \begin{figure} \begin{center} <>= print(ggplot(tmp <- droplevels(subset(test.1, Estimator %in% c("SMD", "MMqE") & Psi == 'bisquare')), aes(p/n, M/q, color = Est.Scale)) + stat_summary(aes(x=ratio), fun.y=median, geom='line') + geom_point(aes(shape = factor(n)), alpha = alpha.n) + g.scale_y_log10_0.05() + facet_wrap(~ Error) + ylab(expression(M/q)) + scale_shape_discrete(expression(n)) + scale_colour_discrete("Scale Est.", labels=lab(tmp$Est.Scale)), legend.mod = legend.mod) @ \includegraphics{plot-fig-Mscale-all} \end{center} \caption{$M/q$ statistic for \emph{bisquare} $\psi$. } \label{fig:Mscale-all} \end{figure} \clearpage% not nice, but needed against LaTeX Error: Too many unprocessed floats. \begin{figure} \begin{center} <>= print(ggplot(test.2, aes(p/n, efficiency.1, color = Estimator)) + geom_point(aes(shape = factor(n)), alpha = alpha.n) + geom_hline(yintercept = 0.95) + stat_summary(aes(x=ratio), fun.y=median, geom='line') + facet_wrap(~ Psi) + ylab(expression(paste('efficiency of ',hat(beta)))) + g.scale_shape(expression(n)) + scale_colour_discrete(name = "Estimator", labels = lab(test.2$Estimator))) @ \includegraphics{plot-fig-efficiency} \end{center} \caption{Efficiency for normal errors. 
The efficiency is calculated by comparing to an OLS estimate and averaging with $\Sexpr{trim*100}\%$ trimming. } \label{fig:efficiency} \end{figure} \begin{figure} \begin{center} <>= print(ggplot(tmp <- droplevels(subset(test.1, Error != 't1')), aes(p/n, efficiency.1, color = Estimator)) + geom_point(aes(shape = Error), alpha = alpha.error) + geom_hline(yintercept = 0.95) + stat_summary(aes(x=ratio), fun.y=median, geom='line') + g.scale_shape(values=c(16,17,15,3,7,8,9,1,2,4)[-4], labels=lab(tmp$Error)) + facet_wrap(~ Psi) + ylab(expression(paste('efficiency of ',hat(beta)))) + scale_colour_discrete(name = "Estimator", labels = lab(tmp$Estimator))) @ \includegraphics{plot-fig-efficiency-all} \end{center} \caption{Efficiency for all simulated error distributions except $t_1$. } \label{fig:efficiency-all} \end{figure} \begin{figure} \begin{center} <>= print(ggplot(test.2, aes(p/n, AdB2.1/(1-p/n), color = Estimator)) + geom_point(aes(shape=factor(n)), alpha = alpha.n) + geom_point(aes(p/n, K2AdB2.1/(1-p/n)), alpha = alpha.n) + geom_point(aes(p/n, AdB2t.1), alpha = alpha.n) + stat_summary(aes(x=ratio), fun.y=median, geom='line') + stat_summary(aes(x=ratio, y=K2AdB2.1/(1-p/n)), fun.y=median, geom='line', linetype = 2) + stat_summary(aes(x=ratio, y=AdB2t.1), fun.y=median, geom='line', linetype = 3) + geom_hline(yintercept = 1/0.95) + g.scale_y_log10_1() + scale_shape_discrete(expression(n)) + scale_colour_discrete(name = "Estimator", labels = lab(test.2$Estimator)) + ylab(expression(mean(hat(gamma)))) + facet_wrap(~ Psi)) @ \includegraphics{plot-fig-AdB2-1} \end{center} \caption{Comparing the estimates of $\gamma$. The solid line connects the uncorrected estimate, dotted the $\tau$ corrected estimate and dashed Huber's small sample correction. 
}
  \label{fig:AdB2-1}
\end{figure}

\begin{figure}
  \begin{center}
<<>>=
print(ggplot(test.2, aes(p/n, sdAdB2.1/(1-p/n), color = Estimator)) +
      geom_point(aes(shape = factor(n)), alpha = alpha.n) +
      geom_point(aes(p/n, sdK2AdB2.1/(1-p/n)), alpha = alpha.n) +
      geom_point(aes(p/n, sdAdB2t.1), alpha = alpha.n) +
      stat_summary(aes(x = ratio), fun.y = median, geom = 'line') +
      stat_summary(aes(x = ratio, y = sdK2AdB2.1/(1-p/n)), fun.y = median,
                   geom = 'line', linetype = 2) +
      stat_summary(aes(x = ratio, y = sdAdB2t.1), fun.y = median,
                   geom = 'line', linetype = 3) +
      g.scale_y_log10_0.05() +
      scale_shape_discrete(expression(n)) +
      scale_colour_discrete(name = "Estimator", labels = lab(test.2$Estimator)) +
      ylab(expression(sd(hat(gamma)))) +
      facet_wrap(~ Psi))
@
\includegraphics{plot-fig-sdAdB2-1}
  \end{center}
  \caption{Comparing the estimates of $\gamma$.
    The solid line connects the uncorrected estimates, the dotted line the
    $\tau$-corrected estimates, and the dashed line Huber's small-sample
    corrected estimates.}
  \label{fig:sdAdB2-1}
\end{figure}

\begin{figure}
  \begin{center}
<<>>=
print(ggplot(test.2, aes(p/n, f.truncate(emplev_1), color = method.cov)) +
      g.truncate.lines + g.truncate.areas +
      geom_point(aes(shape = factor(n)), alpha = alpha.n) +
      scale_shape_discrete(expression(n)) +
      stat_summary(aes(x = ratio), fun.y = median, geom = 'line') +
      geom_hline(yintercept = 0.05) +
      g.scale_y_log10_0.05() +
      scale_colour_discrete(name = "Estimator", labels = lab(test.2$method.cov)) +
      ylab(expression(paste("empirical level (", H[0], ": ", beta[1], "=", 0, ")"))) +
      facet_wrap(~ Psi))
@
\includegraphics{plot-fig-emp-level}
  \end{center}
  \caption{Empirical levels of test $H_0: \beta_1 = 0$ for normal errors.
    The y-values are truncated at $\Sexpr{trunc[1]}$ and $\Sexpr{trunc[2]}$.
}
  \label{fig:emp-level}
\end{figure}

\begin{figure}
  \begin{center}
<<>>=
print(ggplot(tmp <- droplevels(subset(test.1, Psi == 'lqq' & emplev_1 != 0)),
             aes(p/n, f.truncate(emplev_1), color = method.cov)) +
      g.truncate.line + g.truncate.area +
      geom_point(aes(shape = factor(n)), alpha = alpha.n) +
      stat_summary(aes(x = ratio), fun.y = median, geom = 'line') +
      geom_hline(yintercept = 0.05) +
      g.scale_y_log10_0.05() +
      g.scale_shape(expression(n)) +
      scale_colour_discrete(name = "Estimator", labels = lab(tmp$method.cov)) +
      ylab(expression(paste("empirical level (", H[0], ": ", beta[1], "=", 0, ")"))) +
      facet_wrap(~ Error),
      legend.mod = legend.mod)
@
\includegraphics{plot-fig-lqq-level}
  \end{center}
  \caption{Empirical levels of test $H_0: \beta_1 = 0$ for the \emph{lqq}
    $\psi$-function and different error distributions.}
  \label{fig:lqq-level}
\end{figure}

\begin{figure}
  \begin{center}
<<>>=
print(ggplot(tmp <- subset(test.1, n == 25),
             aes(p/n, power_1_0.2, color = method.cov)) +
      geom_point(aes(shape = Error), alpha = alpha.error) +
      stat_summary(aes(x = ratio), fun.y = median, geom = 'line') +
      geom_point(data = subset(test.lm, n == 25), alpha = alpha.n) +
      stat_summary(data = tmp2 <- subset(test.lm, n == 25), aes(x = ratio),
                   fun.y = median, geom = 'line') +
      g.scale_shape("Error", labels = lab(tmp$Error)) +
      scale_colour_discrete(name = "Estimator (Cov. Est.)",
                            labels = lab(tmp$method.cov, tmp2$method.cov)) +
      ylab(expression(paste("empirical power (", H[0], ": ", beta[1], "=", 0.2, ")"))) +
      facet_wrap(~ Psi))
@
\includegraphics{plot-fig-power-1-0_2}
  \end{center}
  \caption{Empirical power of test $H_0: \beta_1 = 0.2$ for different
    $\psi$-functions.  Results for $n = 25$ and normal errors only.
}
  \label{fig:power-1-0_2}
\end{figure}

\begin{figure}
  \begin{center}
<<>>=
print(ggplot(tmp <- subset(test.1, n == 25),
             aes(p/n, power_1_0.4, color = method.cov)) +
      geom_point(aes(shape = Error), alpha = alpha.error) +
      stat_summary(aes(x = ratio), fun.y = median, geom = 'line') +
      geom_point(data = tmp2 <- subset(test.lm, n == 25), alpha = alpha.n) +
      stat_summary(data = subset(test.lm, n == 25), aes(x = ratio),
                   fun.y = median, geom = 'line') +
      g.scale_shape("Error", labels = lab(tmp$Error)) +
      scale_colour_discrete(name = "Estimator (Cov. Est.)",
                            labels = lab(tmp$method.cov, tmp2$method.cov)) +
      ylab(expression(paste("empirical power (", H[0], ": ", beta[1], "=", 0.4, ")"))) +
      facet_wrap(~ Psi))
@
\includegraphics{plot-fig-power-1-0_4}
  \end{center}
  \caption{Empirical power of test $H_0: \beta_1 = 0.4$ for different
    $\psi$-functions.  Results for $n = 25$ and normal errors only.}
  \label{fig:power-1-0_4}
\end{figure}

\begin{figure}
  \begin{center}
<<>>=
print(ggplot(tmp <- subset(test.1, n == 25),
             aes(p/n, power_1_0.6, color = method.cov)) +
      geom_point(aes(shape = Error), alpha = alpha.error) +
      stat_summary(aes(x = ratio), fun.y = median, geom = 'line') +
      g.scale_shape("Error", labels = lab(tmp$Error)) +
      geom_point(data = tmp2 <- subset(test.lm, n == 25), alpha = alpha.n) +
      stat_summary(data = subset(test.lm, n == 25), aes(x = ratio),
                   fun.y = median, geom = 'line') +
      scale_colour_discrete(name = "Estimator (Cov. Est.)",
                            labels = lab(tmp$method.cov, tmp2$method.cov)) +
      ylab(expression(paste("empirical power (", H[0], ": ", beta[1], "=", 0.6, ")"))) +
      facet_wrap(~ Psi))
@
\includegraphics{plot-fig-power-1-0_6}
  \end{center}
  \caption{Empirical power of test $H_0: \beta_1 = 0.6$ for different
    $\psi$-functions.  Results for $n = 25$ and normal errors only.
}
  \label{fig:power-1-0_6}
\end{figure}

\begin{figure}
  \begin{center}
<<>>=
print(ggplot(tmp <- subset(test.1, n == 25),
             aes(p/n, power_1_0.8, color = method.cov)) +
      geom_point(aes(shape = Error), alpha = alpha.error) +
      stat_summary(aes(x = ratio), fun.y = median, geom = 'line') +
      geom_point(data = tmp2 <- subset(test.lm, n == 25), alpha = alpha.n) +
      stat_summary(data = subset(test.lm, n == 25), aes(x = ratio),
                   fun.y = median, geom = 'line') +
      g.scale_shape("Error", labels = lab(tmp$Error)) +
      scale_colour_discrete(name = "Estimator (Cov. Est.)",
                            labels = lab(tmp$method.cov, tmp2$method.cov)) +
      ylab(expression(paste("empirical power (", H[0], ": ", beta[1], "=", 0.8, ")"))) +
      facet_wrap(~ Psi))
@
\includegraphics{plot-fig-power-1-0_8}
  \end{center}
  \caption{Empirical power of test $H_0: \beta_1 = 0.8$ for different
    $\psi$-functions.  Results for $n = 25$ and normal errors only.}
  \label{fig:power-1-0_8}
\end{figure}

\begin{figure}
  \begin{center}
<<>>=
print(ggplot(tmp <- subset(test.1, n == 25),
             aes(p/n, power_1_1, color = method.cov)) +
      geom_point(aes(shape = Error), alpha = alpha.error) +
      stat_summary(aes(x = ratio), fun.y = median, geom = 'line') +
      geom_point(data = tmp2 <- subset(test.lm, n == 25), alpha = alpha.n) +
      stat_summary(data = subset(test.lm, n == 25), aes(x = ratio),
                   fun.y = median, geom = 'line') +
      g.scale_shape("Error", labels = lab(tmp$Error)) +
      scale_colour_discrete(name = "Estimator (Cov. Est.)",
                            labels = lab(tmp$method.cov, tmp2$method.cov)) +
      ylab(expression(paste("empirical power (", H[0], ": ", beta[1], "=", 1, ")"))) +
      facet_wrap(~ Psi))
@
\includegraphics{plot-fig-power-1-1}
  \end{center}
  \caption{Empirical power of test $H_0: \beta_1 = 1$ for different
    $\psi$-functions.  Results for $n = 25$ and normal errors only.
}
  \label{fig:power-1-1}
\end{figure}
%\clearpage

\begin{figure}
  \begin{center}
%% vvvvv because of FIXME below
<<>>=
pp <- f.prediction.points(dd)[1:7,]
## Worked in older ggplot2 -- now plotmatrix() is gone, to be replaced by GGally::ggpairs():
## tmp <- plotmatrix(pp)$data
## tmp$label <- as.character(1:7)
## print(plotmatrix(dd) + geom_text(data=tmp, color = 2, aes(label=label), size = 2.5))
tmp <- ggpairs(pp)$data
tmp$label <- as.character(1:7) # and now?
## ggpairs() + geom_text() does *NOT* work {ggpairs has own class}
## print(ggpairs(dd) + geom_text(data=tmp, color = 2, aes(label=label), size = 2.5))
ggpairs(dd) ## not at all satisfactory !! __FIXME__
@
\includegraphics{plot-fig-pred-points}
  \end{center}
  \caption{Prediction points for fixed design.  The black points are the
    points of the original design.  The red digits indicate the numbers and
    locations of the points where predictions are taken.}
  \label{fig:design-predict}
\end{figure}

\begin{figure}
  \begin{center}
<<>>=
n.cprs <- names(test.fixed)[grep('cpr', names(test.fixed))]
test.5 <- melt(test.fixed[, c('method.cov', 'Error', 'Psi', n.cprs)])
test.5 <- within(test.5, {
    ltmp <- as.numeric(do.call('rbind', strsplit(levels(variable), '_'))[,2])
    Point <- ltmp[variable]
    ltmp <- NULL
})
print(ggplot(test.5, aes(Point, f.truncate(value), color = method.cov)) +
      geom_point(aes(shape = Error), alpha = alpha.error) +
      g.truncate.line + g.truncate.area +
      stat_summary(fun.y = median, geom = 'line') +
      geom_hline(yintercept = 0.05) +
      g.scale_y_log10_0.05() +
      g.scale_shape(labels = lab(test.5$Error)) +
      scale_colour_discrete(name = "Estimator (Cov. Est.)",
                            labels = lab(test.5$method.cov)) +
      ylab("empirical level of confidence intervals") +
      facet_wrap(~ Psi))
@
\includegraphics{plot-fig-cpr}
  \end{center}
  \caption{Empirical coverage probabilities.  Results for fixed design.
    The y-values are truncated at $\Sexpr{trunc[2]}$.
}
  \label{fig:cpr}
\end{figure}
\clearpage

\section{Maximum Asymptotic Bias}
\label{sec:maximum-asymptotic-bias}

More slowly redescending $\psi$-functions come with a higher maximum
asymptotic bias, as illustrated in Fig.~\ref{fig:max-asymptotic-bias}.
We calculate the asymptotic bias as in \citet{berrendero2007maximum}.

<<>>=
## Henning (1994) eq 33:
g <- Vectorize(function(s, theta, mu, ...) {
    lctrl <- lmrob.control(...)
    rho <- function(x) Mchi(x, lctrl$tuning.chi, lctrl$psi, deriv = 0)
    integrate(function(x) rho(((1 + theta^2)/s^2*x)^2) *
              dchisq(x, 1, mu^2/(1 + theta^2)), -Inf, Inf)$value
})
## Martin et al 1989 Section 3.2: for mu = 0
g.2 <- Vectorize(function(s, theta, mu, ...) {
    lctrl <- lmrob.control(...)
    lctrl$tuning.psi <- lctrl$tuning.chi
    robustbase:::lmrob.E(chi(sqrt(1 + theta^2)/s*r), lctrl, use.integrate = TRUE)
})
g.2.MM <- Vectorize(function(s, theta, mu, ...) {
    lctrl <- lmrob.control(...)
    robustbase:::lmrob.E(chi(sqrt(1 + theta^2)/s*r), lctrl, use.integrate = TRUE)
})
## Henning (1994) eq 30, one parameter case
g.3 <- Vectorize(function(s, theta, mu, ...) {
    lctrl <- lmrob.control(...)
    rho <- function(x) Mchi(x, lctrl$tuning.chi, lctrl$psi, deriv = 0)
    int.x <- Vectorize(function(y) {
        integrate(function(x) rho((y - x*theta - mu)/s)*dnorm(x)*dnorm(y),
                  -Inf, Inf)$value
    })
    integrate(int.x, -Inf, Inf)$value
})
inv.g1 <- function(value, theta, mu, ...) {
    g <- if (mu == 0) g.2 else g.3
    uniroot(function(s) g(s, theta, mu, ...) - value, c(0.1, 100))$root
}
inv.g1.MM <- function(value, theta, mu, ...) {
    ## NB: g.3.MM (the mu != 0 case) is not defined here; inv.g1.MM is
    ## only ever called with mu == 0 below.
    g <- if (mu == 0) g.2.MM else g.3.MM
    ret <- tryCatch(uniroot(function(s) g(s, theta, mu, ...) - value, c(0.01, 100)),
                    error = function(e) e)
    if (inherits(ret, 'error')) {
        warning('inv.g1.MM: ', value, ' ', theta, ' ', mu, ' -> Error: ', ret$message)
        NA
    } else {
        ret$root
    }
}
s.min <- function(epsilon, ...) inv.g1(0.5/(1 - epsilon), 0, 0, ...)
s.max <- function(epsilon, ...) inv.g1((0.5 - epsilon)/(1 - epsilon), 0, 0, ...)
BS <- Vectorize(function(epsilon, ...)
{
    sqrt(s.max(epsilon, ...)/s.min(epsilon, ...)^2 - 1)
})
l <- Vectorize(function(epsilon, ...) {
    sigma_be <- s.max(epsilon, ...)
    sqrt((sigma_be/inv.g1.MM(g.2.MM(sigma_be, 0, 0, ...) +
                             epsilon/(1 - epsilon), 0, 0, ...))^2 - 1)
})
u <- Vectorize(function(epsilon, ...) {
    gamma_be <- s.min(epsilon, ...)
    max(l(epsilon, ...),
        sqrt((gamma_be/inv.g1.MM(g.2.MM(gamma_be, 0, 0, ...) +
                                 epsilon/(1 - epsilon), 0, 0, ...))^2 - 1))
})
@

\begin{figure}
  \begin{center}
<<>>=
asymptMBFile <- file.path(robustDta, 'asymptotic.max.bias.Rdata')
if (!file.exists(asymptMBFile)) {
    x <- seq(0, 0.35, length.out = 100)
    rmb <- rbind(data.frame(l = l(x, psi = 'hampel'),   u = u(x, psi = 'hampel'),
                            psi = 'Hampel'),
                 data.frame(l = l(x, psi = 'lqq'),      u = u(x, psi = 'lqq'),
                            psi = 'lqq'),
                 data.frame(l = l(x, psi = 'bisquare'), u = u(x, psi = 'bisquare'),
                            psi = 'bisquare'),
                 data.frame(l = l(x, psi = 'optimal'),  u = u(x, psi = 'optimal'),
                            psi = 'optimal'))
    rmb$x <- x
    save(rmb, file = asymptMBFile)
} else load(asymptMBFile)
print(ggplot(rmb, aes(x, l, color = psi)) +
      geom_line() +
      geom_line(aes(x, u, color = psi), linetype = 2) +
      coord_cartesian(ylim = c(0, 10)) +
      scale_y_continuous(breaks = 1:10) +
      scale_colour_hue(expression(paste(psi, '-function'))) +
      xlab(expression(paste("amount of contamination ", epsilon))) +
      ylab("maximum asymptotic bias bounds"))
@
\includegraphics{plot-fig-max-asymptotic-bias}
  \end{center}
  \caption{Maximum asymptotic bias bounds for the $\psi$-functions used in
    the simulation.  Solid line: lower bound.
    Dashed line: upper bound.}
  \label{fig:max-asymptotic-bias}
\end{figure}

\bibliographystyle{chicago}
\bibliography{robustbase}

\end{document}