arsenal/0000755000176200001440000000000014056520012011672 5ustar liggesusersarsenal/NAMESPACE0000644000176200001440000001046714056512336013133 0ustar liggesusers# Generated by roxygen2: do not edit by hand S3method("[",arsenal_table) S3method("[",keep_labels) S3method("[",keep_tableby_attrs) S3method("[",selectall) S3method("[",tbstat) S3method("[",tbstat_countpct) S3method("[",tbstat_multirow) S3method("[<-",keep_labels) S3method("[[",tbsta_multirowt) S3method("[[",tbstat) S3method("[[",tbstat_countpct) S3method("labels<-",arsenal_table) S3method("labels<-",data.frame) S3method("labels<-",default) S3method("labels<-",keep_labels) S3method(Ops,tableby) S3method(as.data.frame,freqlist) S3method(as.data.frame,modelsum) S3method(as.data.frame,summary.freqlist) S3method(as.data.frame,summary.modelsum) S3method(as.data.frame,summary.tableby) S3method(as.data.frame,tableby) S3method(as.matrix,selectall) S3method(c,yaml) S3method(diffs,comparedf) S3method(diffs,summary.comparedf) S3method(format,tbstat) S3method(format,tbstat_countpct) S3method(freqlist,formula) S3method(freqlist,table) S3method(head,summary.freqlist) S3method(head,tableby) S3method(includeNA,default) S3method(includeNA,factor) S3method(is.na,selectall) S3method(keep.labels,data.frame) S3method(keep.labels,default) S3method(labels,arsenal_table) S3method(labels,data.frame) S3method(labels,keep_labels) S3method(loosen.labels,data.frame) S3method(loosen.labels,default) S3method(merge,arsenal_table) S3method(merge,freqlist) S3method(n.diff.obs,comparedf) S3method(n.diff.obs,summary.comparedf) S3method(n.diffs,comparedf) S3method(n.diffs,summary.comparedf) S3method(padjust,default) S3method(padjust,summary.tableby) S3method(padjust,tableby) S3method(print,arsenal_table) S3method(print,code.chunk) S3method(print,comparedf) S3method(print,comparedf.frame.summary) S3method(print,comparedf.vars.summary) S3method(print,summary.arsenal_table) S3method(print,summary.comparedf) S3method(print,verbatim) S3method(print,yaml) S3method(sort,freqlist) S3method(sort,tableby) S3method(summary,comparedf) S3method(summary,freqlist) S3method(summary,modelsum) S3method(summary,tableby) S3method(tail,summary.freqlist) S3method(tail,tableby) S3method(tests,tableby) S3method(write2,arsenal_table) S3method(write2,character) S3method(write2,code.chunk) S3method(write2,comparedf) S3method(write2,default) S3method(write2,knitr_kable) S3method(write2,list) S3method(write2,summary.arsenal_table) S3method(write2,summary.comparedf) S3method(write2,verbatim) S3method(write2,xtable) S3method(write2,yaml) S3method(xtfrm,tableby) export("%nin%") export("labels<-") export(Date.mdy) export(N) export(Nevents) export(NeventsSurv) export(Nmiss) export(Nmiss2) export(Npct) export(Nrisk) export(NriskSurv) export(Nsigntest) export(allNA) export(as.countpct) export(as.selectall) export(as.tbstat) export(as.tbstat_multirow) export(binomCI) export(clog) export(code.chunk) export(comparedf) export(comparedf.control) export(countN) export(countcellpct) export(countpct) export(countrowpct) export(diffs) export(formulize) export(freq.control) export(freqlist) export(gmean) export(gmeanCI) export(gmeansd) export(gsd) export(has_strata) export(includeNA) export(iqr) export(is.Date) export(is.freqlist) export(is.modelsum) export(is.selectall) export(is.summary.freqlist) export(is.summary.modelsum) export(is.summary.tableby) export(is.tableby) export(is.yaml) export(keep.labels) export(loosen.labels) export(mdy.Date) export(meanCI) export(meansd) export(meanse) export(medSurv) export(medTime) export(medianmad) export(medianq1q3) export(medianrange) export(modelsum) export(modelsum.control) export(modpval.tableby) export(muck_up_mockstudy) export(n.diff.obs) export(n.diffs) export(na.modelsum) export(na.paired) export(na.tableby) export(negbin) export(ordinal) export(padjust) export(paired) export(paired.control) export(q1q3) export(relrisk) export(replace2) export(rowbinomCI) export(selectall) export(set_attr) export(set_labels) export(smart.split) export(survival) export(tableby) export(tableby.control) export(tests) export(tol.NA) export(tol.char.both) export(tol.char.case) export(tol.char.none) export(tol.char.trim) export(tol.date.absolute) export(tol.factor.labels) export(tol.factor.levels) export(tol.factor.none) export(tol.logical.none) export(tol.num.absolute) export(tol.num.pct) export(tol.num.percent) export(tol.other.none) export(verbatim) export(write2) export(write2html) export(write2pdf) export(write2word) export(yaml) importFrom(utils,head) importFrom(utils,tail) arsenal/README.md0000644000176200001440000001247314056501242013164 0ustar liggesusers # The `arsenal` Package Arsenal logo [![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/arsenal)](https://CRAN.R-project.org/package=arsenal) [![Total Downloads](http://cranlogs.r-pkg.org/badges/grand-total/arsenal)](https://CRAN.R-project.org/package=arsenal) [![Downloads](http://cranlogs.r-pkg.org/badges/arsenal)](https://CRAN.R-project.org/package=arsenal) [![R-CMD-check](https://github.com/mayoverse/arsenal/workflows/R-CMD-check/badge.svg)](https://github.com/mayoverse/arsenal/actions) ## Overview The goal of `library(arsenal)` is to make statistical reporting easy. It includes many functions which the useR will find useful to have in his/her "arsenal" of functions. There are, at this time, 6 main functions, documented below. Each of these functions is motivated by a local SAS macro or procedure of similar functionality. Note that `arsenal` v3.0.0 is not backwards compatible with previous versions (mainly because `compare()` got renamed to `comparedf()`). See the `NEWS` file for more details. `arsenal` now has a `pkgdown` site: https://mayoverse.github.io/arsenal/ ## The `tableby()` Function `tableby()` is a function to easily summarize a set of independent variables by one or more categorical variables. Optionally, an appropriate test is performed to test the distribution of the independent variables across the levels of the categorical variable. Options for this function are easily controlled using `tableby.control()`. The `tableby()` output is easily knitted in an Rmarkdown document or displayed in the command line using the `summary()` function. Other S3 methods are implemented for objects from `tableby()`, including `print()`, `[`, `as.data.frame()`, `sort()`, `merge()`, `padjust()`, `head()`, and `tail()`. ## The `paired()` Function `paired()` is a function to easily summarize a set of independent variables across two time points. Optionally, an appropriate test is performed to test the distribution of the independent variables across the time points. Options for this function are easily controlled using `paired.control()`. The `paired()` output is easily knitted in an Rmarkdown document or displayed in the command line using the `summary()` function. It has the same S3 methods as `tableby()`, since it's a special case of the `tableby()` object. ## The `modelsum()` Function `modelsum()` is a function to fit and summarize models for each independent variable with one or more response variables, with options to adjust for covariates for each model. Options for this function are easily controlled using `modelsum.control()`. The `modelsum` output is easily knitted in an Rmarkdown document or displayed in the command line using the `summary()` function. Other S3 methods are implemented for objects from `modelsum()`, including `print()`, `[`, `as.data.frame()`, and `merge()`. ## The `freqlist()` Function `freqlist()` is a function to approximate the output from SAS's `PROC FREQ` procedure when using the `/list` option of the `TABLE` statement. Options for this function are easily controlled using `freq.control()`. The `freqlist()` output is easily knitted in an Rmarkdown document or displayed in the command line using the `summary()` function. Other S3 methods are implemented for objects from `freqlist()`, including `print()`, `[`, `as.data.frame()`, `sort()`, and `merge()`. Additionally, the `summary()` output can be used with `head()` or `tail()`. ## The `comparedf()` Function `comparedf()` compares two data.frames and reporting any differences between them, much like SAS's `PROC COMPARE` procedure. The `comparedf()` output is easily knitted in an Rmarkdown document or displayed in the command line using the `summary()` function. Other S3 methods are implemented for objects of class `"comparedf"`, including `print()`, `n.diffs()`, `n.diff.obs()`, and `diffs()`. ## The `write2*()` Family of Functions `write2word()`, `write2pdf()`, and `write2html()` are functions to output a table into a document, much like SAS's `ODS` procedure. The S3 method behind them is `write2()`. There are methods implemented for `tableby()`, `modelsum()`, `freqlist()`, and `comparedf()`, and also methods for `knitr::kable()`, `xtable::xtable()`, and `pander::pander_return()`. Another option is to coerce an object using `verbatim()` to print out the results monospaced (as if they were in the terminal)--the default method does this automatically. To output multiple tables into a document, simply make a list of them and call the same function as before. A YAML header can be added using `yaml()`. Code chunks can be written using `code.chunk()`. For more information, see `vignette("write2")`. ## Other Notable Functions * `keep.labels()` keeps the `'label'` attribute on an R object when subsetting. `loosen.labels()` allows the labels to drop again. * `formulize()` is a shortcut to collapse variable names into a formula. * `mdy.Date()` and `Date.mdy()` convert numeric dates for month, day, and year to Date object, and vice versa. * `is.Date`: tests if an object is a date. * `%nin%` tests for "not in", the negation of `%in%`. * `allNA()` tests for all elements being NA, and `includeNA()` makes NAs explicit values. arsenal/data/0000755000176200001440000000000013632700352012611 5ustar liggesusersarsenal/data/mockstudy.RData0000644000176200001440000011010513632700352015546 0ustar liggesusers|. {vR' WC&M; "ĂرbAEBWdgv&KAZ;;S3{'@E+~v8QlO6cW=((=Srb[V]LQlSlOPl=(+[|6;PMSlW[9|Ul*(Y?+x.bkݜmŖ [Ub1[܂bbVS-؂K=:&sbsa>_M} Եh*6 SM/FТ6} 5zܘϽbTBۣؼ6*)=z4|}AByV9ã[ZhQhϣA/*6mBѾVl ѿ8gЎEk6H5.VU4$-ZEzhѠ&Xgؠj4إa~WlMyР.ShwѠf0gGz @6 n 6AVNEz[@k- k M4zjģB~u4<mn4-,yhy[ء.tkl۷jk [C?}h[ǦhQǷР{hgahўF_v`NױΥӝ]z% Gܝ;cMwv2m&| 6Z >/m9 4e4+t3ѾGCtEu2 F1Э lܭ@hGhX醱FxcѠXkwUwد;iw0N/ѾC= Kb k{>z@{==0nysc3t1{@==K?.CX2؈ǂ+X_,䍅>c1f,x _ǂgX!S4 GD=͞ep4|O{nhўiO4vOMOY4'l+4'gGCLAqh]zA'q768+Sq8!sBG{ 8(=42vU8 kxXz&!&z!z!{A^``/o/اv:}Bz 1 .?BL o|4>!~>oxn<wG⡛xw`hIb4'㡃xO%Р>>>ȟ}`>wVhg<bl:X ~=}`>gسG4rdp/i_/~x ŚB}o_/l/1 :"V~¶}1__d_dJ L(V zH`Ph̛ zO}`<(at\$ _F$A/ IdL@;!/ ~Y?lyy!aS`~ g0o?|?0t:_G-!4ukXK1lGG1?Cȏ/!~1G"֕X I%bm`l"t&s?DD9~_N%։D]"֚V&"$)Dȕ''D;%'`Z"@Y9=Bc|h:kuZ7"" D g"|@0g X3r WB7A d?>7( q6~65nco`AA ot3B!탠A u2~7>9~0i0bn0; n|5 F :kc`R!CP N LA3}s> {>4|i䅧qo*M# 4ӐwAii45 M_Ms 잎NNCgӑCga3[΀/@̀,3X 11>\΄̄τ3g&bn&=yf&Y{6 5 :ځ_ [̆φ~fle6m6|d6t71F @sGsWsoj9\3q3, {.8:{".Ay`ym|}X3CǸG6Gχ#͇_-}6 ``X]_"bcbh݅݅"?4ְ^y!~E_E`"NP;-./F] \ ,Fl/FQ[.F>X`mK t m l ]}2pq9g9rȵ.f9:ZW++X ȶ}W`+;++PMW` kw%XY ۮDϯD]<rWb`Jg%ֺm%c[WaU!VVOWFU*gjjd5>5j̷}5xq\`jj̱_נ.]}A.[}O>"wlkkbmk1Z<|\/XkP+c Yۭk=bt=h=[#Gm  `ktun2o@lm-67BߍCC71Fpn#Y7!N766vڄ߄`fl rlܛ/6a͛7{7cm1f͈͐w3ዛaXfa3h3| k oFmALo[G[`-' ȼvۂ:c+d݊[sߊ ;mC- n ×#vnGێ<lڎcv; ȸw@G;N佝mw"D\D={NNh'ؼwb].Ⱥ ..]B- :݅wAWvﻠ]]]su4z7FlOvcs@7{{"=]{{{=`==={g{{{!^pg/r^Q{bz  zx~/z݋ ڋ=^mr>f 'W+dt28i2X =&Xs2֞bK)-NAܤ SR)1~IA Ϥ@y '2FMWR W )@')`M (z X@7)Q b>-!)x gBA xz>yAAA~/BA w Xz=(v㝻Ġ5As5Dah.HpPihZmFhHkE4N{5Zbz8 ТLΖ^0dbcEkՙ }z^QєeqNykL:ؐasHrİ~ 5:鄯T[M3/[W6o6g{>u6(?[=bbصVlhv- gutZiu&.]dd䤭"|2Z 6~f\H9߈Þe25fϵbk\oT2̲)楙,:.Ӏe6 g Űk٣;R+1Bٺ ǰkv=uAu輕X'٥Iez6c27g0xٸ1L'.&z0Gczh>e}볾MҼ^轊y(dd[*5%Gx7bkcؕc/ QD`"i!yn5 3<\^6a׸OSy[kl=lhf>Sgv ga6g:a}%ZlfLJLo`E="٘7erWazÎuQ#E)o+:6ifl^U~LNnC53eٚڲ8_9w9F;L!LPEns^LM&[9E0Hc6.ꦌ"r[~۔[=tJޟ۸!Hqo.^_6Su@ii<.\mp6vUEQE0>m$/xym*_m&sKn^s4es7a۰ycQTgk+©w("&ʲ˳q[+bwU^*+"7qس#ߠcS/uB{/nU,K" T6~[f{]`.y-T[cu&o~w5ܳwQ[يV-kAq{ 9S_"]|s+iN2s3t9+TꛡwxwԷ߯YΖ6w]n˷{iFYNsۆѷrw5_;-+Yo{}9;Y7mrj-y'r7_3=1"rLE{IG:[9''A{~40.޴ܷuyx{#xtΑNqK`s;ﺽ;=}Ggv/E|߅.w ]߅.w ]߅~!j74>v؀E ~ڴ-&5-3OVc-G_癍+q3+9+o6ٵwfg֘]d$_Vse&Vl2ꓕno}!3dcY!3gϲ?;3kFqY+kaaflneW?7é%;񑙮ʗsFs+z}e߲3wVvh=gHV3#fetg#ìv3Ȏ?dZV]eYeެge6~q:<ٱKf>Mv֙S2͛Ivmfy3w+/; *Fg53gWp oFCvfy+|G,cgŶ'3Vd5~v4q2b-;k7Sq";~d707>Udf|;g'.ZSFlgV>}˃l Y+;}333{fŋ` LH5|wcy'?37lP?;<.6)!v࿹m 9Q&5ΛE찄n]=Wr5f߅b~?fT߅ Y>;#}.ܯrfcv+#Z?.=oؑ u[%_ܿ-^/j3}س!}BUjΕ,0!dЫIWNS ,vZ?lNkgxc̝m rʅ;RWi C?#￸pߘ8~v#}+Srj 94#Qo46yz?қ3\;__n{\Tmw>p:п ȾK].eLc̘B͛ccpWs7~bt4z7V]ܛp?{9 s_ /37 ?rL>)&sҝYM6'ߣsWVkw-n9ZN(#r97fʹ*ܲr7>lrV,q ^QwR_TcPן9n =* :ynU59-V-YNpkzT; 8vP -}ݭ99%wE;M>m6qJ?ġconhΣµ!h&ǵ-x&_±H\;*;8qE ZcuY|6h 0ZEW LR8oc7bpV0? B? נ5A3m%8 5gZ 6}^qhy:Sh zǚs18.zqS G77i_"C~~׍&#lA5- Q[q!| s̰}qqtK2m"Z>5|¯ RQz l'qm=.n QA[?̰Ƅ.W~vLN֡lO]<J,k0,܋µpmILbro\ 5Ԇ=P.%tb{_ 7|m1>Cf|0υ{w\? 4P⹖8nq^ S\;s'b-T3[49;66ƽ?w,]1|̎*Чkh_a'C[R#C~ЋGi{ !sf{l;h\@a<ʟFegi8x4VC/OZ~Hs}/ ?Lն L1/k@!'/Jg )K VA1M9ٶ GG[gO4xO^ؾEs )DHydGĕ2t7bI?z/Ql4̈=ʃTn0F?V`cXI~a.ۑ'64rm2Ay^y)u=YAN!+WO и.dÏ>2#!4>%ge~)h]W;07tWBj>mZH8񧌡<:mqgԏMl~!V`hj^iļ?Wَkt=یX ^f2@vK)/ j g|O9xbMicUq,OmMlV A(#O?ֈM;Dn;;6p"}mAAqkFkvkF v.*F@MOAۨMi?9ƺiºK h`lr,cXoj p*~c_WP;܀,Oڰ~Կ?6Z!}ۻF}Nн?b=#p8Y/l4?9_MlͨG?dv8g:Yw#Q&C kS~?og$Z0#x6 |Jy^1|+v@[~T R+v?c{` Nhïli}m_l?1r* ?Z+S5hL{ר-Q&rBӏ +kkxGuF hOk>;R.#7+l^;cnqFHYP.:td綋Q7 9{g6Dy i4Ou[ ͱ!Fj_aC#??.y)flu jt?İ_~LGlـ.,Ӻh>씳GS3z75lG$dm\5ou#h-GuۈKCCL l e`kh> @  Rs<@F < H׈cZ/ڠkʚ}F졵{״^}4O>#xOev83 //|K];>FeA~?CCok螈;R=n穹 9Z+@<xX# S;h}}φ~CYmNkUՈ31|FkxSXÿl . 8bE4O/U9=t%Q:Ô't1Rn(kԲ@5 mto.ݧS^ӽ@Gj#R[R[j eu ]3O3NI7Z7(aÎ4S9]n^ӽU.~ M~Iw4O1V<i]IkB/KI{b6T(wv1wș~t4)ngf'4iHBuqޏ? GџʏZFyCt .ۯgsT_S SF%OO˩ ߡqJF}e @ef=͇@Q4>|ET9jxZ({Uն^iOFHH?W>AkP~ԧ }<}KPf— ?0sWS9zSX;X5=Fnc_Okj;B;}њ=sxjG㈻O 5毷p~`ݗl2TBF?&!-`vjnM* Xfȓj'F[imyKp?k37?/T״(bOYAP>?+~Rko,5sRNeCuߗ{&;}c@#wTOo 0}wXots\1o:;DG5SѬuj#tOkǞ{|h/w/v3vNy#/CM\h}ODA\T34×PC?kܟ2{׌zEzt} ;O1Oڨ/y}@G:0Q_g#?h8 |A!F g)#~T>P&vg1~ۏᔑ->ރl4:ؤ}C{#^3ĺhG߱w1އT4{{Q{ONk|;oJ;%Ș|vΗsd[ys1^n/E5-W~z;Xw/Y'}!"ƭXV>UN|;=9VV亙v3'0n>w)v+Ɏl9uwjS[N973߿#nI__e{6%5~ٱȐn;[ʔ_wE?{~=k Ex}[g49\]Idˎ򘾞ΓCfz.;r)Yh쟑? Z]gBq櫿u_rZu_s9l\i 5y _GfZe|)am#?/]Oc ev}&Nv3o/kUEs >3Q|6K ,לir/ȗ|z}Bk>};1|>svlS~%iYe\y4%eZ:^f93ժ__=}=gg6V}:ϣi柠;Hcr/[גŲ@>,?c˚se=_2bYz:;OH{/]N{},Ȕg-}x?k !ǚ|yS#?@qLg0%Sۤ.uYft|.Y2iҌCy"]sBy+C履1=+=3s%-2FXe鞑d˲ZYuucdŝ n|2K:þdR_b(dlG߇Նxf y\ҹ֔ٞҧ=g9z՗^w|3Yt~6r_Ct Jk_4|ӯg}Fl >&"?,ֵڭkU7*}_#m,󧻧H`޳̕%h=+xYKI4פ~G#UY'M2Y'qTTq!ٱ_ݯYF5-~ߠ$JX+s~U-,G~\lS>˺u,ۃ3;zr;ؑ닯&;V`Uv#?\:lnOGfXcYk?\溊sy=~q>s?gM}ZSǭE>nO>cz-3_-r\~n/|>Z||:J&,~`#>^M r㙜 -,|Ï,>~ Zg%}s9>kE<*'ge둏[8v1\.}Myz+Wò8_ۢGn7q>i?V9_*Y'E^.Sܮ\V3>x1 <8d*[]ϪK`}-koZyM>qU,vegs(8V si\:?\_Wys^#Cy>~1W>}˼fcf7Kry,6[E_&}s:=,~aVsnU߬#iYY->d^KXl>s|Q9:>?\Ǣo}o;~1>5~aCki}?ŹT7y,q^׵Ap|ά{:Vv]rOySò>Ui.9/+Z['|<./l@5,i\-m[߃/?ךx=n JkGgvE}}ϹagGz~=-g~צs{K4X9^syY>f]O/E3uH<ޟgsf^M[Wcq9vNq#{+s`vU,Z1w/\g74z}en?{W^Ǜ!u%Qxx?-K?!,a}jo3O1-~n}T\TdDFxgf=ks*ؗYh8|{=>?_f]c.ZNeO+ܞ3u?X+s%#eo?ӽzcta-z~g/`㚿[k^gu[{{?.~_}_|~~uZOz8`rqqE|ۃ=_-8gu_V}vIOG~kOdeߟ߫X9#+{=n|~2Et!gʴy~5.Y-a+/n7gV{;-ϧ漖8r==/;rqO&_ӭתtr򍏟A>68rpǵ7风E^S3=ON\ oOp^YhDZSr?'Xo~{+ž9m>JV=ϙyٛc!~ߟgng+\-1M~P%U\W^,Y8Yn1՟38OeX;r?1[8eMWgr}6߳1iOUk5?l\Wvo ?/O?|KCB:Mȷ`ʇĕϐsgJF{nўy/+-+З[h6L-&!Un 9] ]%jߴw&!y{t~|ͯ/D6?zDs i;~/v'U+Woߐ>z4rTkbHUOg0qa[[B]c{!E?*j}{u!;JG‚TVcbH~ +[t{/ `F`}*Sa %ai, ڳ2K=@78iK7޼BM&RNq kVZ|HM{QIGv/uL$/mWxnsswEN'/|s?q~dfж2VɊy_ ~w/%!> sBϐ\>f>*eե'z*_ ZViOGΛ7AB&'| l'cHXˇČܼ$Ȯ_͎"!+k\iwO?X3+?L<ߟ /Jܟ`֙[9jCD_vv4zcV!Ir [R2?w\\̏$;h Ӗ Q>BʏyIH/6 B4=@1I^|6 Y'u!־LHÛ%1d_4,Qמ'sju"0j|!C]V^^a^bgIطKViC"v \g2 Qm.-V4?I"4}iěDoE$QxԉK52Dl{Ltkvcs i1؟u5e$!mZXq⊞ϵrI'_J.R${m~ZQyw?܌o OK:wv\mџXX)I϶T⥸/^:c?=RHȶD|xGWoqHT*)IO !LϮhOawf,AS{oIW|'Vl$C]_Eȩ;V<ﴕ[}]i$۟*|u:Y"_ϐQ1zq8rjB& Tq ,\x$$5׾MԣMZV~Zs\ӟzϑix; [Մwdzm#q'>7ǗDvӟ#d{y;C^?}m H8jy]Hi}[DhU]1;f~z$?Wwo͆&NPg ѾZ/͈'>X})gIџ"ahw{?y ^첳Dw+Q?5;q {`$?UŸxV_2~o'V}O"]Z2dY/PGO,QQMށS68qчu&w/;uˈwHs8 y]7wkE=㪾;Lk߭i3sVu48Q uEa$/:qNgCa!:k҇Y>/;qm{;(C~5>ȡ7H設Uߘ{hb'ױUwW> V+Lp}g?iƝ{]Eܟ|#~{pn"a3N鳎xu%QpyCsYz2Y]_Xkk 9g{iGvIԟ;)`' Z?/ iuz2|C֩Hx?0VB\i=ۉ7tŏ&_sFB=< ڑQ[cE OW<"{i2^HkD^)֫zkQ^:AE/r$,pY¦eM8>ۏIc16VG??oG.S$qyZ6]^\gju71qf_-I618zl(6![R? ^!~Gkt! l"ag )$g 0zrK2$lYS|ņ$tsoҳVr56xxR/ϻ5m۶=N'{]Č:1IȤ_J<'~L+;ʣ )֭/[OG{̥2$}c4q?`Io(2ёQ;_,+G+|f_Gɿ75;T~)vE~‚ f^05;1Crf^:D{9z_q~4Yf /Rfq#mSc}6qOyȁ:Eo'}U?۾b5 :1{B1B|E='X~9`O Du;JwG_݄_TLyH7$ήDGh7~ ګ nR/Mv"ҹoI=tN-\oqmuj|• $tu:C|HIXxε.Xga$=W[ڨ=F§^ yf/ߞ$] l5}HBG|{WUh'usSGoᱧ^|́W oN ܳ o0Y}o.U0dN܊DLz~YŽɱӼv߈;;#vzx^T؅w|x;_Ѷ=WDJE7ZL_ňwlj!Ǯ Xo>Mo#^wrH"<׫<7xv8V̓ DKz$ FoS5jC=Տ>>b}6(uQKx^,>$*>){E9 ;~[/<9>>e!tۼ+KdC;ض{7=N\=}Do{OUzx?O[;EIQN"Gu/q/xKIX~m }h?6=0J$t5sŚNB"Ds[AQ3)1 ߪWJ?o;U=hm)Ɋ~"jq] _xav P /qcD<4=}ўǽPYGҶDgdH> uV$ڍ(Z8x;[߂_}˺-k@޳?} ׳ ۮxb%?E; t]4b5ړZ5ynsAΒ7Z;c/o{ba[v/ןWB!?\ԣ@ȴ2^%ܣ5g#Q{V;ϋ}6r[?} v[-XOyv^zi?qǷK4onk: ^Ѫ\O].Fl&ѓk82ew⻟L b.o3Bt' ~L_rO׎uGG=/]_:WEuQҐ;%!owя.W|;WѭHդYbǶ/"9;?<\reox<|K˒W'!==ۻB"Z " . Aޮ߬?{xtg8L1[*Z/򐐢AW|=p)oIBRgWPHUGэD>>&g"(7o!2b6m 5?n';.=HnDYTa6BB`Y}{Ґ*'%1W )Uvq1'>48qӯV qa,9R>"jكLxv$tG?|fEر!e҉S*ʯ{! ;`;}I}•~D*au 6H&5w\{< H؞8P8Kx)/gV i<ڥ3g*Ƥ|kF'ϴ}xm]/(~QK_C';m9 Ҝڱ? ܜ8oⵝ67^I2C񺓄=ܦo?HHy q$/wRM=Gw;O͵?FD"}p<!:f$5#'7sw<:,1Z$&da%#;KvFIU,:cs]qGQ_6X__Euoȅ/nf$@Q-|)T5ʴsȔfj#։Je')7 t_/!QT߉wM$a{gG՝NgD$!һ$qC?D?uAItUo (yDڿvM^z)qiYe49S#'t67׏\5{(䵏K".3;o9ž1ӏ_%=mR|?pdC z5h؝DD^5.}lJ@ZP0\ZB $b҅s%bMN3z,%"uoł(]" BэQ;'׫zޯ[cޖAIOmyDnk! Dզ!w#1VvD5>wy?֝6%Wɺ_ߙN"&N.%lQ؈W$a 砾={ւ蟩2K$w"$|+sG--/HBl7%x$|x#bBy ދ;zhqxGU(RwH=6O{KD;ooP˼>le^Vu>&Δ׵Ez |xoljvDu tO uD~I51X?h͋n8"|ҠwDܧ2Ghw|c|]I+1d0N쏻y=KBP'+R*/w3#Hg/ _QS Y^`eJί_)- ?:қ /iOhC ^;55D{%1$d3Kɠnh]9W/Dl12 {B<⻏~k'sAEA.Dmpvkkw;/Ԡ?O Psf=QFy' {jMj ՜|:djЙOמ 蒝gkgؼ$l{ʽRE4nhOӿ+Q]}|; -l$l˜jK"ϕ.:~$!T;Z?IȨA3.vH ZhKuoӊ+!vUiU5.Te?(_tWBjG$z-wOL6zQ?}B";V}K_/Ļc#OlxcG/.%'E{n+njΚw.^Չ7-Z{7%<JMZ<&q?ִdا3?](u񼩞n&ڤQz?UZ %Z?B0vq W&a*ۧfI({/{iÿ{fLW:kI|'>O/M_yB'\uzya>W? !_t\ɫ$b1T襈wQ@ _S)Yx6Po8'|!!e=$( =٪3%MqfK~2cIؓ:wM,3AsDz?/q|gC?Wíތ:{5u~x?v?S4t4[/t`U9.\{Q_G!~dz8l3zӲdžӬ>:gCIȁr;NVBܟ]Mb5q'꛹֔/db;sR{ူo}u3SЙk~]h2G&EFE;>;v!wGU>]_[b٬1f>#N? w=W }M $"~ qu3b/57a;GB.Vt]3l M(yz]w$Mfk8qhU%x&1 |Q$ћkJڴ^Q.y'D/'ZDQ̞J<+}o&Cª%kE?u_]&!sޟ蘞%H ;LatтhcY\Lxjıq\Ցou?T윢~;cqM_=G??nס>WH3aG+b?׿3W[ZCۜEIȻ" yV/h}xʧ7'׻sF=yzLLA=Ɂc\Ƭ;VDޚ$tx./NxjB+w 8DVoQۻ%CE MRG4|E $drϓr'QCJxm\t`6H%6!/{z*QU>@_;|~E+|ȧп_ɍcA+<6)\ぽGֿOݜf[*421h1ym43cbXNa4*H/_1S[CC.?{!lg{hˇmt,귧:;?lm+iˎT 4FfҵpXň'xﳾ g4Nv!]젶}}Amل]~uUowgC)Џ4-X1ls#L#XS'[y*Т:ߌݣI}Q1b=Csîf}vG.iLø'y7vJԾfZc|J*RD6ePgvkQ1=y2xvƒ]cmul!ٽX)ň+[TW gQ)@1>(GFL>4F$(3)iN C7~5{ ((ףk:vF|^gkzn:cʛ WC _P9mD17[Ơ;R!7jqH5̐IyWv qnJj>LQ[S?i8~me;gĦ} 7]{DO]3ͯ+9{Ћ}^i-cZ(Bg ?}eb-؜~gb-rs+;⚲_ԗVRh+ICFJs+e|FOS0#5~mX_VZ'QOsol.Ou'_+;&#e2ŕTj(ǩiCs-/*4SТ?JjbcSaס[2?0Ơܦu{l-YmAY%9J$;]tTM=vWcQF! *"n)Wilci08~Z-b M3[)F),w>=yzLS) ) δb}>:vdby;O@siKk3Ø\gĐ >eSu5иM"ߛ79Ms4jZvڷLQQ &fqG C<flъPۃ}ߑxxE>fz74_Ҙycre"_bX=֍@v=)/kT,ك3t1>}rF2&[%5^R|<)VǟsZ?>+'yS.3صXcm%FkE\$\DmnTGImh}H9@Ӏ}8J?~el%Y)yAo}v}(k0PlP]u=50]<>X2I1|Y*& Yn e}yLr:lﻞq>׉ ;vb" k4=+޵=v#3|lϐʓvɞ~^.?c=}"ơvS߫-dqnx=O$j3c8SS^R{S_\Fl]l`;}]O1OkVh%Xڏ׭t47.>\Io Y."4VR߇84h[]=ߑOS_9?g,5RMdsƳNŲmC}uN.u}?3-6/m4@ >֏]ʽZ)=)_h-A4%R+5Nh߹W4dLE=2:ݟ}}R]sb+~ւ4weR?.F狱դ~Zsnu~%YC\%9%}oS]2S)Z^vOggTaϔb}Y*{BLRse&Gq6wm+L0wҽlL_jUfl"Llvzgc Z{$W5ڶ;F}]Ti9vNb1({0y4滱5Uޕ,IQv(?dkORl4뗟[G0WTtC8h1@Ek=hU:#lͥ%K*E>H_BVImCR]!֗"O)#S_k/_Ey6OQrhRL? Ծٽ$]IwZ!z֊19qyv;GjZ>b6|I9?VtTZ~EPSz_:{R/C6#yev(dGU:2ss ֯kQJxsQ;Tce򱖗IJcr?R?)5 ݣrr}]ظWU鞆ps֏U5IeHPtlRZA&3"Ll~lUNG#K-ّꗿîK)SOe~S:f[vl(^AEں%Oݕcpaτ2}cl|#禬/ѵfrWD9>:S_|En\\,{k)"|&ԗײ WyݘW*c=Yi|"""bޫ®edQ[ ƣYEyZ 6G&g>$iUv VD}@u[%]D9sU6OE؁X(xE,hFMm\5̿C6l$/_[ZZ;{?vN}g~&C>&#}g@sUߟgi_Z=I6';IY*,mdE5juO}3"wÒ>)†#}^y5$'}h->QIː˲F%UE9use\4}ʞ"{y( CY߉Te23#qOZkUI~GǞc5R+bZw 5R?4l +QGPNqy8iTYFTĻ_j'/tT/waUZّzs4gևާߍT`fk'e>RjgurNNա4/V\nC:6Pr(z=DzCjbYW[nixHk._CWvlYf'G}5"qwS'/%rEc<;fY.ݓ(hl4v<eg-2ZGfK=ܔ2'~Y{6u󺬻}63_>"@mV8p[[MқULM1kԡb95FFcgK^i^sYslF*:7sٲ]z-srf(_sdV%;cd>׳ZoV(nfUS|o֞3y8J6IZoz۬g~g8LPάtp )itڏ_φL}-3'+Y {63dWoY\k~Ѹγg-'uZu=g2][J@tkP| ^iG/2_5t75gf7H?J}\ :\< A1Ag-gy2v[>>Fl>h_~>PI/>qg3ҥ+_Ze:s[eZE1_:Bn}] O3)o|Q_kc<_2/1&|\9,cZekXԺ sF]Io?_`՗5uef#keS떏K[_ r$˾ae/]y˥AiVn̰jvŷOfk|\+|A>a|-2~Vin'#iWүC։lkw}2O䘐k+cR?J}W^Y?X2zښU~R'Xl8+g y29YO=d-sg$SFjՕ5w+dR|O%e_c.DzA_6 k|˵ernR_szƇ/nJz3KZQΝh嚵rYcKoyN^ubaُh9r}{"_O} 2⺐9ley_1x+S}@˹!5/l_qț"uL֚UYy}S:'<#ZdV2We>~.XP_5OI/[Ʊƭ+YZ/إgq+ 3 Y_c|[֭/FXC[~C6LW˹,s[sV٬<]Fu˲M~ޮ9Z[ʚs_1\c k~2OF1.O9^9aeXf:˟VX)ǭVX {5Y ƴG>? R>\WpmA ښAOpmטןkg7 ?4;6nؠJ_FP_R00v@|\lR<;#NIx硱qGPu% ?& >'O{H`bw+6?^C&& {`:hdiQ6?qhO6w=59|k~`8uS]SG89TMarsenal/man/0000755000176200001440000000000014013225425012450 5ustar liggesusersarsenal/man/paired.control.Rd0000644000176200001440000000351213656527336015705 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/paired.control.R \name{paired.control} \alias{paired.control} \title{Control settings for \code{paired} function} \usage{ paired.control( diff = TRUE, numeric.test = "paired.t", cat.test = "mcnemar", ordered.test = "signed.rank", date.test = "paired.t", mcnemar.correct = TRUE, signed.rank.exact = NULL, signed.rank.correct = TRUE, ... ) } \arguments{ \item{diff}{logical, telling \code{paired} whether to calculate a column of differences between time points.} \item{numeric.test}{name of test for numeric RHS variables in \code{paired}: paired.t, signed.rank, sign.test.} \item{cat.test}{name of test for categorical variables: mcnemar} \item{ordered.test}{name of test for ordered variables: signed.rank, sign.test} \item{date.test}{name of test to perform for date variables: paired.t, signed.rank, sign.test} \item{mcnemar.correct, signed.rank.exact, signed.rank.correct}{Options for statistical tests. See \code{\link{wilcox.test}} and \code{\link{mcnemar.test}} for details.} \item{...}{Arguments passed to \code{\link{tableby.control}}} } \value{ A list with settings to be used within the \code{\link{paired}} function. } \description{ Control test and summary settings for the \code{\link{paired}} function. } \details{ Note that (with the exception of \code{total}) all arguments to \code{\link{tableby.control}} are accepted in this function (in fact, this function passes everything through to \code{\link{tableby.control}}). However, there are different defaults for the statistical tests (shown here). For details on the other arguments, please see the help page for \code{\link{tableby.control}}. } \seealso{ \code{\link{paired}}, \code{\link{tableby}}, \code{\link{tableby.control}}, \code{\link{summary.tableby}} } \author{ Ethan Heinzen } arsenal/man/NA.operations.Rd0000644000176200001440000000160013632700352015417 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/na.operations.R \name{NA.operations} \alias{NA.operations} \alias{allNA} \alias{includeNA} \alias{includeNA.factor} \alias{includeNA.default} \title{Some functions to handle NAs} \usage{ allNA(x) includeNA(x, label, ...) \method{includeNA}{factor}(x, label = "(Missing)", first = FALSE, ...) \method{includeNA}{default}(x, label = "(Missing)", ...) } \arguments{ \item{x}{An object} \item{label}{A character string denoting the label to set \code{NA}s to.} \item{...}{Other arguments (not in use at this time).} \item{first}{Logical; should the new label be the first level?} } \description{ \code{allNA} tests if all elements are NA, and \code{includeNA} sets the \code{NA}s in a character vector or factor to an explicit label. } \seealso{ \code{\link{is.na}}, \code{\link{anyNA}} } \author{ Ethan Heinzen } arsenal/man/mdy.Date.Rd0000644000176200001440000000227613632700353014417 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/mdy.Date.R \name{mdy.Date} \alias{mdy.Date} \alias{Date.mdy} \alias{is.Date} \title{Convert numeric dates to Date object, and vice versa} \usage{ mdy.Date(month, day, year, yearcut = 120) Date.mdy(date) is.Date(x) } \arguments{ \item{month}{integer, month (1-12).} \item{day}{integer, day of the month (1-31, depending on the month).} \item{year}{integer, either 2- or 4-digit year. If two-digit number, will add 1900 onto it, depending on range.} \item{yearcut}{cutoff for method to know if to convert to 4-digit year.} \item{date}{A date value.} \item{x}{An object.} } \value{ \code{mdy.Date} returns a Date object, and Date.mdy returns a list with integer values for month, day, and year. \code{is.Date} returns a single logical value. } \description{ Convert numeric dates for month, day, and year to Date object, and vice versa. } \details{ Test if an object is a date. More work may need to be done with yearcut and 2-digit years. Best to give a full 4-digit year. } \examples{ mdy.Date(9, 2, 2013) tmp <- mdy.Date(9, 2, 2013) Date.mdy(tmp) is.Date(tmp) } \seealso{ \code{\link{Date}}, \code{\link{DateTimeClasses}} } arsenal/man/write2specific.Rd0000644000176200001440000000333013632700353015664 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/write2specific.R \name{write2specific} \alias{write2specific} \alias{write2word} \alias{write2pdf} \alias{write2html} \title{write2word, write2html, write2pdf} \usage{ write2word(object, file, ...) write2pdf(object, file, ...) write2html(object, file, ...) } \arguments{ \item{object}{An object.} \item{file}{A single character string denoting the filename for the output document.} \item{...}{Additional arguments to be passed to \code{FUN}, \code{rmarkdown::render}, etc. One popular option is to use \code{quiet = TRUE} to suppress the command line output.} } \value{ \code{object} is returned invisibly, and \code{file} is written. } \description{ Functions to output tables to a single Word, HTML, or PDF document. } \details{ To generate the appropriate file type, the \code{write2*} functions use one of \code{rmarkdown::word_document}, \code{rmarkdown::html_document}, and \code{rmarkdown::pdf_document} to get the job done. \code{"..."} arguments are passed to these functions, too. } \examples{ \dontrun{ data(mockstudy) # tableby example tab1 <- tableby(arm ~ sex + age, data=mockstudy) write2html(tab1, "~/trash.html") # freqlist example tab.ex <- table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA = "ifany") noby <- freqlist(tab.ex, na.options = "include") write2pdf(noby, "~/trash2.pdf") # A more complicated example write2word(tab1, "~/trash.doc", keep.md = TRUE, reference_docx = mystyles.docx, # passed to rmarkdown::word_document quiet = TRUE, # passed to rmarkdown::render title = "My cool new title") # passed to summary.tableby } } \seealso{ \code{\link{write2}} } \author{ Ethan Heinzen, adapted from code from Krista Goergen } arsenal/man/padjust.Rd0000644000176200001440000000201514013225425014407 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/padjust.R \name{padjust} \alias{padjust} \alias{padjust.default} \alias{padjust.tableby} \alias{padjust.summary.tableby} \title{Adjust P-values for Multiple Comparisons} \usage{ padjust(p, method, n, ...) \method{padjust}{default}(p, method, n, ...) \method{padjust}{tableby}(p, method, n, suffix = " (adjusted for multiple comparisons)", ...) \method{padjust}{summary.tableby}(p, method, n, suffix = " (adjusted for multiple comparisons)", ...) } \arguments{ \item{p}{An object.} \item{method}{correction method. Can be abbreviated.} \item{n}{number of comparisons, must be at least \code{length(p)}; only set this (to non-default) when you know what you are doing!} \item{...}{Other arguments.} \item{suffix}{A suffix to add to the footnotes indicating that the tests were adjusted.} } \description{ Adjust P-values for Multiple Comparisons } \seealso{ \code{\link[stats]{p.adjust}}, \code{\link{modpval.tableby}}, \code{\link{tests.tableby}} } arsenal/man/tableby.stats.Rd0000644000176200001440000001016114056426132015522 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/tableby.stats.R \name{tableby.stats} \alias{tableby.stats} \alias{arsenal_sum} \alias{arsenal_min} \alias{arsenal_max} \alias{arsenal_mean} \alias{arsenal_sd} \alias{arsenal_var} \alias{meansd} \alias{meanse} \alias{meanCI} \alias{medianrange} \alias{medianmad} \alias{arsenal_median} \alias{arsenal_range} \alias{gmean} \alias{gsd} \alias{gmeansd} \alias{gmeanCI} \alias{Nsigntest} \alias{Nevents} \alias{medSurv} \alias{NeventsSurv} \alias{NriskSurv} \alias{Nrisk} \alias{medTime} \alias{q1q3} \alias{medianq1q3} \alias{iqr} \alias{Nmiss} \alias{Nmiss2} \alias{N} \alias{Npct} \alias{count} \alias{countpct} \alias{countN} \alias{countrowpct} \alias{countcellpct} \alias{binomCI} \alias{rowbinomCI} \title{tableby Summary Statistics Functions} \usage{ arsenal_sum(x, na.rm = TRUE, ...) arsenal_min(x, na.rm = TRUE, ...) arsenal_max(x, na.rm = TRUE, ...) arsenal_mean(x, na.rm = TRUE, weights = NULL, ...) arsenal_sd(x, na.rm = TRUE, weights = NULL, ...) arsenal_var(x, na.rm = TRUE, weights = NULL, ...) meansd(x, na.rm = TRUE, weights = NULL, ...) meanse(x, na.rm = TRUE, weights = NULL, ...) meanCI(x, na.rm = TRUE, weights = NULL, conf.level = 0.95, ...) medianrange(x, na.rm = TRUE, weights = NULL, ...) medianmad(x, na.rm = TRUE, weights = NULL, ...) arsenal_median(x, na.rm = TRUE, weights = NULL, ...) arsenal_range(x, na.rm = TRUE, ...) gmean(x, na.rm = TRUE, weights = NULL, ...) gsd(x, na.rm = TRUE, weights = NULL, ...) gmeansd(x, na.rm = TRUE, weights = NULL, ...) gmeanCI(x, na.rm = TRUE, weights = NULL, conf.level = 0.95, ...) Nsigntest(x, na.rm = TRUE, weights = NULL, ...) Nevents(x, na.rm = TRUE, weights = NULL, ...) medSurv(x, na.rm = TRUE, weights = NULL, ...) NeventsSurv(x, na.rm = TRUE, weights = NULL, times = 1:5, ...) NriskSurv(x, na.rm = TRUE, weights = NULL, times = 1:5, ...) Nrisk(x, na.rm = TRUE, weights = NULL, times = 1:5, ...) medTime(x, na.rm = TRUE, weights = NULL, ...) q1q3(x, na.rm = TRUE, weights = NULL, ...) medianq1q3(x, na.rm = TRUE, weights = NULL, ...) iqr(x, na.rm = TRUE, weights = NULL, ...) Nmiss(x, weights = NULL, ...) Nmiss2(x, weights = NULL, ...) N(x, na.rm = TRUE, weights = NULL, ...) Npct( x, levels = NULL, by, by.levels = sort(unique(by)), na.rm = TRUE, weights = NULL, ..., totallab = "Total" ) count(x, levels = NULL, na.rm = TRUE, weights = NULL, ...) countpct(x, levels = NULL, na.rm = TRUE, weights = NULL, ...) countN(x, levels = NULL, na.rm = TRUE, weights = NULL, ...) countrowpct( x, levels = NULL, by, by.levels = sort(unique(by)), na.rm = TRUE, weights = NULL, ..., totallab = "Total" ) countcellpct( x, levels = NULL, by, by.levels = sort(unique(by)), na.rm = TRUE, weights = NULL, ..., totallab = "Total" ) binomCI(x, levels = NULL, na.rm = TRUE, weights = NULL, conf.level = 0.95, ...) rowbinomCI( x, levels = NULL, by, by.levels = sort(unique(by)), na.rm = TRUE, weights = NULL, conf.level = 0.95, ..., totallab = "Total" ) } \arguments{ \item{x}{Usually a vector.} \item{na.rm}{Should NAs be removed?} \item{...}{Other arguments.} \item{weights}{A vector of weights.} \item{conf.level}{Numeric, denoting what confidence level to use for confidence intervals.} \item{times}{A vector of times to use for survival summaries.} \item{levels}{A vector of levels that character \code{x}s should have.} \item{by}{a vector of the by-values.} \item{by.levels}{a vector of the levels of \code{by}.} \item{totallab}{What to call the total "column"} } \value{ Usually a vector of the appropriate numbers. } \description{ A collection of functions that will report summary statistics. To create a custom function, consider using a function with all three arguments and \code{...}. See the \code{\link{tableby}} vignette for an example. } \details{ Not all these functions are exported, in order to avoid conflicting NAMESPACES. Note also that the functions prefixed with \code{"arsenal_"} can be referred to by their short names (e.g., \code{"min"} for \code{"arsenal_min"}). } \seealso{ \code{\link{includeNA}}, \code{\link{tableby.control}} } arsenal/man/paired.Rd0000644000176200001440000000570613713262077014225 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/paired.R \name{paired} \alias{paired} \title{Summary Statistics of a Set of Independent Variables Paired Across Two Timepoints} \usage{ paired( formula, data, id, na.action, subset = NULL, strata, control = NULL, ... ) } \arguments{ \item{formula}{an object of class \code{\link{formula}} of the form \code{time ~ var1 + ...}. See "Details" for more information.} \item{data}{an optional data frame, list or environment (or object coercible by \code{\link{as.data.frame}} to a data frame) containing the variables in the model. If not found in data, the variables are taken from \code{environment(formula)}, typically the environment from which the function is called.} \item{id}{The vector giving IDs to match up data for the same subject across two timepoints.} \item{na.action}{a function which indicates what should happen when the data contain \code{NA}s. The default is \code{na.paired("in.both")}. See \code{\link{na.paired}} for more details} \item{subset}{an optional vector specifying a subset of observations (rows of data) to be used in the results. Works as vector of logicals or an index.} \item{strata}{a vector of strata to separate summaries by an additional group.} \item{control}{control parameters to handle optional settings within \code{paired}. Two aspects of \code{paired} are controlled with these: test options of RHS variables and x variable summaries. Arguments for \code{paired.control} can be passed to \code{paired} via the \code{...} argument, but if a control object and \code{...} arguments are both supplied, the latter are used. See \code{\link{paired.control}} for more details.} \item{...}{additional arguments to be passed to internal \code{paired} functions or \code{\link{paired.control}}.} } \value{ An object with class \code{c("paired", "tableby", "arsenal_table")} } \description{ Summarize one or more variables (x) by a paired time variable (y). Variables on the right side of the formula, i.e. independent variables, are summarized by the two time points on the left of the formula. Optionally, an appropriate test is performed to test the distribution of the independent variables across the time points. } \details{ Do note that this function piggybacks off of \code{\link{tableby}} quite heavily, so there is no \code{summary.paired} function (for instance). These tests are accepted: \itemize{ \item{ \code{paired.t}: a paired \code{\link[stats:t.test]{t-test}}. } \item{ \code{mcnemar}: \link[stats:mcnemar.test]{McNemar's test}. } \item{ \code{signed.rank}: a \link[stats:wilcox.test]{signed rank test}. } \item{ \code{sign.test}: a sign test. } \item{ \code{notest}: no test is performed. } } } \seealso{ \code{\link{arsenal_table}}, \code{\link{paired.control}}, \code{\link{tableby}}, \code{\link{formulize}}, \code{\link{selectall}} } \author{ Jason Sinnwell, Beth Atkinson, Ryan Lennon, and Ethan Heinzen } arsenal/man/comparedf.control.Rd0000644000176200001440000001217413656527336016405 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/comparedf.control.R \name{comparedf.control} \alias{comparedf.control} \title{Control settings for \code{comparedf} function} \usage{ comparedf.control( tol.logical = "none", tol.num = c("absolute", "percent", "pct"), tol.num.val = sqrt(.Machine$double.eps), int.as.num = FALSE, tol.char = c("none", "trim", "case", "both"), tol.factor = c("none", "levels", "labels"), factor.as.char = FALSE, tol.date = "absolute", tol.date.val = 0, tol.other = "none", tol.vars = "none", max.print.vars = NA, max.print.obs = NA, max.print.diffs.per.var = 10, max.print.diffs = 50, max.print.attrs = NA, ..., max.print.diff = 10 ) } \arguments{ \item{tol.logical, tol.num, tol.char, tol.factor, tol.date, tol.other}{A function or one of the shortcut character strings or a list thereof, denoting the tolerance function to use for a given data type. See "details", below.} \item{tol.num.val}{Numeric; maximum value of differences allowed in numerics (fed to the function given in \code{tol.num}).} \item{int.as.num}{Logical; should integers be coerced to numeric before comparison? Default FALSE.} \item{factor.as.char}{Logical; should factors be coerced to character before comparison? Default FALSE.} \item{tol.date.val}{Numeric; maximum value of differences allowed in dates (fed to the function given in \code{tol.date}).} \item{tol.vars}{Either \code{"none"} (the default), denoting that variable names are to be matched as-is, a named vector manually specifying variable names to compare (where the names correspond to columns of \code{x} and the values correspond to columns of \code{y}), or a character vector denoting equivalence classes for characters in the variable names. See "details", below.} \item{max.print.vars}{Integer denoting maximum number of variables to report in the "variables not shared" and "variables not compared" output. \code{NA} will print all differences.} \item{max.print.obs}{Integer denoting maximum number of not-shared observations to report. \code{NA} will print all differences.} \item{max.print.diffs.per.var, max.print.diffs}{Integers denoting the maximum number of differences to report for each variable or overall. \code{NA} will print all differences for each variable or overall.} \item{max.print.attrs}{Integers denoting the maximum number of non-identical attributes to report.\code{NA} will print all differences.} \item{...}{Other arguments (not in use at this time).} \item{max.print.diff}{Deprecated.} } \value{ A list containing the necessary parameters for the \code{\link{comparedf}} function. } \description{ Control tolerance definitions for the \code{\link{comparedf}} function. } \details{ The following character strings are accepted: \itemize{ \item{\code{tol.logical = "none"}: compare logicals exactly as they are.} \item{\code{tol.num = "absolute"}: compare absolute differences in numerics.} \item{\code{tol.num = "percent"}, \code{tol.num = "pct"} compare percent differences in numerics.} \item{\code{tol.char = "none"}: compare character strings exactly as they are.} \item{\code{tol.char = "trim"}: left-justify and trim all trailing white space.} \item{\code{tol.char = "case"}: allow differences in upper/lower case.} \item{\code{tol.char = "both"}: combine \code{"trim"} and \code{"case"}.} \item{\code{tol.factor = "none"}: match both character labels and numeric levels.} \item{\code{tol.factor = "levels"}: match only the numeric levels.} \item{\code{tol.factor = "labels"}: match only the labels.} \item{\code{tol.date = "absolute"}: compare absolute differences in dates.} \item{\code{tol.other = "none"}: expect objects of other classes to be exactly identical.} } A list with names mapped to \code{x} can be used to specify tolerances by variable. One unnamed element is supported as the default. \code{tol.vars}: If not set to \code{"none"} (the default) or a named vector, the \code{tol.vars} argument is a character vector denoting equivalence classes for the characters in the variable names. A single character in this vector means to replace that character with \code{""}. All other strings in this vector are split by character and replaced by the first character in the string. E.g., a character vector \code{c("._", "aA", " ")} would denote that the dot and underscore are equivalent (to be translated to a dot), that "a" and "A" are equivalent (to be translated to "a"), and that spaces should be removed. The special character string \code{"case"} in this vector is the same as specifying \code{paste0(letters, LETTERS)}. } \examples{ cntl <- comparedf.control( tol.num = "pct", # calculate percent differences tol.vars = c("case", # ignore case "._", # set all underscores to dots. "e") # remove all letter e's ) cntl <- comparedf.control(tol.char = list( "none", # the default x1 = "case", # be case-insensitive for the variable "x1" x2 = function(x, y) tol.NA(x, y, x != y | y == "NA") # a custom-defined tolerance )) } \seealso{ \code{\link{comparedf}}, \code{\link{comparedf.tolerances}}, \code{\link{summary.comparedf}} } \author{ Ethan Heinzen } arsenal/man/tableby.internal.Rd0000644000176200001440000000411713674211211016177 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/tableby.internal.R \name{tableby.internal} \alias{tableby.internal} \alias{is.tableby} \alias{is.summary.tableby} \alias{modpval.tableby} \alias{tests} \alias{tests.tableby} \alias{na.tableby} \alias{xtfrm.tableby} \alias{sort.tableby} \alias{Ops.tableby} \alias{head.tableby} \alias{tail.tableby} \title{Helper functions for tableby} \usage{ is.tableby(x) is.summary.tableby(x) modpval.tableby(x, pdata, use.pname = FALSE) tests(x) \method{tests}{tableby}(x) na.tableby(lhs = TRUE) \method{xtfrm}{tableby}(x) \method{sort}{tableby}(x, ...) \method{Ops}{tableby}(e1, e2) \method{head}{tableby}(x, n = 6L, ...) \method{tail}{tableby}(x, n = 6L, ...) } \arguments{ \item{x}{A \code{tableby} object.} \item{pdata}{A named data.frame where the first column is the by-variable names, the (optional) second is the strata value, the next is the x variable names, the next is p-values (or some test stat), and the (optional) next column is the method name.} \item{use.pname}{Logical, denoting whether the column name in \code{pdata} corresponding to the p-values should be used in the output of the object.} \item{lhs}{Logical, denoting whether to remove \code{NA}s from the first column of the data.frame (the "left-hand side")} \item{...}{Other arguments.} \item{e1, e2}{\code{\link{tableby}} objects, or numbers to compare them to.} \item{n}{A single integer. See \code{\link[utils]{head}} or \code{\link[utils:head]{tail}} for more details} } \value{ \code{na.tableby} returns a subsetted version of \code{object} (with attributes). \code{Ops.tableby} returns a logical vector. \code{xtfrm.tableby} returns the p-values (which are ordered by \code{\link{order}} to \code{\link{sort}}). } \description{ A set of helper functions for \code{\link{tableby}}. } \details{ Logical comparisons are implemented for \code{Ops.tableby}. } \seealso{ \code{\link{arsenal_table}}, \code{\link{sort}}, \code{\link[utils]{head}}, \code{\link[utils:head]{tail}}, \code{\link{tableby}}, \code{\link{summary.tableby}}, \code{\link{tableby.control}} } arsenal/man/modelsum.control.Rd0000644000176200001440000000521113714571072016253 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/modelsum.control.R \name{modelsum.control} \alias{modelsum.control} \title{Control settings for \code{modelsum} function} \usage{ modelsum.control( digits = 3L, digits.ratio = 3L, digits.p = 3L, format.p = TRUE, show.adjust = TRUE, show.intercept = TRUE, conf.level = 0.95, ordinal.stats = c("OR", "CI.lower.OR", "CI.upper.OR", "p.value", "Nmiss"), binomial.stats = c("OR", "CI.lower.OR", "CI.upper.OR", "p.value", "concordance", "Nmiss"), gaussian.stats = c("estimate", "std.error", "p.value", "adj.r.squared", "Nmiss"), poisson.stats = c("RR", "CI.lower.RR", "CI.upper.RR", "p.value", "Nmiss"), negbin.stats = c("RR", "CI.lower.RR", "CI.upper.RR", "p.value", "Nmiss"), relrisk.stats = c("RR", "CI.lower.RR", "CI.upper.RR", "p.value", "Nmiss"), clog.stats = c("OR", "CI.lower.OR", "CI.upper.OR", "p.value", "concordance", "Nmiss"), survival.stats = c("HR", "CI.lower.HR", "CI.upper.HR", "p.value", "concordance", "Nmiss"), stat.labels = list(), ... ) } \arguments{ \item{digits}{Numeric, denoting the number of digits after the decimal point for beta coefficients and standard errors.} \item{digits.ratio}{Numeric, denoting the number of digits after the decimal point for ratios, e.g. OR, RR, HR.} \item{digits.p}{Numeric, denoting the number of digits for p-values. See "Details", below.} \item{format.p}{Logical, denoting whether to format p-values. See "Details", below.} \item{show.adjust}{Logical, denoting whether to show adjustment terms.} \item{show.intercept}{Logical, denoting whether to show intercept terms.} \item{conf.level}{Numeric, giving the confidence level.} \item{ordinal.stats, binomial.stats, survival.stats, gaussian.stats, poisson.stats, negbin.stats, clog.stats, relrisk.stats}{Character vectors denoting which stats to show for the various model types.} \item{stat.labels}{A named list of labels for all the stats used above.} \item{...}{Other arguments (not in use at this time).} } \value{ A list with settings to be used within the \code{modelsum} function. } \description{ Control test and summary settings for \code{\link{modelsum}} function. } \details{ If \code{format.p} is \code{FALSE}, \code{digits.p} denotes the number of significant digits shown. The p-values will be in exponential notation if necessary. If \code{format.p} is \code{TRUE}, \code{digits.p} will determine the number of digits after the decimal point to show. If the p-value is less than the resulting number of places, it will be formatted to show so. } \seealso{ \code{\link{modelsum}}, \code{\link{summary.modelsum}}, \code{\link{modelsum.internal}} } arsenal/man/tableby.control.Rd0000644000176200001440000001540214056426132016047 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/tableby.control.R \name{tableby.control} \alias{tableby.control} \title{Control settings for \code{tableby} function} \usage{ tableby.control( test = TRUE, total = TRUE, total.pos = c("after", "before"), test.pname = NULL, numeric.simplify = FALSE, cat.simplify = FALSE, cat.droplevels = FALSE, ordered.simplify = FALSE, date.simplify = FALSE, numeric.test = "anova", cat.test = "chisq", ordered.test = "trend", surv.test = "logrank", date.test = "kwt", selectall.test = "notest", test.always = FALSE, numeric.stats = c("Nmiss", "meansd", "range"), cat.stats = c("Nmiss", "countpct"), ordered.stats = c("Nmiss", "countpct"), surv.stats = c("Nmiss", "Nevents", "medSurv"), date.stats = c("Nmiss", "median", "range"), selectall.stats = c("Nmiss", "countpct"), stats.labels = list(), digits = 3L, digits.count = 0L, digits.pct = 1L, digits.p = 3L, format.p = TRUE, digits.n = 0L, conf.level = 0.95, wilcox.correct = FALSE, wilcox.exact = NULL, chisq.correct = FALSE, simulate.p.value = FALSE, B = 2000, times = 1:5, ... ) } \arguments{ \item{test}{logical, telling \code{tableby} whether to perform tests of x variables across levels of the group variable.} \item{total}{logical, telling \code{tableby} whether to calculate a column of totals across group variable.} \item{total.pos}{One of \code{"before"} or \code{"after"}, denoting where to put the total column relative to the by-variable columns.} \item{test.pname}{character string denoting the p-value column name in \code{\link{summary.tableby}}. Modifiable also with \code{\link{modpval.tableby}}.} \item{numeric.simplify, date.simplify}{logical, tell \code{tableby} whether to condense numeric/date output to a single line. NOTE: this only simplifies to one line if there is only one statistic reported, such as \code{meansd}. In particular, if \code{Nmiss} is specified and there are missings, then the output is not simplified.} \item{cat.simplify, ordered.simplify}{logical, tell \code{tableby} whether to remove the first level of the categorical/ordinal variable if binary. If \code{TRUE}, only the summary stats of the second level are reported (unless there's only one level, in which case it's reported). If \code{"label"}, the second level's label is additionally appended to the label. NOTE: this only simplifies to one line if there is only one statistic reported, such as \code{countpct}. In particular, if \code{Nmiss} is specified and there are missings, then the output is not simplified.} \item{cat.droplevels}{Should levels be dropped for categorical variables? If set to true, p-values will not be displayed unless \code{test.always = TRUE} as well.} \item{numeric.test}{name of test for numeric RHS variables in \code{tableby}: anova, kwt (Kruskal-Wallis), medtest (median test). If no LHS variable exists, then a mean is required for a univariate test.} \item{cat.test}{name of test for categorical variables: chisq, fe (Fisher's Exact)} \item{ordered.test}{name of test for ordered variables: trend} \item{surv.test}{name of test for survival variables: logrank} \item{date.test}{name of test for date variables: kwt} \item{selectall.test}{name of test for date variables: notest} \item{test.always}{Should the test be performed even if one or more by-group has 0 observations? Relevant for kwt and anova.} \item{numeric.stats, cat.stats, ordered.stats, surv.stats, date.stats, selectall.stats}{summary statistics to include for the respective class of RHS variables within the levels of the group LHS variable.} \item{stats.labels}{A named list of labels for all the statistics function names, where the function name is the named element in the list and the value that goes with it is a string containing the formal name that will be printed in all printed renderings of the output, e.g., \code{list(countpct="Count (Pct)")}. Any unnamed elements will be ignored. Passing \code{NULL} will disable labels.} \item{digits}{Number of decimal places for numeric values.} \item{digits.count}{Number of decimal places for count values.} \item{digits.pct}{Number of decimal places for percents.} \item{digits.p}{Number of decimal places for p-values.} \item{format.p}{Logical, denoting whether to format p-values. See "Details", below.} \item{digits.n}{Number of decimal places for N's in the header. Set it to NA to suppress the N's.} \item{conf.level}{Numeric, denoting what confidence level to use for confidence intervals. (See, e.g., \code{\link{binomCI}})} \item{wilcox.correct, wilcox.exact}{See \code{\link[stats]{wilcox.test}}} \item{chisq.correct}{logical, correction factor for chisq.test} \item{simulate.p.value}{logical, simulate p-value for categorical tests (fe and chisq)} \item{B}{number of simulations to perform for simulation-based p-value} \item{times}{A vector of times to use for survival summaries.} \item{...}{additional arguments.} } \value{ A list with settings to be used within the \code{tableby} function. } \description{ Control test and summary settings for the \code{\link{tableby}} function. } \details{ All tests can be turned off by setting \code{test} to FALSE. Otherwise, test are set to default settings in this list, or set explicitly in the formula of \code{tableby}. If \code{format.p} is \code{FALSE}, \code{digits.p} denotes the number of significant digits shown. The p-values will be in exponential notation if necessary. If \code{format.p} is \code{TRUE}, \code{digits.p} will determine the number of digits after the decimal point to show. If the p-value is less than the resulting number of places, it will be formatted to show so. Options for statistics are described more thoroughly in the vignette and are listed in \link{tableby.stats} } \examples{ set.seed(100) ## make 3+ categories for Response mdat <- data.frame(Response=c(0,0,0,0,0,1,1,1,1,1), Sex=sample(c("Male", "Female"), 10,replace=TRUE), Age=round(rnorm(10,mean=40, sd=5)), HtIn=round(rnorm(10,mean=65,sd=5))) ## allow default summaries in RHS variables, and pass control args to ## main function, to be picked up with ... when calling tableby.control outResp <- tableby(Response ~ Sex + Age + HtIn, data=mdat, total=FALSE, test=TRUE) outCtl <- tableby(Response ~ Sex + Age + HtIn, data=mdat, control=tableby.control(total=TRUE, cat.simplify=TRUE, cat.stats=c("Nmiss","countpct"),digits=1)) summary(outResp, text=TRUE) summary(outCtl, text=TRUE) } \seealso{ \code{\link[stats]{anova}}, \code{\link[stats]{chisq.test}}, \code{\link{tableby}}, \code{\link{summary.tableby}}, \code{\link{tableby.stats}}. } \author{ Jason Sinnwell, Beth Atkinson, Ethan Heinzen, Terry Therneau, adapted from SAS Macros written by Paul Novotny and Ryan Lennon } arsenal/man/tableby.stats.internal.Rd0000644000176200001440000000342013656527336017352 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/tableby.internal.R \name{tableby.stats.internal} \alias{tableby.stats.internal} \alias{as.tbstat} \alias{as.countpct} \alias{as.tbstat_multirow} \title{Internal \code{tableby} functions} \usage{ as.tbstat( x, oldClass = NULL, sep = NULL, parens = NULL, sep2 = NULL, pct = NULL, ... ) as.countpct( x, ..., which.count = setdiff(seq_along(x), which.pct), which.pct = 0L ) as.tbstat_multirow(x) } \arguments{ \item{x}{Usually a vector.} \item{oldClass}{class(es) to add to the resulting object.} \item{sep}{The separator between \code{x[1]} and the rest of the vector.} \item{parens}{A length-2 vector denoting parentheses to use around \code{x[2]} and \code{x[3]}.} \item{sep2}{The separator between \code{x[2]} and \code{x[3]}.} \item{pct}{For statistics of length 2, the symbol to use after the second one. (It's called "pct" because usually the first statistic is never a percent, but the second often is.)} \item{...}{arguments to pass to \code{as.tbstat}.} \item{which.count}{Which statistics are counts? The default is everything except the things that are percents.} \item{which.pct}{Which statistics are percents? The default is 0, indicating that none are.} } \description{ A collection of functions that may help users create custom functions that are formatted correctly. } \details{ The vignette has an example on how to use these. \code{as.tbstat} defines a tableby statistic with its appropriate formatting. \code{as.countpct} adds another class to \code{as.tbstat} to use different "digits" arguments (i.e., \code{digits.count} or \code{digits.pct}). See \code{\link{tableby.control}}. \code{as.tbstat_multirow} marks an object (usually a list) for multiple-row printing. } arsenal/man/mockstudy.Rd0000644000176200001440000000322513656527615015005 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/mockstudy.R \docType{data} \name{mockstudy} \alias{mockstudy} \alias{muck_up_mockstudy} \title{Mock study data for examples} \format{ A data frame with 1499 observations on the following 15 variables: \describe{ \item{\code{case}}{a numeric identifier-patient ID} \item{\code{age}}{age in years} \item{\code{arm}}{treatment arm divided into 3 groups, character string } \item{\code{sex}}{a factor with levels \code{Male} \code{Female}} \item{\code{race}}{self-reported race/ethnicity, character string} \item{\code{fu.time}}{survival or censoring time in years} \item{\code{fu.stat}}{censoring status; 1=censor, 2=death} \item{\code{ps}}{integer, ECOG performance score } \item{\code{hgb}}{numeric, hemoglobin count} \item{\code{bmi}}{numeric, body mass index, kg/m^2} \item{\code{alk.phos}}{numeric, alkaline phosphatase} \item{\code{ast}}{numeric, aspartate transaminase } \item{\code{mdquality.s}}{integer, LASA QOL 0=Clinically Deficient, 1=Not Clinically Deficient } \item{\code{age.ord}}{an ordered factor split of age, with levels \code{10-19} < \code{20-29} < \code{30-39} < \code{40-49} < \code{50-59} < \code{60-69} < \code{70-79} < \code{80-89}} } An object of class \code{data.frame} with 1499 rows and 14 columns. } \usage{ mockstudy muck_up_mockstudy() } \description{ Mock clinical study data for examples to test data manipulation and statistical functions. The function \code{muck_up_mockstudy()} is used in examples for \code{\link{comparedf}}. } \examples{ data(mockstudy) str(mockstudy) } \keyword{datasets} arsenal/man/comparedf.Rd0000644000176200001440000000313613656527336014724 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/comparedf.R \name{comparedf} \alias{comparedf} \alias{print.comparedf} \title{Compare two data.frames and report differences} \usage{ comparedf(x, y, by = NULL, by.x = by, by.y = by, control = NULL, ...) \method{print}{comparedf}(x, ...) } \arguments{ \item{x, y}{A data.frame to compare} \item{by, by.x, by.y}{Which variables are IDs to merge the two data.frames? If set to \code{"row.names"}, merging will occur over the row.names. If set to \code{NULL} (default), merging will occur row-by-row.} \item{control}{A list of control parameters from \code{\link{comparedf.control}}.} \item{...}{Other arguments, passed to \code{\link{comparedf.control}} when appropriate.} } \description{ Compare two data.frames and report any differences between them, much like SAS's \code{PROC COMPARE} procedure. } \examples{ df1 <- data.frame(id = paste0("person", 1:3), a = c("a", "b", "c"), b = c(1, 3, 4), c = c("f", "e", "d"), row.names = paste0("rn", 1:3), stringsAsFactors = FALSE) df2 <- data.frame(id = paste0("person", 3:1), a = c("c", "b", "a"), b = c(1, 3, 4), d = paste0("rn", 1:3), row.names = paste0("rn", c(1,3,2)), stringsAsFactors = FALSE) summary(comparedf(df1, df2)) summary(comparedf(df1, df2, by = "id")) summary(comparedf(df1, df2, by = "row.names")) } \seealso{ \code{\link{summary.comparedf}}, \code{\link{comparedf.control}}, \code{\link{diffs}}, \code{\link{n.diffs}}, \code{\link{n.diff.obs}} } \author{ Ethan Heinzen, adapted from code from Andrew Hanson } arsenal/man/paired.internal.Rd0000644000176200001440000000154213656527336016042 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/paired.internal.R \name{paired.internal} \alias{paired.internal} \alias{na.paired} \title{Helper functions for paired} \usage{ na.paired(missings = c("in.both", "fill", "asis")) } \arguments{ \item{missings}{A character string denoting which action to take. See "Details", below.} } \value{ \code{na.paired} returns a function used to subset data.frames in \code{\link{paired}}. } \description{ A set of helper functions for \code{\link{paired}}. } \details{ All methods subset out any NA time points or IDs. \code{"in.both"} (the default) subsets the data.frame to individuals who appear at both time points. \code{"fill"} adds explicit missings for the people missing second time points. \code{"asis"} does nothing to add or remove missings. } \seealso{ \link{tableby.internal} } arsenal/man/arsenal.Rd0000644000176200001440000000423113656527335014405 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/arsenal.R \docType{package} \name{arsenal} \alias{arsenal} \title{An Arsenal of 'R' Functions for Large-Scale Statistical Summaries} \description{ An Arsenal of 'R' functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in 'R' and 'RStudio' and which use formulas and versatile summary statistics for summary tables and models. } \details{ The package download, NEWS, and README are available on CRAN: \url{https://cran.r-project.org/package=arsenal} } \section{Functions}{ Below are listed some of the most widely used functions available in \code{arsenal}: \code{\link{tableby}}: Summary statistics of a set of independent variables by a categorical variable. \code{\link{paired}}: Summary statistics of a set of independent variables paired across two timepoints. \code{\link{modelsum}}: Fit models over each of a set of independent variables with a response variable. \code{\link{freqlist}}: Approximate the output from SAS's \code{PROC FREQ} procedure when using the \code{/list} option of the \code{TABLE} statement. \code{\link{comparedf}}: Compare two data.frames and report any differences between them, much like SAS's \code{PROC COMPARE} procedure. \code{\link{write2word}}, \code{\link{write2html}}, \code{\link{write2pdf}}: Functions to output tables to a single Word, HTML, or PDF document. \code{\link{write2}}: Functions to output tables to a single document. (Also the S3 backbone behind the \code{write2*} functions.) \code{\link{keep.labels}}: Keep the \code{'label'} attribute on an R object when subsetting. \code{\link{formulize}}: A shortcut to generate one-, two-, or many-sided formulas. \code{\link{mdy.Date}} and \code{\link{Date.mdy}}: Convert numeric dates for month, day, and year to Date object, and vice versa. \code{\link{is.Date}}: Test if an object is a date. \code{\link{\%nin\%}}: Test for "not in". \code{\link{allNA}} and \code{\link{includeNA}}: some useful functions for dealing with NAs. } \section{Data}{ \code{\link{mockstudy}}: Mock study data for examples. } \examples{ library(arsenal) } arsenal/man/summary.modelsum.Rd0000644000176200001440000000556513737403146016305 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/summary.modelsum.R \name{summary.modelsum} \alias{summary.modelsum} \alias{as.data.frame.summary.modelsum} \title{Summarize a \code{modelsum} object.} \usage{ \method{summary}{modelsum}( object, ..., labelTranslations = NULL, text = FALSE, title = NULL, term.name = "", adjustment.names = FALSE ) \method{as.data.frame}{summary.modelsum}( x, ..., text = x$text, term.name = x$term.name, adjustment.names = x$adjustment.names, width = NULL, min.split = NULL, list.ok = FALSE ) } \arguments{ \item{object}{A \code{\link{modelsum}} object.} \item{...}{For \code{summary.modelsum}, other arguments passed to \code{\link{as.data.frame.modelsum}}. For \code{as.data.frame.summary.modelsum}, "width" and "min.split" are passed to \code{\link{smart.split}}. For \code{print}ing the summary object, these are passed to both \code{as.data.frame.summary.modelsum} and \code{\link[knitr]{kable}}.} \item{labelTranslations}{A named list (or vector) where the name is the label in the output to be replaced in the pretty rendering by the character string value for the named element of the list, e.g., \code{list(age = "Age(Years)", meansd = "Mean(SD)")}.} \item{text}{An argument denoting how to print the summary to the screen. Default is \code{FALSE} (show markdown output). \code{TRUE} and \code{NULL} output a text-only version, with the latter avoiding all formatting. \code{"html"} uses the HTML tag \code{} instead of the markdown formatting, and \code{"latex"} uses the LaTeX command \code{\\textbf}.} \item{title}{Title/caption for the table, defaulting to \code{NULL} (no title). Passed to \code{\link[knitr]{kable}}. Can be length > 1 if the more than one table is being printed.} \item{term.name}{A character vector denoting the column name for the "terms" column. It should be the same length as the number of tables or less (it will get recycled if needed). The special value \code{TRUE} will use the y-variable's label for each table.} \item{adjustment.names}{Logical, denoting whether the names of the adjustment models should be printed.} \item{x}{An object of class \code{"summary.modelsum"}.} \item{width}{Passed to \code{\link{smart.split}} for formatting of the "term" column.} \item{min.split}{Passed to \code{\link{smart.split}} for formatting of the "term" column.} \item{list.ok}{If the object has multiple by-variables, is it okay to return a list of data.frames instead of a single data.frame? If \code{FALSE} but there are multiple by-variables, a warning is issued.} } \value{ An object of class \code{"summary.modelsum"} } \description{ Format the information in \code{object} as a table using Pandoc coding or plain text, and cat it to stdout. } \seealso{ \code{\link{modelsum}}, \code{\link{as.data.frame.modelsum}} } \author{ Ethan Heinzen, based on code originally by Greg Dougherty } arsenal/man/modelsum.internal.Rd0000644000176200001440000000134513632700353016406 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/modelsum.internal.R \name{modelsum.internal} \alias{modelsum.internal} \alias{is.modelsum} \alias{is.summary.modelsum} \alias{na.modelsum} \title{Helper functions for modelsum} \usage{ is.modelsum(x) is.summary.modelsum(x) na.modelsum(object, ...) } \arguments{ \item{x}{A \code{modelsum} object.} \item{object}{A \code{data.frame} resulting from evaluating a \code{modelsum} formula.} \item{...}{Other arguments, or a vector of indices for extracting.} } \value{ \code{na.modelsum} returns a subsetted version of \code{object} (with attributes). } \description{ A set of helper functions for \code{\link{modelsum}}. } \seealso{ \code{\link{arsenal_table}} } arsenal/man/summary.tableby.Rd0000644000176200001440000000720313737403146016071 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/summary.tableby.R \name{summary.tableby} \alias{summary.tableby} \alias{as.data.frame.summary.tableby} \title{The summary method for a \code{tableby} object} \usage{ \method{summary}{tableby}( object, ..., labelTranslations = NULL, text = FALSE, title = NULL, pfootnote = FALSE, term.name = "" ) \method{as.data.frame}{summary.tableby}( x, ..., text = x$text, pfootnote = x$pfootnote, term.name = x$term.name, width = NULL, min.split = NULL, list.ok = FALSE ) } \arguments{ \item{object}{An object of class \code{"tableby"}, made by the \code{\link{tableby}} function.} \item{...}{For \code{summary.tableby}, other arguments passed to \code{\link{as.data.frame.tableby}}. For \code{print}ing the summary object, these are passed to both \code{as.data.frame.summary.tableby} and \code{\link[knitr]{kable}}.} \item{labelTranslations}{A named list (or vector) where the name is the label in the output to be replaced in the pretty rendering by the character string value for the named element of the list, e.g., \code{list(age = "Age(Years)", meansd = "Mean(SD)")}.} \item{text}{An argument denoting how to print the summary to the screen. Default is \code{FALSE} (show markdown output). \code{TRUE} and \code{NULL} output a text-only version, with the latter avoiding all formatting. \code{"html"} uses the HTML tag \code{} instead of the markdown formatting, and \code{"latex"} uses the LaTeX command \code{\\textbf}.} \item{title}{Title/caption for the table, defaulting to \code{NULL} (no title). Passed to \code{\link[knitr]{kable}}. Can be length > 1 if the more than one table is being printed.} \item{pfootnote}{Logical, denoting whether to put footnotes describing the tests used to generate the p-values. Alternatively, "html" to surround the outputted footnotes with \code{
  • }.} \item{term.name}{A character vector denoting the column name for the "terms" column. It should be the same length as the number of tables or less (it will get recycled if needed). The special value \code{TRUE} will use the y-variable's label for each table.} \item{x}{An object of class \code{"summary.tableby"}.} \item{width}{Passed to \code{\link{smart.split}} for formatting of the "term" column.} \item{min.split}{Passed to \code{\link{smart.split}} for formatting of the "term" column.} \item{list.ok}{If the object has multiple by-variables, is it okay to return a list of data.frames instead of a single data.frame? If \code{FALSE} but there are multiple by-variables, a warning is issued.} } \value{ An object of class \code{summary.tableby} } \description{ The summary method for a \code{\link{tableby}} object, which is a pretty rendering of a \code{\link{tableby}} object into a publication-quality results table in R Markdown, and can render well in text-only. } \examples{ set.seed(100) ## make 3+ categories for response nsubj <- 90 mdat <- data.frame(Response=sample(c(1,2,3),nsubj, replace=TRUE), Sex=sample(c("Male", "Female"), nsubj,replace=TRUE), Age=round(rnorm(nsubj,mean=40, sd=5)), HtIn=round(rnorm(nsubj,mean=65,sd=5))) ## allow default summaries on RHS variables out <- tableby(Response ~ Sex + Age + HtIn, data=mdat) summary(out, text=TRUE) labels(out) labels(out) <- c(Age="Age (years)", HtIn="Height (inches)") summary(out, stats.labels=c(meansd="Mean-SD", q1q3 = "Q1-Q3"), text=TRUE) } \seealso{ \code{\link{tableby.control}}, \code{\link{tableby}} } \author{ Ethan Heinzen, based on code by Gregory Dougherty, Jason Sinnwell, Beth Atkinson, adapted from SAS Macros written by Paul Novotny and Ryan Lennon } arsenal/man/write2.internal.Rd0000644000176200001440000000177213656527336016017 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/write2.internal.R \name{write2.internal} \alias{write2.internal} \alias{verbatim} \alias{code.chunk} \title{Helper functions for \code{write2}} \usage{ verbatim(...) code.chunk(..., chunk.opts = "r") } \arguments{ \item{...}{For \code{verbatim}, objects to print out monospaced (as if in the terminal). For \code{code.chunk}, either expressions or single character strings to paste into the code chunk.} \item{chunk.opts}{A single character string giving the code chunk options. Make sure to specify the engine!} } \description{ Helper functions for \code{\link{write2}}. } \details{ The \code{"verbatim"} class is to tell \code{\link{write2}} to print the object inside a section surrounded by three back ticks. The results will look like it would in the terminal (monospaced). \code{code.chunk()} is to write explicit code chunks in the \code{.Rmd} file; it captures the call and writes it to the file, to execute upon knitting. } arsenal/man/write2.Rd0000644000176200001440000001215313656527336014177 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/write2.R, R/write2.default.R \name{write2} \alias{write2} \alias{write2.arsenal_table} \alias{write2.summary.arsenal_table} \alias{write2.comparedf} \alias{write2.summary.comparedf} \alias{write2.verbatim} \alias{write2.yaml} \alias{write2.code.chunk} \alias{write2.knitr_kable} \alias{write2.xtable} \alias{write2.character} \alias{write2.list} \alias{write2.default} \title{write2} \usage{ write2(object, file, ..., output_format) \method{write2}{arsenal_table}(object, file, ..., output_format = NULL) \method{write2}{summary.arsenal_table}(object, file, ..., output_format = NULL) \method{write2}{comparedf}(object, file, ..., output_format = NULL) \method{write2}{summary.comparedf}(object, file, ..., output_format = NULL) \method{write2}{verbatim}(object, file, ..., output_format = NULL) \method{write2}{yaml}(object, file, ..., output_format = NULL) \method{write2}{code.chunk}(object, file, ..., output_format = NULL) \method{write2}{knitr_kable}(object, file, ..., output_format = NULL) \method{write2}{xtable}(object, file, ..., output_format = NULL) \method{write2}{character}(object, file, ..., output_format = NULL) \method{write2}{list}( object, file, ..., append. = FALSE, render. = TRUE, keep.rmd = !render., output_format = NULL ) \method{write2}{default}( object, file, FUN = NULL, ..., append. = FALSE, render. = TRUE, keep.rmd = !render., output_format = NULL ) } \arguments{ \item{object}{An object.} \item{file}{A single character string denoting the filename for the output document.} \item{...}{Additional arguments to be passed to \code{FUN}, \code{rmarkdown::render}, etc. One popular option is to use \code{quiet = TRUE} to suppress the command line output.} \item{output_format}{One of the following: \enumerate{ \item{An output format object, e.g. \code{rmarkdown::\link[rmarkdown]{html_document}(...)}.} \item{A character string denoting such a format function, e.g. \code{"html_document"}. In this case, the \code{"..."} are NOT passed.} \item{The format function itself, e.g. \code{rmarkdown::html_document}. In this case, the \code{"..."} arguments are passed.} \item{One of \code{"html"}, \code{"pdf"}, and \code{"word"}, shortcuts implemented here. In this case, the \code{"..."} arguments are passed.} \item{\code{NULL}, in which the output is HTML by default.} } See \code{rmarkdown::\link[rmarkdown]{render}} for details.} \item{append.}{Logical, denoting whether (if a temporary \code{.Rmd} file of the same name already exists) to append on. Used mostly for \code{write2.list}.} \item{render.}{Logical, denoting whether to render the temporary \code{.Rmd} file. Used mostly for \code{write2.list}.} \item{keep.rmd}{Logical, denoting whether to keep the intermediate \code{.Rmd} file. Used mostly for \code{write2.list}.} \item{FUN}{The summary-like or print-like function to use to generate the markdown content. Can be passed as a function or a character string. It's expected that \code{FUN(object, ...)} looks "good" when put directly in a \code{.Rmd} file.} } \value{ \code{object} is returned invisibly, and \code{file} is written. } \description{ Functions to output tables to a single document. (Also the S3 backbone behind the \code{write2*} functions.) } \details{ \code{write2} is an S3 method. The default prints the object (using \code{\link{print}}) inside a section surrounded by three back ticks. See \code{\link{verbatim}} for details. There are methods implemented for \code{\link{tableby}}, \code{\link{modelsum}}, and \code{\link{freqlist}}, all of which use the \code{summary} function. There are also methods compatible with \code{\link[knitr]{kable}}, \code{\link[xtable]{xtable}}, and \code{\link[pander]{pander_return}}. Another option is to coerce an object using \code{\link{verbatim}()} to print out the results monospaced (as if they were in the terminal). To output multiple tables into a document, simply make a list of them and call the same function as before. Finally, to output code chunks to be evaluated, use \code{\link{code.chunk}}. For more information, see \code{vignette("write2")}. } \examples{ \dontrun{ data(mockstudy) # tableby example tab1 <- tableby(arm ~ sex + age, data=mockstudy) write2(tab1, tempfile(fileext = ".rtf"), toc = TRUE, # passed to rmarkdown::rtf_document, though in this case it's not practical quiet = TRUE, # passed to rmarkdown::render title = "My cool new title", # passed to summary.tableby output_format = rmarkdown::rtf_document) write2html(list( "# Header 1", # a header code.chunk(a <- 1, b <- 2, a + b), # a code chunk verbatim("hi there") # verbatim output ), tempfile(fileext = ".html"), quite = TRUE) } } \seealso{ \code{\link{write2word}}, \code{\link{write2pdf}}, \code{\link{write2html}}, \code{\link[rmarkdown]{render}}, \code{\link[rmarkdown]{word_document}}, \code{\link[rmarkdown]{html_document}}, \code{\link[rmarkdown]{pdf_document}}, \code{\link[rmarkdown]{rtf_document}}, \code{\link[rmarkdown]{md_document}}, \code{\link[rmarkdown]{odt_document}} } \author{ Ethan Heinzen, adapted from code from Krista Goergen } arsenal/man/arsenal_table.Rd0000644000176200001440000000375213656527335015563 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/arsenal_table.R \name{arsenal_table} \alias{arsenal_table} \alias{has_strata} \alias{[.arsenal_table} \alias{labels.arsenal_table} \alias{labels<-.arsenal_table} \alias{print.arsenal_table} \alias{merge.arsenal_table} \alias{merge.freqlist} \alias{print.summary.arsenal_table} \title{\code{arsenal} tables with common structure} \usage{ has_strata(x) \method{[}{arsenal_table}(x, i, j, ...) \method{labels}{arsenal_table}(object, ...) \method{labels}{arsenal_table}(x) <- value \method{print}{arsenal_table}(x, ...) \method{merge}{arsenal_table}(x, y, all = FALSE, all.x = all, all.y = all, ...) \method{merge}{freqlist}(x, y, all = TRUE, ...) \method{print}{summary.arsenal_table}( x, ..., format = if (!is.null(x$text) && x$text \%in\% c("html", "latex")) x$text else "markdown", escape = x$text \%nin\% c("html", "latex"), width = NULL, min.split = NULL ) } \arguments{ \item{x, y, object}{An object of class \code{"arsenal_table"}} \item{i, j}{A vector to index \code{x} with: either names of variables, a numeric vector, or a logical vector of appropriate length. \code{i} indexes the x-variables, and \code{j} indexes the by-variables.} \item{...}{Other arguments (only used in \code{print.summary.arsenal_table})} \item{value}{A list of new labels.} \item{all, all.x, all.y}{Logicals, denoting which terms to keep if not all are in common.} \item{format}{Passed to \code{\link[knitr]{kable}}: the format for the table. The default here is "markdown". To use the default in \code{kable}, pass \code{NULL}. If \code{x$text} specifies LaTeX or HTML formatting, that format is used in the table.} \item{escape}{Passed to \code{\link[knitr]{kable}}: should special characters be escaped when printed?} \item{width, min.split}{Passed to \code{\link{smart.split}} for formatting of the "term" column.} } \description{ \code{arsenal} tables with common structure } \seealso{ \code{\link{merge}}, \code{\link{labels}} } arsenal/man/figures/0000755000176200001440000000000013656527336014136 5ustar liggesusersarsenal/man/figures/logo.png0000755000176200001440000010600513656527336015611 0ustar liggesusersPNG  IHDR% Q pHYs.#.#x?v IDATx]\/ec#G #\پBbΖ0F7|2ٖ/Fl"'btv6=H;#F::C"'\hH`Jѳ-c?jw'YUڮ?wdmّSoBjW=(EQl\R87y{UkbKף97 8$1M4U+}"_I[¾77^ 7+%C:{YiJ JPڪO:&t ۷L=(gN͞yYVM^cτK#~WY)(4PP7+# o@:% jݛF:2菉G=<J @ȾTon0@s% z#p '' M(Xܛ#jOtNM%dۑH#~tuO̤Ȯ{NM2"d"F"@!o ()bkbKף&'X=vS7p'Ӫ @M %5j]ˆHN}WB(4wH_HĖ.XV{,y(xI*VvmNzÔ.RP4wH%NUԆPP4wH-ĊFTC(79W..F.&?s1{MQz$M7 `ZȞy##KW_+ÉOwOTzt &Dr! @K%|.(Z+.+Fv} *&Q?Z'Ӗ.ex>J(F"ܔuiMmZ~ќɏcjKD_&}"y{ߪlu &69Q( ':'&'X}oFFF/)F,_?Cjr 'P'B @ȮGFo$+{e);&= ,RڪY_,}VWLĖ&R87X`ZݍD9oE$t}犞uĖGMN(XV{즴:o$j0btdod꛼P`ZSɿ7Ncs7o<,%sz#Ҙ߬;od!#~mrb|M~hK언F3k"t#kMzkxͻV{l{\#~W\7Re$(}ۍr#H}|EoUZ)&j–)! dA(j/FMݑE)GW)]7ߞ(7N(jۑVG>o$ +漑L|RLcFNE5x,JZJ]ُ%S%v}hWS87jG(j%F"irpoNMJZ87rO[.@iĔs-]u!knJ9ZKO1:xr&Mbd P5L=vu #Fb5x0 'gw79‰ P Uk_cz#qވZCm!@ P T=1(KGM78<`H%@,Y^^Z,bp4o&/ '}FϾ7["0GbRnF$ɫ5x,@C %@_Ȯ47ke]o|=&@-qI^kǢ~(ՑdG|?GXa(&>*~(#kMzkx. LT1wez#\7Q7&! P,ZD)V-s[!Bꛜc2& z#H"#~FɉqIDDX(Xٕߥ+KQP80BtZ59PdF(%Fve?7z#F5}hWS87L($F"EodEHlՂ'bKW}i 0+U=({XՊDj/&1?ZŖGM%@WMiVe/6or2J/iǮNa$H826əbb&Mɫ5x,@%9i47kU:ߤ9}t,q%b*Dy>ՑdSSF|B[[VMO؝#kۣ0 R.ݟ}odbdRFI] Ī 5x,@ERi{#17ˆC*&tŤlۍ$I{Sw$kC7&\GҤ.}"B V{lGy#~Zy&>)tŴ̝N// @K|ň,o2qM!j(C %`7a䦬eF:#~9;+ )@C %@F~ֲe]ohr &P`OtNMyhٕߥ+KQ@D0BtZ59PP z##q!sQ2|&S^@(̥Zّ3򋊑 ]8[HD2&@ZݍD1+&*`\lzT%@ZHe/Tj/qI}oB  0B d$ݟ}o$j^Z+&fM"ŹPH]ioΝ7b/&Ж.%^#V{,󹯎4%+5rkek٪Pr~.b7Lsg5FϾ7dy1[F KQ;orPOB LD_tj+&j–)! 5"@MȽ;"_hN${$oo5 @ c;H>_s4'?6aT?PCM( ȾTW낢2278&~5}PHئ# ܖ&/po&'ZV ɉ2D@i=97qN H;$°@FcD}u$i-YyT-#~A1b-s)bG<Ȏ;ZWg}bFZ(&QG 7a(FV-)&j)! s ,RoZrįPg1B8t&}P3hvՑ}+ /g؄§Ӕ5x,PKB L#Fve?7z#L2}hWS87/J<7adG%z#/*Fb@C!NM J{ߪluyH7)]9g[7)B C)F6|-72[,<9n&'& 5j]H O6or8}PI[57ت +&bp}obKCeajEo(ӎiZKnjֈVMOU#kۣްjp7PH#~gYY:ZGPO1B8VN7UjX-ݟ}o$jE+VLn7B&Jhٕf,,gT7qtJhV{, r_KVX7Y]9_菉G<w:4FK%|ňA7.or4PBRo$MY?tjŶ-W[g ~V}cVt7y!MȚPBRoZ$z# 5F͛֗c+5+W@k*}qos79Q( ':'&J(!K7+KW#~aXܸbۖvu˕e?ZPq:Lg5S*֭-6\v'3/Rz![LCR2j '&dC(! iVFvdEodEHlՂ0}7A$#V)b"U@[0pr F/NĖ&a}jO(ZEQ9Zqai]PG\ju*w}[>[pUOqQ37)]9W&t=oB9jnY>j}%Wr[?U#%׭-k夯xO(S| VNm\nJ'Oj |;$jzdm٢(WGȥS# nsmcsOǟbD8ʕ_UNLr;LÜti|ط}dm7P%FFFֶϴ6ZMklx`ѾJ+.jʢN[ͮƲRHF,+Z.,FFF*өGֶLz<&(@-cqҹ#يe%j8' !zFV)p=nsFZKVm(WO2c~zP^F%ˋKsa)[gΖu]IPLw=we (×+'Y-]Z(v#PB%Z)}H0'fw?\J̐ wߍmY|l^$p}kAȪ 儮ԕq$ޓjNa*z#iwKV@sXo')÷IA+3-]`00B jHadWW=FFo$jtDBI'8MwEquW\2";,Nuicd|V0^_(3Uٸ.8T$/~u<\ڇKd2 G6-+V0",tVNꋓbU$bĤ`*.,&>|?r7ůV{lOQLxt  NJ_H$z#Uʑ ޹ok߱'܊ܯ7QBۢ'Z[gȹ˳G8x*3Tn}}oB%,J?S|97)P{Qx_<D&o#^ߞ&MX,;1pFst&j=}U!܈ʭq_|JѝykǢ/mZ7o&jA5:͋ %Uz?7>3/*&-&?=Z&jpK/19KKs?}pxءb^ql fʩ2rg[GV/&?s9$k 1>8fG8y %*-Jc~wcň@rC][/XY“VӖ)!̌ܙ1V{:tJz#PX8}!.\i{'iMM1|eG!3#ZߥH,Y^V~ y @r˿nFoK0/=1q/+ωH=|˺Ȟ١:%)_bS׬]J3 [Y~1|PB)7{SW%:#֙nfO?{xogt׶kB}}=uI޶-]C(!ɎFv|5boy%z#0dbtn]*QN$zKkN>p6RoioTĞ荬\7Lt8_u+8㶫XQ?{/o2܄!z#Sol@F}ZLBc崺qXXhx,m {٢!N}MP2D:Vg_Hj[Ӎ޶sP~+eXp[|K5OySҵ[d%C)]Y])H`uJ=m}& [)@ҙ/jB &Of7ٕ_7:FE|#~cjX`V#?@S^>z'Vz' Sgto]S&'4Tڪ?O}z]YS.p4p?>UD 9?vl$C\wp@'}oߖJFDS`7/nVQ|y bn;'wIc#]~Pb4ýI  =m?mZiydR:_JIl͏_EvU992ԯ-o^nqcHmГ.,&?|l!VJvcP'NB = 4@?mZEwmZ0:czcZ|fw"$2(tM`l]eP3Rbf!Z!Ng%=W'y+mx^9J}$TyIQ,_SLc#=XL:P4J]ُ=evmIFFضrDk<7"w.-'am{#}kW{yt8GS&#H$ӕMDPn[Rՙ`EqgX/"[s~)?>UGɒ|&W )ԄPRXZ~ֲz#֬-n;]ҙ2uX݈QqDZ.g}L'jůX[*CJ,n^n1u'wVw?>m%.,QX j=oR#~jݔz#s~#~{s#n;["ttNt>^Ou &Fxlm<7 '&J*j]H#Vur~(N xBz>xGu!{:TCŌ&ӖWpP2`iF[iLJ:&Fc>[z!rdx ߺ*6eK !{S}!oߤ@ 't P2@X缑{#h/0/0Re_?/e<u;)[R-[4'S7 H8< @?Ȓz#J&R1`3ju!|'Ī 5x,fMňV{1z.@[V\VVmHŖNcJ۴" uIlaw+||wܿ=sH>Hئuo%Jz#@tVH:[<92?W2-e{BSPtC@ %||ŮߵcI]&=&X=#,]QV|MohN0Tw}oI[ehi`9)z}["lre-Q/+@%FZK.tŴ;Z&I}ЌRmZbձS2XŸo=w"{&uNWm\ !7oZ/(+gʾIr*7}KkX',R_/ tj@.|\5tUXaA9;+/2]©qIbKWQ,P7+KW#~XX&dAN8lbs!cjr'+>荤y9+G $@$FzCvbV{שoY)U+z#;j`gF,P71hg*^-]MbQmr 7Ycc.u $@͕#WuyUCnJhǢ?0򼑕_:Qՙ#@.bNJˊV|tE/[P'yOk#۷:m{`͌ 迎\_L~z1|=v8~6$-Jc~v#~(_r|&ӪɁN~pa֔V{:\,hT$t5o_zCRF=޷9zi9bˊe*&QϸܩjCQOxp S%2cSɻ7R8t0ّU&yK4Bp 'C7BxBIܛ#YH6-M iK#| m=#F$1/z#KN$ܛIwv|q&o)[}lA7ZcoȮOb/{#/r;&ϷcS8yiV%7+Z+VFFjZ}8$ɯSu!\tr$ej_`57oSQot_][\)IGϾ7cb@~⸆+&?/&TL\_rbm=^µ %7/Xtj]MVmꛌ)-]q7N#kJRoZmZI=mz$oRyߤ;tDr$17z#q"@PI􅗮9ޛIUvV{;DZhiakqr}Z>|=oȮOb/{#/PIҕo7y!hd`=YqmZWdمe> SѢ(Td wխX<_lIlPV\&yfeb}Tߤw֭թ7|gĞ[<{ŲouM67Uk_cz#qވ!N/G^17ٞtH}ov=;:sc͙#,B9B8+2H#w l$-zVZn=.,WN&?=u-&?UzEJ҈iY'_U+H GV/'t#'> |=v8W/)z#ؾU+@*OkTZy=Mڗz PjݛȽ=}JVV7Қ+sM}y݉Go$O=1zƴ74' ;BxNٗNxWFrԈ_ojTdO97}K'gf %i_ؽDǬEi٦@͕} VE8$_Ӗ=EQ<2&]ӑ=qkcrS=hMc>U+  -v\~Q19n1\횴Sk0$vLGV/&?9SL[z霒F[bq+M %qNb -6 W}6ƶ-Wklq[^hٌ&VOoYd$1U(F/y]l/6^ߩϜ-_ֳ}7JϼJqǞD8z(_UtԕIpWJʃc.(֬*7nԡbnIqΩE??{[_ T}\NZ23d_ $@un;rE`" Ez4"a!}tOj~_-CIa";^XHzqcUqt&gQoň@ŽWMMſ|TK\P8o$F}SMVm!{TJb[mILebEdzUN w 굈HUo/pjpoRBIFFF/)G %Mӫb5W %@~I5?V6Bx$EiNܣO' %–_T6%,^-] $ē+Z P^~Q1qI ]^NH qɀyTJ(*%JJU~;]l^ݰ^~j(nZ/+ٰ} P{e@(# <ܿyűoGOosj` %䡁<+} | T߿StX|[x/O$IYbko5"d.{~Ҹ1U-RB P)PTJ(*%JJ %@RB P)PTJ(*%JJ %@RB P)PTJ(*%JJ %@RK]~jz\jt/ %x辛]vHl*%JwNCoK0+9w;Vqk~f%Qܐ~GO7 ϼJqǼs`@E0ҍn?,{RJ+^"388fJ"ɑ{9ȎN P)PTJ(*΂l|mq5Fz|TYuJ6MW[\98׼9"[/DJwIy0Y/DzgZP۶\Yu˶b}}Lx?yz׭m7\۳7_qoC' x_ӳPR'7qUҽ `UHg|?#쉻P TC?{Ӡ$>2{w^_[_k޾x:%ZuC#>YwO {mEkYG(*-~ bmҮ͵Œl Z7?U(z^}I7}o>()/P2`0gK8vr17SV$FbD/DQl~s}*A?(b4a}zUZ Fz5o^ 潽>STJ(*%JJ %@RB P)ovuK1O8ǿ|ὝJm7\;{;MaLg]Jc}kű0L|@mSoBI=+}RS7?x'9<s1bq2ןP@{<4O~s}~ǧ9}PyN7=lyoχ{Ωb.e{+O^Q9z/brǞms-[UG}*e{y=ANJ5F򜼷E(i'{}20?(nC~qKܳxo-RB P)PTJ(*%JJ %@RB P)PTJ(*mk.:;?\Y5Zl޴G{79e ȑP57ŶkG~S9z_C\u|@򣇟+ym[,niunx:=?#'3oǎ՗.Ο7]QY_$ @ ӻnC {:T}POv|ݺ"}7#VS5~v~|kCV㺽nr㶫9srN D'Uy˛^F||\u_:q-DKwݲu|PJPfs"@MħUo'űŢtEՍۯs\WI7%ƴsm5WVovJ(ذ7MTlF>PX:vșPTJj.FjzROlrku;'&Rկ[?!t/N_6}7zݪ֏?'~`%Pso}7f[׾Gx f0ǟ^7,۷J %@R:%5@&h07BIT9%=07oJ %^D{pJX"ǎU|GO2íX]seQd87|غ诳a]}o<ވNU~mtZԡb2ԒP2o eQ|zڤ^Rn֕H| % PG?՞Ń\H! $e g]xqs׉(:v>Aݼ/y\tE7'{r'@U?;x7=<6*|DϿ3/9?T~ *L'B# U(c?:yMR,Wq7Sѩb ?ykϥ 1|0ncȠlI?on|狐0ՋgGOg%~NX__|jN ̢AofQNlx}>e챊ǧM#>Mb[LT=n8)?ō]lp7U~k!ʘfD(Y4yP@A^ M1HDx[s{$"nIgۿ=qU %9ud.#scj{~R)ZMap2U(P  1"7n:Rޜ?ueA|tI<3<xqb{ %O#eQ܄+ߖ+EֵgOݓS;6q7qϋkѧ˖$L"\q`YbButu,Nw\_B @ ŧ1]kmDq+#qՑ@r 78pq.T,PP3۶\YV1KϋAn.[:}xJjdh}׬7э1VG'\suz& @%5dBZ&kN `2n`K qZv.x#zO(\GخOa;`{B @"e/"PP#IJ*0 &P0`ID0yz B E3Mr %7)㳝>3^\YON%IzDz+g &[mXHl۪N09=y&~E(yLI^L?%}tIly 2[DhkR̉P'Iy|b͛w+L+\]i"%}4i+w@zC]Dl{p` f<>ai8 ?~b{Erk.|D/h9۰a'PH_0ˆH'ݱ/V͞~Jz$nfB @I3Hۛo/JzmW1k=[xFB "MnEHLb1.z0JiO z$&C3_b %0۶$z$)Vflgml74L6.% 7>ma{ˡB čdPvcZ6ӊC' zB t#7Gr=5n.nZ jB < dHNǴ-3vYjB <čc@vlbZpb\ʹyGh18!&r #`4ypM&?>z!t;09U*H(jf6B ,1X-P0 7 a# VIP0$T%+6 ͛JBv;0$.wNlGOLLP0%@̋lާuR@(L%Ov8rWKb+!@ %_fըUn4P3 $Klu*EH-M%|AU?{ofx:Lh.<3}"_) 33=[Jm$}I@04P ܭ>ӈV觙ۿ"CoCfa[=ONxD(^0 Tt}7n~0@`xn8weqJ&J6ӍL<ڴi@ jݶ;ܩVlB([`l B 0fe[Je)$4P %[Iӱ hJnc B 0nrڧmue dB 0tSty:nQW3m[^7 kB 0tuY%)f"umk۬yJsoNg*C9zL %z+% #Fmr&C%nY5:SShnDșP 3|In98 J(ʶ.[^>z8jom3 q3mJ$d"2t 3vDc%Й9Umz%@uJOuۖW@({%:%@f +YR?h$g m2 tJ7b c>l۾@(i7JBSTw%Pyi &&as]uyjM(i߄U7B 4@ %/=16.niO|iP7B 0T|G(J5'@C;iM@ %@YSaiJƙSn$Jr$@%@RB P)oߚl6j(|_/Fof?AEԡb\yt8m[)*%JJ %@RK]n~ :{.NM % r~ Ȑ[@RB P)PTJ(*%JJ %@RB P)PTJ(*%JJ %@RB P)PTJ(*%JJ %@RK]~`võņukg}{m-RB P)PTJ(*%JJ %@RB P)PTJ(*%JJ %@RB P)RfhyMW]s崏όOo*?3>>o䫽yņ/.6bʩq8v| -6غ}qYKޑ_^5k*N^;mۖ+aکqş{Ϝ=w_>z?ߣ~v~og+x{(A'@Mۯ*miosSE)8x䵁vn^fr q3]ܸs!m:kV}9$,T|~q};7s!n~o:?\5+99|o*1-,?s q ז_w6]xo{r!mregk}y}H=( V~Y:?`M BO}P?yNܠfZq79\EvP.c.~+1kp-> =;~y={Y}P9\< D~3i.k1s}?ONj3ߏxӏ,FltPMX?a|q}i 3^ oZkRF1~z׭{~R-=s[zY޹嵨z ^x,YYBqg?w(* NzIU[5z8 MI M4.$ʨIɛhBT l6` `JIрMߤ1eV%fṉ,3sΙp6sG7< >݁iR'vUUzP5֏:& Vr4Yim&u|UMFm2Qe{dų=] 1(Jh¡>}j,>M޿/Y5{FW+c|b}}msқVI_35U>uu_OqnTP_[ݷ 2kہ@ޡC~imZ xv't>x(sT(EG&~ oIg鍽W/5A)T{aϽw-ou«^?zxN>0wE P!M|~!^AAބ7yդC[&kՇ ㌖ʸO{)ŷ/Zmv|EI?䊾޽&Ӛ_fnE 2AcMw}O(ѪmI_|'?)uvOj|:])ŋǞ@N3?*NϟovɜaQvN[e?=QVU v+U1MtwPu6`]a?,}UZD!(}608"E ^zr}D[ezd\m ɃzEF"ȶbJcQVİNekǎLt,CQTmMzx`nH4alQgk=eVwL@Q8tm Ea@0 +.֨O# ~(*' ~VaTOGAI]V}jh ҸA*fyPO\%XPkaFQTh{'ePtXpM3ѽXfJ\po3_Ru("1hlSbng@K`xojiZys)PE A ?,X؃:+/)TqnI~=QAIHFQx @II]>IS^w?}["ݏ$_vcE).4EV>8@QD,m񹵦Wu@^v*~ͻNT{La4Vq_g^Qw}y0_/2%@U.I-s/|St_WwhÇlK+Yj?(J֘աSJ+]Hߚ+BU6\Gt{t= @(JgUoQ-@*~w/k¤hǥל%(z5y-kGI'H SDLgMQGb59ꗮmnM?zU).`u*~hWJ<5|*DO#ނx#;wp8aa1LVQhI֮ n^*3 eW^e":D6,jPey'kME=xПޟ')7+V bFut܈ V=/BmQc3|JOͫG͗0&(׸ AK`&*F+ϙ~ ٚ7H:Cr-zZCFM@EqoJ PdSU.Sp Bc9'>Ar/}~фcNS:cOU^mcXD~(*/~幑Nj5i^^]q؍nMOsO@V^sCR_\Ϭt_w(aݯ޼knb(JK_[q']_0;WTؕWSܺ]?e,C\Myi0ՄOYmI;IH E Pe>=pvEzYjG\AgYPlA"DM2 E 0b@ p~c ģ-_It_Kji$O5&VUݯ2[je q[ȊTK[Җ- y4OW1UPyfLvM »[}_DwTpѓO}Ļ)~`ѫaZVb9{];?Uh`Qf5Sv0^s#{(ɶMvN^׏1ۖJ]cUt_GSVv8U=_gt<8_[=X?Hk+7rwuU8, IDATdJ+hC a4#=OpK oImI0)n{.x|X $UU[h/|7 4͍|gW/6|^%=6oԥԚlwZABf/sj?U6.EW^^62JUQVI<:Eʎz2K"&V]*  qv[ 2QZ!a+)0:7ݟ7η=pn>7#VH]~U<=֪LլAޟ-j?I _-҃+DblST /3Ԝ9>>4[.Jp~幁&1o(bwH}ĪZKOͿ\Ʀ?¬scbSA ad~n] mj7_`/+VS_xDgΐ91.5lRA_0޿&ye& ݓe ߟA [2(rضoTt&N]M࣢W什?֬UQbdXO8}C9Sg2^A&;'t.5ɹO}j_(~fZ=VIOw,;I ϲ^t&w?}3vGLrQcDH ۧ$[gQ|!=ջCMH>[fux#w?¯U|_eҩI Ou]OaVV>w۽~~My{Ts@u}uCP&QATd ҢF.)0]z[o57K _F6oİcH#߲ Iڞwɹ{:mLhr|D E(~'PQ{z6ʍtoL}VLhwmr֮Ccoc=t7"3*8 歔@g7vG}|Oc+ iRP\E=u0 h^_|~wfUxX[:m|a:nҾ Oܖ]Hz[!|VmB_D=P"'Xik^U{!ڱփ쇅}B Te缑(w[P<[Y\Ӿ/cΛ PA_ULx5U"'%막V .N4Yg?UL}xM˼U:kOYC)r` ?7YtBqe~zo$mON]O=:[&M56@ ZA:Xmi!]nsGN=/@C H^׶/F֤\&Mʞ^a ꆦɣV7lG~');3LzqT /ƷlCQ@]L͍nqxuGzǽ]ߍ : $VnhA>Q{]~'!iɂ[kEIOtɩaZ$M2iIh&F?~k; Њ7Qqrl?Jʛ$"r#wC%7[Y0:6&rDz3a9=_lݎZcƴ^ vyf;p+7B̰VG7ҙ˛\6<99M4CG~Ob7H<$a˛\-'M QW[bda/T鞤$mtMi!L=ʍ $vF.$^;DA_{kN a7h4d:o$nuFU[z;Iۧ\qvD-u yQ7-uՊ$727{J+NΛ4Ï sFr#7bdA %pYOZ'+Ƙ;xq1E-n?UB Byfh{ īuƫk!<MDaxD!i~ndF<|Bx&t<`6^!7슑Zr#{zQ]$&M2-Bcȍt\1Rknd/gJ.N\$UMg^$o/F6_7ƫ1c!7`f.oU)?MBX+:0S=Ȓ {Kp].M&Y9@QG- ܚ+Fˍ%}DI>aMax&5PnDlwy#<Ki^gJ^uykrߑ#۲yU &U3wwj:S/˛\&O֡  N67F/Ze Q=O˛jE 9^DcZ=%i{ڵ?odl܃EjGnDȼc`[.NW1]wDXɶMu.77Ց9wURFh] h& 4T޽7Ȃend/~̞']&B4Zfq,b$^[4cq˶Mq=H#ȍ1%7 K&ݎ]&M@Luݹc̙&F¬.oU)V&:^Nr#K.޸^(Jv&o!<6nZO7ARn$ikilnd/%H9ש+#E& ʍlް+$Snb8ˍ;2%W59`mټIˮL7~FUfOe+`,^cfZBOIaxZ+oras#:o$몵X@Q҇$mO$ɁI{"yP'E?+F<KP(Jm <|QrBT*lGXr#j,v<Op(Jcnd:7 7LֽC_Fh;!q-gț~v|<-~dȒ=튓&*N= Zn\!7qy ZY:^1脟Spxmoݲ@0r#lVG1Og+W=KTiPpy߈&5~܃Pݛ;-~ύ,r##BQR79VN~#M28-_5j~Ti`ͣub%UH (Jj&o!<6nZ*Nț67y#k.ľXI>7 ;or=| RnD?F.f+(JSA+M&0CHZ_5WEI}Mn!| ëHPnDl*v\G CD$B։1fs:wm۱skl eɻ&s~NVG,/r.or9ʛ>i`1ؗ\WeƂ(i7| qRqB1]sȂcQ4Lg1gϛ:j;u7QnD_F.f+(i$ms]~5~|p+`@C+#Yf -~HǃB% S8 ;o:߄ aMK]"ɍbo.8rIyuQ; f:g$W8r#*F= jDK`OZ'+Ƙ;xq"ȫm|S -3ݽn@ |uD۳^p ^`<6k!<MDax"wo ?7r#k(C3QM@yjmCnd#F6%S]$&M2-Pߍ1F:!7])։˛jIcM3wMvZ #1gȍ`?P˛hd*諦Z|<3ύ,;BQ\r-Mɝj܈:jo\1BnPoI>aMax&)7Fo~; "SW]D皜Jf[6oҲ'MˍV# ? C&o!t:`17~}uZ`,E *IS. v=x `XlndF -~\1X Dg1gϛ:j;u7-lG-N,7#cnd:諬-]wZVFZλ ;-~1%$mrIyҥn]EݴU+!vE *iB8qF7܈yƂ%0*.NW1}xʵOm[O(&WWGW= jZD7QHyNs# TJ.or*OByi! Y+#ˮ!7P .or>&M2- fٶ +ˍ\~EȔ .orMҥIc yGn;׌ɶB[*Dΰ:_0cw\r-79|>F\BQo[&Fr~VF%x/I\&ɛ@ټaWHleq>#SW]D皜˶lޤeWO&țzuպGVX=1+BP}u ;o7܈ kXB(J$mO$ɁI{"y"Zb#Oa# (AKҶȞ >or]9!o3Q+܈f+FQ%i[5qr|ɺ7ch;@/FQh3QM?juK-~m=Ȳ۪E_DI+Nϛ8`48ݸCn㊑y -|}uu"bIW\+d&ߺeTB ar#lVG1Og+W= 0Tr$m?&SAOMԥk0ݛ;-~ύ,r#E Mλp)o28-`ʍueR 4E MBI&0O7};q.x0dJ(.orLNޕ7vSy͖g&ﮛ5c/ 3iɠ\r-79|Mub%U&o!<6nZ*NțhټC5b_`,@m(J'I&aM/7%FTlSnb8XQI1hK­ălnɺ7cت5VG:E I>劓&j!MțiV.bdك^()W790iZ*Nț7܈Z{0K%@I֞߳Mkyf;jŒQ_j{( r-gɛL<`lndF U@%@ILyÏn]P76^!7?J(i7 {KׁIL>F aPߍ1F:!7a'__]nH/cAd&vV o@|dư:2gy:[YX 1Mj2TD]t  Ggȍ(MT *o28- r#j~a\1Bn`}΅ΛtM}kN VA+#Yf -~HǃBQ(Iۧ\qvD-u yQMK]"ɍbCQx IS8 ;or`Ҵ?js'sFr#7bdރFK`OZ'+Ƙ;xq"%6- Z3mnbXs ^`,@1cN a4F;2r#@Q\|-Ld1Z#^ʍ輑ۡŎ _`,jF&\$.]ʛt;v+]5!oXɻ&sk IDATl+7B #z6.or9ʛ>i`Bub!ϐ ry7-'MFr~GVF<E }%i;$#E&r#7 I,r W59˶lޤeWO&ț?܈j Oe+`,<72R}u ;o:߄ VMj!<~ܮo0Yf -~-~ и3QM?juä6~ndFh `((J ]]q~DظAwz +Fȍ*Z|}uu"bjm|bcFYʍlnWcX3<,^`,"7,J.o2VD]Ə{0 ޴ FYp#F E JSA_qM&0vHZ_$!ϐPry>0ih!_QnDlt\G @C)P)7d0Ɯ+o'; &Mg&ﮛ5c/ 3ߢj&o!F4>FV-r#jAQv.or9c㦥ℼIlndF -~\}h0H9wIyGM"܈7Boʍ\Vg= PK tti!J<{3ZnuX($mrIy&MnZK!vޡ($mO$ɁI:͝ :g$W8r#*F= <-x-__IDzu/N- Z{,Lv=1mzuպx`W|+k!<Mdݛ6AndFh %ZD7іG<LoZ eWJ˛BI&pwz +FȍW['K.o&ݎ2d^$o2zʍlnWcX3rPnD7pk!7 %s. v'o2Ft`߂q\b40DdJD)__&:t1۲y=AjM7)#L]7ٝk:NVGĊo7syVI:tԃFtH!%UkكPh 7QqD56nZ*Nț͍lވ+F< E I1lyCGm.&ůB[ox0(7r1[Y =@i%)I\߿ti!J<P2uowi q(J4ZO$켉Z~vj uRWHr#s4E 'Ӯpydl܃ь7܈y%0^nHcʼn Km[O(QVnduoȜ;cڱRp-gɛ( {soU܇vZw+'Rd [oȲ+FȍCP>\|-Ld0Zg&۸Cnr#2%7&vRޤ۱[쪉y<3ywdwŰ:rs쏕(Mj2uSd⑝M<ub%b'7Q@\r-MZ&ʍ:^1 ɚ+Fȍ@I%0$msΛ 1 if the more than one table is being printed.} \item{x}{An object of class \code{summary.freqlist}.} \item{list.ok}{If the object has multiple by-variables, is it okay to return a list of data.frames instead of a single data.frame? If \code{FALSE} but there are multiple by-variables, a warning is issued.} } \value{ An object of class \code{"summary.freqlist"} (invisibly for the print method). } \description{ Summarize the \code{freqlist} object. } \examples{ # load mockstudy data data(mockstudy) tab.ex <- table(mockstudy[c("arm", "sex", "mdquality.s")], useNA = "ifany") noby <- freqlist(tab.ex, na.options = "include") summary(noby) withby <- freqlist(tab.ex, strata = c("arm","sex"), na.options = "showexclude") summary(withby) summary(withby, dupLabels = TRUE) } \seealso{ \code{\link{freqlist}}, \code{\link[base]{table}}, \code{\link[stats]{xtabs}}, \code{\link[knitr]{kable}}, \code{\link{freq.control}}, \code{\link{freqlist.internal}} } \author{ Tina Gunderson, with major revisions by Ethan Heinzen } arsenal/man/selectall.Rd0000644000176200001440000000201613713262505014714 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/selectall.R \name{selectall} \alias{selectall} \alias{as.selectall} \alias{as.matrix.selectall} \alias{[.selectall} \alias{is.na.selectall} \alias{is.selectall} \title{Make a column for "select all" input} \usage{ selectall(...) as.selectall(x) \method{as.matrix}{selectall}(x, ...) \method{[}{selectall}(x, i, j, drop = FALSE) \method{is.na}{selectall}(x) is.selectall(x) } \arguments{ \item{...}{Named arguments of the same length. These should be logical, numeric (0/1) or a factor with two levels.} \item{x}{An object of class "selectall"} \item{i, j, drop}{Arguments to `[.matrix`} } \description{ Make a column for "select all" input } \examples{ d <- data.frame(grp = rep(c("A", "B"), each = 5)) d$s <- selectall( `Option 1` = c(rep(1, 4), rep(0, 6)), `Option 2` = c(0, 1, 0, 0, 0, 1, 1, 1, 0, 0), `Option 3` = 1, `Option 4` = 0 ) summary(tableby(grp ~ s, data = d), text = TRUE) } \seealso{ \code{\link{tableby}}, \code{\link{paired}} } arsenal/man/diffs.Rd0000644000176200001440000000311113715042107014030 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/comparedf.internal.R \name{diffs} \alias{diffs} \alias{n.diff.obs} \alias{n.diff.obs.comparedf} \alias{n.diff.obs.summary.comparedf} \alias{n.diffs} \alias{n.diffs.comparedf} \alias{n.diffs.summary.comparedf} \alias{diffs.comparedf} \alias{diffs.summary.comparedf} \title{Extract differences} \usage{ n.diff.obs(object, ...) \method{n.diff.obs}{comparedf}(object, ...) \method{n.diff.obs}{summary.comparedf}(object, ...) n.diffs(object, ...) \method{n.diffs}{comparedf}(object, ...) \method{n.diffs}{summary.comparedf}(object, ...) diffs(object, ...) \method{diffs}{comparedf}( object, what = c("differences", "observations"), vars = NULL, ..., by.var = FALSE ) \method{diffs}{summary.comparedf}( object, what = c("differences", "observations"), vars = NULL, ..., by.var = FALSE ) } \arguments{ \item{object}{An object of class \code{comparedf} or \code{summary.comparedf}.} \item{...}{Other arguments (not in use at this time).} \item{what}{Should differences or the not-shared observations be returned?} \item{vars}{A character vector of variable names to subset the results to.} \item{by.var}{Logical: should the number of differences by variable be reported, or should all differences be reported (the default).} } \description{ Extract differences (\code{diffs()}), number of differences (\code{n.diffs()}), or number of not-shared observations (\code{n.diff.obs()}) from a \code{comparedf} object. } \seealso{ \code{\link{comparedf}} \code{\link{summary.comparedf}} } \author{ Ethan Heinzen } arsenal/man/formulize.Rd0000644000176200001440000000440113674210554014763 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/formulize.R \name{formulize} \alias{formulize} \title{formulize} \usage{ formulize( y = "", x = "", ..., data = NULL, collapse = "+", collapse.y = collapse, escape = FALSE ) } \arguments{ \item{y, x, ...}{Character vectors, names, or calls to be collapsed (by \code{"+"}) and put left-to-right in the formula. If \code{data} is supplied, these can also be numeric, denoting which column name to use. See examples.} \item{data}{An R object with non-null column names.} \item{collapse}{How should terms be collapsed? Default is addition.} \item{collapse.y}{How should the y-terms be collapsed? Default is addition. Also accepts the special string "list", which combines them into a multiple-left-hand-side formula, for use in other functions.} \item{escape}{A logical indicating whether character vectors should be coerced to names (that is, whether names with spaces should be surrounded with backticks or not)} } \description{ A shortcut to generate one-, two-, or many-sided formulas from vectors of variable names. } \examples{ ## two-sided formula f1 <- formulize("y", c("x1", "x2", "x3")) ## one-sided formula f2 <- formulize(x = c("x1", "x2", "x3")) ## multi-sided formula f3 <- formulize("y", c("x1", "x2", "x3"), c("z1", "z2"), "w1") ## can use numerics for column names data(mockstudy) f4 <- formulize(y = 1, x = 2:4, data = mockstudy) ## mix and match f5 <- formulize(1, c("x1", "x2", "x3"), data = mockstudy) ## get an interaction f6 <- formulize("y", c("x1*x2", "x3")) ## get only interactions f7 <- formulize("y", c("x1", "x2", "x3"), collapse = "*") ## no intercept f8 <- formulize("y", "x1 - 1") f9 <- formulize("y", c("x1", "x2", "-1")) ## LHS as a list to use in arsenal functions f10 <- formulize(c("y1", "y2", "y3"), c("x", "z"), collapse.y = "list") ## use in an lm f11 <- formulize(2, 3:4, data = mockstudy) summary(lm(f11, data = mockstudy)) ## using non-syntactic names or calls (like reformulate example) f12 <- formulize(as.name("+-"), c("`P/E`", "`\% Growth`")) f12 <- formulize("+-", c("P/E", "\% Growth"), escape = TRUE) f <- Surv(ft, case) ~ a + b f13 <- formulize(f[[2]], f[[3]]) } \seealso{ \code{\link[stats:delete.response]{reformulate}} } \author{ Ethan Heinzen } arsenal/man/as.data.frame.modelsum.Rd0000644000176200001440000000214313656527335017210 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/as.data.frame.modelsum.R \name{as.data.frame.modelsum} \alias{as.data.frame.modelsum} \title{as.data.frame.modelsum} \usage{ \method{as.data.frame}{modelsum}(x, ..., labelTranslations = NULL, list.ok = FALSE) } \arguments{ \item{x}{A \code{\link{modelsum}} object.} \item{...}{Arguments to pass to \code{\link{modelsum.control}}.} \item{labelTranslations}{A named list (or vector) where the name is the label in the output to be replaced in the pretty rendering by the character string value for the named element of the list, e.g., \code{list(age = "Age(Years)", meansd = "Mean(SD)")}.} \item{list.ok}{If the object has multiple by-variables, is it okay to return a list of data.frames instead of a single data.frame? If \code{FALSE} but there are multiple by-variables, a warning is issued.} } \value{ A \code{data.frame}. } \description{ Coerce a \code{\link{modelsum}} object to a \code{data.frame}. } \seealso{ \code{\link{modelsum}}, \code{\link{summary.modelsum}} } \author{ Ethan Heinzen, based on code originally by Greg Dougherty } arsenal/man/freq.control.Rd0000644000176200001440000000247713656527336015407 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/freq.control.R \name{freq.control} \alias{freq.control} \title{Control settings for \code{freqlist} function} \usage{ freq.control( sparse = FALSE, single = FALSE, dupLabels = FALSE, digits.count = 0L, digits.pct = 2L, ..., digits = NULL ) } \arguments{ \item{sparse}{a logical value indicating whether to keep rows with counts of zero. The default is \code{FALSE} (drop zero-count rows).} \item{single}{logical, indicating whether to collapse results created using a strata variable into a single table for printing} \item{dupLabels}{logical: should labels which are the same as the row above be printed? The default (\code{FALSE}) more closely approximates \code{PROC FREQ} output from SAS, where a label carried down from the row above is left blank.} \item{digits.count}{Number of decimal places for count values.} \item{digits.pct}{Number of decimal places for percents.} \item{...}{additional arguments.} \item{digits}{A deprecated argument} } \value{ A list with settings to be used within the \code{freqlist} function. } \description{ Control test and summary settings for the \code{\link{freqlist}} function. } \seealso{ \code{\link{freqlist}}, \code{\link{summary.freqlist}}, \code{\link{freqlist.internal}} } \author{ Ethan Heinzen } arsenal/man/labels.Rd0000644000176200001440000000273113632700353014210 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/labels.R \name{labels} \alias{labels} \alias{labels.data.frame} \alias{labels.keep_labels} \alias{labels<-} \alias{labels<-.keep_labels} \alias{labels<-.default} \alias{labels<-.data.frame} \alias{set_labels} \alias{set_attr} \title{Labels} \usage{ \method{labels}{data.frame}(object, ...) \method{labels}{keep_labels}(object, ...) labels(x) <- value \method{labels}{keep_labels}(x) <- value \method{labels}{default}(x) <- value \method{labels}{data.frame}(x) <- value set_labels(x, value) set_attr(x, which, value) } \arguments{ \item{...}{Other arguments (not in use at this time).} \item{x, object}{An R object.} \item{value}{A vector or list containing labels to assign. Labels are assigned based on names, if available; otherwise, they're assigned in order. Can pass \code{NULL} to remove all labels.} \item{which}{See \code{\link{attr<-}}} } \value{ The labels of \code{object}, or \code{object} with new labels. } \description{ Assign and extract the \code{'label'} attribute on an R object. \code{set_labels} is the same as \code{labels(x) <- value} but returns \code{x} for use in a pipe chain. \code{set_attr} is the same as \code{attr(x, which) <- value} but returns \code{x} for use in a pipe chain. } \details{ The \code{\link{data.frame}} methods put labels on and extract labels from the \emph{columns} of \code{object}. } \seealso{ \code{\link{keep.labels}} } \author{ Ethan Heinzen } arsenal/man/yaml.Rd0000644000176200001440000000200113632700353013676 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/yaml.R \name{yaml} \alias{yaml} \alias{print.yaml} \alias{c.yaml} \alias{is.yaml} \title{Include a YAML header in \code{write2}} \usage{ yaml(...) \method{print}{yaml}(x, ...) \method{c}{yaml}(..., recursive = FALSE) is.yaml(x) } \arguments{ \item{...}{For \code{yaml()}, arguments to be bundled into a list and passed to \code{\link[yaml]{as.yaml}}. For \code{print.yaml()}, extra arguments. For \code{c.yaml()}, "yaml" objects to be concatenated.} \item{x}{An object of class \code{"yaml"}.} \item{recursive}{Not in use at this time.} } \value{ A text string of class \code{"yaml"}. } \description{ Include a YAML header in \code{write2} } \examples{ x <- yaml(title = "My cool title", author = "Ethan P Heinzen") x y <- yaml("header-includes" = list("\\\\usepackage[labelformat=empty]{caption}")) y c(x, y) } \seealso{ \code{\link[yaml]{as.yaml}}, \code{\link{write2}} } \author{ Ethan Heinzen, adapted from an idea by Brendan Broderick } arsenal/man/grapes-nin-grapes.Rd0000644000176200001440000000105213675162464016277 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/not.in.R \name{\%nin\%} \alias{\%nin\%} \alias{nin} \title{Not in} \usage{ x \%nin\% table } \arguments{ \item{x}{vector or \code{NULL}: the values to be matched.} \item{table}{vector or \code{NULL}: the values to be matched against.} } \value{ The negation of \code{\link[base:match]{\%in\%}}. } \description{ The not-in operator for R. } \examples{ 1 \%nin\% 2:10 c("a", "b") \%nin\% c("a", "c", "d") } \seealso{ \code{\link[base:match]{\%in\%}} } \author{ Raymond Moore } arsenal/man/keep.labels.Rd0000644000176200001440000000236613632700353015137 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/keep.labels.R \name{keep.labels} \alias{keep.labels} \alias{keep.labels.data.frame} \alias{keep.labels.default} \alias{[.keep_labels} \alias{[<-.keep_labels} \alias{loosen.labels} \alias{loosen.labels.data.frame} \alias{loosen.labels.default} \title{Keep Labels} \usage{ keep.labels(x, ...) \method{keep.labels}{data.frame}(x, ...) \method{keep.labels}{default}(x, ...) \method{[}{keep_labels}(x, ...) \method{[}{keep_labels}(x, i) <- value loosen.labels(x, ...) \method{loosen.labels}{data.frame}(x, ...) \method{loosen.labels}{default}(x, ...) } \arguments{ \item{x}{An R object} \item{...}{Other arguments (not in use at this time).} \item{i, value}{See \code{\link{[<-}}.} } \value{ A copy of \code{x} with a "keep_labels" class appended on or removed. Note that for the \code{data.frame} method, only classes on the columns are changed; the \code{data.frame} won't have an extra class appended. This is different from previous versions of \code{arsenal}. } \description{ Keep the \code{'label'} attribute on an R object when subsetting. \code{loosen.labels} allows the \code{'label'} attribute to be lost again. } \seealso{ \code{\link{labels}} } \author{ Ethan Heinzen } arsenal/man/modelsum.family.Rd0000644000176200001440000000141013714570766016062 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/modelsum.families.R \name{modelsum.family} \alias{modelsum.family} \alias{survival} \alias{ordinal} \alias{negbin} \alias{clog} \alias{relrisk} \title{Family functions for modelsum} \usage{ survival() ordinal(method = c("logistic", "probit", "loglog", "cloglog", "cauchit")) negbin(link = c("log", "identity", "sqrt")) clog() relrisk(link = "log") } \arguments{ \item{method}{See \code{MASS::\link[MASS]{polr}}.} \item{link}{See \code{MASS::\link[MASS]{glm.nb}}.} } \value{ A list, in particular with element \code{family}. } \description{ A set of family functions for \code{\link{modelsum}}. } \seealso{ \code{\link[stats]{family}}, \code{\link[survival]{coxph}}, \code{\link[MASS]{polr}} } arsenal/man/comparedf.tolerances.Rd0000644000176200001440000000331213632700353017040 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/comparedf.tolerances.R \name{comparedf.tolerances} \alias{comparedf.tolerances} \alias{tol.NA} \alias{tol.num.absolute} \alias{tol.num.percent} \alias{tol.num.pct} \alias{tol.factor.none} \alias{tol.factor.levels} \alias{tol.factor.labels} \alias{tol.char.both} \alias{tol.char.case} \alias{tol.char.trim} \alias{tol.char.none} \alias{tol.date.absolute} \alias{tol.logical.none} \alias{tol.other.none} \title{\code{comparedf} tolerances} \usage{ tol.NA(x, y, idx) tol.num.absolute(x, y, tol) tol.num.percent(x, y, tol) tol.num.pct(x, y, tol) tol.factor.none(x, y) tol.factor.levels(x, y) tol.factor.labels(x, y) tol.char.both(x, y) tol.char.case(x, y) tol.char.trim(x, y) tol.char.none(x, y) tol.date.absolute(x, y, tol) tol.logical.none(x, y) tol.other.none(x, y) } \arguments{ \item{x, y}{vectors of the appropriate lengths and types.} \item{idx}{A logical vector of appropriate length.} \item{tol}{A numeric tolerance} } \value{ A logical vector of length equal to that of \code{x} and \code{y}, where \code{TRUE} denotes a difference between \code{x} and \code{y}, and \code{FALSE} denotes no difference between \code{x} and \code{y}. } \description{ Internal functions defining tolerances for the \code{\link{comparedf.control}} function. To create your own tolerance definitions, see the vignette. } \details{ \code{tol.NA} takes as differences between two vectors any elements which are NA in one but not the other, or which are non-NA in both and \code{TRUE} in \code{idx}. It is useful for handling NAs in custom tolerance functions. } \seealso{ \code{\link{comparedf.control}}, \code{\link{comparedf}} } \author{ Ethan Heinzen } arsenal/man/internal.functions.Rd0000644000176200001440000000176013712342521016570 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/internal.functions.R \name{internal.functions} \alias{internal.functions} \alias{smart.split} \alias{replace2} \title{Internal Functions} \usage{ smart.split(string, width = Inf, min.split = -Inf) replace2(x, list, values) } \arguments{ \item{string}{A character vector} \item{width}{Either \code{Inf} or \code{NULL} to specify no splitting, or a positive integer giving the largest allowed string length.} \item{min.split}{Either \code{-Inf} or \code{NULL} to specify no lower bound on the string length, or a positive integer giving the minimum string length.} \item{x}{vector} \item{list}{an index vector} \item{values}{replacement values} } \value{ For \code{smart.split}, a list of the same length as \code{string}, with each element being the "intelligently" split string. For \code{replace2}, a vector with the proper values replaced. } \description{ Internal Functions } \seealso{ \code{\link[base]{replace}} } arsenal/DESCRIPTION0000644000176200001440000000501214056520012013376 0ustar liggesusersPackage: arsenal Title: An Arsenal of 'R' Functions for Large-Scale Statistical Summaries Version: 3.6.3 Date: 2021-06-04 Authors@R: c( person("Ethan", "Heinzen", email = "heinzen.ethan@mayo.edu", role = c("aut", "cre")), person("Jason", "Sinnwell", role="aut"), person("Elizabeth", "Atkinson", role="aut"), person("Tina", "Gunderson", role="aut"), person("Gregory", "Dougherty", role="aut"), person("Patrick", "Votruba", role="ctb"), person("Ryan", "Lennon", role="ctb"), person("Andrew", "Hanson", role="ctb"), person("Krista", "Goergen", role="ctb"), person("Emily", "Lundt", role="ctb"), person("Brendan", "Broderick", role="ctb"), person("Maddie", "McCullough", role="art") ) Description: An Arsenal of 'R' functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in 'R' and 'RStudio' and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby(), a Table-1-like summary of multiple variable types 'by' the levels of one or more categorical variables; paired(), a Table-1-like summary of multiple variable types paired across two time points; modelsum(), which performs simple model fits on one or more endpoints for many variables (univariate or adjusted for covariates); freqlist(), a powerful frequency table across many categorical variables; comparedf(), a function for comparing data.frames; and write2(), a function to output tables to a document. Suggests: broom (>= 0.7.1), magrittr, rmarkdown, testthat, xtable, pander, survival (>= 2.43-1), coin, pROC, MASS, splines, rpart, yaml, geepack Depends: R (>= 3.4.0), stats (>= 3.4.0) Imports: knitr (>= 1.29), utils (>= 3.4.0) URL: https://github.com/mayoverse/arsenal, https://cran.r-project.org/package=arsenal, https://mayoverse.github.io/arsenal/ BugReports: https://github.com/mayoverse/arsenal/issues VignetteBuilder: knitr License: GPL (>= 2) RoxygenNote: 7.1.1 LazyData: true Encoding: UTF-8 NeedsCompilation: no Packaged: 2021-06-04 21:23:01 UTC; m144326 Author: Ethan Heinzen [aut, cre], Jason Sinnwell [aut], Elizabeth Atkinson [aut], Tina Gunderson [aut], Gregory Dougherty [aut], Patrick Votruba [ctb], Ryan Lennon [ctb], Andrew Hanson [ctb], Krista Goergen [ctb], Emily Lundt [ctb], Brendan Broderick [ctb], Maddie McCullough [art] Maintainer: Ethan Heinzen Repository: CRAN Date/Publication: 2021-06-04 21:50:02 UTC arsenal/build/0000755000176200001440000000000014056514665013012 5ustar liggesusersarsenal/build/vignette.rds0000644000176200001440000000054314056514665015353 0ustar liggesusersRMO@BmKiI3IEcda+lCzo{؁=a?KƘ,\s6eP'CoH8w sjN<x 2O<4P^ȋY]%i?"̼FYIrǕ:nnײ>5l7RAw5y&Ol@+rؤih(5&N筑MAֱ͙4ګVDx N48D' םld\'2xrKqYUU9V ^"W|X7Varsenal/tests/0000755000176200001440000000000013632700353013043 5ustar liggesusersarsenal/tests/testthat/0000755000176200001440000000000014056520012014674 5ustar liggesusersarsenal/tests/testthat/helper-data.R0000644000176200001440000000240413632700353017214 0ustar liggesusers # set.seed(100) # nsubj <- 90 # need multiple of 3 # mdat <- data.frame( # Group = c(rep("High", nsubj/3), rep("Med", nsubj/3), rep("Low", nsubj/3)), # Sex = sample(c("Male", "Female"), nsubj, replace=TRUE), # Age = round(rnorm(nsubj, mean=40, sd=5)), # Phase = ordered(sample(c("I", "II", "III"), nsubj, replace=TRUE), levels=c("I", "II", "III")), # ht_in = round(rnorm(nsubj, mean=65, sd=5)), # time = round(runif(nsubj,0,7)), # status = rbinom(nsubj, 1, prob=0.4), # dt = as.Date(round(rnorm(90, mean=100, sd=2000)), origin="1950/01/01"), # missing = as.character(NA), # trt = factor(sample(c("A", "B"), nsubj, replace=TRUE)), # ethan = factor(c(NA, NA, NA, sample(c("Ethan", "Heinzen"), nsubj - 3, replace=TRUE))), # weights = c(20, 1.5, rep(1, nsubj - 2)), # stringsAsFactors = FALSE # ) # mdat$Group.fac <- factor(mdat$Group) # attr(mdat$ht_in, "label") <- "Height in Inches" # attr(mdat$trt, "label") <- "Treatment Arm" # attr(mdat$Age, "label") <- "Age in Years" # # class(mdat$Sex) <- c("dummyClassToTriggerErrors", class(mdat$Sex)) # # saveRDS(mdat, "tests/testthat/mdat.rds", compress = TRUE) ## As of 2019-02-26 the random number generator behind sample() changed, so I've saved out a copy to test against instead. mdat <- readRDS("mdat.rds") arsenal/tests/testthat/write2.freqlist.doc.Rmd0000644000176200001440000000300014056510156021151 0ustar liggesusers--- title: Test title --- Table: My cool title |arm |sex |mdquality.s | Freq| cumFreq| freqPercent| cumPercent| |:---------|:------|:-----------|----:|-------:|-----------:|----------:| |A: IFL |Male |0 | 29| 29| 10.47| 10.47| | | |1 | 214| 243| 77.26| 87.73| | | |NA | 34| 277| 12.27| 100.00| | |Female |0 | 12| 12| 7.95| 7.95| | | |1 | 118| 130| 78.15| 86.09| | | |NA | 21| 151| 13.91| 100.00| |F: FOLFOX |Male |0 | 31| 31| 7.54| 7.54| | | |1 | 285| 316| 69.34| 76.89| | | |NA | 95| 411| 23.11| 100.00| | |Female |0 | 21| 21| 7.50| 7.50| | | |1 | 198| 219| 70.71| 78.21| | | |NA | 61| 280| 21.79| 100.00| |G: IROX |Male |0 | 17| 17| 7.46| 7.46| | | |1 | 187| 204| 82.02| 89.47| | | |NA | 24| 228| 10.53| 100.00| | |Female |0 | 14| 14| 9.21| 9.21| | | |1 | 121| 135| 79.61| 88.82| | | |NA | 17| 152| 11.18| 100.00| arsenal/tests/testthat/write2.mylists.pdf.Rmd0000644000176200001440000000523614056510162021042 0ustar liggesusers--- title: Test title --- # Header 1 This is a small paragraph. | | Male (N=916) | Female (N=583) | Total (N=1499) | p value| |:---------------------------|:---------------:|:---------------:|:---------------:|-------:| |**Age in Years** | | | | 0.048| |   Mean (SD) | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | | |   Q1, Q3 | 53.000, 69.000 | 52.000, 68.000 | 52.000, 68.000 | | |   Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | | | | Male (N=916) | Female (N=583) | Total (N=1499) | p value| |:---------------------------|:---------------:|:---------------:|:---------------:|-------:| |**Age in Years** | | | | 0.048| |   Mean (SD) | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | | |   Q1, Q3 | 53.000, 69.000 | 52.000, 68.000 | 52.000, 68.000 | | |   Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | | |sex |arm | Freq| cumFreq| freqPercent| cumPercent| |:------|:---------|----:|-------:|-----------:|----------:| |Male |A: IFL | 277| 277| 18.48| 18.48| | |F: FOLFOX | 411| 688| 27.42| 45.90| | |G: IROX | 228| 916| 15.21| 61.11| |Female |A: IFL | 151| 1067| 10.07| 71.18| | |F: FOLFOX | 280| 1347| 18.68| 89.86| | |G: IROX | 152| 1499| 10.14| 100.00| | | case| age|arm |sex |race | fu.time| fu.stat| ps| hgb| bmi| alk.phos| ast| mdquality.s|age.ord | |:--|------:|---:|:---------|:------|:---------|-------:|-------:|--:|----:|--------:|--------:|---:|-----------:|:-------| |1 | 110754| 67|F: FOLFOX |Male |Caucasian | 922| 2| 0| 11.5| 25.09861| 160| 35| NA|60-69 | |2 | 99706| 74|A: IFL |Female |Caucasian | 270| 2| 1| 10.7| 19.49786| 290| 52| 1|70-79 | |4 | 105271| 50|A: IFL |Female |Caucasian | 175| 2| 1| 11.1| NA| 700| 100| 1|40-49 | |5 | 105001| 71|G: IROX |Female |Caucasian | 128| 2| 1| 12.6| 29.42922| 771| 68| 1|70-79 | |7 | 112263| 69|F: FOLFOX |Female |NA | 233| 2| 0| 13.0| 26.35352| 350| 35| NA|60-69 | |8 | 86205| 56|G: IROX |Male |Caucasian | 120| 2| 0| 10.2| 19.03673| 569| 27| 1|50-59 | arsenal/tests/testthat/write2.mylist.pdf.Rmd0000644000176200001440000000405614056510162020656 0ustar liggesusers--- title: Test title --- | | Male (N=916) | Female (N=583) | Total (N=1499) | p value| |:---------------------------|:---------------:|:---------------:|:---------------:|-------:| |**Age in Years** | | | | 0.048| |   Mean (SD) | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | | |   Q1, Q3 | 53.000, 69.000 | 52.000, 68.000 | 52.000, 68.000 | | |   Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | | |sex |arm | Freq| cumFreq| freqPercent| cumPercent| |:------|:---------|----:|-------:|-----------:|----------:| |Male |A: IFL | 277| 277| 18.48| 18.48| | |F: FOLFOX | 411| 688| 27.42| 45.90| | |G: IROX | 228| 916| 15.21| 61.11| |Female |A: IFL | 151| 1067| 10.07| 71.18| | |F: FOLFOX | 280| 1347| 18.68| 89.86| | |G: IROX | 152| 1499| 10.14| 100.00| | | case| age|arm |sex |race | fu.time| fu.stat| ps| hgb| bmi| alk.phos| ast| mdquality.s|age.ord | |:--|------:|---:|:---------|:------|:---------|-------:|-------:|--:|----:|--------:|--------:|---:|-----------:|:-------| |1 | 110754| 67|F: FOLFOX |Male |Caucasian | 922| 2| 0| 11.5| 25.09861| 160| 35| NA|60-69 | |2 | 99706| 74|A: IFL |Female |Caucasian | 270| 2| 1| 10.7| 19.49786| 290| 52| 1|70-79 | |4 | 105271| 50|A: IFL |Female |Caucasian | 175| 2| 1| 11.1| NA| 700| 100| 1|40-49 | |5 | 105001| 71|G: IROX |Female |Caucasian | 128| 2| 1| 12.6| 29.42922| 771| 68| 1|70-79 | |7 | 112263| 69|F: FOLFOX |Female |NA | 233| 2| 0| 13.0| 26.35352| 350| 35| NA|60-69 | |8 | 86205| 56|G: IROX |Male |Caucasian | 120| 2| 0| 10.2| 19.03673| 569| 27| 1|50-59 | arsenal/tests/testthat/write2.render.html.Rmd0000644000176200001440000000024014056510166021002 0ustar liggesusers--- title: Test title --- hi there ```{r} arsenal::write2html(list("hi there", yaml(title = "hi there")), "hi_there.html", clean = TRUE) ``` arsenal/tests/testthat/test_lhs_freqlist.R0000644000176200001440000003576713632700353020606 0ustar liggesusers## Tests for freqlist context("Testing the freqlist strata and multiple LHS output") tab <- table(mockstudy[c("arm", "sex", "mdquality.s")], useNA = "ifany") dat <- data.frame( wts1 = rep(1:2, each = 6), wts2 = rep(1:2, times = 6), wts3 = rep(1:3, times = 4), a = rep(c("A", "B", "C"), each = 4), b = rep(c("D", "E", "F", "G"), each = 3), d = rep(c("H", "I"), each = 6), stringsAsFactors = FALSE ) tab.wts <- freqlist(list(wts1, wts2) ~ a + b + d, strata = "d", data = dat, sparse = FALSE) ##################################### test_that("Multiple strata work", { expect_identical( capture.kable(summary(freqlist(tab, strata = c("arm", "sex")))), capture.kable(summary(freqlist(~ arm + sex + addNA(mdquality.s), data = mockstudy, strata = c("arm", "sex")), labelTranslations = c(arm = "arm", "addNA(mdquality.s)" = "mdquality.s"))) ) }) test_that("Multiple endpoints work", { expect_identical( capture.kable(summary(freqlist(list(wts1, wts2) ~ a + b, data = dat))), c( capture.kable(summary(freqlist(wts1 ~ a + b, data = dat))), "", "", capture.kable(summary(freqlist(wts2 ~ a + b, data = dat))) ) ) }) test_that("Multiple endpoints and strata work", { expect_identical( capture.kable(summary(tab.wts)), c( capture.kable(summary(freqlist(wts1 ~ a + b + d, strata = "d", data = dat))), "", "", capture.kable(summary(freqlist(wts2 ~ a + b + d, strata = "d", data = dat))) ) ) expect_identical( capture.kable(summary(tab.wts)), c( "|d |a |b | Freq| Cumulative Freq| Percent| Cumulative Percent|", "|:--|:--|:--|----:|---------------:|-------:|------------------:|", "|H |A |D | 3| 3| 50.00| 50.00|", "| | |E | 1| 4| 16.67| 66.67|", "| |B |E | 2| 6| 33.33| 100.00|", "" , "" , "|d |a |b | Freq| Cumulative Freq| Percent| Cumulative Percent|", "|:--|:--|:--|----:|---------------:|-------:|------------------:|", "|I |B |F | 4| 4| 33.33| 33.33|", "| |C |F | 2| 6| 16.67| 50.00|", "| | |G | 6| 12| 50.00| 100.00|", "" , "" , "|d |a |b | Freq| Cumulative Freq| Percent| Cumulative Percent|", "|:--|:--|:--|----:|---------------:|-------:|------------------:|", "|H |A |D | 4| 4| 44.44| 44.44|", "| | |E | 2| 6| 22.22| 66.67|", "| |B |E | 3| 9| 33.33| 100.00|", "" , "" , "|d |a |b | Freq| Cumulative Freq| Percent| Cumulative Percent|", "|:--|:--|:--|----:|---------------:|-------:|------------------:|", "|I |B |F | 3| 3| 33.33| 33.33|", "| |C |F | 1| 4| 11.11| 44.44|", "| | |G | 5| 9| 55.56| 100.00|" ) ) }) ##################################### test_that("Reordering variables and subsetting", { expect_identical( capture.kable(summary(tab.wts[c("b", "d", "a", "Freq")], text = TRUE)), c( "|b |d |a | Freq|", "|:--|:--|:--|----:|", "|D |H |A | 3|", "|E |H |A | 1|", "| | |B | 2|", "" , "" , "|b |d |a | Freq|", "|:--|:--|:--|----:|", "|F |I |B | 4|", "| | |C | 2|", "|G |I |C | 6|", "" , "" , "|b |d |a | Freq|", "|:--|:--|:--|----:|", "|D |H |A | 4|", "|E |H |A | 2|", "| | |B | 3|", "" , "" , "|b |d |a | Freq|", "|:--|:--|:--|----:|", "|F |I |B | 3|", "| | |C | 1|", "|G |I |C | 5|" ) ) expect_identical( capture.kable(summary(tab.wts[c(3,1,2,4)], text = TRUE)), capture.kable(summary(tab.wts[c("b", "d", "a", "Freq")], text = TRUE)) ) expect_identical( capture.kable(summary(tab.wts[, 2:1], text = TRUE)), capture.kable(summary(tab.wts[, c("wts2", "wts1")], text = TRUE)) ) expect_identical( capture.kable(summary(tab.wts[1:4], text = TRUE)), capture.kable(summary(tab.wts[c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)], text = TRUE)) ) expect_identical( capture.kable(summary(tab.wts[, 2], text = TRUE)), capture.kable(summary(tab.wts[, c(FALSE, TRUE)], text = TRUE)) ) expect_warning(tab.wts[1:8], "Some indices not found") expect_warning(tab.wts[, 1:3], "Some indices not found") expect_error(tab.wts[TRUE], "Logical vector") expect_error(tab.wts[, TRUE], "Logical vector") }) test_that("Merging freqlist objects", { tb1 <- tab.wts tb2 <- freqlist(wts3 ~ b + d, strata = "d", data = dat) tb3 <- freqlist(~ a + d, strata = "d", data = dat) tb4 <- freqlist(~ b, data = dat) expect_error(merge(tb1, freqlist(wts1 ~ a + d, strata = "d", data = dat)), "Can only merge freqlist objects") expect_identical( capture.kable(summary(merge(tb1, tb2))), c(capture.kable(summary(tb1)), "", "", capture.kable(summary(tb2))) ) expect_identical( capture.kable(summary(merge(tb1, merge(tb3, tb4)))), c(capture.kable(summary(tb1)), "", "", capture.kable(summary(tb3)), "", "", capture.kable(summary(tb4))) ) }) test_that("head() and tail() work with freqlist (#188)", { expect_identical( capture.kable(head(summary(tab.wts), 2)), capture.kable(summary(tab.wts))[2-7*(1:4)] # we already tested this above ) expect_identical( capture.kable(head(summary(tab.wts, sparse = TRUE))), c("|d |a |b | Freq| Cumulative Freq| Percent| Cumulative Percent|", "|:--|:--|:--|----:|---------------:|-------:|------------------:|", "|H |A |D | 3| 3| 50.00| 50.00|", "| | |E | 1| 4| 16.67| 66.67|", "| | |F | 0| 4| 0.00| 66.67|", "| | |G | 0| 4| 0.00| 66.67|", "| |B |D | 0| 4| 0.00| 66.67|", "| | |E | 2| 6| 33.33| 100.00|", "" , "" , "|d |a |b | Freq| Cumulative Freq| Percent| Cumulative Percent|", "|:--|:--|:--|----:|---------------:|-------:|------------------:|", "|I |A |D | 0| 0| 0.00| 0.00|", "| | |E | 0| 0| 0.00| 0.00|", "| | |F | 0| 0| 0.00| 0.00|", "| | |G | 0| 0| 0.00| 0.00|", "| |B |D | 0| 0| 0.00| 0.00|", "| | |E | 0| 0| 0.00| 0.00|", "" , "" , "|d |a |b | Freq| Cumulative Freq| Percent| Cumulative Percent|", "|:--|:--|:--|----:|---------------:|-------:|------------------:|", "|H |A |D | 4| 4| 44.44| 44.44|", "| | |E | 2| 6| 22.22| 66.67|", "| | |F | 0| 6| 0.00| 66.67|", "| | |G | 0| 6| 0.00| 66.67|", "| |B |D | 0| 6| 0.00| 66.67|", "| | |E | 3| 9| 33.33| 100.00|", "" , "" , "|d |a |b | Freq| Cumulative Freq| Percent| Cumulative Percent|", "|:--|:--|:--|----:|---------------:|-------:|------------------:|", "|I |A |D | 0| 0| 0.00| 0.00|", "| | |E | 0| 0| 0.00| 0.00|", "| | |F | 0| 0| 0.00| 0.00|", "| | |G | 0| 0| 0.00| 0.00|", "| |B |D | 0| 0| 0.00| 0.00|", "| | |E | 0| 0| 0.00| 0.00|" ) ) }) test_that("sort() works with freqlist() (#187)", { expect_error(sort(freqlist(~ arm, data = mockstudy)[c("arm", "cumFreq")]), "You tried to create or sort a freqlist") mockstudy.wts <- mockstudy mockstudy.wts$one <- 1 mockstudy.wts$two <- 2 mck.wts <- freqlist(list(one, two) ~ arm + addNA(mdquality.s) + sex, data = mockstudy.wts, strata = "sex", na.options = 'showexclude', labelTranslations = c("addNA(mdquality.s)" = "QOL")) mck.wts2 <- freqlist(list(one, two) ~ arm + addNA(mdquality.s) + sex, data = mockstudy.wts, strata = "sex", na.options = 'include', labelTranslations = c("addNA(mdquality.s)" = "QOL")) mck.wts3 <- freqlist(list(one, two) ~ arm + addNA(mdquality.s) + sex, data = mockstudy.wts, strata = "sex", na.options = 'remove') expect_true( all(capture.kable(summary(sort(mck.wts[c(1:4, 6)]), dupLabels = TRUE)) %in% capture.kable(summary(mck.wts[c(1:4, 6)], dupLabels = TRUE))) ) expect_true( all(capture.kable(summary(sort(mck.wts[c(1:4, 6)], decreasing = TRUE), dupLabels = TRUE)) %in% capture.kable(summary(mck.wts[c(1:4, 6)], dupLabels = TRUE))) ) expect_true( all(capture.kable(summary(sort(mck.wts2[c(1:4, 6)]), dupLabels = TRUE)) %in% capture.kable(summary(mck.wts2[c(1:4, 6)], dupLabels = TRUE))) ) expect_true( all(capture.kable(summary(sort(mck.wts3[c(1:4, 6)]), dupLabels = TRUE)) %in% capture.kable(summary(mck.wts3[c(1:4, 6)], dupLabels = TRUE))) ) ## this checks that each table recalculates with its own na.options expect_identical( capture.kable(summary(sort(merge(mck.wts[, 1], mck.wts2[, 2]), decreasing = TRUE))), c("|sex |Treatment Arm |QOL | Freq| Cumulative Freq| Percent| Cumulative Percent|" , "|:----|:-------------|:---|----:|---------------:|-------:|------------------:|" , "|Male |F: FOLFOX |1 | 285| 285| 37.35| 37.35|" , "| |A: IFL |1 | 214| 499| 28.05| 65.40|" , "| |G: IROX |1 | 187| 686| 24.51| 89.91|" , "| |F: FOLFOX |NA | 95| NA| NA| NA|" , "| |A: IFL |NA | 34| NA| NA| NA|" , "| |F: FOLFOX |0 | 31| 717| 4.06| 93.97|" , "| |A: IFL |0 | 29| 746| 3.80| 97.77|" , "| |G: IROX |NA | 24| NA| NA| NA|" , "| | |0 | 17| 763| 2.23| 100.00|" , "" , "" , "|sex |Treatment Arm |QOL | Freq| Cumulative Freq| Percent| Cumulative Percent|", "|:------|:-------------|:---|----:|---------------:|-------:|------------------:|", "|Female |F: FOLFOX |1 | 198| 198| 40.91| 40.91|", "| |G: IROX |1 | 121| 319| 25.00| 65.91|", "| |A: IFL |1 | 118| 437| 24.38| 90.29|", "| |F: FOLFOX |NA | 61| NA| NA| NA|", "| |A: IFL |NA | 21| NA| NA| NA|", "| |F: FOLFOX |0 | 21| 458| 4.34| 94.63|", "| |G: IROX |NA | 17| NA| NA| NA|", "| | |0 | 14| 472| 2.89| 97.52|", "| |A: IFL |0 | 12| 484| 2.48| 100.00|", "" , "" , "|sex |Treatment Arm |QOL | Freq| Cumulative Freq| Percent| Cumulative Percent|" , "|:----|:-------------|:---|----:|---------------:|-------:|------------------:|" , "|Male |F: FOLFOX |1 | 570| 570| 31.11| 31.11|" , "| |A: IFL |1 | 428| 998| 23.36| 54.48|" , "| |G: IROX |1 | 374| 1372| 20.41| 74.89|" , "| |F: FOLFOX |NA | 190| 1562| 10.37| 85.26|" , "| |A: IFL |NA | 68| 1630| 3.71| 88.97|" , "| |F: FOLFOX |0 | 62| 1692| 3.38| 92.36|" , "| |A: IFL |0 | 58| 1750| 3.17| 95.52|" , "| |G: IROX |NA | 48| 1798| 2.62| 98.14|" , "| | |0 | 34| 1832| 1.86| 100.00|" , "" , "" , "|sex |Treatment Arm |QOL | Freq| Cumulative Freq| Percent| Cumulative Percent|", "|:------|:-------------|:---|----:|---------------:|-------:|------------------:|", "|Female |F: FOLFOX |1 | 396| 396| 33.96| 33.96|", "| |G: IROX |1 | 242| 638| 20.75| 54.72|", "| |A: IFL |1 | 236| 874| 20.24| 74.96|", "| |F: FOLFOX |NA | 122| 996| 10.46| 85.42|", "| |A: IFL |NA | 42| 1038| 3.60| 89.02|", "| |F: FOLFOX |0 | 42| 1080| 3.60| 92.62|", "| |G: IROX |NA | 34| 1114| 2.92| 95.54|", "| | |0 | 28| 1142| 2.40| 97.94|", "| |A: IFL |0 | 24| 1166| 2.06| 100.00|" ) ) }) arsenal/tests/testthat/write2.freqlist.html.Rmd0000644000176200001440000000275214056510156021365 0ustar liggesusers--- title: Test title --- |arm |sex |mdquality.s | Freq| cumFreq| freqPercent| cumPercent| |:---------|:------|:-----------|----:|-------:|-----------:|----------:| |A: IFL |Male |0 | 29| 29| 10.47| 10.47| | | |1 | 214| 243| 77.26| 87.73| | | |NA | 34| 277| 12.27| 100.00| | |Female |0 | 12| 12| 7.95| 7.95| | | |1 | 118| 130| 78.15| 86.09| | | |NA | 21| 151| 13.91| 100.00| |F: FOLFOX |Male |0 | 31| 31| 7.54| 7.54| | | |1 | 285| 316| 69.34| 76.89| | | |NA | 95| 411| 23.11| 100.00| | |Female |0 | 21| 21| 7.50| 7.50| | | |1 | 198| 219| 70.71| 78.21| | | |NA | 61| 280| 21.79| 100.00| |G: IROX |Male |0 | 17| 17| 7.46| 7.46| | | |1 | 187| 204| 82.02| 89.47| | | |NA | 24| 228| 10.53| 100.00| | |Female |0 | 14| 14| 9.21| 9.21| | | |1 | 121| 135| 79.61| 88.82| | | |NA | 17| 152| 11.18| 100.00| arsenal/tests/testthat/write2.char.html.Rmd0000644000176200001440000000012014056510164020433 0ustar liggesusers--- title: Test title --- ``` [1] "Hi.1" "Hi.2" "Hi.3" "Hi.4" "Hi.5" ``` arsenal/tests/testthat/test_lhs_modelsum.R0000644000176200001440000007514114013041725020563 0ustar liggesuserscontext("Testing the modelsum strata and multiple LHS output") ########################################################################################################### #### Basic modelsum call ########################################################################################################### test_that("A three-LHS modelsum call", { expect_identical( capture.kable(summary(modelsum(list(ht_in, time, Age) ~ Sex + dt, adjust = ~ trt + Phase, data = mdat), text = TRUE)), c( capture.kable(summary(modelsum(ht_in ~ Sex + dt, adjust = ~ trt + Phase, data = mdat), text = TRUE)), "", "", capture.kable(summary(modelsum(time ~ Sex + dt, adjust = ~ trt + Phase, data = mdat), text = TRUE)), "", "", capture.kable(summary(modelsum(Age ~ Sex + dt, adjust = ~ trt + Phase, data = mdat), text = TRUE)) ) ) }) test_that("A modelsum call with strata", { expect_identical( capture.kable(summary(modelsum(Age ~ Sex + dt, adjust = ~ trt, data = mdat, strata = Group), text = TRUE)), c("|Group | |estimate |std.error |p.value |adj.r.squared |", "|:-----|:---------------|:--------|:---------|:-------|:-------------|", "|High |(Intercept) |40.833 |2.062 |< 0.001 |-0.050 |", "| |Sex Male |0.333 |2.326 |0.887 | |", "| |Treatment Arm B |-1.812 |2.331 |0.444 | |", "| |(Intercept) |39.853 |3.948 |< 0.001 |-0.047 |", "| |dt |-0.000 |0.001 |0.750 | |", "| |Treatment Arm B |-1.910 |2.347 |0.423 | |", "|Low |(Intercept) |40.524 |1.304 |< 0.001 |-0.040 |", "| |Sex Male |-1.192 |1.463 |0.422 | |", "| |Treatment Arm B |-0.591 |1.505 |0.697 | |", "| |(Intercept) |45.146 |2.751 |< 0.001 |0.074 |", "| |dt |0.001 |0.000 |0.054 | |", "| |Treatment Arm B |-0.363 |1.423 |0.801 | |", "|Med |(Intercept) |40.300 |2.069 |< 0.001 |-0.056 |", "| |Sex Male |0.117 |2.095 |0.956 | |", "| |Treatment Arm B |-1.467 |2.169 |0.505 | |", "| |(Intercept) |42.941 |4.383 |< 0.001 |-0.040 |", "| |dt |0.000 |0.001 |0.528 | |", "| |Treatment Arm B |-1.650 |2.171 |0.454 | |" ) ) }) test_that("strata levels are maintained", { tmp <- mdat tmp$Group[tmp$trt == "A" & tmp$Group == "Low"] <- NA expect_identical( capture.kable(summary(set_labels(modelsum(Age ~ Sex + dt, adjust = ~ Group, strata = trt, data = tmp), c(GroupLow = "Low")), text = TRUE)), c("|Treatment Arm | |estimate |std.error |p.value |adj.r.squared |Nmiss |", "|:-------------|:-----------|:--------|:---------|:-------|:-------------|:-----|", "|A |(Intercept) |42.139 |2.047 |< 0.001 |-0.047 |11 |", "| |Sex Male |-2.277 |2.456 |0.364 | | |", "| |Group Med |-0.533 |2.472 |0.831 | | |", "| |(Intercept) |34.850 |3.600 |< 0.001 |0.065 |11 |", "| |dt |-0.001 |0.000 |0.072 | | |", "| |Group Med |-1.642 |2.394 |0.500 | | |", "|B |(Intercept) |38.733 |1.470 |< 0.001 |-0.050 |0 |", "| |Sex Male |0.908 |1.407 |0.522 | | |", "| |Low |0.205 |1.753 |0.907 | | |", "| |Group Med |-0.317 |1.753 |0.857 | | |", "| |(Intercept) |47.149 |2.812 |< 0.001 |0.114 |0 |", "| |dt |0.001 |0.000 |0.003 | | |", "| |Low |-0.145 |1.613 |0.929 | | |", "| |Group Med |-0.318 |1.610 |0.844 | | |" ) ) }) test_that("Multiple adjustments work (#240)", { tmp.ms <- modelsum(list(age, bmi) ~ fu.time + ast, adjust = list(Unadjusted = ~ 1, "Adjusted for Arm" = ~ arm, "HGB + PS" = ~ hgb + ps), data = set_labels(mockstudy, list(ast = "AST", ps = "PS")), strata = sex, gaussian.stats = c("estimate", "std.error", "p.value", "adj.r.squared", "Nmiss", "N")) tmp.ms2 <- modelsum(list(age, bmi) ~ fu.time + ast, adjust = list(Unadjusted = ~ NULL, "Adjusted for Arm" = ~ arm, "HGB + PS" = ~ hgb + ps), data = set_labels(mockstudy, list(ast = "AST", ps = "PS")), strata = sex, gaussian.stats = c("estimate", "std.error", "p.value", "adj.r.squared", "Nmiss", "N")) expect_identical( capture.kable(summary(tmp.ms, adjustment.names = TRUE, term.name = TRUE)), capture.kable(summary(tmp.ms2, adjustment.names = TRUE, term.name = TRUE)) ) expect_identical( capture.kable(summary(tmp.ms, adjustment.names = TRUE, term.name = TRUE)), c("|sex |adjustment |Age in Years |estimate |std.error |p.value |adj.r.squared |Nmiss |N |", "|:------|:----------------|:---------------------------|:--------|:---------|:-------|:-------------|:-----|:---|", "|Male |Unadjusted |(Intercept) |61.686 |0.654 |< 0.001 |0.005 |0 |916 |", "| | |**fu.time** |-0.002 |0.001 |0.022 | | | |", "| |Adjusted for Arm |(Intercept) |61.073 |0.837 |< 0.001 |0.004 |0 |916 |", "| | |**fu.time** |-0.002 |0.001 |0.016 | | | |", "| | |**Treatment Arm F: FOLFOX** |1.048 |0.889 |0.239 | | | |", "| | |**Treatment Arm G: IROX** |0.857 |1.015 |0.399 | | | |", "| |HGB + PS |(Intercept) |57.090 |3.068 |< 0.001 |0.010 |162 |754 |", "| | |**fu.time** |-0.002 |0.001 |0.008 | | | |", "| | |**hgb** |0.365 |0.234 |0.118 | | | |", "| | |**PS** |0.881 |0.746 |0.238 | | | |", "| |Unadjusted |(Intercept) |61.556 |0.696 |< 0.001 |0.003 |162 |754 |", "| | |**AST** |-0.028 |0.016 |0.074 | | | |", "| |Adjusted for Arm |(Intercept) |61.524 |0.947 |< 0.001 |0.001 |162 |754 |", "| | |**AST** |-0.028 |0.016 |0.075 | | | |", "| | |**Treatment Arm F: FOLFOX** |0.322 |0.977 |0.741 | | | |", "| | |**Treatment Arm G: IROX** |-0.445 |1.115 |0.690 | | | |", "| |HGB + PS |(Intercept) |57.707 |3.133 |< 0.001 |0.006 |162 |754 |", "| | |**AST** |-0.032 |0.016 |0.043 | | | |", "| | |**hgb** |0.253 |0.232 |0.276 | | | |", "| | |**PS** |1.529 |0.739 |0.039 | | | |", "|Female |Unadjusted |(Intercept) |59.385 |0.822 |< 0.001 |-0.002 |0 |583 |", "| | |**fu.time** |-0.000 |0.001 |0.835 | | | |", "| |Adjusted for Arm |(Intercept) |59.449 |1.089 |< 0.001 |-0.003 |0 |583 |", "| | |**fu.time** |-0.000 |0.001 |0.707 | | | |", "| | |**Treatment Arm F: FOLFOX** |0.580 |1.216 |0.634 | | | |", "| | |**Treatment Arm G: IROX** |-0.856 |1.353 |0.527 | | | |", "| |HGB + PS |(Intercept) |46.660 |4.632 |< 0.001 |0.015 |104 |479 |", "| | |**fu.time** |-0.001 |0.001 |0.564 | | | |", "| | |**hgb** |1.156 |0.378 |0.002 | | | |", "| | |**PS** |-0.421 |0.902 |0.641 | | | |", "| |Unadjusted |(Intercept) |61.001 |0.883 |< 0.001 |0.004 |104 |479 |", "| | |**AST** |-0.033 |0.020 |0.096 | | | |", "| |Adjusted for Arm |(Intercept) |60.687 |1.266 |< 0.001 |0.006 |104 |479 |", "| | |**AST** |-0.032 |0.020 |0.102 | | | |", "| | |**Treatment Arm F: FOLFOX** |1.253 |1.297 |0.334 | | | |", "| | |**Treatment Arm G: IROX** |-1.006 |1.441 |0.486 | | | |", "| |HGB + PS |(Intercept) |47.825 |4.730 |< 0.001 |0.017 |104 |479 |", "| | |**AST** |-0.026 |0.020 |0.193 | | | |", "| | |**hgb** |1.086 |0.376 |0.004 | | | |", "| | |**PS** |-0.115 |0.873 |0.895 | | | |", "" , "" , "|sex |adjustment |Body Mass Index (kg/m^2) |estimate |std.error |p.value |adj.r.squared |Nmiss |N |", "|:------|:----------------|:---------------------------|:--------|:---------|:-------|:-------------|:-----|:---|", "|Male |Unadjusted |(Intercept) |26.679 |0.294 |< 0.001 |0.011 |22 |894 |", "| | |**fu.time** |0.001 |0.000 |< 0.001 | | | |", "| |Adjusted for Arm |(Intercept) |26.901 |0.376 |< 0.001 |0.012 |22 |894 |", "| | |**fu.time** |0.001 |0.000 |< 0.001 | | | |", "| | |**Treatment Arm F: FOLFOX** |-0.594 |0.397 |0.134 | | | |", "| | |**Treatment Arm G: IROX** |-0.040 |0.451 |0.930 | | | |", "| |HGB + PS |(Intercept) |23.559 |1.321 |< 0.001 |0.036 |181 |735 |", "| | |**fu.time** |0.001 |0.000 |0.021 | | | |", "| | |**hgb** |0.283 |0.101 |0.005 | | | |", "| | |**PS** |-0.794 |0.322 |0.014 | | | |", "| |Unadjusted |(Intercept) |27.885 |0.306 |< 0.001 |0.006 |181 |735 |", "| | |**AST** |-0.016 |0.007 |0.023 | | | |", "| |Adjusted for Arm |(Intercept) |28.040 |0.412 |< 0.001 |0.004 |181 |735 |", "| | |**AST** |-0.016 |0.007 |0.023 | | | |", "| | |**Treatment Arm F: FOLFOX** |-0.354 |0.427 |0.408 | | | |", "| | |**Treatment Arm G: IROX** |0.015 |0.486 |0.976 | | | |", "| |HGB + PS |(Intercept) |24.297 |1.350 |< 0.001 |0.032 |181 |735 |", "| | |**AST** |-0.011 |0.007 |0.123 | | | |", "| | |**hgb** |0.307 |0.100 |0.002 | | | |", "| | |**PS** |-0.878 |0.318 |0.006 | | | |", "|Female |Unadjusted |(Intercept) |26.198 |0.423 |< 0.001 |0.003 |11 |572 |", "| | |**fu.time** |0.001 |0.001 |0.101 | | | |", "| |Adjusted for Arm |(Intercept) |26.198 |0.560 |< 0.001 |0.002 |11 |572 |", "| | |**fu.time** |0.001 |0.001 |0.164 | | | |", "| | |**Treatment Arm F: FOLFOX** |0.366 |0.625 |0.558 | | | |", "| | |**Treatment Arm G: IROX** |-0.386 |0.692 |0.577 | | | |", "| |HGB + PS |(Intercept) |27.620 |2.409 |< 0.001 |-0.000 |113 |470 |", "| | |**fu.time** |0.001 |0.001 |0.099 | | | |", "| | |**hgb** |-0.121 |0.196 |0.539 | | | |", "| | |**PS** |0.184 |0.467 |0.694 | | | |", "| |Unadjusted |(Intercept) |26.500 |0.456 |< 0.001 |0.001 |113 |470 |", "| | |**AST** |0.012 |0.010 |0.247 | | | |", "| |Adjusted for Arm |(Intercept) |26.062 |0.649 |< 0.001 |0.008 |113 |470 |", "| | |**AST** |0.012 |0.010 |0.225 | | | |", "| | |**Treatment Arm F: FOLFOX** |1.102 |0.669 |0.100 | | | |", "| | |**Treatment Arm G: IROX** |-0.276 |0.741 |0.710 | | | |", "| |HGB + PS |(Intercept) |27.324 |2.462 |< 0.001 |-0.003 |113 |470 |", "| | |**AST** |0.012 |0.010 |0.255 | | | |", "| | |**hgb** |-0.064 |0.196 |0.743 | | | |", "| | |**PS** |-0.103 |0.454 |0.821 | | | |" ) ) expect_identical( capture.kable(summary(modelsum(Age ~ Sex + time, data = mdat), text = TRUE, adjustment.names = TRUE)), c("|adjustment | |estimate |std.error |p.value |adj.r.squared |", "|:----------|:-----------|:--------|:---------|:-------|:-------------|", "|unadjusted |(Intercept) |39.826 |0.779 |< 0.001 |-0.011 |", "| |Sex Male |-0.258 |1.115 |0.818 | |", "|unadjusted |(Intercept) |41.130 |1.197 |< 0.001 |0.009 |", "| |time |-0.371 |0.275 |0.182 | |" ) ) }) ########################################################################################################### #### Other warnings and tests and things... ########################################################################################################### test_that("Reordering variables and subsetting", { tmp.tab <- modelsum(list(Age, time) ~ Sex + dt + ethan, adjust = ~ Group, strata = trt, data = mdat) expect_identical( capture.kable(summary(tmp.tab[c(3,1,2), 2:1], text = TRUE)), capture.kable(summary(modelsum(list(time, Age) ~ ethan + Sex + dt, adjust = ~ Group, strata = trt, data = mdat), text = TRUE)) ) expect_identical( capture.kable(summary(tmp.tab[c(3,1,2)], text = TRUE)), capture.kable(summary(tmp.tab[c("ethan", "Sex", "dt")], text = TRUE)) ) expect_identical( capture.kable(summary(tmp.tab[, 2:1], text = TRUE)), capture.kable(summary(tmp.tab[, c("time", "Age")], text = TRUE)) ) expect_identical( capture.kable(summary(tmp.tab[1:2], text = TRUE)), capture.kable(summary(tmp.tab[c(TRUE, TRUE, FALSE)], text = TRUE)) ) expect_identical( capture.kable(summary(tmp.tab[, 2], text = TRUE)), capture.kable(summary(tmp.tab[, c(FALSE, TRUE)], text = TRUE)) ) expect_warning(tmp.tab[1:4], "Some indices not found") expect_warning(tmp.tab[, 1:3], "Some indices not found") expect_error(tmp.tab[TRUE], "Logical vector") expect_error(tmp.tab[, TRUE], "Logical vector") }) test_that("Merging modelsum objects", { tb1 <- modelsum(list(Age, time) ~ Sex + Phase, strata = trt, data = mdat) tb2 <- modelsum(list(as.numeric(dt), ht_in) ~ Group, strata = trt, data = mdat) tb3 <- modelsum(list(Age, ht_in, time) ~ Group + ethan, strata = trt, data = mdat) tb4 <- modelsum(list(Age, time) ~ Sex + Phase + Group + ethan, strata = trt, data = mdat) expect_error(merge(tb1, tb2), "No terms in common") expect_identical( capture.kable(summary(merge(tb1, tb2, all = TRUE))), c(capture.kable(summary(tb1)), "", "", capture.kable(summary(tb2))) ) expect_identical( capture.kable(summary(merge(tb1, tb3), text = TRUE)), capture.kable(summary(tb4, text = TRUE)) ) expect_identical( capture.kable(summary(merge(tb1, tb3, all.x = TRUE), text = TRUE)), capture.kable(summary(tb4, text = TRUE)) ) expect_identical( capture.kable(summary(merge(tb1, tb3, all = TRUE), text = TRUE)), c( capture.kable(summary(tb4, text = TRUE)), "", "", capture.kable(summary(modelsum(ht_in ~ Group + ethan, data = mdat, strata = trt), text = TRUE)) ) ) }) test_that("Changing labels", { mdat.tmp <- set_labels(mdat, NULL) mdat.tmp$`1x` <- 1:nrow(mdat.tmp) mdat.tmp$`2x` <- rep(c("A", "B"), each = nrow(mdat.tmp)/2) tab <- modelsum(list(Age, time) ~ Sex + `1x` + `2x`, adjust = ~ Group, strata = trt, data = mdat.tmp) tmp <- capture.kable(summary(tab, text = TRUE, term.name = TRUE)) expect_warning(labels(tab) <- c(hi = "hi", SexMale = "Male", Age = "Age 1", time = "Time", trt = "Trt Arm", GroupLow = "Low", "`1x`" = "onex", "`2x`B" = "B"), NA) expect_identical( capture.kable(summary(tab, text = TRUE, term.name = TRUE)), c("|Trt Arm |Age 1 |estimate |std.error |p.value |adj.r.squared |", "|:-------|:-----------|:--------|:---------|:-------|:-------------|", "|A |(Intercept) |42.000 |1.795 |< 0.001 |-0.052 |", "| |Male |-2.000 |1.918 |0.305 | |", "| |Low |-1.182 |2.303 |0.611 | |", "| |Group Med |-0.545 |2.290 |0.813 | |", "| |(Intercept) |41.277 |2.281 |< 0.001 |-0.087 |", "| |onex |-0.019 |0.113 |0.870 | |", "| |Low |0.237 |7.336 |0.974 | |", "| |Group Med |-0.083 |4.085 |0.984 | |", "| |(Intercept) |41.000 |1.543 |< 0.001 |-0.087 |", "| |B |-0.667 |3.495 |0.850 | |", "| |Low |-0.242 |4.198 |0.954 | |", "| |Group Med |-0.333 |2.817 |0.907 | |", "|B |(Intercept) |38.733 |1.470 |< 0.001 |-0.050 |", "| |Male |0.908 |1.407 |0.522 | |", "| |Low |0.205 |1.753 |0.907 | |", "| |Group Med |-0.317 |1.753 |0.857 | |", "| |(Intercept) |38.753 |1.835 |< 0.001 |-0.056 |", "| |onex |0.027 |0.081 |0.739 | |", "| |Low |-1.414 |5.083 |0.782 | |", "| |Group Med |-1.105 |2.996 |0.714 | |", "| |(Intercept) |39.187 |1.296 |< 0.001 |-0.058 |", "| |B |0.011 |2.383 |0.996 | |", "| |Low |0.170 |2.962 |0.955 | |", "| |Group Med |-0.299 |2.161 |0.891 | |", "" , "" , "|Trt Arm |Time |estimate |std.error |p.value |adj.r.squared |", "|:-------|:-----------|:--------|:---------|:-------|:-------------|", "|A |(Intercept) |4.469 |0.568 |< 0.001 |-0.016 |", "| |Male |-0.223 |0.606 |0.716 | |", "| |Low |-1.115 |0.728 |0.136 | |", "| |Group Med |-0.620 |0.724 |0.398 | |", "| |(Intercept) |5.331 |0.671 |< 0.001 |0.090 |", "| |onex |-0.066 |0.033 |0.058 | |", "| |Low |2.947 |2.159 |0.182 | |", "| |Group Med |1.316 |1.202 |0.282 | |", "| |(Intercept) |4.357 |0.475 |< 0.001 |0.004 |", "| |B |-0.967 |1.076 |0.376 | |", "| |Low |-0.118 |1.292 |0.928 | |", "| |Group Med |-0.190 |0.867 |0.828 | |", "|B |(Intercept) |4.301 |0.591 |< 0.001 |0.086 |", "| |Male |0.898 |0.566 |0.119 | |", "| |Low |-1.621 |0.705 |0.026 | |", "| |Group Med |-0.879 |0.705 |0.218 | |", "| |(Intercept) |3.710 |0.725 |< 0.001 |0.112 |", "| |onex |0.065 |0.032 |0.048 | |", "| |Low |-5.461 |2.008 |0.009 | |", "| |Group Med |-2.797 |1.184 |0.022 | |", "| |(Intercept) |4.750 |0.516 |< 0.001 |0.098 |", "| |B |1.700 |0.948 |0.079 | |", "| |Low |-3.345 |1.179 |0.007 | |", "| |Group Med |-1.750 |0.860 |0.047 | |" ) ) labels(tab) <- NULL expect_identical( capture.kable(summary(tab, text = TRUE, term.name = TRUE)), tmp ) expect_identical( tmp, c("|trt |Age |estimate |std.error |p.value |adj.r.squared |", "|:---|:-----------|:--------|:---------|:-------|:-------------|", "|A |(Intercept) |42.000 |1.795 |< 0.001 |-0.052 |", "| |Sex Male |-2.000 |1.918 |0.305 | |", "| |Group Low |-1.182 |2.303 |0.611 | |", "| |Group Med |-0.545 |2.290 |0.813 | |", "| |(Intercept) |41.277 |2.281 |< 0.001 |-0.087 |", "| |1x |-0.019 |0.113 |0.870 | |", "| |Group Low |0.237 |7.336 |0.974 | |", "| |Group Med |-0.083 |4.085 |0.984 | |", "| |(Intercept) |41.000 |1.543 |< 0.001 |-0.087 |", "| |2x B |-0.667 |3.495 |0.850 | |", "| |Group Low |-0.242 |4.198 |0.954 | |", "| |Group Med |-0.333 |2.817 |0.907 | |", "|B |(Intercept) |38.733 |1.470 |< 0.001 |-0.050 |", "| |Sex Male |0.908 |1.407 |0.522 | |", "| |Group Low |0.205 |1.753 |0.907 | |", "| |Group Med |-0.317 |1.753 |0.857 | |", "| |(Intercept) |38.753 |1.835 |< 0.001 |-0.056 |", "| |1x |0.027 |0.081 |0.739 | |", "| |Group Low |-1.414 |5.083 |0.782 | |", "| |Group Med |-1.105 |2.996 |0.714 | |", "| |(Intercept) |39.187 |1.296 |< 0.001 |-0.058 |", "| |2x B |0.011 |2.383 |0.996 | |", "| |Group Low |0.170 |2.962 |0.955 | |", "| |Group Med |-0.299 |2.161 |0.891 | |", "" , "" , "|trt |time |estimate |std.error |p.value |adj.r.squared |", "|:---|:-----------|:--------|:---------|:-------|:-------------|", "|A |(Intercept) |4.469 |0.568 |< 0.001 |-0.016 |", "| |Sex Male |-0.223 |0.606 |0.716 | |", "| |Group Low |-1.115 |0.728 |0.136 | |", "| |Group Med |-0.620 |0.724 |0.398 | |", "| |(Intercept) |5.331 |0.671 |< 0.001 |0.090 |", "| |1x |-0.066 |0.033 |0.058 | |", "| |Group Low |2.947 |2.159 |0.182 | |", "| |Group Med |1.316 |1.202 |0.282 | |", "| |(Intercept) |4.357 |0.475 |< 0.001 |0.004 |", "| |2x B |-0.967 |1.076 |0.376 | |", "| |Group Low |-0.118 |1.292 |0.928 | |", "| |Group Med |-0.190 |0.867 |0.828 | |", "|B |(Intercept) |4.301 |0.591 |< 0.001 |0.086 |", "| |Sex Male |0.898 |0.566 |0.119 | |", "| |Group Low |-1.621 |0.705 |0.026 | |", "| |Group Med |-0.879 |0.705 |0.218 | |", "| |(Intercept) |3.710 |0.725 |< 0.001 |0.112 |", "| |1x |0.065 |0.032 |0.048 | |", "| |Group Low |-5.461 |2.008 |0.009 | |", "| |Group Med |-2.797 |1.184 |0.022 | |", "| |(Intercept) |4.750 |0.516 |< 0.001 |0.098 |", "| |2x B |1.700 |0.948 |0.079 | |", "| |Group Low |-3.345 |1.179 |0.007 | |", "| |Group Med |-1.750 |0.860 |0.047 | |" ) ) }) ########################################################################################################### #### Reported bugs for modelsum ########################################################################################################### test_that("02/23/2018: wrapping long labels (#59)", { labs <- list( Group = "This is a really long label for the Group variable", time = "Another really long label. Can you believe how long this is", dt = "ThisLabelHasNoSpacesSoLetsSeeHowItBehaves", trt = NULL ) expect_identical( capture.kable(print(summary(modelsum(Age ~ Group + time + dt, strata = trt, data = set_labels(mdat, labs)), text = TRUE, digits = 4), width = 30)), c("|trt | |estimate |std.error |p.value |adj.r.squared |", "|:---|:------------------------------|:--------|:---------|:-------|:-------------|", "|A |(Intercept) |41.0000 |1.5200 |< 0.001 |-0.0552 |", "| |This is a really long label |-0.9091 |2.2915 |0.694 | |", "| |for the Group variable Low | | | | |", "| |This is a really long label |-0.6364 |2.2915 |0.783 | |", "| |for the Group variable Med | | | | |", "| |(Intercept) |43.2848 |2.1872 |< 0.001 |0.0257 |", "| |Another really long label. |-0.7192 |0.5187 |0.175 | |", "| |Can you believe how long this | | | | |", "| |is | | | | |", "| |(Intercept) |36.3141 |2.9357 |< 0.001 |0.0352 |", "| |ThisLabelHasNoSpacesSoLetsSeeH |-0.0006 |0.0004 |0.141 | |", "| |owItBehaves | | | | |", "|B |(Intercept) |39.1875 |1.2837 |< 0.001 |-0.0375 |", "| |This is a really long label |0.1809 |1.7423 |0.918 | |", "| |for the Group variable Low | | | | |", "| |This is a really long label |-0.2928 |1.7423 |0.867 | |", "| |for the Group variable Med | | | | |", "| |(Intercept) |39.9678 |1.4192 |< 0.001 |-0.0107 |", "| |Another really long label. |-0.2118 |0.3205 |0.512 | |", "| |Can you believe how long this | | | | |", "| |is | | | | |", "| |(Intercept) |46.9878 |2.5356 |< 0.001 |0.1478 |", "| |ThisLabelHasNoSpacesSoLetsSeeH |0.0011 |0.0003 |0.002 | |", "| |owItBehaves | | | | |" ) ) }) arsenal/tests/testthat/helper-script.R0000644000176200001440000000042013632700353017603 0ustar liggesusers capture.kable <- function(x) { y <- capture.output(x) stopifnot(length(y) > 3) if(grepl("^Table:", y[2])) y <- c("", y) if(y[1] != "" || y[2] != "" || tail(y, 1) != "") stop("Leading and/or trailing elements aren't blank") utils::head(utils::tail(y, -2), -1) } arsenal/tests/testthat/write2.modelsum.html.Rmd0000644000176200001440000000205014056510155021347 0ustar liggesusers--- title: Test title --- Table: My test table | |estimate |std.error |p.value |adj.r.squared |Nmiss | |:---------------------------|:---------|:---------|:-------|:-------------|:-----| |**Treatment Arm F: FOLFOX** |-13.70062 |8.72963 |0.117 |-0.00070 |266 | |**Treatment Arm G: IROX** |-2.24498 |9.86004 |0.820 | | | |**Age in Years** |-0.01741 |0.31878 |0.956 | | | |**sex Female** |3.01598 |7.52097 |0.688 | | | |**ps** |46.72138 |5.98698 |< 0.001 |0.04501 |266 | |**Age in Years** |-0.08420 |0.31108 |0.787 | | | |**sex Female** |1.16929 |7.34303 |0.874 | | | |**hgb** |-13.84479 |2.13698 |< 0.001 |0.03078 |266 | |**Age in Years** |0.09455 |0.31403 |0.763 | | | |**sex Female** |-5.97954 |7.51625 |0.426 | | | arsenal/tests/testthat/write2.tableby.html.Rmd0000644000176200001440000000163014056510153021145 0ustar liggesusers--- title: Test title --- Table: My test table | | A: IFL (N=428) | F: FOLFOX (N=691) | G: IROX (N=380) | p value| |:---------------------------|:---------------:|:-----------------:|:---------------:|-------:| |**SEX** | | | | 0.190| |   Male | 277 (64.7%) | 411 (59.5%) | 228 (60.0%) | | |   Female | 151 (35.3%) | 280 (40.5%) | 152 (40.0%) | | |**Age, yrs** | | | | 0.614| |   Mean (SD) | 59.673 (11.365) | 60.301 (11.632) | 59.763 (11.499) | | |   Q1, Q3 | 53.000, 68.000 | 52.000, 69.000 | 52.000, 68.000 | | |   Range | 27.000 - 88.000 | 19.000 - 88.000 | 26.000 - 85.000 | | arsenal/tests/testthat/test_write2.R0000644000176200001440000002357514056510242017312 0ustar liggesusers## Tests for write2 context("Testing the write2 output") data(mockstudy) expect_write2_worked <- function(FUN, object, reference, ...) { skip_on_cran() skip_on_os("mac") skip_on_os("windows") FUN <- match.fun(FUN) filename <- tempfile() # on.exit(expect_true(file.remove(paste0(filename, ".Rmd")))) if(!file.exists(reference)) skip("Couldn't find the reference file.") if(!file.create(paste0(filename, ".Rmd"))) skip("Couldn't create the temporary file.") expect_error(FUN(list(object, yaml(title = "Test title")), file = filename, ..., render. = TRUE, keep.rmd = TRUE, append. = FALSE, quiet = TRUE), NA) # on.exit(expect_true(file.remove(filename)), add = TRUE) generated <- readLines(paste0(filename, ".Rmd")) expect_output_file(cat(generated, sep = "\n"), reference) } ########################################################################################################### #### Internal output ########################################################################################################### test_that("write2.tableby -> HTML", { expect_write2_worked(write2html, tableby(arm ~ sex + age, data=mockstudy, numeric.stats = c("meansd", "q1q3", "range")), reference = "write2.tableby.html.Rmd", title = "My test table", labelTranslations = list(sex = "SEX", age ="Age, yrs"), total = FALSE) }) test_that("write2.modelsum -> HTML", { expect_write2_worked(write2html, modelsum(alk.phos ~ arm + ps + hgb, adjust= ~ age + sex, family = "gaussian", data = mockstudy), reference = "write2.modelsum.html.Rmd", title = "My test table", show.intercept = FALSE, digits = 5) }) old.labs <- c(cumFreq = "cumFreq", freqPercent = "freqPercent", cumPercent = "cumPercent") test_that("write2.freqlist -> HTML", { expect_write2_worked(write2html, freqlist(table(mockstudy[c("arm", "sex", "mdquality.s")], useNA = "ifany"), strata = c("arm", "sex")), reference = "write2.freqlist.html.Rmd", single = TRUE, labelTranslations = old.labs) }) test_that("write2.freqlist -> doc", { expect_write2_worked(write2word, freqlist(table(mockstudy[c("arm", "sex", "mdquality.s")], useNA = "ifany"), strata = c("arm", "sex")), reference = "write2.freqlist.doc.Rmd", single = TRUE, title = "My cool title", labelTranslations = old.labs) }) ## From the vignette test_that("write2.list (summary objects) -> PDF", { mylist6 <- list( summary(tableby(sex ~ age, data = mockstudy), title = "A Title for tableby"), summary(modelsum(age ~ sex, data = mockstudy), title = "A Title for modelsum"), summary(freqlist(~ sex, data = mockstudy, labelTranslations = old.labs), title = "A Title for freqlist") ) expect_write2_worked(write2pdf, mylist6, reference = "write2.multititles.pdf.Rmd") }) ########################################################################################################### #### External output, commented out on 11/9/17 because of external package changes ########################################################################################################### # # test_that("write2.knitr_kable -> HTML", { # if(require(knitr)) # { # expect_write2_worked(write2html, knitr::kable(head(mockstudy)), reference = "write2.kable.html.Rmd") # } else skip("library(knitr) not available.") # }) # # test_that("write2.xtable -> HTML", { # if(require(xtable)) # { # expect_write2_worked(write2html, xtable::xtable(head(mockstudy), caption = "My xtable"), reference = "write2.xtable.html.Rmd", # type = "html", comment = FALSE, include.rownames = FALSE, caption.placement = 'top') # } else skip("library(xtable) not available.") # }) # # test_that("write2.character (pander) -> HTML", { # if(require(pander)) # { # expect_write2_worked(write2html, pander::pander_return(head(mockstudy)), reference = "write2.pander.html.Rmd") # } else skip("library(pander) not available.") # }) # ########################################################################################################### #### List output ########################################################################################################### mylist <- list(tableby(sex ~ age, data = mockstudy, numeric.stats = c("meansd", "q1q3", "range")), freqlist(table(mockstudy[, c("sex", "arm")]), labelTranslations = old.labs), knitr::kable(utils::head(mockstudy))) mylist2 <- list("# Header 1", "This is a small paragraph.", tableby(sex ~ age, data = mockstudy, numeric.stats = c("meansd", "q1q3", "range"))) test_that("write2.list -> PDF", { expect_write2_worked(write2pdf, mylist, reference = "write2.mylist.pdf.Rmd") }) test_that("write2.list -> Word", { expect_write2_worked(write2word, mylist2, reference = "write2.mylist2.doc.Rmd") }) test_that("write2.list recursion -> PDF", { expect_write2_worked(write2word, list(mylist2, mylist), reference = "write2.mylists.pdf.Rmd") }) ########################################################################################################### #### verbatim output ########################################################################################################### my.lm <- summary(lm(age ~ sex, data = mockstudy)) test_that("write2.default -> PDF", { expect_write2_worked(write2pdf, my.lm, reference = "write2.lm.pdf.Rmd") }) test_that("write2.verbatim -> html", { expect_write2_worked(write2pdf, verbatim(paste0("Hi.", 1:5)), reference = "write2.char.html.Rmd") }) test_that("Writing HTML from PDF works (#162)", { expect_write2_worked(write2pdf, list( "hi there", code.chunk( arsenal::write2html(list("hi there", yaml(title = "hi there")), "hi_there.html", clean = TRUE) ) ), reference = "write2.render.html.Rmd") }) test_that("verbatim(call()) works (#248)", { expect_identical( capture.output(print(verbatim(call("hi", x = 1), NULL, "hi"))), c("```" , "hi(x = 1)" , "" , "```" , "" , "```" , "NULL" , "" , "```" , "" , "```" , "[1] \"hi\"", "" , "```" , "" ) ) }) ########################################################################################################### #### YAML output ########################################################################################################### mylist3 <- list( "# Header 1", "This is a small paragraph.", tableby(sex ~ age, data = mockstudy, numeric.stats = c("meansd", "q1q3", "range")), yaml(title = "My title"), my.lm, yaml(author = "Ethan P Heinzen"), yaml("header-includes" = list("\\usepackage[labelformat=empty]{caption}")), code.chunk(a <- 1, "b <- 2", a + b, "a - b", chunk.opts = "r echo = FALSE, eval = TRUE") ) mylist4 <- list( yaml(title = "My title", author = "Ethan P Heinzen", "header-includes" = list("\\usepackage[labelformat=empty]{caption}")), "# Header 1", "This is a small paragraph.", tableby(sex ~ age, data = mockstudy, numeric.stats = c("meansd", "q1q3", "range")), my.lm, code.chunk(a <- 1, "b <- 2", a + b, "a - b", chunk.opts = "r echo = FALSE, eval = TRUE") ) test_that("write2.yaml -> PDF", { expect_write2_worked(write2pdf, mylist3, reference = "write2.yaml.pdf.Rmd") expect_write2_worked(write2pdf, mylist4, reference = "write2.yaml.pdf.Rmd") }) ########################################################################################################### #### Code used to generate the files ########################################################################################################### # # write2html(tableby(arm ~ sex + age, data=mockstudy, numeric.stats = c("meansd", "q1q3", "range")), "tests/testthat/write2.tableby.html", # title = "My test table", labelTranslations = list(sex = "SEX", age ="Age, yrs"), total = FALSE, render. = FALSE) # # write2html(modelsum(alk.phos ~ arm + ps + hgb, adjust= ~ age + sex, family = "gaussian", data = mockstudy), # "tests/testthat/write2.modelsum.html", # title = "My test table", show.intercept = FALSE, digits = 5, render. = FALSE) # # write2html(freqlist(table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA = "ifany"), groupBy = c("arm", "sex")), # "tests/testthat/write2.freqlist.html", single = TRUE, render. = FALSE) # # write2word(freqlist(table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA = "ifany"), groupBy = c("arm", "sex")), # "tests/testthat/write2.freqlist.doc", single = TRUE, title = "My cool title", render. = FALSE) # # write2pdf(mylist6, "tests/testthat/write2.multititles.pdf", render. = FALSE) # ## write2html(knitr::kable(head(mockstudy)), ## "tests/testthat/write2.kable.html", render. = FALSE) ## ## write2html(xtable::xtable(head(mockstudy), caption = "My xtable"), ## "tests/testthat/write2.xtable.html", ## type = "html", comment = FALSE, include.rownames = FALSE, caption.placement = "top", render. = FALSE) ## ## write2html(pander::pander_return(head(mockstudy)), ## "tests/testthat/write2.pander.html", render. = FALSE) # # # write2pdf(mylist, "tests/testthat/write2.mylist.pdf", render. = FALSE) # write2word(mylist2, "tests/testthat/write2.mylist2.doc", render. = FALSE) # write2pdf(list(mylist2, mylist), "tests/testthat/write2.mylists.pdf", render. = FALSE) # # write2pdf(my.lm, "tests/testthat/write2.lm.pdf", render. = FALSE) # write2html(verbatim(paste0("Hi.", 1:5)), # "tests/testthat/write2.char.html", render. = FALSE) # write2pdf(mylist3, "tests/testthat/write2.yaml.pdf", render. = FALSE) ########################################################################################################### #### Reported bugs for write2 ########################################################################################################### arsenal/tests/testthat/mdat.rds0000644000176200001440000000261213632700353016343 0ustar liggesusersZKlT}3N$"(j+R $6ABl!$&%Pɸ3#< QV !D|DViZ64-uɆ6H +J̽Dz/~$Xw~<իJ+#҆?4?}Uc/>6\{'R{2"sUc]cv9^sd1MRﶏI}o{wY 3L3>&zcBȃ:Q>|QyB~LEY 4⛂?,4}r[M' >A^*%RL(V-qXwwrOB4鐮u!luV:Ia/k8&C/%0rJv i߁?U313|L3gLO GYgz_GA^yϤ!nezQ>+- q D9Q/ {[ƇSWiJ+i'.4.\L㶿vDe`8>$ޫAdm n#rثɸOƗ5xX/2'{DuuE ~/Kw[=Y_ tk4E^ܸqۤ,LKbҗbI}] u5|h#;M;JpM&0wܳ_0M9ֿYo׉suD_㭱=o^]';qo,a'9wY0~BonCN_Y~;H!o2AW>&Y>C:MU微OW 0!k?^Okާ+  ϸܿXϏss]79GWmJK;F-}^]\-Nq Vog]V$43\2 kf`=7ؖ[+@)=At͌;pFKJ_wB&'= 4==Y`U_U^+r:lC7.jn7nܡmiȬ7`7O`n`Xen= "3.4.0") { expect_identical( capture.kable(summary(freqlist(~ trt + ethan, data = mdat, addNA = TRUE), labelTranslations = c(trt = "Trt", ethan = "Ethan"))), capture.kable(summary(freqlist(~ trt + addNA(ethan), data = mdat), labelTranslations = c("addNA(ethan)" = "Ethan", trt = "Trt"))) ) } else skip("R version isn't right to use 'addNA=TRUE'") }) test_that("digits specification", { expect_identical( capture.kable(summary(freqlist(~ trt + addNA(ethan), data = mdat), digits.pct = 1, digits.count = 1)), capture.kable(summary(freqlist(~ trt + addNA(ethan), data = mdat, digits.pct = 1, digits.count = 1))) ) expect_identical( capture.kable(summary(freqlist(~ trt + addNA(ethan), data = mdat), digits.pct = 1, digits.count = 1)), c( "|Treatment Arm |addNA(ethan) | Freq| Cumulative Freq| Percent| Cumulative Percent|", "|:-------------|:------------|----:|---------------:|-------:|------------------:|", "|A |Ethan | 17.0| 17.0| 18.9| 18.9|", "| |Heinzen | 16.0| 33.0| 17.8| 36.7|", "| |NA | 3.0| 36.0| 3.3| 40.0|", "|B |Ethan | 25.0| 61.0| 27.8| 67.8|", "| |Heinzen | 29.0| 90.0| 32.2| 100.0|" ) ) }) ########################################################################################################### #### Reported bugs for freqlist ########################################################################################################### test_that("11/18/16: Emily Lundt's subsetted table and duplicate label problem", { expect_identical( capture.kable(summary(freqlist(TAB.subset), labelTranslations = old.labs)), c("|Group |Sex |Phase | Freq| cumFreq| freqPercent| cumPercent|", "|:-----|:------|:-----|----:|-------:|-----------:|----------:|", "|High |Female |I | 4| 4| 5.19| 5.19|", "| | |II | 8| 12| 10.39| 15.58|", "| | |III | 3| 15| 3.90| 19.48|", "| |Male |I | 7| 22| 9.09| 28.57|", "| | |II | 2| 24| 2.60| 31.17|", "| | |III | 6| 30| 7.79| 38.96|", "|Low |Female |I | 7| 37| 9.09| 48.05|", "| | |II | 8| 45| 10.39| 58.44|", "| | |III | 2| 47| 2.60| 61.04|", "|Med |Female |II | 11| 58| 14.29| 75.32|", "| | |III | 3| 61| 3.90| 79.22|", "| |Male |II | 8| 69| 10.39| 89.61|", "| | |III | 8| 77| 10.39| 100.00|" ) ) }) test_that("04/17/18: using 'method' in freqlist (#95)", { dat <- data.frame(method = c(1, 1, 2, 2, 3, 3, 4, 4)) expect_identical( capture.kable(summary(freqlist(~method, data = dat))), c("|method | Freq| Cumulative Freq| Percent| Cumulative Percent|", "|:------|----:|---------------:|-------:|------------------:|", "|1 | 2| 2| 25.00| 25.00|", "|2 | 2| 4| 25.00| 50.00|", "|3 | 2| 6| 25.00| 75.00|", "|4 | 2| 8| 25.00| 100.00|" ) ) }) test_that("02/26/19: don't drop labels with subset= argument (#184)", { expect_identical( capture.kable(summary(freqlist(~ age, data = mockstudy, subset = age > 80))), c("|Age in Years | Freq| Cumulative Freq| Percent| Cumulative Percent|", "|:------------|----:|---------------:|-------:|------------------:|", "|81 | 12| 12| 41.38| 41.38|", "|82 | 6| 18| 20.69| 62.07|", "|83 | 6| 24| 20.69| 82.76|", "|84 | 1| 25| 3.45| 86.21|", "|85 | 2| 27| 6.90| 93.10|", "|88 | 2| 29| 6.90| 100.00|" ) ) }) test_that("03/20/2019: freqlist still works with all zero counts (#194, #186).", { tab0 <- table(factor(c(), levels = c("m", "f"))) expect_error(print(summary(freqlist(tab0))), "There wasn't anything") expect_identical( capture.kable(summary(freqlist(tab0), sparse = TRUE)), c("|Var1 | Freq| Cumulative Freq| Percent| Cumulative Percent|", "|:----|----:|---------------:|-------:|------------------:|", "|m | 0| 0| NA| NA|", "|f | 0| 0| NA| NA|" ) ) }) test_that("03/21/2019: freqlist doesn't lose labels when subsetting (#196)", { expect_identical( capture.kable(summary(freqlist(~ sex + ps + arm, data = mockstudy, strata = "arm", subset = arm == "F: FOLFOX" & !is.na(ps))[c(1:2, 4)])), c("|Treatment Arm |sex | Freq|", "|:-------------|:------|----:|", "|F: FOLFOX |Male | 168|", "| | | 148|", "| | | 16|", "| |Female | 110|", "| | | 95|", "| | | 13|" ) ) }) test_that("02/28/2020: freqlist.formula works without needing addNA AND na.option (#265)", { expect_identical( capture.kable(summary(freqlist(~ sex + ps, data = mockstudy))), c("|sex |ps | Freq| Cumulative Freq| Percent| Cumulative Percent|", "|:------|:--|----:|---------------:|-------:|------------------:|", "|Male |0 | 391| 391| 26.08| 26.08|", "| |1 | 329| 720| 21.95| 48.03|", "| |2 | 34| 754| 2.27| 50.30|", "| |NA | 162| 916| 10.81| 61.11|", "|Female |0 | 244| 1160| 16.28| 77.38|", "| |1 | 202| 1362| 13.48| 90.86|", "| |2 | 33| 1395| 2.20| 93.06|", "| |NA | 104| 1499| 6.94| 100.00|" ) ) expect_identical( capture.kable(summary(freqlist(~ sex + ps, data = mockstudy, addNA = FALSE))), capture.kable(summary(freqlist(~ sex + ps, data = mockstudy, na.options = "remove"))) ) }) arsenal/tests/testthat/write2.multititles.pdf.Rmd0000644000176200001440000000204314056510161021705 0ustar liggesusers--- title: Test title --- Table: A Title for tableby | | Male (N=916) | Female (N=583) | Total (N=1499) | p value| |:---------------------------|:---------------:|:---------------:|:---------------:|-------:| |**Age in Years** | | | | 0.048| |   Mean (SD) | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | | |   Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | | Table: A Title for modelsum | |estimate |std.error |p.value |adj.r.squared | |:--------------|:--------|:---------|:-------|:-------------| |(Intercept) |60.455 |0.380 |< 0.001 |0.002 | |**sex Female** |-1.208 |0.610 |0.048 | | Table: A Title for freqlist |sex | Freq| cumFreq| freqPercent| cumPercent| |:------|----:|-------:|-----------:|----------:| |Male | 916| 916| 61.11| 61.11| |Female | 583| 1499| 38.89| 100.00| arsenal/tests/testthat/test_formulize.R0000644000176200001440000001575613656527336020135 0ustar liggesusers## Tests for formulize context("Testing the formulize output") data(mockstudy) cap <- function(...) capture.output(print(...)) check_form <- function(whatis, shouldbe) { tmp <- cap(whatis, showEnv = TRUE) expect_identical(tmp[1], shouldbe) expect_true(grepl("|t|) (Intercept) 60.4552 0.3802 159.001 <2e-16 *** sexFemale -1.2082 0.6097 -1.982 0.0477 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 11.51 on 1497 degrees of freedom Multiple R-squared: 0.002617, Adjusted R-squared: 0.00195 F-statistic: 3.927 on 1 and 1497 DF, p-value: 0.04769 ``` arsenal/tests/testthat/test_modelsum.R0000644000176200001440000010730614056440776017734 0ustar liggesusers## Tests for modelsum context("Testing the modelsum output") # "mdat" now defined in helper-data.R ########################################################################################################### #### Basic modelsum call ########################################################################################################### test_that("A basic modelsum call--no labels, no missings", { expect_identical( capture.kable(summary(modelsum(Age ~ Sex + time, data = mdat), text = TRUE)), c("| |estimate |std.error |p.value |adj.r.squared |", "|:-----------|:--------|:---------|:-------|:-------------|", "|(Intercept) |39.826 |0.779 |< 0.001 |-0.011 |", "|Sex Male |-0.258 |1.115 |0.818 | |", "|(Intercept) |41.130 |1.197 |< 0.001 |0.009 |", "|time |-0.371 |0.275 |0.182 | |" ) ) }) test_that("A basic modelsum tableby call--labels, no missings", { expect_identical( capture.kable(summary(modelsum(Age ~ Sex + trt, data = mdat), text = TRUE)), c("| |estimate |std.error |p.value |adj.r.squared |", "|:---------------|:--------|:---------|:-------|:-------------|", "|(Intercept) |39.826 |0.779 |< 0.001 |-0.011 |", "|Sex Male |-0.258 |1.115 |0.818 | |", "|(Intercept) |40.528 |0.874 |< 0.001 |0.006 |", "|Treatment Arm B |-1.380 |1.128 |0.225 | |" ) ) }) test_that("A basic modelsum call--adding adjustment", { expect_identical( capture.kable(summary(modelsum(Age ~ Sex + time, adjust = ~ trt, data = mdat), text = TRUE)), c("| |estimate |std.error |p.value |adj.r.squared |", "|:---------------|:--------|:---------|:-------|:-------------|", "|(Intercept) |40.632 |1.024 |< 0.001 |-0.005 |", "|Sex Male |-0.221 |1.112 |0.843 | |", "|Treatment Arm B |-1.373 |1.135 |0.229 | |", "|(Intercept) |41.938 |1.366 |< 0.001 |0.014 |", "|time |-0.368 |0.275 |0.184 | |", "|Treatment Arm B |-1.366 |1.123 |0.227 | |" ) ) }) test_that("A basic modelsum call--suppressing intercept and/or adjustment vars", { expect_identical( capture.kable(summary(modelsum(Age ~ Sex + time, adjust = ~ trt, data = mdat), text = TRUE, show.intercept = FALSE)), c("| |estimate |std.error |p.value |adj.r.squared |", "|:---------------|:--------|:---------|:-------|:-------------|", "|Sex Male |-0.221 |1.112 |0.843 |-0.005 |", "|Treatment Arm B |-1.373 |1.135 |0.229 | |", "|time |-0.368 |0.275 |0.184 |0.014 |", "|Treatment Arm B |-1.366 |1.123 |0.227 | |" ) ) expect_identical( capture.kable(summary(modelsum(Age ~ Sex + time, adjust = ~ trt, data = mdat), text = TRUE, show.adjust = FALSE)), c("| |estimate |std.error |p.value |adj.r.squared |", "|:-----------|:--------|:---------|:-------|:-------------|", "|(Intercept) |40.632 |1.024 |< 0.001 |-0.005 |", "|Sex Male |-0.221 |1.112 |0.843 | |", "|(Intercept) |41.938 |1.366 |< 0.001 |0.014 |", "|time |-0.368 |0.275 |0.184 | |" ) ) expect_identical( capture.kable(summary(modelsum(Age ~ Sex + time, adjust = ~ trt, data = mdat), text = TRUE, show.intercept = FALSE, show.adjust = FALSE)), c("| |estimate |std.error |p.value |adj.r.squared |", "|:--------|:--------|:---------|:-------|:-------------|", "|Sex Male |-0.221 |1.112 |0.843 |-0.005 |", "|time |-0.368 |0.275 |0.184 |0.014 |" ) ) expect_identical( capture.kable(summary(modelsum(Age ~ Sex + time, adjust = ~ trt, data = mdat), text = TRUE, show.intercept = FALSE, show.adjust = FALSE)), capture.kable(summary(modelsum(Age ~ Sex + time, adjust = ~ trt, data = mdat, show.intercept = FALSE, show.adjust = FALSE), text = TRUE)) ) }) test_that("Reordering variables", { expect_identical( capture.kable(summary(modelsum(Age ~ Sex + Group + time, data = mdat)[c(3,1,2)], text = TRUE)), c("| |estimate |std.error |p.value |adj.r.squared |", "|:-----------|:--------|:---------|:-------|:-------------|", "|(Intercept) |41.130 |1.197 |< 0.001 |0.009 |", "|time |-0.371 |0.275 |0.182 | |", "|(Intercept) |39.826 |0.779 |< 0.001 |-0.011 |", "|Sex Male |-0.258 |1.115 |0.818 | |", "|(Intercept) |40.033 |0.970 |< 0.001 |-0.021 |", "|Group Low |-0.400 |1.372 |0.771 | |", "|Group Med |-0.600 |1.372 |0.663 | |" ) ) expect_identical( capture.kable(summary(modelsum(Age ~ Sex + Group + time, data = mdat)[c(3,1,2)], text = TRUE)), capture.kable(summary(modelsum(Age ~ Sex + Group + time, data = mdat)[c("time", "Sex", "Group")], text = TRUE)) ) expect_identical( capture.kable(summary(modelsum(Age ~ Sex + Group + time, data = mdat)[1:2], text = TRUE)), capture.kable(summary(modelsum(Age ~ Sex + Group + time, data = mdat)[c(TRUE, TRUE, FALSE)], text = TRUE)) ) expect_identical( capture.kable(summary(modelsum(Age ~ Sex + Group + time, data = mdat), text = TRUE)), capture.kable(summary(modelsum(Age ~ Sex + Group + time, data = mdat)[], text = TRUE)) ) expect_warning(modelsum(Age ~ Sex + Group + time, data = mdat)[1:4], "Some indices not found") expect_error(modelsum(Age ~ Sex + Group + time, data = mdat)[TRUE], "Logical vector") }) test_that("offset() works", { expect_error(summary(modelsum(fu.stat ~ age, adjust=~offset(log(fu.time+.01))+ sex + arm, data=mockstudy, family=poisson)), NA) }) test_that("strata() works", { skip_if_not(getRversion() >= "3.3.0") skip_if_not_installed("survival", "2.43-1") require(survival) expect_identical( capture.kable(summary(modelsum(Surv(time, status) ~ ethan, adjust = ~strata(Sex), data = mdat, family="survival"), text = TRUE)), c("| |HR |CI.lower.HR |CI.upper.HR |p.value |concordance |Nmiss |", "|:-------------|:-----|:-----------|:-----------|:-------|:-----------|:-----|", "|ethan Heinzen |1.051 |0.549 |2.014 |0.880 |0.499 |3 |" ) ) # borrowed from help page for ?clogit data("logan") resp <- levels(logan$occupation) n <- nrow(logan) indx <- rep(1:n, length(resp)) logan2 <- data.frame(logan[indx,], id = indx, tocc = factor(rep(resp, each=n))) logan2$case <- (logan2$occupation == logan2$tocc) expect_identical( capture.kable(summary(modelsum(case ~ tocc, adjust = ~ tocc:education + strata(id), data = set_labels(logan2, list(education = "edu")), family = "clog"))), c("| |OR |CI.lower.OR |CI.upper.OR |p.value |concordance |", "|:-------------------------|:-----|:-----------|:-----------|:-------|:-----------|", "|**tocc farm** |0.150 |0.010 |2.248 |0.170 |0.766 |", "|**tocc operatives** |3.212 |1.060 |9.732 |0.039 | |", "|**tocc professional** |0.000 |0.000 |0.001 |< 0.001 | |", "|**tocc sales** |0.007 |0.001 |0.030 |< 0.001 | |", "|**tocc craftsmen:edu** |0.717 |0.642 |0.802 |< 0.001 | |", "|**tocc farm:edu** |0.691 |0.550 |0.868 |0.001 | |", "|**tocc operatives:edu** |0.656 |0.585 |0.735 |< 0.001 | |", "|**tocc professional:edu** |1.321 |1.195 |1.460 |< 0.001 | |", "|**tocc sales:edu** |NA |NA |NA | | |" ) ) }) test_that("'weights=' works", { expect_identical( capture.kable(summary(modelsum(Age ~ Sex, data = mdat, weights = weights))), c("| |estimate |std.error |p.value |adj.r.squared |", "|:------------|:--------|:---------|:-------|:-------------|", "|(Intercept) |39.826 |0.889 |< 0.001 |0.020 |", "|**Sex Male** |1.953 |1.167 |0.098 | |" ) ) }) test_that("interactions work", { expect_identical( capture.kable(summary(modelsum(age ~ bmi, adjust = ~ sex*arm, data=mockstudy))), c("| |estimate |std.error |p.value |adj.r.squared |Nmiss |", "|:--------------------------------------|:--------|:---------|:-------|:-------------|:-----|", "|(Intercept) |58.401 |1.691 |< 0.001 |0.001 |33 |", "|**Body Mass Index (kg/m^2)** |0.051 |0.056 |0.362 | | |", "|**sex Female** |-0.351 |1.177 |0.765 | | |", "|**Treatment Arm F: FOLFOX** |0.852 |0.908 |0.348 | | |", "|**Treatment Arm G: IROX** |0.979 |1.040 |0.347 | | |", "|**sex Female:Treatment Arm F: FOLFOX** |-0.596 |1.485 |0.688 | | |", "|**sex Female:Treatment Arm G: IROX** |-1.975 |1.688 |0.242 | | |" ) ) expect_identical( capture.kable(summary(modelsum(age ~ bmi, adjust = ~ hgb*arm, data=mockstudy))), c("| |estimate |std.error |p.value |adj.r.squared |Nmiss |", "|:-------------------------------|:--------|:---------|:-------|:-------------|:-----|", "|(Intercept) |54.324 |4.747 |< 0.001 |0.004 |294 |", "|**Body Mass Index (kg/m^2)** |0.029 |0.062 |0.643 | | |", "|**hgb** |0.404 |0.366 |0.271 | | |", "|**Treatment Arm F: FOLFOX** |-1.386 |5.748 |0.809 | | |", "|**Treatment Arm G: IROX** |-1.228 |6.589 |0.852 | | |", "|**hgb:Treatment Arm F: FOLFOX** |0.176 |0.462 |0.703 | | |", "|**hgb:Treatment Arm G: IROX** |0.052 |0.529 |0.922 | | |" ) ) expect_identical( capture.kable(summary(modelsum(age ~ bmi:arm, adjust = ~ hgb, data=mockstudy))), c("| |estimate |std.error |p.value |adj.r.squared |Nmiss |", "|:----------------------------------------------------|:--------|:---------|:-------|:-------------|:-----|", "|(Intercept) |53.303 |2.822 |< 0.001 |0.005 |294 |", "|**hgb** |0.499 |0.193 |0.010 | | |", "|**Body Mass Index (kg/m^2):Treatment Arm A: IFL** |0.023 |0.065 |0.721 | | |", "|**Body Mass Index (kg/m^2):Treatment Arm F: FOLFOX** |0.052 |0.063 |0.416 | | |", "|**Body Mass Index (kg/m^2):Treatment Arm G: IROX** |0.002 |0.065 |0.973 | | |" ) ) expect_identical( capture.kable(summary(modelsum(age ~ bmi:arm, adjust = ~ hgb, data=mockstudy))), c("| |estimate |std.error |p.value |adj.r.squared |Nmiss |", "|:----------------------------------------------------|:--------|:---------|:-------|:-------------|:-----|", "|(Intercept) |53.303 |2.822 |< 0.001 |0.005 |294 |", "|**hgb** |0.499 |0.193 |0.010 | | |", "|**Body Mass Index (kg/m^2):Treatment Arm A: IFL** |0.023 |0.065 |0.721 | | |", "|**Body Mass Index (kg/m^2):Treatment Arm F: FOLFOX** |0.052 |0.063 |0.416 | | |", "|**Body Mass Index (kg/m^2):Treatment Arm G: IROX** |0.002 |0.065 |0.973 | | |" ) ) expect_identical( as.data.frame(modelsum(age ~ bmi:arm, adjust = ~ hgb, data=mockstudy))$term.type, c("Intercept", "Adjuster", "Term", "Term", "Term") ) }) test_that("ordinal works", { if(require(MASS)) { data(housing) expect_identical( capture.kable(summary(modelsum(Sat ~ Infl, adjust = ~ Type + Cont, weights = Freq, data = housing, family = "ordinal"))), c("| |OR |CI.lower.OR |CI.upper.OR |p.value |", "|:------------------|:-----|:-----------|:-----------|:-------|", "|Low|Medium |NA |NA |NA |< 0.001 |", "|Medium|High |NA |NA |NA |< 0.001 |", "|**Infl Medium** |1.762 |1.436 |2.164 |< 0.001 |", "|**Infl High** |3.628 |2.832 |4.663 |< 0.001 |", "|**Type Apartment** |0.564 |0.446 |0.712 |< 0.001 |", "|**Type Atrium** |0.693 |0.511 |0.940 |0.018 |", "|**Type Terrace** |0.336 |0.249 |0.451 |< 0.001 |", "|**Cont High** |1.434 |1.189 |1.730 |< 0.001 |" ) ) expect_identical( capture.kable(summary(modelsum(Sat ~ Infl, adjust = ~ Type + Cont, weights = Freq, data = housing, family = "ordinal", ordinal.stats = c("estimate", "statistic", "p.value")), text = TRUE)), c("| |estimate |statistic |p.value |", "|:----------------|:--------|:---------|:-------|", "|Low|Medium |-0.496 |-3.974 |< 0.001 |", "|Medium|High |0.691 |5.505 |< 0.001 |", "|Infl Medium |0.566 |5.412 |< 0.001 |", "|Infl High |1.289 |10.136 |< 0.001 |", "|Type Apartment |-0.572 |-4.800 |< 0.001 |", "|Type Atrium |-0.366 |-2.360 |0.018 |", "|Type Terrace |-1.091 |-7.202 |< 0.001 |", "|Cont High |0.360 |3.771 |< 0.001 |" ) ) expect_identical( capture.kable(summary(modelsum(Sat ~ Infl, adjust = ~ Type + Cont, weights = Freq, data = housing, family = "ordinal", show.adjust = FALSE, show.intercept = FALSE), text = TRUE)), c("| |OR |CI.lower.OR |CI.upper.OR |p.value |", "|:-----------|:-----|:-----------|:-----------|:-------|", "|Infl Medium |1.762 |1.436 |2.164 |< 0.001 |", "|Infl High |3.628 |2.832 |4.663 |< 0.001 |" ) ) } else skip("'MASS' is not available") }) test_that("negbin works", { if(require(MASS)) { data(mockstudy) expect_identical( capture.kable(summary(modelsum(fu.time ~ sex, adjust = ~ age + arm, data = mockstudy, family = negbin), negbin.stats = c("estimate", "p.value", "theta"), text = TRUE, digits = 5)), c("| |estimate |p.value |theta |", "|:-----------------------|:--------|:-------|:-------|", "|(Intercept) |6.52819 |< 0.001 |1.84776 |", "|sex Female |-0.02370 |0.545 | |", "|Age in Years |-0.00342 |0.039 | |", "|Treatment Arm F: FOLFOX |0.28161 |< 0.001 | |", "|Treatment Arm G: IROX |0.09396 |0.071 | |" ) ) } else skip("'MASS' is not available") }) ########################################################################################################### #### Reported bugs for modelsum ########################################################################################################### set.seed(3248) dat <- data.frame(short.name = rnorm(100), really.long.name = rnorm(100), why.would.you.name.something = rnorm(100), as.long.as.this = rnorm(100)) test_that("01/26/2017: Brendan Broderick's Bold Text Wrapping Problem", { expect_identical( capture.kable(summary(modelsum(short.name ~ really.long.name + as.long.as.this, adjust = ~ why.would.you.name.something, data = dat))), c("| |estimate |std.error |p.value |adj.r.squared |", "|:--------------------------------|:--------|:---------|:-------|:-------------|", "|(Intercept) |0.035 |0.099 |0.721 |-0.001 |", "|**really.long.name** |0.099 |0.099 |0.319 | |", "|**why.would.you.name.something** |-0.083 |0.090 |0.361 | |", "|(Intercept) |0.048 |0.097 |0.624 |0.023 |", "|**as.long.as.this** |0.198 |0.106 |0.066 | |", "|**why.would.you.name.something** |-0.090 |0.089 |0.314 | |" ) ) }) rm(dat) ################################################################################################################################# test_that("02/07/2017: Ryan Lennon's R Markdown spacing problem. Also 02/14/2018 (#66)", { expect_error(capture.kable(summary(modelsum(Age ~ Sex + time, data = mdat), text = TRUE)), NA) }) ################################################################################################################################# test_that("02/13/2017: Krista Goergen's survival subset and NA problems", { skip_if_not(getRversion() >= "3.5.0") skip_if_not_installed("survival", "2.41-3") require(survival) mdat.tmp <- keep.labels(mdat) form <- Surv(time, status) ~ Sex + ethan expect_identical(capture.kable(summary(modelsum(form, data = mdat.tmp, subset = Group=="High", family="survival"), text = TRUE)), capture.kable(summary(modelsum(form, data = mdat.tmp[mdat.tmp$Group=="High",], family="survival"), text = TRUE))) mdat.tmp[3:4,"time"] <- c(NA,NA) expect_identical(capture.kable(summary(modelsum(form, data = mdat.tmp, subset = Group=="High", family="survival"), text = TRUE)), capture.kable(summary(modelsum(form, data = mdat.tmp[mdat.tmp$Group=="High",], family="survival"), text = TRUE))) expect_identical(capture.kable(summary(modelsum(form, adjust = ~Age, data = mdat.tmp, subset = Group=="High", family="survival"), text = TRUE)), capture.kable(summary(modelsum(form, adjust = ~Age, data = mdat.tmp[mdat.tmp$Group=="High",], family="survival"), text = TRUE))) expect_identical( capture.kable(summary(modelsum(form, adjust = ~Age, data = mdat.tmp, subset = Group=="High", family="survival"), text = TRUE)), c("| |HR |CI.lower.HR |CI.upper.HR |p.value |concordance |Nmiss |", "|:-------------|:-----|:-----------|:-----------|:-------|:-----------|:-----|", "|Sex Male |0.612 |0.210 |1.786 |0.369 |0.592 |2 |", "|Age in Years |1.061 |0.968 |1.164 |0.205 | | |", "|ethan Heinzen |1.019 |0.297 |3.501 |0.976 |0.639 |4 |", "|Age in Years |1.058 |0.960 |1.166 |0.258 | | |" ) ) }) ################################################################################################################################# test_that("04/12/2017: ... vs modelsum.control", { expect_identical( capture.kable(summary(modelsum(Age ~ Sex + time, adjust = ~ trt, data = mdat, show.adjust = FALSE, control = modelsum.control()), text = TRUE)), c("| |estimate |std.error |p.value |adj.r.squared |", "|:-----------|:--------|:---------|:-------|:-------------|", "|(Intercept) |40.632 |1.024 |< 0.001 |-0.005 |", "|Sex Male |-0.221 |1.112 |0.843 | |", "|(Intercept) |41.938 |1.366 |< 0.001 |0.014 |", "|time |-0.368 |0.275 |0.184 | |" ) ) }) ################################################################################################################################# data(mockstudy) test_that("08/01/2017: Beth Atkinson's subset problem", { idx <- mockstudy$sex == "Male" form <- fu.stat - 1 ~ age + hgb expect_identical(capture.kable(summary(modelsum(form, data = mockstudy, subset = idx, adjust = ~arm, family="binomial"), text = TRUE)), capture.kable(summary(modelsum(form, data = mockstudy, subset = sex == "Male", adjust = ~arm, family="binomial"), text = TRUE))) }) ################################################################################################################################# set.seed(88) df <- data.frame( y = rnorm(1000), x1 = rnorm(1000), x2 = rnorm(1000), x3 = rpois(1000, 2), x5 = rnorm(1000), x7 = rep(LETTERS[1:5], each = 200), x8 = runif(1000) ) test_that("07/27/2017: Too many adjustment vars in as.data.frame.modelsum (#12)", { expect_equal(nrow(as.data.frame(modelsum(y ~ x1, adjust = ~ x7 + x2 + x3 + x5 + x8, data = df))), 10L) }) ################################################################################################################################# test_that("07/27/2017: modelsum labels (#13)", { expect_identical( capture.kable(summary(modelsum(bmi ~ age, adjust = ~sex, data = mockstudy), labelTranslations = list(sexFemale = "Female", age = "Age, yrs"), text = TRUE)), c("| |estimate |std.error |p.value |adj.r.squared |Nmiss |", "|:-----------|:--------|:---------|:-------|:-------------|:-----|", "|(Intercept) |26.793 |0.766 |< 0.001 |0.004 |33 |", "|Age, yrs |0.012 |0.012 |0.348 | | |", "|Female |-0.718 |0.291 |0.014 | | |" ) ) expect_identical( capture.kable(summary(modelsum(bmi ~ age, adjust = ~sex, data = mockstudy), labelTranslations = list(sexFemale = "Female", age = "Age, yrs"), text = TRUE)), capture.kable(summary(modelsum(bmi ~ age, adjust = ~sex, data = mockstudy), labelTranslations = c(sexFemale = "Female", age = "Age, yrs"), text = TRUE)) ) expect_warning(summary(modelsum(bmi ~ age, adjust = ~sex, data = mockstudy), labelTranslations = c(badvar = "Eek")), NA) }) ################################################################################################################################# test_that("12/23/2017: non-syntactic names (#44, #45)", { dat <- data.frame(y = 1:10, x1x = rep(c("A", "B"), each = 5), `1x` = rep(c("C", "D"), each = 5), stringsAsFactors = FALSE, check.names = FALSE) expect_identical( capture.kable(summary(modelsum(y ~ x1x, data = dat))), c("| |estimate |std.error |p.value |adj.r.squared |", "|:-----------|:--------|:---------|:-------|:-------------|", "|(Intercept) |3.000 |0.707 |0.003 |0.727 |", "|**x1x B** |5.000 |1.000 |0.001 | |" ) ) expect_identical( capture.kable(summary(modelsum(y ~ `1x`, data = dat))), c("| |estimate |std.error |p.value |adj.r.squared |", "|:-----------|:--------|:---------|:-------|:-------------|", "|(Intercept) |3.000 |0.707 |0.003 |0.727 |", "|**1x D** |5.000 |1.000 |0.001 | |" ) ) }) ################################################################################################################################# test_that("01/05/2018: leading/trailing whitespace (#48)", { expect_identical( capture.kable(summary(modelsum(age ~ arm, data = set_labels(mockstudy, list(arm = " Arm "))))), c("| |estimate |std.error |p.value |adj.r.squared |", "|:------------------|:--------|:---------|:-------|:-------------|", "|(Intercept) |59.673 |0.557 |< 0.001 |-0.001 |", "|**Arm F: FOLFOX** |0.628 |0.709 |0.376 | |", "|**Arm G: IROX** |0.090 |0.812 |0.912 | |" ) ) expect_identical( capture.kable(summary(modelsum(age ~ arm, data = set_labels(mockstudy, list(arm = " Arm "))), text = TRUE)), c("| |estimate |std.error |p.value |adj.r.squared |", "|:--------------|:--------|:---------|:-------|:-------------|", "|(Intercept) |59.673 |0.557 |< 0.001 |-0.001 |", "|Arm F: FOLFOX |0.628 |0.709 |0.376 | |", "|Arm G: IROX |0.090 |0.812 |0.912 | |" ) ) }) ################################################################################################################################# test_that("02/23/2018: wrapping long labels (#59)", { labs <- list( Group = "This is a really long label for the Group variable", time = "Another really long label. Can you believe how long this is", dt = "ThisLabelHasNoSpacesSoLetsSeeHowItBehaves" ) expect_identical( capture.kable(print(summary(modelsum(Age ~ Group + time + dt, data = set_labels(mdat, labs)), text = TRUE), width = 30)), c("| |estimate |std.error |p.value |adj.r.squared |", "|:------------------------------|:--------|:---------|:-------|:-------------|", "|(Intercept) |40.033 |0.970 |< 0.001 |-0.021 |", "|This is a really long label |-0.400 |1.372 |0.771 | |", "|for the Group variable Low | | | | |", "|This is a really long label |-0.600 |1.372 |0.663 | |", "|for the Group variable Med | | | | |", "|(Intercept) |41.130 |1.197 |< 0.001 |0.009 |", "|Another really long label. |-0.371 |0.275 |0.182 | |", "|Can you believe how long this | | | | |", "|is | | | | |", "|(Intercept) |41.531 |2.017 |< 0.001 |-0.001 |", "|ThisLabelHasNoSpacesSoLetsSeeH |0.000 |0.000 |0.348 | |", "|owItBehaves | | | | |" ) ) }) ################################################################################################################################# test_that("05/31/2018: similar column names (#98)", { dat <- data.frame( y = c(1:9, 11), a = c(2, 2, 1:8), aa = c(1, 1:9), b = factor(rep(c("a", "b"), each = 5)) ) expect_identical( capture.kable(summary(modelsum(y ~ b, adjust = ~a + aa, data = dat))), c("| |estimate |std.error |p.value |adj.r.squared |", "|:-----------|:--------|:---------|:-------|:-------------|", "|(Intercept) |0.417 |0.295 |0.208 |0.984 |", "|**b b** |-0.467 |0.548 |0.427 | |", "|**a** |-0.083 |0.217 |0.714 | |", "|**aa** |1.250 |0.183 |< 0.001 | |" ) ) }) ################################################################################################################################# test_that("05/31/2018: similar column names (#100)", { dat <- data.frame( y = c(1:9, 11), a = factor(rep(c("a", "b"), each = 5), levels = c("b", "a")), d = factor(rep(c("c", "d"), times = 5), levels = c("c", "d")) ) expect_identical( capture.kable(summary(modelsum(y ~ a, adjust = ~ d, data = set_labels(dat, list(a = "A", d = "D"))), text = TRUE)), c("| |estimate |std.error |p.value |adj.r.squared |", "|:-----------|:--------|:---------|:-------|:-------------|", "|(Intercept) |8.100 |1.112 |< 0.001 |0.656 |", "|A a |-5.167 |1.213 |0.004 | |", "|D d |0.167 |1.213 |0.895 | |" ) ) }) ################################################################################################################################# test_that("06/19/2018: term.name (#109)", { expect_identical( capture.kable(summary(modelsum(Age ~ Sex + time, adjust = ~ trt, data = mdat), text = TRUE, term.name = "Term")), c("|Term |estimate |std.error |p.value |adj.r.squared |", "|:---------------|:--------|:---------|:-------|:-------------|", "|(Intercept) |40.632 |1.024 |< 0.001 |-0.005 |", "|Sex Male |-0.221 |1.112 |0.843 | |", "|Treatment Arm B |-1.373 |1.135 |0.229 | |", "|(Intercept) |41.938 |1.366 |< 0.001 |0.014 |", "|time |-0.368 |0.275 |0.184 | |", "|Treatment Arm B |-1.366 |1.123 |0.227 | |" ) ) }) ################################################################################################################################# test_that("08/24/2018: latex (#123)", { expect_identical( capture.output(summary(modelsum(Age ~ Sex, adjust = ~ trt, data = mdat), text = "latex")), c("" , "\\begin{tabular}{l|l|l|l|l}" , "\\hline" , " & estimate & std.error & p.value & adj.r.squared\\\\" , "\\hline" , "(Intercept) & 40.632 & 1.024 & < 0.001 & -0.005\\\\" , "\\hline" , "\\textbf{Sex Male} & -0.221 & 1.112 & 0.843 & \\\\" , "\\hline" , "\\textbf{Treatment Arm B} & -1.373 & 1.135 & 0.229 & \\\\", "\\hline" , "\\end{tabular}" , "" ) ) }) ################################################################################################################################# test_that("09/05/2018: correctly label contrasts for ordinal variables (#133)", { mdat$Group.ord <- ordered(mdat$Group.fac, levels = c("Low", "Med", "High")) expect_identical( capture.kable(summary(modelsum(Age ~ Phase, adjust = ~ Group.ord + trt + ht_in, data = mdat), text = TRUE)), c("| |estimate |std.error |p.value |adj.r.squared |", "|:----------------|:--------|:---------|:-------|:-------------|", "|(Intercept) |47.686 |7.301 |< 0.001 |-0.019 |", "|Phase .L |-0.679 |1.152 |0.557 | |", "|Phase .Q |-1.044 |0.960 |0.280 | |", "|Group.ord .L |0.243 |0.977 |0.804 | |", "|Group.ord .Q |0.410 |1.069 |0.702 | |", "|Treatment Arm B |-1.460 |1.159 |0.211 | |", "|Height in Inches |-0.112 |0.110 |0.314 | |" ) ) }) ################################################################################################################################# test_that("08/07/2019: survival confidence limits (#245)", { skip_if_not(getRversion() >= "3.5.0") skip_if_not_installed("survival", "2.41-3") require(survival) expect_identical( capture.kable(summary(modelsum(Surv(fu.time, fu.stat) ~ arm, data = mockstudy, conf.level = 0.99, family="survival"), text = TRUE)), c("| |HR |CI.lower.HR |CI.upper.HR |p.value |concordance |", "|:-----------------------|:-----|:-----------|:-----------|:-------|:-----------|", "|Treatment Arm F: FOLFOX |0.638 |0.540 |0.754 |< 0.001 |0.556 |", "|Treatment Arm G: IROX |0.871 |0.722 |1.050 |0.057 | |" ) ) }) ################################################################################################################################# test_that("08/07/2019: p.value.lrt (#238)", { skip_if_not(getRversion() >= "3.5.0") skip_if_not_installed("survival", "2.41-3") require(survival) expect_identical( capture.kable(summary(modelsum(age ~ sex + arm + bmi, data = mockstudy, gaussian.stats = c("estimate", "p.value.lrt")), text = TRUE, show.intercept = FALSE)), c("| |estimate |p.value.lrt |", "|:------------------------|:--------|:-----------|", "|sex Female |-1.208 |0.048 |", "|Treatment Arm F: FOLFOX |0.628 |0.614 |", "|Treatment Arm G: IROX |0.090 | |", "|Body Mass Index (kg/m^2) |0.059 |0.289 |" ) ) expect_identical( capture.kable(summary(modelsum(Surv(fu.time, fu.stat) ~ sex + arm + bmi, data = mockstudy, family="survival", survival.stats = c("estimate", "p.value.lrt")), text = TRUE, show.intercept = FALSE)), c("| |estimate |p.value.lrt |", "|:------------------------|:--------|:-----------|", "|sex Female |0.002 |0.975 |", "|Treatment Arm F: FOLFOX |-0.449 |< 0.001 |", "|Treatment Arm G: IROX |-0.138 | |", "|Body Mass Index (kg/m^2) |-0.016 | |" ) ) }) test_that("statistic.F works (#262)", { tab3 <- modelsum(bmi ~ age + sex, data=mockstudy, family=gaussian, gaussian.stats=c("estimate", "N","Nmiss","statistic.F")) expect_identical( capture.kable(summary(tab3, text=TRUE)), c("| |estimate |N |Nmiss |statistic.F |", "|:------------|:--------|:----|:-----|:-----------|", "|(Intercept) |26.424 |1466 |33 |1.122 |", "|Age in Years |0.013 | | | |", "|(Intercept) |27.491 |1466 |33 |6.341 |", "|sex Female |-0.731 | | | |" ) ) }) test_that("Nevents works (#266)", { skip_if_not(getRversion() >= "3.5.0") skip_if_not_installed("survival", "2.41-3") require(survival) tab3 <- modelsum(Surv(fu.time,fu.stat)~sex, data=mockstudy, survival.stats=c('HR','p.value','Nmiss','Nevents','N'), family = "survival") expect_identical( capture.kable(summary(tab3)), c("| |HR |p.value |Nevents |N |", "|:--------------|:-----|:-------|:-------|:----|", "|**sex Female** |1.002 |0.975 |1356 |1499 |" ) ) }) test_that("relrisk works (#279)", { skip_if_not(getRversion() >= "3.5.0") opts <- options() expect_identical( capture.kable(summary(modelsum(mdquality.s ~ arm + sex, data = mockstudy, id = case, family = "relrisk"))), c("| |RR |CI.lower.RR |CI.upper.RR |p.value |Nmiss |", "|:---------------------------|:-----|:-----------|:-----------|:-------|:-----|", "|(Intercept) |0.890 |0.859 |0.922 |< 0.001 |252 |", "|**Treatment Arm F: FOLFOX** |1.014 |0.969 |1.061 |0.538 | |", "|**Treatment Arm G: IROX** |1.021 |0.972 |1.072 |0.412 | |", "|(Intercept) |0.899 |0.878 |0.921 |< 0.001 |252 |", "|**sex Female** |1.004 |0.967 |1.043 |0.826 | |" ) ) options(opts) }) test_that("Nevents works for binomial (#325)", { expect_identical( capture.kable(summary(modelsum(fu.stat == 1 ~ age, data = mockstudy, family = "binomial", binomial.stats = c("OR", "Nevents")))), c("| |OR |Nevents |", "|:----------------|:-----|:-------|", "|(Intercept) |0.145 |143 |", "|**Age in Years** |0.995 | |" ) ) }) arsenal/tests/testthat/test_paired.R0000644000176200001440000005337614051175305017346 0ustar liggesusers## Tests for tableby context("Testing the paired output") dat <- data.frame( tp = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2), id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6), Cat = c("A", "A", "A", "B", "B", "B", "B", "A", NA, "B"), Fac = factor(c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A")), Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA), Num2 = c(1, 2, 1, 2, 2, 1, 2, 0, 2, NA), Ord = ordered(c("I", "II", "II", "III", "III", "III", "I", "III", "II", "I")), Lgl = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE), Dat = as.Date("2018-05-01") + c(1, 1, 2, 2, 3, 4, 5, 6, 3, 4), stringsAsFactors = FALSE ) dat$s <- selectall(a = c(1, 1, 0, 0, 0, 1, 0, 1, 0, 0), b = c(0, 0, 1, 1, 1, 0, 1, 0, 1, 1)) dat2 <- dat ########################################################################################################### #### Basic paired calls ########################################################################################################### for(i in 1:3) { if(i == 2) dat$id <- as.character(dat$id) else if(i == 3) dat$id <- as.factor(dat$id) test_that(paste0("Basic paired call; class(id) = ", class(dat$id), "; na.paired('asis')"), { expect_identical( capture.kable(summary(paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat + s, data = dat, id = id, signed.rank.exact = FALSE, na.action = na.paired("asis")), text = TRUE)), c("| | 1 (N=5) | 2 (N=5) | Difference (N=4) | p value|", "|:------------|:-----------------------:|:-----------------------:|:----------------:|-------:|", "|Cat | | | | 1.000|", "|- N-Miss | 1 | 0 | 0 | |", "|- A | 2 (50.0%) | 2 (40.0%) | 1 (50.0%) | |", "|- B | 2 (50.0%) | 3 (60.0%) | 1 (50.0%) | |", "|Fac | | | | 0.261|", "|- A | 2 (40.0%) | 2 (40.0%) | 2 (100.0%) | |", "|- B | 1 (20.0%) | 2 (40.0%) | 1 (100.0%) | |", "|- C | 2 (40.0%) | 1 (20.0%) | 1 (100.0%) | |", "|Num | | | | 0.391|", "|- N-Miss | 0 | 1 | 0 | |", "|- Mean (SD) | 2.200 (1.643) | 3.250 (0.957) | 0.500 (1.000) | |", "|- Range | 0.000 - 4.000 | 2.000 - 4.000 | -1.000 - 1.000 | |", "|Ord | | | | 0.174|", "|- I | 2 (40.0%) | 1 (20.0%) | 2 (100.0%) | |", "|- II | 2 (40.0%) | 1 (20.0%) | 1 (100.0%) | |", "|- III | 1 (20.0%) | 3 (60.0%) | 0 (0.0%) | |", "|Lgl | | | | 1.000|", "|- FALSE | 3 (60.0%) | 2 (40.0%) | 2 (100.0%) | |", "|- TRUE | 2 (40.0%) | 3 (60.0%) | 1 (50.0%) | |", "|Dat | | | | 0.182|", "|- Median | 2018-05-04 | 2018-05-05 | 0.500 | |", "|- Range | 2018-05-02 - 2018-05-06 | 2018-05-02 - 2018-05-07 | 0.000 - 1.000 | |", "|s | | | | |", "|- a | 1 (20.0%) | 3 (60.0%) | 2 (50.0%) | |", "|- b | 4 (80.0%) | 2 (40.0%) | 2 (50.0%) | |" ) ) }) test_that(paste0("Basic paired call; class(id) = ", class(dat$id), "; na.paired('fill')"), { expect_identical( capture.kable(summary(paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE, na.action = na.paired("fill")), text = TRUE)), c("| | 1 (N=6) | 2 (N=6) | Difference (N=6) | p value|", "|:------------|:-----------------------:|:-----------------------:|:----------------:|-------:|", "|Cat | | | | 1.000|", "|- N-Miss | 2 | 1 | 2 | |", "|- A | 2 (50.0%) | 2 (40.0%) | 1 (50.0%) | |", "|- B | 2 (50.0%) | 3 (60.0%) | 1 (50.0%) | |", "|Fac | | | | 0.261|", "|- N-Miss | 1 | 1 | 2 | |", "|- A | 2 (40.0%) | 2 (40.0%) | 2 (100.0%) | |", "|- B | 1 (20.0%) | 2 (40.0%) | 1 (100.0%) | |", "|- C | 2 (40.0%) | 1 (20.0%) | 1 (100.0%) | |", "|Num | | | | 0.391|", "|- N-Miss | 1 | 2 | 2 | |", "|- Mean (SD) | 2.200 (1.643) | 3.250 (0.957) | 0.500 (1.000) | |", "|- Range | 0.000 - 4.000 | 2.000 - 4.000 | -1.000 - 1.000 | |", "|Ord | | | | 0.174|", "|- N-Miss | 1 | 1 | 2 | |", "|- I | 2 (40.0%) | 1 (20.0%) | 2 (100.0%) | |", "|- II | 2 (40.0%) | 1 (20.0%) | 1 (100.0%) | |", "|- III | 1 (20.0%) | 3 (60.0%) | 0 (0.0%) | |", "|Lgl | | | | 1.000|", "|- N-Miss | 1 | 1 | 2 | |", "|- FALSE | 3 (60.0%) | 2 (40.0%) | 2 (100.0%) | |", "|- TRUE | 2 (40.0%) | 3 (60.0%) | 1 (50.0%) | |", "|Dat | | | | 0.182|", "|- N-Miss | 1 | 1 | 2 | |", "|- Median | 2018-05-04 | 2018-05-05 | 0.500 | |", "|- Range | 2018-05-02 - 2018-05-06 | 2018-05-02 - 2018-05-07 | 0.000 - 1.000 | |" ) ) }) test_that(paste0("Basic paired call; class(id) = ", class(dat$id), "; na.paired('in.both')"), { expect_identical( capture.kable(summary(paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE, na.action = na.paired("in.both")), text = TRUE)), c("| | 1 (N=4) | 2 (N=4) | Difference (N=4) | p value|", "|:------------|:-----------------------:|:-----------------------:|:----------------:|-------:|", "|Cat | | | | 1.000|", "|- A | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) | |", "|- B | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) | |", "|Fac | | | | 0.261|", "|- A | 2 (50.0%) | 1 (25.0%) | 2 (100.0%) | |", "|- B | 1 (25.0%) | 2 (50.0%) | 1 (100.0%) | |", "|- C | 1 (25.0%) | 1 (25.0%) | 1 (100.0%) | |", "|Num | | | | 0.391|", "|- Mean (SD) | 2.750 (1.258) | 3.250 (0.957) | 0.500 (1.000) | |", "|- Range | 1.000 - 4.000 | 2.000 - 4.000 | -1.000 - 1.000 | |", "|Ord | | | | 0.174|", "|- I | 2 (50.0%) | 0 (0.0%) | 2 (100.0%) | |", "|- II | 1 (25.0%) | 1 (25.0%) | 1 (100.0%) | |", "|- III | 1 (25.0%) | 3 (75.0%) | 0 (0.0%) | |", "|Lgl | | | | 1.000|", "|- FALSE | 2 (50.0%) | 1 (25.0%) | 2 (100.0%) | |", "|- TRUE | 2 (50.0%) | 3 (75.0%) | 1 (50.0%) | |", "|Dat | | | | 0.182|", "|- Median | 2018-05-03 | 2018-05-04 | 0.500 | |", "|- Range | 2018-05-02 - 2018-05-06 | 2018-05-02 - 2018-05-07 | 0.000 - 1.000 | |" ) ) }) } test_that(paste0("Basic paired call; na.paired('in.both')"), { expect_identical( capture.kable(summary(paired(tp ~ Cat + Fac + Lgl, data = dat, id = id, test = FALSE, na.action = na.paired("in.both"), cat.stats = c("Nmiss", "countrowpct", "countcellpct", "rowbinomCI")), text = TRUE)), c("| | 1 (N=4) | 2 (N=4) | Difference (N=4) |", "|:--------|:--------------------:|:--------------------:|:--------------------:|", "|Cat | | | |", "|- A | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) |", "|- B | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) |", "|- A | 2 (25.0%) | 2 (25.0%) | 1 (50.0%) |", "|- B | 2 (25.0%) | 2 (25.0%) | 1 (50.0%) |", "|- A | 0.500 (0.068, 0.932) | 0.500 (0.068, 0.932) | 0.500 (0.013, 0.987) |", "|- B | 0.500 (0.068, 0.932) | 0.500 (0.068, 0.932) | 0.500 (0.013, 0.987) |", "|Fac | | | |", "|- A | 2 (66.7%) | 1 (33.3%) | 2 (100.0%) |", "|- B | 1 (33.3%) | 2 (66.7%) | 1 (100.0%) |", "|- C | 1 (50.0%) | 1 (50.0%) | 1 (100.0%) |", "|- A | 2 (25.0%) | 1 (12.5%) | 2 (100.0%) |", "|- B | 1 (12.5%) | 2 (25.0%) | 1 (100.0%) |", "|- C | 1 (12.5%) | 1 (12.5%) | 1 (100.0%) |", "|- A | 0.667 (0.094, 0.992) | 0.333 (0.008, 0.906) | 1.000 (0.158, 1.000) |", "|- B | 0.333 (0.008, 0.906) | 0.667 (0.094, 0.992) | 1.000 (0.025, 1.000) |", "|- C | 0.500 (0.013, 0.987) | 0.500 (0.013, 0.987) | 1.000 (0.025, 1.000) |", "|Lgl | | | |", "|- FALSE | 2 (66.7%) | 1 (33.3%) | 2 (100.0%) |", "|- TRUE | 2 (40.0%) | 3 (60.0%) | 1 (50.0%) |", "|- FALSE | 2 (25.0%) | 1 (12.5%) | 2 (100.0%) |", "|- TRUE | 2 (25.0%) | 3 (37.5%) | 1 (50.0%) |", "|- FALSE | 0.667 (0.094, 0.992) | 0.333 (0.008, 0.906) | 1.000 (0.158, 1.000) |", "|- TRUE | 0.400 (0.053, 0.853) | 0.600 (0.147, 0.947) | 0.500 (0.013, 0.987) |" ) ) }) dat$id[10] <- NA dat$tp[9] <- NA test_that("Paired with missings", { expect_identical( capture.kable(summary(paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE), text = TRUE)), c("| | 1 (N=4) | 2 (N=4) | Difference (N=4) | p value|", "|:------------|:-----------------------:|:-----------------------:|:----------------:|-------:|", "|Cat | | | | 1.000|", "|- A | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) | |", "|- B | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) | |", "|Fac | | | | 0.261|", "|- A | 2 (50.0%) | 1 (25.0%) | 2 (100.0%) | |", "|- B | 1 (25.0%) | 2 (50.0%) | 1 (100.0%) | |", "|- C | 1 (25.0%) | 1 (25.0%) | 1 (100.0%) | |", "|Num | | | | 0.391|", "|- Mean (SD) | 2.750 (1.258) | 3.250 (0.957) | 0.500 (1.000) | |", "|- Range | 1.000 - 4.000 | 2.000 - 4.000 | -1.000 - 1.000 | |", "|Ord | | | | 0.174|", "|- I | 2 (50.0%) | 0 (0.0%) | 2 (100.0%) | |", "|- II | 1 (25.0%) | 1 (25.0%) | 1 (100.0%) | |", "|- III | 1 (25.0%) | 3 (75.0%) | 0 (0.0%) | |", "|Lgl | | | | 1.000|", "|- FALSE | 2 (50.0%) | 1 (25.0%) | 2 (100.0%) | |", "|- TRUE | 2 (50.0%) | 3 (75.0%) | 1 (50.0%) | |", "|Dat | | | | 0.182|", "|- Median | 2018-05-03 | 2018-05-04 | 0.500 | |", "|- Range | 2018-05-02 - 2018-05-06 | 2018-05-02 - 2018-05-07 | 0.000 - 1.000 | |" ) ) }) test_that("09/07/2018: specifying different digits (#107) and cat.simplify (#134)", { expect_identical( capture.kable(summary(paired(tp ~ mcnemar(Cat, digits.count = 1, digits.pct = 0, cat.simplify = TRUE) + paired.t(Num, digits = 1) + sign.test(Num2, "meansd") + paired.t(Dat, "median", date.simplify = TRUE), data = dat, id = id, numeric.simplify = TRUE), text = TRUE, labelTranslations = list(Dat = "Date"))), c("| | 1 (N=4) | 2 (N=4) | Difference (N=4) | p value|", "|:------------|:-------------:|:-------------:|:----------------:|-------:|", "|Cat | 2.0 (50%) | 2.0 (50%) | 1.0 (50%) | 1.000|", "|Num | | | | 0.391|", "|- Mean (SD) | 2.8 (1.3) | 3.2 (1.0) | 0.5 (1.0) | |", "|- Range | 1.0 - 4.0 | 2.0 - 4.0 | -1.0 - 1.0 | |", "|Num2 | 1.500 (0.577) | 1.250 (0.957) | -0.250 (1.500) | 1.000|", "|Date | 2018-05-03 | 2018-05-04 | 0.500 | 0.182|" ) ) }) dat$tp <- replace(as.character(dat$tp), dat$tp == "2", "") test_that("08/23/2018: empty string in by-variable (#121)", expect_warning(summary(paired(tp ~ Cat, id = id, data = dat, signed.rank.exact = FALSE)), "Empty")) test_that("07/17/2019: fix bug with confidence limits and count (#234, #235)", { tmp <- dat2 tmp$Cat[2] <- "B" expect_identical( capture.kable(summary(paired(tp ~ Cat, data = tmp, cat.stats = c("binomCI", "count", "countpct"), id = id, control = tableby.control(conf.level = 0.9)), text = TRUE)), c("| | 1 (N=4) | 2 (N=4) | Difference (N=4) | p value|", "|:----|:--------------------:|:--------------------:|:--------------------:|-------:|", "|Cat | | | | 0.248|", "|- A | 0.500 (0.098, 0.902) | 0.250 (0.013, 0.751) | 1.000 (0.224, 1.000) | |", "|- B | 0.500 (0.098, 0.902) | 0.750 (0.249, 0.987) | 0.500 (0.025, 0.975) | |", "|- A | 2 | 1 | 2 | |", "|- B | 2 | 3 | 1 | |", "|- A | 2 (50.0%) | 1 (25.0%) | 2 (100.0%) | |", "|- B | 2 (50.0%) | 3 (75.0%) | 1 (50.0%) | |" ) ) }) test_that("12/27/2019: Npct works (#263)", { d <- data.frame( tp = rep(c("Time 1", "Time 2"), times = 4), id = c(1, 1, 2, 2, 3, 3, 4, 4), a = c(1, 1, 2, 2, 3, 3, 4, 4), b = c(1, 0, 2, 0, 3, 0, 4, 0) ) expect_identical( capture.kable(summary(paired(tp ~ notest(a) + b, id = id, data = d, numeric.stats = c("meansd", "Npct")), text = TRUE)), c("| | Time 1 (N=4) | Time 2 (N=4) | Difference (N=4) | p value|", "|:------------|:-------------:|:-------------:|:----------------:|-------:|", "|a | | | | |", "|- Mean (SD) | 2.500 (1.291) | 2.500 (1.291) | 0.000 (0.000) | |", "|- N (Pct) | 4 (50.0%) | 4 (50.0%) | 0 (0.0%) | |", "|b | | | | 0.030|", "|- Mean (SD) | 2.500 (1.291) | 0.000 (0.000) | -2.500 (1.291) | |", "|- N (Pct) | 4 (50.0%) | 4 (50.0%) | 4 (100.0%) | |" ) ) }) test_that("12/27/2019: changing the difference label (#271)", { expect_identical( capture.kable(summary(paired(tp ~ Cat + Fac + Num, data = dat2, id = id, signed.rank.exact = FALSE, cat.stats = c("countpct", "countrowpct"), stats.labels = list(meansd = "Mean (sd)", range = "Ran", difference = "Diff")), text = TRUE)), c("| | 1 (N=4) | 2 (N=4) | Diff (N=4) | p value|", "|:------------|:-------------:|:-------------:|:--------------:|-------:|", "|Cat | | | | 1.000|", "|- A | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) | |", "|- B | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) | |", "|- A | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) | |", "|- B | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) | |", "|Fac | | | | 0.261|", "|- A | 2 (50.0%) | 1 (25.0%) | 2 (100.0%) | |", "|- B | 1 (25.0%) | 2 (50.0%) | 1 (100.0%) | |", "|- C | 1 (25.0%) | 1 (25.0%) | 1 (100.0%) | |", "|- A | 2 (66.7%) | 1 (33.3%) | 2 (100.0%) | |", "|- B | 1 (33.3%) | 2 (66.7%) | 1 (100.0%) | |", "|- C | 1 (50.0%) | 1 (50.0%) | 1 (100.0%) | |", "|Num | | | | 0.391|", "|- Mean (sd) | 2.750 (1.258) | 3.250 (0.957) | 0.500 (1.000) | |", "|- Ran | 1.000 - 4.000 | 2.000 - 4.000 | -1.000 - 1.000 | |" ) ) }) test_that("12/27/2019: informative error when no stats are computed (#273)", { expect_error(summary(paired(tp ~ Cat, data = dat2, id = id, cat.stats = "Nmiss")), "Nothing to show for variable") }) test_that("NAs in sign.test, plus Nsigntest (#326)", { d <- data.frame( tp = rep(c("Time 1", "Time 2"), times = 4), id = c(1, 1, 2, 2, 3, 3, 4, 4), a = c(1, 2, 2, 3, 3, 4, 5, NA) ) expect_identical( capture.kable(summary(paired(tp ~ sign.test(a), id = id, data = d, numeric.stats = c("Nmiss", "meansd", "range", "Nsigntest")), text = TRUE)), c("| | Time 1 (N=4) | Time 2 (N=4) | Difference (N=4) | p value|", "|:----------------|:-------------:|:-------------:|:----------------:|-------:|", "|a | | | | 0.250|", "|- N-Miss | 0 | 1 | 1 | |", "|- Mean (SD) | 2.750 (1.708) | 3.000 (1.000) | 1.000 (0.000) | |", "|- Range | 1.000 - 5.000 | 2.000 - 4.000 | 1.000 - 1.000 | |", "|- N (sign test) | NA | NA | 3 | |" ) ) }) arsenal/tests/testthat/test_lhs_tableby.R0000644000176200001440000011632014056440546020366 0ustar liggesusers## Tests for tableby context("Testing the tableby strata and multiple LHS output") # "mdat" now defined in helper-data.R ########################################################################################################### #### Basic two-sided tableby ########################################################################################################### test_that("A three-LHS tableby call", { skip_if_not_installed("coin") expect_identical( capture.kable(summary(tableby(list(Group, trt, ethan) ~ Sex + time + dt + Phase, data = mdat), text = TRUE)), c( capture.kable(summary(tableby(Group ~ Sex + time + dt + Phase, data = mdat), text = TRUE)), "", "", capture.kable(summary(tableby(trt ~ Sex + time + dt + Phase, data = mdat), text = TRUE)), "", "", capture.kable(summary(tableby(ethan ~ Sex + time + dt + Phase, data = mdat), text = TRUE)) ) ) }) test_that("A tableby call with strata", { skip_if_not_installed("coin") expect_identical( capture.kable(summary(tableby(Group ~ Age + time + Phase, data = mdat, strata = trt), text = TRUE)), c("|Treatment Arm | | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value |", "|:-------------|:------------|:---------------:|:---------------:|:---------------:|:---------------:|:-------:|", "|A |Age in Years | | | | | 0.918 |", "| |- Mean (SD) | 41.000 (6.493) | 40.091 (4.571) | 40.364 (5.573) | 40.528 (5.537) | |", "| |- Range | 29.000 - 53.000 | 33.000 - 48.000 | 30.000 - 49.000 | 29.000 - 53.000 | |", "| |time | | | | | 0.319 |", "| |- Mean (SD) | 4.357 (1.865) | 3.273 (1.421) | 3.727 (1.954) | 3.833 (1.781) | |", "| |- Range | 0.000 - 6.000 | 1.000 - 5.000 | 1.000 - 7.000 | 0.000 - 7.000 | |", "| |Phase | | | | | 0.506 |", "| |- I | 6 (42.9%) | 2 (18.2%) | 0 (0.0%) | 8 (22.2%) | |", "| |- II | 3 (21.4%) | 6 (54.5%) | 8 (72.7%) | 17 (47.2%) | |", "| |- III | 5 (35.7%) | 3 (27.3%) | 3 (27.3%) | 11 (30.6%) | |", "|B |Age in Years | | | | | 0.960 |", "| |- Mean (SD) | 39.188 (6.047) | 39.368 (3.515) | 38.895 (5.646) | 39.148 (5.041) | |", "| |- Range | 30.000 - 49.000 | 32.000 - 47.000 | 30.000 - 52.000 | 30.000 - 52.000 | |", "| |time | | | | | 0.081 |", "| |- Mean (SD) | 4.750 (1.807) | 3.105 (2.355) | 3.895 (2.079) | 3.870 (2.172) | |", "| |- Range | 1.000 - 7.000 | 0.000 - 6.000 | 1.000 - 7.000 | 0.000 - 7.000 | |", "| |Phase | | | | | 0.005 |", "| |- I | 5 (31.2%) | 10 (52.6%) | 0 (0.0%) | 15 (27.8%) | |", "| |- II | 7 (43.8%) | 6 (31.6%) | 11 (57.9%) | 24 (44.4%) | |", "| |- III | 4 (25.0%) | 3 (15.8%) | 8 (42.1%) | 15 (27.8%) | |" ) ) expect_equal( xtfrm(tableby(Group ~ Age + time + Phase, data = mdat, strata = trt)), c( xtfrm(tableby(Group ~ Age + time + Phase, data = mdat, subset = trt == "A")), xtfrm(tableby(Group ~ Age + time + Phase, data = mdat, subset = trt == "B")) ) ) }) test_that("strata levels are maintained", { dat <- data.frame(a = c("A", "A", "A", "B", "A", "B"), b = c(1, 1, 1, 2, 2, 2), stringsAsFactors = FALSE) expect_identical( capture.kable(summary(tableby(~ a, strata = b, data = dat), text = TRUE)), c("|b | | Overall (N=6) |", "|:--|:----|:-------------:|", "|1 |a | |", "| |- A | 3 (100.0%) |", "| |- B | 0 (0.0%) |", "|2 |a | |", "| |- A | 1 (33.3%) |", "| |- B | 2 (66.7%) |" ) ) }) ########################################################################################################### #### Change totals/p-values ########################################################################################################### test_that("A two-LHS tableby call--no p-value, no total", { expect_identical( capture.kable(summary(tableby(list(Group, ethan) ~ Age + Sex, data = mdat, strata = trt), test = FALSE, total = FALSE, text = TRUE)), capture.kable(summary(tableby(list(Group, ethan) ~ Age + Sex, data = mdat, strata = trt, test = FALSE, total = FALSE), text = TRUE)) ) }) ########################################################################################################### #### Other warnings and tests and things... ########################################################################################################### test_that("Certain functions don't work with multiple LHS or strata", { expect_error(padjust(tableby(list(Group, ethan) ~ Age + Sex, data = mdat)), "with strata or multiple") expect_error(padjust(tableby(Group ~ Age + Sex, data = mdat, strata = trt)), "with strata or multiple") expect_error(sort(tableby(list(Group, ethan) ~ Age + Sex, data = mdat)), "with strata or multiple") expect_error(sort(tableby(Group ~ Age + Sex, data = mdat, strata = trt)), "with strata or multiple") }) test_that("Using cat.simplify", { expect_identical( capture.kable(summary(tableby(list(Group, ethan) ~ Sex + Age, data = mdat, cat.simplify = TRUE, numeric.simplify = TRUE, numeric.stats = "meansd"), text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value|", "|:------------|:--------------:|:--------------:|:--------------:|:--------------:|-------:|", "|Sex | 15 (50.0%) | 13 (43.3%) | 16 (53.3%) | 44 (48.9%) | 0.733|", "|Age in Years | 40.033 (6.217) | 39.633 (3.873) | 39.433 (5.569) | 39.700 (5.258) | 0.906|", "" , "" , "| | Ethan (N=42) | Heinzen (N=45) | Total (N=87) | p value|" , "|:------------|:--------------:|:--------------:|:--------------:|-------:|" , "|Sex | 24 (57.1%) | 18 (40.0%) | 42 (48.3%) | 0.110|" , "|Age in Years | 38.857 (5.201) | 40.200 (5.225) | 39.552 (5.227) | 0.233|" ) ) expect_identical( capture.kable(summary(tableby(list(Group, ethan) ~ Sex + Age, strata = trt, data = mdat, cat.simplify = TRUE, numeric.simplify = TRUE, numeric.stats = "meansd"), text = TRUE)), c("|Treatment Arm | | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value |", "|:-------------|:------------|:--------------:|:--------------:|:--------------:|:--------------:|:-------:|", "|A |Sex | 7 (50.0%) | 4 (36.4%) | 6 (54.5%) | 17 (47.2%) | 0.670 |", "| |Age in Years | 41.000 (6.493) | 40.091 (4.571) | 40.364 (5.573) | 40.528 (5.537) | 0.918 |", "|B |Sex | 8 (50.0%) | 9 (47.4%) | 10 (52.6%) | 27 (50.0%) | 0.949 |", "| |Age in Years | 39.188 (6.047) | 39.368 (3.515) | 38.895 (5.646) | 39.148 (5.041) | 0.960 |", "" , "" , "|Treatment Arm | | Ethan (N=42) | Heinzen (N=45) | Total (N=87) | p value |" , "|:-------------|:------------|:--------------:|:--------------:|:--------------:|:-------:|" , "|A |Sex | 10 (58.8%) | 5 (31.2%) | 15 (45.5%) | 0.112 |" , "| |Age in Years | 37.647 (5.689) | 42.938 (3.924) | 40.212 (5.533) | 0.004 |" , "|B |Sex | 14 (56.0%) | 13 (44.8%) | 27 (50.0%) | 0.413 |" , "| |Age in Years | 39.680 (4.785) | 38.690 (5.292) | 39.148 (5.041) | 0.477 |" ) ) }) test_that("Reordering variables and subsetting", { tmp.tab <- tableby(list(Group, ethan) ~ fe(Sex) + dt + Age, strata = trt, data = mdat) expect_identical( capture.kable(summary(tmp.tab[c(3,1,2), 2:1], text = TRUE)), capture.kable(summary(tableby(list(ethan, Group) ~ Age + fe(Sex) + dt, strata = trt, data = mdat), text = TRUE)) ) expect_identical( capture.kable(summary(tmp.tab[c(3,1,2)], text = TRUE)), capture.kable(summary(tmp.tab[c("Age", "Sex", "dt")], text = TRUE)) ) expect_identical( capture.kable(summary(tmp.tab[, 2:1], text = TRUE)), capture.kable(summary(tmp.tab[, c("ethan", "Group")], text = TRUE)) ) expect_identical( capture.kable(summary(tmp.tab[1:2], text = TRUE)), capture.kable(summary(tmp.tab[c(TRUE, TRUE, FALSE)], text = TRUE)) ) expect_identical( capture.kable(summary(tmp.tab[, 2], text = TRUE)), capture.kable(summary(tmp.tab[, c(FALSE, TRUE)], text = TRUE)) ) expect_warning(tmp.tab[1:4], "Some indices not found") expect_warning(tmp.tab[, 1:3], "Some indices not found") expect_error(tmp.tab[TRUE], "Logical vector") expect_error(tmp.tab[, TRUE], "Logical vector") }) test_that("Merging tableby objects", { skip_if_not_installed("coin") tb1 <- tableby(list(Group, ethan) ~ Sex + Phase, strata = trt, data = mdat) tb2 <- tableby(list(Group.fac, status) ~ Age, strata = trt, data = mdat) tb3 <- tableby(list(Group, Group.fac, ethan) ~ Age + dt, strata = trt, data = mdat) tb4 <- tableby(list(Group, ethan) ~ Sex + Phase + Age + dt, strata = trt, data = mdat) expect_error(merge(tb1, tb2), "No terms in common") expect_identical( capture.kable(summary(merge(tb1, tb2, all = TRUE))), c(capture.kable(summary(tb1)), "", "", capture.kable(summary(tb2))) ) expect_identical( capture.kable(summary(merge(tb1, tb3), text = TRUE)), capture.kable(summary(tb4, text = TRUE)) ) expect_identical( capture.kable(summary(merge(tb1, tb3, all.x = TRUE), text = TRUE)), capture.kable(summary(tb4, text = TRUE)) ) expect_identical( capture.kable(summary(merge(tb1, tb3, all = TRUE), text = TRUE)), c( capture.kable(summary(tb4, text = TRUE)), "", "", capture.kable(summary(tableby(Group.fac ~ Age + dt, data = mdat, strata = trt), text = TRUE)) ) ) }) test_that("Changing labels", { tb <- tableby(list(Group, ethan) ~ Sex + Age, strata = trt, data = mdat) expect_warning(labels(tb) <- c(hi = "hi", Sex = "Sex label", Age = "Age at event", trt = "Trt Arm", ethan = "EthanH", Group = "Grp"), NA) expect_identical( capture.kable(summary(tb, text = TRUE, term.name = TRUE)), c("|Trt Arm |Grp | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value |", "|:-------|:------------|:---------------:|:---------------:|:---------------:|:---------------:|:-------:|", "|A |Sex label | | | | | 0.670 |", "| |- Female | 7 (50.0%) | 7 (63.6%) | 5 (45.5%) | 19 (52.8%) | |", "| |- Male | 7 (50.0%) | 4 (36.4%) | 6 (54.5%) | 17 (47.2%) | |", "| |Age at event | | | | | 0.918 |", "| |- Mean (SD) | 41.000 (6.493) | 40.091 (4.571) | 40.364 (5.573) | 40.528 (5.537) | |", "| |- Range | 29.000 - 53.000 | 33.000 - 48.000 | 30.000 - 49.000 | 29.000 - 53.000 | |", "|B |Sex label | | | | | 0.949 |", "| |- Female | 8 (50.0%) | 10 (52.6%) | 9 (47.4%) | 27 (50.0%) | |", "| |- Male | 8 (50.0%) | 9 (47.4%) | 10 (52.6%) | 27 (50.0%) | |", "| |Age at event | | | | | 0.960 |", "| |- Mean (SD) | 39.188 (6.047) | 39.368 (3.515) | 38.895 (5.646) | 39.148 (5.041) | |", "| |- Range | 30.000 - 49.000 | 32.000 - 47.000 | 30.000 - 52.000 | 30.000 - 52.000 | |", "" , "" , "|Trt Arm |EthanH | Ethan (N=42) | Heinzen (N=45) | Total (N=87) | p value |" , "|:-------|:------------|:---------------:|:---------------:|:---------------:|:-------:|" , "|A |Sex label | | | | 0.112 |" , "| |- Female | 7 (41.2%) | 11 (68.8%) | 18 (54.5%) | |" , "| |- Male | 10 (58.8%) | 5 (31.2%) | 15 (45.5%) | |" , "| |Age at event | | | | 0.004 |" , "| |- Mean (SD) | 37.647 (5.689) | 42.938 (3.924) | 40.212 (5.533) | |" , "| |- Range | 29.000 - 53.000 | 36.000 - 49.000 | 29.000 - 53.000 | |" , "|B |Sex label | | | | 0.413 |" , "| |- Female | 11 (44.0%) | 16 (55.2%) | 27 (50.0%) | |" , "| |- Male | 14 (56.0%) | 13 (44.8%) | 27 (50.0%) | |" , "| |Age at event | | | | 0.477 |" , "| |- Mean (SD) | 39.680 (4.785) | 38.690 (5.292) | 39.148 (5.041) | |" , "| |- Range | 30.000 - 48.000 | 30.000 - 52.000 | 30.000 - 52.000 | |" ) ) labels(tb) <- NULL expect_identical( capture.kable(summary(tb, text = TRUE, term.name = TRUE)), c("|trt |Group | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value |", "|:---|:------------|:---------------:|:---------------:|:---------------:|:---------------:|:-------:|", "|A |Sex | | | | | 0.670 |", "| |- Female | 7 (50.0%) | 7 (63.6%) | 5 (45.5%) | 19 (52.8%) | |", "| |- Male | 7 (50.0%) | 4 (36.4%) | 6 (54.5%) | 17 (47.2%) | |", "| |Age | | | | | 0.918 |", "| |- Mean (SD) | 41.000 (6.493) | 40.091 (4.571) | 40.364 (5.573) | 40.528 (5.537) | |", "| |- Range | 29.000 - 53.000 | 33.000 - 48.000 | 30.000 - 49.000 | 29.000 - 53.000 | |", "|B |Sex | | | | | 0.949 |", "| |- Female | 8 (50.0%) | 10 (52.6%) | 9 (47.4%) | 27 (50.0%) | |", "| |- Male | 8 (50.0%) | 9 (47.4%) | 10 (52.6%) | 27 (50.0%) | |", "| |Age | | | | | 0.960 |", "| |- Mean (SD) | 39.188 (6.047) | 39.368 (3.515) | 38.895 (5.646) | 39.148 (5.041) | |", "| |- Range | 30.000 - 49.000 | 32.000 - 47.000 | 30.000 - 52.000 | 30.000 - 52.000 | |", "" , "" , "|trt |ethan | Ethan (N=42) | Heinzen (N=45) | Total (N=87) | p value |" , "|:---|:------------|:---------------:|:---------------:|:---------------:|:-------:|" , "|A |Sex | | | | 0.112 |" , "| |- Female | 7 (41.2%) | 11 (68.8%) | 18 (54.5%) | |" , "| |- Male | 10 (58.8%) | 5 (31.2%) | 15 (45.5%) | |" , "| |Age | | | | 0.004 |" , "| |- Mean (SD) | 37.647 (5.689) | 42.938 (3.924) | 40.212 (5.533) | |" , "| |- Range | 29.000 - 53.000 | 36.000 - 49.000 | 29.000 - 53.000 | |" , "|B |Sex | | | | 0.413 |" , "| |- Female | 11 (44.0%) | 16 (55.2%) | 27 (50.0%) | |" , "| |- Male | 14 (56.0%) | 13 (44.8%) | 27 (50.0%) | |" , "| |Age | | | | 0.477 |" , "| |- Mean (SD) | 39.680 (4.785) | 38.690 (5.292) | 39.148 (5.041) | |" , "| |- Range | 30.000 - 48.000 | 30.000 - 52.000 | 30.000 - 52.000 | |" ) ) }) ########################################################################################################### #### Reported bugs for tableby ########################################################################################################### test_that("02/23/2018: wrapping long labels (#59)", { labs <- list( Group = "This is a really long label for the Group variable", time = "Another really long label. Can you believe how long this is", dt = "ThisLabelHasNoSpacesSoLetsSeeHowItBehaves", trt = NULL ) expect_identical( capture.kable(print(summary(tableby(Sex ~ Group + time + dt, strata = trt, data = set_labels(mdat, labs)), text = TRUE), width = 30)), c("|trt | | Female (N=46) | Male (N=44) | Total (N=90) | p value |", "|:---|:------------------------------|:-----------------------:|:-----------------------:|:-----------------------:|:-------:|", "|A |This is a really long label | | | | 0.670 |", "| |for the Group variable | | | | |", "| |- High | 7 (36.8%) | 7 (41.2%) | 14 (38.9%) | |", "| |- Low | 7 (36.8%) | 4 (23.5%) | 11 (30.6%) | |", "| |- Med | 5 (26.3%) | 6 (35.3%) | 11 (30.6%) | |", "| |Another really long label. | | | | 0.831 |", "| |Can you believe how long this | | | | |", "| |is | | | | |", "| |- Mean (SD) | 3.895 (1.560) | 3.765 (2.047) | 3.833 (1.781) | |", "| |- Range | 1.000 - 6.000 | 0.000 - 7.000 | 0.000 - 7.000 | |", "| |ThisLabelHasNoSpacesSoLetsSeeH | | | | 0.669 |", "| |owItBehaves | | | | |", "| |- Median | 1949-11-01 | 1950-02-06 | 1949-12-19 | |", "| |- Range | 1939-04-01 - 1959-09-06 | 1939-04-03 - 1968-05-14 | 1939-04-01 - 1968-05-14 | |", "|B |This is a really long label | | | | 0.949 |", "| |for the Group variable | | | | |", "| |- High | 8 (29.6%) | 8 (29.6%) | 16 (29.6%) | |", "| |- Low | 10 (37.0%) | 9 (33.3%) | 19 (35.2%) | |", "| |- Med | 9 (33.3%) | 10 (37.0%) | 19 (35.2%) | |", "| |Another really long label. | | | | 0.118 |", "| |Can you believe how long this | | | | |", "| |is | | | | |", "| |- Mean (SD) | 3.407 (2.153) | 4.333 (2.130) | 3.870 (2.172) | |", "| |- Range | 0.000 - 7.000 | 0.000 - 7.000 | 0.000 - 7.000 | |", "| |ThisLabelHasNoSpacesSoLetsSeeH | | | | 0.102 |", "| |owItBehaves | | | | |", "| |- Median | 1948-05-31 | 1951-03-31 | 1949-09-09 | |", "| |- Range | 1935-08-15 - 1957-08-15 | 1937-02-08 - 1958-07-30 | 1935-08-15 - 1958-07-30 | |" ) ) }) test_that("strata with includeNA()", { expect_identical( capture.kable(summary(tableby(list(sex, arm) ~ age, data = mockstudy, strata = includeNA(mdquality.s, "NA")), text = TRUE, labelTranslations = c('includeNA(mdquality.s, "NA")' = "QOL"))), c("|QOL | | Male (N=916) | Female (N=583) | Total (N=1499) | p value |" , "|:---|:------------|:---------------:|:---------------:|:---------------:|:-------:|" , "|0 |Age in Years | | | | 0.648 |" , "| |- Mean (SD) | 59.714 (12.246) | 60.702 (10.634) | 60.089 (11.627) | |" , "| |- Range | 29.000 - 82.000 | 35.000 - 81.000 | 29.000 - 82.000 | |" , "|1 |Age in Years | | | | 0.013 |" , "| |- Mean (SD) | 60.445 (11.447) | 58.693 (11.611) | 59.763 (11.537) | |" , "| |- Range | 19.000 - 88.000 | 26.000 - 88.000 | 19.000 - 88.000 | |" , "|NA |Age in Years | | | | 0.933 |" , "| |- Mean (SD) | 60.876 (10.589) | 61.000 (12.557) | 60.925 (11.379) | |" , "| |- Range | 36.000 - 85.000 | 22.000 - 81.000 | 22.000 - 85.000 | |" , "" , "" , "|QOL | | A: IFL (N=428) | F: FOLFOX (N=691) | G: IROX (N=380) | Total (N=1499) | p value |", "|:---|:------------|:---------------:|:-----------------:|:---------------:|:---------------:|:-------:|", "|0 |Age in Years | | | | | 0.238 |", "| |- Mean (SD) | 58.317 (11.869) | 62.154 (11.037) | 58.968 (12.098) | 60.089 (11.627) | |", "| |- Range | 30.000 - 79.000 | 41.000 - 82.000 | 29.000 - 76.000 | 29.000 - 82.000 | |", "|1 |Age in Years | | | | | 0.891 |", "| |- Mean (SD) | 59.560 (11.279) | 59.944 (11.663) | 59.698 (11.647) | 59.763 (11.537) | |", "| |- Range | 28.000 - 88.000 | 19.000 - 88.000 | 26.000 - 85.000 | 19.000 - 88.000 | |", "|NA |Age in Years | | | | | 0.949 |", "| |- Mean (SD) | 61.364 (11.522) | 60.788 (11.725) | 60.854 (10.009) | 60.925 (11.379) | |", "| |- Range | 27.000 - 81.000 | 22.000 - 85.000 | 40.000 - 81.000 | 22.000 - 85.000 | |" ) ) }) #################################################### test_that("01/31/2019: modpval.tableby (#174, #175)", { tmp <- tableby(sex ~ age + arm, data = mockstudy, strata = fu.stat, test = FALSE) expect_true(!any(c("test", "p.value") %in% names(as.data.frame(tmp)))) tmp <- modpval.tableby(tmp, data.frame(y = "sex", strata = c("1", "2"), x = c("age", "arm"), p = c(1, 0.5)), use.pname = TRUE) expect_identical( capture.kable(summary(tmp, pfootnote = TRUE, text = TRUE)), c("|fu.stat | | Male (N=916) | Female (N=583) | Total (N=1499) | p |", "|:-------|:-------------|:---------------:|:---------------:|:---------------:|:---------:|", "|1 |Age in Years | | | | 1.000 (1) |", "| |- Mean (SD) | 58.253 (12.048) | 61.018 (10.649) | 59.336 (11.561) | |", "| |- Range | 32.000 - 85.000 | 35.000 - 80.000 | 32.000 - 85.000 | |", "| |Treatment Arm | | | | |", "| |- A: IFL | 14 (16.1%) | 4 (7.1%) | 18 (12.6%) | |", "| |- F: FOLFOX | 56 (64.4%) | 43 (76.8%) | 99 (69.2%) | |", "| |- G: IROX | 17 (19.5%) | 9 (16.1%) | 26 (18.2%) | |", "|2 |Age in Years | | | | |", "| |- Mean (SD) | 60.686 (11.278) | 59.059 (11.824) | 60.054 (11.516) | |", "| |- Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | |", "| |Treatment Arm | | | | 0.500 (1) |", "| |- A: IFL | 263 (31.7%) | 147 (27.9%) | 410 (30.2%) | |", "| |- F: FOLFOX | 355 (42.8%) | 237 (45.0%) | 592 (43.7%) | |", "| |- G: IROX | 211 (25.5%) | 143 (27.1%) | 354 (26.1%) | |", "1. Modified by user" ) ) }) test_that("05/20/2019: non-alphabetical strata with missing values (#215)", { dat <- data.frame( strat = factor(rep(1:2, each = 5), levels = 2:1), a = c("a", "a", "a", "b", NA, "b", "a", "b", "a", "b") ) expect_identical( capture.kable(summary(tableby(~ a, data = dat, strata = strat))), c("|strat | | Overall (N=10) |", "|:-----|:------------------------|:--------------:|", "|2 |**a** | |", "| |   a | 2 (40.0%) |", "| |   b | 3 (60.0%) |", "|1 |**a** | |", "| |   N-Miss | 1 |", "| |   a | 3 (75.0%) |", "| |   b | 1 (25.0%) |" ) ) }) test_that("06/11/2019: retaining control.list with merge() (#221)", { tab1 <- tableby(sex ~ age, data = mockstudy) tab2 <- tableby(sex ~ anova(ps, digits = 0), data = mockstudy) # this also doesn't work... I'm planning to fix this expect_identical( capture.kable(summary(merge(tab1, tab2))), capture.kable(summary(merge(tab2, tab1)[2:1])) ) expect_identical( capture.kable(summary(merge(tab1, tab2), text = TRUE)), c("| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|", "|:------------|:---------------:|:---------------:|:---------------:|-------:|", "|Age in Years | | | | 0.048|", "|- Mean (SD) | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | |", "|- Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | |", "|ps | | | | 0.345|", "|- N-Miss | 162 | 104 | 266 | |", "|- Mean (SD) | 1 (1) | 1 (1) | 1 (1) | |", "|- Range | 0 - 2 | 0 - 2 | 0 - 2 | |" ) ) }) test_that("Multiple labels work (#310)", { tb <- tableby(list(sex, arm) ~ age, data = mockstudy) expect_identical( capture.kable(summary(tb, title = "Just one", text=TRUE)), c("Table: Just one" , "" , "| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|" , "|:------------|:---------------:|:---------------:|:---------------:|-------:|" , "|Age in Years | | | | 0.048|" , "|- Mean (SD) | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | |" , "|- Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | |" , "" , "" , "| | A: IFL (N=428) | F: FOLFOX (N=691) | G: IROX (N=380) | Total (N=1499) | p value|", "|:------------|:---------------:|:-----------------:|:---------------:|:---------------:|-------:|", "|Age in Years | | | | | 0.614|", "|- Mean (SD) | 59.673 (11.365) | 60.301 (11.632) | 59.763 (11.499) | 59.985 (11.519) | |", "|- Range | 27.000 - 88.000 | 19.000 - 88.000 | 26.000 - 85.000 | 19.000 - 88.000 | |" ) ) expect_identical( capture.kable(summary(tb, title = c("Just one", "both"), text=TRUE)), c("Table: Just one" , "" , "| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|" , "|:------------|:---------------:|:---------------:|:---------------:|-------:|" , "|Age in Years | | | | 0.048|" , "|- Mean (SD) | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | |" , "|- Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | |" , "" , "" , "Table: both" , "" , "| | A: IFL (N=428) | F: FOLFOX (N=691) | G: IROX (N=380) | Total (N=1499) | p value|", "|:------------|:---------------:|:-----------------:|:---------------:|:---------------:|-------:|", "|Age in Years | | | | | 0.614|", "|- Mean (SD) | 59.673 (11.365) | 60.301 (11.632) | 59.763 (11.499) | 59.985 (11.519) | |", "|- Range | 27.000 - 88.000 | 19.000 - 88.000 | 26.000 - 85.000 | 19.000 - 88.000 | |" ) ) expect_identical( capture.kable(summary(tb, title = list(NULL, "both"), text=TRUE)), c("| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|" , "|:------------|:---------------:|:---------------:|:---------------:|-------:|" , "|Age in Years | | | | 0.048|" , "|- Mean (SD) | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | |" , "|- Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | |" , "" , "" , "Table: both" , "" , "| | A: IFL (N=428) | F: FOLFOX (N=691) | G: IROX (N=380) | Total (N=1499) | p value|", "|:------------|:---------------:|:-----------------:|:---------------:|:---------------:|-------:|", "|Age in Years | | | | | 0.614|", "|- Mean (SD) | 59.673 (11.365) | 60.301 (11.632) | 59.763 (11.499) | 59.985 (11.519) | |", "|- Range | 27.000 - 88.000 | 19.000 - 88.000 | 26.000 - 85.000 | 19.000 - 88.000 | |" ) ) }) test_that("total.pos = 'before' (#320)", { expect_identical( capture.kable(summary(tableby(sex ~ age, data = mockstudy, strata = arm, total.pos = "before"), text = TRUE)), c("|Treatment Arm | | Total (N=1499) | Male (N=916) | Female (N=583) | p value |", "|:-------------|:------------|:---------------:|:---------------:|:---------------:|:-------:|", "|A: IFL |Age in Years | | | | 0.572 |", "| |- Mean (SD) | 59.673 (11.365) | 59.903 (11.347) | 59.252 (11.422) | |", "| |- Range | 27.000 - 88.000 | 28.000 - 83.000 | 27.000 - 88.000 | |", "|F: FOLFOX |Age in Years | | | | 0.286 |", "| |- Mean (SD) | 60.301 (11.632) | 60.691 (11.598) | 59.729 (11.679) | |", "| |- Range | 19.000 - 88.000 | 19.000 - 88.000 | 22.000 - 83.000 | |", "|G: IROX |Age in Years | | | | 0.051 |", "| |- Mean (SD) | 59.763 (11.499) | 60.702 (10.999) | 58.355 (12.113) | |", "| |- Range | 26.000 - 85.000 | 29.000 - 85.000 | 26.000 - 82.000 | |" ) ) }) test_that("cat.droplevels (#318)", { d <- data.frame( z = rep(LETTERS[1:3], each = 10), x = rep(c("a", "b", "b"), each = 10), y = rep(LETTERS[1:3], times = 10), stringsAsFactors = FALSE ) expect_identical( capture.kable(summary(tableby(y ~ x, data = d, strata = z, cat.droplevels = FALSE), text = TRUE)), c("|z | | A (N=10) | B (N=10) | C (N=10) | Total (N=30) | p value |", "|:--|:----|:----------:|:----------:|:----------:|:------------:|:-------:|", "|A |x | | | | | |", "| |- a | 4 (100.0%) | 3 (100.0%) | 3 (100.0%) | 10 (100.0%) | |", "| |- b | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |", "|B |x | | | | | |", "| |- a | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |", "| |- b | 3 (100.0%) | 4 (100.0%) | 3 (100.0%) | 10 (100.0%) | |", "|C |x | | | | | |", "| |- a | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |", "| |- b | 3 (100.0%) | 3 (100.0%) | 4 (100.0%) | 10 (100.0%) | |" ) ) expect_identical( capture.kable(summary(tableby(y ~ x, data = d, strata = z, cat.droplevels = TRUE), text = TRUE)), c("|z | | A (N=10) | B (N=10) | C (N=10) | Total (N=30) | p value |", "|:--|:----|:----------:|:----------:|:----------:|:------------:|:-------:|", "|A |x | | | | | |", "| |- a | 4 (100.0%) | 3 (100.0%) | 3 (100.0%) | 10 (100.0%) | |", "|B |x | | | | | |", "| |- b | 3 (100.0%) | 4 (100.0%) | 3 (100.0%) | 10 (100.0%) | |", "|C |x | | | | | |", "| |- b | 3 (100.0%) | 3 (100.0%) | 4 (100.0%) | 10 (100.0%) | |" ) ) expect_identical( capture.kable(summary(tableby(y ~ x, data = d, strata = z, cat.droplevels = TRUE), text = TRUE)), capture.kable(summary(tableby(y ~ chisq(x, cat.droplevels = TRUE), data = d, strata = z), text = TRUE)) ) expect_identical( capture.kable(summary(tableby(y ~ x, data = d, strata = z, cat.droplevels = TRUE, test.always = TRUE), text = TRUE)), capture.kable(summary(tableby(y ~ chisq(x, cat.droplevels = TRUE), data = d, strata = z, test.always = TRUE), text = TRUE)) ) expect_identical( capture.kable(summary(tableby(y ~ x, data = d, strata = z, cat.droplevels = TRUE, test.always = TRUE), text = TRUE)), capture.kable(summary(tableby(y ~ x, data = d, strata = z, test.always = TRUE), text = TRUE))[c(1:4, 6, 8:9, 11)] ) }) arsenal/tests/testthat/test_tableby.R0000644000176200001440000032602614056441350017520 0ustar liggesusers## Tests for tableby context("Testing the tableby output") # "mdat" now defined in helper-data.R ########################################################################################################### #### Basic two-sided tableby ########################################################################################################### test_that("A basic two-sided tableby call--no labels, no missings", { expect_identical( capture.kable(summary(tableby(Group ~ Sex + time + dt, data = mdat, numeric.stats = c("meansd", "q1q3", "range")), text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value|", "|:------------|:-----------------------:|:-----------------------:|:-----------------------:|:-----------------------:|-------:|", "|Sex | | | | | 0.733|", "|- Female | 15 (50.0%) | 17 (56.7%) | 14 (46.7%) | 46 (51.1%) | |", "|- Male | 15 (50.0%) | 13 (43.3%) | 16 (53.3%) | 44 (48.9%) | |", "|time | | | | | 0.025|", "|- Mean (SD) | 4.567 (1.813) | 3.167 (2.036) | 3.833 (2.001) | 3.856 (2.014) | |", "|- Q1, Q3 | 3.250, 6.000 | 1.250, 5.000 | 2.000, 5.000 | 2.000, 6.000 | |", "|- Range | 0.000 - 7.000 | 0.000 - 6.000 | 1.000 - 7.000 | 0.000 - 7.000 | |", "|dt | | | | | 0.391|", "|- Median | 1950-01-07 | 1951-06-13 | 1948-09-13 | 1949-10-07 | |", "|- Range | 1935-08-15 - 1968-05-14 | 1937-02-08 - 1959-09-06 | 1939-04-01 - 1958-07-30 | 1935-08-15 - 1968-05-14 | |" ) ) }) test_that("A basic two-sided tableby call--labels, no missings", { skip_if_not_installed("coin") expect_identical( capture.kable(summary(tableby(Group ~ Age + trt + Phase, data = mdat, numeric.stats = c("meansd", "q1q3", "range")), text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value|", "|:-------------|:---------------:|:---------------:|:---------------:|:---------------:|-------:|", "|Age in Years | | | | | 0.906|", "|- Mean (SD) | 40.033 (6.217) | 39.633 (3.873) | 39.433 (5.569) | 39.700 (5.258) | |", "|- Q1, Q3 | 36.000, 44.500 | 37.250, 41.750 | 35.250, 44.000 | 36.000, 43.000 | |", "|- Range | 29.000 - 53.000 | 32.000 - 48.000 | 30.000 - 52.000 | 29.000 - 53.000 | |", "|Treatment Arm | | | | | 0.659|", "|- A | 14 (46.7%) | 11 (36.7%) | 11 (36.7%) | 36 (40.0%) | |", "|- B | 16 (53.3%) | 19 (63.3%) | 19 (63.3%) | 54 (60.0%) | |", "|Phase | | | | | 0.008|", "|- I | 11 (36.7%) | 12 (40.0%) | 0 (0.0%) | 23 (25.6%) | |", "|- II | 10 (33.3%) | 12 (40.0%) | 19 (63.3%) | 41 (45.6%) | |", "|- III | 9 (30.0%) | 6 (20.0%) | 11 (36.7%) | 26 (28.9%) | |" ) ) }) test_that("A basic two-sided tableby call--no labels, some missings", { expect_identical( capture.kable(summary(tableby(Group ~ ethan, data = mdat), text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value|", "|:----------|:-----------:|:----------:|:----------:|:------------:|-------:|", "|ethan | | | | | 0.178|", "|- N-Miss | 3 | 0 | 0 | 3 | |", "|- Ethan | 17 (63.0%) | 13 (43.3%) | 12 (40.0%) | 42 (48.3%) | |", "|- Heinzen | 10 (37.0%) | 17 (56.7%) | 18 (60.0%) | 45 (51.7%) | |" ) ) }) ########################################################################################################### #### Basic one-sided tableby ########################################################################################################### test_that("A basic one-sided tableby call--no labels, no missings", { expect_identical( capture.kable(summary(tableby(~ Sex + time + dt, data = mdat), text = TRUE)), c("| | Overall (N=90) |", "|:------------|:-----------------------:|", "|Sex | |", "|- Female | 46 (51.1%) |", "|- Male | 44 (48.9%) |", "|time | |", "|- Mean (SD) | 3.856 (2.014) |", "|- Range | 0.000 - 7.000 |", "|dt | |", "|- Median | 1949-10-07 |", "|- Range | 1935-08-15 - 1968-05-14 |" ) ) }) test_that("A basic one-sided tableby call--labels, no missings", { expect_identical( capture.kable(summary(tableby(~ Age + trt, data = mdat, numeric.stats = c("meansd", "q1q3", "range")), text = TRUE)), c("| | Overall (N=90) |", "|:-------------|:---------------:|", "|Age in Years | |", "|- Mean (SD) | 39.700 (5.258) |", "|- Q1, Q3 | 36.000, 43.000 |", "|- Range | 29.000 - 53.000 |", "|Treatment Arm | |", "|- A | 36 (40.0%) |", "|- B | 54 (60.0%) |" ) ) }) test_that("A basic one-sided tableby call--no labels, some missings (Sarah Jenkins's Error)", { expect_identical( capture.kable(summary(tableby(~ ethan, data = mdat), text = TRUE)), c("| | Overall (N=90) |", "|:----------|:--------------:|", "|ethan | |", "|- N-Miss | 3 |", "|- Ethan | 42 (48.3%) |", "|- Heinzen | 45 (51.7%) |" ) ) }) ########################################################################################################### #### Change totals/p-values ########################################################################################################### test_that("A basic two-sided tableby call--no p-value, no total", { expect_identical( capture.kable(summary(tableby(Group ~ Age + Sex, data = mdat, test = FALSE, total = FALSE), text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) |", "|:------------|:---------------:|:---------------:|:---------------:|", "|Age in Years | | | |", "|- Mean (SD) | 40.033 (6.217) | 39.633 (3.873) | 39.433 (5.569) |", "|- Range | 29.000 - 53.000 | 32.000 - 48.000 | 30.000 - 52.000 |", "|Sex | | | |", "|- Female | 15 (50.0%) | 17 (56.7%) | 14 (46.7%) |", "|- Male | 15 (50.0%) | 13 (43.3%) | 16 (53.3%) |" ) ) expect_identical( capture.kable(summary(tableby(Group ~ Age + Sex, data = mdat), test = FALSE, total = FALSE, text = TRUE)), capture.kable(summary(tableby(Group ~ Age + Sex, data = mdat, test = FALSE, total = FALSE), text = TRUE)) ) }) test_that("A basic two-sided tableby call--p-value, no total", { expect_identical( capture.kable(summary(tableby(Group ~ Age + Sex, data = mdat, total = FALSE), text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | p value|", "|:------------|:---------------:|:---------------:|:---------------:|-------:|", "|Age in Years | | | | 0.906|", "|- Mean (SD) | 40.033 (6.217) | 39.633 (3.873) | 39.433 (5.569) | |", "|- Range | 29.000 - 53.000 | 32.000 - 48.000 | 30.000 - 52.000 | |", "|Sex | | | | 0.733|", "|- Female | 15 (50.0%) | 17 (56.7%) | 14 (46.7%) | |", "|- Male | 15 (50.0%) | 13 (43.3%) | 16 (53.3%) | |" ) ) expect_identical( capture.kable(summary(tableby(Group ~ Age + Sex, data = mdat), total = FALSE, text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | p value|", "|:------------|:---------------:|:---------------:|:---------------:|-------:|", "|Age in Years | | | | 0.906|", "|- Mean (SD) | 40.033 (6.217) | 39.633 (3.873) | 39.433 (5.569) | |", "|- Range | 29.000 - 53.000 | 32.000 - 48.000 | 30.000 - 52.000 | |", "|Sex | | | | 0.733|", "|- Female | 15 (50.0%) | 17 (56.7%) | 14 (46.7%) | |", "|- Male | 15 (50.0%) | 13 (43.3%) | 16 (53.3%) | |" ) ) }) ########################################################################################################### #### markdown output ########################################################################################################### test_that("A basic two-sided tableby markdown output", { expect_identical( capture.kable(summary(tableby(Group ~ Age + Sex + notest(ethan) + dt, data = mdat, numeric.stats = c("meansd", "q1q3", "range"), total = FALSE), pfootnote = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | p value|", "|:---------------------------|:-----------------------:|:-----------------------:|:-----------------------:|--------:|", "|**Age in Years** | | | | 0.906^1^|", "|   Mean (SD) | 40.033 (6.217) | 39.633 (3.873) | 39.433 (5.569) | |", "|   Q1, Q3 | 36.000, 44.500 | 37.250, 41.750 | 35.250, 44.000 | |", "|   Range | 29.000 - 53.000 | 32.000 - 48.000 | 30.000 - 52.000 | |", "|**Sex** | | | | 0.733^2^|", "|   Female | 15 (50.0%) | 17 (56.7%) | 14 (46.7%) | |", "|   Male | 15 (50.0%) | 13 (43.3%) | 16 (53.3%) | |", "|**ethan** | | | | |", "|   N-Miss | 3 | 0 | 0 | |", "|   Ethan | 17 (63.0%) | 13 (43.3%) | 12 (40.0%) | |", "|   Heinzen | 10 (37.0%) | 17 (56.7%) | 18 (60.0%) | |", "|**dt** | | | | 0.391^3^|", "|   Median | 1950-01-07 | 1951-06-13 | 1948-09-13 | |", "|   Range | 1935-08-15 - 1968-05-14 | 1937-02-08 - 1959-09-06 | 1939-04-01 - 1958-07-30 | |", "1. Linear Model ANOVA" , "2. Pearson's Chi-squared test" , "3. Kruskal-Wallis rank sum test" ) ) }) ########################################################################################################### #### Other warnings and tests and things... ########################################################################################################### test_that("A warning occurs using one-sided formula and na.tableby", { expect_error(tableby(~ ethan, data = mdat, na.action = na.tableby), "na.tableby now generates functions") expect_warning(tableby(~ ethan, data = mdat, na.action = na.tableby(TRUE))) }) test_that("The by-variable droplevels is working correctly", { expect_identical( capture.kable(summary(tableby(Group.fac ~ Sex + time + dt, data = mdat[mdat$Group.fac %in% c("High", "Low"), ]), text = TRUE)), c("| | High (N=30) | Low (N=30) | Total (N=60) | p value|", "|:------------|:-----------------------:|:-----------------------:|:-----------------------:|-------:|", "|Sex | | | | 0.605|", "|- Female | 15 (50.0%) | 17 (56.7%) | 32 (53.3%) | |", "|- Male | 15 (50.0%) | 13 (43.3%) | 28 (46.7%) | |", "|time | | | | 0.007|", "|- Mean (SD) | 4.567 (1.813) | 3.167 (2.036) | 3.867 (2.038) | |", "|- Range | 0.000 - 7.000 | 0.000 - 6.000 | 0.000 - 7.000 | |", "|dt | | | | 0.574|", "|- Median | 1950-01-07 | 1951-06-13 | 1950-07-02 | |", "|- Range | 1935-08-15 - 1968-05-14 | 1937-02-08 - 1959-09-06 | 1935-08-15 - 1968-05-14 | |" ) ) }) test_that("Using cat.simplify", { expect_identical( capture.kable(summary(tableby(Group ~ Sex + trt, data = mdat, cat.simplify = TRUE), text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value|", "|:-------------|:-----------:|:----------:|:----------:|:------------:|-------:|", "|Sex | 15 (50.0%) | 13 (43.3%) | 16 (53.3%) | 44 (48.9%) | 0.733|", "|Treatment Arm | 16 (53.3%) | 19 (63.3%) | 19 (63.3%) | 54 (60.0%) | 0.659|" ) ) }) test_that("Reordering variables", { expect_identical( capture.kable(summary(tableby(Group ~ Sex + dt + Age, data = mdat)[c(3,1,2)], text = TRUE)), capture.kable(summary(tableby(Group ~ Age + Sex + dt, data = mdat), text = TRUE)) ) expect_identical( capture.kable(summary(sort(tableby(Group ~ Sex + dt + Age, data = mdat)))), capture.kable(summary(tableby(Group ~ dt + Sex + Age, data = mdat))) ) expect_identical( capture.kable(summary(tableby(Group ~ fe(Sex) + dt + Age, data = mdat)[c(3,1,2)], text = TRUE)), capture.kable(summary(tableby(Group ~ fe(Sex) + dt + Age, data = mdat)[c("Age", "Sex", "dt")], text = TRUE)) ) expect_identical( capture.kable(summary(tableby(Group ~ fe(Sex) + dt + Age, data = mdat)[1:2], text = TRUE)), capture.kable(summary(tableby(Group ~ fe(Sex) + dt + Age, data = mdat)[c(TRUE, TRUE, FALSE)], text = TRUE)) ) expect_identical( capture.kable(summary(tableby(Group ~ fe(Sex) + dt + Age, data = mdat), text = TRUE)), capture.kable(summary(tableby(Group ~ fe(Sex) + dt + Age, data = mdat)[], text = TRUE)) ) expect_warning(tableby(Group ~ fe(Sex) + dt + Age, data = mdat)[1:4], "Some indices not found") expect_error(tableby(Group ~ fe(Sex) + dt + Age, data = mdat)[TRUE], "Logical vector") }) test_that("Merging tableby objects", { skip_if_not_installed("coin") tb1 <- tableby(Group ~ Sex + Phase, data = mdat) tb2 <- tableby(Group.fac ~ Age, data = mdat) tb3 <- tableby(Group ~ Age + dt, data = mdat) tb4 <- tableby(Group ~ chisq(Sex, "count"), data = mdat) expect_error(merge(tb1, tb2), "No terms in common") expect_error(merge(tb1, tableby(Group ~ Age, data = set_labels(mdat, list(Group = "Eek")))), "By-variables not identical") expect_identical( capture.kable(summary(merge(tb1, tb2, all = TRUE))), c(capture.kable(summary(tb1)), "", "", capture.kable(summary(tb2))) ) expect_identical( capture.kable(summary(merge(tb1, tb3), text = TRUE)), capture.kable(summary(tableby(Group ~ Sex + Phase + Age + dt, data = mdat), text = TRUE)) ) expect_identical( capture.kable(summary(merge(tb1, tb4), text = TRUE)), capture.kable(summary(tableby(Group ~ chisq(Sex, "count") + Phase, data = mdat), text = TRUE)) ) }) test_that("Changing tests", { expect_identical( capture.kable(summary(tableby(Group ~ fe(Sex) + kwt(Age) + notest(Phase), data = mdat, numeric.stats = c("meansd", "q1q3", "range")), text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value|", "|:------------|:---------------:|:---------------:|:---------------:|:---------------:|-------:|", "|Sex | | | | | 0.806|", "|- Female | 15 (50.0%) | 17 (56.7%) | 14 (46.7%) | 46 (51.1%) | |", "|- Male | 15 (50.0%) | 13 (43.3%) | 16 (53.3%) | 44 (48.9%) | |", "|Age in Years | | | | | 0.869|", "|- Mean (SD) | 40.033 (6.217) | 39.633 (3.873) | 39.433 (5.569) | 39.700 (5.258) | |", "|- Q1, Q3 | 36.000, 44.500 | 37.250, 41.750 | 35.250, 44.000 | 36.000, 43.000 | |", "|- Range | 29.000 - 53.000 | 32.000 - 48.000 | 30.000 - 52.000 | 29.000 - 53.000 | |", "|Phase | | | | | |", "|- I | 11 (36.7%) | 12 (40.0%) | 0 (0.0%) | 23 (25.6%) | |", "|- II | 10 (33.3%) | 12 (40.0%) | 19 (63.3%) | 41 (45.6%) | |", "|- III | 9 (30.0%) | 6 (20.0%) | 11 (36.7%) | 26 (28.9%) | |" ) ) expect_identical( capture.kable(summary(tableby(Group ~ Sex + Age + Phase, data = mdat, numeric.test = "kwt", cat.test = "fe", ordered.test = "notest"), text = TRUE)), capture.kable(summary(tableby(Group ~ fe(Sex) + kwt(Age) + notest(Phase), data = mdat), text = TRUE)) ) }) test_that("Changing labels", { tb <- tableby(Group ~ Sex + Age, data = mdat) expect_error(labels(tb) <- c("Group", "Sex", "Age")) expect_warning(labels(tb) <- c(hi = "hi", Sex = "Sex", Age = "Age"), NA) expect_identical( capture.kable(summary(tb, text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value|", "|:------------|:---------------:|:---------------:|:---------------:|:---------------:|-------:|", "|Sex | | | | | 0.733|", "|- Female | 15 (50.0%) | 17 (56.7%) | 14 (46.7%) | 46 (51.1%) | |", "|- Male | 15 (50.0%) | 13 (43.3%) | 16 (53.3%) | 44 (48.9%) | |", "|Age | | | | | 0.906|", "|- Mean (SD) | 40.033 (6.217) | 39.633 (3.873) | 39.433 (5.569) | 39.700 (5.258) | |", "|- Range | 29.000 - 53.000 | 32.000 - 48.000 | 30.000 - 52.000 | 29.000 - 53.000 | |" ) ) labels(tb) <- NULL expect_identical( capture.kable(summary(tb, text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value|", "|:------------|:---------------:|:---------------:|:---------------:|:---------------:|-------:|", "|Sex | | | | | 0.733|", "|- Female | 15 (50.0%) | 17 (56.7%) | 14 (46.7%) | 46 (51.1%) | |", "|- Male | 15 (50.0%) | 13 (43.3%) | 16 (53.3%) | 44 (48.9%) | |", "|Age | | | | | 0.906|", "|- Mean (SD) | 40.033 (6.217) | 39.633 (3.873) | 39.433 (5.569) | 39.700 (5.258) | |", "|- Range | 29.000 - 53.000 | 32.000 - 48.000 | 30.000 - 52.000 | 29.000 - 53.000 | |" ) ) labels(tb) <- list(Age = "Age (yrs)", Sex = "Gender") expect_identical(labels(tb), c(Group = "Group", Sex = "Gender", Age = "Age (yrs)")) expect_identical( capture.kable(summary(tb, text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value|", "|:------------|:---------------:|:---------------:|:---------------:|:---------------:|-------:|", "|Gender | | | | | 0.733|", "|- Female | 15 (50.0%) | 17 (56.7%) | 14 (46.7%) | 46 (51.1%) | |", "|- Male | 15 (50.0%) | 13 (43.3%) | 16 (53.3%) | 44 (48.9%) | |", "|Age (yrs) | | | | | 0.906|", "|- Mean (SD) | 40.033 (6.217) | 39.633 (3.873) | 39.433 (5.569) | 39.700 (5.258) | |", "|- Range | 29.000 - 53.000 | 32.000 - 48.000 | 30.000 - 52.000 | 29.000 - 53.000 | |" ) ) }) round.p <- function(x) { x$p.value <- round(x$p.value, 5) row.names(x) <- NULL x } set.seed(1000) test_that("05/25/2017: simulate.p.value option for chisq.test", { expect_true(identical( round.p(tests(tableby(Group ~ Sex + time + dt, data = mdat, subset=Group != "High",simulate.p.value=TRUE))), data.frame(Group = "Group", Variable = c("Sex", "time", "dt"), p.value = c(0.61169, 0.20595, 0.17144), Method = c("Pearson's Chi-squared test with simulated p-value\n\t (based on 2000 replicates)", "Linear Model ANOVA", "Kruskal-Wallis rank sum test"), stringsAsFactors = FALSE) )) }) test_that("05/25/2017: chisq.correct=FALSE option for chisq.test", { expect_true(identical( round.p(tests(tableby(Group ~ Sex + time + dt, data = mdat, subset=Group != "High", chisq.correct=FALSE))), data.frame(Group = "Group", Variable = c("Sex", "time", "dt"), p.value = c(0.43832, 0.20595, 0.17144), Method = c("Pearson's Chi-squared test", "Linear Model ANOVA", "Kruskal-Wallis rank sum test"), stringsAsFactors = FALSE) )) }) set.seed(1000) test_that("05/25/2017: simulate.p.value=TRUE option for fisher.test", { expect_true(identical( round.p(tests(tableby(Group ~ fe(Sex) + time + dt, data = mdat, simulate.p.value=TRUE, B = 1999))), data.frame(Group = "Group", Variable = c("Sex", "time", "dt"), p.value = c(0.80000, 0.02480, 0.39127), Method = c("Fisher's Exact Test for Count Data with simulated p-value\n\t (based on 1999 replicates)", "Linear Model ANOVA", "Kruskal-Wallis rank sum test"), stringsAsFactors = FALSE) )) }) ########################################################################################################### #### Reported bugs for tableby ########################################################################################################### test_that("02/07/2017: Ryan Lennon's R Markdown spacing problem. Also 02/14/2018 (#65)", { expect_error(capture.kable(summary(tableby(Group ~ Sex + time + dt, data = mdat), text = TRUE)), NA) }) dat <- data.frame(x = c("A", "A", "A", rep(c("B", "C"), each = 7)), y = c("cough", "pneumonia", NA, "chest pain", "chest pain", "chest pain", "cough", "cough", "pneumonia", "cough", "cough", "pneumonia", "chest pain", "chest pain", "pneumonia", NA, NA)) dat$y <- factor(dat$y) test_that("02/07/2017: Jason Sinnwell's countpct problem", { expect_identical( capture.kable(summary(tableby(x ~ fe(y), data = dat), text = TRUE)), c("| | A (N=3) | B (N=7) | C (N=7) | Total (N=17) | p value|", "|:-------------|:---------:|:---------:|:---------:|:------------:|-------:|", "|y | | | | | 0.750|", "|- N-Miss | 1 | 0 | 2 | 3 | |", "|- chest pain | 0 (0.0%) | 3 (42.9%) | 2 (40.0%) | 5 (35.7%) | |", "|- cough | 1 (50.0%) | 3 (42.9%) | 1 (20.0%) | 5 (35.7%) | |", "|- pneumonia | 1 (50.0%) | 1 (14.3%) | 2 (40.0%) | 4 (28.6%) | |" ) ) }) test_that("02/07/2017: Jason Sinnwell's chisq problem", { expect_identical( capture.kable(summary(tableby(x ~ y, data = dat[dat$y == "cough",]), text = TRUE)), c("| | A (N=1) | B (N=3) | C (N=1) | Total (N=5) | p value|", "|:-------------|:----------:|:----------:|:----------:|:-----------:|-------:|", "|y | | | | | |", "|- chest pain | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |", "|- cough | 1 (100.0%) | 3 (100.0%) | 1 (100.0%) | 5 (100.0%) | |", "|- pneumonia | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |" ) ) expect_identical( capture.kable(summary(tableby(x ~ as.character(y), data = dat[dat$y == "cough",]), text = TRUE)), c("| | A (N=1) | B (N=3) | C (N=1) | Total (N=5) | p value|", "|:---------------|:----------:|:----------:|:----------:|:-----------:|-------:|", "|as.character(y) | | | | | 0.449|", "|- cough | 1 (100.0%) | 3 (100.0%) | 1 (100.0%) | 5 (100.0%) | |" ) ) }) rm(dat) test_that("03/17/2017: Beth's medianq1q3 label", { expect_identical( capture.kable(summary(tableby(Group ~ ht_in + time, data = mdat, control = tableby.control(numeric.stats = c("Nmiss2", "medianq1q3"))), text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value|", "|:------------------|:-----------------------:|:-----------------------:|:-----------------------:|:-----------------------:|-------:|", "|Height in Inches | | | | | 0.785|", "|- N-Miss | 0 | 0 | 0 | 0 | |", "|- Median (Q1, Q3) | 64.500 (62.000, 68.000) | 64.000 (61.000, 68.750) | 64.500 (62.000, 68.000) | 64.000 (62.000, 68.000) | |", "|time | | | | | 0.025|", "|- N-Miss | 0 | 0 | 0 | 0 | |", "|- Median (Q1, Q3) | 5.000 (3.250, 6.000) | 3.000 (1.250, 5.000) | 4.000 (2.000, 5.000) | 4.000 (2.000, 6.000) | |" ) ) }) test_that("04/12/2017: Katherine King's cat.simplify vs tableby.control", { expect_identical( capture.kable(summary(tableby(Group ~ trt + Sex, data = mdat, control = tableby.control(), cat.simplify = TRUE), text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value|", "|:-------------|:-----------:|:----------:|:----------:|:------------:|-------:|", "|Treatment Arm | 16 (53.3%) | 19 (63.3%) | 19 (63.3%) | 54 (60.0%) | 0.659|", "|Sex | 15 (50.0%) | 13 (43.3%) | 16 (53.3%) | 44 (48.9%) | 0.733|" ) ) }) data(mockstudy) temp <- mockstudy[1:5,] test_that("05/24/2017: Katherine King's count vs countpct", { expect_identical( capture.kable(summary(tableby(arm ~ sex + age, data=temp, cat.stats="count", test = FALSE), text = TRUE)), c("| | A: IFL (N=2) | F: FOLFOX (N=2) | G: IROX (N=1) | Total (N=5) |", "|:------------|:---------------:|:---------------:|:---------------:|:---------------:|", "|sex | | | | |", "|- Male | 0 | 1 | 0 | 1 |", "|- Female | 2 | 1 | 1 | 4 |", "|age | | | | |", "|- Mean (SD) | 62.000 (16.971) | 68.000 (1.414) | 71.000 (NA) | 66.200 (9.418) |", "|- Range | 50.000 - 74.000 | 67.000 - 69.000 | 71.000 - 71.000 | 50.000 - 74.000 |" ) ) }) df <- data.frame(x = c("a ", "a ", "b", "b ", "c", "c"), y = c("A", "A", "A", "B", "B", "B"), stringsAsFactors = FALSE) ##table(df$x, df$y) test_that("05/24/2017: Missy Larson and Ethan Heinzen trailing spaces on char x variable", { expect_identical( capture.kable(summary(tableby(y ~ x, data = df, test = FALSE), text = TRUE)), c("| | A (N=3) | B (N=3) | Total (N=6) |", "|:----|:---------:|:---------:|:-----------:|", "|x | | | |", "|- a | 2 (66.7%) | 0 (0.0%) | 2 (33.3%) |", "|- b | 1 (33.3%) | 0 (0.0%) | 1 (16.7%) |", "|- b | 0 (0.0%) | 1 (33.3%) | 1 (16.7%) |", "|- c | 0 (0.0%) | 2 (66.7%) | 2 (33.3%) |" ) ) }) test_that("08/02/2017: Chi-square warnings are suppressed", { expect_warning(tableby(arm ~ sex, data = mockstudy, subset = 1:5), NA) }) test_that("08/26/2017: Richard Pendegraft and using formulize and tableby (#21)", { # tableby was having trouble identifying one-sided formulas when you use formulize expect_warning(tableby(formulize(x = 11, data = mdat), data = mdat, na.action = na.tableby(TRUE)), "It appears you're using na.tableby") expect_identical( capture.kable(summary(tableby(Group ~ fe(Sex) + kwt(Age), data = mdat), text = TRUE)), capture.kable(summary(tableby(formulize("Group", c("fe(Sex)", "kwt(Age)")), data = mdat), text = TRUE)) ) }) df <- data.frame(a = c("b", "b", "b", "a", "a"), d = NA_character_, e = c(1, 2, 2, 1, 2), stringsAsFactors = FALSE) test_that("08/30/2017: Brendan Broderick and zero-length levels (#22)", { expect_error(tableby(a ~ d + e, data = df), "Zero-length levels") }) test_that("09/13/2017: Peter Martin and rounding to integers (#23)", { expect_identical( capture.kable(summary(tableby(Group ~ Sex + time + dt, data = mdat, numeric.stats = c("meansd", "q1q3", "range"), digits = 0, digits.p = 3), text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value|", "|:------------|:-----------------------:|:-----------------------:|:-----------------------:|:-----------------------:|-------:|", "|Sex | | | | | 0.733|", "|- Female | 15 (50.0%) | 17 (56.7%) | 14 (46.7%) | 46 (51.1%) | |", "|- Male | 15 (50.0%) | 13 (43.3%) | 16 (53.3%) | 44 (48.9%) | |", "|time | | | | | 0.025|", "|- Mean (SD) | 5 (2) | 3 (2) | 4 (2) | 4 (2) | |", "|- Q1, Q3 | 3, 6 | 1, 5 | 2, 5 | 2, 6 | |", "|- Range | 0 - 7 | 0 - 6 | 1 - 7 | 0 - 7 | |", "|dt | | | | | 0.391|", "|- Median | 1950-01-07 | 1951-06-13 | 1948-09-13 | 1949-10-07 | |", "|- Range | 1935-08-15 - 1968-05-14 | 1937-02-08 - 1959-09-06 | 1939-04-01 - 1958-07-30 | 1935-08-15 - 1968-05-14 | |" ) ) expect_warning(tableby(Group ~ Sex + time + dt, data = mdat, digits.p = -1)) expect_warning(tableby(Group ~ Sex + time + dt, data = mdat, digits = -1)) }) dat <- data.frame(a = c("b", "b", "b", "a", "a", "a"), b = c("a", "b", "a", "b", "a", "b"), stringsAsFactors = FALSE) attr(dat$a, "stats") <- c("countpct", "Nmiss") test_that("11/10/2017: trouble with 'stats' attribute (#39)", { expect_error(tableby(~ a + b, data = dat), NA) }) colnames(dat) <- c("1y", "2x") test_that("11/15/2017: Krista Goergen and non-syntactic names (#41)", { expect_identical( capture.kable(summary(tableby(`1y` ~ `2x`, data = dat), text = TRUE)), c("| | a (N=3) | b (N=3) | Total (N=6) | p value|", "|:----|:---------:|:---------:|:-----------:|-------:|", "|2x | | | | 0.414|", "|- a | 1 (33.3%) | 2 (66.7%) | 3 (50.0%) | |", "|- b | 2 (66.7%) | 1 (33.3%) | 3 (50.0%) | |" ) ) expect_identical( capture.kable(summary(tableby(`1y` ~ fe(`2x`), data = dat), text = TRUE)), c("| | a (N=3) | b (N=3) | Total (N=6) | p value|", "|:----|:---------:|:---------:|:-----------:|-------:|", "|2x | | | | 1.000|", "|- a | 1 (33.3%) | 2 (66.7%) | 3 (50.0%) | |", "|- b | 2 (66.7%) | 1 (33.3%) | 3 (50.0%) | |" ) ) expect_identical( capture.kable(summary(tableby( ~ `2x`, data = dat), text = TRUE)), c("| | Overall (N=6) |", "|:----|:-------------:|", "|2x | |", "|- a | 3 (50.0%) |", "|- b | 3 (50.0%) |" ) ) }) test_that("7/27/2017: as.data.frame.tableby and dates (#10)", { expect_identical(as.data.frame(tableby(~ dt, data = mdat))$Overall[[3]][2], as.Date("1968-05-14")) }) test_that("01/24/2018: count, countN, and countpct at the same time (#51, #201)", { dat <- data.frame(y = rep(c("C", "D"), times = 5), x = rep(c("A", "B"), each = 5), stringsAsFactors = FALSE) expect_identical( capture.kable(summary(tableby(y ~ x, data = dat, cat.stats = c("count", "countN", "countpct")), text = TRUE)), c("| | C (N=5) | D (N=5) | Total (N=10) | p value|", "|:----|:---------:|:---------:|:------------:|-------:|", "|x | | | | 0.527|", "|- A | 3 | 2 | 5 | |", "|- B | 2 | 3 | 5 | |", "|- A | 3/5 | 2/5 | 5/10 | |", "|- B | 2/5 | 3/5 | 5/10 | |", "|- A | 3 (60.0%) | 2 (40.0%) | 5 (50.0%) | |", "|- B | 2 (40.0%) | 3 (60.0%) | 5 (50.0%) | |" ) ) }) test_that("01/30/2018: additional follow-up statistics (#32)", { skip_if_not(getRversion() >= "3.3.0") skip_if_not_installed("survival", "2.41-3") require(survival) expect_identical( capture.kable(summary(tableby(sex ~ Surv(fu.time/365.25, fu.stat), data=mockstudy, times=1:5, surv.stats=c("medSurv", "Nevents", "NeventsSurv", "Nrisk", "NriskSurv", "medTime")), text = TRUE)), c("| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|", "|:-----------------------------|:------------:|:--------------:|:--------------:|-------:|", "|Surv(fu.time/365.25, fu.stat) | | | | 0.975|", "|- Median Survival | 1.506 | 1.487 | 1.495 | |", "|- Events | 829 | 527 | 1356 | |", "|- time = 1 | 286 (68.7) | 202 (65.3) | 488 (67.4) | |", "|- time = 2 | 597 (34.4) | 391 (32.8) | 988 (33.7) | |", "|- time = 3 | 748 (17.5) | 481 (17.0) | 1229 (17.3) | |", "|- time = 4 | 809 (9.4) | 513 (10.9) | 1322 (10.1) | |", "|- time = 5 | 825 (6.3) | 525 (7.4) | 1350 (6.8) | |", "|- time = 1 | 626 | 380 | 1006 | |", "|- time = 2 | 309 | 190 | 499 | |", "|- time = 3 | 152 | 95 | 247 | |", "|- time = 4 | 57 | 51 | 108 | |", "|- time = 5 | 24 | 18 | 42 | |", "|- time = 1 | 626 (68.7) | 380 (65.3) | 1006 (67.4) | |", "|- time = 2 | 309 (34.4) | 190 (32.8) | 499 (33.7) | |", "|- time = 3 | 152 (17.5) | 95 (17.0) | 247 (17.3) | |", "|- time = 4 | 57 (9.4) | 51 (10.9) | 108 (10.1) | |", "|- time = 5 | 24 (6.3) | 18 (7.4) | 42 (6.8) | |", "|- Median Follow-Up | 4.665 | 4.413 | 4.561 | |" ) ) }) test_that("01/31/2018 and 6/4/18: row and cell percents (#9, #106)", { catstats <- c("Nmiss", "countpct", "countrowpct", "countcellpct", "binomCI", "rowbinomCI") expect_identical( capture.kable(summary(tableby(Group ~ Sex + ethan, data = mdat, cat.stats = catstats), text = TRUE)), c("| | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value|", "|:----------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|-------:|", "|Sex | | | | | 0.733|", "|- Female | 15 (50.0%) | 17 (56.7%) | 14 (46.7%) | 46 (51.1%) | |", "|- Male | 15 (50.0%) | 13 (43.3%) | 16 (53.3%) | 44 (48.9%) | |", "|- Female | 15 (32.6%) | 17 (37.0%) | 14 (30.4%) | 46 (100.0%) | |", "|- Male | 15 (34.1%) | 13 (29.5%) | 16 (36.4%) | 44 (100.0%) | |", "|- Female | 15 (16.7%) | 17 (18.9%) | 14 (15.6%) | 46 (51.1%) | |", "|- Male | 15 (16.7%) | 13 (14.4%) | 16 (17.8%) | 44 (48.9%) | |", "|- Female | 0.500 (0.313, 0.687) | 0.567 (0.374, 0.745) | 0.467 (0.283, 0.657) | 0.511 (0.403, 0.618) | |", "|- Male | 0.500 (0.313, 0.687) | 0.433 (0.255, 0.626) | 0.533 (0.343, 0.717) | 0.489 (0.382, 0.597) | |", "|- Female | 0.326 (0.195, 0.480) | 0.370 (0.232, 0.525) | 0.304 (0.177, 0.458) | 1.000 (0.923, 1.000) | |", "|- Male | 0.341 (0.205, 0.499) | 0.295 (0.168, 0.452) | 0.364 (0.224, 0.522) | 1.000 (0.920, 1.000) | |", "|ethan | | | | | 0.178|", "|- N-Miss | 3 | 0 | 0 | 3 | |", "|- Ethan | 17 (63.0%) | 13 (43.3%) | 12 (40.0%) | 42 (48.3%) | |", "|- Heinzen | 10 (37.0%) | 17 (56.7%) | 18 (60.0%) | 45 (51.7%) | |", "|- Ethan | 17 (40.5%) | 13 (31.0%) | 12 (28.6%) | 42 (100.0%) | |", "|- Heinzen | 10 (22.2%) | 17 (37.8%) | 18 (40.0%) | 45 (100.0%) | |", "|- Ethan | 17 (19.5%) | 13 (14.9%) | 12 (13.8%) | 42 (48.3%) | |", "|- Heinzen | 10 (11.5%) | 17 (19.5%) | 18 (20.7%) | 45 (51.7%) | |", "|- Ethan | 0.630 (0.424, 0.806) | 0.433 (0.255, 0.626) | 0.400 (0.227, 0.594) | 0.483 (0.374, 0.592) | |", "|- Heinzen | 0.370 (0.194, 0.576) | 0.567 (0.374, 0.745) | 0.600 (0.406, 0.773) | 0.517 (0.408, 0.626) | |", "|- Ethan | 0.405 (0.256, 0.567) | 0.310 (0.176, 0.471) | 0.286 (0.157, 0.446) | 1.000 (0.916, 1.000) | |", "|- Heinzen | 0.222 (0.112, 0.371) | 0.378 (0.238, 0.535) | 0.400 (0.257, 0.557) | 1.000 (0.921, 1.000) | |" ) ) }) test_that("01/31/2018: include NAs in percents (#57, #62)", { mdat2 <- mdat attr(mdat2$ethan, "label") <- "Ethan" expect_identical( capture.kable(summary(tableby(Sex ~ includeNA(ethan, label = "N-Miss") + includeNA(ethan, first = TRUE, label = "N-Miss"), data = mdat2, cat.stats = "countrowpct"), text = TRUE)), c("| | Female (N=46) | Male (N=44) | Total (N=90) | p value|", "|:----------|:-------------:|:-----------:|:------------:|-------:|", "|Ethan | | | | 0.229|", "|- Ethan | 18 (42.9%) | 24 (57.1%) | 42 (100.0%) | |", "|- Heinzen | 27 (60.0%) | 18 (40.0%) | 45 (100.0%) | |", "|- N-Miss | 1 (33.3%) | 2 (66.7%) | 3 (100.0%) | |", "|Ethan | | | | 0.229|", "|- N-Miss | 1 (33.3%) | 2 (66.7%) | 3 (100.0%) | |", "|- Ethan | 18 (42.9%) | 24 (57.1%) | 42 (100.0%) | |", "|- Heinzen | 27 (60.0%) | 18 (40.0%) | 45 (100.0%) | |" ) ) expect_identical( capture.kable(summary(tableby(Sex ~ includeNA(ethan, label = "N-Miss"), data = mdat2, cat.stats = "countpct"), text = TRUE)), c("| | Female (N=46) | Male (N=44) | Total (N=90) | p value|", "|:----------|:-------------:|:-----------:|:------------:|-------:|", "|Ethan | | | | 0.229|", "|- Ethan | 18 (39.1%) | 24 (54.5%) | 42 (46.7%) | |", "|- Heinzen | 27 (58.7%) | 18 (40.9%) | 45 (50.0%) | |", "|- N-Miss | 1 (2.2%) | 2 (4.5%) | 3 (3.3%) | |" ) ) }) test_that("02/23/2018: wrapping long labels (#59)", { labs <- list( Group = "This is a really long label for the Group variable", time = "Another really long label. Can you believe how long this is", dt = "ThisLabelHasNoSpacesSoLetsSeeHowItBehaves" ) expect_identical( capture.kable(print(summary(tableby(Sex ~ Group + time + dt, data = set_labels(mdat, labs)), text = TRUE), width = 30)), c("| | Female (N=46) | Male (N=44) | Total (N=90) | p value|", "|:------------------------------|:-----------------------:|:-----------------------:|:-----------------------:|-------:|", "|This is a really long label | | | | 0.733|", "|for the Group variable | | | | |", "|- High | 15 (32.6%) | 15 (34.1%) | 30 (33.3%) | |", "|- Low | 17 (37.0%) | 13 (29.5%) | 30 (33.3%) | |", "|- Med | 14 (30.4%) | 16 (36.4%) | 30 (33.3%) | |", "|Another really long label. | | | | 0.237|", "|Can you believe how long this | | | | |", "|is | | | | |", "|- Mean (SD) | 3.609 (1.926) | 4.114 (2.093) | 3.856 (2.014) | |", "|- Range | 0.000 - 7.000 | 0.000 - 7.000 | 0.000 - 7.000 | |", "|ThisLabelHasNoSpacesSoLetsSeeH | | | | 0.339|", "|owItBehaves | | | | |", "|- Median | 1948-12-07 | 1951-03-26 | 1949-10-07 | |", "|- Range | 1935-08-15 - 1959-09-06 | 1937-02-08 - 1968-05-14 | 1935-08-15 - 1968-05-14 | |" ) ) }) test_that("02/26/2018: all NA vars (#80, #81, #82, #83, #84)", { dat <- data.frame(y = factor(c("A", "A", "A", "B", "B")), x = c(1, 2, 3, NA, NA)) expect_identical( capture.kable(summary(tableby(y ~ x, data = dat, numeric.test = "anova"), text = TRUE)), c("| | A (N=3) | B (N=2) | Total (N=5) | p value|", "|:------------|:-------------:|:-------:|:-------------:|-------:|", "|x | | | | |", "|- N-Miss | 0 | 2 | 2 | |", "|- Mean (SD) | 2.000 (1.000) | NA | 2.000 (1.000) | |", "|- Range | 1.000 - 3.000 | NA | 1.000 - 3.000 | |" ) ) expect_identical( capture.kable(summary(tableby(y ~ x, data = dat, numeric.test = "kwt"), text = TRUE)), c("| | A (N=3) | B (N=2) | Total (N=5) | p value|", "|:------------|:-------------:|:-------:|:-------------:|-------:|", "|x | | | | |", "|- N-Miss | 0 | 2 | 2 | |", "|- Mean (SD) | 2.000 (1.000) | NA | 2.000 (1.000) | |", "|- Range | 1.000 - 3.000 | NA | 1.000 - 3.000 | |" ) ) dat2 <- data.frame(Group = rep(1:2, each=5), A = rep(c(1, NA), each=5), B = rep(factor(c("A", NA)), each=5)) expect_identical( capture.kable(summary(tableby(Group ~ A + B, data = dat2), text = TRUE)), c("| | 1 (N=5) | 2 (N=5) | Total (N=10) | p value|", "|:------------|:-------------:|:-------:|:-------------:|-------:|", "|A | | | | |", "|- N-Miss | 0 | 5 | 5 | |", "|- Mean (SD) | 1.000 (0.000) | NA | 1.000 (0.000) | |", "|- Range | 1.000 - 1.000 | NA | 1.000 - 1.000 | |", "|B | | | | 0.025|", "|- N-Miss | 0 | 5 | 5 | |", "|- A | 5 (100.0%) | 0 | 5 (100.0%) | |" ) ) skip_if_not(getRversion() >= "3.3.0") skip_if_not_installed("survival", "2.41-3") require(survival) expect_identical( capture.kable(summary(tableby(y ~ Surv(x), data=dat, times = 1:2, surv.stats=c("medSurv", "Nevents", "NeventsSurv", "Nrisk", "NriskSurv", "medTime")), text = TRUE)), c("| | A (N=3) | B (N=2) | Total (N=5) | p value|", "|:-------------------|:--------:|:-------:|:-----------:|-------:|", "|Surv(x) | | | | |", "|- Median Survival | 2.000 | NA | 2.000 | |", "|- Events | 3 | NA | 3 | |", "|- time = 1 | 1 (66.7) | NA | 1 (66.7) | |", "|- time = 2 | 2 (33.3) | NA | 2 (33.3) | |", "|- time = 1 | 3 | NA | 3 | |", "|- time = 2 | 2 | NA | 2 | |", "|- time = 1 | 3 (66.7) | NA | 3 (66.7) | |", "|- time = 2 | 2 (33.3) | NA | 2 (33.3) | |", "|- Median Follow-Up | NA | NA | NA | |" ) ) }) test_that("03/07/2018 and 07/17/2019: quantiles for dates and IQR and mad (#86)", { expect_identical( capture.kable(summary(tableby(Sex ~ dt + ht_in + Age, data = mdat, numeric.stats = c("q1q3", "iqr", "medianmad"), date.stats = c("q1q3", "iqr", "medianmad")), text = TRUE)), c("| | Female (N=46) | Male (N=44) | Total (N=90) | p value|", "|:----------------|:--------------------------:|:--------------------------:|:--------------------------:|-------:|", "|dt | | | | 0.339|", "|- Q1, Q3 | 1946-04-26, 1953-11-07 | 1946-11-27, 1954-06-13 | 1946-06-13, 1954-04-26 | |", "|- IQR | 2751.250 days | 2755.500 days | 2873.250 days | |", "|- Median (MAD) | 1948-12-07 (1574.000 days) | 1951-03-26 (1420.500 days) | 1949-10-07 (1601.500 days) | |", "|Height in Inches | | | | 0.786|", "|- Q1, Q3 | 61.250, 68.000 | 62.000, 68.000 | 62.000, 68.000 | |", "|- IQR | 6.750 | 6.000 | 6.000 | |", "|- Median (MAD) | 65.000 (3.000) | 64.000 (3.000) | 64.000 (4.000) | |", "|Age in Years | | | | 0.818|", "|- Q1, Q3 | 36.000, 44.000 | 37.000, 41.250 | 36.000, 43.000 | |", "|- IQR | 8.000 | 4.250 | 7.000 | |", "|- Median (MAD) | 39.000 (4.000) | 39.500 (2.500) | 39.000 (3.000) | |" ) ) }) test_that("06/19/2018: term.name (#109)", { expect_identical( capture.kable(summary(tableby(Group ~ ethan, data = mdat), text = TRUE, term.name = "Term")), c("|Term | High (N=30) | Low (N=30) | Med (N=30) | Total (N=90) | p value|", "|:----------|:-----------:|:----------:|:----------:|:------------:|-------:|", "|ethan | | | | | 0.178|", "|- N-Miss | 3 | 0 | 0 | 3 | |", "|- Ethan | 17 (63.0%) | 13 (43.3%) | 12 (40.0%) | 42 (48.3%) | |", "|- Heinzen | 10 (37.0%) | 17 (56.7%) | 18 (60.0%) | 45 (51.7%) | |" ) ) }) mockstudy$grp <- c(rep("Group1", 749), rep("Group2",749), "") test_that("08/23/2018: empty string in by-variable (#121)", expect_warning(summary(tableby(grp ~ race, data=mockstudy)), "Empty")) test_that("08/24/2018: latex (#123, #258)", { expect_identical( capture.output(summary(tableby(Group ~ ethan, data = mdat), text = "latex")), c("" , "\\begin{tabular}{l|c|c|c|c|r}" , "\\hline" , " & High (N=30) & Low (N=30) & Med (N=30) & Total (N=90) & p value\\\\", "\\hline" , "\\textbf{ethan} & & & & & 0.178\\\\" , "\\hline" , "~~~N-Miss & 3 & 0 & 0 & 3 & \\\\" , "\\hline" , "~~~Ethan & 17 (63.0\\%) & 13 (43.3\\%) & 12 (40.0\\%) & 42 (48.3\\%) & \\\\" , "\\hline" , "~~~Heinzen & 10 (37.0\\%) & 17 (56.7\\%) & 18 (60.0\\%) & 45 (51.7\\%) & \\\\", "\\hline" , "\\end{tabular}" , "" ) ) }) test_that("09/07/2018: using countpct with numerics (#137)", { expect_identical( capture.kable(summary(tableby(y ~ chisq(x, "countpct"), data = data.frame(y = c("A", "B", "C"), x = c(1, 2, 3))), text = TRUE)), c("| | A (N=1) | B (N=1) | C (N=1) | Total (N=3) | p value|", "|:----|:----------:|:----------:|:----------:|:-----------:|-------:|", "|x | | | | | 0.199|", "|- 1 | 1 (100.0%) | 0 (0.0%) | 0 (0.0%) | 1 (33.3%) | |", "|- 2 | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 1 (33.3%) | |", "|- 3 | 0 (0.0%) | 0 (0.0%) | 1 (100.0%) | 1 (33.3%) | |" ) ) expect_identical( capture.kable(summary(tableby(sex ~ anova(ps, "count", "meansd"), data = mockstudy), text = TRUE)), c("| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|", "|:------------|:-------------:|:--------------:|:--------------:|-------:|", "|ps | | | | 0.345|", "|- 0 | 391 | 244 | 635 | |", "|- 1 | 329 | 202 | 531 | |", "|- 2 | 34 | 33 | 67 | |", "|- Mean (SD) | 0.527 (0.583) | 0.559 (0.621) | 0.539 (0.598) | |" ) ) }) test_that("09/07/2018: specifying different digits (#107) and cat.simplify (#134) and numeric.simplify (#139)", { tmp.mockstudy <- mockstudy tmp.mockstudy$date <- as.Date("2019-03-01") + c(rep(1, times = 750), rep(2, times = 749)) tmp.mockstudy$date2 <- tmp.mockstudy$date tmp.mockstudy$ord <- ordered(c(rep("A", times = 749), rep("B", times = 750))) tmp.mockstudy$ord2 <- tmp.mockstudy$ord expect_identical( capture.kable(summary(tableby(arm ~ I(age/10) + chisq(sex, digits.count=1, digits.pct=0, cat.simplify=TRUE) + race + anova(ast, digits=0, digits.count=1) + kwt(fu.time, "medianq1q3", digits=0) + kwt(date, date.simplify=TRUE) + notest(ord, ordered.simplify=TRUE) + date2 + notest(ord2), numeric.simplify=TRUE, date.stats = "median", data = tmp.mockstudy), text=TRUE)), c("| | A: IFL (N=428) | F: FOLFOX (N=691) | G: IROX (N=380) | Total (N=1499) | p value|", "|:-------------------|:--------------:|:-----------------:|:---------------:|:--------------:|-------:|", "|Age in Years | | | | | 0.614|", "|- Mean (SD) | 5.967 (1.136) | 6.030 (1.163) | 5.976 (1.150) | 5.999 (1.152) | |", "|- Range | 2.700 - 8.800 | 1.900 - 8.800 | 2.600 - 8.500 | 1.900 - 8.800 | |", "|sex | 151.0 (35%) | 280.0 (41%) | 152.0 (40%) | 583.0 (39%) | 0.190|", "|Race | | | | | 0.367|", "|- N-Miss | 0 | 6 | 1 | 7 | |", "|- African-Am | 39 (9.1%) | 49 (7.2%) | 27 (7.1%) | 115 (7.7%) | |", "|- Asian | 1 (0.2%) | 14 (2.0%) | 3 (0.8%) | 18 (1.2%) | |", "|- Caucasian | 371 (86.7%) | 586 (85.5%) | 331 (87.3%) | 1288 (86.3%) | |", "|- Hawaii/Pacific | 1 (0.2%) | 3 (0.4%) | 1 (0.3%) | 5 (0.3%) | |", "|- Hispanic | 12 (2.8%) | 28 (4.1%) | 14 (3.7%) | 54 (3.6%) | |", "|- Native-Am/Alaska | 2 (0.5%) | 1 (0.1%) | 2 (0.5%) | 5 (0.3%) | |", "|- Other | 2 (0.5%) | 4 (0.6%) | 1 (0.3%) | 7 (0.5%) | |", "|ast | | | | | 0.507|", "|- N-Miss | 69.0 | 141.0 | 56.0 | 266.0 | |", "|- Mean (SD) | 37 (28) | 35 (27) | 36 (26) | 36 (27) | |", "|- Range | 10 - 205 | 7 - 174 | 5 - 176 | 5 - 205 | |", "|fu.time | 446 (256, 724) | 601 (345, 1046) | 516 (306, 807) | 542 (310, 878) | < 0.001|", "|date | 2019-03-02 | 2019-03-03 | 2019-03-02 | 2019-03-02 | < 0.001|", "|ord | 170 (39.7%) | 439 (63.5%) | 141 (37.1%) | 750 (50.0%) | |", "|date2 | | | | | < 0.001|", "|- Median | 2019-03-02 | 2019-03-03 | 2019-03-02 | 2019-03-02 | |", "|ord2 | | | | | |", "|- A | 258 (60.3%) | 252 (36.5%) | 239 (62.9%) | 749 (50.0%) | |", "|- B | 170 (39.7%) | 439 (63.5%) | 141 (37.1%) | 750 (50.0%) | |" ) ) }) test_that("09/19/2018: specifying different stats for character and logical variables (#142)", { expect_identical( capture.kable(summary(tableby(arm ~ chisq(race, "countpct") + chisq(I(sex == "Male"), "count"), data = mockstudy), text = TRUE)), c('| | A: IFL (N=428) | F: FOLFOX (N=691) | G: IROX (N=380) | Total (N=1499) | p value|', '|:-------------------|:--------------:|:-----------------:|:---------------:|:--------------:|-------:|', '|Race | | | | | 0.367|', '|- African-Am | 39 (9.1%) | 49 (7.2%) | 27 (7.1%) | 115 (7.7%) | |', '|- Asian | 1 (0.2%) | 14 (2.0%) | 3 (0.8%) | 18 (1.2%) | |', '|- Caucasian | 371 (86.7%) | 586 (85.5%) | 331 (87.3%) | 1288 (86.3%) | |', '|- Hawaii/Pacific | 1 (0.2%) | 3 (0.4%) | 1 (0.3%) | 5 (0.3%) | |', '|- Hispanic | 12 (2.8%) | 28 (4.1%) | 14 (3.7%) | 54 (3.6%) | |', '|- Native-Am/Alaska | 2 (0.5%) | 1 (0.1%) | 2 (0.5%) | 5 (0.3%) | |', '|- Other | 2 (0.5%) | 4 (0.6%) | 1 (0.3%) | 7 (0.5%) | |', '|I(sex == "Male") | | | | | 0.190|', '|- FALSE | 151 | 280 | 152 | 583 | |', '|- TRUE | 277 | 411 | 228 | 916 | |' ) ) }) test_that("10/19/2018: padjust works on tableby objects (#146)", { tab <- tableby(sex ~ age + arm + race + ps + alk.phos, data = mockstudy) expect_identical( capture.kable(summary(padjust(tab, method = "bonfer"), text = TRUE)), c("| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|", "|:-------------------|:-----------------:|:-----------------:|:-----------------:|-------:|", "|Age in Years | | | | 0.238|", "|- Mean (SD) | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | |", "|- Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | |", "|Treatment Arm | | | | 0.952|", "|- A: IFL | 277 (30.2%) | 151 (25.9%) | 428 (28.6%) | |", "|- F: FOLFOX | 411 (44.9%) | 280 (48.0%) | 691 (46.1%) | |", "|- G: IROX | 228 (24.9%) | 152 (26.1%) | 380 (25.4%) | |", "|Race | | | | 1.000|", "|- N-Miss | 6 | 1 | 7 | |", "|- African-Am | 65 (7.1%) | 50 (8.6%) | 115 (7.7%) | |", "|- Asian | 11 (1.2%) | 7 (1.2%) | 18 (1.2%) | |", "|- Caucasian | 787 (86.5%) | 501 (86.1%) | 1288 (86.3%) | |", "|- Hawaii/Pacific | 2 (0.2%) | 3 (0.5%) | 5 (0.3%) | |", "|- Hispanic | 37 (4.1%) | 17 (2.9%) | 54 (3.6%) | |", "|- Native-Am/Alaska | 3 (0.3%) | 2 (0.3%) | 5 (0.3%) | |", "|- Other | 5 (0.5%) | 2 (0.3%) | 7 (0.5%) | |", "|ps | | | | 1.000|", "|- N-Miss | 162 | 104 | 266 | |", "|- Mean (SD) | 0.527 (0.583) | 0.559 (0.621) | 0.539 (0.598) | |", "|- Range | 0.000 - 2.000 | 0.000 - 2.000 | 0.000 - 2.000 | |", "|alk.phos | | | | 1.000|", "|- N-Miss | 162 | 104 | 266 | |", "|- Mean (SD) | 167.893 (130.754) | 170.664 (124.965) | 168.969 (128.492) | |", "|- Range | 10.000 - 1014.000 | 7.000 - 771.000 | 7.000 - 1014.000 | |" ) ) expect_identical( capture.kable(summary(padjust(tab, "bonfer"), pfootnote = TRUE)), capture.kable(padjust(summary(tab, pfootnote = TRUE), "bonfer")) ) }) test_that("02/26/2019: digits and stats are maintained when subsetting (#182, #183)", { mck <- mockstudy attr(mck$arm, "name") <- "armm" tmp <- tableby(sex ~ kwt(age, digits = 1, "meansd") + chisq(arm, "count", digits.count = 1), data = mck, subset = age < 65) expect_identical( capture.kable(summary(tmp)), c("| | Male (N=552) | Female (N=375) | Total (N=927) | p value|", "|:---------------------------|:------------:|:--------------:|:-------------:|-------:|", "|**Age in Years** | | | | 0.143|", "|   Mean (SD) | 53.3 (8.4) | 52.5 (8.6) | 53.0 (8.5) | |", "|**Treatment Arm** | | | | 0.404|", "|   A: IFL | 169.0 | 100.0 | 269.0 | |", "|   F: FOLFOX | 246.0 | 173.0 | 419.0 | |", "|   G: IROX | 137.0 | 102.0 | 239.0 | |" ) ) }) test_that("03/27/2019: cat.simplify and numeric.simplify work right, even with custom stats (#199, #200, #203)", { dat <- data.frame(x = c("A", "A")) expect_identical( capture.kable(summary(tableby(~ x, data = dat, numeric.simplify = TRUE), text = TRUE)), c("| | Overall (N=2) |", "|:----|:-------------:|", "|x | |", "|- A | 2 (100.0%) |" ) ) expect_identical( capture.kable(summary(tableby(~ x, data = dat, cat.simplify = TRUE), text = TRUE)), c("| | Overall (N=2) |", "|:--|:-------------:|", "|x | 2 (100.0%) |" ) ) # mystat <- countpct # expect_identical( # capture.kable(summary(tableby(~ x, data = dat, cat.simplify = TRUE, cat.stats = "mystat"), text = TRUE)), # c("| | Overall (N=2) |", # "|:--|:-------------:|", # "|x | 2 (100.0%) |" # ) # ) }) test_that("04/12/2019: Missing Surv()[,2] (#208)", { skip_if_not(getRversion() >= "3.3.0") skip_if_not_installed("survival", "2.41-3") require(survival) dat <- data.frame(by = c(1, 1, 2), time = c(1, 2, 2), event = c(0, NA, 1)) expect_identical( capture.kable(summary(tableby(by ~ Surv(time, event), data = dat), text = TRUE)), c("| | 1 (N=2) | 2 (N=1) | Total (N=3) | p value|", "|:------------------|:-------:|:-------:|:-----------:|-------:|", "|Surv(time, event) | | | | 1.000|", "|- N-Miss | 1 | 0 | 1 | |", "|- Events | 0 | 1 | 1 | |", "|- Median Survival | NA | 2.000 | 2.000 | |" ) ) }) test_that("06/12/2019: labelTranslations for non-default stat tests (#220, #222)", { expect_identical( capture.kable(summary(tableby(sex ~ age + kwt(fu.time), data = mockstudy), labelTranslations = list(fu.time = "FU time"), text = TRUE)), c("| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|", "|:------------|:-----------------:|:-----------------:|:-----------------:|-------:|", "|Age in Years | | | | 0.048|", "|- Mean (SD) | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | |", "|- Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | |", "|FU time | | | | 0.679|", "|- Mean (SD) | 649.345 (454.332) | 648.674 (475.472) | 649.084 (462.511) | |", "|- Range | 0.000 - 2472.000 | 9.000 - 2441.000 | 0.000 - 2472.000 | |" ) ) expect_identical( capture.kable(summary(tableby(sex ~ age + kwt(fu.time), data = mockstudy), labelTranslations = list(fu.time = "FU time"), text = TRUE)), capture.kable(summary(tableby(sex ~ age + kwt(fu.time), data = mockstudy), labelTranslations = list(`kwt(fu.time)` = "FU time"), text = TRUE)) ) expect_identical( capture.kable(summary(tableby(sex ~ kwt(fu.time), data = set_labels(mockstudy, list(fu.time = "FU time"))), labelTranslations = list(NULL))), capture.kable(summary(tableby(sex ~ kwt(fu.time), data = mockstudy))) ) }) test_that("06/24/2019: fe() and chisq() works with only one level (#227)", { skip_if_not_installed("coin") expect_identical( capture.kable(summary(tableby(sex ~ fe(arm), data = mockstudy, subset = arm == "F: FOLFOX"), text = TRUE)), c("| | Male (N=411) | Female (N=280) | Total (N=691) | p value|", "|:-------------|:------------:|:--------------:|:-------------:|-------:|", "|Treatment Arm | | | | |", "|- F: FOLFOX | 411 (100.0%) | 280 (100.0%) | 691 (100.0%) | |" ) ) expect_identical( capture.kable(summary(tableby(sex ~ chisq(arm), data = mockstudy, subset = arm == "F: FOLFOX"), text = TRUE)), c("| | Male (N=411) | Female (N=280) | Total (N=691) | p value|", "|:-------------|:------------:|:--------------:|:-------------:|-------:|", "|Treatment Arm | | | | < 0.001|", "|- F: FOLFOX | 411 (100.0%) | 280 (100.0%) | 691 (100.0%) | |" ) ) expect_identical( capture.kable(summary(tableby(sex ~ ordered(arm), data = mockstudy, subset = arm == "F: FOLFOX"), text = TRUE)), c("| | Male (N=411) | Female (N=280) | Total (N=691) | p value|", "|:------------|:------------:|:--------------:|:-------------:|-------:|", "|ordered(arm) | | | | |", "|- A: IFL | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |", "|- F: FOLFOX | 411 (100.0%) | 280 (100.0%) | 691 (100.0%) | |", "|- G: IROX | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |" ) ) }) test_that("07/16/2019: n's in tableby header work with weights (#229, #257)", { d <- data.frame(a = 1:10, b = rep(c("A", "B"), 5), w = 1:10) expect_identical( capture.kable(summary(tableby(b ~ a, weights = w, data = d), text = TRUE)), c("| | A (N=25) | B (N=30) | Total (N=55) |", "|:------------|:-------------:|:--------------:|:--------------:|", "|a | | | |", "|- Mean (SD) | 6.600 (2.719) | 7.333 (2.870) | 7.000 (2.622) |", "|- Range | 1.000 - 9.000 | 2.000 - 10.000 | 1.000 - 10.000 |" ) ) d$w <- d$w + 0.111 expect_identical( capture.kable(summary(tableby(b ~ a, weights = w, data = d, digits.n = 2), text = TRUE)), c("| | A (N=25.55) | B (N=30.55) | Total (N=56.11) |", "|:------------|:-------------:|:--------------:|:---------------:|", "|a | | | |", "|- Mean (SD) | 6.565 (2.741) | 7.309 (2.881) | 6.970 (2.640) |", "|- Range | 1.000 - 9.000 | 2.000 - 10.000 | 1.000 - 10.000 |" ) ) }) test_that("07/17/2019: fix bug with confidence limits (#234)", { expect_identical( capture.kable(summary(tableby(sex ~ arm, data = mockstudy, cat.stats = "binomCI", control = tableby.control(conf.level = 0.9)), text = TRUE)), c("| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|", "|:-------------|:--------------------:|:--------------------:|:--------------------:|-------:|", "|Treatment Arm | | | | 0.190|", "|- A: IFL | 0.302 (0.277, 0.328) | 0.259 (0.229, 0.291) | 0.286 (0.266, 0.305) | |", "|- F: FOLFOX | 0.449 (0.421, 0.476) | 0.480 (0.446, 0.515) | 0.461 (0.440, 0.483) | |", "|- G: IROX | 0.249 (0.225, 0.274) | 0.261 (0.231, 0.292) | 0.254 (0.235, 0.273) | |" ) ) }) test_that("07/17/2019: run stat test even if one group has 0 observations (#233, #250)", { skip_if_not_installed("coin") dd <- data.frame(group=factor(rep(c("A", "B", "C"), 20)), x1=1:60, x2 = rep(c("D", "E", "F"), each = 20)) dd$x1[dd$group == "C"] <- NA dd$x2[dd$group == "C"] <- NA expect_identical( capture.kable(summary(tableby(group ~ x1, data = dd), text = TRUE)), c("| | A (N=20) | B (N=20) | C (N=20) | Total (N=60) | p value|", "|:------------|:---------------:|:---------------:|:--------:|:---------------:|-------:|", "|x1 | | | | | |", "|- N-Miss | 0 | 0 | 20 | 20 | |", "|- Mean (SD) | 29.500 (17.748) | 30.500 (17.748) | NA | 30.000 (17.527) | |", "|- Range | 1.000 - 58.000 | 2.000 - 59.000 | NA | 1.000 - 59.000 | |" ) ) expect_identical( capture.kable(summary(tableby(group ~ x1, data = dd, test.always = TRUE), text = TRUE)), c("| | A (N=20) | B (N=20) | C (N=20) | Total (N=60) | p value|", "|:------------|:---------------:|:---------------:|:--------:|:---------------:|-------:|", "|x1 | | | | | 0.860|", "|- N-Miss | 0 | 0 | 20 | 20 | |", "|- Mean (SD) | 29.500 (17.748) | 30.500 (17.748) | NA | 30.000 (17.527) | |", "|- Range | 1.000 - 58.000 | 2.000 - 59.000 | NA | 1.000 - 59.000 | |" ) ) expect_identical( capture.kable(summary(tableby(group ~ x2, data = dd), text = TRUE)), c("| | A (N=20) | B (N=20) | C (N=20) | Total (N=60) | p value|", "|:---------|:---------:|:---------:|:--------:|:------------:|-------:|", "|x2 | | | | | |", "|- N-Miss | 0 | 0 | 20 | 20 | |", "|- D | 7 (35.0%) | 7 (35.0%) | 0 | 14 (35.0%) | |", "|- E | 7 (35.0%) | 6 (30.0%) | 0 | 13 (32.5%) | |", "|- F | 6 (30.0%) | 7 (35.0%) | 0 | 13 (32.5%) | |" ) ) expect_identical( capture.kable(summary(tableby(group ~ x2, data = dd, test.always = TRUE), text = TRUE)), c("| | A (N=20) | B (N=20) | C (N=20) | Total (N=60) | p value|", "|:---------|:---------:|:---------:|:--------:|:------------:|-------:|", "|x2 | | | | | 0.926|", "|- N-Miss | 0 | 0 | 20 | 20 | |", "|- D | 7 (35.0%) | 7 (35.0%) | 0 | 14 (35.0%) | |", "|- E | 7 (35.0%) | 6 (30.0%) | 0 | 13 (32.5%) | |", "|- F | 6 (30.0%) | 7 (35.0%) | 0 | 13 (32.5%) | |" ) ) expect_identical( capture.kable(summary(tableby(group ~ x2, data = dd), text = TRUE)), capture.kable(summary(tableby(group ~ fe(x2), data = dd), text = TRUE)) ) expect_identical( capture.kable(summary(tableby(group ~ fe(x2), data = dd, test.always = TRUE, subset = group != "A" | x2 != "F"), text = TRUE)), c("| | A (N=14) | B (N=20) | C (N=20) | Total (N=54) | p value|", "|:---------|:---------:|:---------:|:--------:|:------------:|-------:|", "|x2 | | | | | 0.055|", "|- N-Miss | 0 | 0 | 20 | 20 | |", "|- D | 7 (50.0%) | 7 (35.0%) | 0 | 14 (41.2%) | |", "|- E | 7 (50.0%) | 6 (30.0%) | 0 | 13 (38.2%) | |", "|- F | 0 (0.0%) | 7 (35.0%) | 0 | 7 (20.6%) | |" ) ) expect_identical( capture.kable(summary(tableby(group ~ x2, data = dd), text = TRUE)), capture.kable(summary(tableby(group ~ ordered(x2), data = dd), labelTranslations = list("ordered(x2)" = "x2"), text = TRUE)) ) skip_if_not(getRversion() >= "3.3.0") skip_if_not_installed("coin") expect_identical( capture.kable(summary(tableby(group ~ ordered(x2), data = dd, test.always = TRUE), labelTranslations = list("ordered(x2)" = "x2"), text = TRUE)), c("| | A (N=20) | B (N=20) | C (N=20) | Total (N=60) | p value|", "|:---------|:---------:|:---------:|:--------:|:------------:|-------:|", "|x2 | | | | | 0.849|", "|- N-Miss | 0 | 0 | 20 | 20 | |", "|- D | 7 (35.0%) | 7 (35.0%) | 0 | 14 (35.0%) | |", "|- E | 7 (35.0%) | 6 (30.0%) | 0 | 13 (32.5%) | |", "|- F | 6 (30.0%) | 7 (35.0%) | 0 | 13 (32.5%) | |" ) ) skip_if_not_installed("survival", "2.41-3") require(survival) dd$surv <- Surv(1:60) dd$surv[dd$group == "C"] <- NA expect_identical( capture.kable(summary(tableby(group ~ surv, data = dd), text = TRUE)), c("| | A (N=20) | B (N=20) | C (N=20) | Total (N=60) | p value|", "|:------------------|:--------:|:--------:|:--------:|:------------:|-------:|", "|surv | | | | | |", "|- N-Miss | 0 | 0 | 20 | 20 | |", "|- Events | 20 | 20 | NA | 40 | |", "|- Median Survival | 29.500 | 30.500 | NA | 30.000 | |" ) ) expect_identical( capture.kable(summary(tableby(group ~ surv, data = dd, test.always = TRUE), text = TRUE)), c("| | A (N=20) | B (N=20) | C (N=20) | Total (N=60) | p value|", "|:------------------|:--------:|:--------:|:--------:|:------------:|-------:|", "|surv | | | | | 0.690|", "|- N-Miss | 0 | 0 | 20 | 20 | |", "|- Events | 20 | 20 | NA | 40 | |", "|- Median Survival | 29.500 | 30.500 | NA | 30.000 | |" ) ) }) test_that("07/30/2019: modpval.tableby and factors (#239)", { tab1 <- tableby(arm ~ sex + age + race, total = FALSE, test = FALSE, data = mockstudy) mypval <- data.frame( byvar = factor("arm"), variable = factor(c("sex", "age", "race")), adj.pvalue = 1:3 ) tab2 <- modpval.tableby(tab1, mypval, use.pname = TRUE) expect_equal(tests(tab2)$adj.pvalue, mypval$adj.pvalue) }) test_that("07/30/2019: summary.tableby and pre-formatted p-values (#249)", { tab1 <- tableby(arm ~ sex + age, total = FALSE, test = FALSE, data = mockstudy) mypval <- data.frame(byvar = "arm", variable = "sex", adj.pvalue = "0.0001", stringsAsFactors = FALSE) tab2 <- modpval.tableby(tab1, mypval, use.pname = TRUE) expect_identical( capture.kable(summary(tab2, text = TRUE)), c("| | A: IFL (N=428) | F: FOLFOX (N=691) | G: IROX (N=380) | adj.pvalue|", "|:------------|:---------------:|:-----------------:|:---------------:|----------:|", "|sex | | | | 0.0001|", "|- Male | 277 (64.7%) | 411 (59.5%) | 228 (60.0%) | |", "|- Female | 151 (35.3%) | 280 (40.5%) | 152 (40.0%) | |", "|Age in Years | | | | |", "|- Mean (SD) | 59.673 (11.365) | 60.301 (11.632) | 59.763 (11.499) | |", "|- Range | 27.000 - 88.000 | 19.000 - 88.000 | 26.000 - 85.000 | |" ) ) }) test_that("10/09/2019: change title for overall and total (#253, #261, #272)", { tab1 <- tableby(~ sex + age, data = mockstudy, stats.labels = list(overall = "Total")) tab2 <- tableby(~ sex + age, data = mockstudy, stats.labels = list(overall = "Hello")) expect_identical( capture.kable(summary(tab1, text = TRUE)), c("| | Total (N=1499) |", "|:------------|:---------------:|", "|sex | |", "|- Male | 916 (61.1%) |", "|- Female | 583 (38.9%) |", "|Age in Years | |", "|- Mean (SD) | 59.985 (11.519) |", "|- Range | 19.000 - 88.000 |" ) ) expect_identical( capture.kable(summary(tab1, text = TRUE)), sub("Hello", "Total", capture.kable(summary(tab2, text = TRUE))) ) d <- data.frame( x = 10:1, by = factor(rep(c("b", "Total"), each = 5), levels = c("b", "Total")) ) tab <- tableby(by ~ x, data = d, stats.labels = list(total = "Total 2")) expect_identical( capture.kable(summary(tab, text = TRUE)), c("| | b (N=5) | Total (N=5) | Total 2 (N=10) | p value|", "|:------------|:--------------:|:-------------:|:--------------:|-------:|", "|x | | | | 0.001|", "|- Mean (SD) | 8.000 (1.581) | 3.000 (1.581) | 5.500 (3.028) | |", "|- Range | 6.000 - 10.000 | 1.000 - 5.000 | 1.000 - 10.000 | |" ) ) expect_identical( capture.kable(summary(tab, text = TRUE, total = FALSE)), c("| | b (N=5) | Total (N=5) | p value|", "|:------------|:--------------:|:-------------:|-------:|", "|x | | | 0.001|", "|- Mean (SD) | 8.000 (1.581) | 3.000 (1.581) | |", "|- Range | 6.000 - 10.000 | 1.000 - 5.000 | |" ) ) tab3 <- tableby(sex ~ age + arm, data = mockstudy, stats.labels = list(total = "Overa"), cat.stats = c("countpct", "countrowpct", "rowbinomCI")) tab4 <- tableby(sex ~ age + arm, data = mockstudy, stats.labels = list(total = "Hello"), cat.stats = c("countpct", "countrowpct", "rowbinomCI")) tab5 <- tableby(sex ~ age + arm, data = mockstudy, stats.labels = list(total = "Total"), cat.stats = c("countpct", "countrowpct", "rowbinomCI")) expect_identical( capture.kable(summary(tab5, text = TRUE)), c("| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|", "|:-------------|:--------------------:|:--------------------:|:--------------------:|-------:|", "|Age in Years | | | | 0.048|", "|- Mean (SD) | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | |", "|- Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | |", "|Treatment Arm | | | | 0.190|", "|- A: IFL | 277 (30.2%) | 151 (25.9%) | 428 (28.6%) | |", "|- F: FOLFOX | 411 (44.9%) | 280 (48.0%) | 691 (46.1%) | |", "|- G: IROX | 228 (24.9%) | 152 (26.1%) | 380 (25.4%) | |", "|- A: IFL | 277 (64.7%) | 151 (35.3%) | 428 (100.0%) | |", "|- F: FOLFOX | 411 (59.5%) | 280 (40.5%) | 691 (100.0%) | |", "|- G: IROX | 228 (60.0%) | 152 (40.0%) | 380 (100.0%) | |", "|- A: IFL | 0.647 (0.600, 0.692) | 0.353 (0.308, 0.400) | 1.000 (0.991, 1.000) | |", "|- F: FOLFOX | 0.595 (0.557, 0.632) | 0.405 (0.368, 0.443) | 1.000 (0.995, 1.000) | |", "|- G: IROX | 0.600 (0.549, 0.650) | 0.400 (0.350, 0.451) | 1.000 (0.990, 1.000) | |" ) ) expect_identical( capture.kable(summary(tab3, text = TRUE)), sub("Total", "Overa", capture.kable(summary(tab5, text = TRUE))) ) expect_identical( capture.kable(summary(tab4, text = TRUE)), sub("Total", "Hello", capture.kable(summary(tab5, text = TRUE))) ) }) test_that("11/05/2019: remove N's in title (#256)", { expect_identical( capture.kable(summary(tableby(sex ~ age, data = mockstudy, digits.n = NA), text = TRUE)), c("| | Male | Female | Total | p value|", "|:------------|:---------------:|:---------------:|:---------------:|-------:|", "|Age in Years | | | | 0.048|", "|- Mean (SD) | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | |", "|- Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | |" ) ) expect_identical( capture.kable(summary(tableby(sex ~ age, data = mockstudy, digits.n = NA), text = TRUE))[-1], capture.kable(summary(tableby(sex ~ age, data = mockstudy), text = TRUE))[-1] ) }) test_that("11/12/2019: base summary stats work (#259, #281); meanse (#315)", { allstats <- c("min", "max", "range", "mean", "sd", "meansd", "meanse", "meanCI", "var", "median", "medianrange", "sum") expect_identical( capture.kable(summary(tableby(Sex ~ Age, data = mdat, numeric.stats = allstats), text = TRUE)), c("| | Female (N=46) | Male (N=44) | Total (N=90) | p value|", "|:-----------------|:-----------------------:|:-----------------------:|:-----------------------:|-------:|", "|Age in Years | | | | 0.818|", "|- Min | 30.000 | 29.000 | 29.000 | |", "|- Max | 49.000 | 53.000 | 53.000 | |", "|- Range | 30.000 - 49.000 | 29.000 - 53.000 | 29.000 - 53.000 | |", "|- Mean | 39.826 | 39.568 | 39.700 | |", "|- SD | 5.259 | 5.315 | 5.258 | |", "|- Mean (SD) | 39.826 (5.259) | 39.568 (5.315) | 39.700 (5.258) | |", "|- Mean (SE) | 39.826 (0.775) | 39.568 (0.801) | 39.700 (0.554) | |", "|- Mean (CI) | 39.826 (38.264, 41.388) | 39.568 (37.952, 41.184) | 39.700 (38.599, 40.801) | |", "|- Var | 27.658 | 28.251 | 27.651 | |", "|- Median | 39.000 | 39.500 | 39.000 | |", "|- Median (Range) | 39.000 (30.000, 49.000) | 39.500 (29.000, 53.000) | 39.000 (29.000, 53.000) | |", "|- Sum | 1832.000 | 1741.000 | 3573.000 | |" ) ) expect_identical( capture.kable(summary(tableby(Sex ~ dt, data = mdat, date.stats = setdiff(allstats, "sum")), text = TRUE)), c("| | Female (N=46) | Male (N=44) | Total (N=90) | p value|", "|:-----------------|:-----------------------------------:|:-----------------------------------:|:-----------------------------------:|-------:|", "|dt | | | | 0.339|", "|- Min | 1935-08-15 | 1937-02-08 | 1935-08-15 | |", "|- Max | 1959-09-06 | 1968-05-14 | 1968-05-14 | |", "|- Range | 1935-08-15 - 1959-09-06 | 1937-02-08 - 1968-05-14 | 1935-08-15 - 1968-05-14 | |", "|- Mean | 1949-06-11 | 1950-07-14 | 1949-12-23 | |", "|- SD | 1981.348 days | 2227.654 days | 2103.010 days | |", "|- Mean (SD) | 1949-06-11 (1981.348 days) | 1950-07-14 (2227.654 days) | 1949-12-23 (2103.010 days) | |", "|- Mean (SE) | 1949-06-11 (292.134 days) | 1950-07-14 (335.832 days) | 1949-12-23 (221.677 days) | |", "|- Mean (CI) | 1949-06-11 (1947-10-31, 1951-01-20) | 1950-07-14 (1948-09-05, 1952-05-22) | 1949-12-23 (1948-10-08, 1951-03-08) | |", "|- Var | 3925741.628 | 4962443.482 | 4422652.929 | |", "|- Median | 1948-12-07 | 1951-03-26 | 1949-10-07 | |", "|- Median (Range) | 1948-12-07 (1935-08-15, 1959-09-06) | 1951-03-26 (1937-02-08, 1968-05-14) | 1949-10-07 (1935-08-15, 1968-05-14) | |" ) ) }) test_that("11/13/2019: geometric summaries (#260)", { allstats <- c("gmean", "gsd", "gmeansd", "gmeanCI") blah <- data.frame( a = rep(c("A", "B", "C"), each = 5), b = c(0:4, 1:5, -1, 1:4), d = Sys.Date() + c(-1, 1:14), stringsAsFactors = FALSE ) expect_identical( capture.kable(summary(tableby(a ~ b + d, data = blah, numeric.stats = allstats, date.stats = allstats), text = TRUE)), c("| | A (N=5) | B (N=5) | C (N=5) | Total (N=15) | p value|", "|:----------------------|:----------:|:--------------------:|:-------:|:------------:|-------:|", "|b | | | | | 0.510|", "|- Geom Mean | 0.000 | 2.605 | NA | NA | |", "|- Geom SD | NA | 1.765 | NA | NA | |", "|- Geom Mean (Geom SD) | 0.000 (NA) | 2.605 (1.765) | NA | NA | |", "|- Geom Mean (CI) | 0.000 (NA) | 2.605 (1.286, 5.277) | NA | NA | |", "|d | | | | | 0.002|", "|- Geom Mean | NA | NA | NA | NA | |", "|- Geom SD | NA | NA | NA | NA | |", "|- Geom Mean (Geom SD) | NA | NA | NA | NA | |", "|- Geom Mean (CI) | NA | NA | NA | NA | |" ) ) }) test_that("12/20/2019: Npct (#263)", { expect_identical( capture.kable(summary(tableby(sex ~ arm + ps, data = mockstudy, cat.stats = "Npct", numeric.stats = c("Npct", "Nmiss")), text = TRUE)), c("| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|", "|:-------------|:------------:|:--------------:|:--------------:|-------:|", "|Treatment Arm | | | | 0.190|", "|- N (Pct) | 916 (61.1%) | 583 (38.9%) | 1499 (100.0%) | |", "|ps | | | | 0.345|", "|- N (Pct) | 754 (61.2%) | 479 (38.8%) | 1233 (100.0%) | |", "|- N-Miss | 162 | 104 | 266 | |" ) ) }) test_that("12/27/2019: informative error when no stats are computed (#273)", { expect_error(summary(tableby(~ sex, data = mockstudy, cat.stats = "Nmiss")), "Nothing to show for variable") }) test_that("02/28/2020: as.tbstat and as.countpct are better described (#283)", { trim510bracket <- function(x, weights=rep(1,length(x)), ...){ tmp <- c(mean(x, trim = 0.05, ...), mean(x, trim = 0.1, ...)) as.tbstat(tmp, sep = " ", parens = c("[", "]")) } trim10pct <- function(x, weights=rep(1,length(x)), ...){ tmp <- mean(x, trim = 0.05, ...) as.countpct(c(tmp, 10), sep = " ", parens = c("(", ")"), which.count = 0, which.pct = 2, pct = "%") } expect_identical( capture.kable(summary(tableby(sex ~ hgb, data=mockstudy, numeric.stats=c("Nmiss", "trim510bracket"), stats.labels = list(Nmiss = "N-Missing", trim510bracket = "Trimmed means"), digits.count = 0, digits = 2), text = TRUE)), c("| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|", "|:----------------|:-------------:|:--------------:|:--------------:|-------:|", "|hgb | | | | < 0.001|", "|- N-Missing | 162 | 104 | 266 | |", "|- Trimmed means | 12.57 [12.56] | 11.92 [11.91] | 12.31 [12.29] | |" ) ) expect_identical( capture.kable(summary(tableby(sex ~ hgb, data=mockstudy, numeric.stats=c("Nmiss", "trim10pct"), digits = 2, digits.pct = 0, digits.count = 1), text = TRUE)), c("| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|", "|:------------|:------------:|:--------------:|:--------------:|-------:|", "|hgb | | | | < 0.001|", "|- N-Miss | 162.0 | 104.0 | 266.0 | |", "|- trim10pct | 12.57 (10%) | 11.92 (10%) | 12.31 (10%) | |" ) ) }) test_that("Warn if reserved word is used in tableby by-variable (#277)", { for(v in c("group.term", "group.label", "strata.term", "strata.value", "variable", "term", "label", "variable.type", "test", "p.value")) { expect_error(tableby(y ~ x, data = data.frame(y = rep(c("hi", v), each = 5), x = 1:10)), v) } expect_identical( capture.kable(summary(tableby(y ~ x, data = data.frame(y = rep(c("hi", "test "), each = 5), x = 1:10)), text = TRUE)), c("| | hi (N=5) | test (N=5) | Total (N=10) | p value|", "|:------------|:-------------:|:--------------:|:--------------:|-------:|", "|x | | | | 0.001|", "|- Mean (SD) | 3.000 (1.581) | 8.000 (1.581) | 5.500 (3.028) | |", "|- Range | 1.000 - 5.000 | 6.000 - 10.000 | 1.000 - 10.000 | |" ) ) }) test_that("HTML footnotes (#298)", { expect_identical( capture.output(summary(tableby(sex ~ age + arm, data = mockstudy), text = "html", pfootnote = "html")), c("" , " " , " " , " " , " " , " " , " " , " " , " " , " " , "" , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " ", " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " , "" , "
    Male (N=916) Female (N=583) Total (N=1499) p value
    Age in Years 0.0481
       Mean (SD) 60.455 (11.369) 59.247 (11.722) 59.985 (11.519)
       Range 19.000 - 88.000 22.000 - 88.000 19.000 - 88.000
    Treatment Arm 0.1902
       A: IFL 277 (30.2%) 151 (25.9%) 428 (28.6%)
       F: FOLFOX 411 (44.9%) 280 (48.0%) 691 (46.1%)
       G: IROX 228 (24.9%) 152 (26.1%) 380 (25.4%)
    " , "
      " , "
    1. Linear Model ANOVA
    2. " , "
    3. Pearson's Chi-squared test
    4. " , "
    " , "" ) ) }) test_that("selectall", { d <- data.frame( grp = rep(c("A", "B"), each = 5), option1 = c(rep(1, 4), rep(0, 6)), option2 = c(0, 1, 0, 0, 0, 1, 1, 1, 0, 0), option3 = 1, option4 = c(rep(0, 9), NA) ) d$s <- selectall(`Option 1` = d$option1, `Option 2` = d$option2, `Option 3` = d$option3, `Option 4` = d$option4) expect_identical( capture.kable(summary(tableby(grp ~ s, data = d[1:9, ]), text = TRUE)), c("| | A (N=5) | B (N=4) | Total (N=9) | p value|", "|:-----------|:----------:|:----------:|:-----------:|-------:|", "|s | | | | |", "|- Option 1 | 4 (80.0%) | 0 (0.0%) | 4 (44.4%) | |", "|- Option 2 | 1 (20.0%) | 3 (75.0%) | 4 (44.4%) | |", "|- Option 3 | 5 (100.0%) | 4 (100.0%) | 9 (100.0%) | |", "|- Option 4 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |" ) ) expect_identical( capture.kable(summary(tableby(grp ~ notest(s, "count", "Nmiss"), data = d), text = TRUE)), c("| | A (N=5) | B (N=5) | Total (N=10) | p value|", "|:-----------|:-------:|:-------:|:------------:|-------:|", "|s | | | | |", "|- Option 1 | 4 | 0 | 4 | |", "|- Option 2 | 1 | 3 | 4 | |", "|- Option 3 | 5 | 4 | 9 | |", "|- Option 4 | 0 | 0 | 0 | |", "|- N-Miss | 0 | 1 | 1 | |" ) ) }) test_that("Labels work for cat.simplify and ord.simplify (#288)", { expect_identical( capture.kable(summary(tableby(arm ~ sex + as.character(fu.stat), data = mockstudy, cat.simplify = "label"), text = TRUE)), c("| | A: IFL (N=428) | F: FOLFOX (N=691) | G: IROX (N=380) | Total (N=1499) | p value|", "|:-------------------------|:--------------:|:-----------------:|:---------------:|:--------------:|-------:|", "|sex (Female) | 151 (35.3%) | 280 (40.5%) | 152 (40.0%) | 583 (38.9%) | 0.190|", "|as.character(fu.stat) (2) | 410 (95.8%) | 592 (85.7%) | 354 (93.2%) | 1356 (90.5%) | < 0.001|" ) ) expect_identical( capture.kable(summary(tableby(arm ~ sex + notest(as.character(fu.stat), cat.simplify = "label"), data = mockstudy, cat.simplify = TRUE), text = TRUE)), c("| | A: IFL (N=428) | F: FOLFOX (N=691) | G: IROX (N=380) | Total (N=1499) | p value|", "|:-------------------------|:--------------:|:-----------------:|:---------------:|:--------------:|-------:|", "|sex | 151 (35.3%) | 280 (40.5%) | 152 (40.0%) | 583 (38.9%) | 0.190|", "|as.character(fu.stat) (2) | 410 (95.8%) | 592 (85.7%) | 354 (93.2%) | 1356 (90.5%) | |" ) ) }) test_that("Titles work with knitr::kable(caption=) (#310)", { expect_identical( capture.kable(summary(tableby(arm ~ sex + age, data = mockstudy), title = "My cool table", text = TRUE)), c("Table: My cool table" , "" , "| | A: IFL (N=428) | F: FOLFOX (N=691) | G: IROX (N=380) | Total (N=1499) | p value|", "|:------------|:---------------:|:-----------------:|:---------------:|:---------------:|-------:|", "|sex | | | | | 0.190|", "|- Male | 277 (64.7%) | 411 (59.5%) | 228 (60.0%) | 916 (61.1%) | |", "|- Female | 151 (35.3%) | 280 (40.5%) | 152 (40.0%) | 583 (38.9%) | |", "|Age in Years | | | | | 0.614|", "|- Mean (SD) | 59.673 (11.365) | 60.301 (11.632) | 59.763 (11.499) | 59.985 (11.519) | |", "|- Range | 27.000 - 88.000 | 19.000 - 88.000 | 26.000 - 85.000 | 19.000 - 88.000 | |" ) ) expect_identical( capture.kable(print(summary(tableby(arm ~ sex + age, data = mockstudy), title = "My cool table", text = TRUE), format = "pandoc")), c("Table: My cool table" , "" , " A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value", "------------- ----------------- ------------------- ----------------- ----------------- --------", "sex 0.190", "- Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%) ", "- Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%) ", "Age in Years 0.614", "- Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519) ", "- Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000 " ) ) expect_true(any(grepl("", capture.output(print(summary(tableby(arm ~ sex, data = mockstudy), title = "hi"), format = "html"))))) expect_true(any(grepl("\\\\caption", capture.output(print(summary(tableby(arm ~ sex, data = mockstudy), title = "hi"), format = "latex"))))) }) test_that("stats.labels doesn't overwrite existing labels (#316)", { expect_identical( capture.kable(summary(tableby(sex ~ age, data = mockstudy, stats.labels=list(medSurv = 'Median')), text = TRUE)), c("| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|", "|:------------|:---------------:|:---------------:|:---------------:|-------:|", "|Age in Years | | | | 0.048|", "|- Mean (SD) | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | |", "|- Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | |" ) ) expect_identical( capture.kable(summary(tableby(sex ~ age, data = mockstudy, stats.labels=NULL), text = TRUE)), c("| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|", "|:------------|:---------------:|:---------------:|:---------------:|-------:|", "|Age in Years | | | | 0.048|", "|- meansd | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | |", "|- range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | |" ) ) }) test_that("wt (#321)", { expect_identical( capture.kable(summary(tableby(sex ~ kwt(age), data = mockstudy), text = TRUE)), capture.kable(summary(tableby(sex ~ wt(age), data = mockstudy), text = TRUE)) ) expect_identical( capture.kable(summary(tableby(sex ~ wt(age), data = head(mockstudy, 10), wilcox.correct = TRUE, wilcox.exact = FALSE), text = TRUE)), c("| | Male (N=5) | Female (N=5) | Total (N=10) | p value|", "|:------------|:---------------:|:---------------:|:---------------:|-------:|", "|age | | | | 0.463|", "|- Mean (SD) | 58.600 (6.580) | 63.000 (11.554) | 60.800 (9.163) | |", "|- Range | 50.000 - 67.000 | 50.000 - 74.000 | 50.000 - 74.000 | |" ) ) }) test_that("medtest (#327)", { skip_if_not_installed("coin") expect_identical( capture.kable(summary(tableby(sex ~ medtest(ast), data = mockstudy, numeric.stats = c("meansd", "range", "N")), text = TRUE)), c("| | Male (N=916) | Female (N=583) | Total (N=1499) | p value|", "|:------------|:---------------:|:----------------:|:---------------:|-------:|", "|ast | | | | 0.705|", "|- Mean (SD) | 35.873 (26.606) | 36.029 (27.238) | 35.933 (26.843) | |", "|- Range | 5.000 - 205.000 | 10.000 - 178.000 | 5.000 - 205.000 | |", "|- N | 754 | 479 | 1233 | |" ) ) }) arsenal/tests/testthat/write2.mylist2.doc.Rmd0000644000176200001440000000120714056510162020727 0ustar liggesusers--- title: Test title --- # Header 1 This is a small paragraph. | | Male (N=916) | Female (N=583) | Total (N=1499) | p value| |:---------------------------|:---------------:|:---------------:|:---------------:|-------:| |**Age in Years** | | | | 0.048| |   Mean (SD) | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | | |   Q1, Q3 | 53.000, 69.000 | 52.000, 68.000 | 52.000, 68.000 | | |   Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | | arsenal/tests/testthat/write2.yaml.pdf.Rmd0000644000176200001440000000256214056510171020277 0ustar liggesusers--- title: Test title --- --- title: My title author: Ethan P Heinzen header-includes: - \usepackage[labelformat=empty]{caption} --- # Header 1 This is a small paragraph. | | Male (N=916) | Female (N=583) | Total (N=1499) | p value| |:---------------------------|:---------------:|:---------------:|:---------------:|-------:| |**Age in Years** | | | | 0.048| |   Mean (SD) | 60.455 (11.369) | 59.247 (11.722) | 59.985 (11.519) | | |   Q1, Q3 | 53.000, 69.000 | 52.000, 68.000 | 52.000, 68.000 | | |   Range | 19.000 - 88.000 | 22.000 - 88.000 | 19.000 - 88.000 | | ``` Call: lm(formula = age ~ sex, data = mockstudy) Residuals: Min 1Q Median 3Q Max -41.455 -7.455 0.753 8.545 28.753 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 60.4552 0.3802 159.001 <2e-16 *** sexFemale -1.2082 0.6097 -1.982 0.0477 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 11.51 on 1497 degrees of freedom Multiple R-squared: 0.002617, Adjusted R-squared: 0.00195 F-statistic: 3.927 on 1 and 1497 DF, p-value: 0.04769 ``` ```{r echo = FALSE, eval = TRUE} a <- 1 b <- 2 a + b a - b ``` arsenal/tests/testthat/test_comparedf.R0000644000176200001440000011774413767210531020046 0ustar liggesusers## Tests for formulize context("Testing the comparedf output") data(mockstudy) mockstudy2 <- muck_up_mockstudy() # a far simpler example df1 <- data.frame(id = paste0("person", 1:3), a = c("a", "b", "c"), b = c(1, 3, 4), c = c("f", "e", "d"), row.names = paste0("rn", 1:3), stringsAsFactors = FALSE) df2 <- data.frame(id = paste0("person", 3:1), a = c("c", "b", "a"), b = c(1, 3, 4), d = paste0("rn", 1:3), row.names = paste0("rn", c(1,3,2)), stringsAsFactors = FALSE) ########################################################################################################### #### Just test that it ran right ########################################################################################################### test_that("Basic comparison works: by row", { expect_identical( capture.output(comparedf(df1, df2)), c("Compare Object" , "" , "Function Call: " , "comparedf(x = df1, y = df2)" , "" , "Shared: 3 non-by variables and 3 observations." , "Not shared: 2 variables and 0 observations." , "" , "Differences found in 2/3 variables compared." , "0 variables compared have non-identical attributes." ) ) expect_true(n.diffs(comparedf(df1, df1)) == 0) }) test_that("Basic comparison works: by id", { expect_identical( capture.output(comparedf(df1, df2, by = "id")), c("Compare Object" , "" , "Function Call: " , "comparedf(x = df1, y = df2, by = \"id\")" , "" , "Shared: 2 non-by variables and 3 observations." , "Not shared: 2 variables and 0 observations." , "" , "Differences found in 1/2 variables compared." , "0 variables compared have non-identical attributes." ) ) expect_true(n.diffs(comparedf(df1, df1, by = "id")) == 0) }) test_that("Basic comparison works: by row.names", { expect_identical( capture.output(comparedf(df1, df2, by = "row.names")), c("Compare Object" , "" , "Function Call: " , "comparedf(x = df1, y = df2, by = \"row.names\")" , "" , "Shared: 3 non-by variables and 3 observations." , "Not shared: 2 variables and 0 observations." , "" , "Differences found in 3/3 variables compared." , "0 variables compared have non-identical attributes." ) ) expect_true(n.diffs(comparedf(df1, df1, by = "row.names")) == 0) }) test_that("Basic comparison works: by row.names for x and something else for y", { expect_identical( capture.output(comparedf(df1, df2, by.x = "row.names", by.y = "d")), c("Compare Object" , "" , "Function Call: " , "comparedf(x = df1, y = df2, by.x = \"row.names\", by.y = \"d\")" , "" , "Shared: 3 non-by variables and 3 observations." , "Not shared: 1 variables and 0 observations." , "" , "Differences found in 2/3 variables compared." , "0 variables compared have non-identical attributes." ) ) }) df1$listcol <- list(1:3, "hi there", FALSE) df2$listcol <- list(FALSE, "bye now", 1:2) test_that("List-column comparison works: by id", { expect_identical( capture.output(comparedf(df1, df2, by = "id")), c("Compare Object" , "" , "Function Call: " , "comparedf(x = df1, y = df2, by = \"id\")" , "" , "Shared: 3 non-by variables and 3 observations." , "Not shared: 2 variables and 0 observations." , "" , "Differences found in 2/3 variables compared." , "0 variables compared have non-identical attributes." ) ) expect_true(n.diffs(comparedf(df1, df1, by = "id")) == 0) }) df1$testdate <- as.Date(c("2017-07-09", "2017-08-08", "2017-09-07")) df2$testdate <- as.Date(c("2017-09-07", "2017-08-08", "2017-09-07")) test_that("Dates comparison works: by id", { expect_identical( capture.output(comparedf(df1, df2, by = "id")), c("Compare Object" , "" , "Function Call: " , "comparedf(x = df1, y = df2, by = \"id\")" , "" , "Shared: 4 non-by variables and 3 observations." , "Not shared: 2 variables and 0 observations." , "" , "Differences found in 3/4 variables compared." , "0 variables compared have non-identical attributes." ) ) expect_true(n.diffs(comparedf(df1, df1, by = "id")) == 0) }) test_that("Basic mockstudy comparison works: by id", { expect_identical( capture.output(comparedf(mockstudy, mockstudy2, by = 'case')), c("Compare Object" , "" , "Function Call: " , "comparedf(x = mockstudy, y = mockstudy2, by = \"case\")" , "" , "Shared: 9 non-by variables and 1495 observations." , "Not shared: 7 variables and 4 observations." , "" , "Differences found in 3/7 variables compared." , "3 variables compared have non-identical attributes." ) ) }) test_that("Comparison with empty data.frames works", { mck1 <- mockstudy[0, , drop = FALSE] mck2 <- mockstudy2[0, , drop = FALSE] expect_identical( capture.output(comparedf(mockstudy, mck2, by = 'case')), c("Compare Object" , "" , "Function Call: " , "comparedf(x = mockstudy, y = mck2, by = \"case\")" , "" , "Shared: 9 non-by variables and 0 observations." , "Not shared: 7 variables and 1499 observations." , "" , "Differences found in 0/7 variables compared." , "3 variables compared have non-identical attributes." ) ) expect_true(n.diff.obs(comparedf(mockstudy, mck2, by = "case")) == 1499) expect_true(n.diffs(comparedf(mockstudy, mck2, by = "case")) == 0) expect_identical( capture.output(comparedf(mockstudy, mck2, by = 'row.names')), c("Compare Object" , "" , "Function Call: " , "comparedf(x = mockstudy, y = mck2, by = \"row.names\")" , "" , "Shared: 10 non-by variables and 0 observations." , "Not shared: 7 variables and 1499 observations." , "" , "Differences found in 0/8 variables compared." , "3 variables compared have non-identical attributes." ) ) expect_true(n.diff.obs(comparedf(mockstudy, mck2, by = "row.names")) == 1499) expect_true(n.diffs(comparedf(mockstudy, mck2, by = "row.names")) == 0) expect_identical( capture.output(comparedf(mck1, mck2, by = "case")), c("Compare Object" , "" , "Function Call: " , "comparedf(x = mck1, y = mck2, by = \"case\")" , "" , "Shared: 9 non-by variables and 0 observations." , "Not shared: 7 variables and 0 observations." , "" , "Differences found in 0/7 variables compared." , "2 variables compared have non-identical attributes." ) ) expect_true(n.diff.obs(comparedf(mck1, mck2, by = "case")) == 0) expect_true(n.diffs(comparedf(mck1, mck2, by = "case")) == 0) }) ########################################################################################################### #### Check for certain errors ########################################################################################################### test_that("Different by-variables with overlap with non-by-variables throws an error", { expect_error(comparedf(df1, df2, by.x = "id", by.y = "b"), "non-by-variables") expect_error(comparedf(df1, df2, by.x = "c", by.y = "id"), NA) }) test_that("Using forbidden names throws an error", { expect_error(comparedf(df1, cbind(df2, ..row.names.. = 1:3)), "reserved colnames") }) ########################################################################################################### #### Using comparedf.control ########################################################################################################### test_that("tol.vars is working correctly", { expect_identical( capture.output(comparedf(mockstudy, mockstudy2, by = 'case', tol.vars = "._ ")), c("Compare Object" , "" , "Function Call: " , "comparedf(x = mockstudy, y = mockstudy2, by = \"case\", tol.vars = \"._ \")", "" , "Shared: 11 non-by variables and 1495 observations." , "Not shared: 3 variables and 4 observations." , "" , "Differences found in 3/9 variables compared." , "3 variables compared have non-identical attributes." ) ) expect_identical( capture.output(comparedf(mockstudy, mockstudy2, by = 'case', tol.vars = c("._ ", "case"))), c("Compare Object" , "" , "Function Call: " , "comparedf(x = mockstudy, y = mockstudy2, by = \"case\", tol.vars = c(\"._ \", ", " \"case\"))" , "" , "Shared: 12 non-by variables and 1495 observations." , "Not shared: 1 variables and 4 observations." , "" , "Differences found in 3/10 variables compared." , "4 variables compared have non-identical attributes." ) ) tolvars <- c(arm = "Arm", fu.time = "fu_time", fu.stat = "fu stat") expect_identical( comparedf(mockstudy, mockstudy2, by = 'case', tol.vars = tolvars)[1:2], comparedf(mockstudy, mockstudy2, by = 'case', tol.vars = c("._ ", "case"))[1:2] ) expect_warning(comparedf(mockstudy, mockstudy2, by = 'case', tol.vars = c("hi" = "Arm")), "'hi' not found in colnames of x") expect_warning(comparedf(mockstudy, mockstudy2, by = 'case', tol.vars = c(arm = "hi")), "'hi' not found in colnames of y") }) tol <- comparedf.control( tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10, # allow absolute differences <= 10 tol.factor = "labels", # match only factor labels factor.as.char = TRUE, # compare factors and characters tol.char = "case" # ignore case in character vectors ) test_that("tolerances are working correctly", { expect_identical( capture.output(comparedf(mockstudy, mockstudy2, by = "case", control = tol)), c("Compare Object" , "" , "Function Call: " , "comparedf(x = mockstudy, y = mockstudy2, by = \"case\", control = tol)", "" , "Shared: 12 non-by variables and 1495 observations." , "Not shared: 1 variables and 4 observations." , "" , "Differences found in 3/12 variables compared." , "4 variables compared have non-identical attributes." ) ) }) tol.minus9 <- function(x, y, tol) { idx1 <- is.na(x) & !is.na(y) & y == -9 idx2 <- tol.num.absolute(x, y, tol) # find other absolute differences return(!idx1 & idx2) } tol$tol.num <- tol.minus9 # ignore NA -> -9 changes test_that("custom tolerances are working correctly", { expect_identical( capture.output(comparedf(mockstudy, mockstudy2, by = "case", control = tol)), c("Compare Object" , "" , "Function Call: " , "comparedf(x = mockstudy, y = mockstudy2, by = \"case\", control = tol)", "" , "Shared: 12 non-by variables and 1495 observations." , "Not shared: 1 variables and 4 observations." , "" , "Differences found in 2/12 variables compared." , "4 variables compared have non-identical attributes." ) ) }) tols <- comparedf.control( tol.vars = c("._ ", "case"), # dots=underscores=spaces; match up Arm and arm int.as.num = TRUE, # compare integers and numerics tol.factor = "labels", # match only factor labels factor.as.char = TRUE, # compare factors and characters tol.char = "case" # ignore case in character vectors ) test_that("Summary numbers are reported correctly", { expect_equal( summary(comparedf(mockstudy, mockstudy2, by = "case", control = tols))$comparison.summary.table$value, c(1, 12, 12, 1, 0, 3, 9, 1495, 4, 0, 269, 1226, 270) ) }) tols$tol.vars <- "._ " # don't match arm and Arm anymore test_that("Summary numbers are still reported correctly", { expect_equal( summary(comparedf(mockstudy, mockstudy2, by = "case", control = tols))$comparison.summary.table$value, c(1, 11, 11, 2, 1, 3, 8, 1495, 4, 0, 269, 1226, 270) ) }) ########################################################################################################### #### Using custom tolerances ########################################################################################################### test_that("Custom tolerances work specified by variable", { dat1 <- dat2 <- data.frame( x1 = rep(c("A", "B", "C"), each = 10), x2 = rep(c("D", "E", "F"), each = 10), x3 = 1:30 + 0.5, x4 = 1:30 + 0, stringsAsFactors = FALSE ) dat2$x1 <- tolower(dat2$x1) dat2$x2 <- tolower(dat2$x2) dat2$x3 <- dat2$x3 + rep(c(0, -0.5, 1), each = 10) dat2$x4 <- dat2$x4 * rep(c(1, 1.1, 1.6), each = 10) expect_equal( summary(comparedf(dat1, dat2, tol.num = list("absolute", x4 = "percent"), tol.num.val = 0.5, tol.char = list(x1 = "case", x2 = function(x, y) x != y & x %nin% c("D", "E"))))$comparison.summary.table$value, c(0, 4, 4, 0, 0, 3, 1, 30, 0, 0, 10, 20, 30) ) }) ########################################################################################################### #### Using helper functions ########################################################################################################### test_that("helper functions are working correctly", { expect_true(n.diffs(comparedf(mockstudy, mockstudy2, by = "case")) == n.diffs(summary(comparedf(mockstudy, mockstudy2, by = "case")))) expect_true(n.diffs(comparedf(df1, df2, by = "id")) == n.diffs(summary(comparedf(df1, df2, by = "id")))) expect_identical(diffs(comparedf(df1, df2, by = "id")), diffs(summary(comparedf(df1, df2, by = "id")))) expect_identical(diffs(comparedf(df1, df2, by = "id"), by.var = TRUE), diffs(summary(comparedf(df1, df2, by = "id")), by.var = TRUE)) expect_identical(diffs(comparedf(df1, df2, by = "id"), vars = "a"), diffs(summary(comparedf(df1, df2, by = "id")), vars = "a")) expect_identical(diffs(comparedf(df1, df2, by = "id"), vars = "b"), diffs(summary(comparedf(df1, df2, by = "id")), vars = "b")) expect_identical(diffs(comparedf(df1, df2, by = "id"), vars = "a", by.var = TRUE), diffs(summary(comparedf(df1, df2, by = "id")), vars = "a", by.var = TRUE)) }) test_that("diff.obs() works (#305)", { expect_identical(diffs(comparedf(df1, df2, by = "id"), what = "observations"), diffs(summary(comparedf(df1, df2, by = "id")), what = "observations")) }) ########################################################################################################### #### Summary output ########################################################################################################### test_that("Summary output looks right (i.e. for factors)", { expect_identical( capture.kable(summary(comparedf(mockstudy, mockstudy2, by = "case"))), c("Table: Summary of data.frames" , "" , "version arg ncol nrow" , "-------- ----------- ----- -----" , "x mockstudy 14 1499" , "y mockstudy2 13 1495" , "" , "" , "" , "Table: Summary of overall comparison" , "" , "statistic value" , "------------------------------------------------------------ ------" , "Number of by-variables 1" , "Number of non-by variables in common 9" , "Number of variables compared 7" , "Number of variables in x but not y 4" , "Number of variables in y but not x 3" , "Number of variables compared with some values unequal 3" , "Number of variables compared with all values equal 4" , "Number of observations in common 1495" , "Number of observations in x but not y 4" , "Number of observations in y but not x 0" , "Number of observations with some compared variables unequal 1495" , "Number of observations with all compared variables equal 0" , "Number of values unequal 1762" , "" , "" , "" , "Table: Variables not shared" , "" , "version variable position class " , "-------- --------- --------- ----------" , "x age 2 integer " , "x arm 3 character " , "x fu.time 6 integer " , "x fu.stat 7 integer " , "y fu_time 11 integer " , "y fu stat 12 integer " , "y Arm 13 character " , "" , "" , "" , "Table: Other variables not compared" , "" , "var.x pos.x class.x var.y pos.y class.y " , "------ ------ ---------- ------ ------ --------" , "race 5 character race 3 factor " , "ast 12 integer ast 8 numeric " , "" , "" , "" , "Table: Observations not shared" , "" , "version case observation" , "-------- ------- ------------" , "x 88989 9" , "x 90158 8" , "x 99508 7" , "x 112263 5" , "" , "" , "" , "Table: Differences detected by variable" , "" , "var.x var.y n NAs" , "------------ ------------ ----- ----" , "sex sex 1495 0" , "ps ps 1 1" , "hgb hgb 266 266" , "bmi bmi 0 0" , "alk.phos alk.phos 0 0" , "mdquality.s mdquality.s 0 0" , "age.ord age.ord 0 0" , "" , "" , "" , "Table: Differences detected (1741 not shown)" , "" , "var.x var.y case values.x values.y row.x row.y" , "------ ------ ------ --------- --------- ------ ------" , "sex sex 76170 Male Male 26 20" , "sex sex 76240 Male Male 27 21" , "sex sex 76431 Female Female 28 22" , "sex sex 76712 Male Male 29 23" , "sex sex 76780 Female Female 30 24" , "sex sex 77066 Female Female 31 25" , "sex sex 77316 Male Male 32 26" , "sex sex 77355 Male Male 33 27" , "sex sex 77591 Male Male 34 28" , "sex sex 77851 Male Male 35 29" , "ps ps 86205 0 NA 6 3" , "hgb hgb 88714 NA -9 192 186" , "hgb hgb 88955 NA -9 204 198" , "hgb hgb 89549 NA -9 229 223" , "hgb hgb 89563 NA -9 231 225" , "hgb hgb 89584 NA -9 237 231" , "hgb hgb 89591 NA -9 238 232" , "hgb hgb 89595 NA -9 239 233" , "hgb hgb 89647 NA -9 243 237" , "hgb hgb 89665 NA -9 244 238" , "hgb hgb 89827 NA -9 255 249" , "" , "" , "" , "Table: Non-identical attributes" , "" , "var.x var.y name " , "------ ------ -------" , "sex sex label " , "sex sex levels " , "race race class " , "race race label " , "race race levels " , "bmi bmi label " ) ) }) test_that("Summary output with attributes and max.print options", { expect_identical( capture.kable(summary(comparedf(mockstudy, mockstudy2, by = "case"), show.attrs = TRUE, max.print.vars = 2, max.print.obs = 3, max.print.diffs.per.var = 3, max.print.diffs = NA, max.print.attrs = 3)), c("Table: Summary of data.frames" , "" , "version arg ncol nrow" , "-------- ----------- ----- -----" , "x mockstudy 14 1499" , "y mockstudy2 13 1495" , "" , "" , "" , "Table: Summary of overall comparison" , "" , "statistic value" , "------------------------------------------------------------ ------" , "Number of by-variables 1" , "Number of non-by variables in common 9" , "Number of variables compared 7" , "Number of variables in x but not y 4" , "Number of variables in y but not x 3" , "Number of variables compared with some values unequal 3" , "Number of variables compared with all values equal 4" , "Number of observations in common 1495" , "Number of observations in x but not y 4" , "Number of observations in y but not x 0" , "Number of observations with some compared variables unequal 1495" , "Number of observations with all compared variables equal 0" , "Number of values unequal 1762" , "" , "" , "" , "Table: Variables not shared (5 not shown)" , "" , "version variable position class " , "-------- --------- --------- ----------" , "x age 2 integer " , "x arm 3 character " , "" , "" , "" , "Table: Other variables not compared" , "" , "var.x pos.x class.x var.y pos.y class.y " , "------ ------ ---------- ------ ------ --------" , "race 5 character race 3 factor " , "ast 12 integer ast 8 numeric " , "" , "" , "" , "Table: Observations not shared (1 not shown)" , "" , "version case observation" , "-------- ------ ------------" , "x 88989 9" , "x 90158 8" , "x 99508 7" , "" , "" , "" , "Table: Differences detected by variable" , "" , "var.x var.y n NAs" , "------------ ------------ ----- ----" , "sex sex 1495 0" , "ps ps 1 1" , "hgb hgb 266 266" , "bmi bmi 0 0" , "alk.phos alk.phos 0 0" , "mdquality.s mdquality.s 0 0" , "age.ord age.ord 0 0" , "" , "" , "" , "Table: Differences detected (1755 not shown)" , "" , "var.x var.y case values.x values.y row.x row.y" , "------ ------ ------ --------- --------- ------ ------" , "sex sex 76170 Male Male 26 20" , "sex sex 76240 Male Male 27 21" , "sex sex 76431 Female Female 28 22" , "ps ps 86205 0 NA 6 3" , "hgb hgb 88714 NA -9 192 186" , "hgb hgb 88955 NA -9 204 198" , "hgb hgb 89549 NA -9 229 223" , "" , "" , "" , "Table: Non-identical attributes (3 not shown)" , "" , "var.x var.y name attr.x attr.y " , "------ ------ ------- --------------- -------------" , "sex sex label NA Sex (M/F) " , "sex sex levels Male , Female Female, Male " , "race race class NA factor " ) ) }) ########################################################################################################### #### Reported bugs for comparedf ########################################################################################################### test_that("2019/04/09: Percent tolerances work when everything is zero (#206)", { expect_true(n.diffs(comparedf(data.frame(x = 0), data.frame(x = 0), tol.num = "percent")) == 0) }) test_that("2019/04/10: summary breaks when no variables were compared (#207)", { check_it <- function(x, y, by = NULL, n = 0) { tmp <- diffs(comparedf(x, y, by = by)) expect_true(nrow(tmp) == n) expect_identical(names(tmp), c("var.x", "var.y", if(is.null(by)) "..row.names.." else by, "values.x", "values.y", "row.x", "row.y")) expect_true(inherits(tmp$values.x, "AsIs")) expect_true(inherits(tmp$values.y, "AsIs")) expect_true(is.integer(tmp$row.x)) expect_true(is.integer(tmp$row.y)) } check_it(data.frame(x = 1, y = 1), data.frame(x = 1, z = 1)) check_it(data.frame(x = 1, y = 1), data.frame(x = 1, z = 1), by = "x") check_it(data.frame(x = 1, y = 1), data.frame(x = 1, y = 1), by = "x") check_it(data.frame(x = 1, y = 1), data.frame(x = 1, y = 1.1), by = "x", n = 1) }) test_that("2019/05/15: using row.names with other by-variables", { d <- data.frame(a = 1:3, b = 2:4, row.names = c("A", "B", "C")) f <- data.frame(a = 3:1, b = 2:4, row.names = c("C", "A", "B")) expect_identical( capture.output(comparedf(d, f, by = c("a", "row.names"))), c("Compare Object" , "" , "Function Call: " , "comparedf(x = d, y = f, by = c(\"a\", \"row.names\"))", "" , "Shared: 1 non-by variables and 1 observations." , "Not shared: 0 variables and 4 observations." , "" , "Differences found in 1/1 variables compared." , "0 variables compared have non-identical attributes." ) ) }) test_that("2019/05/15: empty by-variable isn't counted", { expect_equal(summary(comparedf(mockstudy, mockstudy))$comparison.summary.table$value, c(0, 14, 14, 0, 0, 0, 14, 1499, 0, 0, 0, 1499, 0)) }) test_that("2019/05/16: multiple by-variables are counted", { expect_equal(summary(comparedf(mockstudy, mockstudy, by = c("case", "sex", "arm")))$comparison.summary.table$value, c(3, 11, 11, 0, 0, 0, 11, 1499, 0, 0, 0, 1499, 0)) }) test_that("2019/05/22: vectors that share a class are still compared (#216)", { dat2 <- dat <- data.frame(x = 1:5) class(dat2$x) <- c("myclass", class(dat2$x)) cmp <- summary(comparedf(dat, dat2)) expect_true(nrow(cmp$vars.nc.table) == 0) expect_true(nrow(cmp$diffs.table) == 0) }) test_that("Inf (#306)", { dat <- data.frame(x = c(2, Inf, -Inf, Inf, NA)) dat2 <- data.frame(x = c(2, Inf, -Inf, -Inf, Inf)) cmp <- summary(comparedf(dat, dat2)) expect_true(nrow(cmp$diffs.table) == 2) cmp <- summary(comparedf(dat, dat2, tol.num = "pct")) expect_true(nrow(cmp$diffs.table) == 2) }) arsenal/tests/testthat.R0000644000176200001440000000007213632700353015025 0ustar liggesuserslibrary(testthat) library(arsenal) test_check("arsenal") arsenal/vignettes/0000755000176200001440000000000014056514665013723 5ustar liggesusersarsenal/vignettes/paired.Rmd0000644000176200001440000000660013656527336015641 0ustar liggesusers--- title: "The paired function" author: "Ethan Heinzen, Beth Atkinson, Jason Sinnwell" output: rmarkdown::html_vignette: toc: yes toc_depth: 3 vignette: | %\VignetteIndexEntry{The paired function} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r echo = FALSE} options(width = 100) ``` # Introduction Another one of the most common tables in medical literature includes summary statistics for a set of variables paired across two time points. Locally at Mayo, the SAS macro `%paired` was written to create summary tables with a single call. With the increasing interest in R, we have developed the function `paired()` to create similar tables within the R environment. This vignette is light on purpose; `paired()` piggybacks off of tableby, so most documentation there applies here, too. # Simple Example The first step when using the `paired()` function is to load the `arsenal` package. We can't use `mockstudy` here because we need a dataset with paired observations, so we'll create our own dataset. ```{r, load-data} library(arsenal) dat <- data.frame( tp = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)), id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6), Cat = c("A", "A", "A", "B", "B", "B", "B", "A", NA, "B"), Fac = factor(c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A")), Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA), Ord = ordered(c("I", "II", "II", "III", "III", "III", "I", "III", "II", "I")), Lgl = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE), Dat = as.Date("2018-05-01") + c(1, 1, 2, 2, 3, 4, 5, 6, 3, 4), stringsAsFactors = FALSE ) ``` To create a simple table stratified by time point, use a `formula=` statement to specify the variables that you want summarized and the `id=` argument to specify the paired observations. ```{r results = 'asis'} p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE) summary(p) ``` The third column shows the difference between time point 1 and time point 2. For categorical variables, it reports the percent of observations from time point 1 which changed in time point 2. # NAs Note that by default, observations which do not have both timepoints are removed. This is easily changed using the `na.action = na.paired("")` argument. For example: ```{r results = 'asis'} p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE, na.action = na.paired("fill")) summary(p) ``` For more details, see the help page for `na.paired()`. # Available Function Options ## Testing options The tests used to calculate p-values differ by the variable type, but can be specified explicitly in the formula statement or in the control function. The following tests are accepted: * `paired.t`: A paired t-test. * `mcnemar`: McNemar's test. * `signed.rank`: the signed-rank test. * `sign.test`: the sign test. * `notest`: Don't perform a test. ## `paired.control` settings A quick way to see what arguments are possible to utilize in a function is to use the `args()` command. Settings involving the number of digits can be set in `paired.control` or in `summary.tableby`. ```{r} args(paired.control) ``` ## `summary.tableby` settings Since the "paired" object inherits "tableby", the `summary.tableby` function is what's actually used to format and print the table. ```{r} args(arsenal:::summary.tableby) ``` arsenal/vignettes/freqlist.Rmd0000644000176200001440000002277213656527336016236 0ustar liggesusers--- title: "The freqlist function" author: "Tina Gunderson and Ethan Heinzen" output: rmarkdown::html_vignette: toc: yes toc_depth: 3 vignette: | %\VignetteIndexEntry{The freqlist function} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, tidy.opts=list(width.cutoff=80), tidy=TRUE, comment=NA) options(width=80, max.print=1000) ``` # Overview `freqlist()` is a function meant to produce output similar to SAS's `PROC FREQ` procedure when using the `/list` option of the `TABLE` statement. `freqlist()` provides options for handling missing or sparse data and can provide cumulative counts and percentages based on subgroups. It depends on the `knitr` package for printing. ```{r message = FALSE} require(arsenal) ``` ## Sample dataset For our examples, we'll load the `mockstudy` data included with this package and use it to create a basic table. Because they have fewer levels, for brevity, we'll use the variables arm, sex, and mdquality.s to create the example table. We'll retain NAs in the table creation. See the appendix for notes regarding default NA handling and other useful information regarding tables in R. ```{r loading.data} # load the data data(mockstudy) # retain NAs when creating the table using the useNA argument tab.ex <- table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA="ifany") ``` # The `freqlist` object The `freqlist()` function is an S3 generic (with methods for tables and formulas) which returns an object of class `"freqlist"`. ```{r console.output} example1 <- freqlist(tab.ex) str(example1) # view the data frame portion of freqlist output head(as.data.frame(example1)) ## or use as.data.frame(example1) ``` # Basic output using `summary()` The `summary` method for `freqlist()` relies on the `kable()` function (in the `knitr` package) for printing. `knitr::kable()` converts the output to markdown which can be printed in the console or easily rendered in Word, PDF, or HTML documents. Note that you must supply `results="asis"` to properly format the markdown output. ```{r, results = 'asis'} summary(example1) ``` You can print a title for the table using the `title=` argument. ```{r, results = 'asis'} summary(example1, title="Basic freqlist output") ``` You can also easily pull out the `freqlist` data frame for more complicated formatting or manipulation (e.g. with another function such as `xtable()` or `pander()`) using `as.data.frame(summary())`: ```{r} head(as.data.frame(summary(example1))) ``` # Using a formula with `freqlist` Instead of passing a pre-computed table to `freqlist()`, you can instead pass a formula, which will be in turn passed to the `xtabs()` function. Additional `freqlist()` arguments are passed through the `...` to the `freqlist()` table method. Note that `freqlist()` sets the `addNA=TRUE` argument by default: ```{r results='asis'} summary(freqlist(~ arm + sex + mdquality.s, data = mockstudy)) ``` One can also set NAs to an explicit value using `includeNA()`. ```{r results='asis'} summary(freqlist(~ arm + sex + includeNA(mdquality.s, "Missing"), data = mockstudy)) ``` In fact, since `xtabs()` allows for left-hand-side weights, so does `freqlist()`! ```{r results='asis'} mockstudy$weights <- c(10000, rep(1, nrow(mockstudy) - 1)) summary(freqlist(weights ~ arm + sex + addNA(mdquality.s), data = mockstudy)) ``` You can also specify multiple weights: ```{r results='asis'} mockstudy$weights2 <- c(rep(1, nrow(mockstudy) - 1), 10000) summary(freqlist(list(weights, weights2) ~ arm + sex + addNA(mdquality.s), data = mockstudy)) ``` # Rounding percentage digits or changing variable names for printing The `digits.pct=` argument takes a single numeric value and controls the number of digits of percentages in the output. The `digits.count=` argument takes a similar argument and controls the number of digits of the count columns. The `labelTranslations=` argument is a named character vector or list. Both options are applied in the following example. ```{r labelTranslations, results = 'asis'} example2 <- freqlist(tab.ex, labelTranslations = c(arm = "Treatment Arm", sex = "Gender", mdquality.s = "LASA QOL"), digits.pct = 1, digits.count = 1) summary(example2) ``` # Additional examples ## Including combinations with frequencies of zero The `sparse=` argument takes a single logical value as input. The default option is `FALSE`. If set to `TRUE`, the sparse option will include combinations with frequencies of zero in the list of results. As our initial table did not have any such levels, we create a second table to use in our example. ```{r sparse, results = 'asis'} summary(freqlist(~ race + sex + arm, data = mockstudy, sparse = TRUE, digits.pct=1)) ``` ## Options for NA handling The various `na.options=` allow you to include or exclude data with missing values for one or more factor levels in the counts and percentages, as well as show the missing data but exclude it from the cumulative counts and percentages. The default option is to include all combinations with missing values. ```{r na.options, results = 'asis'} summary(freqlist(tab.ex, na.options="include")) summary(freqlist(tab.ex, na.options="showexclude")) summary(freqlist(tab.ex, na.options="remove")) ``` ## Frequency counts and percentages subset by factor levels The `strata=` argument internally subsets the data by the specified factor prior to calculating cumulative counts and percentages. By default, when used each subset will print in a separate table. Using the `single = TRUE` option when printing will collapse the subsetted result into a single table. ```{r freq.counts, results='asis'} example3 <- freqlist(tab.ex, strata = c("arm","sex")) summary(example3) #using the single = TRUE argument will collapse results into a single table for printing summary(example3, single = TRUE) ``` ## Show only the "n" most common combinations in each table (`head()` and `sort()`) You can now sort `freqlist()` objects, and, by taking the `head()` of the summary, output the most common frequencies. This looks the prettiest with `dupLabels=TRUE`. ```{r} head(summary(sort(example1, decreasing = TRUE), dupLabels = TRUE)) ``` ## Change labels on the fly ```{r changelabs, results = 'asis'} labs <- c(arm = "Arm", sex = "Sex", mdquality.s = "QOL", freqPercent = "%") labels(example1) <- labs summary(example1) ``` You can also supply `labelTranslations=` to `summary()`. ```{r, results = 'asis'} summary(example1, labelTranslations = labs) ``` ## Using `xtable()` to format and print `freqlist()` results Fair warning: `xtable()` has kind of a steep learning curve. These examples are given without explanation, for more advanced users. ```{r results='asis'} require(xtable) # set up custom function for xtable text italic <- function(x) paste0('', x, '') xftbl <- xtable(as.data.frame(summary(example1)), caption = "xtable formatted output of freqlist data frame", align="|r|r|r|r|c|c|c|r|") # change the column names names(xftbl)[1:3] <- c("Arm", "Gender", "LASA QOL") print(xftbl, sanitize.colnames.function = italic, include.rownames = FALSE, type = "html", comment = FALSE) ``` ## Use `freqlist` in bookdown Since the backbone of `freqlist()` is `knitr::kable()`, tables still render well in bookdown. However, `print.summary.freqlist()` doesn't use the `caption=` argument of `kable()`, so some tables may not have a properly numbered caption. To fix this, use the method described [on the bookdown site](https://bookdown.org/yihui/bookdown/tables.html) to give the table a tag/ID. ```{r eval=FALSE} summary(freqlist(~ sex + age, data = mockstudy), title="(\\#tab:mytableby) Caption here") ``` # Appendix: Notes regarding table options in R ## NAs There are several widely used options for basic tables in R. The `table()` function in base R is probably the most common; by default it excludes NA values. You can change NA handling in `base::table()` using the `useNA=` or `exclude=` arguments. ```{r} # base table default removes NAs tab.d1 <- base::table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA="ifany") tab.d1 ``` `xtabs()` is similar to `table()`, but uses a formula-based syntax. However, NAs must be explicitly added to each factor using the `addNA()` function or using the argument `addNA = TRUE`. ```{r} # without specifying addNA tab.d2 <- xtabs(formula = ~ arm + sex + mdquality.s, data = mockstudy) tab.d2 # now with addNA tab.d3 <- xtabs(~ arm + sex + addNA(mdquality.s), data = mockstudy) tab.d3 ``` Since the formula method of `freqlist()` uses `xtabs()`, NAs should be treated in the same way. `includeNA()` can also be helpful here for setting explicit NA values. ## Table dimname names (dnn) Supplying a data.frame to the `table()` function without giving columns individually will create a contingency table using all variables in the data.frame. However, if the columns of a data.frame or matrix are supplied separately (i.e., as vectors), column names will not be preserved. ```{r} # providing variables separately (as vectors) drops column names table(mockstudy$arm, mockstudy$sex, mockstudy$mdquality.s) ``` If desired, you can use the `dnn=` argument to pass variable names. ```{r} # add the column name labels back using dnn option in base::table table(mockstudy$arm, mockstudy$sex, mockstudy$mdquality.s, dnn=c("Arm", "Sex", "QOL")) ``` You can also name the arguments to `table()`: ```{r} table(Arm = mockstudy$arm, Sex = mockstudy$sex, QOL = mockstudy$mdquality.s) ``` If using `freqlist()`, you can provide the labels directly to `freqlist()` or to `summary()` using `labelTranslations=`. arsenal/vignettes/tableby.Rmd0000755000176200001440000010606514045272104016007 0ustar liggesusers--- title: "The tableby function" author: "Beth Atkinson, Ethan Heinzen, Jason Sinnwell, Shannon McDonnell and Greg Dougherty" output: rmarkdown::html_vignette: toc: yes toc_depth: 3 vignette: | %\VignetteIndexEntry{The tableby function} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r echo = FALSE} options(width = 100) ge330 <- getRversion() >= "3.3.0" ``` # Introduction One of the most common tables in medical literature includes summary statistics for a set of variables, often stratified by some group (e.g. treatment arm). Locally at Mayo, the SAS macros `%table` and `%summary` were written to create summary tables with a single call. With the increasing interest in R, we have developed the function `tableby` to create similar tables within the R environment. In developing the `tableby()` function, the goal was to bring the best features of these macros into an R function. However, the task was not simply to duplicate all the functionality, but rather to make use of R's strengths (modeling, method dispersion, flexibility in function definition and output format) and make a tool that fits the needs of R users. Additionally, the results needed to fit within the general reproducible research framework so the tables could be displayed within an R markdown report. This report provides step-by-step directions for using the functions associated with `tableby()`. All functions presented here are available within the `arsenal` package. An assumption is made that users are somewhat familiar with R Markdown documents. For those who are new to the topic, a good initial resource is available at [rmarkdown.rstudio.com](https://rmarkdown.rstudio.com/). # Simple Example The first step when using the `tableby` function is to load the `arsenal` package. All the examples in this report use a dataset called `mockstudy` made available by Paul Novotny which includes a variety of types of variables (character, numeric, factor, ordered factor, survival) to use as examples. ```{r, load-data} library(arsenal) require(knitr) require(survival) data(mockstudy) ##load data dim(mockstudy) ##look at how many subjects and variables are in the dataset # help(mockstudy) ##learn more about the dataset and variables str(mockstudy) ##quick look at the data ``` To create a simple table stratified by treatment arm, use a formula statement to specify the variables that you want summarized. The example below uses age (a continuous variable) and sex (a factor). ```{r, simple1} tab1 <- tableby(arm ~ sex + age, data=mockstudy) ``` If you want to take a quick look at the table, you can use `summary()` on your tableby object and the table will print out as text in your R console window. If you use `summary()` without any options you will see a number of $\ $ statements which translates to "space" in HTML. ## Pretty text version of table If you want a nicer version in your console window then add the `text=TRUE` option. ```{r, simple-text} summary(tab1, text=TRUE) ``` ## Pretty Rmarkdown version of table In order for the report to look nice within an R markdown (knitr) report, you just need to specify `results="asis"` when creating the R chunk. This changes the layout slightly (compresses it) and bolds the variable names. ```{r, simple-markdown, results='asis'} summary(tab1) ``` ## Data frame version of table If you want a data.frame version, simply use `as.data.frame`. ```{r} as.data.frame(tab1) ``` ## Summaries using standard R code ```{r} ## base R frequency example tmp <- table(Gender=mockstudy$sex, "Study Arm"=mockstudy$arm) tmp # Note: The continuity correction is applied by default in R (not used in %table) chisq.test(tmp) ## base R numeric summary example tapply(mockstudy$age, mockstudy$arm, summary) summary(aov(age ~ arm, data=mockstudy)) ``` # Modifying Output ## Add labels In the above example, age is shown with a label (Age in Years), but sex is listed "as is" with lower case letters. This is because the data was created in SAS and in the SAS dataset, age had a label but sex did not. The label is stored as an attribute within R. ```{r, check-labels} ## Look at one variable's label attr(mockstudy$age,'label') ## See all the variables with a label unlist(lapply(mockstudy,'attr','label')) # Can also use labels(mockstudy) ``` If you want to add labels to other variables, there are a couple of options. First, you could add labels to the variables in your dataset. ```{r, add-label, results='asis'} attr(mockstudy$sex,'label') <- 'Gender' tab1 <- tableby(arm ~ sex + age, data=mockstudy) summary(tab1) ``` You can also use the built-in `data.frame` method for `labels<-`: ```{r, results = 'asis'} labels(mockstudy) <- c(age = 'Age, yrs', sex = "Gender") tab1 <- tableby(arm ~ sex + age, data=mockstudy) summary(tab1) ``` Another option is to add labels after you have created the table ```{r, results='asis'} mylabels <- list(sex = "SEX", age = "Age, yrs") summary(tab1, labelTranslations = mylabels) ``` Alternatively, you can check the variable labels and manipulate them with a function called `labels`, which works on the `tableby` object. ```{r, assignlabels} labels(tab1) labels(tab1) <- c(arm="Treatment Assignment", age="Baseline Age (yrs)") labels(tab1) ``` ```{r, results='asis'} summary(tab1) ``` ## Change summary statistics globally Currently the default behavior is to summarize continuous variables with: Number of missing values, Mean (SD), 25th - 75th quantiles, and Minimum-Maximum values with an ANOVA (t-test with equal variances) p-value. For categorical variables the default is to show: Number of missing values and count (column percent) with a chi-square p-value. This behavior can be modified using the tableby.control function. In fact, you can save your standard settings and use that for future tables. Note that `test=FALSE` and `total=FALSE` results in the total column and p-value column not being printed. ```{r, results='asis'} mycontrols <- tableby.control(test=FALSE, total=FALSE, numeric.test="kwt", cat.test="chisq", numeric.stats=c("N", "median", "q1q3"), cat.stats=c("countpct"), stats.labels=list(N='Count', median='Median', q1q3='Q1,Q3')) tab2 <- tableby(arm ~ sex + age, data=mockstudy, control=mycontrols) summary(tab2) ``` You can also change these settings directly in the tableby call. ```{r, results='asis'} tab3 <- tableby(arm ~ sex + age, data=mockstudy, test=FALSE, total=FALSE, numeric.stats=c("median","q1q3"), numeric.test="kwt") summary(tab3) ``` ## Change summary statistics within the formula In addition to modifying summary options globally, it is possible to modify the test and summary statistics for specific variables within the formula statement. For example, both the kwt (Kruskal-Wallis rank-based) and anova (asymptotic analysis of variance) tests apply to numeric variables, and we can use one for the variable "age", another for the variable "bmi", and no test for the variable "ast". A list of all the options is shown at the end of the vignette. The `tests` function can do a quick check on what tests were performed on each variable in tableby. ```{r, testformula} tab.test <- tableby(arm ~ kwt(age) + anova(bmi) + notest(ast), data=mockstudy) tests(tab.test) ``` ```{r, results='asis'} summary(tab.test) ``` Summary statistics for any individual variable can also be modified, but it must be done as secondary arguments to the test function. The function names must be strings that are functions already written for tableby, built-in R functions like mean and range, or user-defined functions. ```{r, testsAndStats, results='asis'} tab.test <- tableby(arm ~ kwt(ast, "Nmiss2","median") + anova(age, "N","mean") + notest(bmi, "Nmiss","median"), data=mockstudy) summary(tab.test) ``` ## Controlling Options for Categorical Tests (Chisq and Fisher's) The formal tests for categorical variables against the levels of the by variable, chisq and fe, have options to simulate p-values. We show how to turn on the simulations for these with 500 replicates for the Fisher's test (fe). ```{r, simfe, results='asis'} set.seed(100) tab.catsim <- tableby(arm ~ sex + race, cat.test="fe", simulate.p.value=TRUE, B=500, data=mockstudy) tests(tab.catsim) ``` The chi-square test on 2x2 tables applies Yates' continuity correction by default, so we provide an option to turn off the correction. We show the results with and without the correction that is applied to treatment arm by sex, if we use subset to ignore one of the three treatment arms. ```{r, chisqcorrect, results='asis'} cat.correct <- tableby(arm ~ sex + race, cat.test="chisq", subset = !grepl("^F", arm), data=mockstudy) tests(cat.correct) cat.nocorrect <- tableby(arm ~ sex + race, cat.test="chisq", subset = !grepl("^F", arm), chisq.correct=FALSE, data=mockstudy) tests(cat.nocorrect) ``` ## Modifying the look & feel in Word documents You can easily create Word versions of `tableby` output via an Rmarkdown report and the default options will give you a reasonable table in Word - just select the "Knit Word" option in RStudio. **The functionality listed in this next paragraph is coming soon but needs an upgraded version of RStudio** If you want to modify fonts used for the table, then you'll need to add an extra line to your header at the beginning of your file. You can take the `WordStylesReference01.docx` file and modify the fonts (storing the format preferences in your project directory). To see how this works, run your report once using WordStylesReference01.docx and then WordStylesReference02.docx. ``` output: word_document reference_docx: /projects/bsi/gentools/R/lib320/arsenal/doc/WordStylesReference01.docx ``` For more information on changing the look/feel of your Word document, see the [Rmarkdown documentation](https://bookdown.org/yihui/rmarkdown/word-document.html) website. # Additional Examples Here are multiple examples showing how to use some of the different options. ## 1. Summarize without a group/by variable ```{r, nobyvar, results='asis'} tab.noby <- tableby(~ bmi + sex + age, data=mockstudy) summary(tab.noby) ``` ## 2. Display footnotes indicating which "test" was used ```{r, results="asis"} summary(tab.test, pfootnote=TRUE) ``` ## 3. Summarize an ordered factor When comparing groups of ordered data there are a couple of options. The **default** uses a general independence test available from the `coin` package. For two-group comparisons, this is essentially the Armitage trend test. The other option is to specify the Kruskal Wallis test. The example below shows both options. ```{r} mockstudy$age.ordnew <- ordered(c("a",NA,as.character(mockstudy$age.ord[-(1:2)]))) table(mockstudy$age.ord, mockstudy$sex) table(mockstudy$age.ordnew, mockstudy$sex) class(mockstudy$age.ord) ``` ```{r, results="asis", eval=requireNamespace("coin", quietly = TRUE)} summary(tableby(sex ~ age.ordnew, data = mockstudy), pfootnote = TRUE) summary(tableby(sex ~ age.ord, data = mockstudy), pfootnote = TRUE) ``` ## 4. Summarize a survival variable First look at the information that is presented by the `survfit()` function, then see how the same results can be seen with tableby. The default is to show the median survival (time at which the probability of survival = 50%). ```{r, eval=ge330} survfit(Surv(fu.time, fu.stat)~sex, data=mockstudy) survdiff(Surv(fu.time, fu.stat)~sex, data=mockstudy) ``` ```{r, results='asis'} summary(tableby(sex ~ Surv(fu.time, fu.stat), data=mockstudy)) ``` It is also possible to obtain summaries of the % survival at certain time points (say the probability of surviving 1-year). ```{r, eval=ge330} summary(survfit(Surv(fu.time/365.25, fu.stat)~sex, data=mockstudy), times=1:5) ``` ```{r, results='asis', eval=ge330} summary(tableby(sex ~ Surv(fu.time/365.25, fu.stat), data=mockstudy, times=1:5, surv.stats=c("NeventsSurv","NriskSurv"))) ``` ## 5. Summarize date variables Date variables by default are summarized with the number of missing values, the median, and the range. For example purposes we've created a random date. Missing values are introduced for impossible February dates. ```{r, results='asis'} set.seed(100) N <- nrow(mockstudy) mockstudy$dtentry <- mdy.Date(month=sample(1:12,N,replace=T), day=sample(1:29,N,replace=T), year=sample(2005:2009,N,replace=T)) summary(tableby(sex ~ dtentry, data=mockstudy)) ``` ## 6. Summarize multiple variables without typing them out Often one wants to summarize a number of variables. Instead of typing by hand each individual variable, an alternative approach is to create a formula using the `paste` command with the `collapse="+"` option. ```{r, results='asis'} ## create a vector specifying the variable names myvars <- names(mockstudy) ## select the 8th through the last variables ## paste them together, separated by the + sign RHS <- paste(myvars[8:10], collapse="+") RHS ## create a formula using the as.formula function as.formula(paste('arm ~ ', RHS)) ## use the formula in the tableby function summary(tableby(as.formula(paste('arm ~', RHS)), data=mockstudy)) ``` These steps can also be done using the `formulize` function. ```{r, results='asis'} ## The formulize function does the paste and as.formula steps tmp <- formulize('arm',myvars[8:10]) tmp ## More complex formulas could also be written using formulize tmp2 <- formulize('arm',c('ps','hgb^2','bmi')) ## use the formula in the tableby function summary(tableby(tmp, data=mockstudy)) ``` To change summary statistics or statistical tests en masse, consider using `paste0()` together with `formulize()`: ```{r results='asis'} varlist1 <- c('age','sex','hgb') varlist2 <- paste0("anova(", c('bmi','alk.phos','ast'), ", 'meansd')") summary(tableby(formulize("arm", c(varlist1, varlist2)), data = mockstudy, numeric.test = "kwt"), pfootnote = TRUE) ``` ## 7. Subset the dataset used in the analysis Here are two ways to get the same result (limit the analysis to subjects age>5 and in the F: FOLFOX treatment group). * The first approach uses the subset function applied to the dataset `mockstudy`. This example also selects a subset of variables. The `tableby` function is then applied to this subsetted data. ```{r} newdata <- subset(mockstudy, subset=age>50 & arm=='F: FOLFOX', select = c(sex,ps:bmi)) dim(mockstudy) table(mockstudy$arm) dim(newdata) names(newdata) ``` ```{r, results='asis'} summary(tableby(sex ~ ., data=newdata)) ``` * The second approach does the same analysis but uses the subset argument within `tableby` to subset the data. ```{r, results='asis'} summary(tableby(sex ~ ps + hgb + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy)) ``` ## 8. Create combinations of variables on the fly ```{r} ## create a variable combining the levels of mdquality.s and sex with(mockstudy, table(interaction(mdquality.s,sex))) ``` ```{r, results='asis'} summary(tableby(arm ~ interaction(mdquality.s,sex), data=mockstudy)) ``` ```{r, results='asis'} ## create a new grouping variable with combined levels of arm and sex summary(tableby(interaction(mdquality.s, sex) ~ age + bmi, data=mockstudy, subset=arm=="F: FOLFOX")) ``` ## 9. Transform variables on the fly Certain transformations need to be surrounded by `I()` so that R knows to treat it as a variable transformation and not some special model feature. If the transformation includes any of the symbols `/ - + ^ *` then surround the new variable by `I()`. ```{r, maketrans, results='asis'} trans <- tableby(arm ~ I(age/10) + log(bmi) + factor(mdquality.s, levels=0:1, labels=c('N','Y')), data=mockstudy) summary(trans) ``` The labels for these variables aren't exactly what we'd like, so we can change modify those after the fact. Instead of typing out the very long variable names, you can modify specific labels by position. ```{r, assignlabels2} labels(trans) labels(trans)[2:4] <- c('Age per 10 yrs', 'log(BMI)', 'MD Quality') labels(trans) ``` ```{r, transsummary, results='asis'} summary(trans) ``` Note that if we had not changed `mdquality.s` to a factor, it would have been summarized as though it were a continuous variable. ```{r, results='asis'} class(mockstudy$mdquality.s) summary(tableby(arm~mdquality.s, data=mockstudy)) ``` Another option would be to specify the test and summary statistics. In fact, if I had a set of variables coded 0/1 and that was all I was summarizing, then I could change the global option for continuous variables to use the chi-square test and show countpct. ```{r, results='asis'} summary(tableby(arm ~ chisq(mdquality.s, "Nmiss","countpct"), data=mockstudy)) ``` ## 10. Subsetting (change the ordering of the variables, delete a variable, sort by p-value, filter by p-value) ```{r, results='asis'} mytab <- tableby(arm ~ sex + alk.phos + age, data=mockstudy) mytab2 <- mytab[c('age','sex','alk.phos')] summary(mytab2) summary(mytab[c('age','sex')], digits = 2) summary(mytab[c(3,1)], digits = 3) summary(sort(mytab, decreasing = TRUE)) summary(mytab[mytab < 0.5]) head(mytab, 1) # can also use tail() ``` ## 11. Merge two `tableby` objects together It is possible to combine two tableby objects so that they print out together. Overlapping by-variables will have their x-variables concatenated, and (if `all=TRUE`) non-overlapping by-variables will have their tables printed separately. ```{r, results="asis"} ## demographics tab1 <- tableby(arm ~ sex + age, data=mockstudy, control=tableby.control(numeric.stats=c("Nmiss","meansd"), total=FALSE)) ## lab data tab2 <- tableby(arm ~ hgb + alk.phos, data=mockstudy, control=tableby.control(numeric.stats=c("Nmiss","median","q1q3"), numeric.test="kwt", total=FALSE)) tab12 <- merge(tab1, tab2) class(tab12) summary(tab12) ``` For tables with two different outcomes, consider the `all=TRUE` argument: ```{r, results='asis'} summary(merge( tableby(sex ~ age, data = mockstudy), tableby(arm ~ bmi, data = mockstudy), all = TRUE )) ``` ## 12. Add a title to the table When creating a pdf the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles appear un-numbered and above the table. ```{r, results='asis'} t1 <- tableby(arm ~ sex + age, data=mockstudy) summary(t1, title='Demographics') ``` With multiple left-hand sides, you can pass a vector or list to determine labels for each table: ```{r, results='asis'} summary(tableby(list(arm, sex) ~ age, data = mockstudy), title = c("arm table", "sex table")) ``` ## 13. Modify how missing values are displayed Depending on the report you are writing you have the following options: * Show how many subjects have each variable * Show how many subjects are missing each variable * Show how many subjects are missing each variable only if there are any missing values * Don't indicate missing values at all ```{r} ## look at how many missing values there are for each variable apply(is.na(mockstudy),2,sum) ``` ```{r, results='asis'} ## Show how many subjects have each variable (non-missing) summary(tableby(sex ~ ast + age, data=mockstudy, control=tableby.control(numeric.stats=c("N","median"), total=FALSE))) ## Always list the number of missing values summary(tableby(sex ~ ast + age, data=mockstudy, control=tableby.control(numeric.stats=c("Nmiss2","median"), total=FALSE))) ## Only show the missing values if there are some (default) summary(tableby(sex ~ ast + age, data=mockstudy, control=tableby.control(numeric.stats=c("Nmiss","mean"),total=FALSE))) ## Don't show N at all summary(tableby(sex ~ ast + age, data=mockstudy, control=tableby.control(numeric.stats=c("mean"),total=FALSE))) ``` One might also consider the use of `includeNA()` to include NAs in the counts and percents for categorical variables. ```{r, results = 'asis'} mockstudy$ps.cat <- factor(mockstudy$ps) attr(mockstudy$ps.cat, "label") <- "ps" summary(tableby(sex ~ includeNA(ps.cat), data = mockstudy, cat.stats = "countpct")) ``` ## 14. Modify the number of digits used Within tableby.control function there are 4 options for controlling the number of significant digits shown. * digits: controls the number of digits after the decimal place for continuous values * digits.count: controls the number of digits after the decimal point for counts * digits.pct: controls the number of digits after the decimal point for percents * digits.p: controls the number of digits after the decimal point for p-values ```{r, results='asis'} summary(tableby(arm ~ sex + age + fu.time, data=mockstudy), digits=4, digits.p=2, digits.pct=1) ``` With the exception of `digits.p`, all of these can be specified on a per-variable basis using the in-formula functions that specify which tests are run: ```{r results='asis'} summary(tableby(arm ~ chisq(sex, digits.pct=1) + anova(age, digits=4) + anova(fu.time, digits = 1), data=mockstudy)) ``` ## 15. Create a user-defined summary statistic For purposes of this example, the code below creates a trimmed mean function (trims 10%) and use that to summarize the data. Note the use of the `...` which tells R to pass extra arguments on - this is required for user-defined functions. In this case, `na.rm=T` is passed to `myfunc`. The *weights* argument is also required, even though it isn't passed on to the internal function in this particular example. ```{r, results='asis'} trim10 <- function(x, weights=rep(1,length(x)), ...){ mean(x, trim=.1, ...) } summary(tableby(sex ~ hgb, data=mockstudy, control=tableby.control(numeric.stats=c("Nmiss","trim10"), numeric.test="kwt", stats.labels=list(Nmiss='Missing values', trim10="Trimmed Mean, 10%")))) ``` For statistics to be formatted appropriately, you may want to use `as.tbstat()` or `as.countpct()`. For example, suppose you want to create a trimmed mean function that trims by both 5 and 10 percent. The first example shows them separated by a comma; the second puts the 10% trimmed mean in brackets ```{r, results='asis'} trim510comma <- function(x, weights=rep(1,length(x)), ...){ tmp <- c(mean(x, trim = 0.05, ...), mean(x, trim = 0.1, ...)) as.tbstat(tmp, sep = ", ") } trim510bracket <- function(x, weights=rep(1,length(x)), ...){ tmp <- c(mean(x, trim = 0.05, ...), mean(x, trim = 0.1, ...)) as.tbstat(tmp, sep = " ", parens = c("[", "]")) } summary(tableby(sex ~ hgb, data=mockstudy, numeric.stats=c("Nmiss", "trim510comma"), test = FALSE)) summary(tableby(sex ~ hgb, data=mockstudy, numeric.stats=c("Nmiss", "trim510bracket"), test = FALSE)) ``` Or perhaps it's useful to put the amount of trimming in parentheses. Since it is a percent, we can flag it as such: ```{r results='asis'} trim10pct <- function(x, weights=rep(1,length(x)), ...){ tmp <- mean(x, trim = 0.05, ...) as.countpct(c(tmp, 10), sep = " ", parens = c("(", ")"), which.count = 0, which.pct = 2, pct = "%") } summary(tableby(sex ~ hgb, data=mockstudy, numeric.stats=c("Nmiss", "trim10pct"), digits = 2, digits.pct = 0, test = FALSE)) ``` ## 16. Use case-weights for creating summary statistics When comparing groups, they are often unbalanced when it comes to nuisances such as age and sex. The `tableby` function allows you to create weighted summary statistics. If this option us used then p-values are not calculated (`test=FALSE`). ```{r} ##create fake group that is not balanced by age/sex set.seed(200) mockstudy$fake_arm <- ifelse(mockstudy$age>60 & mockstudy$sex=='Female',sample(c('A','B'),replace=T, prob=c(.2,.8)), sample(c('A','B'),replace=T, prob=c(.8,.4))) mockstudy$agegp <- cut(mockstudy$age, breaks=c(18,50,60,70,90), right=FALSE) ## create weights based on agegp and sex distribution tab1 <- with(mockstudy,table(agegp, sex)) tab2 <- with(mockstudy, table(agegp, sex, fake_arm)) tab2 gpwts <- rep(tab1, length(unique(mockstudy$fake_arm)))/tab2 gpwts[gpwts>50] <- 30 ## apply weights to subjects index <- with(mockstudy, cbind(as.numeric(agegp), as.numeric(sex), as.numeric(as.factor(fake_arm)))) mockstudy$wts <- gpwts[index] ## show weights by treatment arm group tapply(mockstudy$wts,mockstudy$fake_arm, summary) ``` ```{r, results='asis', eval=ge330} orig <- tableby(fake_arm ~ age + sex + Surv(fu.time/365, fu.stat), data=mockstudy, test=FALSE) summary(orig, title='No Case Weights used') tab1 <- tableby(fake_arm ~ age + sex + Surv(fu.time/365, fu.stat), data=mockstudy, weights=wts) summary(tab1, title='Case Weights used') ``` ## 17. Create your own p-value and add it to the table When using weighted summary statistics, it is often desirable to then show a p-value from a model that corresponds to the weighted analysis. It is possible to add your own p-value and modify the column title for that new p-value. Another use for this would be to add standardized differences or confidence intervals instead of a p-value. To add the p-value, you simply need to create a data frame and use the function `modpval.tableby()`. The first few columns in the data.frame are required: (1) the by-variable, (2) the strata value (if the table has a strata term), (3) the x-variable, and (4) the new p-value (or test statistic). Another optional column can be used to indicate what method was used to calculate the p-value. If you specify `use.pname=TRUE` then the column name indicating the p-value will be also be used in the tableby summary. ```{r, results='asis', eval=ge330} mypval <- data.frame( byvar = "fake_arm", variable = c('age','sex','Surv(fu.time/365, fu.stat)'), adj.pvalue = c(.953,.811,.01), method = c('Age/Sex adjusted model results') ) tab2 <- modpval.tableby(tab1, mypval, use.pname=TRUE) summary(tab2, title='Case Weights used, p-values added', pfootnote=TRUE) ``` ## 18. For two-level categorical variables or one-line numeric variables, simplify the output. If the `cat.simplify` option is set to `TRUE`, then only the second level of two-level categorical varialbes is shown. In the example below, `sex` has two levels, and "Female" is the second level, hence only the counts and percents for Female are shown. Similarly, "mdquality.s" was turned to a factor, and "1" is the second level, but since there are missings, the table ignores `cat.simplify` and displays all levels (since the output can no longer be displayed on one line). ```{r, results='asis'} table2 <- tableby(arm~sex + factor(mdquality.s), data=mockstudy, cat.simplify=TRUE) summary(table2, labelTranslations=c(sex="Female", "factor(mdquality.s)"="MD Quality")) ``` Similarly, if `numeric.simplify` is set to `TRUE`, then any numerics which only have one row of summary statistics are simplified into a single row. Note again that `ast` has missing values and so is not simplified to a single row. ```{r results='asis'} summary(tableby(arm ~ age + ast, data = mockstudy, numeric.simplify=TRUE, numeric.stats=c("Nmiss", "meansd"))) ``` The in-formula functions to change which tests are run can also be used to specify these options for each variable at a time. ```{r results='asis'} summary(tableby(arm ~ anova(age, "meansd", numeric.simplify=TRUE) + chisq(sex, cat.simplify=TRUE), data = mockstudy)) ``` The `cat.simplify` and `ord.simplify` argument also accept the special string `"label"`, which appends the shown level's label to the overall label: ```{r results='asis'} summary(tableby(arm ~ sex, cat.simplify = "label", data = mockstudy)) ``` ## 19. Use `tableby` within an Sweave document For those users who wish to create tables within an Sweave document, the following code seems to work. ``` \documentclass{article} \usepackage{longtable} \usepackage{pdfpages} \begin{document} \section{Read in Data} <>= require(arsenal) require(knitr) require(rmarkdown) data(mockstudy) tab1 <- tableby(arm~sex+age, data=mockstudy) @ \section{Convert Summary.Tableby to LaTeX} <>= capture.output(summary(tab1), file="Test.md") ## Convert R Markdown Table to LaTeX render("Test.md", pdf_document(keep_tex=TRUE)) @ \includepdf{Test.pdf} \end{document} ``` ## 20. Export `tableby` object to a .CSV file When looking at multiple variables it is sometimes useful to export the results to a csv file. The `as.data.frame` function creates a data frame object that can be exported or further manipulated within R. ```{r} tab1 <- summary(tableby(arm~sex+age, data=mockstudy), text = NULL) as.data.frame(tab1) # write.csv(tab1, '/my/path/here/my_table.csv') ``` ## 21. Write `tableby` object to a separate Word or HTML file ```{r eval = FALSE} ## write to an HTML document tab1 <- tableby(arm ~ sex + age, data=mockstudy) write2html(tab1, "~/trash.html") ## write to a Word document write2word(tab1, "~/trash.doc", title="My table in Word") ``` ## 22. Use `tableby` in R Shiny The easiest way to output a `tableby()` object in an R Shiny app is to use the `tableOutput()` UI in combination with the `renderTable()` server function and `as.data.frame(summary(tableby()))`: ```{r eval=FALSE} # A standalone shiny app library(shiny) library(arsenal) data(mockstudy) shinyApp( ui = fluidPage(tableOutput("table")), server = function(input, output) { output$table <- renderTable({ as.data.frame(summary(tableby(sex ~ age, data = mockstudy), text = "html")) }, sanitize.text.function = function(x) x) } ) ``` This can be especially powerful if you feed the selections from a `selectInput(multiple = TRUE)` into `formulize()` to make the table dynamic! ## 23. Use `tableby` in bookdown Since the backbone of `tableby()` is `knitr::kable()`, tables still render well in bookdown. However, `print.summary.tableby()` doesn't use the `caption=` argument of `kable()`, so some tables may not have a properly numbered caption. To fix this, use the method described [on the bookdown site](https://bookdown.org/yihui/bookdown/tables.html) to give the table a tag/ID. ```{r eval=FALSE} summary(tableby(sex ~ age, data = mockstudy), title="(\\#tab:mytableby) Caption here") ``` ## 24. Adjust `tableby` for multiple p-values The `padjust()` function is a new S3 generic piggybacking off of `p.adjust()`. It works on both `tableby` and `summary.tableby` objects: ```{r results='asis'} tab <- summary(tableby(sex ~ age + fu.time + bmi + mdquality.s, data = mockstudy)) tab padjust(tab, method = "bonferroni") ``` ## 25. Tabulate multiple endpoints You can now use `list()` on the left-hand side of `tableby()` to give multiple endpoints. ```{r results='asis'} summary(tableby(list(sex, mdquality.s, ps) ~ age + bmi, data = mockstudy)) ``` To avoid confusion about which table is which endpoint, you can set `term.name=TRUE` in `summary()`. This takes the labels for each by-variable and puts them in the top-left of the table. ```{r results='asis'} summary(tableby(list(sex, mdquality.s, ps) ~ age + bmi, data = mockstudy), term.name = TRUE) ``` ## 26. Tabulate data by a non-test group (strata) You can also specify a second grouping variable that doesn't get tested (but instead separates results): a *strata* variable. ```{r results='asis'} summary(tableby(list(sex, ps) ~ age + bmi, strata = arm, data = mockstudy)) ``` # Available Function Options ## Summary statistics The **default** summary statistics, by varible type, are: * `numeric.stats`: Continuous variables will show by default `Nmiss, meansd, range` * `cat.stats`: Categorical and factor variables will show by default `Nmiss, countpct` * `ordered.stats`: Ordered factors will show by default `Nmiss, countpct` * `surv.stats`: Survival variables will show by default `Nmiss, Nevents, medsurv` * `date.stats`: Date variables will show by default `Nmiss, median, range` There are a number of extra functions defined specifically for the tableby function. * `N`: a count of the number of observations for a particular group * `Nmiss`: only show the count of the number of missing values if there are some missing values * `Nmiss2`: always show a count of the number of missing values for a variable within each group * `meansd`: print the mean and standard deviation in the format `mean(sd)` * `meanse`: print the mean and standard error in the format `mean(se)` * `meanCI`: print the mean and a (t) confidence interval * `count`: print the number of values in a category * `countN`: print the number of values in a category plus the total N for the group in the format `N/Total` * `countpct`: print the number of values in a category plus the column-percentage in the format `N (%)` * `countrowpct`: print the number of values in a category plus the row-percentage in the format `N (%)` * `countcellpct`: print the number of values in a category plus the cell-percentage in the format `N (%)` * `binomCI`: print the proportion in a category plus a binomial confidence interval. * `rowbinomCI`: print the row proportion in a category plus a binomial confidence interval. * `medianq1q3`: print the median, 25th, and 75th quantiles `median (Q1, Q3)` * `q1q3`: print the 25th and 75th quantiles `Q1, Q3` * `iqr`: print the inter-quartile range. * `medianrange`: print the median, minimum and maximum values `median (minimum, maximum)` * `medianmad`: print the median and median absolute deviation (mad) * `Nevents`: print number of events for a survival object within each grouping level * `medSurv`: print the median survival * `NeventsSurv`: print number of events and survival at given times * `NriskSurv`: print the number still at risk and survival at given times * `Nrisk`: print the number still at risk at given times * `medTime`: print the median follow-up time * `sum` * `max` * `min` * `mean` * `sd` * `var` * `median` * `range` * `gmean`, `gsd`, `gmeansd`, `gmeanCI`: geometric means, sds, and confidence intervals. ## Testing options The tests used to calculate p-values differ by the variable type, but can be specified explicitly in the formula statement or in the control function. The following tests are accepted: * `anova`: analysis of variance test; the default test for continuous variables. When the grouping variable has two levels, it is equivalent to the two-sample t-test with equal variance. * `kwt`: Kruskal-Wallis test, optional test for continuous variables. When the grouping variable has two levels, it is equivalent to the Wilcoxon Rank Sum test. * `wt`: An explicit Wilcoxcon test. * `medtest`: Median test test, optional test for continuous variables. * `chisq`: chi-square goodness of fit test for equal counts of a categorical variable across categories; the default for categorical or factor variables * `fe`: Fisher's exact test for categorical variables; optional * `logrank`: log-rank test, the default test for time-to-event variables * `trend`: The `independence_test` function from the `coin` is used to test for trends. Whenthe grouping variable has two levels, it is equivalent to the Armitage trend test. This is the default for ordered factors * `notest`: Don't perform a test. ## `tableby.control` settings A quick way to see what arguments are possible to utilize in a function is to use the `args()` command. Settings involving the number of digits can be set in `tableby.control` or in `summary.tableby`. ```{r} args(tableby.control) ``` ## `summary.tableby` settings The `summary.tableby` function has options that modify how the table appears (such as adding a title or modifying labels). ```{r} args(arsenal:::summary.tableby) ``` arsenal/vignettes/labels.Rmd0000644000176200001440000000717513656527336015647 0ustar liggesusers--- title: "A Few Notes on Labels" author: "Ethan Heinzen" output: rmarkdown::html_vignette: toc: true vignette: | %\VignetteIndexEntry{A Few Notes on Labels} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r include = FALSE} knitr::opts_chunk$set(message = FALSE, results = 'asis') ``` # Introduction The `arsenal` package relies somewhat heavily on variable labels to make output more "pretty". A `label` here is understood to be a single character string with "pretty" text (i.e., not an "ugly" variable name). Three of the main `arsenal` function use labels in their `summary()` output. There are several ways to set these labels. We'll use the `mockstudy` dataset for all examples here: ```{r} library(arsenal) data(mockstudy) library(magrittr) # for 'freqlist' examples tab.ex <- table(mockstudy[c("arm", "sex", "mdquality.s")], useNA="ifany") ``` # Examples ## Set labels in the function call The `summary()` method for `tableby()`, `modelsum()`, and `freqlist()` objects contains a `labelTranslations = ` argument to specify labels in the function call. Note that the `freqlist()` function matches labels in order, whereas the other two match labels by name. The labels can be input as a list or a character vector. ```{r} summary(freqlist(tab.ex), labelTranslations = c(arm = "Treatment Arm", sex = "Gender", mdquality.s = "LASA QOL")) summary(tableby(arm ~ sex + age, data = mockstudy), labelTranslations = c(sex = "SEX", age = "Age, yrs")) summary(modelsum(bmi ~ age, adjust = ~sex, data = mockstudy), labelTranslations = list(sexFemale = "Female", age = "Age, yrs")) ``` ## Modify labels after the fact Another option is to add labels after you have created the object. To do this, you can use the form `labels(x) <- value` or use the pipe-able version, `set_labels()`. ```{r} # the non-pipe version; somewhat clunky tmp <- freqlist(tab.ex) labels(tmp) <- c(arm = "Treatment Arm", sex = "Gender", mdquality.s = "LASA QOL") summary(tmp) # piped--much cleaner mockstudy %>% tableby(arm ~ sex + age, data = .) %>% set_labels(c(sex = "SEX", age = "Age, yrs")) %>% summary() mockstudy %>% modelsum(bmi ~ age, adjust = ~ sex, data = .) %>% set_labels(list(sexFemale = "Female", age = "Age, yrs")) %>% summary() ``` ## Add labels to a `data.frame` `tableby()` and `modelsum()` also allow you to have label attributes on the data. Note that by default these attributes usually get dropped upon subsetting, but `tableby()` and `modelsum()` use the `keep.labels()` function to retain them. ```{r} mockstudy.lab <- keep.labels(mockstudy) class(mockstudy$age) class(mockstudy.lab$age) ``` To undo this, simply `loosen.labels()`: ```{r} class(loosen.labels(mockstudy.lab)$age) ``` You can set attributes one at a time in two ways: ```{r} attr(mockstudy.lab$sex, "label") <- "Sex" labels(mockstudy.lab$age) <- "Age, yrs" ``` ...or all at once: ```{r} labels(mockstudy.lab) <- list(sex = "Sex", age = "Age, yrs") summary(tableby(arm ~ sex + age, data = mockstudy.lab)) ``` You can pipe this, too. ```{r} mockstudy %>% set_labels(list(sex = "SEX", age = "Age, yrs")) %>% modelsum(bmi ~ age, adjust = ~ sex, data = .) %>% summary() ``` To extract labels from a `data.frame`, simply use the `labels()` function: ```{r results='markdown'} labels(mockstudy.lab) ``` ## When labels get long `tableby()` and `modelsum()` both support the wrapping of long labels. Consider the `width=` argument in the `print()` function: ```{r} mockstudy %>% set_labels(list(age = "This is a really long label for the arm variable")) %>% tableby(sex ~ age, data = .) %>% summary() %>% print(width = 20) ``` arsenal/vignettes/modelsum.Rmd0000644000176200001440000007664414051207602016216 0ustar liggesusers--- title: "The modelsum function" author: "Beth Atkinson, Ethan Heinzen, Pat Votruba, Jason Sinnwell, Shannon McDonnell and Greg Dougherty" output: rmarkdown::html_vignette: toc: yes toc_depth: 3 vignette: | %\VignetteIndexEntry{The modelsum function} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r, echo=FALSE, message=FALSE, results='hide', warning=FALSE} require(knitr) require(broom) require(MASS) require(pROC) require(rpart) opts_chunk$set(comment = NA, echo=TRUE, prompt=TRUE, collapse=TRUE) ``` # Introduction Very often we are asked to summarize model results from multiple fits into a nice table. The endpoint might be of different types (e.g., survival, case/control, continuous) and there may be several independent variables that we want to examine univariately or adjusted for certain variables such as age and sex. Locally at Mayo, the SAS macros `%modelsum`, `%glmuniv`, and `%logisuni` were written to create such summary tables. With the increasing interest in R, we have developed the function `modelsum` to create similar tables within the R environment. In developing the `modelsum` function, the goal was to bring the best features of these macros into an R function. However, the task was not simply to duplicate all the functionality, but rather to make use of R's strengths (modeling, method dispersion, flexibility in function definition and output format) and make a tool that fits the needs of R users. Additionally, the results needed to fit within the general reproducible research framework so the tables could be displayed within an R markdown report. This report provides step-by-step directions for using the functions associated with `modelsum`. All functions presented here are available within the `arsenal` package. An assumption is made that users are somewhat familiar with R markdown documents. For those who are new to the topic, a good initial resource is available at [rmarkdown.rstudio.com](https://rmarkdown.rstudio.com/). # Simple Example The first step when using the `modelsum` function is to load the `arsenal` package. All the examples in this report use a dataset called `mockstudy` made available by Paul Novotny which includes a variety of types of variables (character, numeric, factor, ordered factor, survival) to use as examples. ```{r, load-data} require(arsenal) data(mockstudy) # load data dim(mockstudy) # look at how many subjects and variables are in the dataset # help(mockstudy) # learn more about the dataset and variables str(mockstudy) # quick look at the data ``` To create a simple linear regression table (the default), use a formula statement to specify the variables that you want summarized. The example below predicts BMI with the variables sex and age. ```{r simple1} tab1 <- modelsum(bmi ~ sex + age, data=mockstudy) ``` If you want to take a quick look at the table, you can use `summary` on your modelsum object and the table will print out as text in your R console window. If you use `summary` without any options you will see a number of $\ $ statements which translates to "space" in HTML. ## Pretty text version of table If you want a nicer version in your console window then adding the `text=TRUE` option. ```{r simple-text} summary(tab1, text=TRUE) ``` ## Pretty Rmarkdown version of table In order for the report to look nice within an R markdown (knitr) report, you just need to specify `results="asis"` when creating the r chunk. This changes the layout slightly (compresses it) and bolds the variable names. ```{r simple-markdown, results='asis'} summary(tab1) ``` ## Data frame version of table If you want a data.frame version, simply use `as.data.frame`. ```{r} as.data.frame(tab1) ``` ## Add an adjustor to the model The argument `adjust` allows the user to indicate that all the variables should be adjusted for these terms. To adjust each model for age and sex (for instance), we use `adjust = ~ age + sex`: ```{r adjust, results="asis"} tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~age + sex, data=mockstudy) summary(tab2) ``` # Models for each endpoint type To make sure the correct model is run you need to specify "family". The options available right now are : gaussian, binomial, survival, and poisson. If there is enough interest, additional models can be added. ## Gaussian ### Fit and summarize linear regression model Look at whether there is any evidence that AlkPhos values vary by study arm after adjusting for sex and age (assuming a linear age relationship). ```{r} fit <- lm(alk.phos ~ arm + age + sex, data=mockstudy) summary(fit) plot(fit) ``` The results suggest that the endpoint may need to be transformed. Calculating the Box-Cox transformation suggests a log transformation. ```{r} require(MASS) boxcox(fit) ``` ```{r} fit2 <- lm(log(alk.phos) ~ arm + age + sex, data=mockstudy) summary(fit2) plot(fit2) ``` Finally, look to see whether there there is a non-linear relationship with age. ```{r} require(splines) fit3 <- lm(log(alk.phos) ~ arm + ns(age, df=2) + sex, data=mockstudy) # test whether there is a difference between models stats::anova(fit2,fit3) # look at functional form of age termplot(fit3, term=2, se=T, rug=T) ``` In this instance it looks like there isn't enough evidence to say that the relationship is non-linear. ### Extract data using the `broom` package The `broom` package makes it easy to extract information from the fit. ```{r} tmp <- tidy(fit3) # coefficients, p-values class(tmp) tmp glance(fit3) ``` ### Create a summary table using modelsum ```{r, results="asis"} ms.logy <- modelsum(log(alk.phos) ~ arm + ps + hgb, data=mockstudy, adjust= ~age + sex, family=gaussian, gaussian.stats=c("estimate","CI.lower.estimate","CI.upper.estimate","p.value")) summary(ms.logy) ``` ## Binomial ### Fit and summarize logistic regression model ```{r} boxplot(age ~ mdquality.s, data=mockstudy, ylab=attr(mockstudy$age,'label'), xlab='mdquality.s') fit <- glm(mdquality.s ~ age + sex, data=mockstudy, family=binomial) summary(fit) # create Odd's ratio w/ confidence intervals tmp <- data.frame(summary(fit)$coef) tmp tmp$OR <- round(exp(tmp[,1]),2) tmp$lower.CI <- round(exp(tmp[,1] - 1.96* tmp[,2]),2) tmp$upper.CI <- round(exp(tmp[,1] + 1.96* tmp[,2]),2) names(tmp)[4] <- 'P-value' kable(tmp[,c('OR','lower.CI','upper.CI','P-value')]) # Assess the predictive ability of the model # code using the pROC package require(pROC) pred <- predict(fit, type='response') tmp <- pROC::roc(mockstudy$mdquality.s[!is.na(mockstudy$mdquality.s)]~ pred, plot=TRUE, percent=TRUE) tmp$auc ``` ### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. ```{r} tidy(fit, exp=T, conf.int=T) # coefficients, p-values, conf.intervals glance(fit) # model summary statistics ``` ### Create a summary table using modelsum ```{r, results="asis"} summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial)) fitall <- modelsum(mdquality.s ~ age, data=mockstudy, family=binomial, binomial.stats=c("Nmiss2","OR","p.value")) summary(fitall) ``` ## Survival ### Fit and summarize a Cox regression model ```{r survival} require(survival) # multivariable model with all 3 terms fit <- coxph(Surv(fu.time, fu.stat) ~ age + sex + arm, data=mockstudy) summary(fit) # check proportional hazards assumption fit.z <- cox.zph(fit) fit.z plot(fit.z[1], resid=FALSE) # makes for a cleaner picture in this case abline(h=coef(fit)[1], col='red') # check functional form for age using pspline (penalized spline) # results are returned for the linear and non-linear components fit2 <- coxph(Surv(fu.time, fu.stat) ~ pspline(age) + sex + arm, data=mockstudy) fit2 # plot smoothed age to visualize why significant termplot(fit2, se=T, terms=1) abline(h=0) # The c-statistic comes out in the summary of the fit summary(fit2)$concordance # It can also be calculated using the survConcordance function survConcordance(Surv(fu.time, fu.stat) ~ predict(fit2), data=mockstudy) ``` ### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. ```{r} tidy(fit) # coefficients, p-values glance(fit) # model summary statistics ``` ### Create a summary table using modelsum ```{r results="asis"} ##Note: You must use quotes when specifying family="survival" ## family=survival will not work summary(modelsum(Surv(fu.time, fu.stat) ~ arm, adjust=~age + sex, data=mockstudy, family="survival")) ##Note: the pspline term is not working yet #summary(modelsum(Surv(fu.time, fu.stat) ~ arm, # adjust=~pspline(age) + sex, data=mockstudy, family='survival')) ``` ## Poisson Poisson regression is useful when predicting an outcome variable representing counts. It can also be useful when looking at survival data. Cox models and Poisson models are very closely related and survival data can be summarized using Poisson regression. If you have overdispersion (see if the residual deviance is much larger than degrees of freedom), you may want to use `quasipoisson()` instead of `poisson()`. Some of these diagnostics need to be done outside of `modelsum`. ### Example 1: fit and summarize a Poisson regression model For the first example, use the solder dataset available in the `rpart` package. The endpoint `skips` has a definite Poisson look. ```{r poisson} require(rpart) ##just to get access to solder dataset data(solder) hist(solder$skips) fit <- glm(skips ~ Opening + Solder + Mask , data=solder, family=poisson) stats::anova(fit, test='Chi') summary(fit) ``` Overdispersion is when the Residual deviance is larger than the degrees of freedom. This can be tested, approximately using the following code. The goal is to have a p-value that is $>0.05$. ```{r} 1-pchisq(fit$deviance, fit$df.residual) ``` One possible solution is to use the quasipoisson family instead of the poisson family. This adjusts for the overdispersion. ```{r} fit2 <- glm(skips ~ Opening + Solder + Mask, data=solder, family=quasipoisson) summary(fit2) ``` ### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. ```{r} tidy(fit) # coefficients, p-values glance(fit) # model summary statistics ``` ### Create a summary table using modelsum ```{r results='asis'} summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="quasipoisson")) summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="poisson")) ``` ### Example 2: fit and summarize a Poisson regression model This second example uses the survival endpoint available in the `mockstudy` dataset. There is a close relationship between survival and Poisson models, and often it is easier to fit the model using Poisson regression, especially if you want to present absolute risk. ```{r} # add .01 to the follow-up time (.01*1 day) in order to keep everyone in the analysis fit <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, data=mockstudy, family=poisson) summary(fit) 1-pchisq(fit$deviance, fit$df.residual) coef(coxph(Surv(fu.time,fu.stat) ~ age + sex + arm, data=mockstudy)) coef(fit)[-1] # results from the Poisson model can then be described as risk ratios (similar to the hazard ratio) exp(coef(fit)[-1]) # As before, we can model the dispersion which alters the standard error fit2 <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, data=mockstudy, family=quasipoisson) summary(fit2) ``` ### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. ```{r} tidy(fit) ##coefficients, p-values glance(fit) ##model summary statistics ``` ### Create a summary table using `modelsum` Remember that the result from `modelsum` is different from the `fit` above. The `modelsum` summary shows the results for `age + offset(log(fu.time+.01))` then `sex + offset(log(fu.time+.01))` instead of `age + sex + arm + offset(log(fu.time+.01))`. ```{r results="asis", eval=TRUE} summary(modelsum(fu.stat ~ age, adjust=~offset(log(fu.time+.01))+ sex + arm, data=mockstudy, family=poisson)) ``` # Additional Examples Here are multiple examples showing how to use some of the different options. ## 1. Change summary statistics globally There are standard settings for each type of model regarding what information is summarized in the table. This behavior can be modified using the modelsum.control function. In fact, you can save your standard settings and use that for future tables. ```{r, results='asis'} mycontrols <- modelsum.control(gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"), show.adjust=FALSE, show.intercept=FALSE) tab2 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy, control=mycontrols) summary(tab2) ``` You can also change these settings directly in the modelsum call. ```{r, results='asis'} tab3 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy, gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"), show.intercept=FALSE, show.adjust=FALSE) summary(tab3) ``` ## 2. Add labels to independent variables In the above example, age is shown with a label (Age in Years), but sex is listed "as is". This is because the data was created in SAS and in the SAS dataset, age had a label but sex did not. The label is stored as an attribute within R. ```{r check-labels} ## Look at one variable's label attr(mockstudy$age,'label') ## See all the variables with a label unlist(lapply(mockstudy,'attr','label')) ## or cbind(sapply(mockstudy,attr,'label')) ``` If you want to add labels to other variables, there are a couple of options. First, you could add labels to the variables in your dataset. ```{r add-label, results='asis'} attr(mockstudy$age,'label') <- 'Age, yrs' tab1 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy) summary(tab1) ``` You can also use the built-in `data.frame` method for `labels<-`: ```{r, results = 'asis'} labels(mockstudy) <- c(age = 'Age, yrs') tab1 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy) summary(tab1) ``` Another option is to add labels after you have created the table ```{r, results='asis'} mylabels <- list(sexFemale = "Female", age ="Age, yrs") summary(tab1, labelTranslations = mylabels) ``` Alternatively, you can check the variable labels and manipulate them with a function called `labels`, which works on the `modelsum` object. ```{r, eval=TRUE} labels(tab1) labels(tab1) <- c(sexFemale="Female", age="Baseline Age (yrs)") labels(tab1) ``` ```{r, results='asis'} summary(tab1) ``` ## 3. Don't show intercept values ```{r, results='asis'} summary(modelsum(age~mdquality.s+sex, data=mockstudy), show.intercept=FALSE) ``` ## 4. Don't show results for adjustment variables ```{r, results='asis'} summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial), show.adjust=FALSE) ``` ## 5. Summarize multiple variables without typing them out Often one wants to summarize a number of variables. Instead of typing by hand each individual variable, an alternative approach is to create a formula using the `paste` command with the `collapse="+"` option. ```{r, results='asis'} # create a vector specifying the variable names myvars <- names(mockstudy) # select the 8th through the 12th # paste them together, separated by the + sign RHS <- paste(myvars[8:12], collapse="+") RHS # create a formula using the as.formula function as.formula(paste('mdquality.s ~ ', RHS)) # use the formula in the modelsum function summary(modelsum(as.formula(paste('mdquality.s ~', RHS)), family=binomial, data=mockstudy)) ``` These steps can also be done using the `formulize` function. ```{r, results='asis'} ## The formulize function does the paste and as.formula steps tmp <- formulize('mdquality.s',myvars[8:10]) tmp ## More complex formulas could also be written using formulize tmp2 <- formulize('mdquality.s',c('ps','hgb','sqrt(bmi)')) ## use the formula in the modelsum function summary(modelsum(tmp, data=mockstudy, family=binomial)) ``` ## 6. Subset the dataset used in the analysis Here are two ways to get the same result (limit the analysis to subjects age>50 and in the F: FOLFOX treatment group). * The first approach uses the subset function applied to the dataset `mockstudy`. This example also selects a subset of variables. The `modelsum` function is then applied to this subsetted data. ```{r} newdata <- subset(mockstudy, subset=age>50 & arm=='F: FOLFOX', select = c(age,sex, bmi:alk.phos)) dim(mockstudy) table(mockstudy$arm) dim(newdata) names(newdata) ``` ```{r, results='asis'} summary(modelsum(alk.phos ~ ., data=newdata)) ``` * The second approach does the same analysis but uses the subset argument within `modelsum` to subset the data. ```{r, results='asis', eval=TRUE} summary(modelsum(log(alk.phos) ~ sex + ps + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy)) summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset = age>50 & bmi<24, data=mockstudy)) summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset=1:30, data=mockstudy)) ``` ## 7. Create combinations of variables on the fly ```{r} ## create a variable combining the levels of mdquality.s and sex with(mockstudy, table(interaction(mdquality.s,sex))) ``` ```{r, results='asis'} summary(modelsum(age ~ interaction(mdquality.s,sex), data=mockstudy)) ``` ## 8. Transform variables on the fly Certain transformations need to be surrounded by `I()` so that R knows to treat it as a variable transformation and not some special model feature. If the transformation includes any of the symbols `/ - + ^ *` then surround the new variable by `I()`. ```{r, results='asis'} summary(modelsum(arm=="F: FOLFOX" ~ I(age/10) + log(bmi) + mdquality.s, data=mockstudy, family=binomial)) ``` ## 9. Change the ordering of the variables or delete a variable ```{r, results='asis'} mytab <- modelsum(bmi ~ sex + alk.phos + age, data=mockstudy) mytab2 <- mytab[c('age','sex','alk.phos')] summary(mytab2) summary(mytab[c('age','sex')]) summary(mytab[c(3,1)]) ``` ## 10. Merge two `modelsum` objects together It is possible to merge two modelsum objects so that they print out together, however you need to pay attention to the columns that are being displayed. It is sometimes easier to combine two models of the same family (such as two sets of linear models). Overlapping y-variables will have their x-variables concatenated, and (if `all=TRUE`) non-overlapping y-variables will have their tables printed separately. ```{r, results="asis"} ## demographics tab1 <- modelsum(bmi ~ sex + age, data=mockstudy) ## lab data tab2 <- modelsum(mdquality.s ~ hgb + alk.phos, data=mockstudy, family=binomial) tab12 <- merge(tab1, tab2, all = TRUE) class(tab12) summary(tab12) ``` ## 11. Add a title to the table When creating a pdf the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles appear un-numbered and above the table. ```{r, results='asis'} t1 <- modelsum(bmi ~ sex + age, data=mockstudy) summary(t1, title='Demographics') ``` ## 12. Modify how missing values are treated Depending on the report you are writing you have the following options: * Use all values available for each variable * Use only those subjects who have measurements available for all the variables ```{r} ## look at how many missing values there are for each variable apply(is.na(mockstudy),2,sum) ``` ```{r, results='asis'} ## Show how many subjects have each variable (non-missing) summary(modelsum(bmi ~ ast + age, data=mockstudy, control=modelsum.control(gaussian.stats=c("N","estimate")))) ## Always list the number of missing values summary(modelsum(bmi ~ ast + age, data=mockstudy, control=modelsum.control(gaussian.stats=c("Nmiss2","estimate")))) ## Only show the missing values if there are some (default) summary(modelsum(bmi ~ ast + age, data=mockstudy, control=modelsum.control(gaussian.stats=c("Nmiss","estimate")))) ## Don't show N at all summary(modelsum(bmi ~ ast + age, data=mockstudy, control=modelsum.control(gaussian.stats=c("estimate")))) ``` ## 13. Modify the number of digits used Within modelsum.control function there are 3 options for controlling the number of significant digits shown. * digits: controls the number of digits after the decimal point for continuous values * digits.ratio: controls the number of digits after the decimal point for continuous values * digits.p: controls the number of digits after the decimal point for continuous values ```{r, results='asis'} summary(modelsum(bmi ~ sex + age + fu.time, data=mockstudy), digits=4, digits.test=2) ``` ## 14. Use case-weights in the models Occasionally it is of interest to fit models using case weights. The `modelsum` function allows you to pass on the weights to the models and it will do the appropriate fit. ```{r} mockstudy$agegp <- cut(mockstudy$age, breaks=c(18,50,60,70,90), right=FALSE) ## create weights based on agegp and sex distribution tab1 <- with(mockstudy,table(agegp, sex)) tab1 tab2 <- with(mockstudy, table(agegp, sex, arm)) gpwts <- rep(tab1, length(unique(mockstudy$arm)))/tab2 ## apply weights to subjects index <- with(mockstudy, cbind(as.numeric(agegp), as.numeric(sex), as.numeric(as.factor(arm)))) mockstudy$wts <- gpwts[index] ## show weights by treatment arm group tapply(mockstudy$wts,mockstudy$arm, summary) ``` ```{r results='asis'} mockstudy$newvarA <- as.numeric(mockstudy$arm=='A: IFL') tab1 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), family=binomial) summary(tab1, title='No Case Weights used') suppressWarnings({ tab2 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), weights=wts, family=binomial) summary(tab2, title='Case Weights used') }) ``` ## 15. Use `modelsum` within an Sweave document For those users who wish to create tables within an Sweave document, the following code seems to work. ``` \documentclass{article} \usepackage{longtable} \usepackage{pdfpages} \begin{document} \section{Read in Data} <>= require(arsenal) require(knitr) require(rmarkdown) data(mockstudy) tab1 <- modelsum(bmi~sex+age, data=mockstudy) @ \section{Convert Summary.modelsum to LaTeX} <>= capture.output(summary(tab1), file="Test.md") ## Convert R Markdown Table to LaTeX render("Test.md", pdf_document(keep_tex=TRUE)) @ \includepdf{Test.pdf} \end{document} ``` ## 16. Export `modelsum` results to a .CSV file When looking at multiple variables it is sometimes useful to export the results to a csv file. The `as.data.frame` function creates a data frame object that can be exported or further manipulated within R. ```{r} summary(tab2, text=T) tmp <- as.data.frame(summary(tab2, text = TRUE)) tmp # write.csv(tmp, '/my/path/here/mymodel.csv') ``` ## 17. Write `modelsum` object to a separate Word or HTML file ```{r eval = FALSE} ## write to an HTML document write2html(tab2, "~/ibm/trash.html") ## write to a Word document write2word(tab2, "~/ibm/trash.doc", title="My table in Word") ``` ## 18. Use `modelsum` in R Shiny The easiest way to output a `modelsum()` object in an R Shiny app is to use the `tableOutput()` UI in combination with the `renderTable()` server function and `as.data.frame(summary(modelsum()))`: ```{r eval=FALSE} # A standalone shiny app library(shiny) library(arsenal) data(mockstudy) shinyApp( ui = fluidPage(tableOutput("table")), server = function(input, output) { output$table <- renderTable({ as.data.frame(summary(modelsum(age ~ sex, data = mockstudy), text = "html")) }, sanitize.text.function = function(x) x) } ) ``` This can be especially powerful if you feed the selections from a `selectInput(multiple = TRUE)` into `formulize()` to make the table dynamic! ## 23. Use `modelsum` in bookdown Since the backbone of `modelsum()` is `knitr::kable()`, tables still render well in bookdown. However, `print.summary.modelsum()` doesn't use the `caption=` argument of `kable()`, so some tables may not have a properly numbered caption. To fix this, use the method described [on the bookdown site](https://bookdown.org/yihui/bookdown/tables.html) to give the table a tag/ID. ```{r eval=FALSE} summary(modelsum(age ~ sex, data = mockstudy), title="(\\#tab:mytableby) Caption here") ``` ## 24. Model multiple endpoints You can now use `list()` on the left-hand side of `modelsum()` to give multiple endpoints. Note that only one "family" can be specified this way (use `merge()` instead if you want multiple families). ```{r results='asis'} summary(modelsum(list(age, hgb) ~ bmi + sex, adjust = ~ arm, data = mockstudy)) ``` To avoid confusion about which table is which endpoint, you can set `term.name=TRUE` in `summary()`. This takes the labels for each endpoint and puts them in the top-left of the table. ```{r results='asis'} summary(modelsum(list(age, hgb) ~ bmi + sex, adjust = ~ arm, data = mockstudy), term.name = TRUE) ``` ## 25. Model data by a non-test group (strata) You can also specify a grouping variable that doesn't get tested (but instead separates results): a *strata* variable. ```{r results='asis'} summary(modelsum(list(age, hgb) ~ bmi + sex, strata = arm, data = mockstudy)) ``` ## 26. Add multiple sets of adjustors to the model By putting multiple formulas into a list, you can use multiple sets of adjustors. Use `~ 1` or `NULL` for an "unadjusted" model. By using the `adjustment.names=TRUE` argument and giving names to your adjustor sets in the list, you can name the various analyses. ```{r} adj.list <- list( Unadjusted = ~ 1, # can also specify NULL here "Adjusted for Arm" = ~ arm ) multi.adjust <- modelsum(list(age, bmi) ~ fu.time + ast, adjust = adj.list, data = mockstudy) summary(multi.adjust, adjustment.names = TRUE) summary(multi.adjust, adjustment.names = TRUE, show.intercept = FALSE, show.adjust = FALSE) ``` # Available Function Options ## Summary statistics The available summary statistics, by varible type, are: * `ordinal`: Ordinal logistic regression models + default: `Nmiss, OR, CI.lower.OR, CI.upper.OR, p.value` + optional: `estimate, CI.OR, CI.estimate, CI.lower.estimate, CI.upper.estimate,` `N, Nmiss2, endpoint, std.error, statistic, logLik, AIC, BIC, edf, deviance, df.residual, p.value.lrt` * `binomial`,`quasibinomial`: Logistic regression models + default: `OR, CI.lower.OR, CI.upper.OR, p.value, concordance, Nmiss` + optional: `estimate, CI.OR, CI.estimate, CI.lower.estimate, CI.upper.estimate,` `CI.wald, CI.lower.wald, CI.upper.wald, CI.OR.wald, CI.lower.OR.wald, CI.upper.OR.wald,` `N, Nmiss2, Nevents, endpoint, std.error, statistic, logLik, AIC, BIC, null.deviance, deviance, df.residual, df.null, p.value.lrt` * `gaussian`: Linear regression models + default: `estimate, std.error, p.value, adj.r.squared, Nmiss` + optional: `CI.estimate, CI.lower.estimate, CI.upper.estimate, N, Nmiss2, statistic,` `standard.estimate, endpoint, r.squared, AIC, BIC, logLik, statistic.F, p.value.F, p.value.lrt` * `poisson`, `quasipoisson`: Poisson regression models + default: `RR, CI.lower.RR, CI.upper.RR, p.value, Nmiss` + optional: `CI.RR, CI.estimate, CI.lower.estimate, CI.upper.estimate, CI.RR, Nmiss2, std.error,` `estimate, statistic, endpoint, AIC, BIC, logLik, dispersion, null.deviance, deviance, df.residual, df.null, p.value.lrt` * `negbin`: Negative binomial regression models + default: `RR, CI.lower.RR, CI.upper.RR, p.value, Nmiss` + optional: `CI.RR, CI.estimate, CI.lower.estimate, CI.upper.estimate, CI.RR, Nmiss2, std.error, estimate,` `statistic, endpoint, AIC, BIC, logLik, dispersion, null.deviance, deviance, df.residual, df.null, theta, SE.theta, p.value.lrt` * `clog`: Conditional Logistic models + default: `OR, CI.lower.OR, CI.upper.OR, p.value, concordance, Nmiss` + optional: `CI.OR, CI.estimate, CI.lower.estimate, CI.upper.estimate, N, Nmiss2, estimate, std.error, endpoint, Nevents, statistic,` `r.squared, r.squared.max, logLik, AIC, BIC, statistic.log, p.value.log, statistic.sc, p.value.sc,` `statistic.wald, p.value.wald, N, std.error.concordance, p.value.lrt` * `survival`: Cox models + default: `HR, CI.lower.HR, CI.upper.HR, p.value, concordance, Nmiss` + optional: `CI.HR, CI.estimate, CI.lower.estimate, CI.upper.estimate, N, Nmiss2, estimate, std.error, endpoint,` `Nevents, statistic, r.squared, r.squared.max, logLik, AIC, BIC, statistic.log, p.value.log, statistic.sc, p.value.sc,` `statistic.wald, p.value.wald, N, std.error.concordance, p.value.lrt` The full description of these parameters that can be shown for models include: * `N`: a count of the number of observations used in the analysis * `Nmiss`: only show the count of the number of missing values if there are some missing values * `Nmiss2`: always show a count of the number of missing values for a model * `endpoint`: dependent variable used in the model * `std.err`: print the standard error * `statistic`: test statistic * `statistic.F`: test statistic (F test) * `p.value`: print the p-value * `p.value.lrt`: print the likelihood ratio p-value for *the main effect only* (not the adjustors) * `r.squared`: print the model R-square * `adj.r.squared`: print the model adjusted R-square * `r.squared.max`: print the model R-square * `concordance`: print the model C statistic (which is the AUC for logistic models) * `logLik`: print the loglikelihood value * `p.value.log`: print the p-value for the overall model likelihood test * `p.value.wald`: print the p-value for the overall model wald test * `p.value.sc`: print the p-value for overall model score test * `AIC`: print the Akaike information criterion * `BIC`: print the Bayesian information criterion * `null.deviance`: null deviance * `deviance`: model deviance * `df.residual`: degrees of freedom for the residual * `df.null`: degrees of freedom for the null model * `dispersion`: This is used in Poisson models and is defined as the deviance/df.residual * `statistic.sc`: overall model score statistic * `statistic.wald`: overall model score statistic * `statistic.log`: overall model score statistic * `std.error.concordance`: standard error for the C statistic * `HR`: print the hazard ratio (for survival models), i.e. exp(beta) * `CI.lower.HR, CI.upper.HR`: print the confidence interval for the HR * `OR`: print the odd's ratio (for logistic models), i.e. exp(beta) * `CI.lower.OR, CI.upper.OR`: print the confidence interval for the OR * `CI.lower.OR.wald, CI.upper.OR.wald`: print the Wald confidence interval for the OR * `RR`: print the risk ratio (for poisson models), i.e. exp(beta) * `CI.lower.RR, CI.upper.RR`: print the confidence interval for the RR * `estimate`: print beta coefficient * `standardized.estimate`: print the standardized beta coefficient * `CI.lower.estimate, CI.upper.estimate`: print the confidence interval for the beta coefficient * `CI.lower.wald, CI.upper.wald`: print the Wald confidence interval for the beta coefficient * `edf`: print the effective degrees of freedom. * `theta`: print the estimate of theta. * `SE.theta`: print the estimate of theta's standard error. ## `modelsum.control` settings A quick way to see what arguments are possible to utilize in a function is to use the `args()` command. Settings involving the number of digits can be set in `modelsum.control` or in `summary.modelsum`. ```{r} args(modelsum.control) ``` ## `summary.modelsum` settings The summary.modelsum function has options that modify how the table appears (such as adding a title or modifying labels). ```{r} args(arsenal:::summary.modelsum) ``` arsenal/vignettes/comparedf.Rmd0000644000176200001440000003260413656527336016340 0ustar liggesusers--- title: "The comparedf function" author: "Ethan Heinzen, Ryan Lennon, Andrew Hanson" output: rmarkdown::html_vignette: toc: yes toc_depth: 3 vignette: | %\VignetteIndexEntry{The comparedf function} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r include = FALSE} knitr::opts_chunk$set(eval = TRUE, message = FALSE, results = 'asis', comment='') options(width = 120) ``` # Introduction The `comparedf()` function can be used to determine and report differences between two data.frames. It was written in the spirit of replacing `PROC COMPARE` from SAS. ```{r results = 'asis'} library(arsenal) ``` Why "comparedf"? We originally called this function `compare.data.frame()`, using `testthat::compare()` as our S3 generic, but that ended up getting us in trouble because of conflicting object structures. Why this didn't occur to us at the time remains a mystery. To replace it, we brainstormed several ideas (`comparedf()`, `dfcompare()`, `collate()`, `comparison()`) but settled on the former for three reasons: 1. There were no other objects with that generic or class (see `testthat::compare()` and `compare::compare()`). 2. It is mnemonically easy to remember (we "compare data.frames", not "data.frames compare"). 3. It tab auto-completes from the original "compare". # Basic examples We first build two similar data.frames to compare. ```{r} df1 <- data.frame(id = paste0("person", 1:3), a = c("a", "b", "c"), b = c(1, 3, 4), c = c("f", "e", "d"), row.names = paste0("rn", 1:3), stringsAsFactors = FALSE) df2 <- data.frame(id = paste0("person", 3:1), a = c("c", "b", "a"), b = c(1, 3, 4), d = paste0("rn", 1:3), row.names = paste0("rn", c(1,3,2)), stringsAsFactors = FALSE) ``` To compare these datasets, simply pass them to the `comparedf()` function: ```{r results='markup'} comparedf(df1, df2) ``` Use `summary()` to get a more detailed summary ```{r} summary(comparedf(df1, df2)) ``` By default, the datasets are compared row-by-row. To change this, use the `by=` or `by.x=` and `by.y=` arguments: ```{r} summary(comparedf(df1, df2, by = "id")) ``` # A larger example Let's muck up the `mockstudy` data. ```{r} data(mockstudy) mockstudy2 <- muck_up_mockstudy() ``` We've changed row order, so let's compare by the case ID: ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case")) ``` # Column name comparison options It is possible to change which column names are considered "the same variable". ## Ignoring case For example, to ignore case in variable names (so that `Arm` and `arm` are considered the same), pass `tol.vars = "case"`. You can do this using `comparedf.control()` ```{r eval = FALSE} summary(comparedf(mockstudy, mockstudy2, by = "case", control = comparedf.control(tol.vars = "case"))) ``` or pass it through the `...` arguments. ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = "case")) ``` ## Treating dots and underscores the same (equivalence classes) It is possible to treat certain characters or sets of characters as the same by passing a character vector of equivalence classes to the `tol.vars=` argument. In short, each string in the vector is split into single characters, and the resulting set of characters is replaced by the first character in the string. For example, passing `c("._")` would replace all underscores with dots in the column names of both datasets. Similarly, passing `c("aA", "BbCc")` would replace all instances of `"A"` with `"a"` and all instances of `"b"`, `"C"`, or `"c"` with `"B"`. This is one way to ignore case for certain letters. Otherwise, it's possible to combine the equivalence classes with ignoring case, by passing (e.g.) `c("._", "case")`. Passing a single character as an element this vector will replace that character with the empty string. For example, passing c(" ", ".") would remove all spaces and dots from the column names. For mockstudy, let's treat dots, underscores, and spaces as the same, and ignore case: ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case") # dots=underscores=spaces, ignore case )) ``` ## Manually specifying columns to match together If you pass a named vector to the `tol.vars=` argument, `comparedf()` will line up the names of that vector to the column names of `x` and the values of that vector to the column names of `y`. In this way, you can manually specify which non-identically-named columns to compare. For mockstudy, let's specify our variables manually in this way: ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c(arm = "Arm", fu.stat = "fu stat", fu.time = "fu_time") )) ``` # Column comparison options ## Logical tolerance Use the `tol.logical=` argument to change how logicals are compared. By default, they're expected to be equal to each other. ## Numeric tolerance To allow numeric differences of a certain tolerance, use the `tol.num=` and `tol.num.val=` options. `tol.num.val=` determines the maximum (unsigned) difference tolerated if `tol.num="absolute"` (default), and determines the maximum (unsigned) percent difference tolerated if `tol.num="percent"`. Also note the option `int.as.num=`, which determines whether integers and numerics should be compared despite their class difference. If `TRUE`, the integers are coerced to numeric. Note that `mockstudy$ast` is integer, while `mockstudy2$ast` is numeric: ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE # compare integers and numerics )) ``` Suppose a tolerance of up to 10 is allowed for `ast`: ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10 # allow absolute differences <= 10 )) ``` ## Factor tolerance By default, factors are compared to each other based on both the labels and the underlying numeric levels. Set `tol.factor="levels"` to match only the numeric levels, or set `tol.factor="labels"` to match only the labels. ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10, # allow absolute differences <= 10 tol.factor = "labels" # match only factor labels )) ``` Also note the option `factor.as.char=`, which determines whether factors and characters should be compared despite their class difference. If `TRUE`, the factors are coerced to characters. Note that `mockstudy$race` is a character, while `mockstudy2$race` is a factor: ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10, # allow absolute differences <= 10 tol.factor = "labels", # match only factor labels factor.as.char = TRUE # compare factors and characters )) ``` ## Character tolerance Use the `tol.char=` argument to change how character variables are compared. By default, they are compared as-is, but they can be compared after ignoring case or trimming whitespace or both. ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10, # allow absolute differences <= 10 tol.factor = "labels", # match only factor labels factor.as.char = TRUE, # compare factors and characters tol.char = "case" # ignore case in character vectors )) ``` ## Date tolerance Use the `tol.date=` argument to change how dates are compared. By default, they're expected to be equal to each other. ## Other data type tolerances Use the `tol.other=` argument to change how other objects are compared. By default, they're expected to be `identical()`. ## Specifying tolerances for each variable You can also provide a list of tolerance functions to `comparedf()`: ```{r eval=FALSE} comparedf.control(tol.char = list( "none", # the default x1 = "case", # be case-insensitive for the variable "x1" x2 = function(x, y) tol.NA(x, y, x != y | y == "NA") # a custom-defined tolerance )) ``` ## User-defined tolerance functions ### Details The `comparedf.control()` function accepts functions for any of the tolerance arguments in addition to the short-hand character strings. This allows the user to create custom tolerance functions to suit his/her needs. Any custom tolerance function must accept two vectors as arguments and return a logical vector of the same length. The `TRUE`s in the results should correspond to elements which are deemed "different". Note that the numeric and date tolerance functions should also include a third argument for tolerance size (even if it's not used). CAUTION: the results should not include NAs, since the logical vector is used to subset the input data.frames. The `tol.NA()` function is useful for considering any NAs in the two vectors (but not both) as differences, in addition to other criteria. The `tol.NA()` function is used in all default tolerance functions to help handle NAs. ### Example 1 Suppose we want to ignore any dates which are later in the second dataset than the first. We define a custom tolerance function. ```{r results = 'markup'} my.tol <- function(x, y, tol) { tol.NA(x, y, x > y) } date.df1 <- data.frame(dt = as.Date(c("2017-09-07", "2017-08-08", "2017-07-09", NA))) date.df2 <- data.frame(dt = as.Date(c("2017-10-01", "2017-08-08", "2017-07-10", "2017-01-01"))) n.diffs(comparedf(date.df1, date.df2)) # default finds any differences n.diffs(comparedf(date.df1, date.df2, tol.date = my.tol)) # our function identifies only the NA as different... n.diffs(comparedf(date.df2, date.df1, tol.date = my.tol)) # ... until we change the argument order ``` ### Example 2 (Continuing our mockstudy example) Suppose we're okay with NAs getting replaced by -9. ```{r} tol.minus9 <- function(x, y, tol) { idx1 <- is.na(x) & !is.na(y) & y == -9 idx2 <- tol.num.absolute(x, y, tol) # find other absolute differences return(!idx1 & idx2) } summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10, # allow absolute differences <= 10 tol.factor = "labels", # match only factor labels factor.as.char = TRUE, # compare factors and characters tol.char = "case", # ignore case in character vectors tol.num = tol.minus9 # ignore NA -> -9 changes )) ``` # Extract Differences Differences can be easily extracted using the `diffs()` function. If you only want to determine how many differences were found, use the `n.diffs()` function. ```{r results = 'markup'} cmp <- comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), int.as.num = TRUE) n.diffs(cmp) head(diffs(cmp)) ``` Differences can also be summarized by variable. ```{r results = 'markup'} diffs(cmp, by.var = TRUE) ``` To report differences from only a few variables, one can pass a list of variable names to `diffs()`. ```{r results = 'markup'} diffs(cmp, vars = c("ps", "ast"), by.var = TRUE) diffs(cmp, vars = c("ps", "ast")) ``` # Appendix ## Stucture of the Object (This section is just as much for my use as for yours!) ```{r} obj <- comparedf(mockstudy, mockstudy2, by = "case") ``` There are two main objects in the `"comparedf"` object, each with its own print method. The `frame.summary` contains: - the substituted-deparsed arguments - information about the number of columns and rows in each dataset - the by-variables for each dataset (which may not be the same) - the attributes for each dataset (which get counted in the print method) - a data.frame of by-variables and row numbers of observations not shared between datasets - the number of shared observations ```{r results='markup'} print(obj$frame.summary) ``` The `vars.summary` contains: - variable name, column number, and class vector (with possibly more than one element) for each x and y. These are all `NA` if there isn't a match in both datasets. - values, a list-column of the text string `"by-variable"` for the by-variables, `NULL` for columns that aren't compared, or a data.frame containing: - The by-variables for differences found - The values which are different for x and y - The row numbers for differences found - attrs, a list-column of `NULL` if there are no attributes, or a data.frame containing: - The name of the attributes - The attributes for x and y, set to `NA` if non-existant - The actual attributes (if `show.attr=TRUE`). ```{r results='markup'} print(obj$vars.summary) ``` arsenal/vignettes/write2.Rmd0000644000176200001440000003121313741603743015577 0ustar liggesusers--- title: "The write2 function" author: "Ethan Heinzen" output: rmarkdown::html_vignette: toc: yes toc_depth: 3 vignette: | %\VignetteIndexEntry{The write2 function} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r include = FALSE} knitr::opts_chunk$set(eval = FALSE, message = FALSE) ``` # Introduction The `write2*()` functions were designed as an alternative to SAS's `ODS` procedure for useRs who want to save R Markdown tables to separate Word, HTML, or PDF files without needing separate R Markdown programs. There are three shortcut functions for the most common output types: HTML, PDF, and Word. Each of these three functions calls `write2()`, an S3 function which accepts many file output types (see the help pages for `rmarkdown::render()`). Methods have been implemented for `tableby()`, `modelsum()`, and `freqlist()`, but also `knitr::kable()`, `xtable::xtable()`, and `pander::pander_return()`. The two most important things to recognize with `write2()` are the following: 1. Which function is being used to output the object. Sometimes the `write2` functions use `summary()`, while other times they will use `print()`. The details for each object specifically are described below. 2. How the `...` arguments are passed. To change the options for the summary-like or print-like function, you can pass named arguments which will in turn get passed to the appropriate function. Details for each object specifically are described below. # A note on piping `arsenal` is piping-compatible! The `write2*()` functions are probably the most useful place to take advantage of the `magrittr` package's piping framework, since commands are often nested several functions deep in the context of `write2*()`. Piping also allows the `arsenal` package to become a part of more standard analysis pipelines; instead of needing to write separate R Markdown programs, intermediate analysis tables and output can be easily incorporated into piped statements. This vignette will sprinkle the foward pipe (`%>%`) throughout as a hint at the power and flexibility of `arsenal` and piping. # Examples Using `arsenal` Objects ```{r} library(arsenal) library(magrittr) data(mockstudy) tmpdir <- tempdir() ``` ## `tableby` For `tableby` objects, the output function in `write2()` is `summary()`. For `summary.tableby` objects, the output function is `print()`. For available arguments, see the help pages for `summary.tableby()`. Don't use the option `text = TRUE` with the `write2` functions. ```{r} mylabels <- list(sex = "SEX", age ="Age, yrs") tab1 <- tableby(arm ~ sex + age, data=mockstudy) write2html( tab1, paste0(tmpdir, "/test.tableby.html"), quiet = TRUE, title = "My test table", # passed to summary.tableby labelTranslations = mylabels, # passed to summary.tableby total = FALSE # passed to summary.tableby ) ``` ## `modelsum` For `modelsum` objects, the output function in `write2()` is `summary()`. For `summary.modelsum` objects, the output function is `print()`. For available arguments, see the help pages for `summary.modelsum()`. Don't use the option `text = TRUE` with the `write2` functions. ```{r} tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~ age + sex, family = "gaussian", data = mockstudy) write2pdf( tab2, paste0(tmpdir, "/test.modelsum.pdf"), quiet = TRUE, title = "My test table", # passed to summary.modelsum show.intercept = FALSE, # passed to summary.modelsum digits = 5 # passed to summary.modelsum ) ``` ## `freqlist` For `freqlist` objects, the output function in `write2()` is `summary()`. For `summary.freqlist` objects, the output function is `print()`. For available arguments, see the help pages for `summary.freqlist()`. ```{r} mockstudy[, c("arm", "sex", "mdquality.s")] %>% table(useNA = "ifany") %>% freqlist(groupBy = c("arm", "sex")) %>% write2word( paste0(tmpdir, "/test.freqlist.doc"), quiet = TRUE, single = FALSE, # passed to summary.freqlist title = "My cool title" # passed to summary.freqlist ) ``` ## `comparedf` For `comparedf` objects, the output function in `write2()` is `summary()`. For `summary.comparedf` objects, the output function is `print()`. # Examples Using Other Objects ## `knitr::kable()` For objects resulting from a call to `kable()`, the output function in `write2()` is `print()`. There aren't any arguments to the `print.knitr_kable()` function. ```{r} mockstudy %>% head() %>% knitr::kable() %>% write2html(paste0(tmpdir, "/test.kable.html"), quiet = TRUE) ``` ## `xtable::xtable()` For `xtable` objects, the output function in `write2()` is `print()`. For available arguments, see the help pages for `print.xtable()`. ```{r} mockstudy %>% head() %>% xtable::xtable(caption = "My xtable") %>% write2pdf( paste0(tmpdir, "/test.xtable.pdf"), quiet = TRUE, comment = FALSE, # passed to print.xtable to turn off the default message about xtable version include.rownames = FALSE, # passed to print.xtable caption.placement = "top" # passed to print.xtable ) ``` To make an HTML document, use the `print.xtable()` option `type = "html"`. ```{r} mockstudy %>% head() %>% xtable::xtable(caption = "My xtable") %>% write2html( paste0(tmpdir, "/test.xtable.html"), quiet = TRUE, type = "html", # passed to print.xtable comment = FALSE, # passed to print.xtable to turn off the default message about xtable version include.rownames = FALSE, # passed to print.xtable caption.placement = "top" # passed to print.xtable ) ``` User beware! `xtable()` is not compatible with `write2word()`. ## `pander::pander_return()` Pander is a little bit more tricky. Since `pander::pander()` doesn't return an object, the useR should instead use `pander::pander_return()`. For this (and for all character vectors), the the output function in `write2()` is `cat(sep = '\n')`. ```{r} write2word(pander::pander_return(head(mockstudy)), file = paste0(tmpdir, "/test.pander.doc"), quiet = TRUE) ``` # Output Multiple Tables to One Document To output multiple tables into a document, simply make a list of them and call the same function as before. ```{r} mylist <- list( tableby(sex ~ age, data = mockstudy), freqlist(table(mockstudy[, c("sex", "arm")])), knitr::kable(head(mockstudy)) ) write2pdf(mylist, paste0(tmpdir, "/test.mylist.pdf"), quiet = TRUE) ``` One neat side-effect of this function is that you can output text and headers, etc. The possibilities are endless! ```{r} mylist2 <- list( "# Header 1", "This is a small paragraph introducing tableby.", tableby(sex ~ age, data = mockstudy), "
    ", "# Header 2", "I can change color of my text!" ) write2html(mylist2, paste0(tmpdir, "/test.mylist2.html"), quiet = TRUE) ``` In fact, you can even recurse on the lists! ```{r} write2pdf(list(mylist2, mylist), paste0(tmpdir, "/test.mylists.pdf"), quiet = TRUE) ``` # Output Other Objects Monospaced (as if in a terminal) It may be useful at times to write output that would normally be copied from the terminal. The default method for `write2()` does this automatically. To output the results of `summary.lm()`, for example: ```{r} lm(age ~ sex, data = mockstudy) %>% summary() %>% write2pdf(paste0(tmpdir, "/test.lm.pdf"), quiet = TRUE) ``` The `verbatim()` function is another option to explicitly alert `write2()` to do this. This becomes particularly helpful to overrule existing S3 methods. For example, suppose you wanted to just print a tableby object (as if it were to print in the terminal): ```{r} tab4 <- tableby(arm ~ sex + age, data=mockstudy) write2html(verbatim(tab4), paste0(tmpdir, "/test.print.tableby.html"), quiet = TRUE) ``` Or suppose you wanted to print a character vector (as if it were to print in the terminal): ```{r} chr <- paste0("MyVector", 1:10) write2pdf(verbatim(chr), paste0(tmpdir, "/test.character.pdf"), quiet = TRUE) ``` Note that you can combine multiple objects in one call: ```{r} write2pdf(verbatim(tab4, chr), paste0(tmpdir, "/test.verbatim.pdf"), quiet = TRUE) ``` # Add a YAML Header to the Output You can add a YAML header to `write2()` output using the `yaml()` function. ```{r} mylist3 <- list( yaml(title = "Test YAML Title", author = "My cool author name"), "# Header 1", "This is a small paragraph introducing tableby.", tableby(sex ~ age, data = mockstudy) ) write2html(mylist3, paste0(tmpdir, "/test.yaml.html"), quiet = TRUE) ``` In fact, all detected YAML pieces will be moved as the first output, so that the above code chunk gives the same output as this one: ```{r} mylist4 <- list( "# Header 1", "This is a small paragraph introducing tableby.", yaml(title = "Test YAML Title"), tableby(sex ~ age, data = mockstudy), yaml(author = "My cool author name") ) write2html(mylist4, paste0(tmpdir, "/test.yaml2.html"), quiet = TRUE) ``` # Add a Code Chunk to the Output It is now possible to add code chunks to the output `.Rmd`: ```{r} mylist5 <- list( "# What is 1 + 2?", code.chunk(a <- 1, b <- 2), code.chunk(a + b, chunk.opts = "r echo=FALSE, eval=TRUE") ) write2html(mylist5, paste0(tmpdir, "/test.code.chunk.html"), quiet = TRUE) ``` This allow flexibility to create objects on-the-fly, to read in saved objects to the temporary `.Rmd`, etc. The possibilities are endless! # FAQs ## How do I suppress the note about my document getting rendered? This is easily accomplished by using the argument `quiet = TRUE` (passed to the `rmarkdown::render()` function). ```{r} write2html( knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.quiet.html"), quiet = TRUE # passed to rmarkdown::render ) ``` ## How do I look at the temporary `.Rmd` file? This is easily accomplished by using the option `keep.rmd = TRUE`. ```{r} write2html( knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.keep.rmd.html"), quiet = TRUE, # passed to rmarkdown::render keep.rmd = TRUE ) ``` ## How do I prevent my document from being rendered? This is easily accomplished by using the option `render. = FALSE`. Note that this will then default to `keep.rmd = TRUE`. ```{r} write2html( knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.dont.render.html"), render. = FALSE ) ``` ## How do I output headers, raw HTML/LaTeX, paragraphs, etc.? One can simply abuse the list S3 method for `write2()`! ```{r} mylist2 <- list( "# Header 1", "This is a small paragraph introducing tableby.", tableby(sex ~ age, data = mockstudy), "
    ", "# Header 2", "I can change color of my text!" ) write2html(mylist2, paste0(tmpdir, "/test.mylist2.html"), quiet = TRUE) ``` ## How do I tweak the default format from `write2word()`, `write2html()`, or `write2pdf()`? You can pass arguments to the format functions used behind the scenes. ```{r} write2html( knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.theme.html"), quiet = TRUE, # passed to rmarkdown::render theme = "yeti" # passed to rmarkdown::html_document ) ``` See the help pages for `rmarkdown::word_document()`, `rmarkdown::html_document()`, and `rmarkdown::pdf_document()`. ## How do I output to a file format other than word, HTML, and PDF? This can be done using the generic `write2()` function. The last argument in the function can be another format specification. For details on the acceptable inputs, see the help page for `write2()`. ```{r} write2( knitr::kable(head(mockstudy[, 1:4])), paste0(tmpdir, "/test.kable.rtf"), quiet = TRUE, # passed to rmarkdown::render output_format = rmarkdown::rtf_document ) ``` ## How do I avoid prefixes on my table captions in PDF? You can do this pretty easily with the `yaml()` function: ```{r} mylist5 <- list( yaml("header-includes" = list("\\usepackage[labelformat=empty]{caption}")), "# Header 1", "This is a small paragraph introducing tableby.", tableby(sex ~ age, data = mockstudy) ) write2pdf(mylist5, paste0(tmpdir, "/test.noprefixes.pdf"), title = "My tableby") ``` ## How do I output multiple tables with different titles? There are now `write2()` methods for the summary objects of `arsenal` functions. This allows you to specify a title for each table: ```{r} mylist6 <- list( summary(tableby(sex ~ age, data = mockstudy), title = "A Title for tableby"), summary(modelsum(age ~ sex, data = mockstudy), title = "A Title for modelsum"), summary(freqlist(~ sex, data = mockstudy), title = "A Title for freqlist") ) write2pdf(mylist6, paste0(tmpdir, "/test.multiple.titles.pdf")) ``` ## Why is `write2()` not working in R Markdown/R Studio? It's possible that a global option in R Studio is preventing the tables from rendering. Consider turning off (i.e., unchecking) the option Tools > Global Options > R Markdown > Show output inline for all R Markdown documents. arsenal/R/0000755000176200001440000000000014053010771012076 5ustar liggesusersarsenal/R/arsenal.R0000644000176200001440000000530013675162464013664 0ustar liggesusers## Created: 12/13/2016 ## Author: Ethan Heinzen #' An Arsenal of 'R' Functions for Large-Scale Statistical Summaries #' #' An Arsenal of 'R' functions for large-scale statistical summaries, #' which are streamlined to work within the latest reporting tools in 'R' and 'RStudio' and #' which use formulas and versatile summary statistics for summary tables and models. #' #' The package download, NEWS, and README are available on CRAN: \url{https://cran.r-project.org/package=arsenal} #' #' @section Functions: #' #' Below are listed some of the most widely used functions available in \code{arsenal}: #' #' \code{\link{tableby}}: Summary statistics of a set of independent variables by a categorical variable. #' #' \code{\link{paired}}: Summary statistics of a set of independent variables paired across two timepoints. #' #' \code{\link{modelsum}}: Fit models over each of a set of independent variables with a response variable. #' #' \code{\link{freqlist}}: Approximate the output from SAS's \code{PROC FREQ} procedure when using #' the \code{/list} option of the \code{TABLE} statement. #' #' \code{\link{comparedf}}: Compare two data.frames and report any differences between them, #' much like SAS's \code{PROC COMPARE} procedure. #' #' \code{\link{write2word}}, \code{\link{write2html}}, \code{\link{write2pdf}}: Functions to output #' tables to a single Word, HTML, or PDF document. #' #' \code{\link{write2}}: Functions to output tables to a single document. #' (Also the S3 backbone behind the \code{write2*} functions.) #' #' \code{\link{keep.labels}}: Keep the \code{'label'} attribute on an R object when subsetting. #' #' \code{\link{formulize}}: A shortcut to generate one-, two-, or many-sided formulas. #' #' \code{\link{mdy.Date}} and \code{\link{Date.mdy}}: Convert numeric dates for month, day, and year to Date object, and vice versa. #' #' \code{\link{is.Date}}: Test if an object is a date. #' #' \code{\link{\%nin\%}}: Test for "not in". #' #' \code{\link{allNA}} and \code{\link{includeNA}}: some useful functions for dealing with NAs. #' #' @section Data: #' #' \code{\link{mockstudy}}: Mock study data for examples. #' #' @examples #' library(arsenal) #' #' @docType package #' @name arsenal #' @importFrom utils head tail #' NULL #### commands to build the package using devtools # devtools::check_man() # devtools::test() # devtools::check() # withr::with_libpaths(c("../testinstalls/", .libPaths()), devtools::install(build_vignettes = TRUE, dependencies = FALSE)) # devtools::build("../rpkg-arsenal/") ## < restart R > ## library(arsenal, lib.loc = "../testinstalls/") #### to upload to CRAN ## Update DESCRIPTION, README.md, NEWS.md, and cran-comments.md # devtools::revdep_check() # devtools::release() arsenal/R/summary.modelsum.R0000644000176200001440000001477513656527335015601 0ustar liggesusers## Purpose: summary method for modelsum object ## Author: Greg Dougherty, Jason Sinnwell and Beth Atkinson ## Updated: 9/29/2015 #' Summarize a \code{modelsum} object. #' #' Format the information in \code{object} as a table using Pandoc coding or plain text, and cat it to stdout. #' #' @param object A \code{\link{modelsum}} object. #' @param ... For \code{summary.modelsum}, other arguments passed to \code{\link{as.data.frame.modelsum}}. #' For \code{as.data.frame.summary.modelsum}, "width" and "min.split" are passed to \code{\link{smart.split}}. #' For \code{print}ing the summary object, these are passed to both \code{as.data.frame.summary.modelsum} and #' \code{\link[knitr]{kable}}. #' @inheritParams summary.tableby #' @param x An object of class \code{"summary.modelsum"}. #' @param adjustment.names Logical, denoting whether the names of the adjustment models should be printed. #' @seealso \code{\link{modelsum}}, \code{\link{as.data.frame.modelsum}} #' @return An object of class \code{"summary.modelsum"} #' @author Ethan Heinzen, based on code originally by Greg Dougherty #' @name summary.modelsum NULL #> NULL #' @rdname summary.modelsum #' @export summary.modelsum <- function(object, ..., labelTranslations = NULL, text = FALSE, title = NULL, term.name = "", adjustment.names = FALSE) { dat <- as.data.frame(object, ..., labelTranslations = labelTranslations, list.ok = TRUE) structure(list( object = set_attr(dat, "control", NULL), control = attr(dat, "control"), hasStrata = has_strata(object), text = text, title = title, term.name = term.name, adjustment.names = adjustment.names ), class = c("summary.modelsum", "summary.arsenal_table")) } as_data_frame_summary_modelsum <- function(df, control, hasStrata, term.name, text, adjustment.names, width, min.split) { df.orig <- df #### format the digits and nsmall things #### # integers, one-per-model use.digits0 <- c("Nmiss", "N", "Nmiss2", "Nevents", "df.residual", "df.null", "edf") # non-integers, one-per-model use.digits1 <- c("logLik", "AIC", "BIC", "null.deviance", "deviance", "statistic.F", "dispersion", "statistic.sc", "concordance", "std.error.concordance", "adj.r.squared", "r.squared", "theta", "SE.theta") # non-integers, many-per-model use.digits2 <- c("estimate", "CI.lower.estimate", "CI.upper.estimate", "std.error", "statistic", "standard.estimate") use.digits.ratio <- c("OR", "CI.lower.OR", "CI.upper.OR", "RR", "CI.lower.RR", "CI.upper.RR", "HR", "CI.lower.HR", "CI.upper.HR") use.digits.p <- c("p.value.sc", "p.value.log", "p.value.wald", "p.value.F", "p.value.lrt") #"p.value" cn <- colnames(df) df[cn %in% c(use.digits1, use.digits2)] <- lapply(df[cn %in% c(use.digits1, use.digits2)], formatC, digits = control$digits, format = "f") df[cn %in% use.digits.ratio] <- lapply(df[cn %in% use.digits.ratio], formatC, digits = control$digits.ratio, format = "f") df[cn %in% c("p.value", use.digits.p)] <- lapply(df[cn %in% c("p.value", use.digits.p)], formatC, digits = control$digits.p, format = if(control$format.p) "f" else "g") if(control$format.p) { cutoff <- 10^(-control$digits.p) fmt <- paste0("< ", format(cutoff, digits = control$digits.p, format = "f")) for(tst in c("p.value", use.digits.p)) { if(tst %in% cn) df[[tst]][df.orig[[tst]] < cutoff] <- fmt } } df[cn %in% c("p.value", use.digits.p)] <- lapply(df[cn %in% c("p.value", use.digits.p)], sub, pattern = "^\\s*NA\\s*$", replacement = "") #### don't show the same statistics more than once #### dups <- if(hasStrata) unlist(by(df, df[[4]], function(x) duplicated(x$model), simplify = FALSE), use.names = FALSE) else duplicated(df$model) df[cn %in% c(use.digits0, use.digits1, use.digits.p)] <- lapply(df[cn %in% c(use.digits0, use.digits1, use.digits.p)], replace, list = dups, values = "") if(hasStrata) { df[[4]] <- as.character(df[[4]]) df[[4]][duplicated(df[[4]])] <- "" } #### get rid of unnecessary columns #### df$y.term <- NULL df$y.label <- NULL df$strata.term <- NULL df$model <- NULL df$term <- NULL term.type <- df$term.type df$term.type <- NULL if(!adjustment.names) { df$adjustment <- NULL } else df$adjustment[dups] <- "" #### Format if necessary #### if(!is.null(width)) { firstcol <- smart.split(df[[1L + hasStrata]], width = width, min.split = min.split) lens <- vapply(firstcol, length, NA_integer_) df <- do.call(cbind.data.frame, c(list(label = unlist(firstcol, use.names = FALSE)), lapply(df[-1L - hasStrata], insert_elt, times = lens))) if(hasStrata) df <- df[replace(seq_along(df), 1:2, 2:1)] row.names(df) <- NULL term.type <- insert_elt(term.type, times = lens, elt = NULL) } df$label <- trimws(df$label) if(!is.null(text)) { if(identical(text, "html")) { df$label <- ifelse(term.type == "Intercept", df$label, paste0("", df$label, "")) } else if(identical(text, "latex")) { df$label <- ifelse(term.type == "Intercept", df$label, paste0("\\textbf{", df$label, "}")) } else if(!text) { df$label <- ifelse(term.type == "Intercept", df$label, paste0("**", ifelse(df$label == "", " ", df$label), "**")) } } #### tweak column names according to specifications #### cn <- stats::setNames(colnames(df), colnames(df)) if(length(control$stat.labels) > 0) { nm <- intersect(cn, names(control$stat.labels)) if(length(nm)) cn[nm] <- unlist(control$stat.labels[nm]) } cn["label"] <- term.name colnames(df) <- cn set_attr(df, "align", rep("l", ncol(df))) } #' @rdname summary.modelsum #' @export as.data.frame.summary.modelsum <- function(x, ..., text = x$text, term.name = x$term.name, adjustment.names = x$adjustment.names, width = NULL, min.split = NULL, list.ok = FALSE) { if(is.null(term.name) || identical(term.name, TRUE)) { term.name <- vapply(x$object, attr, NA_character_, "ylabel") } stopifnot(length(term.name) <= length(x$object)) out <- Map(as_data_frame_summary_modelsum, x$object, x$hasStrata, term.name, MoreArgs = list(control = x$control, text = text, width = width, min.split = min.split, adjustment.names = adjustment.names)) if(!list.ok) { if(length(out) == 1) out <- out[[1]] else warning("as.data.frame.summary.modelsum is returning a list of data.frames") } out } arsenal/R/mdy.Date.R0000644000176200001440000000434213632700352013674 0ustar liggesusers## Author: Terry Therneau ## Contributed on 8/30/2013 ## Updated 7/23/2014 by Jason Sinnwell #' Convert numeric dates to Date object, and vice versa #' #' Convert numeric dates for month, day, and year to Date object, and vice versa. #' #' Test if an object is a date. #' #' @param month integer, month (1-12). #' @param day integer, day of the month (1-31, depending on the month). #' @param year integer, either 2- or 4-digit year. If two-digit number, will add 1900 onto it, depending on range. #' @param yearcut cutoff for method to know if to convert to 4-digit year. #' @param date A date value. #' @param x An object. #' @return \code{mdy.Date} returns a Date object, and Date.mdy returns a list with integer values for month, day, and year. #' \code{is.Date} returns a single logical value. #' @details More work may need to be done with yearcut and 2-digit years. Best to give a full 4-digit year. #' @seealso \code{\link{Date}}, \code{\link{DateTimeClasses}} #' @examples #' mdy.Date(9, 2, 2013) #' #' tmp <- mdy.Date(9, 2, 2013) #' Date.mdy(tmp) #' #' is.Date(tmp) #' @name mdy.Date NULL #> NULL #' @rdname mdy.Date #' @export # mdy.Date(c(0,5),c(1, 1),c(2014, 2013)) # should return NA, "2013-05-01" mdy.Date <- function(month, day, year, yearcut=120) { ## keep operations vectorized ## NA for day or month out of range day <- as.numeric(day) day <- ifelse(day < 1 | day > 31, NA, day) # stop ("invalid day") month <- as.numeric(month) month <- ifelse(month < 1 | month > 12 | month != floor(month), NA, month) year <- ifelse(year < yearcut, year + 1900, year) temp <- cbind(year, month, day) # force them all to the same length ## allow NAs dtext <- rep(NA, nrow(temp)) dtext[rowSums(is.na(temp)) < 1] <- paste(temp[rowSums(is.na(temp)) < 1, 1, drop=FALSE], sprintf("%2d", temp[rowSums(is.na(temp)) < 1, 2, drop=FALSE]), sprintf("%2d", temp[rowSums(is.na(temp)) < 1, 3, drop=FALSE]), sep='/') as.Date(dtext) } #' @rdname mdy.Date #' @export Date.mdy <- function(date) { temp <- unclass(as.POSIXlt(date)) list(month=temp$mon+1, day=temp$mday, year=1900+temp$year) } #' @rdname mdy.Date #' @export is.Date <- function(x) inherits(x, "Date") arsenal/R/tableby.R0000644000176200001440000004541314045272124013655 0ustar liggesusers #' Summary Statistics of a Set of Independent Variables by a Categorical Variable #' #' Summarize one or more variables (x) by a categorical variable (y). Variables #' on the right side of the formula, i.e. independent variables, are summarized by the #' levels of a categorical variable on the left of the formula. Optionally, an appropriate test is performed to test the #' distribution of the independent variables across the levels of the categorical variable. #' #' @param formula an object of class \code{\link{formula}}; a symbolic description of the variables to be summarized by the group, #' or categorical variable, of interest. See "Details" for more information. To only view overall summary #' statistics, a one-sided formula can be used. #' @param data an optional data frame, list or environment (or object coercible by \code{\link{as.data.frame}} to a data frame) #' containing the variables in the model. If not found in data, the variables are taken from \code{environment(formula)}, #' typically the environment from which the function is called. #' @param na.action a function which indicates what should happen when the data contain \code{NA}s. #' The default is \code{na.tableby(TRUE)} if there is a by-variable, and \code{na.tableby(FALSE)} if there is not. #' This schema thus includes observations with \code{NA}s in x variables, #' but removes those with \code{NA} in the categorical group variable and strata (if used). #' @param subset an optional vector specifying a subset of observations (rows of data) to be used in the results. #' Works as vector of logicals or an index. #' @param weights a vector of weights. Using weights will disable statistical tests. #' @param strata a vector of strata to separate summaries by an additional group. #' @param control control parameters to handle optional settings within \code{tableby}. #' Two aspects of \code{tableby} are controlled with these: test options of RHS variables across levels of the categorical #' grouping variable, and x variable summaries within the grouping variable. Arguments for \code{tableby.control} #' can be passed to \code{tableby} via the \code{...} argument, but if a control object and \code{...} arguments are both supplied, #' the latter are used. See \code{\link{tableby.control}} for more details. #' @param ... additional arguments to be passed to internal \code{tableby} functions or \code{\link{tableby.control}}. #' @details #' The group variable (if any) is categorical, which could be an integer, character, #' factor, or ordered factor. \code{tableby} makes a simple summary of #' the counts within the k-levels of the independent variables on the #' right side of the formula. Note that unused levels are dropped. #' #' The \code{data} argument allows data.frames with label attributes for the columns, and those #' labels will be used in the summary methods for the \code{tableby} class. #' #' The independent variables are a mixture of types: categorical (discrete), #' numeric (continuous), and time to event (survival). These variables #' are split by the levels of the group variable (if any), then summarized within #' those levels, specific to the variable type. A statistical test is #' performed to compare the distribution of the independent variables across the #' levels of the grouping variable. #' #' The tests differ by the independent variable type, but can be specified #' explicitly in the formula statement or in the control function. #' These tests are accepted: #' \itemize{ #' \item{ #' \code{anova}: analysis of variance test; the default test for continuous variables. When #' LHS variable has two levels, equivalent to two-sample t-test. #' } #' \item{ #' \code{kwt}: Kruskal-Wallis Rank Test, optional test for continuous #' variables. When LHS variable has two levels, equivalent to Wilcoxon test. #' } #' \item{ #' \code{wt}: An explicit Wilcoxon test. #' } #' \item{ #' \code{medtest}: A median test. #' } #' \item{ #' \code{chisq}: chi-square goodness of fit test for equal counts of a #' categorical variable across categories; the default for categorical #' or factor variables #' } #' \item{ #' \code{fe}: Fisher's exact test for categorical variables #' } #' \item{ #' \code{trend}: trend test for equal distribution of an ordered variable #' across a categorical variable; the default for ordered factor variables #' } #' \item{ #' \code{logrank}: log-rank, the default for time-to-event variables #' } #' \item{ #' \code{notest}: no test is performed. #' } #' } #' #' To perform a mixture of asymptotic and rank-based tests on two #' different continuous variables, an example formula is: #' \code{formula = group ~ anova(age) + kwt(height)}. The test settings #' in \code{tableby.control} apply to all independent variables of a given type. #' #' The summary statistics reported for each independent variable within the #' group variable can be set in \code{\link{tableby.control}}. #' #' Finally, multiple by-variables can be set using \code{list()}. See the examples for more details. #' #' @return An object with class \code{c("tableby", "arsenal_table")} #' @seealso \code{\link{arsenal_table}}, \code{\link[stats]{anova}}, \code{\link[stats]{chisq.test}}, \code{\link{tableby.control}}, #' \code{\link{summary.tableby}}, \code{\link{tableby.internal}}, \code{\link{formulize}}, \code{\link{selectall}} #' #' @examples #' data(mockstudy) #' tab1 <- tableby(arm ~ sex + age, data=mockstudy) #' summary(tab1, text=TRUE) #' #' mylabels <- list(sex = "SEX", age ="Age, yrs") #' summary(tab1, labelTranslations = mylabels, text=TRUE) #' #' tab3 <- tableby(arm ~ sex + age, data=mockstudy, test=FALSE, total=FALSE, #' numeric.stats=c("median","q1q3"), numeric.test="kwt") #' summary(tab3, text=TRUE) #' #' # multiple LHS #' summary(tableby(list(arm, sex) ~ age, data = mockstudy, strata = ps), text = TRUE) #' #' tab.test <- tableby(arm ~ kwt(age) + anova(bmi) + kwt(ast), data=mockstudy) #' tests(tab.test) #' #' @author Jason Sinnwell, Beth Atkinson, Gregory Dougherty, and Ethan Heinzen, adapted from SAS Macros written by Paul Novotny and Ryan Lennon #' @name tableby NULL #> NULL #' @rdname tableby #' @export tableby <- function(formula, data, na.action, subset=NULL, weights=NULL, strata, control = NULL, ...) { control <- c(list(...), control) control <- do.call("tableby.control", control[!duplicated(names(control))]) Call <- match.call() ## Tell user if they passed an argument that was not expected, either here or in control expectArgs <- c("formula", "data", "na.action", "subset", "weights", "strata", "control", names(control), "times") match.idx <- match(names(Call)[-1], expectArgs) if(anyNA(match.idx)) warning("unused arguments: ", paste(names(Call)[1+which(is.na(match.idx))], collapse=", "), "\n") indx <- match(c("formula", "data", "subset", "weights", "na.action", "strata"), names(Call), nomatch = 0) if(indx[1] == 0) stop("A formula argument is required") special <- c("anova", "kwt", "wt", "medtest", "chisq", "fe", "logrank", "trend", "notest") out.tables <- list() formula.list <- as_list_formula(formula) for(FORM in formula.list) { temp.call <- Call[c(1, indx)] temp.call[[1]] <- as.name("model.frame") if(is.null(temp.call$na.action)) { temp.call$na.action <- if(length(FORM) == 2) na.tableby(FALSE) else na.tableby(TRUE) } else if(length(FORM) == 2 && identical(na.action, na.tableby(TRUE), ignore.environment = TRUE)) { # purposely using na.action instead of temp.call$na.action here warning("It appears you're using na.tableby(TRUE) with a one-sided formula... Results may not be what you expect.") } if (missing(data)) { temp.call$formula <- stats::terms(FORM, special) } else { # instead of call("keep.labels", ...), which breaks when arsenal isn't loaded (Can't find "keep.labels") temp.call$data <- as.call(list(keep.labels, temp.call$data)) temp.call$formula <- stats::terms(FORM, special, data = keep.labels(data)) } ## set up new environment for ## if specials, assign dummy versions of those functions tabenv <- new.env(parent = environment(formula)) for(sp in special) { if(!is.null(attr(temp.call$formula, "specials")[[sp]])) assign(sp, inline_tableby_stat_test, envir = tabenv) } ## set tabenv as environment in which to evalulate formula environment(temp.call$formula) <- tabenv ## evaluate the formula with env set for it modeldf <- loosen.labels(eval.parent(temp.call)) if(nrow(modeldf) == 0) stop("No (non-missing) observations") Terms <- stats::terms(modeldf) ###### Check for weights ###### if(hasWeights <- "(weights)" %in% colnames(modeldf)) { weights <- as.vector(stats::model.weights(modeldf)) if(!is.numeric(weights) || any(weights < 0)) stop("'weights' must be a numeric vector and must be non-negative") modeldf[["(weights)"]] <- NULL control$test <- FALSE } else weights <- NULL ###### Check for strata ###### if(hasStrata <- "(strata)" %in% colnames(modeldf)) { strata.col <- modeldf[["(strata)"]] strataTerm <- deparse(Call$strata) if(is.null(strataLabel <- attr(strata.col, "label"))) strataLabel <- strataTerm if(is.factor(strata.col)) { strata.col <- droplevels(strata.col) strata.levels <- levels(strata.col) } else strata.levels <- sort(unique(strata.col)) modeldf[["(strata)"]] <- NULL } else { strata.col <- rep("", nrow(modeldf)) strataTerm <- strataLabel <- strata.levels <- "" } ###### Check for by-variable ###### if(attributes(Terms)$response != 0) { by.col <- modeldf[[1]] termBy <- names(modeldf)[1] if(is.null(labelBy <- attr(by.col, "label"))) labelBy <- termBy if(is.factor(by.col)) { by.col <- droplevels(by.col) by.levels <- levels(by.col) } else by.levels <- sort(unique(by.col)) by.col <- as.character(by.col) by.levels <- as.character(by.levels) if(any(by.levels == "")) { warning('Empty string detected in by-variable is not allowed; converting to " ".') by.col[by.col == ""] <- " " by.levels <- unique(replace(by.levels, by.levels == "", " ")) } reserved <- c("group.term", "group.label", "strata.term", "strata.value", "variable", "term", "label", "variable.type", "test", "p.value") if(any(by.levels %in% reserved)) { stop("One or more reserved word found in by-variable: ", paste0(intersect(by.levels, reserved), collapse = ", "), ". Please use a different word") } if(length(by.levels) < 2 && control$test) { warning("The by-variable has fewer than two levels; statistical tests are ignored") control$test <- FALSE } modeldf[[1]] <- NULL } else { ## no response, create a dummy one if(is.null(overall <- control$stats.labels$overall)) overall <- "Overall" by.col <- rep(overall, nrow(modeldf)) termBy <- labelBy <- by.levels <- overall control$total <- FALSE control$test <- FALSE } if(is.null(totallab <- control$stats.labels$total)) totallab <- "Total" ystats <- if(hasWeights) { c(stats::xtabs(weights ~ factor(by.col, levels=by.levels), exclude = NA), stats::setNames(sum(weights[!is.na(by.col)]), totallab)) } else c(table(factor(by.col, levels=by.levels), exclude=NA), stats::setNames(sum(!is.na(by.col)), totallab)) yList <- list(stats=ystats, label=labelBy, term=termBy) ## find which columnss of modeldf have specials assigned to known specials specialIndices <- unlist(attr(Terms, "specials")) - attributes(Terms)$response specialTests <- rep("", ncol(modeldf)) ## If a special shows up multiple times, the unlist assigned a number at the end. Strip it off. ## This disallows functions with a number at the end specialTests[specialIndices] <- gsub("\\d+$", "", names(specialIndices)) xTerms <- Map(modeldf, names(modeldf), f = function(col, nm) { if(is.null(nameEff <- attr(col, "name"))) nameEff <- nm if(is.null(labelEff <- attr(col, "label"))) labelEff <- nameEff if(is.null(termEff <- attr(col, "term"))) termEff <- nm list(variable=nameEff, label=labelEff, term=termEff, term.orig = nm) }) names(xTerms) <- vapply(xTerms, "[[", NA_character_, "variable") control.list <- lapply(modeldf, attr, "control.list") names(control.list) <- names(xTerms) strataList <- vector("list", length(strata.levels)) if(hasStrata) names(strataList) <- paste0("(", strataTerm, ") == ", strata.levels) for(strat in strata.levels) { ## list of x variables xList <- vector("list", ncol(modeldf)) names(xList) <- names(xTerms) bycol <- by.col[strata.col == strat] weightscol <- weights[strata.col == strat] for(eff in seq_along(modeldf)) { currcol <- modeldf[[eff]] ############################################################ if(is.ordered(currcol) || is.logical(currcol) || is.factor(currcol) || is.character(currcol)) { ######## ordinal or categorical variable (character or factor) ############### ## convert logicals and characters to factor if(is.character(currcol)) { currcol <- factor(currcol, levels = sort(unique(currcol[!is.na(currcol)]))) } else if(is.logical(currcol)) currcol <- factor(currcol, levels=c(FALSE, TRUE)) ## to make sure all levels of cat variable are counted, need to pass values along xlevels <- levels(currcol) if(length(xlevels) == 0) stop(paste0("Zero-length levels found for ", names(xTerms)[eff])) ## get stats funs from either formula or control if(is.ordered(currcol)) { currstats <- control$ordered.stats currtest <- control$ordered.test vartype <- "ordinal" } else { currstats <- control$cat.stats currtest <- control$cat.test vartype <- "categorical" } } else if(is.Date(currcol)) { ######## Date variable ############### xlevels <- sort(unique(currcol)) ## get stats funs from either formula or control currstats <- control$date.stats currtest <- control$date.test vartype <- "Date" } else if(is.selectall(currcol)) { xlevels <- colnames(currcol) currstats <- control$selectall.stats currtest <- control$selectall.test vartype <- "selectall" } else if(inherits(currcol, "Surv")) { ##### Survival (time to event) ####### xlevels <- NULL if(any(currcol[, 2] %nin% c(0:1, NA))) stop("Survival endpoint may not be coded 0/1.") currstats <- control$surv.stats currtest <- control$surv.test vartype <- "survival" } else if(is.numeric(currcol) || inherits(currcol, "difftime")) { ######## Continuous variable (numeric) ############### ## for difftime, convert to numeric if(inherits(currcol, "difftime")) currcol <- as.numeric(currcol) xlevels <- sort(unique(currcol)) ## if no missings, and control says not to show missings, ## remove Nmiss stat fun currstats <- control$numeric.stats currtest <- control$numeric.test vartype <- "numeric" } else stop("Variable ", names(xTerms), " has unknown class(es): ", paste0(class(currcol)[-1], collapse = ", ")) ############################################################ ## if no missings, and control says not to show missings, ## remove Nmiss stat fun if(!is.null(attrstats <- attr(modeldf[[eff]], "stats"))) currstats <- attrstats # now finally subset currcol <- currcol[strata.col == strat] if(vartype == "categorical") { tmpdl <- control.list[[eff]]$cat.droplevels if(is.null(tmpdl)) tmpdl <- control$cat.droplevels if(tmpdl) { currcol <- droplevels(currcol) xlevels <- levels(currcol) if(length(xlevels) == 0) stop(paste0("Zero-length levels found for ", names(xTerms)[eff])) if(!control$test.always) specialTests[eff] <- "notest" } } if(!anyNA(currcol) && "Nmiss" %in% currstats) currstats <- currstats[currstats != "Nmiss"] statList <- list() for(statfun2 in currstats) { statfun <- get_stat_function(statfun2) tmp <- get0(statfun, mode = "function") statfun <- if(is.null(tmp)) get(statfun, parent.frame(), mode = "function") else tmp bystatlist <- list() if(statfun2 %in% c("countrowpct", "countcellpct", "rowbinomCI", "Npct")) { bystatlist <- do.call(statfun, list( currcol, levels = xlevels, by = bycol, by.levels = by.levels, weights = weightscol, na.rm = TRUE, totallab = totallab )) } else { for(bylev in by.levels) { idx <- bycol == bylev bystatlist[[bylev]] <- do.call(statfun, list(currcol[idx], levels=xlevels, na.rm=TRUE, weights=weightscol[idx], conf.level=control$conf.level, times=control$times)) } ## add Total bystatlist[[totallab]] <- do.call(statfun, list(currcol, levels=xlevels, na.rm=TRUE, weights=weightscol, conf.level=control$conf.level, times=control$times)) } statList[[statfun2]] <- bystatlist } if(length(statList) == 0) stop(paste0("Nothing to show for variable '", names(xTerms)[eff], "'")) currtest <- if(nchar(specialTests[eff]) > 0) specialTests[eff] else currtest testout <- if(control$test) { eval(call(currtest, currcol, factor(bycol, levels = by.levels), chisq.correct=control$chisq.correct, simulate.p.value=control$simulate.p.value, B=control$B, wilcox.correct = control$wilcox.correct, wilcox.exact = control$wilcox.exact, test.always=control$test.always)) } else notest() xList[[eff]] <- list(stats=statList, test=testout, type=vartype) } strataList[[if(!hasStrata) 1 else paste0("(", strataTerm, ") == ", strat)]] <- xList } out.tables[[termBy]] <- list(y = yList, strata = list(term = strataTerm, values = strata.levels, label = strataLabel, hasStrata = hasStrata), x = xTerms, tables = strataList, control.list = control.list, hasWeights = hasWeights) } structure(list(Call = Call, control = control, tables = out.tables), class = c("tableby", "arsenal_table")) } arsenal/R/padjust.R0000644000176200001440000000436413632700352013705 0ustar liggesusers#' Adjust P-values for Multiple Comparisons #' #' @param p An object. #' @inheritParams stats::p.adjust #' @param suffix A suffix to add to the footnotes indicating that the tests were adjusted. #' @param ... Other arguments. #' @seealso \code{\link[stats]{p.adjust}}, \code{\link{modpval.tableby}}, \code{\link{tests.tableby}} #' @name padjust NULL #> NULL #' @rdname padjust #' @export padjust <- function(p, method, n, ...) UseMethod("padjust") #' @rdname padjust #' @export padjust.default <- function(p, method, n, ...) { Call <- match.call() indx <- match(c("p", "method", "n"), names(Call), nomatch = 0) temp.call <- Call[c(1, indx)] temp.call[[1L]] <- quote(stats::p.adjust) eval(temp.call, parent.frame()) } #' @rdname padjust #' @export padjust.tableby <- function(p, method, n, suffix = " (adjusted for multiple comparisons)", ...) { Call <- match.call() if(any(has_strata(p)) || length(p$tables) > 1) stop("Can't adjust p-values on tables with strata or multiple by-variables.") if(!p$control$test) stop("Can't adjust p-values when no tests were run") indx <- match(c("p", "method", "n"), names(Call), nomatch = 0) temp.call <- Call[c(1, indx)] temp.call[[1L]] <- quote(stats::p.adjust) pvals <- tests(p) temp.call$p <- pvals$p.value pvals$p.value <- eval(temp.call, parent.frame()) pvals$Method <- paste0(pvals$Method, suffix) modpval.tableby(p, pvals) } #' @rdname padjust #' @export padjust.summary.tableby <- function(p, method, n, suffix = " (adjusted for multiple comparisons)", ...) { Call <- match.call() if(any(p$hasStrata) || length(p$object) > 1) stop("Can't adjust p-values on tables with strata or multiple by-variables") if(!p$control$test) { warning("No tests run on tableby object") return(p) } indx <- match(c("p", "method", "n"), names(Call), nomatch = 0) temp.call <- Call[c(1, indx)] temp.call[[1L]] <- quote(stats::p.adjust) pvals <- unique(p$object[[1]][c("variable", "p.value")]) # find unique variable-pval combos temp.call$p <- pvals$p.value pvals$p.value <- eval(temp.call, parent.frame()) p$object[[1]][["p.value"]] <- pvals$p.value[match(p$object[[1]][["variable"]], pvals$variable)] # "merge" them back in p$object[[1]][["test"]] <- paste0(p$object[[1]][["test"]], suffix) p } arsenal/R/as.data.frame.freqlist.R0000644000176200001440000000251113632700352016457 0ustar liggesusers as_data_frame_freqlist <- function(yList, single, sparse) { filter_zero <- function(x) if(!sparse) droplevels(x[x$Freq != 0, , drop = FALSE]) else x labs <- vapply(yList$x, "[[", NA_character_, "label") if(single || !yList$strata$hasStrata) { list(set_attr(do.call(rbind_chr, lapply(yList$tables, filter_zero)), "labels", labs)) } else { lapply(lapply(yList$tables, filter_zero), set_attr, "labels", labs) } } #' as.data.frame.freqlist #' #' Convert \code{\link{freqlist}} object to a data.frame. #' #' @param x An object of class \code{"freqlist"}. #' @inheritParams summary.freqlist #' @param ... Arguments to pass to \code{\link{freq.control}} #' @return A data.frame corresponding to the \code{freqlist} object. #' @export as.data.frame.freqlist <- function(x, ..., labelTranslations = NULL, list.ok = FALSE) { if(!is.null(labelTranslations)) labels(x) <- labelTranslations control <- c(list(...), x$control) control <- do.call("freq.control", control[!duplicated(names(control))]) out <- lapply(x$tables, as_data_frame_freqlist, single = control$single, sparse = control$sparse) out <- unlist(out, recursive = FALSE, use.names = FALSE) if(!list.ok) { if(length(out) == 1) out <- out[[1]] else warning("as.data.frame.freqlist is returning a list of data.frames") } set_attr(out, "control", control) } arsenal/R/freqlist.internal.R0000644000176200001440000000735113656527335015714 0ustar liggesusersinternalTable <- function(data, na.options, keep_cols = c("cumFreq", "freqPercent", "cumPercent"), sort = FALSE, decreasing = FALSE) { if("Freq" %nin% names(data)) stop("You tried to create or sort a freqlist table with no 'Freq' column!") data <- if(!sort) { data[do.call(order, unname(data)), ] } else data[order(data$Freq, decreasing = decreasing), ] na.index <- rowSums(is.na(data)) if (na.options == 'remove') { data <- data[na.index == 0, ] cumFreq <- cumsum(data$Freq) freqPct <- if(sum(data$Freq) > 0) 100 * data$Freq / sum(data$Freq) else NA_real_ cumPct <- cumsum(freqPct) } else if(na.options == 'include') { cumFreq <- cumsum(data$Freq) freqPct <- if(sum(data$Freq) > 0) 100 * data$Freq / sum(data$Freq) else NA_real_ cumPct <- cumsum(freqPct) } else if(na.options == 'showexclude') { freq_tmp <- data$Freq freq_tmp[na.index != 0] <- NA cumFreq <- cumfun(freq_tmp) denom <- max(stats::na.omit(cumFreq), na.rm = TRUE) freqPct <- if(denom > 0) 100 * freq_tmp / denom else NA_real_ cumPct <- cumfun(freqPct) } if("cumFreq" %in% keep_cols) data$cumFreq <- cumFreq if("freqPercent" %in% keep_cols) data$freqPercent <- freqPct if("cumPercent" %in% keep_cols) data$cumPercent <- cumPct row.names(data) <- NULL data } cumfun <- function(x) { # function to create a cumulative sum retaining NAs, but omitting in sum function x2 <- rep(NA, length(x)) x.om <- stats::na.omit(x) if (length(x.om) == 0) { warning("For at least one level, all entries have NAs") } else { x2[!is.na(x)] <- cumsum(x.om) } x2 } add_freqlist_xterms <- function(xTerms) { xTerms$Freq <- list(variable="Freq", label="Freq", term="Freq") xTerms$cumFreq <- list(variable="cumFreq", label="Cumulative Freq", term="cumFreq") xTerms$freqPercent <- list(variable="freqPercent", label="Percent", term="freqPercent") xTerms$cumPercent <- list(variable="cumPercent", label="Cumulative Percent", term="cumPercent") xTerms } #' Helper functions for freqlist #' #' A set of helper functions for \code{\link{freqlist}}. #' #' @param x A \code{freqlist} object. #' @inheritParams tableby.internal #' @param decreasing Should the sort be increasing or decreasing? #' @seealso \code{\link{merge.freqlist}}, \code{\link{arsenal_table}}, \code{\link{sort}}, #' \code{\link{freqlist}}, \code{\link{summary.freqlist}}, \code{\link{freq.control}}, #' @details #' Note that \code{sort()} has to recalculate cumulative statistics. Note also that the reordering of rows #' will also affect which labels are duplicates; you may also want to consider using #' \code{dupLabels=TRUE} in \code{\link{freq.control}()}. #' @name freqlist.internal NULL #> NULL #' @rdname freqlist.internal #' @export is.freqlist <- function(x) inherits(x, "freqlist") #' @rdname freqlist.internal #' @export is.summary.freqlist <- function(x) inherits(x, "summary.freqlist") #' @rdname freqlist.internal #' @export head.summary.freqlist <- function(x, n = 6L, ...) { x$object <- lapply(x$object, function(d) set_attr(utils::head(d, n = n, ...), "labels", attr(d, "labels"))) x } #' @rdname freqlist.internal #' @export tail.summary.freqlist <- function(x, n = 6L, ...) { x$object <- lapply(x$object, function(d) set_attr(utils::tail(d, n = n, ...), "labels", attr(d, "labels"))) x } #' @rdname freqlist.internal #' @export sort.freqlist <- function(x, decreasing = FALSE, ...) { for(i in seq_along(x$tables)) { na.opts <- x$tables[[i]]$na.options keepcols <- names(x$tables[[i]]$x) # in case they subsetted away some of the columns x$tables[[i]]$tables[] <- lapply(x$tables[[i]]$tables, internalTable, na.options = na.opts, sort = TRUE, decreasing = decreasing, keep_cols = keepcols) } x } arsenal/R/tableby.stat.tests.R0000644000176200001440000001147614045271724015777 0ustar liggesusers### Test functions ####### notest <- function(x, x.by, ...) { list(p.value = NA_real_, method = "No test") } ## continuous tests: ## 1. anova (parametric) ## consider allowing glm, for now just lm with gaussian errors ## Would like to just use either "anova" or "aov", anova needs lm(group~x first, ## aov does not return p-value. Could add it after. ## For now, just write our own to avoid over-writing anova R-base function ## also, nice to keep same format to call, eval(call(function, x, x,by)), as other tests anova <- function(x, x.by, ..., test.always = FALSE) { tab <- table(is.na(x), x.by) if(!test.always && (any(tab[1, ] == 0) || any(colSums(tab) == 0))) { return(list(p.value=NA_real_, statistic.F=NA_real_, method="Linear Model ANOVA")) } aov.out <- stats::lm(x~x.by) test <- stats::anova(aov.out) list(p.value = test[1,ncol(test)], statistic.F = test[1,ncol(test)-1], method = "Linear Model ANOVA") } ## 2. kruskal-wallis (non-parametric) kwt <- function(x, x.by, ..., test.always = FALSE) { tab <- table(is.na(x), x.by) if(!test.always && (any(tab[1, ] == 0) || any(colSums(tab) == 0))) { return(list(p.value=NA_real_, statistic.F=NA_real_, method="Kruskal-Wallis rank sum test")) } stats::kruskal.test(x, as.factor(x.by)) } ## 2. wilcoxon (non-parametric) wt <- function(x, x.by, ..., wilcox.correct = FALSE, wilcox.exact = NULL, test.always = FALSE) { tab <- table(is.na(x), x.by) if(ncol(tab) != 2) stop("The Wilcoxon Rank Sum test must have exactly two groups") if(!test.always && (any(tab[1, ] == 0) || any(colSums(tab) == 0))) { return(list(p.value=NA_real_, statistic.F=NA_real_, method="Wilcoxon rank sum test")) } stats::wilcox.test(x ~ as.factor(x.by), correct = wilcox.correct, exact = wilcox.exact) } ## median test medtest <- function(x, x.by, ..., test.always = FALSE) { if(!requireNamespace("coin", quietly = TRUE)) { warning("The \"coin\" package is required to run a median test.", call. = FALSE) return(notest(x, x.by, ...)) } tab <- table(is.na(x), x.by) if(!test.always && (any(tab[1, ] == 0) || any(colSums(tab) == 0))) { return(list(p.value=NA_real_, method = "Median test")) } ## should be taken care of with coin:: check_pkg("coin") mtest <- coin::median_test(x~as.factor(x.by), teststat="quad") list(p.value=coin::pvalue(mtest), method="Median test", statistic=mtest@statistic@teststatistic) } ## two tests for categorical, ## 1. chisq goodness of fit, equal proportions across table cells chisq <- function(x, x.by, ..., chisq.correct=FALSE, simulate.p.value=FALSE, B=2000, test.always = FALSE) { tab <- table(x, x.by, exclude=NA) rs <- rowSums(tab) cs <- colSums(tab) if(!test.always && (any(rs == 0) || any(cs == 0)) && ncol(tab) > 1 && nrow(tab) > 1) { return(list(p.value=NA_real_, method="Pearson's Chi-squared test")) } if(length(cs) > 1) tab <- tab[rs > 0, , drop = FALSE] if(length(rs) > 1) tab <- tab[, cs > 0, drop = FALSE] suppressWarnings(stats::chisq.test(tab, correct=chisq.correct, simulate.p.value=simulate.p.value, B=B)) } ## 2. Fisher's exact test for prob of as or more extreme table fe <- function(x, x.by, ..., simulate.p.value=FALSE, B=2000, test.always = FALSE) { tab <- table(x, x.by, exclude=NA) rs <- rowSums(tab) cs <- colSums(tab) if((!test.always && (any(rs == 0) || any(cs == 0))) || ncol(tab) == 1 || nrow(tab) == 1) { return(list(p.value=NA_real_, method = "Fisher's Exact Test for Count Data")) } # this already subsets out rows and cols with all 0's stats::fisher.test(tab, simulate.p.value=simulate.p.value, B=B) } ## trend test for ordinal data trend <- function(x, x.by, ..., test.always = FALSE) { if(!requireNamespace("coin", quietly = TRUE)) { warning("The \"coin\" package is required to run a trend test.", call. = FALSE) return(notest(x, x.by, ...)) } tab <- table(x, x.by, exclude=NA) rs <- rowSums(tab) cs <- colSums(tab) if(!test.always && (any(rs == 0) || any(cs == 0))) { return(list(p.value=NA_real_, method = "Trend test for ordinal variables")) } ## should be taken care of with coin:: check_pkg("coin") indtest <- coin::independence_test(x~as.factor(x.by), teststat="quad") list(p.value=coin::pvalue(indtest), method="Trend test for ordinal variables", statistic=indtest@statistic@teststatistic) } ## ' logrank ## ' ## ' survdiff logrank test ## ' @param x surv variable ## ' @param x.by by, categorical variable ## ' @return test output with $method and $p.value logrank <- function(x, x.by, ..., test.always = FALSE) { tab <- table(is.na(x), x.by, exclude=NA) if(!test.always && (any(tab[1, ] == 0) || any(colSums(tab) == 0))) { return(list(p.value=NA_real_, method="survdiff logrank")) } out <- survival::survdiff(x ~ x.by) out$p.value <- 1-stats::pchisq(out$chisq, df=sum(tab[1,] != 0)-1) out$method <- "survdiff logrank" out } arsenal/R/summary.freqlist.R0000644000176200001440000000636113755037257015574 0ustar liggesusers#' summary.freqlist #' #' Summarize the \code{freqlist} object. #' #' @param object an object of class \code{\link{freqlist}} #' @param ... For \code{summary.freqlist}, these are passed to \code{\link{as.data.frame.freqlist}} (and hence to #' \code{\link{freq.control}}). For the print method, these are #' additional arguments passed to the \code{\link[knitr]{kable}} function. #' @param x An object of class \code{summary.freqlist}. #' @inheritParams summary.tableby #' @return An object of class \code{"summary.freqlist"} (invisibly for the print method). #' @seealso \code{\link{freqlist}}, \code{\link[base]{table}}, \code{\link[stats]{xtabs}}, \code{\link[knitr]{kable}}, #' \code{\link{freq.control}}, \code{\link{freqlist.internal}} #' #' @examples #' # load mockstudy data #' data(mockstudy) #' tab.ex <- table(mockstudy[c("arm", "sex", "mdquality.s")], useNA = "ifany") #' noby <- freqlist(tab.ex, na.options = "include") #' summary(noby) #' withby <- freqlist(tab.ex, strata = c("arm","sex"), na.options = "showexclude") #' summary(withby) #' summary(withby, dupLabels = TRUE) #' @author Tina Gunderson, with major revisions by Ethan Heinzen #' @name summary.freqlist NULL #> NULL #' @rdname summary.freqlist #' @export summary.freqlist <- function(object, ..., labelTranslations = NULL, title = NULL) { dat <- as.data.frame(object, ..., labelTranslations = labelTranslations, list.ok = TRUE) structure(list( object = set_attr(dat, "control", NULL), control = attr(dat, "control"), title = title ), class = c("summary.freqlist", "summary.arsenal_table")) } as_data_frame_summary_freqlist <- function(tb, labs, cntrl) { fmtdups <- function(x, i) { x[i] <- lapply(x[i], as.character) if(nrow(x) == 0) return(x) tab <- as.matrix(x[i]) tab[is.na(tab)] <- "NA" num <- max(lengths(gregexpr(",", tab))) for(col in seq_len(ncol(tab))) { tmp <- apply(tab[, 1:col, drop = FALSE], 1, paste, collapse = strrep(",", num + 1)) x[c(FALSE, tmp[-1] == tmp[-length(tmp)]), colnames(tab)[col]] <- "" } x } fmtdigits <- function(x, digits.count, digits.pct) { if(nrow(x) == 0) return(x) if("Freq" %in% names(x)) x$Freq <- formatC(x$Freq, digits = digits.count, format = "f") if("cumFreq" %in% names(x)) x$cumFreq <- formatC(x$cumFreq, digits = digits.count, format = "f") if("freqPercent" %in% names(x)) x$freqPercent <- formatC(x$freqPercent, digits = digits.pct, format = "f") if("cumPercent" %in% names(x)) x$cumPercent <- formatC(x$cumPercent, digits = digits.pct, format = "f") x } idx <- names(tb) %nin% c("Freq", "cumFreq", "freqPercent", "cumPercent") tb <- fmtdigits(tb, digits.count = cntrl$digits.count, digits.pct = cntrl$digits.pct) if(!cntrl$dupLabels) tb <- fmtdups(tb, idx) tb <- stats::setNames(tb, labs[names(tb)]) set_attr(set_attr(tb, "labels", NULL), "align", c("r", "l")[1 + idx]) } #' @rdname summary.freqlist #' @export as.data.frame.summary.freqlist <- function(x, ..., list.ok = FALSE) { out <- Map(x$object, lapply(x$object, attr, "labels"), f = as_data_frame_summary_freqlist, MoreArgs = list(cntrl = x$control)) if(!list.ok) { if(length(out) == 1) out <- out[[1]] else warning("as.data.frame.summary.freqlist is returning a list of data.frames") } out } arsenal/R/comparedf.control.R0000644000176200001440000001762413656527335015673 0ustar liggesusers #' Control settings for \code{comparedf} function #' #' Control tolerance definitions for the \code{\link{comparedf}} function. #' #' @param tol.logical,tol.num,tol.char,tol.factor,tol.date,tol.other A function or one of the shortcut character strings or a list thereof, #' denoting the tolerance function to use for a given data type. See "details", below. #' @param tol.num.val Numeric; maximum value of differences allowed in numerics (fed to the function given in \code{tol.num}). #' @param int.as.num Logical; should integers be coerced to numeric before comparison? Default FALSE. #' @param factor.as.char Logical; should factors be coerced to character before comparison? Default FALSE. #' @param tol.date.val Numeric; maximum value of differences allowed in dates (fed to the function given in \code{tol.date}). #' @param tol.vars Either \code{"none"} (the default), denoting that variable names are to be matched as-is, #' a named vector manually specifying variable names to compare (where the names correspond to columns of #' \code{x} and the values correspond to columns of \code{y}), or a #' character vector denoting equivalence classes for characters in the variable names. See "details", below. #' @param max.print.vars Integer denoting maximum number of variables to report in the "variables not shared" and "variables not compared" #' output. \code{NA} will print all differences. #' @param max.print.obs Integer denoting maximum number of not-shared observations to report. \code{NA} will print all differences. #' @param max.print.diffs.per.var,max.print.diffs Integers denoting the maximum number of differences to report for each variable or overall. #' \code{NA} will print all differences for each variable or overall. #' @param max.print.attrs Integers denoting the maximum number of non-identical attributes to report.\code{NA} will print all differences. #' @param max.print.diff Deprecated. #' @param ... Other arguments (not in use at this time). #' @return A list containing the necessary parameters for the \code{\link{comparedf}} function. #' @details #' The following character strings are accepted: #' \itemize{ #' \item{\code{tol.logical = "none"}: compare logicals exactly as they are.} #' \item{\code{tol.num = "absolute"}: compare absolute differences in numerics.} #' \item{\code{tol.num = "percent"}, \code{tol.num = "pct"} compare percent differences in numerics.} #' \item{\code{tol.char = "none"}: compare character strings exactly as they are.} #' \item{\code{tol.char = "trim"}: left-justify and trim all trailing white space.} #' \item{\code{tol.char = "case"}: allow differences in upper/lower case.} #' \item{\code{tol.char = "both"}: combine \code{"trim"} and \code{"case"}.} #' \item{\code{tol.factor = "none"}: match both character labels and numeric levels.} #' \item{\code{tol.factor = "levels"}: match only the numeric levels.} #' \item{\code{tol.factor = "labels"}: match only the labels.} #' \item{\code{tol.date = "absolute"}: compare absolute differences in dates.} #' \item{\code{tol.other = "none"}: expect objects of other classes to be exactly identical.} #' } #' #' A list with names mapped to \code{x} can be used to specify tolerances by variable. One unnamed element is supported #' as the default. #' #' \code{tol.vars}: If not set to \code{"none"} (the default) or a named vector, #' the \code{tol.vars} argument is a character vector denoting equivalence classes #' for the characters in the variable names. A single character in this vector means to replace that character #' with \code{""}. All other strings in this vector are split by character and replaced by the first character in the string. #' #' E.g., a character vector \code{c("._", "aA", " ")} would denote that the dot and underscore are equivalent (to be translated to a dot), #' that "a" and "A" are equivalent (to be translated to "a"), and that spaces should be removed. #' #' The special character string \code{"case"} in this vector is the same as specifying \code{paste0(letters, LETTERS)}. #' @examples #' cntl <- comparedf.control( #' tol.num = "pct", # calculate percent differences #' tol.vars = c("case", # ignore case #' "._", # set all underscores to dots. #' "e") # remove all letter e's #' ) #' #' cntl <- comparedf.control(tol.char = list( #' "none", # the default #' x1 = "case", # be case-insensitive for the variable "x1" #' x2 = function(x, y) tol.NA(x, y, x != y | y == "NA") # a custom-defined tolerance #' )) #' @seealso \code{\link{comparedf}}, \code{\link{comparedf.tolerances}}, \code{\link{summary.comparedf}} #' @author Ethan Heinzen #' @export comparedf.control <- function( tol.logical = "none", tol.num = c("absolute", "percent", "pct"), tol.num.val = sqrt(.Machine$double.eps), int.as.num = FALSE, tol.char = c("none", "trim", "case", "both"), tol.factor = c("none", "levels", "labels"), factor.as.char = FALSE, tol.date = "absolute", tol.date.val = 0, tol.other = "none", tol.vars = "none", max.print.vars = NA, max.print.obs = NA, max.print.diffs.per.var = 10, max.print.diffs = 50, max.print.attrs = NA, ..., max.print.diff = 10) { make_tol <- function(x, prefix, defaults) { if(is.function(x)) return(list(x)) if(is.character(x)) return(list(match.fun(paste0(prefix, match.arg(x, defaults, several.ok = FALSE))))) if(!is.list(x) || (length(x) > 1 && (is.null(names(x)) || sum(names(x) == "") > 1))) stop("Provided tolerances should be a function, single character string, or a named list of functions/character strings") out <- lapply(x, function(y) { if(!is.function(y)) match.fun(paste0(prefix, match.arg(y, defaults, several.ok = FALSE))) else y }) names(out) <- names(x) out } #### Logical #### tol.logical <- make_tol(tol.logical, "tol.logical.", "none") #### Numerics #### if(!is.numeric(tol.num.val)) stop("'tol.num.val' needs to be numeric.") tol.num <- make_tol(tol.num, "tol.num.", c("absolute", "percent", "pct")) if(!is.logical(int.as.num) || length(int.as.num) != 1 || is.na(int.as.num)) stop("'int.as.num' should be TRUE or FALSE.") #### Characters and factors #### tol.char <- make_tol(tol.char, "tol.char.", c("none", "trim", "case", "both")) tol.factor <- make_tol(tol.factor, "tol.factor.", c("none", "levels", "labels")) if(!is.logical(factor.as.char) || length(factor.as.char) != 1 || is.na(factor.as.char)) stop("'factor.as.char' should be TRUE or FALSE.") #### Dates #### if(!is.numeric(tol.date.val)) stop("'tol.date.val' needs to be numeric.") tol.date <- make_tol(tol.date, "tol.date.", "absolute") #### Other #### tol.other <- make_tol(tol.other, "tol.other.", "none") #### Variable names #### if(!is.character(tol.vars)) stop("'tol.vars' must be a character string or vector.") if(is.null(names(tol.vars))) { if("none" %in% tol.vars || length(tol.vars) == 0) tol.vars <- "none" if("case" %in% tol.vars) tol.vars <- c(paste0(letters, LETTERS), tol.vars[tol.vars != "case"]) } if(!missing(max.print.diff)) { .Deprecated(msg = "Using 'max.print.diff = ' is deprecated. Use 'max.print.diffs.per.var = ' instead.") max.print.diffs.per.var <- max.print.diff } chk <- function(x) { tmp <- deparse(substitute(x)) if(length(x) != 1 || (!is.na(x) && (!is.numeric(x) || x <= 0))) stop("'", tmp, "' needs to be NA or a numeric > 0") } chk(max.print.vars) chk(max.print.obs) chk(max.print.diffs.per.var) chk(max.print.diffs) chk(max.print.attrs) list(tol.logical = tol.logical, tol.num = tol.num, tol.num.val = tol.num.val, int.as.num = int.as.num, tol.char = tol.char, tol.factor = tol.factor, factor.as.char = factor.as.char, tol.date = tol.date, tol.date.val = tol.date.val, tol.other = tol.other, tol.vars = tol.vars, max.print.vars = max.print.vars, max.print.obs = max.print.obs, max.print.diffs.per.var = max.print.diffs.per.var, max.print.diffs = max.print.diffs, max.print.attrs = max.print.attrs) } arsenal/R/comparedf.tolerances.R0000644000176200001440000000542313767210412016330 0ustar liggesusers #' \code{comparedf} tolerances #' #' Internal functions defining tolerances for the \code{\link{comparedf.control}} function. #' To create your own tolerance definitions, see the vignette. #' #' @param x,y vectors of the appropriate lengths and types. #' @param tol A numeric tolerance #' @param idx A logical vector of appropriate length. #' @return A logical vector of length equal to that of \code{x} and \code{y}, where \code{TRUE} denotes a #' difference between \code{x} and \code{y}, and \code{FALSE} denotes no difference between \code{x} and \code{y}. #' @details #' \code{tol.NA} takes as differences between two vectors any elements which are NA in one but not the other, #' or which are non-NA in both and \code{TRUE} in \code{idx}. It is useful for handling NAs in custom tolerance functions. #' @author Ethan Heinzen #' @seealso \code{\link{comparedf.control}}, \code{\link{comparedf}} #' @name comparedf.tolerances NULL #> NULL #' @rdname comparedf.tolerances #' @export tol.NA <- function(x, y, idx) { (is.na(x) & !is.na(y)) | (is.na(y) & !is.na(x)) | (!is.na(x) & !is.na(y) & idx) } #' @rdname comparedf.tolerances #' @export tol.num.absolute <- function(x, y, tol) { both.inf <- is.infinite(x) & is.infinite(y) tol.NA(x, y, (both.inf & x != y) | (!both.inf & abs(x - y) > tol)) } #' @rdname comparedf.tolerances #' @export tol.num.percent <- tol.num.pct <- function(x, y, tol) { both.inf <- is.infinite(x) & is.infinite(y) tol.NA(x, y, (x == 0 & y != 0) | (both.inf & x != y) | (!both.inf & x != 0 & abs((x - y)/x) > tol)) } #' @rdname comparedf.tolerances #' @export tol.num.pct <- tol.num.percent #' @rdname comparedf.tolerances #' @export tol.factor.none <- function(x, y) { tol.NA(x, y, (as.character(x) != as.character(y)) | (as.numeric(x) != as.numeric(y))) } #' @rdname comparedf.tolerances #' @export tol.factor.levels <- function(x, y) { tol.NA(x, y, as.numeric(x) != as.numeric(y)) } #' @rdname comparedf.tolerances #' @export tol.factor.labels <- function(x, y) { tol.NA(x, y, as.character(x) != as.character(y)) } #' @rdname comparedf.tolerances #' @export tol.char.both <- function(x, y) { tol.NA(x, y, tolower(trimws(x)) != tolower(trimws(y))) } #' @rdname comparedf.tolerances #' @export tol.char.case <- function(x, y) { tol.NA(x, y, tolower(x) != tolower(y)) } #' @rdname comparedf.tolerances #' @export tol.char.trim <- function(x, y) { tol.NA(x, y, trimws(x) != trimws(y)) } #' @rdname comparedf.tolerances #' @export tol.char.none <- function(x, y) { tol.NA(x, y, x != y) } #' @rdname comparedf.tolerances #' @export tol.date.absolute <- tol.num.absolute #' @rdname comparedf.tolerances #' @export tol.logical.none <- tol.char.none #' @rdname comparedf.tolerances #' @export tol.other.none <- function(x, y) { unlist(Map(Negate(identical), x, y)) } arsenal/R/freq.control.R0000644000176200001440000000336613632700352014650 0ustar liggesusers #' Control settings for \code{freqlist} function #' #' Control test and summary settings for the \code{\link{freqlist}} function. #' #' @param sparse a logical value indicating whether to keep rows with counts of zero. #' The default is \code{FALSE} (drop zero-count rows). #' @param single logical, indicating whether to collapse results created using a strata variable into a single table for printing #' @param dupLabels logical: should labels which are the same as the row above be printed? The default (\code{FALSE}) more #' closely approximates \code{PROC FREQ} output from SAS, where a label carried down from the row above is left blank. #' @param digits.count Number of decimal places for count values. #' @param digits.pct Number of decimal places for percents. #' @param ... additional arguments. #' @param digits A deprecated argument #' @return A list with settings to be used within the \code{freqlist} function. #' #' @seealso \code{\link{freqlist}}, \code{\link{summary.freqlist}}, \code{\link{freqlist.internal}} #' @author Ethan Heinzen #' @export freq.control <- function(sparse = FALSE, single = FALSE, dupLabels = FALSE, digits.count = 0L, digits.pct = 2L, ..., digits = NULL) { if(!is.null(digits)) { .Deprecated(msg = "Using 'digits = ' is deprecated. Use 'digits.pct = ' instead.") digits.pct <- digits } # digits are OK to be NULL. See ?format if(!is.null(digits.count) && digits.count < 0L) { warning("digits.count must be >= 0. Set to default.") digits.count <- 0L } if(!is.null(digits.pct) && digits.pct < 0L) { warning("digits.pct must be >= 0. Set to default.") digits.pct <- 1L } list(sparse = sparse, single = single, dupLabels = dupLabels, digits.count = digits.count, digits.pct = digits.pct) } arsenal/R/magic8.R0000644000176200001440000000255713632700352013405 0ustar liggesusers magic8 <- function(question=NULL){ sample(c("There is an 'apply' function for that", 'It is decidedly so', 'Without a doubt', "Need to install another bioconductor package for that", 'Better off asking Siri', 'You may rely on it', 'As I see it, yes', 'Most likely', "It might be time for a walking break", 'Just bootstrap it', 'May need to perform simulations', 'Signs point to yes', "You need to go home and rethink your life", 'Reply hazy; try again', 'Insufficient sample size', 'The answer is 7', "Bazinga!", 'Cannot predict right now', 'Concentrate and ask again', "Don't count on it", 'Maybe, p-value 0.06', 'Hierarchical clustering always gives an answer', 'My sources say no', 'Make a 3-D pie chart', "Yes, if the figure is colorful", 'Very doubtful', "PC Load Letter", "Not Sure", "The possibility of successfully answering your question is approximately three thousand seven hundred and twenty to one!", "Depends...did you want the two-tailed probability or the one-tailed?", "Error: 404 Not Found", "You index is off by one (we won't tell)", "Correlation does not equal causation"), 1) } arsenal/R/deprecated.R0000644000176200001440000000027413632700352014327 0ustar liggesusers #' Deprecated functions in \code{arsenal} #' #' Details about deprecated functions in \code{arsenal} #' #' @seealso \code{\link{arsenal-defunct}} #' @name arsenal-deprecated NULL #> NULL arsenal/R/not.in.R0000644000176200001440000000133713675162464013452 0ustar liggesusers########################################################################################### ### Creation Date: 6/2015 ### Last Modified: Monday, 18 July 2016 03:00 PM CDT ########################################################################################### #' Not in #' #' The not-in operator for R. #' #' @param x vector or \code{NULL}: the values to be matched. #' @param table vector or \code{NULL}: the values to be matched against. #' @return The negation of \code{\link[base:match]{\%in\%}}. #' @examples #' 1 %nin% 2:10 #' c("a", "b") %nin% c("a", "c", "d") #' @seealso \code{\link[base:match]{\%in\%}} #' @author Raymond Moore #' @aliases nin #' @export `%nin%` <- function(x, table) match(x, table, nomatch = 0L) == 0L arsenal/R/selectall.R0000644000176200001440000000302113713262477014203 0ustar liggesusers #' Make a column for "select all" input #' #' @param ... Named arguments of the same length. These should be logical, numeric (0/1) or a factor with two levels. #' @param i,j,drop Arguments to `[.matrix` #' @param x An object of class "selectall" #' @examples #' d <- data.frame(grp = rep(c("A", "B"), each = 5)) #' d$s <- selectall( #' `Option 1` = c(rep(1, 4), rep(0, 6)), #' `Option 2` = c(0, 1, 0, 0, 0, 1, 1, 1, 0, 0), #' `Option 3` = 1, #' `Option 4` = 0 #' ) #' summary(tableby(grp ~ s, data = d), text = TRUE) #' @seealso \code{\link{tableby}}, \code{\link{paired}} #' @name selectall NULL #> NULL #' @rdname selectall #' @export selectall <- function(...) { Call <- match.call() args <- lapply(list(...), function(x) if(is.factor(x)) as.numeric(x)-1 else if(is.numeric(x)) x else if(is.logical(x)) +x else as.numeric(factor(x))-1) out <- do.call(cbind, args) if(any(out %nin% c(0, 1, NA))) stop("Some elements may not have two levels") as.selectall(out) } #' @rdname selectall #' @export as.selectall <- function(x) { structure(x, class = c("selectall", class(x)[class(x) != "selectall"])) } #' @rdname selectall #' @export as.matrix.selectall <- function(x, ...) { class(x) <- "matrix" x } #' @rdname selectall #' @export `[.selectall` <- function(x, i, j, drop = FALSE) { as.selectall(as.matrix(x)[i, j, drop = FALSE]) } #' @rdname selectall #' @export is.na.selectall <- function(x) { rowSums(is.na(as.matrix(x))) > 0 } #' @rdname selectall #' @export is.selectall <- function(x) inherits(x, "selectall") arsenal/R/modelsum.control.R0000644000176200001440000003202314051176253015533 0ustar liggesusers## Purpose: control parameters for modelsum function ## Authors: P Votruba, Jason Sinnwell, Beth Atkinson ## Created: 9/3/2015 #' Control settings for \code{modelsum} function #' #' Control test and summary settings for \code{\link{modelsum}} function. #' #' @param digits Numeric, denoting the number of digits after the decimal point for beta coefficients and standard errors. #' @param digits.ratio Numeric, denoting the number of digits after the decimal point for ratios, e.g. OR, RR, HR. #' @param digits.p Numeric, denoting the number of digits for p-values. See "Details", below. #' @param format.p Logical, denoting whether to format p-values. See "Details", below. #' @param show.adjust Logical, denoting whether to show adjustment terms. #' @param show.intercept Logical, denoting whether to show intercept terms. #' @param conf.level Numeric, giving the confidence level. #' @param ordinal.stats,binomial.stats,survival.stats,gaussian.stats,poisson.stats,negbin.stats,clog.stats,relrisk.stats #' Character vectors denoting which stats to show for the various model types. #' @param stat.labels A named list of labels for all the stats used above. #' @param ... Other arguments (not in use at this time). #' @return A list with settings to be used within the \code{modelsum} function. #' @details #' If \code{format.p} is \code{FALSE}, \code{digits.p} denotes the number of significant digits shown. The #' p-values will be in exponential notation if necessary. If \code{format.p} is \code{TRUE}, #' \code{digits.p} will determine the number of digits after the decimal point to show. If the p-value #' is less than the resulting number of places, it will be formatted to show so. #' @seealso \code{\link{modelsum}}, \code{\link{summary.modelsum}}, \code{\link{modelsum.internal}} #' @export modelsum.control <- function( digits = 3L, digits.ratio = 3L, digits.p = 3L, format.p = TRUE, show.adjust = TRUE, show.intercept = TRUE, conf.level = 0.95, ordinal.stats=c("OR","CI.lower.OR","CI.upper.OR", "p.value","Nmiss"), binomial.stats=c("OR","CI.lower.OR","CI.upper.OR","p.value", "concordance","Nmiss"), gaussian.stats=c("estimate","std.error","p.value","adj.r.squared","Nmiss"), poisson.stats=c("RR","CI.lower.RR", "CI.upper.RR","p.value","Nmiss"), negbin.stats=c("RR","CI.lower.RR", "CI.upper.RR","p.value","Nmiss"), relrisk.stats=c("RR","CI.lower.RR", "CI.upper.RR","p.value","Nmiss"), clog.stats=c("OR", "CI.lower.OR", "CI.upper.OR", "p.value", "concordance", "Nmiss"), survival.stats=c("HR","CI.lower.HR","CI.upper.HR","p.value","concordance","Nmiss"), stat.labels = list(), ... ) { if("nsmall" %in% names(list(...))) .Deprecated(msg = "Using 'nsmall = ' is deprecated. Use 'digits = ' instead.") if("nsmall.ratio" %in% names(list(...))) .Deprecated(msg = "Using 'nsmall.ratio = ' is deprecated. Use 'digits.ratio = ' instead.") if("digits.test" %in% names(list(...))) .Deprecated(msg = "Using 'digits.test = ' is deprecated. Use 'digits.p = ' instead.") # digits and digits.test are OK to be NULL. See ?format if(!is.null(digits) && digits < 0L) { warning("digits must be >= 0. Set to default.") digits <- 3L } if(!is.null(digits.ratio) && digits.ratio < 0L) { warning("digits.ratio must be >= 0. Set to default.") digits.ratio <- 3L } if(!is.null(digits.p) && digits.p < 0L) { warning("digits.p must be >= 0. Set to default.") digits.p <- 3L } if(conf.level <= 0 || conf.level >= 1) { warning("conf.level must be between (0,1). Setting to default.\n") conf.level <- 0.95 } ########################## ## Ordinal stats: ########################## ordinal.stats.valid <- c( "Nmiss", "OR", "CI.lower.OR", "CI.upper.OR", "p.value", # default "estimate", "CI.OR", "CI.estimate", "CI.lower.estimate", "CI.upper.estimate", "N", "Nmiss2", "endpoint", "std.error", "statistic", "logLik", "AIC", "BIC", "edf", "deviance", "df.residual", "p.value.lrt" ) if(any(ordinal.stats %nin% ordinal.stats.valid)) { stop("Invalid binomial stats: ", paste(ordinal.stats[ordinal.stats %nin% ordinal.stats.valid],collapse=","), "\n") } ## let CI.OR decode to CI.lower.OR and CI.upper.OR if(any(ordinal.stats == "CI.OR")) { ordinal.stats <- unique(c(ordinal.stats[ordinal.stats != "CI.OR"], "CI.lower.OR", "CI.upper.OR")) } if(any(ordinal.stats == "CI.estimate")) { ordinal.stats <- unique(c(ordinal.stats[ordinal.stats != "CI.estimate"], "CI.lower.estimate", "CI.upper.estimate")) } ########################## ## Binomial stats: ########################## ##Other coefficient columns: ##CI.estimate, N, Nmiss2, depvar (show name of dependent variable), estimate, se, zstat ##Other model fits: logLik,AIC,BIC binomial.stats.valid <- c( "Nmiss", "OR", "CI.lower.OR", "CI.upper.OR", "p.value", "concordance", # default "estimate", "CI.OR", "CI.estimate", "CI.lower.estimate", "CI.upper.estimate", "CI.wald", "CI.lower.wald", "CI.upper.wald", "CI.OR.wald", "CI.lower.OR.wald", "CI.upper.OR.wald", "N", "Nmiss2", "endpoint", "std.error", "statistic", "Nevents", "logLik", "AIC", "BIC", "null.deviance", "deviance", "df.residual", "df.null", "p.value.lrt" ) if(any(binomial.stats %nin% binomial.stats.valid)) { stop("Invalid binomial stats: ", paste(binomial.stats[binomial.stats %nin% binomial.stats.valid],collapse=","), "\n") } ## let CI.OR decode to CI.lower.OR and CI.upper.OR if(any(binomial.stats == "CI.OR")) { binomial.stats <- unique(c(binomial.stats[binomial.stats != "CI.OR"], "CI.lower.OR", "CI.upper.OR")) } if(any(binomial.stats == "CI.estimate")) { binomial.stats <- unique(c(binomial.stats[binomial.stats != "CI.estimate"], "CI.lower.estimate", "CI.upper.estimate")) } if(any(binomial.stats == "CI.wald")) { binomial.stats <- unique(c(binomial.stats[binomial.stats != "CI.wald"], "CI.lower.wald", "CI.upper.wald")) } if(any(binomial.stats == "CI.OR.wald")) { binomial.stats <- unique(c(binomial.stats[binomial.stats != "CI.OR.wald"], "CI.lower.OR.wald", "CI.upper.OR.wald")) } ########################## ## Gaussian stats: ########################## ##Other coefficient columns: CI.estimate, N, Nmiss2, t.stat, standard.estimate, endpoint ##Other model fits: r.squared, AIC, BIC,logLik gaussian.stats.valid <- c( "Nmiss", "estimate", "std.error", "p.value", "adj.r.squared", #default "CI.estimate", "CI.lower.estimate", "CI.upper.estimate", "N", "Nmiss2", "statistic", "standard.estimate", "endpoint", "r.squared", "AIC", "BIC", "logLik", "statistic.F", "p.value.F", "p.value.lrt" ) if(any(gaussian.stats %nin% gaussian.stats.valid)) { stop("Invalid gaussian stats: ", paste(gaussian.stats[gaussian.stats %nin% gaussian.stats.valid],collapse=","), "\n") } if(any(gaussian.stats == "CI.estimate")) { gaussian.stats <- unique(c(gaussian.stats[gaussian.stats != "CI.estimate"], "CI.lower.estimate", "CI.upper.estimate")) } ########################## ## Poisson stats: ########################## ##(quasi)/poisson.stats=c("Nmiss","RR","CI.RR", "p.value","concordance"), ##Other coeff columns: CI.estimate, CI.RR (ci for relrisk),N,Nmiss2, std.error, estimate, z.stat, endpoint ##Other model fits: AIC,BIC,logLik, dispersion ## dispersion = deviance/df.residual poisson.stats.valid <- c( "RR", "CI.lower.RR", "CI.upper.RR", "p.value", "Nmiss", # default "CI.RR", "CI.estimate", "CI.lower.estimate", "CI.upper.estimate", "CI.RR", "Nmiss2", "std.error", "estimate", "statistic", "endpoint", "AIC", "BIC", "logLik", "dispersion", "null.deviance", "deviance", "df.residual", "df.null", "p.value.lrt" ) if(any(poisson.stats %nin% poisson.stats.valid)) { stop("Invalid poisson stats: ", paste(poisson.stats[poisson.stats %nin% poisson.stats.valid],collapse=","), "\n") } ## let CI.RR decode to CI.lower.RR and CI.upper.RR if(any(poisson.stats == "CI.RR")) { poisson.stats <- unique(c(poisson.stats[poisson.stats != "CI.RR"], "CI.lower.RR", "CI.upper.RR")) } if(any(poisson.stats == "CI.estimate")) { poisson.stats <- unique(c(poisson.stats[poisson.stats != "CI.estimate"], "CI.lower.estimate", "CI.upper.estimate")) } ########################## ## Negbin stats: ########################## negbin.stats.valid <- c( "RR", "CI.lower.RR", "CI.upper.RR", "p.value", "Nmiss", # default "CI.RR", "CI.estimate", "CI.lower.estimate", "CI.upper.estimate", "CI.RR", "Nmiss2", "std.error", "estimate", "statistic", "endpoint", "AIC", "BIC", "logLik", "dispersion", "null.deviance", "deviance", "df.residual", "df.null", "theta", "SE.theta", "p.value.lrt" ) if(any(negbin.stats %nin% negbin.stats.valid)) { stop("Invalid poisson stats: ", paste(negbin.stats[negbin.stats %nin% negbin.stats.valid],collapse=","), "\n") } ## let CI.RR decode to CI.lower.RR and CI.upper.RR if(any(negbin.stats == "CI.RR")) { negbin.stats <- unique(c(negbin.stats[negbin.stats != "CI.RR"], "CI.lower.RR", "CI.upper.RR")) } if(any(negbin.stats == "CI.estimate")) { negbin.stats <- unique(c(negbin.stats[negbin.stats != "CI.estimate"], "CI.lower.estimate", "CI.upper.estimate")) } ########################## ## clog stats: ########################## clog.stats.valid <- c( "OR", "CI.lower.OR", "CI.upper.OR", "p.value", "concordance", "Nmiss", # default "CI.OR", "CI.estimate", "CI.lower.estimate", "CI.upper.estimate", "N", "Nmiss2", "estimate", "std.error", "endpoint", "Nevents", "statistic", "r.squared", "r.squared.max", "logLik", "AIC", "BIC", "statistic.log", "p.value.log", "statistic.sc", "p.value.sc", "statistic.wald", "p.value.wald", "N", "std.error.concordance", "p.value.lrt" ) if(any(clog.stats %nin% clog.stats.valid)) { stop("Invalid clog stats: ", paste(clog.stats[clog.stats %nin% clog.stats.valid],collapse=","), "\n") } ## let CI.OR decode to CI.lower.OR and CI.upper.OR if(any(clog.stats == "CI.OR")) { clog.stats <- unique(c(clog.stats[clog.stats != "CI.OR"], "CI.lower.OR", "CI.upper.OR")) } if(any(clog.stats == "CI.estimate")) { clog.stats <- unique(c(clog.stats[clog.stats != "CI.estimate"], "CI.lower.estimate", "CI.upper.estimate")) } ########################## ## relrisk stats: ########################## ##(quasi)/poisson.stats=c("Nmiss","RR","CI.RR", "p.value","concordance"), ##Other coeff columns: CI.estimate, CI.RR (ci for relrisk),N,Nmiss2, std.error, estimate, z.stat, endpoint ##Other model fits: AIC,BIC,logLik, dispersion ## dispersion = deviance/df.residual relrisk.stats.valid <- c( "RR", "CI.lower.RR", "CI.upper.RR", "p.value", "Nmiss", # default "CI.RR", "CI.estimate", "CI.lower.estimate", "CI.upper.estimate", "CI.RR", "Nmiss2", "std.error", "estimate", "statistic", "endpoint", "AIC", "BIC", "logLik", "dispersion", "null.deviance", "deviance", "df.residual", "df.null" ) if(any(relrisk.stats %nin% relrisk.stats.valid)) { stop("Invalid relrisk stats: ", paste(relrisk.stats[relrisk.stats %nin% relrisk.stats.valid],collapse=","), "\n") } ## let CI.RR decode to CI.lower.RR and CI.upper.RR if(any(relrisk.stats == "CI.RR")) { relrisk.stats <- unique(c(relrisk.stats[relrisk.stats != "CI.RR"], "CI.lower.RR", "CI.upper.RR")) } if(any(relrisk.stats == "CI.estimate")) { relrisk.stats <- unique(c(relrisk.stats[relrisk.stats != "CI.estimate"], "CI.lower.estimate", "CI.upper.estimate")) } ########################## ## Survival stats: ########################## ##surv.stats=c(Nmiss,HR,CI.HR,p.value,concorance) ##Other possible coefficient table columns: CI.estimate,N,Nmiss2,estimate,se,endpoint,Nevents,z.stat ##Other possible model fits: r.squared, logLik, AIC, BIC surv.stats.valid <- c( "HR", "CI.lower.HR", "CI.upper.HR", "p.value", "concordance", "Nmiss", # default "CI.HR", "CI.estimate", "CI.lower.estimate", "CI.upper.estimate", "N", "Nmiss2", "estimate", "std.error", "endpoint", "Nevents", "statistic", "r.squared", "r.squared.max", "logLik", "AIC", "BIC", "statistic.log", "p.value.log", "statistic.sc", "p.value.sc", "statistic.wald", "p.value.wald", "N", "std.error.concordance", "p.value.lrt" ) if(any(survival.stats %nin% surv.stats.valid)) { stop("Invalid survival stats: ", paste(survival.stats[survival.stats %nin% surv.stats.valid], collapse=","), "\n") } ## let CI.HR decode to CI.lower.HR and CI.upper.HR if(any(survival.stats == "CI.HR")) { survival.stats <- unique(c(survival.stats[survival.stats != "CI.HR"], "CI.lower.HR", "CI.upper.HR")) } if(any(survival.stats == "CI.estimate")) { survival.stats <- unique(c(survival.stats[survival.stats != "CI.estimate"], "CI.lower.estimate", "CI.upper.estimate")) } list(digits=digits, digits.ratio=digits.ratio, digits.p = digits.p, format.p = format.p, show.adjust=show.adjust, show.intercept=show.intercept, conf.level=conf.level, ordinal.stats=ordinal.stats, binomial.stats=binomial.stats, gaussian.stats=gaussian.stats, poisson.stats=poisson.stats, negbin.stats = negbin.stats, clog.stats=clog.stats, relrisk.stats=relrisk.stats, survival.stats=survival.stats, stat.labels = stat.labels) } arsenal/R/comparedf.internal.R0000644000176200001440000002553313715042102016001 0ustar liggesuserstweakcolnames <- function(by.x, by.y, cn.x, cn.y, control) { if(any(by.x %in% setdiff(cn.y, by.y))) { stop("A by-variable for x appears in non-by-variables for y. base::merge() will not work in this function.") } if(anyDuplicated(cn.x) || anyDuplicated(cn.y)) stop("Sorry, there are duplicate colnames.") cn.y[match(by.y, cn.y)] <- by <- by.x rm(by.x, by.y) # just to make sure we don't use it again rn <- "..row.names.." %in% by # don't replace the characters for "..row.names.." if(rn) { if(cn.x[length(cn.x)] != "..row.names.." || cn.y[length(cn.y)] != "..row.names..") { stop("Something went wrong with the row.names.") } cn.x <- cn.x[- length(cn.x)] cn.y <- cn.y[- length(cn.y)] } tv <- control$tol.vars if(!is.null(names(tv))) { for(i in seq_along(tv)) { if(bad1 <- names(tv)[i] %nin% cn.x) warning("Variable tolerance '", names(tv)[i], "' not found in colnames of x") if(bad2 <- tv[i] %nin% cn.y) warning("Variable tolerance '", tv[i], "' not found in colnames of y") if(!bad1 && !bad2) cn.y[cn.y == tv[i]] <- names(tv)[i] } # no need to do anything with the by-variables, since we've already set those to x } else if("none" %nin% tv) { for(elt in strsplit(tv, "", fixed = TRUE)) { if(length(elt) == 1) { cn.x <- gsub(elt, "", cn.x, fixed = TRUE) cn.y <- gsub(elt, "", cn.y, fixed = TRUE) if(!rn) by <- gsub(elt, "", by, fixed = TRUE) } else { for(elt2 in elt[-1]) { # I know I could do some fancy regex work here, but for readibility, the for-loop is easiest. cn.x <- gsub(elt2, elt[1], cn.x, fixed = TRUE) cn.y <- gsub(elt2, elt[1], cn.y, fixed = TRUE) if(!rn) by <- gsub(elt2, elt[1], by, fixed = TRUE) } } } if(anyDuplicated(cn.x) || anyDuplicated(cn.y)) stop("'tol.vars' resulted in duplicate colnames.") } if(rn) { cn.x <- c(cn.x, "..row.names..") cn.y <- c(cn.y, "..row.names..") } return(list(by = by, cn.x = cn.x, cn.y = cn.y)) } cleanup.null.na <- function(x) if(is.null(x) || allNA(x)) NA_character_ else x compare_values <- function(i, v, df, byvars, contr) { if(is.na(v$var.x[i]) || is.na(v$var.y[i])) return(NULL) if(v$tmp[i] %in% byvars) return("by-variable") var <- v$tmp[i] var.x <- df[[paste0(var, ".x")]] var.y <- df[[paste0(var, ".y")]] int.num <- function(vr) is.integer(vr) || is.numeric(vr) fac.chr <- function(vr) is.factor(vr) || is.character(vr) if(length(intersect(v$class.x[[i]], v$class.y[[i]])) == 0 && !(contr$int.as.num && int.num(var.x) && int.num(var.y)) && !(contr$factor.as.char && fac.chr(var.x) && fac.chr(var.y))) return("Not compared") ## Technically, we can compare these as-is, but let's be explicit about it. if(!identical(v$class.x[i], v$class.y[i]) && contr$int.as.num && int.num(var.x) && int.num(var.y)) { var.x <- as.numeric(var.x) var.y <- as.numeric(var.y) } if(!identical(v$class.x[i], v$class.y[i]) && contr$factor.as.char && fac.chr(var.x) && fac.chr(var.y)) { var.x <- as.character(var.x) var.y <- as.character(var.y) } find_tol <- function(whch) { opts <- contr[[whch]] if(v$var.x[i] %in% names(opts)) return(opts[[v$var.x[i]]]) if(is.null(names(opts))) return(opts[[1]]) if("" %nin% names(opts)) stop("No default tolerance specified for ", whch) opts[[which(names(opts) == "")]] } if(is.logical(var.x) && is.logical(var.y)) { idx <- find_tol("tol.logical")(var.x, var.y) } else if(is.numeric(var.x) && is.numeric(var.y)) # this covers integers, too { idx <- find_tol("tol.num")(var.x, var.y, contr$tol.num.val) } else if(is.factor(var.x) && is.factor(var.y)) { idx <- find_tol("tol.factor")(var.x, var.y) } else if(is.character(var.x) && is.character(var.y)) { idx <- find_tol("tol.char")(var.x, var.y) } else if(is.Date(var.x) && is.Date(var.y)) { idx <- find_tol("tol.date")(var.x, var.y, contr$tol.date.val) } else { idx <- find_tol("tol.other")(var.x, var.y) } out <- data.frame(values.x = I(var.x[idx]), # just in case list-column values.y = I(var.y[idx]), row.x = df[["..row.x.."]][idx], row.y = df[["..row.y.."]][idx]) return(cbind(df[idx, byvars, drop = FALSE], out)) } compare_attrs <- function(i, v, x_, y_) { if(is.na(v$var.x[i]) || is.na(v$var.y[i])) return(NULL) attr.x <- attributes(x_[[v$var.x[i]]]) attr.y <- attributes(y_[[v$var.y[i]]]) if(is.null(attr.x) && is.null(attr.y)) return(NULL) empty <- data.frame(name = character(0), attr = I(list()), stringsAsFactors = FALSE) out <- merge(if(!is.null(attr.x)) data.frame(name = names(attr.x), attr = I(attr.x), stringsAsFactors = FALSE) else empty, if(!is.null(attr.y)) data.frame(name = names(attr.y), attr = I(attr.y), stringsAsFactors = FALSE) else empty, by = "name", all = TRUE) out$attr.x <- lapply(out$attr.x, cleanup.null.na) out$attr.y <- lapply(out$attr.y, cleanup.null.na) if(nrow(out) > 0) { out <- out[order(out$name), , drop = FALSE] out <- out[vapply(seq_len(nrow(out)), function(i) !identical(out$attr.x[[i]], out$attr.y[[i]]), logical(1)), , drop = FALSE] } out } #################################################################################################### #################################################################################################### #################################################################################################### idx_var_sum <- function(object, which = c("vars.not.shared", "nonby.vars.shared", "vars.compared", "vars.not.compared", "differences.found", "non.identical.attributes", "by.variables")) { which <- match.arg(which, several.ok = FALSE) if(which == "vars.not.shared") { vapply(object$vars.summary$values, is.null, logical(1)) } else if(which == "nonby.vars.shared") { vapply(object$vars.summary$values, function(elt) !is.null(elt) && !identical(elt, "by-variable"), logical(1)) } else if(which == "vars.not.compared") { vapply(object$vars.summary$values, identical, logical(1), y = "Not compared") } else if(which == "by.variables") { vapply(object$vars.summary$values, identical, logical(1), y = "by-variable") } else if(which == "vars.compared") { vapply(object$vars.summary$values, function(elt) is.data.frame(elt), logical(1)) } else if(which == "differences.found") { vapply(object$vars.summary$values, function(elt) is.data.frame(elt) && nrow(elt) > 0, logical(1)) } else if(which == "non.identical.attributes") { vapply(object$vars.summary$attrs, function(elt) is.data.frame(elt) && nrow(elt) > 0, logical(1)) } } #' Extract differences #' #' Extract differences (\code{diffs()}), number of differences (\code{n.diffs()}), #' or number of not-shared observations (\code{n.diff.obs()}) from a \code{comparedf} object. #' #' @param object An object of class \code{comparedf} or \code{summary.comparedf}. #' @param what Should differences or the not-shared observations be returned? #' @param vars A character vector of variable names to subset the results to. #' @param ... Other arguments (not in use at this time). #' @param by.var Logical: should the number of differences by variable be reported, or should #' all differences be reported (the default). #' @author Ethan Heinzen #' @seealso \code{\link{comparedf}} \code{\link{summary.comparedf}} #' @name diffs NULL #> NULL #' @rdname diffs #' @export n.diff.obs <- function(object, ...) { UseMethod("n.diff.obs") } #' @rdname diffs #' @export n.diff.obs.comparedf <- function(object, ...) { nrow(object$frame.summary$unique[[1]]) + nrow(object$frame.summary$unique[[2]]) } #' @rdname diffs #' @export n.diff.obs.summary.comparedf <- function(object, ...) { nrow(object$obs.table) } #' @rdname diffs #' @export n.diffs <- function(object, ...) { UseMethod("n.diffs") } #' @rdname diffs #' @export n.diffs.comparedf <- function(object, ...) { sum(vapply(object$vars.summary$values, function(elt) if(is.data.frame(elt)) nrow(elt) else 0, numeric(1))) } #' @rdname diffs #' @export n.diffs.summary.comparedf <- function(object, ...) { nrow(object$diffs.table) } #' @rdname diffs #' @export diffs <- function(object, ...) { UseMethod("diffs") } #' @rdname diffs #' @export diffs.comparedf <- function(object, what = c("differences", "observations"), vars = NULL, ..., by.var = FALSE) { what <- match.arg(what) if(what == "observations") { get.obs.not.shared <- function(n) { obs.ns <- object$frame.summary$unique[[n]] cbind(version = rep(object$frame.summary$version[[n]], times = nrow(obs.ns)), obs.ns, stringsAsFactors = FALSE) } return(rbind(get.obs.not.shared(1), get.obs.not.shared(2))) } if(!is.logical(by.var) || length(by.var) != 1) stop("'by.var' must be a single logical value.") diffs <- as.data.frame(object$vars.summary[idx_var_sum(object, "vars.compared"), c("var.x", "var.y", "values")]) diffs$n <- vapply(diffs$values, nrow, numeric(1)) sumNA <- function(df) sum(is.na(df$values.x) | is.na(df$values.y)) diffs$NAs <- vapply(diffs$values, sumNA, numeric(1)) if(is.null(vars)) vars <- unique(c(diffs$var.x, diffs$var.y)) else if(!is.character(vars)) stop("'vars' should be NULL or a character vector.") rownames(diffs) <- NULL if(by.var) return(diffs[diffs$var.x %in% vars | diffs$var.y %in% vars, c("var.x", "var.y", "n", "NAs"), drop = FALSE]) tolist <- function(df) { df$values.x <- I(as.list(df$values.x)) # need the I() for factors and dates to show up right df$values.y <- I(as.list(df$values.y)) df } diffs1 <- diffs[diffs$n > 0, , drop = FALSE] if(nrow(diffs1) > 0) { diffs.table <- do.call(rbind, lapply(Map(cbind, var.x = diffs1$var.x, var.y = diffs1$var.y, diffs1$values, MoreArgs = list(stringsAsFactors = FALSE)), tolist)) } else { diffs.table <- data.frame(var.x = character(0), var.y = character(0), stringsAsFactors = FALSE) diffs.table[object$frame.summary$by[[1]]] <- rep(list(character(0)), length(object$frame.summary$by[[1]])) diffs.table$values.y <- diffs.table$values.x <- I(list()) diffs.table$row.y <- diffs.table$row.x <- integer(0) } rownames(diffs.table) <- NULL diffs.table[diffs.table$var.x %in% vars | diffs.table$var.y %in% vars, , drop = FALSE] } #' @rdname diffs #' @export diffs.summary.comparedf <- function(object, what = c("differences", "observations"), vars = NULL, ..., by.var = FALSE) { what <- match.arg(what) if(what == "observations") return(object$obs.table) tmp <- if(by.var) object$diffs.byvar.table else object$diffs.table if(is.null(vars)) vars <- unique(tmp$var.x, tmp$var.y) else if(!is.character(vars)) stop("'vars' should be NULL or a character vector.") tmp[tmp$var.x %in% vars | tmp$var.y %in% vars, , drop = FALSE] } arsenal/R/summary.comparedf.R0000644000176200001440000001570313715041652015671 0ustar liggesusers #' The summary method for a \code{comparedf} object #' #' Print a more detailed output of the \code{\link{comparedf}} object. #' #' @param object An object of class \code{"comparedf"}, as made by the \code{\link{comparedf}} S3 method. #' @param ... Other arguments passed to \code{\link{comparedf.control}}. In \code{print}, these are passed to \code{\link[knitr]{kable}}. #' @param show.attrs Logical, denoting whether to show the actual attributes which are different. For (e.g.) factors with lots #' of levels, this can make the tables quite wide, so this feature is \code{FALSE} by default. #' @param x An object returned by the \code{summary.comparedf} function. #' @param format Passed to \code{\link[knitr]{kable}}: the format for the table. The default here is "pandoc". #' To use the default in \code{kable}, pass \code{NULL}. #' @return An object of class \code{"summary.comparedf"} is returned. #' @seealso \code{\link{comparedf}}, \code{\link{comparedf.control}} #' @name summary.comparedf NULL #> NULL #' @rdname summary.comparedf #' @export summary.comparedf <- function(object, ..., show.attrs = FALSE) { control <- c(list(...), object$control) control <- do.call("comparedf.control", control[!duplicated(names(control))]) #### start with summaries of the data.frames #### frame.summary <- as.data.frame(object$frame.summary[c("version", "arg", "ncol", "nrow")]) #### after we've done all that, summaries of the overall comparison #### diffs.byvar <- diffs(object, by.var = TRUE) diffs.tab <- diffs(object) nobs.shared <- object$frame.summary$n.shared[1] nobs.uneq <- length(unique(diffs.tab$row.x)) comparison.summary <- data.frame( statistic = c( "Number of by-variables", "Number of non-by variables in common", "Number of variables compared", "Number of variables in x but not y", "Number of variables in y but not x", "Number of variables compared with some values unequal", "Number of variables compared with all values equal", "Number of observations in common", "Number of observations in x but not y", "Number of observations in y but not x", "Number of observations with some compared variables unequal", "Number of observations with all compared variables equal", "Number of values unequal" ), value = c( (!attr(object$frame.summary$by, "byrow"))*sum(idx_var_sum(object, "by.variables")), sum(idx_var_sum(object, "nonby.vars.shared")), sum(idx_var_sum(object, "vars.compared")), sum(is.na(object$vars.summary$var.y)), sum(is.na(object$vars.summary$var.x)), sum(diffs.byvar$n > 0), sum(diffs.byvar$n == 0), nobs.shared, nrow(object$frame.summary$unique[[1]]), nrow(object$frame.summary$unique[[2]]), nobs.uneq, nobs.shared - nobs.uneq, n.diffs(object) ), stringsAsFactors = FALSE ) #### start with differences in variables first #### get.vars.not.shared <- function(a, b) { var.diff.a <- object$vars.summary[is.na(object$vars.summary[[paste0("var.", b)]]), paste0(c("var.", "pos.", "class."), a)] var.diff.a2 <- cbind(version = rep(a, times = nrow(var.diff.a)), var.diff.a, stringsAsFactors = FALSE) colnames(var.diff.a2) <- c("version", "variable", "position", "class") var.diff.a2 } vars.ns <- rbind(get.vars.not.shared("x", "y"), get.vars.not.shared("y", "x")) #### report variables not compared #### vars.nc <- as.data.frame(object$vars.summary[idx_var_sum(object, "vars.not.compared"), c("var.x", "pos.x", "class.x", "var.y", "pos.y", "class.y"), drop = FALSE]) #### Now for the observations which aren't shared #### obs.ns <- diffs(object, what = "observations") #### Now for the actual differences #### # done in the structure statement #### Now for attributes #### attrs.tmp <- as.data.frame(object$vars.summary[idx_var_sum(object, "non.identical.attributes"), c("var.x", "var.y", "attrs"), drop = FALSE]) attrs.diffs <- do.call(rbind, Map(cbind, var.x = attrs.tmp$var.x, var.y = attrs.tmp$var.y, attrs.tmp$attrs, MoreArgs = list(stringsAsFactors = FALSE))) if(is.null(attrs.diffs)) { attrs.diffs <- data.frame(var.x = character(0), var.y = character(0), name = character(0), stringsAsFactors = FALSE) } else if(!show.attrs) attrs.diffs <- attrs.diffs[c("var.x", "var.y", "name")] structure(list( frame.summary.table = frame.summary, comparison.summary.table = comparison.summary, vars.ns.table = vars.ns, vars.nc.table = vars.nc, obs.table = obs.ns, diffs.byvar.table = diffs.byvar, diffs.table = diffs.tab, attrs.table = attrs.diffs, control = control ), class = "summary.comparedf") } #' @rdname summary.comparedf #' @export print.summary.comparedf <- function(x, ..., format = "pandoc") { orig <- x sumdiffs <- sum(x$diffs.byvar.table$n) ctrl <- x$control ctrl$max.print.vars.ns <- ctrl$max.print.vars ctrl$max.print.vars.nc <- ctrl$max.print.vars if(is.null(ctrl$max.print.diffs.per.var) || is.na(ctrl$max.print.diffs.per.var)) ctrl$max.print.diffs.per.var <- sumdiffs if(nrow(x$diffs.table) > 0) { x$diffs.table <- do.call(rbind, by(x$diffs.table, factor(x$diffs.table$var.x, levels = unique(x$diffs.table$var.x)), utils::head, ctrl$max.print.diffs.per.var)) # Need this for knitr to output list-cols of factors and dates correctly as_char <- function(x) if(is.factor(x) || is.Date(x)) x <- as.character(x) else x x$diffs.table$values.x <- lapply(x$diffs.table$values.x, as_char) x$diffs.table$values.y <- lapply(x$diffs.table$values.y, as_char) } for(v in c("frame.summary", "comparison.summary", "vars.ns", "vars.nc", "obs", "diffs.byvar", "diffs", "attrs")) { obj <- x[[paste0(v, ".table")]] nprint <- ctrl[[paste0("max.print.", v)]] # there is purposefully no max.print.frame.summary or max.print.comparison.summary if(is.null(nprint) || is.na(nprint)) nprint <- nrow(obj) caption <- switch( v, frame.summary = "Summary of data.frames", comparison.summary = "Summary of overall comparison", vars.ns = "Variables not shared", vars.nc = "Other variables not compared", obs = "Observations not shared", diffs.byvar = "Differences detected by variable", diffs = "Differences detected", attrs = "Non-identical attributes" ) if(nrow(obj) > 0) { if(v == "diffs" && sumdiffs > min(nprint, nrow(obj))) { caption <- paste0(caption, " (", sumdiffs - min(nprint, nrow(obj)), " not shown)") } else if(nrow(obj) > nprint) { caption <- paste0(caption, " (", nrow(obj) - nprint, " not shown)") } print(knitr::kable(utils::head(obj, nprint), format = format, caption = caption, row.names = FALSE, ...)) } else { nocaption <- paste0("No ", tolower(caption)) print(knitr::kable(data.frame(x = nocaption), format = format, caption = caption, row.names = FALSE, col.names = "", ...)) } cat("\n") } invisible(orig) } arsenal/R/formulize.R0000644000176200001440000000633513674210423014250 0ustar liggesusers#' formulize #' #' A shortcut to generate one-, two-, or many-sided formulas from vectors of variable names. #' #' @param y,x,... Character vectors, names, or calls to be collapsed (by \code{"+"}) and put left-to-right in the formula. #' If \code{data} is supplied, these can also be numeric, denoting which column name to use. See examples. #' @param data An R object with non-null column names. #' @param collapse How should terms be collapsed? Default is addition. #' @param collapse.y How should the y-terms be collapsed? Default is addition. Also accepts the special string "list", #' which combines them into a multiple-left-hand-side formula, for use in other functions. #' @param escape A logical indicating whether character vectors should be coerced to names (that is, whether names with spaces should #' be surrounded with backticks or not) #' @seealso \code{\link[stats:delete.response]{reformulate}} #' @author Ethan Heinzen #' @examples #' ## two-sided formula #' f1 <- formulize("y", c("x1", "x2", "x3")) #' #' ## one-sided formula #' f2 <- formulize(x = c("x1", "x2", "x3")) #' #' ## multi-sided formula #' f3 <- formulize("y", c("x1", "x2", "x3"), c("z1", "z2"), "w1") #' #' ## can use numerics for column names #' data(mockstudy) #' f4 <- formulize(y = 1, x = 2:4, data = mockstudy) #' #' ## mix and match #' f5 <- formulize(1, c("x1", "x2", "x3"), data = mockstudy) #' #' ## get an interaction #' f6 <- formulize("y", c("x1*x2", "x3")) #' #' ## get only interactions #' f7 <- formulize("y", c("x1", "x2", "x3"), collapse = "*") #' #' ## no intercept #' f8 <- formulize("y", "x1 - 1") #' f9 <- formulize("y", c("x1", "x2", "-1")) #' #' ## LHS as a list to use in arsenal functions #' f10 <- formulize(c("y1", "y2", "y3"), c("x", "z"), collapse.y = "list") #' #' ## use in an lm #' f11 <- formulize(2, 3:4, data = mockstudy) #' summary(lm(f11, data = mockstudy)) #' #' ## using non-syntactic names or calls (like reformulate example) #' f12 <- formulize(as.name("+-"), c("`P/E`", "`% Growth`")) #' f12 <- formulize("+-", c("P/E", "% Growth"), escape = TRUE) #' #' f <- Surv(ft, case) ~ a + b #' f13 <- formulize(f[[2]], f[[3]]) #' #' @export formulize <- function(y = "", x = "", ..., data = NULL, collapse = "+", collapse.y = collapse, escape = FALSE) { dots <- list(y = y, x = x, ...) if(!is.null(data)) { if(is.null(colnames(data))) stop("colnames(data) is NULL") dots <- lapply(dots, function(elt, cn) if(is.numeric(elt)) lapply(cn[elt], as.name) else elt, cn = colnames(data)) } name.or.call <- function(elt) is.name(elt) || is.call(elt) dots <- lapply(dots, function(elt) if(name.or.call(elt)) list(elt) else if(is.character(elt) && all(nzchar(elt)) && escape) lapply(elt, as.name) else elt) is.ok <- function(x) is.character(x) || (is.list(x) && all(vapply(x, name.or.call, NA))) trash <- lapply(dots, function(elt) if(!is.ok(elt)) stop("One or more argument isn't a character vector, numeric vector, list of names, or list of calls")) dots[[1]] <- if(collapse.y == "list") { paste0("list(", paste0(dots[[1]], collapse = ", "), ")") } else paste0(dots[[1]], collapse = collapse.y) elts <- vapply(dots, paste0, character(1), collapse = collapse) stats::as.formula(paste0(elts, collapse = " ~ "), env = parent.frame()) } arsenal/R/mockstudy.R0000644000176200001440000000431113632700352014245 0ustar liggesusers#' Mock study data for examples #' #' Mock clinical study data for examples to test data manipulation and statistical functions. #' The function \code{muck_up_mockstudy()} is used in examples for \code{\link{comparedf}}. #' #' @format A data frame with 1499 observations on the following 15 variables: #' \describe{ #' \item{\code{case}}{a numeric identifier-patient ID} #' \item{\code{age}}{age in years} #' \item{\code{arm}}{treatment arm divided into 3 groups, character string } #' \item{\code{sex}}{a factor with levels \code{Male} \code{Female}} #' \item{\code{race}}{self-reported race/ethnicity, character string} #' \item{\code{fu.time}}{survival or censoring time in years} #' \item{\code{fu.stat}}{censoring status; 1=censor, 2=death} #' \item{\code{ps}}{integer, ECOG performance score } #' \item{\code{hgb}}{numeric, hemoglobin count} #' \item{\code{bmi}}{numeric, body mass index, kg/m^2} #' \item{\code{alk.phos}}{numeric, alkaline phosphatase} #' \item{\code{ast}}{numeric, aspartate transaminase } #' \item{\code{mdquality.s}}{integer, LASA QOL 0=Clinically Deficient, 1=Not Clinically Deficient } #' \item{\code{age.ord}}{an ordered factor split of age, with levels #' \code{10-19} < \code{20-29} < \code{30-39} < \code{40-49} < #' \code{50-59} < \code{60-69} < \code{70-79} < \code{80-89}} #' } #' @examples #' data(mockstudy) #' str(mockstudy) #' @name mockstudy NULL #> NULL #' @rdname mockstudy "mockstudy" #' @rdname mockstudy #' @export muck_up_mockstudy <- function() { mockstudy <- arsenal::mockstudy mockstudy2 <- mockstudy[c(1, 3, 6, 10:nrow(mockstudy), 2, 4), ] mockstudy2$sex <- factor(mockstudy2$sex, levels = c("Female", "Male")) mockstudy2$ps[3] <- NA mockstudy2$ast[1:3] <- 36 mockstudy2$age <- NULL mockstudy2$fu_time <- mockstudy2$fu.time mockstudy2$fu.time <- NULL mockstudy2$`fu stat` <- mockstudy2$fu.stat mockstudy2$fu.stat <- NULL mockstudy2$Arm <- mockstudy2$arm mockstudy2$arm <- NULL mockstudy2$hgb[is.na(mockstudy2$hgb)] <- -9 mockstudy2$race[mockstudy2$race == "Caucasian"] <- "caucasian" mockstudy2$race <- factor(mockstudy2$race) attr(mockstudy2$sex, "label") <- "Sex (M/F)" mockstudy2 } arsenal/R/write2.default.R0000644000176200001440000000530013755037457015100 0ustar liggesusers#' @rdname write2 #' @export write2.default <- function(object, file, FUN = NULL, ..., append. = FALSE, render. = TRUE, keep.rmd = !render., output_format = NULL) { check_pkg("rmarkdown") if(!is.character(file) || length(file) > 1) stop("'file' argument must be a single character string.") if(!is.logical(append.) || length(append.) > 1) stop("'append.' argument must be a single logical value.") if(!is.logical(render.) || length(render.) > 1) stop("'render.' argument must be a single logical value.") if(!is.logical(keep.rmd) || length(keep.rmd) > 1) stop("'keep.rmd' argument must be a single logical value.") if(is.null(FUN)) { object <- verbatim(object) FUN <- print } FUN <- match.fun(FUN) if(is.character(output_format) && length(output_format) > 1) { warning("At this point, write2() only supports one output type.") output_format <- output_format[1] } output_format <- if(is.null(output_format) || identical(output_format, "html")) { rmarkdown::html_document } else if(identical(output_format, "pdf")) { rmarkdown::pdf_document } else if(identical(output_format, "word")) { rmarkdown::word_document } else output_format filename <- paste0(file, ".Rmd") if(!append. || !file.exists(filename)) file.create(filename) # this will create a blank document when needed but allows the append. = TRUE case to work, too dots <- list(...) if(names(formals(FUN))[1] == "...") # this is when the FUN is, e.g., cat(). Any named arguments would still get cat'd, which we don't want { ARGS <- c(list(object), dots[names(dots) %in% names(formals(FUN))]) utils::capture.output(do.call(FUN, ARGS), file = filename, append = append.) } else { utils::capture.output(FUN(object, ...), file = filename, append = append.) } if(render.) { render.args <- dots[names(dots) %in% names(formals(rmarkdown::render))] render.args$input <- filename render.args$output_file <- file # if output_format is a function, evaluate it with the ... arguments # otherwise, just return the character string or list which rmarkdown::render() will handle render.args$output_format <- if(is.function(output_format)) { if("..." %in% names(formals(output_format))) { do.call(output_format, dots) } else do.call(output_format, dots[names(dots) %in% names(formals(output_format))]) } else output_format do.call(rmarkdown::render, render.args) } # This short-circuits if they want to keep the intermediate files. Otherwise, file.remove() returns a logical about successful file removal if(!keep.rmd && !file.remove(filename)) warning("Something went wrong removing the temporary .Rmd file.") invisible(object) } arsenal/R/modelsum.R0000644000176200001440000002467313714756213014075 0ustar liggesusers #' Fit models over each of a set of independent variables with a response variable #' #' Fit and summarize models for each independent (x) variable with a response variable (y), with options to adjust by variables for each model. #' #' @param formula an object of class \code{\link{formula}}; a symbolic description of the variables to be modeled. See "Details" for more information. #' @param adjust an object of class \code{\link{formula}} or a list of formulas, listing variables to adjust by in all models. #' Specify as a one-sided formula, like: \code{~Age+ Sex}. If a list, the names are used for the summary function. Unadjusted models #' can be specified as \code{~ 1} or as a list: \code{list(Unadjusted = NULL)}. #' @param family similar mechanism to \code{\link[stats]{glm}}, where the model to be fit is driven by the family. #' Options include: binomial, gaussian, survival, poisson, negbin, clog, and ordinal. These can be passed as a string, as a function, #' or as a list resulting from a call to one of the functions. See \code{\link{modelsum.family}} for details on #' survival, ordinal, negbin, and clog families. #' @param data an optional data.frame, list or environment (or object coercible by \code{\link[base]{as.data.frame}} to a data frame) containing the #' variables in the model. If not found in \code{data}, the variables are taken from \code{environment(formula)}, typically #' the environment from which \code{modelsum} is called. #' @param subset an optional vector specifying a subset of observations (rows of \code{data}) to be used in the results. #' If \code{strata} is missing, this works as vector of logicals or an index; otherwise, it should be a logical vector. #' @param weights an optional vector specifying the weights to apply to each data observation (rows of \code{data}) #' @param strata a vector of strata to separate model summaries by an additional group. Note that for families like "clog", #' the "usual" strata term to indicate subject groupings should be given in the \code{adjust} argument. #' @param na.action a function which indicates what should happen when the data contain \code{NA}s. #' The default (\code{NULL}) is to use the defaults of \code{\link[stats]{lm}}, \code{\link[stats]{glm}}, or \code{\link[survival]{coxph}}, #' depending on the \code{family} specifications. #' @param id A vector to identify clusters. Only used for \code{\link{relrisk}} at this time. #' @param control control parameters to handle optional settings within \code{modelsum}. Arguments for \code{modelsum.control} #' can be passed to \code{modelsum} via the \code{...} argument, but if a control object and \code{...} arguments are both supplied, #' the latter are used. See \code{\link{modelsum.control}} for other details. #' @param ... additional arguments to be passed to internal \code{modelsum} functions. #' @return An object with class \code{c("modelsum", "arsenal_table")} #' @author Jason Sinnwell, Patrick Votruba, Beth Atkinson, Gregory Dougherty, and Ethan Heinzen, adapted from SAS Macro of the same name #' @seealso \code{\link{arsenal_table}}, \code{\link{modelsum.control}}, \code{\link{summary.modelsum}}, #' \code{\link{modelsum.internal}}, \code{\link{formulize}} #' @examples #' #' data(mockstudy) #' #' tab1 <- modelsum(bmi ~ sex + age, data = mockstudy) #' summary(tab1, text = TRUE) #' #' tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust = ~ age + sex, #' family = "gaussian", data = mockstudy) #' summary(tab2, text = TRUE) #' #' summary(tab2, show.intercept = FALSE, text = TRUE) #' #' tab2.df <- as.data.frame(tab2) #' #' tab2.df[1:5,] #' @name modelsum NULL #> NULL #' @rdname modelsum #' @export modelsum <- function(formula, family="gaussian", data, adjust=NULL, na.action = NULL, subset=NULL, weights=NULL, id, strata, control = NULL, ...) { Call <- match.call() ## Allow family parameter to passed with or without quotes ## Here, we force quotes to simplify in for loop below family.list <- if(is.function(family) || is.character(family)) match.fun(family)() else family if(family.list$family %nin% c("survival", "gaussian", "binomial", "poisson", "quasibinomial", "quasipoisson", "ordinal", "negbin", "clog", "relrisk")) stop("Family ", family.list$family, " not supported.\n") if(family.list$family != "survival" && any(grepl("Surv\\(", formula))) { warning("Found Surv in formula, assuming family='survival'\n") family.list <- survival() } ## pick up extra control arguments from command via ... control <- c(list(...), control) control <- do.call("modelsum.control", control[!duplicated(names(control))]) ## Tell user if they passed an argument that was not expected, either here or in control expectArgs <- c("formula", "family", "adjust", "data", "na.action", "subset", "weights", "id", "strata", "control", names(control)) match.idx <- match(names(Call)[-1], expectArgs) if(anyNA(match.idx)) warning("Unused arguments: ", paste(names(Call)[c(FALSE, is.na(match.idx))], collapse=", "), "\n") #### Set up "main effects" dataset #### indx.main <- match(c("formula", "data", "subset", "weights", "strata"), names(Call), 0L) if(indx.main[1] == 0) stop("A formula argument is required") if(length(formula) == 2) stop("'formula' should have a response variable!") main.call <- Call[c(1, indx.main)] main.call[[1]] <- quote(stats::model.frame) main.call$na.action <- quote(stats::na.pass) # for now, keep all rows, except for what is subset out if(!missing(data)) { # instead of call("keep.labels", ...), which breaks when arsenal isn't loaded (Can't find "keep.labels") main.call$data <- as.call(list(keep.labels, main.call$data)) } #### Set up "adjustment" dataset #### if(is.null(adjust)) { adjust <- list(unadjusted = NULL) adjustdf <- NULL adjTerms <- NULL adjLabels <- NULL } else { adjust <- as_list_formula(adjust) if(any(lengths(adjust) != 2)) stop("'adjust' formula(s) shouldn't have a response variable!") if(is.null(names(adjust))) { names(adjust) <- paste0("adjusted", seq_along(adjust)) } else if(anyDuplicated(names(adjust))) stop("Names of 'adjust' must be unique.") adj.call <- main.call adj.call$formula <- Reduce(join_formula, adjust) adj.call$weights <- NULL adj.call$strata <- NULL adjustdf <- eval(adj.call, parent.frame()) Terms.a <- stats::terms(adjustdf) adjTerms <- make_ms_term_labels(adjustdf, Terms.a) adjLabels <- lapply(adjTerms, make_ms_labs) } is.numericish <- function(x) is.numeric(x) || is.Date(x) out.tables <- list() formula.list <- as_list_formula(formula) for(FORM in formula.list) { main.call$formula <- FORM maindf <- loosen.labels(eval(main.call, parent.frame())) if(nrow(maindf) == 0) stop("No (non-missing) observations") Terms <- stats::terms(maindf) #### Check for weights #### if(hasWeights <- "(weights)" %in% colnames(maindf)) maindf[["(weights)"]] <- NULL #### Check for strata #### if(hasStrata <- "(strata)" %in% colnames(maindf)) { strata.col <- maindf[["(strata)"]] strataTerm <- deparse(Call$strata) if(is.null(strataLabel <- attr(strata.col, "label"))) strataLabel <- strataTerm if(is.factor(strata.col)) { strata.col <- droplevels(strata.col) strata.levels <- levels(strata.col) } else strata.levels <- sort(unique(strata.col)) maindf[["(strata)"]] <- NULL } else { strata.col <- rep("", nrow(maindf)) strataTerm <- strataLabel <- strata.levels <- "" } #### Get info on y-variable #### yCol <- maindf[[1]] if(family.list$family == "gaussian" && length(unique(yCol)) <= 5) warning("Input family=gaussian, but dependent variable has 5 or fewer categories\n") yTerm <- colnames(maindf)[1] if(is.null(yLabel <- attr(yCol, "label"))) yLabel <- yTerm yList <- list(label = yLabel, term = yTerm) maindf[[1]] <- NULL Terms.x <- stats::delete.response(Terms) #### Now finish the x-variables #### effCols <- seq_len(ncol(attr(Terms, "factors"))) xTerms <- make_ms_term_labels(maindf, Terms.x) strataList <- vector("list", length(strata.levels)) if(hasStrata) names(strataList) <- paste0("(", strataTerm, ") == ", strata.levels) for(strat in strata.levels) { xList <- vector("list", length(effCols)) names(xList) <- names(xTerms) idx <- if(!hasStrata) NULL else call("==", call("(", Call$strata), strat) for(eff in effCols) { xList[[eff]] <- vector("list", length(adjust)) names(xList[[eff]]) <- names(adjust) for(adj.i in seq_along(adjust)) { curr.formula <- stats::drop.terms(Terms, if(length(effCols) > 1) setdiff(effCols, eff) else NULL, keep.response = TRUE) adj.formula <- join_formula(curr.formula, adjust[[adj.i]]) temp.call <- Call[c(1, match(c("data", "subset", "na.action", "weights", "id"), names(Call), 0L))] temp.call$formula <- adj.formula if(hasStrata) { temp.call$subset <- if(!is.null(temp.call$subset)) call("&", call("(", temp.call$subset), idx) else idx } currCols <- maindf[strata.col == strat, attr(Terms.x, "factors")[, eff] > 0, drop=FALSE] results <- modelsum_guts(family.list, temp.call, envir = parent.frame(), conf.level = control$conf.level, scope = stats::delete.response(curr.formula), anyna = anyNA(currCols)) nmiss <- length(results$fit$na.action) xList[[eff]][[adj.i]] <- list( coeff=results$coeffTidy, glance = c( results$modelGlance, N = sum(strata.col == strat) - nmiss, Nmiss = nmiss, Nmiss2 = nmiss, endpoint=yTerm, endlabel=yLabel, x=xTerms[[eff]]$variable, contrasts=list(results$fit$contrasts) ) ) } } strataList[[if(!hasStrata) 1 else paste0("(", strataTerm, ") == ", strat)]] <- xList } out.tables[[yTerm]] <- list(y = yList, strata = list(term = strataTerm, values = strata.levels, label = strataLabel, hasStrata = hasStrata), x = lapply(xTerms, make_ms_labs), adjust = adjLabels, tables = strataList, family = family.list$family, hasWeights = hasWeights) } structure(list(Call = Call, control = control, tables = out.tables), class = c("modelsum", "arsenal_table")) } arsenal/R/write2specific.R0000644000176200001440000000320113632700352015142 0ustar liggesusers#' write2word, write2html, write2pdf #' #' Functions to output tables to a single Word, HTML, or PDF document. #' #' @inheritParams write2 #' @return \code{object} is returned invisibly, and \code{file} is written. #' @details #' To generate the appropriate file type, the \code{write2*} functions use one of \code{rmarkdown::word_document}, \code{rmarkdown::html_document}, #' and \code{rmarkdown::pdf_document} to get the job done. \code{"..."} arguments are passed to these functions, too. #' @seealso \code{\link{write2}} #' @examples #' \dontrun{ #' data(mockstudy) #' # tableby example #' tab1 <- tableby(arm ~ sex + age, data=mockstudy) #' write2html(tab1, "~/trash.html") #' #' # freqlist example #' tab.ex <- table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA = "ifany") #' noby <- freqlist(tab.ex, na.options = "include") #' write2pdf(noby, "~/trash2.pdf") #' #' # A more complicated example #' write2word(tab1, "~/trash.doc", #' keep.md = TRUE, #' reference_docx = mystyles.docx, # passed to rmarkdown::word_document #' quiet = TRUE, # passed to rmarkdown::render #' title = "My cool new title") # passed to summary.tableby #' } #' @author Ethan Heinzen, adapted from code from Krista Goergen #' @name write2specific NULL #> NULL #' @rdname write2specific #' @export write2word <- function(object, file, ...) { write2(object, file, ..., output_format = "word") } #' @rdname write2specific #' @export write2pdf <- function(object, file, ...) { write2(object, file, ..., output_format = "pdf") } #' @rdname write2specific #' @export write2html <- function(object, file, ...) { write2(object, file, ..., output_format = "html") } arsenal/R/modelsum.families.R0000644000176200001440000000210413714572335015647 0ustar liggesusers#' Family functions for modelsum #' #' A set of family functions for \code{\link{modelsum}}. #' #' @param method See \code{MASS::\link[MASS]{polr}}. #' @param link See \code{MASS::\link[MASS]{glm.nb}}. #' @return A list, in particular with element \code{family}. #' @seealso \code{\link[stats]{family}}, \code{\link[survival]{coxph}}, \code{\link[MASS]{polr}} #' @name modelsum.family NULL #> NULL #' @rdname modelsum.family #' @export survival <- function() list(family="survival") #' @rdname modelsum.family #' @export ordinal <- function(method = c("logistic", "probit", "loglog", "cloglog", "cauchit")) { list(family = "ordinal", method = if(is.function(method)) method else match.arg(method)) } #' @rdname modelsum.family #' @export negbin <- function(link = c("log", "identity", "sqrt")) { list(family = "negbin", method = if(is.function(link)) link else match.arg(link)) } #' @rdname modelsum.family #' @export clog <- function() { list(family = "clog") } #' @rdname modelsum.family #' @export relrisk <- function(link = "log") { list(family = "relrisk", link = link) } arsenal/R/write2.R0000644000176200001440000001506513632700352013447 0ustar liggesusers#' write2 #' #' Functions to output tables to a single document. (Also the S3 backbone behind the \code{write2*} functions.) #' #' @param object An object. #' @param file A single character string denoting the filename for the output document. #' @param ... Additional arguments to be passed to \code{FUN}, \code{rmarkdown::render}, etc. #' One popular option is to use \code{quiet = TRUE} to suppress the command line output. #' @param FUN The summary-like or print-like function to use to generate the markdown content. Can be passed as a function or a #' character string. It's expected that \code{FUN(object, ...)} looks "good" when put directly in a \code{.Rmd} file. #' @param append. Logical, denoting whether (if a temporary \code{.Rmd} file of the same name already exists) #' to append on. Used mostly for \code{write2.list}. #' @param render. Logical, denoting whether to render the temporary \code{.Rmd} file. Used mostly for \code{write2.list}. #' @param keep.rmd Logical, denoting whether to keep the intermediate \code{.Rmd} file. Used mostly for \code{write2.list}. #' @param output_format One of the following: #' \enumerate{ #' \item{An output format object, e.g. \code{rmarkdown::\link[rmarkdown]{html_document}(...)}.} #' \item{A character string denoting such a format function, e.g. \code{"html_document"}. In this case, the \code{"..."} are NOT passed.} #' \item{The format function itself, e.g. \code{rmarkdown::html_document}. In this case, the \code{"..."} arguments are passed.} #' \item{One of \code{"html"}, \code{"pdf"}, and \code{"word"}, shortcuts implemented here. In this case, the \code{"..."} arguments are passed.} #' \item{\code{NULL}, in which the output is HTML by default.} #' } #' See \code{rmarkdown::\link[rmarkdown]{render}} for details. #' @return \code{object} is returned invisibly, and \code{file} is written. #' @details \code{write2} is an S3 method. The default prints the object (using \code{\link{print}}) #' inside a section surrounded by three back ticks. See \code{\link{verbatim}} for details. #' #' There are methods implemented for \code{\link{tableby}}, \code{\link{modelsum}}, and \code{\link{freqlist}}, all of which use the #' \code{summary} function. There are also methods compatible with \code{\link[knitr]{kable}}, \code{\link[xtable]{xtable}}, #' and \code{\link[pander]{pander_return}}. Another option is to coerce an object using \code{\link{verbatim}()} to print out the #' results monospaced (as if they were in the terminal). To output multiple tables into a document, simply make a list of them #' and call the same function as before. Finally, to output code chunks to be evaluated, use \code{\link{code.chunk}}. #' #' For more information, see \code{vignette("write2")}. #' @seealso \code{\link{write2word}}, \code{\link{write2pdf}}, \code{\link{write2html}}, #' \code{\link[rmarkdown]{render}}, \code{\link[rmarkdown]{word_document}}, \code{\link[rmarkdown]{html_document}}, \code{\link[rmarkdown]{pdf_document}}, #' \code{\link[rmarkdown]{rtf_document}}, \code{\link[rmarkdown]{md_document}}, \code{\link[rmarkdown]{odt_document}} #' @examples #' \dontrun{ #' data(mockstudy) #' # tableby example #' tab1 <- tableby(arm ~ sex + age, data=mockstudy) #' write2(tab1, tempfile(fileext = ".rtf"), #' toc = TRUE, # passed to rmarkdown::rtf_document, though in this case it's not practical #' quiet = TRUE, # passed to rmarkdown::render #' title = "My cool new title", # passed to summary.tableby #' output_format = rmarkdown::rtf_document) #' #' write2html(list( #' "# Header 1", # a header #' code.chunk(a <- 1, b <- 2, a + b), # a code chunk #' verbatim("hi there") # verbatim output #' ), #' tempfile(fileext = ".html"), #' quite = TRUE) #' } #' @author Ethan Heinzen, adapted from code from Krista Goergen #' @name write2 NULL #> NULL #' @rdname write2 #' @export write2 <- function(object, file, ..., output_format) { UseMethod("write2") } write2_using_summary <- function(object, file, ..., output_format = NULL) { write2.default(object = object, file = file, FUN = summary, ..., output_format = output_format) } write2_using_print <- function(object, file, ..., output_format = NULL) { write2.default(object = object, file = file, FUN = print, ..., output_format = output_format) } ############################ write2 for arsenal objects ############################ #' @rdname write2 #' @export write2.arsenal_table <- write2_using_summary #' @rdname write2 #' @export write2.summary.arsenal_table <- write2_using_print #' @rdname write2 #' @export write2.comparedf <- write2_using_summary #' @rdname write2 #' @export write2.summary.comparedf <- write2_using_print #' @rdname write2 #' @export write2.verbatim <- write2_using_print #' @rdname write2 #' @export write2.yaml <- write2_using_print #' @rdname write2 #' @export write2.code.chunk <- write2_using_print ############################ write2 for external objects ############################ #' @rdname write2 #' @export write2.knitr_kable <- write2_using_print #' @rdname write2 #' @export write2.xtable <- write2_using_print #' @rdname write2 #' @export ## this intended for use with pander_return() write2.character <- function(object, file, ..., output_format = NULL) { write2.default(object = object, file = file, FUN = cat, ..., sep = "\n", output_format = output_format) } ############################ write2 for lists of objects ############################ #' @rdname write2 #' @export write2.list <- function(object, file, ..., append. = FALSE, render. = TRUE, keep.rmd = !render., output_format = NULL) { if(!is.character(file) || length(file) > 1) stop("'file' argument must be a single character string.") if(!is.logical(append.) || length(append.) > 1) stop("'append.' argument must be a single logical value.") filename <- paste0(file, ".Rmd") if(!append. || !file.exists(filename)) file.create(filename) # find any YAML specifications idx <- vapply(object, is.yaml, NA) if(any(idx)) { yamls <- Reduce(c, object[idx]) object <- object[!idx] write2(yamls, file = file, ..., keep.rmd = TRUE, append. = TRUE, render. = FALSE, output_format = output_format) } # separate the tables with a few blank lines, leading with the blank lines object2 <- c(object, as.list(rep("\n\n", times = length(object))))[order(c(seq_along(object), seq_along(object) - 0.5))] lapply(object2, write2, file = file, ..., keep.rmd = TRUE, append. = TRUE, render. = FALSE, output_format = output_format) write2("\n", file = file, ..., render. = render., append. = TRUE, keep.rmd = keep.rmd, output_format = output_format) invisible(object) } arsenal/R/paired.internal.R0000644000176200001440000000573013656527335015326 0ustar liggesusers #' Helper functions for paired #' #' A set of helper functions for \code{\link{paired}}. #' #' @param missings A character string denoting which action to take. See "Details", below. #' @return \code{na.paired} returns a function used to subset data.frames in \code{\link{paired}}. #' @details #' All methods subset out any NA time points or IDs. #' \code{"in.both"} (the default) subsets the data.frame to individuals who appear at both time points. #' \code{"fill"} adds explicit missings for the people missing second time points. #' \code{"asis"} does nothing to add or remove missings. #' @seealso \link{tableby.internal} #' @name paired.internal NULL #> NULL # 'fill' puts in the missing time points # 'asis' doesn't do anything # 'in.both' subsets to the people in both #' @rdname paired.internal #' @export na.paired <- function(missings = c("in.both", "fill", "asis")) { missings <- match.arg(missings) switch( missings, in.both = function(object, ...) { obj.id <- object[["(id)"]] omit <- is.na(object[[1]]) | is.na(obj.id) if("(strata)" %in% names(object)) omit <- omit | is.na(object[["(strata)"]]) by.col <- object[[1]][!omit] if(is.factor(by.col)) { by.col <- droplevels(by.col) by.levels <- levels(by.col) } else by.levels <- sort(unique(by.col)) if(length(by.levels) != 2) stop("Please specify exactly 2 time points") ids <- object[["(id)"]][!omit] omit <- omit | (obj.id %nin% intersect(ids[by.col == by.levels[1]], ids[by.col == by.levels[2]])) xx <- object[!omit, , drop = FALSE] if(any(omit)) { temp <- stats::setNames(seq_along(omit)[omit], attr(object, "row.names")[omit]) attr(temp, "class") <- "omit" attr(xx, "na.action") <- temp } xx }, fill = function(object, ...) { omit <- is.na(object[[1]]) | is.na(object[["(id)"]]) if("(strata)" %in% names(object)) omit <- omit | is.na(object[["(strata)"]]) xx <- object[!omit, , drop = FALSE] all.pairs <- expand.grid(times = unique(xx[[1]]), id = unique(xx[["(id)"]]), stringsAsFactors = FALSE, KEEP.OUT.ATTRS = FALSE) xx <- merge(xx, all.pairs, by.x = c(colnames(xx)[1], "(id)"), by.y = c("times", "id"), all = TRUE, sort = FALSE)[names(xx)] if(any(omit)) { temp <- stats::setNames(seq_along(omit)[omit], attr(object, "row.names")[omit]) attr(temp, "class") <- "omit" attr(xx, "na.action") <- temp } xx }, asis = function(object, ...) { # take away na's in tp and id omit <- is.na(object[[1]]) | is.na(object[["(id)"]]) if("(strata)" %in% names(object)) omit <- omit | is.na(object[["(strata)"]]) xx <- object[!omit, , drop = FALSE] if(any(omit)) { temp <- stats::setNames(seq_along(omit)[omit], attr(object, "row.names")[omit]) attr(temp, "class") <- "omit" attr(xx, "na.action") <- temp } xx } ) } arsenal/R/paired.stat.tests.R0000644000176200001440000000201014045303071015570 0ustar liggesusers paired.t <- function(x, y, ..., na.rm = TRUE) { if(is.Date(x) && is.Date(y)) { x <- as.integer(x) y <- as.integer(y) } if(na.rm) { idx <- is.na(x) | is.na(y) x <- x[!idx] y <- y[!idx] } stats::t.test(x, y, paired = TRUE) } mcnemar <- function(x, y, mcnemar.correct = TRUE, ..., na.rm = TRUE) { if(na.rm) { idx <- is.na(x) | is.na(y) x <- x[!idx] y <- y[!idx] } stats::mcnemar.test(x, y, correct = mcnemar.correct) } signed.rank <- function(x, y, signed.rank.exact = NULL, signed.rank.correct = TRUE, ..., na.rm = TRUE) { if(is.ordered(x) && is.ordered(y)) { x <- as.integer(x) y <- as.integer(y) } if(na.rm) { idx <- is.na(x) | is.na(y) x <- x[!idx] y <- y[!idx] } stats::wilcox.test(x, y, paired = TRUE, exact = signed.rank.exact, correct = signed.rank.correct) } sign.test <- function(x, y, ..., na.rm = TRUE) { if(na.rm) { idx <- is.na(x) | is.na(y) x <- x[!idx] y <- y[!idx] } stats::binom.test(c(sum(x > y), sum(x < y))) } arsenal/R/paired.control.R0000644000176200001440000000405014006277520015150 0ustar liggesusers #' Control settings for \code{paired} function #' #' Control test and summary settings for the \code{\link{paired}} function. #' #' @param diff logical, telling \code{paired} whether to calculate a column of differences between time points. #' @param numeric.test name of test for numeric RHS variables in \code{paired}: paired.t, signed.rank, sign.test. #' @param cat.test name of test for categorical variables: mcnemar #' @param ordered.test name of test for ordered variables: signed.rank, sign.test #' @param date.test name of test to perform for date variables: paired.t, signed.rank, sign.test #' @param mcnemar.correct,signed.rank.exact,signed.rank.correct Options for statistical tests. See \code{\link{wilcox.test}} #' and \code{\link{mcnemar.test}} for details. #' @param ... Arguments passed to \code{\link{tableby.control}} #' @details Note that (with the exception of \code{total}) all arguments to \code{\link{tableby.control}} are accepted in #' this function (in fact, this function passes everything through to \code{\link{tableby.control}}). #' However, there are different defaults for the statistical tests (shown here). For details on the other arguments, #' please see the help page for \code{\link{tableby.control}}. #' @return A list with settings to be used within the \code{\link{paired}} function. #' @seealso \code{\link{paired}}, \code{\link{tableby}}, \code{\link{tableby.control}}, \code{\link{summary.tableby}} #' @author Ethan Heinzen #' @export paired.control <- function( diff=TRUE, numeric.test="paired.t", cat.test="mcnemar", ordered.test="signed.rank", date.test="paired.t", mcnemar.correct = TRUE, signed.rank.exact = NULL, signed.rank.correct = TRUE, ...) { out <- tableby.control(numeric.test=numeric.test, cat.test=cat.test, ordered.test=ordered.test, date.test=date.test, ...) ## new args out$diff <- diff out$mcnemar.correct <- mcnemar.correct out$signed.rank.correct <- signed.rank.correct out$signed.rank.exact <- signed.rank.exact ## never show total out$total <- FALSE out$cat.droplevels <- FALSE out } arsenal/R/keep.labels.R0000644000176200001440000000324513632700352014415 0ustar liggesusers #' Keep Labels #' #' Keep the \code{'label'} attribute on an R object when subsetting. \code{loosen.labels} allows the \code{'label'} #' attribute to be lost again. #' #' @param x An R object #' @param i,value See \code{\link{[<-}}. #' @param ... Other arguments (not in use at this time). #' @return A copy of \code{x} with a "keep_labels" class appended on or removed. Note that for the \code{data.frame} method, #' only classes on the columns are changed; the \code{data.frame} won't have an extra class appended. This is different from previous #' versions of \code{arsenal}. #' @author Ethan Heinzen #' @seealso \code{\link{labels}} #' @name keep.labels NULL #> NULL #' @rdname keep.labels #' @export keep.labels <- function(x, ...) { UseMethod("keep.labels") } #' @rdname keep.labels #' @export keep.labels.data.frame <- function(x, ...) { x[] <- lapply(x, keep.labels) x } #' @rdname keep.labels #' @export keep.labels.default <- function(x, ...) { class(x) <- c("keep_labels", class(x)[class(x) != "keep_labels"]) x } #' @rdname keep.labels #' @export `[.keep_labels` <- function(x, ...) { y <- NextMethod() keep.labels(set_attr(y, "label", attr(x, "label"))) } #' @rdname keep.labels #' @export `[<-.keep_labels` <- function(x, i, value) { x <- loosen.labels(x) out <- NextMethod() keep.labels(out) } #' @rdname keep.labels #' @export loosen.labels <- function(x, ...) { UseMethod("loosen.labels") } #' @rdname keep.labels #' @export loosen.labels.data.frame <- function(x, ...) { x[] <- lapply(x, loosen.labels) x } #' @rdname keep.labels #' @export loosen.labels.default <- function(x, ...) { class(x) <- class(x)[class(x) != "keep_labels"] x } arsenal/R/modelsum.internal.R0000644000176200001440000002350514051176217015674 0ustar liggesusers## Purpose: internal functions (and methods) for tableby function ## Authors: Jason Sinnwell, Beth Atkinson ## Created: 9/4/2015 ## Helper functions for modelsum: merge, subset, and labels (work like names) #' Helper functions for modelsum #' #' A set of helper functions for \code{\link{modelsum}}. #' #' @param object A \code{data.frame} resulting from evaluating a \code{modelsum} formula. #' @param ... Other arguments, or a vector of indices for extracting. #' @param x A \code{modelsum} object. #' @return \code{na.modelsum} returns a subsetted version of \code{object} (with attributes). #' @seealso \code{\link{arsenal_table}} #' @name modelsum.internal NULL #> NULL join_formula <- function(x, y) { x <- stats::formula(x) if(is.null(y)) return(x) y <- stats::formula(y) stopifnot(length(y) == 2) if(length(x) == 2) { x[[2]] <- call("+", x[[2]], y[[2]]) } else { stopifnot(length(x) == 3) x[[3]] <- call("+", x[[3]], y[[2]]) } x } #' @rdname modelsum.internal #' @export is.modelsum <- function(x) inherits(x, "modelsum") #' @rdname modelsum.internal #' @export is.summary.modelsum <- function(x) inherits(x, "summary.modelsum") #' @rdname modelsum.internal #' @export na.modelsum <- na_lhs_strata ##standardized beta function (for gaussian stat) lm.beta <- function (MOD) { b <- stats::coef(MOD)[-1] sx <- rep(NA,length(b)) b.idx <- 1 for(k in 2:ncol(MOD$model)) { ## skip factors and char variables, ## psplines consider doing sx, but need a second for loop for the ncol of those if(any(class(MOD$model[,k]) %in% c("character","factor", "pspline"))) { b.idx <- b.idx + ifelse(is.null(ncol(MOD$model[,k])), length(unique(MOD$model[,k]))-1, ncol(MOD$model[,k])) ## skip as many elements of beta as there are N.levels-1 of categorical variables } else { sx[b.idx] <- stats::sd(as.double(MOD$model[,k]),na.rm=TRUE) b.idx <- b.idx + 1 } } sy <- stats::sd(as.double(MOD$model[,1]),na.rm=TRUE) beta <- c(NA,round(b * sx/sy,3)) return(beta) } make_ms_term_labels <- function(mf, trms) { factors <- attr(trms, "factors") mm <- stats::model.matrix(trms, mf) assign <- attr(mm, "assign") mm <- mm[, assign > 0, drop = FALSE] assign <- assign[assign > 0] lvls <- colnames(factors)[assign] names(lvls) <- colnames(mm) out <- lapply(colnames(factors), function(nm2) { idx <- factors[, nm2] > 0 nm <- names(mf)[idx] labelEff <- vapply(nm, function(nam) { if(is.null(lab <- attr(mf[[nam]], "label"))) lab <- nam lab }, NA_character_) list(variable = paste(nm, collapse = ":"), variable2 = nm2, varterm = nm, varterm2 = row.names(factors)[idx], varlabel = unname(labelEff), term = names(lvls)[lvls == nm2]) }) names(out) <- vapply(out, "[[", NA_character_, "variable") out } modelsum_guts <- function(fam, temp.call, envir, conf.level, scope, anyna) { check_pkg("broom") try_lrt <- function(f, s, a) { if(a) return(NA_real_) out <- setdiff(stats::drop1(f, scope = s, test = "Chisq")[["Pr(>Chi)"]], NA_real_) if(length(out) == 1) out else NA_real_ } ## y is ordered factor if (fam$family == "ordinal") { check_pkg("MASS") temp.call[[1]] <- quote(MASS::polr) temp.call$Hess <- TRUE temp.call$method <- fam$method fit <- eval(temp.call, envir) coeffORTidy <- broom::tidy(fit, exponentiate=TRUE, conf.int=TRUE, conf.level=conf.level) coeffORTidy[coeffORTidy$coef.type != "coefficient", names(coeffORTidy) %nin% c("term", "coef.type")] <- NA coeffTidy <- broom::tidy(fit, exponentiate=FALSE, conf.int=TRUE, conf.level=conf.level) coeffTidy$p.value <- 2*stats::pnorm(abs(coeffTidy$statistic), lower.tail = FALSE) coeffTidy <- cbind(coeffTidy, OR=coeffORTidy$estimate, CI.lower.OR=coeffORTidy$conf.low, CI.upper.OR=coeffORTidy$conf.high) # sort so that zeta comes first, but hold all else fixed coeffTidy <- coeffTidy[order(coeffTidy$coef.type == "coefficient", seq_len(nrow(coeffTidy))), ] modelGlance <- broom::glance(fit) modelGlance$p.value.lrt <- try_lrt(fit, scope, anyna) } else if (fam$family == "gaussian") { # ## issue warning if appears categorical temp.call[[1]] <- quote(stats::lm) temp.call$x <- TRUE fit <- eval(temp.call, envir) coeffTidy <- broom::tidy(fit, conf.int=TRUE, conf.level=conf.level) if("(weights)" %in% colnames(fit$model)) fit$model[["(weights)"]] <- NULL coeffTidy$standard.estimate <- lm.beta(fit) ## Continuous variable (numeric) ############### ## Note: Using tidy changes colname from 't value' to 'statistic' modelGlance <- broom::glance(fit) names(modelGlance)[names(modelGlance) == "statistic"] <- "statistic.F" names(modelGlance)[names(modelGlance) == "p.value"] <- "p.value.F" modelGlance$p.value.lrt <- try_lrt(fit, scope, FALSE) } else if (fam$family == "binomial" || fam$family == "quasibinomial") { ## These families are used in glm check_pkg("pROC") temp.call[[1]] <- quote(stats::glm) temp.call$x <- TRUE temp.call$family <- fam fit <- eval(temp.call, envir) #coeffbeta <- summary(fit)$coef ## find out that broom:::tidy.lm allows conf.int and exp coeffORTidy <- broom::tidy(fit, exponentiate=TRUE, conf.int=TRUE, conf.level=conf.level) coeffORTidy[coeffORTidy$term == "Intercept", -1] <- NA coeffTidy <- broom::tidy(fit, exponentiate=FALSE, conf.int=TRUE, conf.level=conf.level) waldTidy <- suppressMessages(stats::confint.default(fit, conf.level=conf.level)) all_na <- apply(waldTidy, 1, allNA) waldTidy <- stats::setNames(as.data.frame(waldTidy[!all_na, , drop = FALSE]), c("conf.low", "conf.high")) coeffTidy <- cbind(coeffTidy, OR=coeffORTidy$estimate, CI.lower.OR=coeffORTidy$conf.low, CI.upper.OR=coeffORTidy$conf.high, CI.lower.wald=waldTidy$conf.low, CI.upper.wald=waldTidy$conf.high, CI.lower.OR.wald=exp(waldTidy$conf.low), CI.upper.OR.wald=exp(waldTidy$conf.high)) modelGlance <- broom::glance(fit) modelGlance$concordance <- as.numeric(pROC::auc(fit$y ~ predict(fit, type='response'), direction = "<", levels = 0:1)) modelGlance$p.value.lrt <- try_lrt(fit, scope, FALSE) modelGlance$Nevents <- table(fit$y)[2] } else if (fam$family == "quasipoisson" || fam$family == "poisson") { ## These families use glm temp.call[[1]] <- quote(stats::glm) temp.call$x <- TRUE temp.call$family <- fam fit <- eval(temp.call, envir) coeffRRTidy <- broom::tidy(fit, exponentiate=TRUE, conf.int=TRUE, conf.level=conf.level) coeffRRTidy[coeffRRTidy$term == "Intercept", -1] <- NA coeffTidy <- broom::tidy(fit, exponentiate=FALSE, conf.int=TRUE, conf.level=conf.level) coeffTidy <- cbind(coeffTidy, RR=coeffRRTidy$estimate, CI.lower.RR=coeffRRTidy$conf.low, CI.upper.RR=coeffRRTidy$conf.high) modelGlance <- broom::glance(fit) modelGlance$p.value.lrt <- try_lrt(fit, scope, FALSE) } else if (fam$family == "negbin") { ## Also uses glm check_pkg("MASS") temp.call[[1]] <- quote(MASS::glm.nb) temp.call$x <- TRUE temp.call$link <- fam$link fit <- eval(temp.call, envir) coeffRRTidy <- suppressWarnings(broom::tidy(fit, exponentiate=TRUE, conf.int=TRUE, conf.level=conf.level)) coeffRRTidy[coeffRRTidy$term == "Intercept", -1] <- NA coeffTidy <- suppressWarnings(broom::tidy(fit, exponentiate=FALSE, conf.int=TRUE, conf.level=conf.level)) coeffTidy <- cbind(coeffTidy, RR=coeffRRTidy$estimate, CI.lower.RR=coeffRRTidy$conf.low, CI.upper.RR=coeffRRTidy$conf.high) modelGlance <- suppressWarnings(broom::glance(fit)) modelGlance$theta <- fit$theta modelGlance$SE.theta <- fit$SE.theta modelGlance$p.value.lrt <- try_lrt(fit, scope, anyna) } else if(fam$family == "clog") { check_pkg("survival") temp.call[[1]] <- quote(survival::clogit) fit <- eval(temp.call, envir) ## use tidy to get both CIs, merge coeffORTidy <- broom::tidy(fit, exponentiate=TRUE, conf.int=conf.level) coeffTidy <- broom::tidy(fit, exponentiate=FALSE, conf.int=conf.level) coeffTidy <- cbind(coeffTidy, OR=coeffORTidy$estimate, CI.lower.OR=coeffORTidy$conf.low, CI.upper.OR=coeffORTidy$conf.high) modelGlance <- broom::glance(fit) names(modelGlance)[names(modelGlance) == "nevent"] <- "Nevents" modelGlance$p.value.lrt <- try_lrt(fit, scope, anyna) } else if (fam$family == "relrisk") { check_pkg("geepack") temp.call[[1]] <- quote(geepack::geeglm) temp.call$family <- stats::poisson(fam$link) temp.call$corstr <- "independence" fit <- eval(temp.call, envir) coeffRRTidy <- broom::tidy(fit, exponentiate=TRUE, conf.int=TRUE, conf.level=conf.level) coeffRRTidy[coeffRRTidy$term == "Intercept", -1] <- NA coeffTidy <- broom::tidy(fit, exponentiate=FALSE, conf.int=TRUE, conf.level=conf.level) coeffTidy <- cbind(coeffTidy, RR=coeffRRTidy$estimate, CI.lower.RR=coeffRRTidy$conf.low, CI.upper.RR=coeffRRTidy$conf.high) modelGlance <- broom::glance(fit) } else if(fam$family == "survival") { check_pkg("survival") temp.call[[1]] <- quote(survival::coxph) fit <- eval(temp.call, envir) ## use tidy to get both CIs, merge coeffHRTidy <- broom::tidy(fit, exponentiate=TRUE, conf.int=TRUE, conf.level=conf.level) coeffTidy <- broom::tidy(fit, exponentiate=FALSE, conf.int=TRUE, conf.level=conf.level) coeffTidy <- cbind(coeffTidy, HR=coeffHRTidy$estimate, CI.lower.HR=coeffHRTidy$conf.low, CI.upper.HR=coeffHRTidy$conf.high) modelGlance <- broom::glance(fit) names(modelGlance)[names(modelGlance) == "nevent"] <- "Nevents" modelGlance$p.value.lrt <- try_lrt(fit, scope, anyna) } names(coeffTidy)[names(coeffTidy) == "conf.low"] <- "CI.lower.estimate" names(coeffTidy)[names(coeffTidy) == "conf.high"] <- "CI.upper.estimate" modelGlance[] <- lapply(modelGlance, unname) list(coeffTidy = coeffTidy, modelGlance = modelGlance, fit = fit) } arsenal/R/freqlist.R0000644000176200001440000002055313656527335014100 0ustar liggesusers#' freqlist #' #' Approximate the output from SAS's \code{PROC FREQ} procedure when using the \code{/list} option of the \code{TABLE} statement. #' #' @param object An R object, usually of class \code{"table"} or class \code{"xtabs"} #' @param na.options a character string indicating how to handling missing values: \code{"include"} #' (include values with NAs in counts and percentages), #' \code{"showexclude"} (show NAs but exclude from cumulative counts and all percentages), #' \code{"remove"} (remove values with NAs); default is \code{"include"}. #' @param strata (formerly \code{groupBy}) an optional character string specifying a variable(s) to use for grouping when calculating cumulative #' counts and percentages. \code{\link{summary.freqlist}} will also separate by grouping variable for printing. Note that this is different #' from \code{modelsum} and \code{tableby}, which take bare column names (and only one, at that!) #' @param labelTranslations an optional character string (or list) of labels to use for variable levels when summarizing. #' Names will be matched appropriately. #' @param control control parameters to handle optional settings within \code{freqlist}. See \code{\link{freq.control}} #' @param ... additional arguments. In the formula method, these are passed to the table method. These are also passed to #' \code{\link{freq.control}} #' @param formula,data,subset,na.action,addNA,exclude,drop.unused.levels Arguments passed to \code{\link[stats]{xtabs}}. #' @return An object of class \code{c("freqlist", "arsenal_table")} #' @seealso \code{\link{arsenal_table}}, \code{\link{summary.freqlist}}, \code{\link{freq.control}}, \code{\link{freqlist.internal}}, #' \code{\link[base]{table}}, \code{\link[stats]{xtabs}} #' #' @examples #' # load mockstudy data #' data(mockstudy) #' tab.ex <- table(mockstudy[c("arm", "sex", "mdquality.s")], useNA = "ifany") #' noby <- freqlist(tab.ex, na.options = "include") #' summary(noby) #' #' # show the top 6 rows' frequencies and percents #' head(summary(sort(noby, decreasing = TRUE)[c(1:4, 6)])) #' #' withby <- freqlist(tab.ex, strata = c("arm","sex"), na.options = "showexclude") #' summary(withby) #' @author Tina Gunderson, with revisions by Ethan Heinzen #' @name freqlist NULL #> NULL #' @rdname freqlist #' @export freqlist <- function(object, ...) { UseMethod("freqlist") } #' @rdname freqlist #' @export freqlist.table <- function(object, na.options = c("include", "showexclude", "remove"), strata = NULL, labelTranslations = NULL, control = NULL, ...) { control <- c(list(...), control) control <- do.call("freq.control", control[!duplicated(names(control))]) Call <- match.call() na.options <- match.arg(na.options) if(min(dim(object)) < 1) stop("table object has dimension of 0") if("groupBy" %in% names(list(...))) { if(is.null(strata)) strata <- list(...)$groupBy .Deprecated(msg = "Using 'groupBy = ' is deprecated. Use 'strata = ' instead.") } hasStrata <- !is.null(strata) if(hasStrata && any(strata %nin% names(dimnames(object)))) stop("strata variable not found in table names") # all this just to keep non-syntactic names to_df <- function(x) { data.frame( do.call("expand.grid", c(dimnames(provideDimnames(x, sep = "", base = list(LETTERS))), KEEP.OUT.ATTRS = FALSE, stringsAsFactors = TRUE)), Freq = c(x), row.names = NULL, check.names = FALSE ) } tab.freq <- to_df(object) if(hasStrata && is.null(names(dimnames(object))) && is.null(names(labelTranslations))) { if(length(labelTranslations) != ncol(tab.freq) - 1) stop("'labelTranslations' doesn't appear to be the same length as 'object'") names(labelTranslations) <- utils::head(names(tab.freq), -1) } #### x variables (which might include strata) #### xTerms <- lapply(utils::head(names(tab.freq), -1), function(nm) list(variable=nm, label=nm, term=nm)) names(xTerms) <- vapply(xTerms, "[[", NA_character_, "variable") #if a grouping factor is given, will add NA as a factor level so it is not dropped when using the by function if(hasStrata) { if(na.options != 'remove') tab.freq[strata] <- lapply(tab.freq[strata], function(x) if(anyNA(x)) addNA(x) else x) tableout <- unclass(by(tab.freq, tab.freq[rev(strata)], FUN = internalTable, na.options = na.options)) tableout <- lapply(tableout, function(x) { x <- x[c(strata, colnames(x)[colnames(x) %nin% strata])] row.names(x) <- NULL x }) xTerms <- xTerms[c(strata, names(xTerms)[names(xTerms) %nin% strata])] strata.levels <- "" strataLabel <- strata } else { tableout <- list(internalTable(tab.freq, na.options = na.options)) strata <- strata.levels <- strataLabel <- "" } out.tables = list( list( y = list(term = "", label = ""), strata = list(term = strata, values = strata.levels, label = strataLabel, hasStrata = hasStrata), x = add_freqlist_xterms(xTerms), tables = unname(tableout), hasWeights = FALSE, na.options = na.options ) ) out <- structure(list(Call = Call, control = control, tables = out.tables), class = c("freqlist", "arsenal_table")) if(!is.null(labelTranslations)) labels(out) <- labelTranslations out } #' @rdname freqlist #' @export freqlist.formula <- function(formula, data, subset, na.action, na.options = c("include", "showexclude", "remove"), strata = NULL, labelTranslations = NULL, control = NULL, addNA, exclude, drop.unused.levels, ...) { control <- c(list(...), control) control <- do.call("freq.control", control[!duplicated(names(control))]) Call <- match.call() na.options <- match.arg(na.options) if("groupBy" %in% names(list(...))) { if(is.null(strata)) strata <- list(...)$groupBy .Deprecated(msg = "Using 'groupBy = ' is deprecated. Use 'strata = ' instead.") } indx <- match(c("formula", "data", "subset", "na.action", "addNA", "exclude", "drop.unused.levels"), names(Call), nomatch = 0) if(indx[1] == 0) stop("A formula argument is required.") formula.list <- as_list_formula(formula) out.tables <- list() for(FORM in formula.list) { temp.call <- Call[c(1, indx[1:4])] temp.call[[1L]] <- quote(stats::model.frame) temp.call$formula <- FORM if(!missing(data)) temp.call$data <- as.call(list(keep.labels, temp.call$data)) modeldf <- loosen.labels(eval(temp.call, parent.frame())) Terms <- stats::terms(modeldf) hasStrata <- !is.null(strata) if(hasStrata && any(strata %nin% names(modeldf))) stop("strata variable not found in table names") #### Check for strata #### if(hasStrata) { strata.levels <- "" strata.terms <- strata strataLabel <- unname(vapply(strata, function(x) if(is.null(labelEff <- attr(modeldf[[x]], "label"))) x else labelEff, NA_character_)) } else strata.terms <- strata.levels <- strataLabel <- "" strataList <- list(term = strata.terms, values = strata.levels, label = strataLabel, hasStrata = hasStrata) #### Check for weights #### if(hasWeights <- attributes(Terms)$response != 0) { termBy <- names(modeldf)[1] if(is.null(labelBy <- attr(modeldf[[1]], "label"))) labelBy <- termBy yList <- list(term = termBy, label = labelBy) modeldf[[1]] <- NULL } else yList <- list(term = "", label = "") #### x variables (which might include strata) #### xTerms <- Map(modeldf, names(modeldf), f = function(col, nm) { if(is.null(labelEff <- attr(col, "label"))) labelEff <- nm list(variable=nm, label=labelEff, term=nm) }) names(xTerms) <- vapply(xTerms, "[[", NA_character_, "variable") if(hasStrata) xTerms <- xTerms[c(strata, names(xTerms)[names(xTerms) %nin% strata])] #### temp.call <- Call[c(1, indx)] temp.call[[1L]] <- quote(stats::xtabs) temp.call$formula <- FORM if(indx[5] == 0) temp.call$addNA <- TRUE tab <- freqlist(eval(temp.call, parent.frame()), strata = strata, na.options = na.options, ...)$tables[[1]] tab$hasWeights <- hasWeights tab$y <- yList tab$strata <- strataList tab$x <- add_freqlist_xterms(xTerms) # this should still work even if there's multiple LHS -- test that with list(, , y) ~ x out.tables[[yList$term]] <- tab } out <- structure(list(Call = Call, control = control, tables = out.tables), class = c("freqlist", "arsenal_table")) if(!is.null(labelTranslations)) labels(out) <- labelTranslations out } arsenal/R/labels.R0000644000176200001440000000444013632700352013470 0ustar liggesusers #' Labels #' #' Assign and extract the \code{'label'} attribute on an R object. \code{set_labels} is #' the same as \code{labels(x) <- value} but returns \code{x} for use in a pipe chain. #' \code{set_attr} is the same as \code{attr(x, which) <- value} but returns \code{x} #' for use in a pipe chain. #' #' @param x,object An R object. #' @param value A vector or list containing labels to assign. Labels are assigned based on #' names, if available; otherwise, they're assigned in order. Can pass \code{NULL} #' to remove all labels. #' @param which See \code{\link{attr<-}} #' @param ... Other arguments (not in use at this time). #' @return The labels of \code{object}, or \code{object} with new labels. #' @details #' The \code{\link{data.frame}} methods put labels on and extract labels from #' the \emph{columns} of \code{object}. #' @seealso \code{\link{keep.labels}} #' @author Ethan Heinzen #' @name labels NULL #> NULL #' @rdname labels #' @export labels.data.frame <- function(object, ...) { lapply(object, attr, which = "label", exact = TRUE) } #' @rdname labels #' @export labels.keep_labels <- function(object, ...) { attr(object, "label", exact = TRUE) } #' @rdname labels #' @export 'labels<-' <- function(x, value) { UseMethod("labels<-") } #' @rdname labels #' @export `labels<-.keep_labels` <- function(x, value) { set_attr(x, "label", value) } #' @rdname labels #' @export `labels<-.default` <- function(x, value) # we can't overwrite labels.default(), but nothing is stopping us from doing this { set_attr(x, "label", value) } #' @rdname labels #' @export `labels<-.data.frame` <- function(x, value) { if(is.null(value)) { value <- rep(list(NULL), times = ncol(x)) } if(is.null(names(x)) || is.null(names(value))) { if(length(x) != length(value) && length(x) > 0) stop("'x' and 'value' have unequal lengths, and don't have names") idx <- seq_along(x) # just map one-to-one } else { value <- value[names(value) %in% names(x)] idx <- match(names(value), names(x)) } for(i in seq_along(idx)) { attr(x[[idx[i]]], "label") <- value[[i]] } x } #' @rdname labels #' @export set_labels <- function(x, value) { labels(x) <- value x } #' @rdname labels #' @export set_attr <- function(x, which, value) { attr(x, which) <- value x } arsenal/R/na.operations.R0000644000176200001440000000245213632700352015007 0ustar liggesusers #' Some functions to handle NAs #' #' \code{allNA} tests if all elements are NA, and \code{includeNA} sets the #' \code{NA}s in a character vector or factor to an explicit label. #' #' @param x An object #' @param label A character string denoting the label to set \code{NA}s to. #' @param first Logical; should the new label be the first level? #' @param ... Other arguments (not in use at this time). #' #' @seealso \code{\link{is.na}}, \code{\link{anyNA}} #' @author Ethan Heinzen #' @name NA.operations NULL #> NULL #' @rdname NA.operations #' @export allNA <- function(x) all(is.na(x)) #' @rdname NA.operations #' @export includeNA <- function(x, label, ...) { UseMethod("includeNA") } #' @rdname NA.operations #' @export includeNA.factor <- function(x, label = "(Missing)", first = FALSE, ...) { lvl <- levels(x) if(label %in% lvl) { warning('"', label, '" already appears in levels(x).') } else if(first) { levels(x) <- c(label, lvl) x[] <- levels(x)[as.integer(x) + 1L] } else levels(x) <- c(lvl, label) # don't use factor() here, in case you lose attributes x[is.na(x)] <- label x } #' @rdname NA.operations #' @export includeNA.default <- function(x, label = "(Missing)", ...) { if(label %in% x) warning('"', label, '" already appears in x.') x[is.na(x)] <- label x } arsenal/R/release_questions.R0000644000176200001440000000076413632700352015765 0ustar liggesusersrelease_questions <- function() { c( "Have you updated the DESCRIPTION file? Make sure the version number is right.", "Have you checked for reverse dependencies?", "Have you updated README.md?", "Have you updated NEWS.md?", "Have you updated cran-comments.md?", "Have you updated all the documentation using devtools::check_man()?", "Have you gotten approval from all authors to push to CRAN?", "Did you make sure the DESCRIPTION matches what's in arsenal.R?" ) } arsenal/R/paired.R0000644000176200001440000003650414045306367013507 0ustar liggesusers #' Summary Statistics of a Set of Independent Variables Paired Across Two Timepoints #' #' Summarize one or more variables (x) by a paired time variable (y). Variables #' on the right side of the formula, i.e. independent variables, are summarized by the #' two time points on the left of the formula. Optionally, an appropriate test is performed to test the #' distribution of the independent variables across the time points. #' @param formula an object of class \code{\link{formula}} of the form \code{time ~ var1 + ...}. #' See "Details" for more information. #' @inheritParams tableby #' @param id The vector giving IDs to match up data for the same subject across two timepoints. #' @param na.action a function which indicates what should happen when the data contain \code{NA}s. #' The default is \code{na.paired("in.both")}. See \code{\link{na.paired}} for more details #' @param control control parameters to handle optional settings within \code{paired}. #' Two aspects of \code{paired} are controlled with these: test options of RHS variables and x variable summaries. #' Arguments for \code{paired.control} can be passed to \code{paired} via the \code{...} argument, #' but if a control object and \code{...} arguments are both supplied, #' the latter are used. See \code{\link{paired.control}} for more details. #' @param ... additional arguments to be passed to internal \code{paired} functions or \code{\link{paired.control}}. #' @return An object with class \code{c("paired", "tableby", "arsenal_table")} #' @details #' Do note that this function piggybacks off of \code{\link{tableby}} quite heavily, so there is no #' \code{summary.paired} function (for instance). #' #' These tests are accepted: #' \itemize{ #' \item{ #' \code{paired.t}: a paired \code{\link[stats:t.test]{t-test}}. #' } #' \item{ #' \code{mcnemar}: \link[stats:mcnemar.test]{McNemar's test}. #' } #' \item{ #' \code{signed.rank}: a \link[stats:wilcox.test]{signed rank test}. #' } #' \item{ #' \code{sign.test}: a sign test. #' } #' \item{ #' \code{notest}: no test is performed. #' } #' } #' @seealso \code{\link{arsenal_table}}, \code{\link{paired.control}}, \code{\link{tableby}}, \code{\link{formulize}}, \code{\link{selectall}} #' @author Jason Sinnwell, Beth Atkinson, Ryan Lennon, and Ethan Heinzen #' @name paired NULL #> NULL #' @rdname paired #' @export paired <- function(formula, data, id, na.action, subset=NULL, strata, control = NULL, ...) { control <- c(list(...), control) control <- do.call("paired.control", control[!duplicated(names(control))]) Call <- match.call() ## Tell user if they passed an argument that was not expected, either here or in control expectArgs <- c("formula", "data", "na.action", "subset", "strata", "control", names(control), "id") match.idx <- match(names(Call)[-1], expectArgs) if(anyNA(match.idx)) warning("unused arguments: ", paste(names(Call)[1+which(is.na(match.idx))], collapse=", "), "\n") indx <- match(c("formula", "data", "subset", "na.action", "id", "strata"), names(Call), nomatch = 0) if(indx[1] == 0) stop("A formula argument is required") if(length(formula) != 3) stop("'formula' must be two-sided.") if(indx[5] == 0) stop("An id argument is required") special <- c("paired.t", "mcnemar", "signed.rank", "sign.test", "notest") out.tables <- list() formula.list <- as_list_formula(formula) for(FORM in formula.list) { temp.call <- Call[c(1, indx)] temp.call[[1]] <- as.name("model.frame") if(is.null(temp.call$na.action)) temp.call$na.action <- na.paired("in.both") if (missing(data)) { temp.call$formula <- stats::terms(FORM, special) } else { # instead of call("keep.labels", ...), which breaks when arsenal isn't loaded (Can't find "keep.labels") temp.call$data <- as.call(list(keep.labels, temp.call$data)) temp.call$formula <- stats::terms(FORM, special, data = keep.labels(data)) } ## set up new environment for ## if specials, assign dummy versions of those functions tabenv <- new.env(parent = environment(formula)) for(sp in special) { if(!is.null(attr(temp.call$formula, "specials")[[sp]])) assign(sp, inline_tableby_stat_test, envir = tabenv) } ## set tabenv as environment in which to evalulate formula environment(temp.call$formula) <- tabenv ## evaluate the formula with env set for it modeldf <- loosen.labels(eval.parent(temp.call)) if(nrow(modeldf) == 0) stop("No (non-missing) observations") Terms <- stats::terms(modeldf) ###### Check for strata ###### if(hasStrata <- "(strata)" %in% colnames(modeldf)) { strata.col <- modeldf[["(strata)"]] strataTerm <- deparse(Call$strata) if(is.null(strataLabel <- attr(strata.col, "label"))) strataLabel <- strataTerm if(is.factor(strata.col)) { strata.col <- droplevels(strata.col) strata.levels <- levels(strata.col) } else strata.levels <- sort(unique(strata.col)) modeldf[["(strata)"]] <- NULL } else { strata.col <- rep("", nrow(modeldf)) strataTerm <- strataLabel <- strata.levels <- "" } ###### Check for by-variable ###### if(attributes(Terms)$response != 0) { by.col <- modeldf[[1]] termBy <- names(modeldf)[1] if(is.null(labelBy <- attr(by.col, "label"))) labelBy <- termBy if(is.factor(by.col)) { by.col <- droplevels(by.col) by.levels <- levels(by.col) } else by.levels <- sort(unique(by.col)) by.col <- as.character(by.col) by.levels <- as.character(by.levels) if(any(by.levels == "")) { warning('Empty string detected in by-variable is not allowed; converting to " ".') by.col[by.col == ""] <- " " by.levels <- unique(replace(by.levels, by.levels == "", " ")) } modeldf[[1]] <- NULL } if(length(by.levels) != 2) stop("Please specify exactly 2 time points") ids <- modeldf$`(id)` tab <- table(ids, by.col) if(sum(tab > 1) > 0) stop("At least one person has multiple observations for at least one time point") if(sum(rowSums(tab) == 2) == 0) stop("No one appears to have data on both time points") ids.both <- intersect(ids[by.col == by.levels[1]], ids[by.col == by.levels[2]]) strata.col1 <- strata.col[by.col == by.levels[1]] TP1 <- modeldf[by.col == by.levels[1], , drop = FALSE] strata.col2 <- strata.col[by.col == by.levels[2]] TP2 <- modeldf[by.col == by.levels[2], , drop = FALSE] cn <- colnames(modeldf) cn <- cn[cn != "(id)"] idx1 <- match(ids.both, TP1$`(id)`, nomatch = 0) idx2 <- match(ids.both, TP2$`(id)`, nomatch = 0) strata.col1 <- strata.col1[idx1] strata.col2 <- strata.col2[idx2] TP1 <- TP1[idx1, cn, drop = FALSE] TP2 <- TP2[idx2, cn, drop = FALSE] modeldf[["(id)"]] <- NULL if(is.null(difflab <- control$stats.labels$difference)) difflab <- "Difference" ystats <- c(table(factor(by.col, levels=by.levels), exclude=NA), Difference=length(ids.both)) names(ystats)[names(ystats) == "Difference"] <- difflab yList <- list(stats=ystats, label=labelBy, term=termBy) ## find which columnss of modeldf have specials assigned to known specials specialIndices <- unlist(attr(Terms, "specials")) - attributes(Terms)$response specialTests <- rep("", ncol(modeldf)) ## If a special shows up multiple times, the unlist assigned a number at the end. Strip it off. ## This disallows functions with a number at the end specialTests[specialIndices] <- gsub("\\d+$", "", names(specialIndices)) xTerms <- Map(modeldf, names(modeldf), f = function(col, nm) { if(is.null(nameEff <- attr(col, "name"))) nameEff <- nm if(is.null(labelEff <- attr(col, "label"))) labelEff <- nameEff if(is.null(termEff <- attr(col, "term"))) termEff <- nm list(variable=nameEff, label=labelEff, term=termEff, term.orig = nm) }) names(xTerms) <- vapply(xTerms, "[[", NA_character_, "variable") control.list <- lapply(modeldf, attr, "control.list") names(control.list) <- names(xTerms) strataList <- vector("list", length(strata.levels)) if(hasStrata) names(strataList) <- paste0("(", strataTerm, ") == ", strata.levels) for(strat in strata.levels) { ## list of x variables xList <- vector("list", ncol(modeldf)) names(xList) <- names(xTerms) bycol <- by.col[strata.col == strat] for(eff in seq_along(modeldf)) { currcol <- modeldf[[eff]] TP1.eff <- TP1[[eff]] TP2.eff <- TP2[[eff]] ############################################################ if(is.ordered(currcol) || is.logical(currcol) || is.factor(currcol) || is.character(currcol)) { ######## ordinal or categorical variable (character or factor) ############### ## convert logicals and characters to factor if(is.character(currcol)) { lvl <- sort(unique(currcol[!is.na(currcol)])) currcol <- factor(currcol, levels = lvl) TP1.eff <- factor(TP1.eff, levels = lvl) TP2.eff <- factor(TP2.eff, levels = lvl) } else if(is.logical(currcol)) { lvl <- c(FALSE, TRUE) currcol <- factor(currcol, levels=lvl) TP1.eff <- factor(TP1.eff, levels = lvl) TP2.eff <- factor(TP2.eff, levels = lvl) } ## to make sure all levels of cat variable are counted, need to pass values along xlevels <- levels(currcol) if(length(xlevels) == 0) stop(paste0("Zero-length levels found for ", names(xTerms)[eff])) ## get stats funs from either formula or control if(is.ordered(currcol)) { currstats <- control$ordered.stats currtest <- control$ordered.test vartype <- "ordinal" } else { currstats <- control$cat.stats currtest <- control$cat.test vartype <- "categorical" } } else if(is.Date(currcol)) { ######## Date variable ############### xlevels <- sort(unique(currcol)) ## get stats funs from either formula or control currstats <- control$date.stats currtest <- control$date.test vartype <- "Date" } else if(is.selectall(currcol)) { xlevels <- colnames(currcol) currstats <- control$selectall.stats currtest <- control$selectall.test vartype <- "selectall" } else if(inherits(currcol, "Surv")) { ##### Survival (time to event) ####### stop("Sorry, survival objects don't work in this function.") } else if(is.numeric(currcol) || inherits(currcol, "difftime")) { ######## Continuous variable (numeric) ############### ## for difftime, convert to numeric if(inherits(currcol, "difftime")) currcol <- as.numeric(currcol) xlevels <- sort(unique(currcol)) ## if no missings, and control says not to show missings, ## remove Nmiss stat fun currstats <- control$numeric.stats currtest <- control$numeric.test vartype <- "numeric" } else stop("Variable ", names(xTerms), " has unknown class(es): ", paste0(class(currcol)[-1], collapse = ", ")) ############################################################ ## if no missings, and control says not to show missings, ## remove Nmiss stat fun if(!is.null(attrstats <- attr(modeldf[[eff]], "stats"))) currstats <- attrstats # now finally subset currcol <- currcol[strata.col == strat] TP1.eff <- TP1.eff[strata.col1 == strat] TP2.eff <- TP2.eff[strata.col2 == strat] if(!anyNA(currcol) && "Nmiss" %in% currstats) currstats <- currstats[currstats != "Nmiss"] statList <- list() for(statfun2 in currstats) { statfun <- get_stat_function(statfun2) tmp <- get0(statfun, mode = "function") statfun <- if(is.null(tmp)) get(statfun, parent.frame(), mode = "function") else tmp bystatlist <- list() if(statfun2 %in% c("countrowpct", "countcellpct", "rowbinomCI", "Npct")) { bystatlist <- do.call(statfun, list(currcol, levels = xlevels, by = by.col, by.levels = by.levels, na.rm = TRUE)) bystatlist$Total <- NULL } else if(statfun2 == "Nsigntest") { bystatlist <- as.countpct(NA_real_) } else { for(bylev in by.levels) { idx <- bycol == bylev bystatlist[[bylev]] <- do.call(statfun, list(currcol[idx], levels=xlevels, na.rm=TRUE, conf.level = control$conf.level)) } } if(is.selectall(currcol)) { tmp <- as.selectall(TP1.eff != TP2.eff) bystatlist[[difflab]] <- do.call(statfun, list(tmp, levels=xlevels, na.rm=TRUE, conf.level = control$conf.level)) } else if(statfun2 %in% c("countpct", "countrowpct", "countcellpct")) { # countrowpct to get the right percentages bystatlist[[difflab]] <- countrowpct(TP1.eff, levels = xlevels, by = TP1.eff == TP2.eff, by.levels = c(TRUE, FALSE), na.rm = TRUE)[[2]] } else if(statfun2 == "count") { # this doesn't have percentages bystatlist[[difflab]] <- count(replace(TP1.eff, TP1.eff == TP2.eff, NA), levels = xlevels, na.rm = TRUE) } else if(statfun2 %in% c("binomCI", "rowbinomCI")) { bystatlist[[difflab]] <- rowbinomCI(TP1.eff, levels = xlevels, by = TP1.eff == TP2.eff, by.levels = c(TRUE, FALSE), na.rm = TRUE, conf.level = control$conf.level)[[2]] } else if(statfun2 == "Npct") { # get the right percentages bystatlist[[difflab]] <- Npct(TP1.eff, levels = xlevels, by = TP1.eff == TP2.eff, by.levels = c(TRUE, FALSE), na.rm = TRUE)[[2]] } else { bystatlist[[difflab]] <- do.call(statfun, list(as.numeric(TP2.eff) - as.numeric(TP1.eff), levels=xlevels, na.rm=TRUE, conf.level = control$conf.level)) } statList[[statfun2]] <- bystatlist } if(length(statList) == 0) stop(paste0("Nothing to show for variable '", names(xTerms)[eff], "'")) currtest <- if(nchar(specialTests[eff]) > 0) specialTests[eff] else currtest testout <- if(control$test) { eval(call(currtest, TP1.eff, TP2.eff, mcnemar.correct=control$mcnemar.correct, signed.rank.exact = control$signed.rank.exact, signed.rank.correct = control$signed.rank.correct, test.always=control$test.always)) } else notest() xList[[eff]] <- list(stats=statList, test=testout, type=vartype) } strataList[[if(!hasStrata) 1 else paste0("(", strataTerm, ") == ", strat)]] <- xList } out.tables[[termBy]] <- list(formula = FORM, y = yList, strata = list(term = strataTerm, values = strata.levels, label = strataLabel, hasStrata = hasStrata), x = xTerms, tables = strataList, control.list = control.list, hasWeights = FALSE) } structure(list(Call = Call, control = control, tables = out.tables), class = c("paired", "tableby", "arsenal_table")) } arsenal/R/as.data.frame.tableby.R0000644000176200001440000001120613745612376016266 0ustar liggesusersget_tb_strata_part <- function(tbList, sValue, xList, ...) { Map(get_tb_part, tbList, xList, MoreArgs = list(sValue = sValue, ...)) } get_tb_part <- function(tbList, xList, yList, sList, sValue, cntrl) { statLabs <- cntrl$stats.labels f <- function(x, nm, lab = FALSE) { if(inherits(x[[1]], "tbstat_multirow")) return(if(lab) names(x[[1]]) else rep(nm, length(x[[1]]))) if(lab && nm %in% names(statLabs)) statLabs[[nm]] else nm } out <- data.frame( group.term = yList$term, group.label = yList$label, strata.term = if(!sList$hasStrata) "" else paste0("(", sList$term, ") == ", sValue), strata.value = if(!sList$hasStrata) "" else sValue, variable = xList$variable, term = c(xList$term, unlist(Map(f, tbList$stats, names(tbList$stats)), use.names = FALSE)), label = c(xList$label, unlist(Map(f, tbList$stats, names(tbList$stats), lab = TRUE), use.names = FALSE)), variable.type = tbList$type, stringsAsFactors = FALSE ) if(!sList$hasStrata) out$strata.value <- NULL else names(out)[4] <- sList$label f2 <- function(x, lv) { if(inherits(x[[1]], "tbstat_multirow")) x[[lv]] else x[lv] } for(lvl in names(yList$stats)) { out[[lvl]] <- c("", unlist(lapply(tbList$stats, f2, lv = lvl), recursive = FALSE, use.names = FALSE)) } if(cntrl$test) { out$test <- tbList$test$method out$p.value <- tbList$test$p.value } out } #' as.data.frame.tableby #' #' Coerce a \code{\link{tableby}} object to a \code{data.frame}. #' #' @param x A \code{\link{tableby}} object. #' @param ... Arguments to pass to \code{\link{tableby.control}}. #' @inheritParams summary.tableby #' @seealso \code{\link{tableby}}, \code{\link{tableby}} #' @return A \code{data.frame}. #' @author Ethan Heinzen, based on code originally by Greg Dougherty #' @export as.data.frame.tableby <- function(x, ..., labelTranslations = NULL, list.ok = FALSE) { if(!is.null(labelTranslations)) labels(x) <- labelTranslations control <- c(list(...), x$control) control <- do.call("tableby.control", control[!duplicated(names(control))]) out <- lapply(x$tables, as_data_frame_tableby, control = control) if(!list.ok) { if(length(out) == 1) out <- out[[1]] else warning("as.data.frame.tableby is returning a list of data.frames") } set_attr(out, "control", control) } as_data_frame_tableby <- function(byList, control) { stopifnot(length(byList$tables) == length(byList$strata$values)) tabs <- Map(get_tb_strata_part, tbList = byList$tables, sValue = byList$strata$values, MoreArgs = list(yList = byList$y, sList = byList$strata, xList = byList$x, cntrl = control)) out <- do.call(rbind_chr, unlist(tabs, recursive = FALSE, use.names = FALSE)) f <- function(elt, whch) if(is.null(elt[[whch]])) control[[whch]] else elt[[whch]] simp.num <- vapply(byList$control.list, f, NA, "numeric.simplify") simp.cat <- lapply(byList$control.list, f, "cat.simplify") simp.ord <- lapply(byList$control.list, f, "ordered.simplify") simp.dat <- vapply(byList$control.list, f, NA, "date.simplify") if(!all(vapply(simp.cat, identical, NA, FALSE)) || any(simp.num) || !all(vapply(simp.ord, identical, NA, FALSE)) || any(simp.dat)) { simplify <- function(x) { ## make sure there's only two lines of (the same) summary statistic, and that it's categorical if(!identical(simp.cat[[x$variable[1]]], FALSE) && all(x$variable.type == "categorical") && (nrow(x) == 2 || nrow(x) == 3 && x$term[2] == x$term[3])) { y <- x[nrow(x), , drop = FALSE] y$term[1] <- x$term[1] y$label[1] <- paste0(x$label[1], if(identical(simp.cat[[x$variable[1]]], "label")) paste0(" (", y$label[1], ")")) } else if(!identical(simp.ord[[x$variable[1]]], FALSE) && all(x$variable.type == "ordinal") && (nrow(x) == 2 || nrow(x) == 3 && x$term[2] == x$term[3])) { y <- x[nrow(x), , drop = FALSE] y$term[1] <- x$term[1] y$label[1] <- paste0(x$label[1], if(identical(simp.ord[[x$variable[1]]], "label")) paste0(" (", y$label[1], ")")) } else if((simp.num[x$variable[1]] && all(x$variable.type == "numeric") || simp.dat[x$variable[1]] && all(x$variable.type == "Date")) && nrow(x) == 2) { y <- x[2, , drop = FALSE] y$term[1] <- x$term[1] y$label[1] <- x$label[1] } else y <- x y } bylst <- list(factor(out$variable, levels = unique(out$variable))) if(byList$strata$hasStrata) bylst[[2]] <- factor(out[[4]], levels = unique(out[[4]])) out <- do.call(rbind_chr, by(out, bylst, simplify, simplify = FALSE)) } set_attr(set_attr(out, "control.list", byList$control.list), "ylabel", byList$y$label) } arsenal/R/comparedf.R0000644000176200001440000001577513632700352014203 0ustar liggesusers # original function mockup done by Andy Hanson; re-envisioned by EPH starting 3/20/17 #' Compare two data.frames and report differences #' #' Compare two data.frames and report any differences between them, #' much like SAS's \code{PROC COMPARE} procedure. #' #' @param x,y A data.frame to compare #' @param by,by.x,by.y Which variables are IDs to merge the two data.frames? #' If set to \code{"row.names"}, merging will occur over the row.names. #' If set to \code{NULL} (default), merging will occur row-by-row. #' @param control A list of control parameters from \code{\link{comparedf.control}}. #' @param ... Other arguments, passed to \code{\link{comparedf.control}} when appropriate. #' @examples #' #' df1 <- data.frame(id = paste0("person", 1:3), a = c("a", "b", "c"), #' b = c(1, 3, 4), c = c("f", "e", "d"), #' row.names = paste0("rn", 1:3), stringsAsFactors = FALSE) #' df2 <- data.frame(id = paste0("person", 3:1), a = c("c", "b", "a"), #' b = c(1, 3, 4), d = paste0("rn", 1:3), #' row.names = paste0("rn", c(1,3,2)), stringsAsFactors = FALSE) #' summary(comparedf(df1, df2)) #' summary(comparedf(df1, df2, by = "id")) #' summary(comparedf(df1, df2, by = "row.names")) #' @author Ethan Heinzen, adapted from code from Andrew Hanson #' @seealso \code{\link{summary.comparedf}}, \code{\link{comparedf.control}}, #' \code{\link{diffs}}, \code{\link{n.diffs}}, \code{\link{n.diff.obs}} #' @name comparedf NULL #> NULL #' @rdname comparedf #' @export comparedf <- function(x, y, by = NULL, by.x = by, by.y = by, control = NULL, ...) { control <- c(list(...), control) control <- do.call("comparedf.control", control[!duplicated(names(control))]) xname <- paste0(deparse(substitute(x)), collapse = "") yname <- paste0(deparse(substitute(y)), collapse = "") if(!is.data.frame(x) || !is.data.frame(y)) { stop("Both 'x' and 'y' must be data.frames") } if(any(c("..row.x..", "..row.y..", "..row.names..") %in% c(colnames(x), colnames(y)))) { stop("'..row.x..', '..row.y..', and '..row.names..' are reserved colnames in this function.") } if(is.null(by) && is.null(by.x) && is.null(by.y)) { # user didn't supply any by-variables, so we'll merge by row by.x <- by.y <- "row.names" byrow <- TRUE } else if(is.null(by.x) || is.null(by.y)) { stop("Either 'by' or both of 'by.x' and 'by.y' must be specified") } else byrow <- FALSE if(any(by.x %nin% c("row.names", colnames(x))) || any(by.y %nin% c("row.names", colnames(y)))) { stop("One or more of 'by.x' doesn't match colnames(x) or 'by.y' doesn't match colnames(y).") } #### data frame summary #### frame.summary <- data.frame( version = c("x", "y"), arg = c(xname, yname), ncol = c(ncol(x), ncol(y)), nrow = c(nrow(x), nrow(y)), stringsAsFactors = FALSE ) tmp.attrs <- list(attributes(x), attributes(y)) if("row.names" %in% by.x) { x[["..row.names.."]] <- if(byrow) seq_len(nrow(x)) else row.names(x) by.x[by.x == "row.names"] <- "..row.names.." } if("row.names" %in% by.y) { y[["..row.names.."]] <- if(byrow) seq_len(nrow(y)) else row.names(y) by.y[by.y == "row.names"] <- "..row.names.." } frame.summary$by <- set_attr(list(by.x, by.y), "byrow", byrow) frame.summary$attrs <- tmp.attrs #### tweak the column names #### tcn <- tweakcolnames(by.x, by.y, colnames(x), colnames(y), control) rm(by.x, by.y) # just to make sure we don't try to use those anymore by <- tcn$by #### now merge the things together #### together <- merge(cbind(stats::setNames(x, tcn$cn.x), ..row.x.. = seq_len(nrow(x))), cbind(stats::setNames(y, tcn$cn.y), ..row.y.. = seq_len(nrow(y))), by = by, all = TRUE) both <- together[!is.na(together[["..row.x.."]]) & !is.na(together[["..row.y.."]]), , drop = FALSE] #### get the unshared observations #### getunique <- function(whch, whch2, by. = by) { tmp <- together[is.na(together[[whch2]]), c(by., whch), drop = FALSE] colnames(tmp)[colnames(tmp) == whch] <- "observation" tmp } frame.summary$unique <- list(getunique("..row.x..", "..row.y.."), getunique("..row.y..", "..row.x..")) frame.summary$n.shared <- rep(nrow(both), times = 2) #### Make the main object in this function #### mkdf <- function(df, cn.new) { df2 <- data.frame(tmp = cn.new, var = colnames(df), pos = seq_along(df), stringsAsFactors = FALSE) df2$class <- unname(lapply(df, class)) df2 } vars.summary <- merge(mkdf(x, tcn$cn.x), mkdf(y, tcn$cn.y), by = "tmp", all = TRUE) ord <- order(is.na(vars.summary$var.x), is.na(vars.summary$var.y), vars.summary$pos.x, vars.summary$pos.y, na.last = TRUE) vars.summary <- vars.summary[ord, , drop = FALSE] row.names(vars.summary) <- NULL vars.summary$class.x <- lapply(vars.summary$class.x, cleanup.null.na) vars.summary$class.y <- lapply(vars.summary$class.y, cleanup.null.na) vars.summary$values <- lapply(seq_len(nrow(vars.summary)), compare_values, v = vars.summary, df = both, byvars = by, contr = control) vars.summary$attrs <- lapply(seq_len(nrow(vars.summary)), compare_attrs, v = vars.summary, x_ = x, y_ = y) vars.summary$tmp <- NULL structure(list(frame.summary = structure(frame.summary, class = c("comparedf.frame.summary", "data.frame")), vars.summary = structure(vars.summary, class = c("comparedf.vars.summary", "data.frame")), control = control, Call = match.call()), class = "comparedf") } #' @rdname comparedf #' @export print.comparedf <- function(x, ...) { cat("Compare Object\n\n") cat("Function Call: \n") print(x$Call) cat("\n") cat("Shared: ", sum(idx_var_sum(x, "nonby.vars.shared")), " non-by variables and ", x$frame.summary$n.shared[1], " observations.\n", sep = "") cat("Not shared: ", sum(idx_var_sum(x, "vars.not.shared")), " variables and ", n.diff.obs(x), " observations.\n", sep = "") cat("\n") cat("Differences found in ", sum(idx_var_sum(x, "differences.found")), "/", sum(idx_var_sum(x, "vars.compared")), " variables compared.\n", sep = "") cat(sum(idx_var_sum(x, "non.identical.attributes")), " variables compared have non-identical attributes.\n", sep = "") invisible(x) } #' @export print.comparedf.vars.summary <- function(x, ...) { orig <- x f <- function(elt, txt1, txt2) { if(is.data.frame(elt)) paste0(nrow(elt), txt1) else if(is.null(elt)) txt2 else elt } x$values <- vapply(x$values, f, character(1), txt1 = " differences", txt2 = "Not compared") x$attrs <- vapply(x$attrs, f, character(1), txt1 = " attributes", txt2 = "0 attributes") NextMethod() invisible(orig) } #' @export print.comparedf.frame.summary <- function(x, ...) { orig <- x f <- function(elt, txt1, txt2) { if(is.data.frame(elt)) paste0(nrow(elt), txt1) else if(is.list(elt)) paste0(length(elt), txt1) else if(is.null(elt)) txt2 else elt } x$attrs <- vapply(x$attrs, f, character(1), txt1 = " attributes", txt2 = "0 attributes") x$unique <- vapply(x$unique, f, character(1), txt1 = " unique obs", txt2 = "") NextMethod() invisible(orig) } arsenal/R/tableby.control.R0000644000176200001440000002301614045306034015325 0ustar liggesusers #' Control settings for \code{tableby} function #' #' Control test and summary settings for the \code{\link{tableby}} function. #' #' @param test logical, telling \code{tableby} whether to perform tests of x variables across levels of the group variable. #' @param total logical, telling \code{tableby} whether to calculate a column of totals across group variable. #' @param total.pos One of \code{"before"} or \code{"after"}, denoting where to put the total column relative to the by-variable columns. #' @param test.pname character string denoting the p-value column name in \code{\link{summary.tableby}}. #' Modifiable also with \code{\link{modpval.tableby}}. #' @param cat.simplify,ordered.simplify logical, tell \code{tableby} whether to remove the first level of the categorical/ordinal variable if binary. #' If \code{TRUE}, only the summary stats of the second level are reported (unless there's only one level, in which case it's reported). #' If \code{"label"}, the second level's label is additionally appended to the label. #' NOTE: this only simplifies to one line if there is only one statistic reported, such as \code{countpct}. #' In particular, if \code{Nmiss} is specified and there are missings, then the output is not simplified. #' @param cat.droplevels Should levels be dropped for categorical variables? If set to true, p-values will not be displayed #' unless \code{test.always = TRUE} as well. #' @param numeric.simplify,date.simplify logical, tell \code{tableby} whether to condense numeric/date output to a single line. #' NOTE: this only simplifies to one line if there is only one statistic reported, such as \code{meansd}. #' In particular, if \code{Nmiss} is specified and there are missings, then the output is not simplified. #' @param numeric.test name of test for numeric RHS variables in \code{tableby}: anova, kwt (Kruskal-Wallis), medtest (median test). #' If no LHS variable exists, then a mean is required for a univariate test. #' @param numeric.stats,cat.stats,ordered.stats,surv.stats,date.stats,selectall.stats summary statistics to include for the respective class of RHS variables #' within the levels of the group LHS variable. #' @param cat.test name of test for categorical variables: chisq, fe (Fisher's Exact) #' @param wilcox.correct,wilcox.exact See \code{\link[stats]{wilcox.test}} #' @param chisq.correct logical, correction factor for chisq.test #' @param simulate.p.value logical, simulate p-value for categorical tests (fe and chisq) #' @param B number of simulations to perform for simulation-based p-value #' @param ordered.test name of test for ordered variables: trend #' @param surv.test name of test for survival variables: logrank #' @param date.test name of test for date variables: kwt #' @param selectall.test name of test for date variables: notest #' @param stats.labels A named list of labels for all the statistics function names, where the function name is the named element in the list #' and the value that goes with it is a string containing the formal name that will be printed in all printed renderings of the output, #' e.g., \code{list(countpct="Count (Pct)")}. Any unnamed elements will be ignored. Passing \code{NULL} will disable labels. #' @param digits Number of decimal places for numeric values. #' @param digits.count Number of decimal places for count values. #' @param digits.pct Number of decimal places for percents. #' @param digits.p Number of decimal places for p-values. #' @param format.p Logical, denoting whether to format p-values. See "Details", below. #' @param digits.n Number of decimal places for N's in the header. Set it to NA to suppress the N's. #' @param conf.level Numeric, denoting what confidence level to use for confidence intervals. #' (See, e.g., \code{\link{binomCI}}) #' @param times A vector of times to use for survival summaries. #' @param test.always Should the test be performed even if one or more by-group has 0 observations? Relevant #' for kwt and anova. #' @param ... additional arguments. #' @details #' All tests can be turned off by setting \code{test} to FALSE. #' Otherwise, test are set to default settings in this list, or set explicitly in the formula of \code{tableby}. #' #' If \code{format.p} is \code{FALSE}, \code{digits.p} denotes the number of significant digits shown. The #' p-values will be in exponential notation if necessary. If \code{format.p} is \code{TRUE}, #' \code{digits.p} will determine the number of digits after the decimal point to show. If the p-value #' is less than the resulting number of places, it will be formatted to show so. #' #' Options for statistics are described more thoroughly in the vignette and are listed in \link{tableby.stats} #' #' #' @return A list with settings to be used within the \code{tableby} function. #' #' @seealso \code{\link[stats]{anova}}, \code{\link[stats]{chisq.test}}, \code{\link{tableby}}, \code{\link{summary.tableby}}, #' \code{\link{tableby.stats}}. #' @author Jason Sinnwell, Beth Atkinson, Ethan Heinzen, Terry Therneau, adapted from SAS Macros written by Paul Novotny and Ryan Lennon #' @examples #' set.seed(100) #' ## make 3+ categories for Response #' mdat <- data.frame(Response=c(0,0,0,0,0,1,1,1,1,1), #' Sex=sample(c("Male", "Female"), 10,replace=TRUE), #' Age=round(rnorm(10,mean=40, sd=5)), #' HtIn=round(rnorm(10,mean=65,sd=5))) #' #' ## allow default summaries in RHS variables, and pass control args to #' ## main function, to be picked up with ... when calling tableby.control #' outResp <- tableby(Response ~ Sex + Age + HtIn, data=mdat, total=FALSE, test=TRUE) #' outCtl <- tableby(Response ~ Sex + Age + HtIn, data=mdat, #' control=tableby.control(total=TRUE, cat.simplify=TRUE, #' cat.stats=c("Nmiss","countpct"),digits=1)) #' summary(outResp, text=TRUE) #' summary(outCtl, text=TRUE) #' @export tableby.control <- function( test=TRUE,total=TRUE, total.pos = c("after", "before"), test.pname=NULL, numeric.simplify=FALSE, cat.simplify=FALSE, cat.droplevels=FALSE, ordered.simplify=FALSE, date.simplify=FALSE, numeric.test="anova", cat.test="chisq", ordered.test="trend", surv.test="logrank", date.test="kwt", selectall.test="notest", test.always = FALSE, numeric.stats=c("Nmiss","meansd","range"), cat.stats=c("Nmiss","countpct"), ordered.stats=c("Nmiss", "countpct"), surv.stats=c("Nmiss", "Nevents","medSurv"), date.stats=c("Nmiss", "median","range"), selectall.stats=c("Nmiss", "countpct"), stats.labels = list(), digits = 3L, digits.count = 0L, digits.pct = 1L, digits.p = 3L, format.p = TRUE, digits.n = 0L, conf.level = 0.95, wilcox.correct = FALSE, wilcox.exact = NULL, chisq.correct=FALSE, simulate.p.value=FALSE, B=2000, times = 1:5, ...) { nm <- names(list(...)) if("digits.test" %in% nm) .Deprecated(msg = "Using 'digits.test = ' is deprecated. Use 'digits.p = ' instead.") if("nsmall" %in% nm) .Deprecated(msg = "Using 'nsmall = ' is deprecated. Use 'digits = ' instead.") if("nsmall.pct" %in% nm) .Deprecated(msg = "Using 'nsmall.pct = ' is deprecated. Use 'digits.pct = ' instead.") ## validate digits # digits and digits.test are OK to be NULL. See ?format if(!is.null(digits) && digits < 0L) { warning("digits must be >= 0. Set to default.") digits <- 3L } if(!is.null(digits.count) && digits.count < 0L) { warning("digits.count must be >= 0. Set to default.") digits.count <- 0L } if(!is.null(digits.pct) && digits.pct < 0L) { warning("digits.pct must be >= 0. Set to default.") digits.pct <- 1L } if(!is.null(digits.p) && digits.p < 0L) { warning("digits.p must be >= 0. Set to default.") digits.p <- 3L } if(!is.null(digits.n) && !is.na(digits.n) && digits.p < 0L) { warning("digits.n must be >= 0 or NA or NULL. Set to default.") digits.n <- 0L } stats.labels <- if(is.null(stats.labels)) NULL else add_tbc_stats_labels(stats.labels) list(test=test, total=total, total.pos = match.arg(total.pos), test.pname=test.pname, numeric.simplify=numeric.simplify, cat.simplify=cat.simplify, cat.droplevels = cat.droplevels, ordered.simplify=ordered.simplify, date.simplify=date.simplify, numeric.test=numeric.test, cat.test=cat.test, ordered.test=ordered.test, surv.test=surv.test, date.test=date.test, selectall.test=selectall.test, test.always=test.always, numeric.stats=numeric.stats, cat.stats=cat.stats, ordered.stats=ordered.stats, surv.stats=surv.stats, date.stats=date.stats, selectall.stats=selectall.stats, stats.labels=stats.labels, digits=digits, digits.p=digits.p, digits.count = digits.count, digits.pct = digits.pct, format.p = format.p, digits.n = digits.n, conf.level=conf.level, wilcox.correct = wilcox.correct, wilcox.exact = wilcox.exact, chisq.correct=chisq.correct, simulate.p.value=simulate.p.value, B=B, times=times) } add_tbc_stats_labels <- function(x) { start <- list( Nmiss="N-Miss", Nmiss2="N-Miss", meansd="Mean (SD)", meanse = "Mean (SE)", medianrange="Median (Range)", median="Median", medianq1q3="Median (Q1, Q3)", q1q3="Q1, Q3", iqr = "IQR", mean = "Mean", sd = "SD", var = "Var", max = "Max", min = "Min", meanCI = "Mean (CI)", sum = "Sum", gmean = "Geom Mean", gsd = "Geom SD", gmeansd = "Geom Mean (Geom SD)", gmeanCI = "Geom Mean (CI)", range="Range", Npct="N (Pct)", Nevents="Events", medSurv="Median Survival", medTime = "Median Follow-Up", medianmad="Median (MAD)", Nsigntest = "N (sign test)", overall = "Overall", total = "Total", difference = "Difference" ) nms <- setdiff(names(x), "") start[nms] <- x[nms] start } arsenal/R/summary.tableby.R0000644000176200001440000002244114053011123015332 0ustar liggesusers #' The summary method for a \code{tableby} object #' #' The summary method for a \code{\link{tableby}} object, which is a pretty rendering of a \code{\link{tableby}} #' object into a publication-quality results table in R Markdown, and can render well in text-only. #' #' @param object An object of class \code{"tableby"}, made by the \code{\link{tableby}} function. #' @param x An object of class \code{"summary.tableby"}. #' @param ... For \code{summary.tableby}, other arguments passed to \code{\link{as.data.frame.tableby}}. #' For \code{print}ing the summary object, these are passed to both \code{as.data.frame.summary.tableby} and #' \code{\link[knitr]{kable}}. #' @param title Title/caption for the table, defaulting to \code{NULL} (no title). Passed to \code{\link[knitr]{kable}}. #' Can be length > 1 if the more than one table is being printed. #' @param labelTranslations A named list (or vector) where the name is the label in the #' output to be replaced in the pretty rendering by the character string #' value for the named element of the list, e.g., \code{list(age = "Age(Years)", meansd = "Mean(SD)")}. #' @param text An argument denoting how to print the summary to the screen. #' Default is \code{FALSE} (show markdown output). \code{TRUE} and \code{NULL} output a text-only version, with #' the latter avoiding all formatting. #' \code{"html"} uses the HTML tag \code{} instead of the markdown formatting, and \code{"latex"} uses #' the LaTeX command \code{\\textbf}. #' @param pfootnote Logical, denoting whether to put footnotes describing the tests used to generate the p-values. Alternatively, #' "html" to surround the outputted footnotes with \code{
  • }. #' @param term.name A character vector denoting the column name for the "terms" column. It should be the same length #' as the number of tables or less (it will get recycled if needed). The special value \code{TRUE} will #' use the y-variable's label for each table. #' @param list.ok If the object has multiple by-variables, is it okay to return a list of data.frames instead of a single data.frame? #' If \code{FALSE} but there are multiple by-variables, a warning is issued. #' @inheritParams arsenal_table #' @return An object of class \code{summary.tableby} #' @seealso \code{\link{tableby.control}}, \code{\link{tableby}} #' @author Ethan Heinzen, based on code by Gregory Dougherty, Jason Sinnwell, Beth Atkinson, #' adapted from SAS Macros written by Paul Novotny and Ryan Lennon #' @examples #' #' set.seed(100) #' ## make 3+ categories for response #' nsubj <- 90 #' mdat <- data.frame(Response=sample(c(1,2,3),nsubj, replace=TRUE), #' Sex=sample(c("Male", "Female"), nsubj,replace=TRUE), #' Age=round(rnorm(nsubj,mean=40, sd=5)), #' HtIn=round(rnorm(nsubj,mean=65,sd=5))) #' #' ## allow default summaries on RHS variables #' out <- tableby(Response ~ Sex + Age + HtIn, data=mdat) #' summary(out, text=TRUE) #' labels(out) #' labels(out) <- c(Age="Age (years)", HtIn="Height (inches)") #' summary(out, stats.labels=c(meansd="Mean-SD", q1q3 = "Q1-Q3"), text=TRUE) #' #' @name summary.tableby NULL #> NULL #' @rdname summary.tableby #' @export summary.tableby <- function(object, ..., labelTranslations = NULL, text = FALSE, title = NULL, pfootnote = FALSE, term.name = "") { dat <- as.data.frame(object, ..., labelTranslations = labelTranslations, list.ok = TRUE) structure(list( object = set_attr(dat, "control", NULL), control = attr(dat, "control"), totals = lapply(object$tables, function(x) x$y$stats), hasStrata = has_strata(object), text = text, title = title, pfootnote = pfootnote, term.name = term.name ), class = c("summary.tableby", "summary.arsenal_table")) } as_data_frame_summary_tableby <- function(df, totals, hasStrata, term.name, control, text, pfootnote, width, min.split) { df.orig <- df idx <- names(df)[names(df) %nin% c("group.term", "group.label", "strata.term", "strata.label", "variable", "term", "label", "variable.type", "test", "p.value")] dgt <- attr(df, "control.list") f <- function(j, whch) if(is.null(dgt[[j]]) || is.null(dgt[[j]][[whch]])) control[[whch]] else dgt[[j]][[whch]] digits <- vapply(df$variable, function(i) f(i, "digits"), numeric(1)) digits.count <- vapply(df$variable, function(i) f(i, "digits.count"), numeric(1)) digits.pct <- vapply(df$variable, function(i) f(i, "digits.pct"), numeric(1)) df[idx] <- lapply(df[idx], function(col) { out <- Map( function(xx, ...) if(is.character(xx)) xx else format(xx, ...), col, digits = digits, format = "f", # the format="f" is not used for tbstat objects digits.count = digits.count, digits.pct = digits.pct ) unlist(out) }) if(is.numeric(df$p.value)) { df$p.value <- formatC(df$p.value, digits = control$digits.p, format = if(control$format.p) "f" else "g") if(control$format.p) { cutoff <- 10^(-control$digits.p) fmt <- paste0("< ", format(cutoff, digits = control$digits.p, format = "f")) df$p.value[df.orig$p.value < cutoff] <- fmt } } if(!is.null(df$p.value)) df$p.value[grepl("^\\s*NA$", df$p.value) | is.na(df$p.value)] <- "" tests.used <- NULL if(control$test && (isTRUE(pfootnote) || identical(pfootnote, "html"))) { tests.used <- unique(df$test[df$test != "No test"]) sup <- if(!is.null(text) && identical(text, "html")) c("", "") else if(isTRUE(text)) c(" (", ")") else c("^", "^") df$p.value <- ifelse(df$p.value == "", "", paste0(df$p.value, sup[1], as.integer(factor(df[["test"]], levels = tests.used)), sup[2])) tests.used <- if(identical(pfootnote, "html")) c("
      ", paste0("
    1. ", tests.used, "
    2. "), "
    ") else paste0(seq_along(tests.used), ". ", tests.used) } #### don't show the same statistics more than once #### dups <- if(hasStrata) { unlist(by(df, factor(df[[4]], levels = unique(df[[4]])), function(x) duplicated(x$variable), simplify = FALSE), use.names = FALSE) } else duplicated(df$variable) df$p.value[dups] <- "" if(hasStrata) { df[[4]] <- as.character(df[[4]]) df[[4]][duplicated(df[[4]])] <- "" } #### get rid of unnecessary columns #### df$group.term <- NULL df$group.label <- NULL df$strata.term <- NULL df$variable <- NULL df$term <- NULL df$test <- NULL df$variable.type <- NULL if(!control$test) df$p.value <- NULL if(!control$total && !identical(control$stats.labels$overall, "Total")) df[[control$stats.labels$total]] <- NULL #### Format if necessary #### if(!is.null(width)) { firstcol <- smart.split(df[[1L + hasStrata]], width = width, min.split = min.split) lens <- vapply(firstcol, length, NA_integer_) df <- do.call(cbind.data.frame, c(list(label = unlist(firstcol, use.names = FALSE)), lapply(df[-1L - hasStrata], insert_elt, times = lens))) if(hasStrata) df <- df[replace(seq_along(df), 1:2, 2:1)] row.names(df) <- NULL dups <- insert_elt(dups, times = lens, elt = NULL) } df$label <- trimws(df$label) if(!is.null(text)) { df$label <- if(identical(text, "html")) { ifelse(dups, paste0("   ", df$label), paste0("", df$label, "")) } else if(identical(text, "latex")) { ifelse(dups, paste0("~~~", df$label), paste0("\\textbf{", df$label, "}")) } else if(text) { ifelse(dups, paste0("- ", df$label), df$label) } else ifelse(dups, paste0("   ", df$label), paste0("**", ifelse(df$label == "", " ", df$label), "**")) if(identical(text, "latex")) df[] <- lapply(df, gsub, pattern = "%", replacement = "\\%", fixed = TRUE) } #### tweak column names according to specifications #### cn <- stats::setNames(colnames(df), colnames(df)) align <- c(if(hasStrata) "l", "l", rep("c", times = sum(cn != "p.value")-1), if("p.value" %in% cn) "r") nm <- intersect(cn, names(totals)) if(length(nm) && (is.null(control$digits.n) || !is.na(control$digits.n))) cn[nm] <- paste0(cn[nm], " (N=", formatC(totals[nm], digits = control$digits.n, format = "f"), ")") cn["label"] <- term.name if("p.value" %in% cn && is.null(control$test.pname)) cn["p.value"] <- "p value" else if("p.value" %in% cn) cn["p.value"] <- control$test.pname colnames(df) <- cn if(control$total.pos == "before" && control$stats.labels$total %in% nm) { idx <- seq_along(cn) idx2 <- idx[names(cn) %in% nm] idx[idx2] <- c(utils::tail(idx2, 1), utils::head(idx2, -1)) df <- df[idx] } set_attr(set_attr(df, "tests", tests.used), "align", align) } #' @rdname summary.tableby #' @export as.data.frame.summary.tableby <- function(x, ..., text = x$text, pfootnote = x$pfootnote, term.name = x$term.name, width = NULL, min.split = NULL, list.ok = FALSE) { if(is.null(term.name) || identical(term.name, TRUE)) { term.name <- vapply(x$object, attr, NA_character_, "ylabel") } stopifnot(length(term.name) <= length(x$object)) out <- Map(as_data_frame_summary_tableby, x$object, x$totals, x$hasStrata, term.name, MoreArgs = list(control = x$control, text = text, pfootnote = pfootnote, width = width, min.split = min.split)) if(!list.ok) { if(length(out) == 1) out <- out[[1]] else warning("as.data.frame.summary.tableby is returning a list of data.frames") } out } arsenal/R/yaml.R0000644000176200001440000000232013755041062013165 0ustar liggesusers #' Include a YAML header in \code{write2} #' #' @param x An object of class \code{"yaml"}. #' @param ... For \code{yaml()}, arguments to be bundled into a list and passed to \code{\link[yaml]{as.yaml}}. #' For \code{print.yaml()}, extra arguments. For \code{c.yaml()}, "yaml" objects to be concatenated. #' @param recursive Not in use at this time. #' @return A text string of class \code{"yaml"}. #' @author Ethan Heinzen, adapted from an idea by Brendan Broderick #' @seealso \code{\link[yaml]{as.yaml}}, \code{\link{write2}} #' @examples #' x <- yaml(title = "My cool title", author = "Ethan P Heinzen") #' x #' y <- yaml("header-includes" = list("\\usepackage[labelformat=empty]{caption}")) #' y #' c(x, y) #' @name yaml NULL #> NULL #' @rdname yaml #' @export yaml <- function(...) { check_pkg("yaml") x <- yaml::as.yaml(list(...)) class(x) <- "yaml" x } #' @rdname yaml #' @export print.yaml <- function(x, ...) { cat(c("---\n", x, "---\n"), sep = "") invisible(x) } #' @rdname yaml #' @export c.yaml <- function(..., recursive = FALSE) { structure(paste0(c(unlist(lapply(list(...), unclass))), collapse = ""), class = "yaml") } #' @rdname yaml #' @export is.yaml <- function(x) { inherits(x, "yaml") } arsenal/R/internal.functions.R0000644000176200001440000000473413755036404016065 0ustar liggesusers check_pkg <- function(p) { if(!requireNamespace(p, quietly = TRUE)) stop("Package '", p, "' is required for this functionality, but is not installed.") } locate <- function(string, pattern) gregexpr(pattern, string)[[1L]] smartsplit <- function(string, width, min.split) { if(width < min.split) stop("Desired width < min.split?") if(nchar(string) <= width) return(string) pos <- locate(string, "[ \t\n_.;:,-]") splt <- if((length(pos) == 1 && pos == -1) || !any(idx <- (pos <= width & pos >= min.split))) width else max(pos[idx]) c(substring(string, 1L, splt), smartsplit(substring(string, splt+1L), width = width, min.split = min.split)) } #' Internal Functions #' #' @param string A character vector #' @param width Either \code{Inf} or \code{NULL} to specify no splitting, #' or a positive integer giving the largest allowed string length. #' @param min.split Either \code{-Inf} or \code{NULL} to specify no #' lower bound on the string length, or a positive integer giving the minimum string length. #' @inheritParams base::replace #' @return For \code{smart.split}, a list of the same length as \code{string}, with each element being #' the "intelligently" split string. #' #' For \code{replace2}, a vector with the proper values replaced. #' @seealso \code{\link[base]{replace}} #' @name internal.functions NULL #> NULL #' @rdname internal.functions #' @export smart.split <- function(string, width = Inf, min.split = -Inf) { if(is.null(width)) width <- Inf if(is.null(min.split)) min.split <- -Inf lapply(string, smartsplit, width = width, min.split = min.split) } insert_elt <- function(col, times, elt = "") { f <- if(is.null(elt)) rep else function(x, i) c(x, rep(elt, times = i - 1L)) unlist(Map(f, col, times), use.names = FALSE) } #' @rdname internal.functions #' @export replace2 <- function(x, list, values) { x[[list]] <- values x } as_list_formula <- function(formula) { if(is.list(formula)) return(formula) if(length(formula) == 2 || is.name(formula[[2]]) || !identical(formula[[2]][[1]], as.name("list"))) return(list(formula)) # one-sided or LHS is single arg if(length(formula[[2]]) == 1) return(list(replace2(formula, 2, NULL))) # for empty LHS-list() lapply(formula[[2]][-1], replace2, list = 2, x = formula) } # set all factors to characters rbind_chr <- function(...) { out <- rbind(..., make.row.names = FALSE) if(is.data.frame(out)) { idx <- vapply(out, is.factor, NA) out[idx] <- lapply(out[idx], as.character) } out } arsenal/R/as.data.frame.modelsum.R0000644000176200001440000000704513714574713016475 0ustar liggesusers get_ms_strata_part <- function(msList, sValue, xList, ...) { Map(get_ms_part, msList, c(0, cumsum(lengths(msList)[-1])), xList, MoreArgs = list(sValue = sValue, ...)) } get_ms_part <- function(msList, modelnum, xList, yList, aList, sList, sValue, fam, cntrl) { get_labs <- function(x) stats::setNames(x$label, x$term) # even if the model doesn't have an intercept, that's okay labs <- c("(Intercept)" = "(Intercept)", get_labs(xList), adj <- unlist(unname(lapply(aList, get_labs)))) statFields <- switch( fam, quasibinomial = cntrl$binomial.stats, binomial = cntrl$binomial.stats, quasipoisson = cntrl$poisson.stats, poisson = cntrl$poisson.stats, negbin = cntrl$negbin.stats, clog = cntrl$clog.stats, survival = cntrl$survival.stats, ordinal = cntrl$ordinal.stats, gaussian = cntrl$gaussian.stats, relrisk = cntrl$relrisk.stats ) OUT <- NULL for(adj.i in seq_along(msList)) { msLst <- msList[[adj.i]] trms <- msLst$coeff$term out <- data.frame( y.term = yList$term, y.label = yList$label, strata.term = if(!sList$hasStrata) "" else paste0("(", sList$term, ") == ", sValue), strata.value = if(!sList$hasStrata) "" else sValue, adjustment = names(msList)[adj.i], model = modelnum + adj.i, term = trms, label = ifelse(trms %in% names(labs), labs[trms], trms), term.type = ifelse(trms %in% names(adj), "Adjuster", ifelse(trms %in% xList$term, "Term", "Intercept")), stringsAsFactors = FALSE ) if(!sList$hasStrata) out$strata.value <- NULL else names(out)[4] <- sList$label if(any(names(msLst$coeff) %in% statFields)) out <- cbind(out, msLst$coeff[intersect(statFields, names(msLst$coeff))]) if(any(names(msLst$glance) %in% statFields)) out <- cbind(out, msLst$glance[intersect(statFields, names(msLst$glance))]) OUT <- rbind_chr(OUT, out) } OUT } #' as.data.frame.modelsum #' #' Coerce a \code{\link{modelsum}} object to a \code{data.frame}. #' #' @param x A \code{\link{modelsum}} object. #' @param ... Arguments to pass to \code{\link{modelsum.control}}. #' @inheritParams summary.modelsum #' @seealso \code{\link{modelsum}}, \code{\link{summary.modelsum}} #' @return A \code{data.frame}. #' @author Ethan Heinzen, based on code originally by Greg Dougherty #' @export as.data.frame.modelsum <- function(x, ..., labelTranslations = NULL, list.ok = FALSE) { if(!is.null(labelTranslations)) labels(x) <- labelTranslations control <- c(list(...), x$control) control <- do.call("modelsum.control", control[!duplicated(names(control))]) out <- lapply(x$tables, as_data_frame_modelsum, control = control) if(!list.ok) { if(length(out) == 1) out <- out[[1]] else warning("as.data.frame.modelsum is returning a list of data.frames") } set_attr(out, "control", control) } as_data_frame_modelsum <- function(lhsList, control) { stopifnot(length(lhsList$tables) == length(lhsList$strata$values)) tabs <- Map(get_ms_strata_part, msList = lhsList$tables, sValue = lhsList$strata$values, MoreArgs = list(yList = lhsList$y, sList = lhsList$strata, xList = lhsList$x, aList = lhsList$adjust, fam = lhsList$family, cntrl = control)) out <- do.call(rbind_chr, unlist(tabs, recursive = FALSE, use.names = FALSE)) out <- out[out$term.type %in% c("Term", if(control$show.intercept) "Intercept", if(control$show.adjust) "Adjuster"), , drop = FALSE] row.names(out) <- NULL # Get rid of Nmiss if none missing if("Nmiss" %in% names(out) && all(out$Nmiss == 0)) out$Nmiss <- NULL set_attr(out, "ylabel", lhsList$y$label) } arsenal/R/tableby.internal.R0000644000176200001440000003025314006277140015464 0ustar liggesusers ## allow stat functions to be passed as single arguments that are strings of function names ## Store this as attribute in the modeldf column, along with the actual name of the variable, ## rather than anova(age) showing up in the result (though anova(age) will be the column name in modeldf ## but we pull these attributes off later. inline_tableby_stat_test <- function(x, ..., digits = NULL, digits.count = NULL, digits.pct = NULL, numeric.simplify = NULL, cat.simplify = NULL, cat.droplevels = NULL, ordered.simplify = NULL, date.simplify = NULL) { attr(x, "term") <- attr(x, "name") <- deparse(substitute(x)) attr(x, "stats") <- if(missing(...)) NULL else list(...) attr(x, "control.list") <- list(digits = digits, digits.count = digits.count, digits.pct = digits.pct, numeric.simplify = numeric.simplify, cat.simplify = cat.simplify, cat.droplevels = cat.droplevels, ordered.simplify = ordered.simplify, date.simplify = date.simplify) class(x) <- c("keep_tableby_attrs", class(x)[class(x) != "keep_tableby_attrs"]) x } #' @export `[.keep_tableby_attrs` <- function(x, ...) { y <- NextMethod() attr(y, "name") <- attr(x, "name") attr(y, "stats") <- attr(x, "stats") attr(y, "term") <- attr(x, "term") attr(y, "control.list") <- attr(x, "control.list") class(y) <- class(y)[class(y) != "keep_tableby_attrs"] # purposely drop the class y } get_attr <- function(x, which, default) { x <- attr(x, which, exact = TRUE) if(is.null(x)) default else x } #' @export format.tbstat <- function(x, digits = NULL, ...) { x <- x[] # to remove classes if(is.numeric(x)) x <- trimws(formatC(x, digits = digits, format = "f")) if(is.list(x) && any(idx <- vapply(x, inherits, NA, "difftime"))) { x[idx] <- lapply(x[idx], function(xx) paste(trimws(formatC(unclass(xx), digits = digits, format = "f")), units(xx))) } if(length(x) == 1) return(paste0(x)) parens <- get_attr(x, "parens", c("", "")) sep <- get_attr(x, "sep", " ") sep2 <- get_attr(x, "sep2", " ") pct <- get_attr(x, "pct", "") if(length(x) == 2) { paste0(x[1], sep, parens[1], x[2], pct, parens[2]) } else paste0(x[1], sep, parens[1], x[2], sep2, x[3], parens[2]) } #' @export format.tbstat_countpct <- function(x, digits.count = NULL, digits.pct = NULL, digits = NULL, ...) { att <- attributes(x) x <- vapply(seq_along(x), function(i) { d <- if(i %in% att$which.pct) digits.pct else if(i %in% att$which.count) digits.count else digits formatC(x[i], digits = d, format = "f") }, NA_character_) x <- trimws(x) attributes(x) <- att NextMethod("format") } #' Internal \code{tableby} functions #' #' A collection of functions that may help users create custom functions that are formatted correctly. #' @param x Usually a vector. #' @param oldClass class(es) to add to the resulting object. #' @param sep The separator between \code{x[1]} and the rest of the vector. #' @param parens A length-2 vector denoting parentheses to use around \code{x[2]} and \code{x[3]}. #' @param sep2 The separator between \code{x[2]} and \code{x[3]}. #' @param pct For statistics of length 2, the symbol to use after the second one. (It's called #' "pct" because usually the first statistic is never a percent, but the second often is.) #' @param which.pct Which statistics are percents? The default is 0, indicating that none are. #' @param which.count Which statistics are counts? The default is everything except the things that are percents. #' @param ... arguments to pass to \code{as.tbstat}. #' @details #' The vignette has an example on how to use these. #' #' \code{as.tbstat} defines a tableby statistic with its appropriate formatting. #' #' \code{as.countpct} adds another class to \code{as.tbstat} to use different "digits" arguments #' (i.e., \code{digits.count} or \code{digits.pct}). See \code{\link{tableby.control}}. #' #' \code{as.tbstat_multirow} marks an object (usually a list) for multiple-row printing. #' @name tableby.stats.internal NULL #> NULL #' @rdname tableby.stats.internal #' @export as.tbstat <- function(x, oldClass = NULL, sep = NULL, parens = NULL, sep2 = NULL, pct = NULL, ...) { structure(x, class = c("tbstat", oldClass), sep = sep, parens = parens, sep2 = sep2, pct = pct, ...) } #' @rdname tableby.stats.internal #' @export as.countpct <- function(x, ..., which.count = setdiff(seq_along(x), which.pct), which.pct = 0L) { tmp <- as.tbstat(x, ..., which.count = which.count, which.pct = which.pct) class(tmp) <- c("tbstat_countpct", class(tmp)) tmp } #' @rdname tableby.stats.internal #' @export as.tbstat_multirow <- function(x) { class(x) <- c("tbstat_multirow", class(x)) x } extract_tbstat <- function(x, ...) { x <- NextMethod("[") class(x) <- class(x)[class(x) %nin% c("tbstat", "tbstat_countpct", "tbstat_multirow")] x } extract2_tbstat <- function(x, ...) { x <- NextMethod("[[") class(x) <- class(x)[class(x) %nin% c("tbstat", "tbstat_countpct", "tbstat_multirow")] x } #' @export `[.tbstat` <- extract_tbstat #' @export `[.tbstat_countpct` <- extract_tbstat #' @export `[.tbstat_multirow` <- extract_tbstat #' @export `[[.tbstat` <- extract2_tbstat #' @export `[[.tbstat_countpct` <- extract2_tbstat #' @export `[[.tbsta_multirowt` <- extract2_tbstat ## merge two tableby objects ## both must have same "by" variable and levels ## if some RHS variables have same names, keep both, the one in y add ".y" #' Helper functions for tableby #' #' A set of helper functions for \code{\link{tableby}}. #' #' @param ... Other arguments. #' @param x A \code{tableby} object. #' @param pdata A named data.frame where the first column is the by-variable names, the (optional) second is the strata value, the next is #' the x variable names, the next is p-values (or some test stat), and the (optional) next column is the method name. #' @param e1,e2 \code{\link{tableby}} objects, or numbers to compare them to. #' @param use.pname Logical, denoting whether the column name in \code{pdata} corresponding to the p-values should be used #' in the output of the object. #' @param n A single integer. See \code{\link[utils]{head}} or \code{\link[utils:head]{tail}} for more details #' @param lhs Logical, denoting whether to remove \code{NA}s from the first column of the data.frame (the "left-hand side") #' @return \code{na.tableby} returns a subsetted version of \code{object} (with attributes). \code{Ops.tableby} returns #' a logical vector. \code{xtfrm.tableby} returns the p-values (which are ordered by \code{\link{order}} to \code{\link{sort}}). #' @details #' Logical comparisons are implemented for \code{Ops.tableby}. #' @seealso \code{\link{arsenal_table}}, \code{\link{sort}}, \code{\link[utils]{head}}, \code{\link[utils:head]{tail}}, #' \code{\link{tableby}}, \code{\link{summary.tableby}}, \code{\link{tableby.control}} #' @name tableby.internal NULL #> NULL #' @rdname tableby.internal #' @export is.tableby <- function(x) inherits(x, "tableby") #' @rdname tableby.internal #' @export is.summary.tableby <- function(x) inherits(x, "summary.tableby") ## pdata is a named data.frame where the first column is the x variable names matched by name, ## p-values (or some test stat) are numbers and the name is matched ## method name is the third column (optional) ## to the x variable in the tableby object (x) #' @rdname tableby.internal #' @export modpval.tableby <- function(x, pdata, use.pname=FALSE) { ## set control$test to TRUE if(any(pdata[[1]] %in% names(x$tables))) { x$control$test <- TRUE ## change test results for(k in seq_len(nrow(pdata))) { yname <- as.character(pdata[[1]][k]) hasStrata <- x$tables[[yname]]$strata$hasStrata strat <- if(hasStrata) as.character(pdata[[2]][k]) else "" xname <- as.character(pdata[[2 + hasStrata]][k]) p <- pdata[[3 + hasStrata]][k] method <- if(ncol(pdata) > 3 + hasStrata) pdata[[4 + hasStrata]][k] else "Modified by user" if(xname %in% names(x$tables[[yname]]$x) && strat %in% x$tables[[yname]]$strata$values) { idx <- which(x$tables[[yname]]$strata$values == strat) stopifnot(length(idx) == 1) x$tables[[yname]]$tables[[idx]][[xname]]$test$p.value <- p x$tables[[yname]]$tables[[idx]][[xname]]$test$method <- method } } if(use.pname & nchar(names(pdata)[3 + hasStrata]) > 0) { ## put different test column name in control x$control$test.pname <- names(pdata)[3 + hasStrata] } } else warning("Couldn't match any by-variables to the first column of 'x'.") return(x) } ## Get the labels from the tableby object's elements in the order they appear in the fomula/Call ## including the y and x variables # labels <- function(x) { # UseMethod("labels") # } ## retrieve variable labels (y, x-vec) from tableby object ## define generic function for tests, so tests(tbObj) will work #' @rdname tableby.internal #' @export tests <- function(x) UseMethod("tests") ## retrieve the names of the tests performed per variable #' @rdname tableby.internal #' @export tests.tableby <- function(x) { if(x$control$test) { df <- as.data.frame(x, list.ok = TRUE) hasStrata <- has_strata(x) if(any(hasStrata) != all(hasStrata)) stop("Some tables in 'x' have strata, but others don't") testdf <- do.call(rbind_chr, Map(df, hasStrata, f = function(i, s) i[c("group.label", if(s) names(i)[4], "variable", "p.value", "test")])) testdf <- unique(testdf) row.names(testdf) <- NULL names(testdf)[c(1, any(hasStrata) + (2:4))] <- c("Group", "Variable", if(!is.null(x$control$test.pname)) x$control$test.pname else "p.value", "Method") } else { cat("No tests run on tableby object\n") testdf <- NULL } testdf } ## function to handle na.action for tableby formula, data.frame #' @rdname tableby.internal #' @export na.tableby <- function(lhs = TRUE) { if(is.data.frame(lhs)) stop("na.tableby now generates functions (and no longer accepts data.frames). ", "Use 'na.tableby()' to generate the function that used to be 'na.tableby'.") if(lhs) return(na_lhs_strata) function(object, ...) { omit <- if("(strata)" %in% names(object)) is.na(object[["(strata)"]]) else rep(FALSE, nrow(object)) xx <- object[!omit, , drop = FALSE] if(any(omit)) { temp <- stats::setNames(seq_along(omit)[omit], attr(object, "row.names")[omit]) attr(temp, "class") <- "omit" attr(xx, "na.action") <- temp } xx } } #' @rdname tableby.internal #' @export xtfrm.tableby <- function(x) { if(!x$control$test) stop("Can't extract p-values from a tableby object created with test=FALSE.") unlist(lapply(x$tables, function(lst) lapply(lst$tables, function(strat) lapply(strat, function(i) i$test$p.value))), use.names = FALSE) } #' @rdname tableby.internal #' @export sort.tableby <- function(x, ...) { if(!x$control$test) stop("Can't sort a tableby object created with test=FALSE.") if(any(has_strata(x)) || length(x$tables) > 1) stop("Can't sort a tableby object with strata or multiple by variables") NextMethod() } #' @rdname tableby.internal #' @export Ops.tableby <- function(e1, e2) { ok <- switch(.Generic, `<` = , `>` = , `<=` = , `>=` = , `==` = , `!=` = TRUE, FALSE) if(!ok) stop("'", .Generic, "' is not meaningful for tableby objects") if(inherits(e1, "tableby")) e1 <- xtfrm(e1) if(inherits(e2, "tableby")) e2 <- xtfrm(e2) get(.Generic, mode = "function")(e1, e2) } #' @rdname tableby.internal #' @export head.tableby <- function(x, n = 6L, ...) { stopifnot(length(n) == 1L) xlen <- unique(vapply(x$tables, function(obj) length(obj$x), NA_integer_)) if(length(xlen) != 1) stop("length isn't defined for tableby objects with differing x-variables per by-variable") n <- if(n < 0L) max(xlen + n, 0L) else min(n, xlen) x[seq_len(n)] } #' @rdname tableby.internal #' @export tail.tableby <- function(x, n = 6L, ...) { stopifnot(length(n) == 1L) xlen <- unique(vapply(x$tables, function(obj) length(obj$x), NA_integer_)) if(length(xlen) != 1) stop("length isn't defined for tableby objects with differing x-variables per by-variable") n <- if(n < 0L) max(xlen + n, 0L) else min(n, xlen) x[seq.int(to = xlen, length.out = n)] } arsenal/R/write2.internal.R0000644000176200001440000000325313656527335015274 0ustar liggesusers#' Helper functions for \code{write2} #' #' Helper functions for \code{\link{write2}}. #' #' @param ... For \code{verbatim}, objects to print out monospaced (as if in the terminal). For \code{code.chunk}, #' either expressions or single character strings to paste into the code chunk. #' @param chunk.opts A single character string giving the code chunk options. Make sure to specify the engine! #' @details #' The \code{"verbatim"} class is to tell \code{\link{write2}} to print the object inside #' a section surrounded by three back ticks. The results will look like it would in the terminal (monospaced). #' #' \code{code.chunk()} is to write explicit code chunks in the \code{.Rmd} file; it captures the call and writes it to the #' file, to execute upon knitting. #' @name write2.internal NULL #> NULL #' @export print.verbatim <- function(x, ...) { for(i in seq_along(x)) { cat("```\n") print(x[[i]], ...) cat("\n```\n\n") } invisible(x) } #' @rdname write2.internal #' @export verbatim <- function(...) { structure(list(...), class = "verbatim") } #' @rdname write2.internal #' @export code.chunk <- function(..., chunk.opts = "r") { if(!is.character(chunk.opts) || length(chunk.opts) != 1) stop("'chunk.opts' should be a single character string") Call <- match.call() Call[[1]] <- NULL Call$chunk.opts <- NULL if(length(Call) == 0) Call[[1]] <- "" set_attr(set_attr(Call, "chunk.opts", chunk.opts), "class", c("code.chunk", class(Call))) } #' @export print.code.chunk <- function(x, ...) { cat("```{", attr(x, "chunk.opts"), "}\n", sep = "") lapply(x, function(elt) if(is.character(elt)) cat(elt, "\n", sep = "") else print(elt)) cat("```\n\n") } arsenal/R/tableby.stats.R0000644000176200001440000004522014046312206015003 0ustar liggesusers##################################################### ## Testing and Summary stats methods for internal use in tableby ########################################### #' tableby Summary Statistics Functions #' #' A collection of functions that will report summary statistics. To create a custom function, #' consider using a function with all three arguments and \code{...}. See the \code{\link{tableby}} vignette #' for an example. #' #' @param x Usually a vector. #' @param na.rm Should NAs be removed? #' @param weights A vector of weights. #' @param levels A vector of levels that character \code{x}s should have. #' @param times A vector of times to use for survival summaries. #' @param by a vector of the by-values. #' @param by.levels a vector of the levels of \code{by}. #' @param conf.level Numeric, denoting what confidence level to use for confidence intervals. #' @param ... Other arguments. #' @param totallab What to call the total "column" #' @return Usually a vector of the appropriate numbers. #' @details Not all these functions are exported, in order to avoid conflicting NAMESPACES. #' Note also that the functions prefixed with \code{"arsenal_"} can be referred to by their short names #' (e.g., \code{"min"} for \code{"arsenal_min"}). #' @seealso \code{\link{includeNA}}, \code{\link{tableby.control}} #' @name tableby.stats NULL #> NULL get_stat_function <- function(x) switch(x, sum = , min = , max = , range = , mean = , sd = , var = , median = paste0("arsenal_", x), x) #' @rdname tableby.stats arsenal_sum <- function(x, na.rm=TRUE, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else { sum(x, na.rm=na.rm) } as.tbstat(y) # unclear what the sum of dates should be? } #' @rdname tableby.stats arsenal_min <- function(x, na.rm=TRUE, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else { min(x, na.rm=na.rm) } as.tbstat(y, oldClass = if(is.Date(x)) "Date" else NULL) } #' @rdname tableby.stats arsenal_max <- function(x, na.rm=TRUE, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else { max(x, na.rm=na.rm) } as.tbstat(y, oldClass = if(is.Date(x)) "Date" else NULL) } #' @rdname tableby.stats arsenal_mean <- function(x, na.rm=TRUE, weights = NULL, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else { wtd.mean(x, weights=weights, na.rm=na.rm) } as.tbstat(y, oldClass = if(is.Date(x)) "Date" else NULL) } #' @rdname tableby.stats arsenal_sd <- function(x, na.rm=TRUE, weights = NULL, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else { s <- sqrt(wtd.var(x, weights=weights, na.rm=na.rm)) if(is.Date(x)) list(as.difftime(s, units = "days")) else s } as.tbstat(y) } #' @rdname tableby.stats arsenal_var <- function(x, na.rm=TRUE, weights = NULL, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else { wtd.var(x, weights=weights, na.rm=na.rm) # if(is.Date(x)) as.difftime(s, units = "days") else s } as.tbstat(y) } #' @rdname tableby.stats #' @export meansd <- function(x, na.rm=TRUE, weights = NULL, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else { m <- wtd.mean(x, weights=weights, na.rm=na.rm) s <- sqrt(wtd.var(x, weights=weights, na.rm=na.rm)) if(is.Date(x)) list(as.character(m), as.difftime(s, units = "days")) else c(m, s) } as.tbstat(y, parens = c("(", ")")) } #' @rdname tableby.stats #' @export meanse <- function(x, na.rm=TRUE, weights = NULL, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else { if(!is.null(weights)) stop("'meanse' can only be used without weights") m <- mean(x, na.rm=na.rm) s <- stats::sd(x, na.rm=na.rm)/sqrt(sum(!is.na(x))) if(is.Date(x)) list(as.character(m), as.difftime(s, units = "days")) else c(m, s) } as.tbstat(y, parens = c("(", ")")) } #' @rdname tableby.stats #' @export meanCI <- function(x, na.rm=TRUE, weights = NULL, conf.level = 0.95, ...) { y <- if(!is.null(weights) || (na.rm && allNA(x))) { NA_real_ } else { if(na.rm) x <- x[!is.na(x)] s <- stats::sd(x, na.rm = na.rm) m <- mean(x, na.rm = na.rm) n <- length(x) a <- (1 - conf.level)/2 c(m, m + stats::qt(c(a, 1 - a), df = n - 1) * s / sqrt(n)) } as.tbstat(y, oldClass = if(is.Date(x)) "Date" else NULL, parens = c("(", ")"), sep2 = ", ") } #' @rdname tableby.stats #' @export medianrange <- function(x, na.rm=TRUE, weights = NULL, ...) { y <- if(na.rm && allNA(x)) NA_real_ else wtd.quantile(x, probs=c(0.5, 0, 1), na.rm=na.rm, weights=weights) as.tbstat(y, oldClass = if(is.Date(x)) "Date" else NULL, parens = c("(", ")"), sep2 = ", ") } #' @rdname tableby.stats #' @export medianmad <- function(x, na.rm=TRUE, weights = NULL, ...) { y <- if(!is.null(weights) || (na.rm && allNA(x))) { NA_real_ } else { m <- stats::median(x, na.rm=na.rm) s <- stats::mad(x, na.rm=na.rm, constant = 1) if(is.Date(x)) list(as.character(m), as.difftime(s, units = "days")) else c(m, s) } as.tbstat(y, parens = c("(", ")")) } #' @rdname tableby.stats arsenal_median <- function(x, na.rm=TRUE, weights = NULL, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else if(is.Date(x)) { as.Date(wtd.quantile(as.integer(x), weights=weights, probs=0.5, na.rm=na.rm), origin="1970/01/01") } else { wtd.quantile(x, weights=weights, probs=0.5, na.rm=na.rm) } as.tbstat(y, oldClass = if(is.Date(x)) "Date" else NULL) } #' @rdname tableby.stats arsenal_range <- function(x, na.rm=TRUE, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else { range(x, na.rm=na.rm) } as.tbstat(y, oldClass = if(is.Date(x)) "Date" else NULL, sep = " - ") } #' @rdname tableby.stats #' @export gmean <- function(x, na.rm=TRUE, weights = NULL, ...) { y <- if((na.rm && allNA(x)) || any(x < 0, na.rm = TRUE) || is.Date(x)) { NA_real_ } else { exp(wtd.mean(log(x), weights=weights, na.rm=na.rm)) } as.tbstat(y) } #' @rdname tableby.stats #' @export gsd <- function(x, na.rm=TRUE, weights = NULL, ...) { y <- if((na.rm && allNA(x)) || any(x <= 0, na.rm = TRUE) || is.Date(x)) { NA_real_ } else { n <- sum(!is.na(x)) exp(sqrt(wtd.var(log(x), weights = weights, na.rm = na.rm) * (n-1)/n)) } as.tbstat(y) } #' @rdname tableby.stats #' @export gmeansd <- function(x, na.rm=TRUE, weights = NULL, ...) { y <- if((na.rm && allNA(x)) || any(x < 0, na.rm = TRUE) || is.Date(x)) { NA_real_ } else { m <- exp(wtd.mean(log(x), weights=weights, na.rm=na.rm)) n <- sum(!is.na(x)) s <- if(any(x == 0, na.rm = TRUE)) { NA_real_ } else exp(sqrt(wtd.var(log(x), weights = weights, na.rm = na.rm) * (n-1)/n)) c(m, s) } as.tbstat(y, parens = c("(", ")")) } #' @rdname tableby.stats #' @export gmeanCI <- function(x, na.rm=TRUE, weights = NULL, conf.level = 0.95, ...) { y <- if(!is.null(weights) || (na.rm && allNA(x)) || any(x < 0, na.rm = TRUE) || is.Date(x)) { NA_real_ } else { if(na.rm) x <- x[!is.na(x)] n <- length(x) s <- sqrt(stats::var(log(x), na.rm = na.rm) * (n-1)/n) m <- mean(log(x), na.rm = na.rm) a <- (1 - conf.level)/2 ci <- if(any(x == 0, na.rm = TRUE)) NA_real_ else m + stats::qt(c(a, 1 - a), df = n - 1) * s / sqrt(n) exp(c(m, ci)) } as.tbstat(y, parens = c("(", ")"), sep2 = ", ") } #' @rdname tableby.stats #' @export Nsigntest <- function(x, na.rm = TRUE, weights = NULL, ...) { if(is.null(weights)) weights <- rep(1, NROW(x)) as.countpct(sum(weights*(x != 0), na.rm = na.rm)) } ## survival stats #' @rdname tableby.stats #' @export Nevents <- function(x, na.rm = TRUE, weights = NULL, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else { check_pkg("survival") mat <- summary(survival::survfit(x ~ 1, weights = weights))$table as.numeric(mat["events"]) } as.countpct(y) } ## Median survival #' @rdname tableby.stats #' @export medSurv <- function(x, na.rm = TRUE, weights = NULL, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else { check_pkg("survival") mat <- summary(survival::survfit(x ~ 1, weights = weights))$table as.numeric(mat["median"]) # if we don't hit the median, or if all obs are censors, this is NA } as.tbstat(y) } #' @rdname tableby.stats #' @export NeventsSurv <- function(x, na.rm = TRUE, weights = NULL, times=1:5, ...) { y <- if(na.rm && allNA(x)) { matrix(NA_real_, nrow = 1, ncol = length(times)) } else { check_pkg("survival") xsumm <- summary(survival::survfit(x ~ 1, weights = weights), times=times) t(cbind(cumsum(xsumm$n.event), 100*xsumm$surv)) } out <- stats::setNames(as.list(as.data.frame(y)), paste0("time = ", times)) as.tbstat_multirow(lapply(out, as.countpct, parens = c("(", ")"), which.pct = 2L)) } #' @rdname tableby.stats #' @export NriskSurv <- function(x, na.rm = TRUE, weights = NULL, times=1:5, ...) { y <- if(na.rm && allNA(x)) { matrix(NA_real_, nrow = 1, ncol = length(times)) } else { check_pkg("survival") xsumm <- summary(survival::survfit(x ~ 1, weights = weights), times=times) t(cbind(xsumm$n.risk, 100*xsumm$surv)) } out <- stats::setNames(as.list(as.data.frame(y)), paste0("time = ", times)) as.tbstat_multirow(lapply(out, as.countpct, parens = c("(", ")"), which.pct = 2L)) } #' @rdname tableby.stats #' @export Nrisk <- function(x, na.rm = TRUE, weights = NULL, times=1:5, ...) { y <- if(na.rm && allNA(x)) { rep(NA_real_, times = length(times)) } else { check_pkg("survival") summary(survival::survfit(x ~ 1, weights = weights), times=times)$n.risk } out <- stats::setNames(as.list(y), paste0("time = ", times)) as.tbstat_multirow(lapply(out, as.countpct)) } #' @rdname tableby.stats #' @export medTime <- function(x, na.rm = TRUE, weights = NULL, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else { x[, 2] <- 1 - x[, 2] # censor events instead check_pkg("survival") mat <- summary(survival::survfit(x ~ 1, weights = weights))$table as.numeric(mat["median"]) # if we don't hit the median, or if all obs are events, this is NA } as.tbstat(y) } ## quantiles #' @rdname tableby.stats #' @export q1q3 <- function(x, na.rm=TRUE, weights = NULL, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else wtd.quantile(x, weights=weights, probs=c(0.25, .75), na.rm=na.rm) as.tbstat(y, oldClass = if(is.Date(x)) "Date" else NULL, sep = ", ") } #' @rdname tableby.stats #' @export medianq1q3 <- function(x, na.rm=TRUE, weights = NULL, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else wtd.quantile(x, weights=weights, probs=c(0.5, 0.25, 0.75), na.rm=na.rm) as.tbstat(y, oldClass = if(is.Date(x)) "Date" else NULL, parens = c("(", ")"), sep2 = ", ") } #' @rdname tableby.stats #' @export iqr <- function(x, na.rm=TRUE, weights = NULL, ...) { y <- if(na.rm && allNA(x)) { NA_real_ } else { s <- diff(wtd.quantile(x, weights=weights, probs=c(0.25, 0.75), na.rm=na.rm)) if(is.Date(x)) list(as.difftime(s, units = "days")) else s } as.tbstat(y) } ## Count of missings: always show missings #' @rdname tableby.stats #' @export Nmiss <- function(x, weights = NULL, ...) { if(is.null(weights)) weights <- rep(1, NROW(x)) weights <- weights[is.na(x) | is.na(weights)] as.countpct(sum(weights)) } ## Nmiss2 make similar, but in tableby, always keep nmiss, ## even if there are no missings #' @rdname tableby.stats #' @export Nmiss2 <- Nmiss ## count of complete samples #' @rdname tableby.stats #' @export N <- function(x, na.rm=TRUE, weights = NULL, ...) { if(is.null(weights)) weights <- rep(1, NROW(x)) if(na.rm) weights <- weights[!is.na(x) & !is.na(weights)] as.countpct(sum(weights)) } #' @rdname tableby.stats #' @export Npct <- function(x, levels=NULL, by, by.levels=sort(unique(by)), na.rm=TRUE, weights = NULL, ..., totallab = "Total") { if(is.null(levels)) levels <- sort(unique(x)) if(na.rm) { idx <- !is.na(x) & !is.na(by) if(!is.null(weights)) idx <- idx & !is.na(weights) x <- x[idx] by <- by[idx] weights <- weights[idx] } tmp <- wtd.table(factor(by, levels = by.levels), weights = weights) wtbl <- c(tmp, stats::setNames(sum(tmp), totallab)) lapply(wtbl, function(elt) as.countpct(c(elt, 100*elt/sum(tmp)), parens = c("(", ")"), pct = "%", which.pct = 2L)) } ## count within group variable #' @rdname tableby.stats count <- function (x, levels=NULL, na.rm = TRUE, weights = NULL, ...) { if(is.null(levels)) levels <- sort(unique(x)) if(na.rm) { idx <- !is.na(x) if(!is.null(weights)) idx <- idx & !is.na(weights) x <- x[idx] weights <- weights[idx] } if(is.selectall(x)) { if(is.null(weights)) weights <- rep(1, nrow(x)) wtbl <- apply(as.matrix(x) == 1, 2, function(y) sum(weights[y])) } else wtbl <- wtd.table(factor(x, levels=levels), weights=weights) as.tbstat_multirow(lapply(as.list(wtbl), as.countpct)) } ## count (pct) where pct is within group variable total #' @rdname tableby.stats #' @export countpct <- function(x, levels=NULL, na.rm=TRUE, weights = NULL, ...) { if(is.null(levels)) levels <- sort(unique(x)) if(na.rm) { idx <- !is.na(x) if(!is.null(weights)) idx <- idx & !is.na(weights) x <- x[idx] weights <- weights[idx] } if(is.selectall(x)) { if(is.null(weights)) weights <- rep(1, nrow(x)) wtbl <- apply(as.matrix(x) == 1, 2, function(y) sum(weights[y])) denom <- sum(weights) } else { wtbl <- wtd.table(factor(x, levels=levels), weights=weights) denom <- sum(wtbl) } as.tbstat_multirow(lapply(Map(c, wtbl, if(any(wtbl > 0)) 100*wtbl/denom else rep(list(NULL), times = length(wtbl))), as.countpct, parens = c("(", ")"), pct = "%", which.pct = 2L)) } #' @rdname tableby.stats #' @export countN <- function(x, levels=NULL, na.rm=TRUE, weights = NULL, ...) { if(is.null(levels)) levels <- sort(unique(x)) wtbl <- wtd.table(factor(x[!is.na(x)], levels=levels), weights=weights[!is.na(x)]) n <- sum(wtbl) as.tbstat_multirow(lapply(Map(c, wtbl, rep(n, times = length(wtbl))), as.countpct, sep = "/")) } transpose_list <- function(x, levels, by.levels) stats::setNames(lapply(by.levels, function(i) as.tbstat_multirow(stats::setNames(lapply(x, "[[", i), levels))), by.levels) #' @rdname tableby.stats #' @export countrowpct <- function(x, levels=NULL, by, by.levels=sort(unique(by)), na.rm=TRUE, weights = NULL, ..., totallab = "Total") { if(is.null(levels)) levels <- sort(unique(x)) if(na.rm) { idx <- !is.na(x) & !is.na(by) if(!is.null(weights)) idx <- idx & !is.na(weights) x <- x[idx] by <- by[idx] weights <- weights[idx] } wtbls <- lapply(levels, function(L) { tmp <- wtd.table(factor(by[x == L], levels = by.levels), weights = weights[x == L]) wtbl <- c(tmp, stats::setNames(sum(tmp), totallab)) lapply(wtbl, function(elt) as.countpct(c(elt, 100*elt/sum(tmp)), parens = c("(", ")"), pct = "%", which.pct = 2L)) }) transpose_list(wtbls, levels, c(by.levels, totallab)) } #' @rdname tableby.stats #' @export countcellpct <- function(x, levels=NULL, by, by.levels=sort(unique(by)), na.rm=TRUE, weights = NULL, ..., totallab = "Total") { if(is.null(levels)) levels <- sort(unique(x)) if(na.rm) { idx <- !is.na(x) & !is.na(by) if(!is.null(weights)) idx <- idx & !is.na(weights) x <- x[idx] by <- by[idx] weights <- weights[idx] } tot <- sum(vapply(levels, function(L) { sum(wtd.table(factor(by[x == L], levels = by.levels), weights = weights[x == L])) }, numeric(1))) wtbls <- lapply(levels, function(L) { tmp <- wtd.table(factor(by[x == L], levels = by.levels), weights = weights[x == L]) wtbl <- c(tmp, stats::setNames(sum(tmp), totallab)) lapply(wtbl, function(elt) as.countpct(c(elt, 100*elt/tot), parens = c("(", ")"), pct = "%", which.pct = 2L)) }) transpose_list(wtbls, levels, c(by.levels, totallab)) } get_binom_est_ci <- function(x, tot, setNA, conf.level = 0.95) { if(setNA) return(NA_real_) b <- stats::binom.test(x, tot, conf.level = conf.level) unname(c(b$estimate, b$conf.int)) } #' @rdname tableby.stats #' @export binomCI <- function(x, levels=NULL, na.rm=TRUE, weights = NULL, conf.level = 0.95, ...) { if(is.null(levels)) levels <- sort(unique(x)) wtbl <- wtd.table(factor(x[!is.na(x)], levels=levels), weights=weights[!is.na(x)]) ests <- lapply(wtbl, get_binom_est_ci, tot = sum(wtbl), setNA = !is.null(weights), conf.level = conf.level) as.tbstat_multirow(lapply(ests, as.tbstat, parens = c("(", ")"), sep2 = ", ")) } #' @rdname tableby.stats #' @export rowbinomCI <- function(x, levels=NULL, by, by.levels=sort(unique(by)), na.rm=TRUE, weights = NULL, conf.level = 0.95, ..., totallab = "Total") { if(is.null(levels)) levels <- sort(unique(x)) wts <- !is.null(weights) if(na.rm) { idx <- !is.na(x) & !is.na(by) if(wts) idx <- idx & !is.na(weights) x <- x[idx] by <- by[idx] weights <- weights[idx] } wtbls <- lapply(levels, function(L) { tmp <- wtd.table(factor(by[x == L], levels = by.levels), weights = weights[x == L]) wtbl <- c(tmp, stats::setNames(sum(tmp), totallab)) wtbl <- lapply(wtbl, get_binom_est_ci, tot = sum(tmp), setNA = wts, conf.level = conf.level) lapply(wtbl, as.tbstat, parens = c("(", ")"), sep2 = ", ") }) # as.tbstat_multirow(lapply(wtbl, f)) transpose_list(wtbls, levels, c(by.levels, totallab)) } ######## internal functions that we use above ######## wtd.table <- function(x, weights = NULL, na.rm = TRUE) { if(!length(weights)) return(table(x)) tmp <- tapply(weights, x, sum, na.rm = na.rm) tmp[is.na(tmp)] <- 0 # (tapply(default = 0) would be enough in R >= 3.4, but we'll make this backwards-compatible) tmp } wtd.mean <- function(x, weights = NULL, na.rm = TRUE) { if(!length(weights)) return(mean(x, na.rm = na.rm)) if(na.rm) { idx <- !is.na(x + weights) x <- x[idx] weights <- weights[idx] } sum(weights * x)/sum(weights) } wtd.quantile <- function(x, weights=NULL, probs=c(0,0.25,0.5,0.75,1), na.rm=TRUE) { if(!length(weights)) return(stats::quantile(as.numeric(x), probs = probs, na.rm = na.rm)) if(any(probs < 0) || any(probs > 1)) stop("Probabilities must be between 0 and 1 inclusive") wts <- wtd.table(x, weights, na.rm = na.rm) x <- if(is.Date(x)) as.numeric(as.Date(names(wts))) else as.numeric(names(wts)) n <- sum(wts) order <- 1 + (n - 1) * probs low <- pmax(floor(order), 1) high <- pmin(low + 1, n) order <- order%%1 allq <- stats::approx(cumsum(wts), x, xout = c(low, high), method = "constant", f = 1, rule = 2)$y k <- length(probs) stats::setNames((1 - order) * allq[1:k] + order * allq[-(1:k)], probs) } wtd.var <- function(x, weights = NULL, na.rm=TRUE, method = c("unbiased", "ML")) { method <- match.arg(method) if(!length(weights)) return(stats::var(x, na.rm = na.rm)) if(na.rm) { idx <- !is.na(x + weights) x <- x[idx] weights <- weights[idx] } if(length(x) < 2) return(NA_real_) as.numeric(stats::cov.wt(matrix(x, ncol = 1), weights, method = method)$cov) } arsenal/R/defunct.R0000644000176200001440000000132113656527335013667 0ustar liggesusers #' Defunct functions in \code{arsenal} #' #' Details about defunct functions in \code{arsenal} #' #' @param x,y See \code{\link{comparedf}}. #' @param ... Other arguments. #' @details #' \code{comparison.control} was renamed to \code{\link{comparedf.control}} in version 3.0.0. #' #' \code{compare.data.frame} was renamed to \code{\link{comparedf}} in version 3.0.0. #' #' \code{length.tableby} was removed in version 2.0.0. #' #' \code{includeNA.character} and \code{includeNA.numeric} were removed in version 2.0.0 and #' replaced with a default method. #' #' \code{rangeTime} was removed in version 1.5.0. #' @seealso \code{\link{arsenal-deprecated}}, \code{\link{comparedf}} #' @name arsenal-defunct NULL #> NULL arsenal/R/arsenal_table.R0000644000176200001440000002301713737352012015025 0ustar liggesusers #' \code{arsenal} tables with common structure #' #' @param x,y,object An object of class \code{"arsenal_table"} #' @param i,j A vector to index \code{x} with: either names of variables, a numeric vector, or a logical vector of appropriate length. #' \code{i} indexes the x-variables, and \code{j} indexes the by-variables. #' @param value A list of new labels. #' @param all,all.x,all.y Logicals, denoting which terms to keep if not all are in common. #' @param ... Other arguments (only used in \code{print.summary.arsenal_table}) #' @param format Passed to \code{\link[knitr]{kable}}: the format for the table. The default here is "markdown". #' To use the default in \code{kable}, pass \code{NULL}. If \code{x$text} specifies LaTeX or HTML formatting, #' that format is used in the table. #' @param escape Passed to \code{\link[knitr]{kable}}: should special characters be escaped when printed? #' @param width,min.split Passed to \code{\link{smart.split}} for formatting of the "term" column. #' @seealso \code{\link{merge}}, \code{\link{labels}} #' @name arsenal_table NULL #> NULL #' @rdname arsenal_table #' @export has_strata <- function(x) vapply(x$tables, function(x) x$strata$hasStrata, NA) na_lhs_strata <- function(object, ...) { omit <- is.na(object[[1]]) if("(strata)" %in% names(object)) omit <- omit | is.na(object[["(strata)"]]) xx <- object[!omit, , drop = FALSE] if(any(omit)) { temp <- stats::setNames(seq_along(omit)[omit], attr(object, "row.names")[omit]) attr(temp, "class") <- "omit" attr(xx, "na.action") <- temp } xx } #' @rdname arsenal_table #' @export `[.arsenal_table` <- function(x, i, j, ...) { if(missing(i) && missing(j)) return(x) newx <- x give_warn <- function(vec) warning(paste0("Some indices not found in object: ", paste0(vec, collapse = ", ")), call. = FALSE) if(!missing(j)) { if(is.character(j)) { if(any(tmp <- j %nin% names(newx$tables))) { give_warn(j[tmp]) j <- j[!tmp] } # we could leave "j" alone here and use the names, but for when names are missing ("") we'll do this j <- match(names(newx$tables), j, nomatch = 0) } else if(is.numeric(j) && any(tmp <- j %nin% seq_along(newx$tables))) { give_warn(j[tmp]) j <- j[!tmp] } else if(is.logical(j) && length(j) != length(newx$tables)) { stop("Logical vector index not the right length") } if(length(j) == 0 || anyNA(j)) stop("Indices must have nonzero length and no NAs.") newx$tables <- newx$tables[j] } if(!missing(i)) { newx$tables <- lapply(newx$tables, function(yList) { if(is.character(i) && any(tmp <- i %nin% names(yList$x))) { give_warn(i[tmp]) i <- i[!tmp] # we expect the names to be non-missing (""), unlike y } else if(is.numeric(i) && any(tmp <- i %nin% seq_along(yList$x))) { give_warn(i[tmp]) i <- i[!tmp] } else if(is.logical(i) && length(i) != length(yList$x)) { stop("Logical vector index not the right length") } if(length(i) == 0 || anyNA(i)) stop("Indices must have nonzero length and no NAs.") yList$x <- yList$x[i] yList$tables <- lapply(yList$tables, "[", i) yList }) } newx } #' @rdname arsenal_table #' @export labels.arsenal_table <- function(object, ...) { get_lab <- function(x) { xLabs <- vapply(x$x, "[[", NA_character_, "label") aLabs <- if(is.null(x$adjust)) NULL else vapply(x$adjust, "[[", NA_character_, "label") c(stats::setNames(x$y$label, x$y$term), xLabs, aLabs) } labs <- unlist(unname(lapply(object$tables, get_lab)), recursive = FALSE) labs[!duplicated(labs) | !duplicated(names(labs))] } make_ms_labs <- function(x) { tmp <- x$term for(i in seq_along(x$varterm2)) { colon <- if(i > 1) ":" else "" space <- if(all(grepl(paste0(x$varterm2[i], ":"), tmp, fixed = TRUE))) "" else " " tmp <- sub(paste0(colon, x$varterm2[i]), paste0(colon, x$varlabel[i], space), tmp, fixed = TRUE) } x$label <- trimws(tmp) x } #' @rdname arsenal_table #' @export `labels<-.arsenal_table` <- function(x, value) { ## if the value vector is named, then assign the labels to ## those names that match those in x and y if(is.list(value)) value <- unlist(value) if(is.null(value)) { for(i in seq_along(x$tables)) { mk_lab <- function(elt) { if(is.null(elt$variable2) || is.null(elt$varlabel) || is.null(elt$varterm)) { # for tableby elt$label <- elt$term } else { elt$varlabel <- elt$varterm elt <- make_ms_labs(elt) } elt } for(j in seq_along(x$tables[[i]]$x)) x$tables[[i]]$x[[j]] <- mk_lab(x$tables[[i]]$x[[j]]) for(j in seq_along(x$tables[[i]]$adjust)) x$tables[[i]]$adjust[[j]] <- mk_lab(x$tables[[i]]$adjust[[j]]) x$tables[[i]]$y$label <- x$tables[[i]]$y$term x$tables[[i]]$strata$label <- x$tables[[i]]$strata$term } } else if(!is.null(names(value))) { for(L in seq_along(value)) { for(i in seq_along(x$tables)) { nm <- names(value)[L] val <- unname(value[L]) for(j in seq_along(x$tables[[i]]$x)) { if(nm %in% x$tables[[i]]$x[[j]]$term) { x$tables[[i]]$x[[j]]$label[x$tables[[i]]$x[[j]]$term == nm] <- val } else if(nm %in% x$tables[[i]]$x[[j]]$term.orig) { # for backwards-compatibility x$tables[[i]]$x[[j]]$label[x$tables[[i]]$x[[j]]$term.orig == nm] <- val } } for(j in seq_along(x$tables[[i]]$adjust)) { if(nm %in% x$tables[[i]]$adjust[[j]]$term) x$tables[[i]]$adjust[[j]]$label[x$tables[[i]]$adjust[[j]]$term == nm] <- val } if(nm == x$tables[[i]]$y$term) x$tables[[i]]$y$label <- val if(nm %in% x$tables[[i]]$strata$term) x$tables[[i]]$strata$label[x$tables[[i]]$strata$term == nm] <- val } } } else stop("Unnamed label assignments are no longer supported") x } #' @rdname arsenal_table #' @export print.arsenal_table <- function(x, ...) { cat(class(x)[1], "Object\n\nFunction Call:\n") print(x$Call) cat("\nVariable(s):\n") lapply(x$tables, function(tab) { cat(tab$y$term, " ~ ", paste0(unlist(lapply(tab$x, "[[", "term"), use.names = FALSE), collapse = ", "), if(tab$strata$hasStrata) paste0(" (strata = ", paste0(tab$strata$term, collapse = ", "), ")"), "\n", sep = "") }) invisible(x) } #' @rdname arsenal_table #' @export merge.arsenal_table <- function(x, y, all = FALSE, all.x = all, all.y = all, ...) { Call <- match.call() nms.x <- names(x$tables) nms.y <- names(y$tables) nms <- if(all.x && all.y) union(nms.x, nms.y) else if(all.x) nms.x else if(all.y) nms.y else intersect(nms.x, nms.y) if(length(nms) == 0) stop("No terms in common. Do you need 'all=TRUE'?") x <- x[, nms.x %in% nms] # could use names themselves with warn=FALSE, but for freqlist, let's do this instead y <- y[, nms.y %in% nms] nms.x <- names(x$tables) nms.y <- names(y$tables) for(i in seq_along(y$tables)) { ytrm <- names(y$tables)[i] if(ytrm == "") { x$tables[[length(x$tables) + 1]] <- y$tables[[i]] # this is why we need 'i' instead of just 'ytrm' next } else if(ytrm %nin% names(x$table)) { x$tables[[ytrm]] <- y$tables[[ytrm]] next } if(!identical(x$tables[[ytrm]]$y, y$tables[[ytrm]]$y)) stop("By-variables not identical for term ", ytrm) if(!identical(x$tables[[ytrm]]$strata, y$tables[[ytrm]]$strata)) stop("Strata not identical for term ", ytrm) if(!identical(x$tables[[ytrm]]$adjust, y$tables[[ytrm]]$adjust)) stop("Adjust not identical for term ", ytrm) if(x$tables[[ytrm]]$hasWeights != y$tables[[ytrm]]$hasWeights) stop("Weights not present in both objects for term ", ytrm) if(!identical(x$tables[[ytrm]]$family, y$tables[[ytrm]]$family)) stop("'family' not the same in both objects for term ", ytrm) xtrms <- names(y$tables[[ytrm]]$x) x$tables[[ytrm]]$x[xtrms] <- y$tables[[ytrm]]$x if(!is.null(x$tables[[ytrm]]$control.list) && !is.null(y$tables[[ytrm]]$control.list)) x$tables[[ytrm]]$control.list[xtrms] <- y$tables[[ytrm]]$control.list for(j in seq_along(x$tables[[ytrm]]$tables)) { x$tables[[ytrm]]$tables[[j]][xtrms] <- y$tables[[ytrm]]$tables[[j]] } } x$Call <- Call x } #' @rdname arsenal_table #' @export merge.freqlist <- function(x, y, all = TRUE, ...) { nms.x <- setdiff(names(x$tables), "") nms.y <- setdiff(names(y$tables), "") # this check should be okay even if both nms.x and nms.y are empty if(any(nms.x %in% nms.y)) stop("Can only merge freqlist objects with different left-hand sides") NextMethod(all = TRUE) } #' @rdname arsenal_table #' @export print.summary.arsenal_table <- function(x, ..., format = if(!is.null(x$text) && x$text %in% c("html", "latex")) x$text else "markdown", escape = x$text %nin% c("html", "latex"), width = NULL, min.split = NULL) { df <- as.data.frame(x, ..., width = width, min.split = min.split, list.ok = TRUE) #### finally print it out #### shown <- FALSE for(i in seq_along(df)) { if(nrow(df[[i]]) == 0) next shown <- TRUE cap <- if(length(x$title) < i) NULL else x$title[[i]] print(knitr::kable(df[[i]], caption = cap, align = attr(df[[i]], "align"), format = format, row.names = FALSE, escape = escape, ...)) if(!is.null(attr(df[[i]], "tests"))) cat(paste0(attr(df[[i]], "tests"), "\n", collapse = "")) } if(!shown) stop("There wasn't anything to summarize! (All of the tables have 0 rows)") cat("\n") invisible(x) } arsenal/NEWS.md0000644000176200001440000011031514056431136013001 0ustar liggesusers# arsenal v3.6.3 * Added `medtest()` to `tableby()` for a median test. (#327) * Account for NAs in `sign.test()` in `paired()`. (#326) * Add `Nsigntest()` for `paired()`. (#326) * Add `"Nevents"` for binomial GLMs. (#325) * A fix for R devel when digits=0. # arsenal v3.6.2 * Fixed one URL * Fixed two `modelsum` test errors that were a result of rounding on macos. # arsenal v3.6.1 * Fixed two URLs # arsenal v3.6.0 Possible breaking change: * `tableby()`, `modelsum()`, and `freqlist()` now use the `caption=` argument in `knitr::kable()` to generate captions (`comparedf()` already does). (#310) * Changes to `DESCRIPTION` file: - Changed `broom` requirement to >= 0.7.1, in which a bug with `geepack::geeglm` was fixed. This affected one test in the test suite. - Added `geepack` package to "Suggests" (#279). - Removed `gam` from the "Suggests", in favor of `splines`, which the `modelsum()` vignette actually uses. - Changed `survival` requirement to `>= 2.43-1`. Other changes: * Added code to error informatively when "Suggests" aren't available. (#317) * `tableby()`: - Added `selectall()`. - Added "label" option to `cat.simplify=` and `ord.simplify=` for `tableby()`/`paired()`. (#288) - Fixed a bug in `tableby()` / `paired()` where `stats.labels=` specification would remove all default labels. (#316) - Added `wt()` (Wilcoxon test) option for `tableby()`. (#321) - Fixed a bug in `tableby()` relating to a weird edge case when a by-level is "Total" and the total label is set to something else. - Added a feature to `tableby.control()` to allow for the total column to be moved before the other columns. (#320) - Added a feature to `tableby.control()` to allow for dropping of categorical levels. (#318) - Added `meanse()` for `tableby()`. (#315) * `modelsum()`: - Added `relrisk()` to `modelsum()` families (with corresponding addition of `geepack` package to "Suggests"). (#279) - Fixed a bug in `modelsum()` with confidence level for survival. - Suppressed warnings with new broom version when using `MASS::glm.nb()`. * `comparedf()`: - Added option to `diffs()` for extracting not-shared observations. (#305) - Fixed bug in `comparedf()` when things are infinite. (#306) * `write2()`: - Added note to `write2()` vignette about a global option for R Markdown documents in R Studio. (#312) # arsenal v3.5.0 * Change R requirement to >= 3.4.0. * Fixed a bug to conform with new `broom` publication, and change `broom` requirement to >= 0.7.0. (#296) * Fixed a bug to conform with new `knitr` publication, and change `knitr` requirement to >= 1.29. (#299, #300, #301) * The GitHub repository was moved from https://github.com/eheinzen/arsenal/ to https://github.com/mayoverse/arsenal/. The corresponding `pkgdown` site is now at https://mayoverse.github.io/arsenal/ * `freqlist()`: Changed the default for `addNA` to be `TRUE` in the formula method, so that you only have to specify `na.options` like in the table method. * `modelsum()`: - Added support for conditional logistic regressions. (#275) - Fixed a bug in `modelsum.control()` with confidence interval specification. - Expanded the statistic list for survival models. - Eliminated call to `broom::confint_tidy()`. (#296) * `tableby()` / `paired()`: - Removed checks for existance of stat functions (the check wasn't working anyway) and search both the enclosing environment as well as `parent.frame()` (for custom stat functions). - `tableby()`: better described `as.tbstat()` and `as.countpct()` in the vignette, and make `as.countpct()` slightly more flexible. (#283) - `paired()`: fixed a bug with detecting stat functions. - `tableby()`: added an error if reserved terms are used in the by-variable. (#277) - Added an option for HTML footnotes (and superscripts) in `summary.tableby()`. (#298) # arsenal v3.4.0 * Added a `pkgdown` site: https://eheinzen.github.io/arsenal/ * Moved knitr to an import. (#255) * `comparedf()`: Added support for tolerances by variable. (#167) * `tableby()` / `paired()`: - `tableby()`: * Allowed for changing of "Overall" and "Total" labels. (#253, #261) * Allowed for suppression of N's in the header. (#256, #36) * Allowed for digits formatting of N's in the header. (#257) * Escaped `%` for `text="latex"`. (#258) * Added to vignette describing `merge(all=TRUE)`. * Fixed vignette re: outputting to CSV. (#278) - `paired()`: * Allowed for changing of "Difference" label. (#271) * Removed "..." from the documentation for `paired.internal` per CRAN's request (PR#16223 for R-devel). - Added support for "min", "max", "sd", "mean", and "var" (#259) and "gmean", "gsd", "gmeansd", "gmeanCI" (#260) and "Npct" (#263) and "sum" (#281). - Added a more informative error message when no summary statistic is computed. (#273) * `modelsum()`: - Fixed "statistic.F" for family="gaussian". (#262) - Fixed "Nevents" for family="survival". (#266) - Fixed vignette re: outputting to CSV. (#278) * `freqlist()`: Updated `head.summary.freqlist()` and `tail.summary.freqlist()` to comply with new R-devel definitions. * Updated "labels" vignette. (#267) * Added `escape =` argument to `formulize()`. (#282) # arsenal v3.3.0 * `tableby()` / `paired()`: - Redid how weights are handled. The only user-visible changes should be that standard deviations on length-1 groups are now reported as `NA` instead of `NaN`. - Fixed a bug with `modpval.tableby()` when factors are involved. (#239) - Added `meanCI()` and `medianmad()`. (#230, #232) - Added the units for `difftime` statistics when using dates (e.g., `meansd`, `medianmad`, `iqr`). - Fixed Chi-square and Fisher's Exact test for one-level categorical variables. (#227, #228) - Fixed the n's in the header when using weights. (#229) - Fixed a bug with confidence levels supplied through the control argument. (#234) - `paired()`: fixed a bug when using `count()` with factors. (#235) - `tableby.control()`: added explicit `times=` argument for survival summaries. - Added option to run statistical tests even if one by-group has 0 observations. (#233, #250, #251) - Stopped the formatting of p-values when they're not numeric (if, say, they're pre-formatted by the user). (#249) * `modelsum()`: - Added functionality for multiple adjustor sets. (#240) - Fixed "Nmiss" and "N" when used with strata, which now both report the missings for the entire fit. (#241, #242, #243) - Suppressed messages from `pROC::auc()` when calculating AUC. (#244) - Fixed confidence level for survival models. (#245) - Added an option for the likelihood ratio test for the main effect (but not the adjustors): `p.value.lrt` (#238) - Blanked out p-values that are NA. (#246) * `code.chunk()`: - Fixed logic checking the length of `chunk.opts=`. - Allowed for empty code chunks. (#236) * `verbatim()`: removed named argument in favor of the dots; reworked the object structure to fix edge case printing oddities. (#248) * Removed defunct functions. # arsenal v3.2.0 * `comparedf()`: - Fixed a bug when "row.names" was used in combination with other by-variables. (#212) - Allowed for comparison of variables which have any class in common. (#216) - `summary.comparedf()`: * Removed the "comparedf.frame.summary" class from the first element to allow it to print. (#211) * Fixed a bug with reporting blank by-variables. (#213) * Fixed a bug with reporting by-variables as variables in common. (#214) * `modelsum()`: added Wald confidence intervals to `binom.stats=`. (#219) * `tableby()` / `paired()`: - Fixed a bug with `merge.arsenal_table()` losing control parameters for `tableby()` objects. (#221) - Allowed for variable-name-only `labelTranslations=` assignment for terms with inline statistical test specification. Backward compatibility should be maintained here. (#220) - Fixed a bug with assigning `NULL` labels with inline statistical test specifications. (#222) - `summary.tableby()`: fixed a bug with formatting when strata aren't in alphabetical order and have different number of elements (e.g., if only one includes missing values). (#215) # arsenal v3.1.0 * `tableby()` / `paired()`: - Added "Nmiss" to default `surv.stats=` in `tableby.control()`. - Fixed a bug when some `Surv()` elements are NA. (#208) - `tableby.control()`: fixed a bug with simplifying categorical and numeric output. (#199, #203) This fix also allows for simplification of custom statistics. (#200) - `tableby.control()`: added `date.simplify=` and `ordered.simplify=` arguments. (#202) The order of arguments has changed slightly for consistency. - `paired.control()`: took away the arguments that should be the same as `tableby.control()`, and only included arguments with new defaults or which don't appear in `tableby.control()`. - Added the functions `countN()` and `Nrisk()` (#201). `Nrisk()` now outputs what `NriskSurv()` used to; `NriskSurv()` now outputs what its name suggests: the number at risk, and the survival. Additionally, `as.countpct()` gains the `which.pct=` argument, whose default of `0` may break the formatting of percents (`digits.pct=`). * `comparedf()`: - Added additional summary table to the `summary()` output. - Moved the `max.print...=` arguments to `comparedf.control()`. `max.print.diff=` is now deprecated and is replaced by `max.print.diffs.per.var=`. `max.print.diffs=` was also added to control overall number of differences printed. - Fixed a bug with numeric percent tolerances when both values being compared are 0. (#206) - Fixed a bug in `diffs()` (and hence `summary()`) when no variables are compared (#207). Note that this change also included a change to the by-variables reported in the `comparedf()` object when merging over row.names. # arsenal v3.0.0 **There are a few non-backwards-compatible updates.** Major changes: * Renamed `compare()` -> `comparedf()` and `comparison.control()` -> `comparedf.control()`. (#179) * `modelsum()`: Fixed bug(s) with interaction terms. (#173, #177) * Added a new function `loosen.labels()` which removes the classes added by `keep.labels()` and thereby speeds up subsetting when labels are no longer needed. This is now used in `tableby()`, `modelsum()`, `freqlist()`, and `paired()`. Smaller changes: * `tableby()` / `paired()`: - Fixed two bugs relating to `modpval.tableby()`: one which didn't properly assign the p-value name (#174), and one which broke `as.data.frame()` when assigning custom p-values for only one strata (#175). - These now issue informative error when class isn't recognized. (#180) - Fixed two bugs in the `tableby()` vignette: `modpval.tableby()` wasn't working properly (#170), and `pfootnote=TRUE` was commented out (#169). - Fixed a bug with per-variable stats and digit specifications being lost when using the `subset=` argument. (#182, #183) - Made all-NA summaries prettier. (#190) - This now issues a warning when `coin` isn't available for the trend test. (#193) * `comparedf()`: - This now allows for zero-row data.frames. (#166) - `comparedf.control()` now allows for named `tol.vars=` argument to manually match column names. (#165) * `freqlist()`: - Fixed a bug where labels would get dropped when using the `subset=` argument. (#184) - Fixed a bug where labels were lost when subsetting the table and using strata terms. (#196) - `freqlist()`: implemented a `sort()` method to sort tables on frequency. (#187) - `summary.freqlist()`: Implemented `head()` and `tail()`. (#188) - `summary.freqlist()`: fixed a bug when all table counts are 0 and `sparse=FALSE`. (#186, #194) * `keep.labels()`: - This no longer sticks another class on data.frames. - Fixed a bug with replacement for objects of class `"keep_labels"`. * `formulize()`: added the `collapse=` and `collapse.y=` arguments. (#197) # arsenal v2.0.0 There is a new class system (`"arsenal_table"`) which unifies `tableby()`, `modelsum()`, and `freqlist()`. * `arsenal` now imports and re-exports `utils::head()` and `utils::tail()`. * `arsenal` now has a sticker! * `arsenal_table`: - Implemented a new class (without a constructor). - `labels<-.arsenal_table()` doesn't support unnamed labels, as it's unclear how to assign them to multiple by-variables and strata. It also doesn't give warnings if your labels are not used. - `[.arsenal_table()` has an argument `j=` to select the by-variables. - `merge.arsenal_table()` has arguments to select which by-variables to keep if not all are in common. It also checks to make sure that strata, weights, and by-variables are all identical. - `print.arsenal_table()` shows y- and x-variables, plus any strata. * `tableby()` and `paired()`: - Added functionality for multiple by-variables and strata terms. This required completely reworking the innards of the `tableby` object. - Removed `length.tableby()` (because it was messing up `str()` and R Studio) and replaced with `head.tableby()` and `tail.tableby()` (the original purpose to having `length()` defined). - Implemented `sort.tableby()`, which errors out if the object has strata or multiple by-variables, and then runs the default method. - `modpval.tableby()` now requires the first column to be the by-variable, and if the object has a strata, the second column is required to be the corresponding strata value. - `tests.tableby()` now returns a data.frame with a by-variable column and (if applicable) a strata column. - `na.tableby()` now generates functions. The "lhs=" argument determines whether to remove NAs from the first column of the data. If `tableby()` detects a one-sided formula, it sets this to FALSE. Both versions now remove rows with NAs in the strata column (when applicable). - `na.paired()` now removes rows with NAs in the strata column (when applicable). - `padjust.tableby()` and `padjust.summary.tableby()` will error if fed an object with strata or multiple by-variables. - `as.data.frame.tableby()` and `as.data.frame.summary.tableby()` gain the `list.ok=` argument, for when there are multiple left-hand-sides. - Added logic to statistical tests to detect missing levels of the by-variable. - Fixed a bug with LaTeX formatting involving the `align=` argument to `knitr::kable()`. - Passing `term.name=TRUE` to `summary.tableby()` or `as.data.frame.summary.tableby()` will now put the term name in the top left corner of each table. * `modelsum()`: - Added functionality for multiple by-variables and strata terms. This required completely reworking the innards of the `modelsum` object. - `na.modelsum()` now removes rows with NAs in the strata column (when applicable). - `as.data.frame.modelsum()` and `as.data.frame.summary.modelsum()` gain the `list.ok=` argument, for when there are multiple left-hand-sides. - Passing `term.name=TRUE` to `summary.modelsum()` or `as.data.frame.summary.modelsum()` will now put the term name in the top left corner of each table. * `freqlist()`: - Added functionality for multiple by-variables. This required completely reworking the innards of the `freqlist` object. - Changed the argument `groupBy=` to `strata=` to match `tableby()` and `modelsum()`. - Added `merge.freqlist()` and `as.data.frame.summary.freqlist()`. Note that `[.arsenal_table()` now allows you to remove the cumulative and percent columns. - Note that `labels<-.arsenal_table()` no longer supports unnamed labels, but now accepts labels for the frequency, cumulative, and percent columns for `freqlist` objects. - Removed the `digits=`, `sparse=`, `single=`, and `dupLabels=` arguments from `freqlist()` and `summary.freqlist()`. These are now arguments to the new `freq.control()`, and are passed through the dots (for backwards compatibility). `freqlist()` also gained the `control=` argument for objects from `freq.control()`. - `as.data.frame.freqlist()` no longer rounds its digits, nor does it label its columns. Use `as.data.frame.summary.freqlist()` for that instead. It also gained the `list.ok=` argument, for when there are multiple left-hand-sides. * `includeNA()`: removed the "character" and "numeric" methods, replacing them with a default. In particular, this changes the default label of what used to be `includeNA.numeric()`. * `write2()`: - Changed the output to an `.Rmd` file instead of a `.md`. This shouldn't break anything, unless you're relying on the intermediate file. - Replaced the `keep.md=` argument with `keep.rmd=` (since we're not using `.md` files directly anymore). - Added the function `code.chunk()` to write executable code chunks to the `.Rmd`. # arsenal v1.5.0 * `tableby()` and `paired()`: - fixed a bug with specifying individual statistics for character and logical vectors. (#142) - `tableby()` and `paired()`: added a function (`notest()`) to prevent performing a test on an individual variable. (#144) - `summary.tableby()`: changed NA p-values to blanks. (#145) - `summary.tableby()`: added documentation on `bookdown`. (#147) - Wrote `padjust()`, an S3 wrapper for `p.adjust()`, which can also adjust `tableby()` (and hence `paired()`) objects. (#146) - `print.summary.tableby()`, `as.data.frame.summary.tableby()`: added `width=` and `min.split=` as formal arguments. - Fixed `medSurv()` which was calculating the median survival incorrectly, and removed `rangeTime()`, an ambiguous survival statistic. (#32) * `modelsum()`: - `summary.modelsum()`: added documentation on `bookdown`. (#147) - `print.summary.modelsum()`, `as.data.frame.summary.modelsum()`: added `width=` and `min.split=` as formal arguments. * `summary.freqlist()`: added documentation on `bookdown`. (#147) * `formulize()`: added support for names and calls. (#152, #153) # arsenal v1.4.0 **There are a few non-backwards-compatible updates.** Major changes: * `modelsum()`: - Added `family="ordinal"` to do ordinal logistic regression using `MASS::polr()`. (#130) - Added `family="negbin"` to do negative binomial regression using `MASS::glm.nb()`. (#15) - Added support for ordinal regressors and adjustment terms (by adding support for their associated contrasts). (#133) - Allowed for LaTeX formatting. NOTE: this changes (hence possibly breaking old code) the formatting behavior when specifying `text="html"`. (#123) * `summary.compare.data.frame()`: Added a small summary of the input data.frames as the first table. (#126) NOTE: this changes the structure and printed output of `summary()`! * `tableby()`: - Allowed for LaTeX formatting. NOTE: this changes (hence possibly breaking old code) the formatting behavior when specifying `text="html"`. (#123) - Added functionality to in-formula functions to allow the specification of `digits=` (etc.), `numeric.simplify=`, and `cat.simplify=` for a single variable. (#107, #134, #139) NB: this has the following breaking changes: * There is no longer a "name" element in the "tableby" object's x-specifications; instead it's now called "term" * An element for "variable", containing the variable name, was added to the "tableby" object's x-specifications. * An element for "control.list", recording format specifications, was added to the "tableby" object's x-specifications. * The output of `as.data.frame.tableby()` now reports only the variable name in the "variable" column when using internal statistical functions (like `anova()` and `chisq()`--it used to include the function call as well). * The output of `as.data.frame.tableby()` no longer includes category levels in the "term" column; instead, it contains the statistical function used (like `countpct()` and `count()`). Smaller changes: * `modelsum()`: - Added support for *calls* to the family functions, in case a different link function (for example) is required. - Properly propogated "term.name" to the `as.data.frame()` method. (#128) - Fixed formatting of error about unsupported families. - Removed "concordance" from the list of supported statistics for Poisson regression. This shouldn't break much code, as specifying "concordance" wouldn't have shown anything anyway. - Fixed a bug with formatting one-per-model p-values. (#140) * `tableby()`: - Added a warning for when by-variable contains empty string. (#121) - Properly propogated "term.name" to the `as.data.frame()` method. (#127) - Fixed an error that sometimes occured when using categorical statistics on numeric variables. (#137) - Added an argument to `tableby.control()` to simplify one-line numeric output. (#139) * `paired()`: - Added a warning for when by-variable contains empty string. (#121) - Added error to `na.paired("in.both")` when there are more than two time points. * `compare.data.frame()`: - Implemented `n.diff.obs()`. (#124) - Added an argument to `summary()` to allow for the display of attributes. (#125) - Fixed `summary()` to now return the object passed it. (#141) * Updated documentation where appropriate. # arsenal v1.3.0 **This is a mostly backwards-compatible update.** Major changes: * Implemented the function `paired()` for paired data, based on `tableby()`. This comes with a very light vignette. * `tableby()`: Change default for chi-square tests to `correct=FALSE`. Note that this only affects the 2 x 2 case. Smaller changes: * Added the a default method for label assignment (`labels<-`). (#118) * Update `formulize()` to handle non-syntactic names in the `data=` argument. (#105) * `tableby()`: - Implemented `is.tableby()` and `is.summary.tableby()`. (#112) - Changed how arguments are passed to stat tests. - Issue a warning if statistical tests are requested when there are fewer than two by-levels. (#108) - Fixed `trend()` and `anova()` to return an object instead of the object being invisible. - Implemented the stat functions `binomCI()` and `rowbinomCI()` for binomial confidence intervals. (#117) - `summary.tableby()`: ignore row.names when printing summary objects. - `summary.tableby()` and `as.data.frame.summary.tableby()`: added a `term.name=` argument. (#110) - `summary.tableby()`: pass `text="html"` to get better formatting in R shiny. (#114) - Implemented `Ops.tableby()` to compare tableby objects to a number (p-value). (#96) - Implemented `xtfrm.tableby()`, so that tableby objects can be sorted by p-value. (#96) - Implemented `length.tableby()`, so that `head()` and `tail()` also work. (#97) - Implemented `countcellpct()` for counts and cell percentages. (#106) * `modelsum()`: - Implemented `is.modelsum()` and `is.summary.modelsum()`. (#111) - `summary.modelsum()`: ignore row.names when printing summary objects. - `summary.modelsum()` and `as.data.frame.summary.modelsum()`: added a `term.name=` argument. (#109) - Allow for `weights=` and `na.action=`. (#99) - Fixed problem with column names which are prefixes of other column names. (#98) - Fixed problem with column labels overwriting categorical levels which also match. (#100) - `summary.modelsum()`: pass `text="html"` to get better formatting in R shiny. (#115) * `freqlist()`: - Implemented `is.freqlist()` and `is.summary.freqlist()`. (#113) - Fixed a problem with a column named "method". (#95) - `summary.freqlist()`: ignore row.names when printing summary objects. * Update documentation. # arsenal 1.2.0 * Implemented `write2()` methods for `"summary.tableby"`, `"summary.modelsum"`, and `"summary.freqlist"` objects. (#89, #90, #91) * Center-aligned `tableby()` grouping columns in the summary output. (#93) # arsenal 1.1.0 **This is a mostly backwards-compatible update.** Major changes: * `summary.freqlist()` now returns an object. `print.summary.freqlist()` prints the resulting object. (#76) Smaller changes: * `tableby()`: - Fixed a bug in `print.summary.tableby()` involving the lack of wrapping for long labels. (#59) - `as.data.frame.summary.tableby()` has been implemented, and `print.summary.tableby()` updated accordingly. (#60) - Fixed a bug with assigning labels for tableby objects when some value names are unmatched. (#64) - Fixed a bug in `print.summary.tableby()` with regards to knitting in R Markdown with plots immediately following. (#65) - Fixed a bug in `print.summary.tableby()` with regards to PDF output in bookdown. (#69) - Changed `tests.tableby()` to return a data.frame without factors. - Fixed a bug in `meansd()` when all inputs are NA. (#80) - Fixed a bug with `kwt()`, `anova()`, and `summary.tableby()` formatting when all inputs are NA. (#81) - Fixed a bug with survival statistics when all inputs are NA. (#82) - Fixed a bug with `logrank()` when all inputs are NA. (#83) - Fixed how arguments get passed to stats functions in `tableby()`. In particular, this affected the `times=` option. (#84) - Added `iqr()` as a tableby stat option. (#86) - Fixed quantile functions `q1q3()` and `medianq1q3()` for dates. (#87) * `modelsum()`: - Fixed a bug in `print.summary.modelsum()` involving the lack of wrapping for long labels. (#59) - Fixed a bug in `print.summary.modelsum()` with regards to knitting in R Markdown with plots immediately following. (#66) - Fixed a bug in `print.summary.modelsum()` with regards to PDF output in bookdown. (#70) - `as.data.frame.summary.modelsum()` has been implemented, and `print.summary.modelsum()` updated accordingly. (#74) * `freqlist()`: - Fixed a bug in `summary.freqlist()` with regards to knitting in R Markdown with plots immediately following. (#67) - Fixed a bug in `summary.freqlist()` with regards to PDF output in bookdown. (#71) * Other: - `includeNA()` now has dots, and the factor method gained a `first=` argument. (#62) - `includeNA()` also gained a numeric method, especially for use in `freqlist.formula()`. (#78) - Fixed a bug in `print.summary.compare.data.frame()` with regards to PDF output in bookdown. (#72) # arsenal 1.0.0 **This is a non-backwards-compatible update.** Major changes: * `freqlist()`: - `freqlist()` is now an S3 generic. (#35) - The first argument to `freqlist()` has changed from `tab=` to `object=`, for S3 consistency. (#35) - `freqlist.formula()` was implemented, piggybacking off of `stats::xtabs()`. (#35) - The `title=` argument was added to `summary.freqlist()`. Passing `caption=` through the dots to `knitr::kable()` will now throw an error. (#34) * `tableby()`: - `as.data.frame.tableby()` has been totally overhauled. It now uses list-columns to give exact values. - `summary.tableby()` has been totally overhauled. * Most arguments are no longer named, but passed through the dots. * It now returns an object, abusing in the process `as.data.frame.tableby()`. `print.summary.tableby()` prints the resulting object. (#8) * `print.summary.tableby()` now uses `knitr::kable()` to print results, instead of internal functions. As such, non-exported helper functions have all been removed. - The arguments to `tableby.control()` have changed. Warnings will be issued for using old arguments. * `nsmall=` has been removed. `digits=` takes its place. * `nsmall.pct=` and `digits.test=` have been renamed to `digits.pct=` and `digits.p=`, respectively. * There's now an option for count digits (`digits.count=`). * `format.p=` has been added, to turn on formatting of p-values. * `q1q3` is no longer a default continuous statistic. - NAs can be included in percents using `includeNA()`. (#57) - Some additional survival summary functions are now available. (#32) - It is now possible to report row-percents using `countrowpct()`. (#9) * `modelsum()`: - `modelsum()` has been totally overhauled: * It now uses `stats::model.frame()` and unevaluated calls instead of custom-creating data.frames. * It now allows for non-syntactic names (#44, #45). - `as.data.frame.modelsum()` has been totally overhauled. It now gives exact values instead of formatted values. - `summary.modelsum()` has been totally overhauled. * Most arguments are no longer named, but passed through the dots. * It now returns an object, abusing in the process `as.data.frame.modelsum()`. `print.summary.modelsum()` prints the resulting object. (#37) * `print.summary.modelsum()` now uses `knitr::kable()` to print results, instead of internal functions. As such, non-exported helper functions have all been removed. * `print.summary.modelsum()` now strips leading and trailing whitespace from labels to fix formatting with `text=FALSE`. (#48) * `labelTranslations=` no longer accepts labels for the statistics columns. Use `modelsum.control(stat.labels=)` for this instead. - The arguments to `modelsum.control()` have changed. Warnings will be issued for using old arguments. * `nsmall=` has been removed. `digits=` takes its place. * `nsmall.ratio=` and `digits.test=` have been renamed to `digits.ratio=` and `digits.p=`, respectively. * `format.p=` has been added, to turn off formatting of p-values. * `stat.labels=` has been added, to label the statistics columns. - `"[.modelsum"()` now has a named argument, and accepts character, numeric, and logical subscripts. Smaller changes: * `freqlist()`: - `freqlist()` will no longer issue a warning about using the deprecated `varnames=` argument. - `print.freqlist()` has been made slightly more concise. The only change to the printed output is making "variables" singular ("variable") when only one variable is present. * `tableby()`: - `tableby()` has also been made slightly more concise and easier to read. - A bug was fixed when trying to specify "stats" attributes for categorical variables. (#39) - A bug was fixed relating to unnamed passing of arguments for `medianrange()`. (#49) - `as.data.frame.tableby()` no longer breaks with date ranges. (#10) - `as.data.frame.tableby()` no longer breaks with both `count()` and `countpct()`. (#51) - `labels<-.tableby()` no longer breaks for unmatched variables. (#53) - `labels<-.tableby()` now accepts `NULL` to set all labels to NULL. (#52) - The function `Nmiss2()` is now exported for `tableby()`. Note that it is exactly the same as `Nmiss()`, but is interpreted differently in `tableby()`. * `modelsum()`: - `modelsum()` has been made slightly more concise. - "Nmiss2" has been added to the `modelsum()` object and no longer replaces "Nmiss". - `as.data.frame.modelsum()` no longer turns "<0.001" into `NA`. (#31) - `as.data.frame.modelsum()` no longer breaks if there are too many adjustment variables. (#12) - `summary.modelsum()` now has working labels for factors. (#13) - `"labels<-.modelsum"()` has been tweaked slightly. The results shouldn't change. - `print.modelsum()` has been fixed to show its y-variable. (#33) * Documentation and vignettes have been re-reviewed and updated where appropriate. * Tests have been updated to reflect major changes. # arsenal 0.6.1 This is a patch to fix an error appearing with R-devel. We anticipate releasing v1.0.0 soon, which will not be backwards-compatible. * Re-fix trailing whitespace problem in tableby. (#3) # arsenal 0.6.0 * Updated `freqlist()` to be more efficient. (#20) * `freqlist()` now allows named labels. * Fixed one-sided formula detection in `tableby()` when used with `formulize()`. (#21) * Changed environment of formula returned by `formulize()`. * Added variable-name subsetting to `[.tableby()`. One can now subset by logicals, numerics, or character vectors. * Fixed a bug in `tableby()` related to zero-length factor levels. (#22) * Fixed a bug in `tableby()` and `modelsum()` when calling them without loading the package. (#25) * Allowed `nsmall = ` and `digits = ` to be 0, for rounding to integers. (#23) * Added `yaml()` function to use with `write2()`. (#28) * Added the `yaml` package as a suggested package. * Added `write2()` methods for `compare.data.frame()` objects. (#27) * Updated documentation. # arsenal 0.5.0 * Code all now in GitHub; issues can now be submitted there. Checking is now performed automatically on Travis-CI. * Included documentation for getting a caption with `write2()` and `freqlist()`. (#16) * Fixed subsetting in `modelsum()`. (#14) * Fixed multiple class errors in `tableby()`. (#17) * Fixed subset dropping labels in `tableby()` and `modelsum()` with new function `keep.labels()` to make labels "stick". (#1) * Added a vignette discussing labels. * Add `compare.data.frame()`, with an accompanying vignette. # arsenal 0.4.2 * Updated `labelTranslation` documentation * Changed `format.translations` list to `defaultLabelTranslations()` function, and removed labels for sex and age. * Tweaked `labels<-.freqlist` to allow for list input. # arsenal 0.4.1 * Updated `tableby()` vignette. # arsenal 0.4.0 * Tweaked `freqlist()` to behave better with data.frame subsetting and the infamous `drop=TRUE`. * Added `dupLabels=` argument to `summary.freqlist()`. (#6) * Added a label for `medianq1q3()` in `tableby.control()`. (#4) * Changed the `...` vs. `control=` action in `tableby()` and `modelsum()` to fix which arguments are used over which other arguments. (#5) * Moved import `broom` and `stringr` to "Suggests", adding `magrittr` for piping. * Added piping to `write2*()` vignette. * Several `tableby()` fixes for counts and percents. * New options in `tableby.control()` to modify the statistical tests performed. (#2) * Fixed trailing whitespace issue in `tableby()` (#3) # arsenal 0.3.0 * The CRAN release of the locally stable 0.2.3. For NEWS on this version, see below. * Tweaked the DESCRIPTION to include `write2()`. # arsenal 0.2.3 * Fixed ordered stats in `tableby()`. * Fixed a problem with `as.data.frame.modelsum()` introduced in v0.2.1. # arsenal 0.2.2 * Added `count()` function for tableby stats. * Two problems with survival models in `modelsum()` have been resolved. # arsenal 0.2.1 * `write2.list()` has been implemented, allowing multiple tables output into a single document. `write2.verbatim()` has been implemented, allowing monospaced output. The vignette has been updated along with all documentation. * The `summary()` output for `tableby` and `modelsum` objects now prints an extra blank header line, for better use inside R Markdown code chunks. * Two bugs in `tableby()` were corrected. # arsenal 0.2.0 * Vignettes have been updated. * `write2()` is now exported and supports all output formats supported by `rmarkdown::render()`. There is now a vignette for it and the S3 methods have been expanded to handle more inputs, including `knitr::kable()`, `xtable::xtable()`, and `pander::pander_return()`. * Fixed a bug in `summary.modelsum()`. # arsenal 0.1.2 * `broom` and `stringr` have been moved to `imports` instead of `depends`. * Several minor tweaks to `freqlist` for better readability and performance. # arsenal 0.1.1 * The description and the title are more descriptive now, per request. # arsenal 0.1.0 * First release contains major functions `tableby()`, `modelsum()`, `freqlist()`, `formulize()`, and `write2...`. * Vignettes are included for `tableby()`, `modelsum()`, and `freqlist()`. arsenal/MD50000644000176200001440000002066314056520012012211 0ustar liggesusers00fc57a586d278a59ca7001119f46b7b *DESCRIPTION 366aff3414d3769de3fe49a6ec8ecbf7 *NAMESPACE 0bd1904d1388a921d018b7b7f39a67cb *NEWS.md 00b3c9c1fef30661fe7d4886cd5155ef *R/arsenal.R b931ad5809c77ff9624c25592afb6c80 *R/arsenal_table.R 59d0400128715ca191a6f4ef69eb7fc4 *R/as.data.frame.freqlist.R 70bd88f62f63729c07a5fc1ecf7116bb *R/as.data.frame.modelsum.R f7caba614a449218be4e6af68dcf004b *R/as.data.frame.tableby.R cf64d94a1c4cd1104df8884bdc11a9e5 *R/comparedf.R 17aa65b6fc3dff909fb8a837625d81c5 *R/comparedf.control.R 8bbb93a33169e1278f9622d21ccff350 *R/comparedf.internal.R b18e0882d0f1ed4f265615f83db1d0f6 *R/comparedf.tolerances.R fcf9ac38877e24e4c1fa3dc260998e7e *R/defunct.R 29ec82599966ed40f9019957353a72d3 *R/deprecated.R 32b0901543a8a232d1ad693054eaba53 *R/formulize.R 8a7ac1f97afb907409c9afe7a90395b5 *R/freq.control.R cfc2cec0f78f6f55e0a7d7ad82c86d39 *R/freqlist.R d45106b4e9384f73b902a7a517936643 *R/freqlist.internal.R c099012c15f125ddb70b5c3f775626e9 *R/internal.functions.R 8288971c4eef779b835ffd7eaa8042b0 *R/keep.labels.R 4f817aba0c6b02ee0eee33dec51690a4 *R/labels.R 310d0f18b36486d507146ac617eb00c1 *R/magic8.R f3bc182ae0dcd827a0d3d3e191db4828 *R/mdy.Date.R 551c162106f2f1b9a25b8ccc697d13cc *R/mockstudy.R 4be7b4a48e98e8b415c6af0b26a99506 *R/modelsum.R 50022d0af0fd81ff2db0bd02f38138f3 *R/modelsum.control.R 007cda86a45d1ebafe6c1fd86ab14bb6 *R/modelsum.families.R b0e4327ad6c8fb2a1c9d355616183415 *R/modelsum.internal.R d280d3f55b975402d897e7d03047b59d *R/na.operations.R d32519516b0454af38d04de735385213 *R/not.in.R e2458db48a5f1c367d36bf8cd890c549 *R/padjust.R bed06ca6b69970705afb0518b316f74a *R/paired.R 52e7ff674d099b0fd11534743ec0492c *R/paired.control.R 1692e0bf1e6245bdcdd255f3e2d7e8a0 *R/paired.internal.R 7216bcf08eecd8552952d258693c0316 *R/paired.stat.tests.R e1427edd79dc20bae932449d90fb5161 *R/release_questions.R c7ff88360b5b303ba25fe35599b1312e *R/selectall.R 9bd8c43599abcd802d96749ff3cfdd45 *R/summary.comparedf.R d588d11d3ed15aff92b71a3faceeec01 *R/summary.freqlist.R a8a5fa4ca064855b13ebef1cbd6c6f84 *R/summary.modelsum.R 2661143b04f86acad248557235d06882 *R/summary.tableby.R f5532807ca62e7337742edbe94907bc7 *R/tableby.R 541acf9cc93a204197bed71501814d4f *R/tableby.control.R bde4d2d99b5413ea6e46aec3b81d4957 *R/tableby.internal.R a99b9f34b40f0f8fa46e8e691ef431dc *R/tableby.stat.tests.R 3124d05e25213af6783741abdcc22be7 *R/tableby.stats.R a8199d9cbc7b5cc81a504f7450a2a65b *R/write2.R e249005a3d555b60cbc592ad65a4c057 *R/write2.default.R 95694e0917452475f545132ebacc6ecd *R/write2.internal.R c758b698e0c957f3004fd3113a10852e *R/write2specific.R 850b09b96881f5f15c6d0487121f740a *R/yaml.R 044550a0135e7924325db1f785d403bc *README.md 1732109edc956fcf4926236ff7992165 *build/vignette.rds 30cd2d9369ee4d94dd76dcbf5a215559 *data/mockstudy.RData 1ec3a30bcb8667c36653fa4882de4474 *inst/doc/comparedf.R acfca7782c03b9fb51e9811311a51150 *inst/doc/comparedf.Rmd 4b7fe5c39e727075de4d59a9bda588bb *inst/doc/comparedf.html 60b4c69c6150170b31cb42672e2abb0d *inst/doc/freqlist.R fabc612ba0ca6cbeaae1f50f39f49171 *inst/doc/freqlist.Rmd 7276c16721d2d65fb1829cb38e8561be *inst/doc/freqlist.html c59e9af04a835c727b8259455fe769ec *inst/doc/labels.R 4a383f50cc3f97e762a8eb438642faf0 *inst/doc/labels.Rmd 32115502fca4b709fdfea82aff4a0244 *inst/doc/labels.html dc07ca201d7e3ef30e9c840c1b8c5284 *inst/doc/modelsum.R 6c3af64e4530e397b0be7ba2950684f5 *inst/doc/modelsum.Rmd 589560cac724460fa9f3de18bc0bde5a *inst/doc/modelsum.html 5952422462e488184a0279fc950ce329 *inst/doc/paired.R 25b3a2b7399fd3f0b66cc2ad63baf113 *inst/doc/paired.Rmd cfa6d4461756aef5f55ce7305415b25d *inst/doc/paired.html a034e553196b858eee3cfea3fa73162e *inst/doc/tableby.R 6f0fe40700de6a1e1e76c0cb631d072d *inst/doc/tableby.Rmd 504d3597088e7c244ac9920032f8ec30 *inst/doc/tableby.html 71e7e56bb01fe2d9debf687c8a0114ff *inst/doc/write2.R 8d33b75060ec4e8c27466a79b85e617c *inst/doc/write2.Rmd dbb137fbd352adedf173252d37ebca45 *inst/doc/write2.html 24de5e45bd9a676927300962673534fd *man/NA.operations.Rd 475674b09206071649dbd8c6f9bf5523 *man/arsenal-defunct.Rd c4debf759ea8683ad63ab3712d9b9f26 *man/arsenal-deprecated.Rd 4d21261e996cf9ca9997e6e90f95acbf *man/arsenal.Rd a966c2c4b3d700c3d03c9f1bf655a567 *man/arsenal_table.Rd a4a8a304b0fa51d3cc672929d2a74991 *man/as.data.frame.freqlist.Rd 62711eb634b1814d49d3b3b5d07b0a8a *man/as.data.frame.modelsum.Rd eca39c96234fce19545e5c88c7bf2031 *man/as.data.frame.tableby.Rd a019595ffe9e5d8913748027c2c2d437 *man/comparedf.Rd c5b9b08b9e9df07f25b246a0f837147b *man/comparedf.control.Rd 3f4a1a200a7fccc9ec2dc1a1bd1973e8 *man/comparedf.tolerances.Rd 856aaa9db9d3dcde1cd8c402218b26bf *man/diffs.Rd a7c4a5bf6aba71aaf4d3a65434806c7f *man/figures/logo.png dad16bce322acc65d74d124bc830ab65 *man/formulize.Rd df524001e3a56beb780b069d84261fc6 *man/freq.control.Rd bad643ee1d5be4453aa692565e59e4cf *man/freqlist.Rd cc57de41e116e731abbb9751253d0624 *man/freqlist.internal.Rd 511e02b02b5788d3b8daf01ddfa912b5 *man/grapes-nin-grapes.Rd a4284c69120fcad98a851e1466e1c505 *man/internal.functions.Rd 71c0d28d46550b4fcc876a38d3ad4f30 *man/keep.labels.Rd 95045a8cf7451abf38558db7bb12c6c4 *man/labels.Rd 06d96a03bd7eb27621d7188470481597 *man/mdy.Date.Rd f030c577ebcb13219aee2f21d2556e7f *man/mockstudy.Rd 848a1c05a6ba7fc0562e34ce25764107 *man/modelsum.Rd e9b6775d8251955678e8cc1d17cfb2cf *man/modelsum.control.Rd 828791dee9838f898f4494d2e6e224cb *man/modelsum.family.Rd 0c61e87cc402fb0d6ce98e87873fb37c *man/modelsum.internal.Rd 32e6a21a61bc2634219e9c4ad31d9ffa *man/padjust.Rd 6abc70b7878f2a1de812e025524fc973 *man/paired.Rd 91c05999b7d4f95457f2d57b66fd5c50 *man/paired.control.Rd 358900dd4326ea913946830468c762d6 *man/paired.internal.Rd c0da2a53ae130f8419ab085e076988b3 *man/selectall.Rd ae670b101964fb1db3f1993c60691263 *man/summary.comparedf.Rd 4814e66a9130510e1392fc2484e9d6e7 *man/summary.freqlist.Rd 8b9d4c3e12dd4dfe2d840d6f3500ff50 *man/summary.modelsum.Rd b7d1d902f3cf720cff97f41efdf82021 *man/summary.tableby.Rd d6ba644b24eaa87581f2ecf138590e1e *man/tableby.Rd b97e0db7205c972ebac3dcff27b9de1f *man/tableby.control.Rd 926bec4f424b6e9fba4869d35f4b2538 *man/tableby.internal.Rd ce448a34b8db21732184ee1d9c3734cc *man/tableby.stats.Rd b0a3e4d83c1386cf215f8651d4403ec5 *man/tableby.stats.internal.Rd da2448fceb32aacc23f58b61c2820132 *man/write2.Rd e60171f65a62dbf0bc93159d953ad815 *man/write2.internal.Rd c7fc15db72ee5bc262c6b816ae565f48 *man/write2specific.Rd a0caade19c743fa6c1dfb2cbdb537a01 *man/yaml.Rd b86d96d2b720f7aeeb3b9412b87d67ef *tests/testthat.R 48bee3e500877f2e38183bac2f96c09e *tests/testthat/helper-data.R 0e1b2cd238f9c449a27ba0cb5abd358b *tests/testthat/helper-script.R 422b5e4f05f56ef1f659e4c98d318ef5 *tests/testthat/mdat.rds 4c4d3d02eaf28864a82806ad94bf6f3f *tests/testthat/test_comparedf.R cae8cbd85c7b385600f520d296f9d575 *tests/testthat/test_formulize.R 49990d4011033e5a8b942d0d94ff3536 *tests/testthat/test_freqlist.R c04c29dbdce4f382d34b30c0bbe57684 *tests/testthat/test_keep.labels.R de47e8a77a6ad7ac83d6c414ef092ab9 *tests/testthat/test_lhs_freqlist.R 977694d2ffde675dd9f6bb29ce105ff3 *tests/testthat/test_lhs_modelsum.R 463e72b1821c9f091bb2d98c0484807a *tests/testthat/test_lhs_tableby.R 9b982aae5c729554e7e908779edf6777 *tests/testthat/test_modelsum.R 553d1c8aa0786363e390de35e4208815 *tests/testthat/test_paired.R 434e81e3a5570a62204d0a3bd7aa9c51 *tests/testthat/test_tableby.R 35fa02cb7d038fcc67d916e82d2d707e *tests/testthat/test_write2.R b3794531e6505921de872121a0862b46 *tests/testthat/write2.char.html.Rmd 7aee1665f5ecbd9669f105288c266a47 *tests/testthat/write2.freqlist.doc.Rmd 5b9c1238b72c0e69c7d1ef3f654f426c *tests/testthat/write2.freqlist.html.Rmd 5407d67865420aa84c2b463ea2636c47 *tests/testthat/write2.lm.pdf.Rmd 54ece8d9820a82a439882fd31b03c01e *tests/testthat/write2.modelsum.html.Rmd b77d5e6c44e33120a7821954250bba19 *tests/testthat/write2.multititles.pdf.Rmd 43d7972424cdfaa0da368acf36722105 *tests/testthat/write2.mylist.pdf.Rmd 576f0f6562372acaa7a72477975487a9 *tests/testthat/write2.mylist2.doc.Rmd 1d5338db1710bc6206559b43f80af37d *tests/testthat/write2.mylists.pdf.Rmd 7e00a1500436dad5485ecf11e6519e37 *tests/testthat/write2.render.html.Rmd b3a16b1d2dcbf18530ce768fa0365e57 *tests/testthat/write2.tableby.html.Rmd 9fc0c6f51eb8e3f9c1b062bb49385eb7 *tests/testthat/write2.yaml.pdf.Rmd acfca7782c03b9fb51e9811311a51150 *vignettes/comparedf.Rmd fabc612ba0ca6cbeaae1f50f39f49171 *vignettes/freqlist.Rmd 4a383f50cc3f97e762a8eb438642faf0 *vignettes/labels.Rmd 6c3af64e4530e397b0be7ba2950684f5 *vignettes/modelsum.Rmd 25b3a2b7399fd3f0b66cc2ad63baf113 *vignettes/paired.Rmd 6f0fe40700de6a1e1e76c0cb631d072d *vignettes/tableby.Rmd 8d33b75060ec4e8c27466a79b85e617c *vignettes/write2.Rmd arsenal/inst/0000755000176200001440000000000014056514665012670 5ustar liggesusersarsenal/inst/doc/0000755000176200001440000000000014056514665013435 5ustar liggesusersarsenal/inst/doc/tableby.R0000644000176200001440000004633614056514662015213 0ustar liggesusers## ----echo = FALSE--------------------------------------------------------------------------------- options(width = 100) ge330 <- getRversion() >= "3.3.0" ## ---- load-data----------------------------------------------------------------------------------- library(arsenal) require(knitr) require(survival) data(mockstudy) ##load data dim(mockstudy) ##look at how many subjects and variables are in the dataset # help(mockstudy) ##learn more about the dataset and variables str(mockstudy) ##quick look at the data ## ---- simple1------------------------------------------------------------------------------------- tab1 <- tableby(arm ~ sex + age, data=mockstudy) ## ---- simple-text--------------------------------------------------------------------------------- summary(tab1, text=TRUE) ## ---- simple-markdown, results='asis'------------------------------------------------------------- summary(tab1) ## ------------------------------------------------------------------------------------------------- as.data.frame(tab1) ## ------------------------------------------------------------------------------------------------- ## base R frequency example tmp <- table(Gender=mockstudy$sex, "Study Arm"=mockstudy$arm) tmp # Note: The continuity correction is applied by default in R (not used in %table) chisq.test(tmp) ## base R numeric summary example tapply(mockstudy$age, mockstudy$arm, summary) summary(aov(age ~ arm, data=mockstudy)) ## ---- check-labels-------------------------------------------------------------------------------- ## Look at one variable's label attr(mockstudy$age,'label') ## See all the variables with a label unlist(lapply(mockstudy,'attr','label')) # Can also use labels(mockstudy) ## ---- add-label, results='asis'------------------------------------------------------------------- attr(mockstudy$sex,'label') <- 'Gender' tab1 <- tableby(arm ~ sex + age, data=mockstudy) summary(tab1) ## ---- results = 'asis'---------------------------------------------------------------------------- labels(mockstudy) <- c(age = 'Age, yrs', sex = "Gender") tab1 <- tableby(arm ~ sex + age, data=mockstudy) summary(tab1) ## ---- results='asis'------------------------------------------------------------------------------ mylabels <- list(sex = "SEX", age = "Age, yrs") summary(tab1, labelTranslations = mylabels) ## ---- assignlabels-------------------------------------------------------------------------------- labels(tab1) labels(tab1) <- c(arm="Treatment Assignment", age="Baseline Age (yrs)") labels(tab1) ## ---- results='asis'------------------------------------------------------------------------------ summary(tab1) ## ---- results='asis'------------------------------------------------------------------------------ mycontrols <- tableby.control(test=FALSE, total=FALSE, numeric.test="kwt", cat.test="chisq", numeric.stats=c("N", "median", "q1q3"), cat.stats=c("countpct"), stats.labels=list(N='Count', median='Median', q1q3='Q1,Q3')) tab2 <- tableby(arm ~ sex + age, data=mockstudy, control=mycontrols) summary(tab2) ## ---- results='asis'------------------------------------------------------------------------------ tab3 <- tableby(arm ~ sex + age, data=mockstudy, test=FALSE, total=FALSE, numeric.stats=c("median","q1q3"), numeric.test="kwt") summary(tab3) ## ---- testformula--------------------------------------------------------------------------------- tab.test <- tableby(arm ~ kwt(age) + anova(bmi) + notest(ast), data=mockstudy) tests(tab.test) ## ---- results='asis'------------------------------------------------------------------------------ summary(tab.test) ## ---- testsAndStats, results='asis'--------------------------------------------------------------- tab.test <- tableby(arm ~ kwt(ast, "Nmiss2","median") + anova(age, "N","mean") + notest(bmi, "Nmiss","median"), data=mockstudy) summary(tab.test) ## ---- simfe, results='asis'----------------------------------------------------------------------- set.seed(100) tab.catsim <- tableby(arm ~ sex + race, cat.test="fe", simulate.p.value=TRUE, B=500, data=mockstudy) tests(tab.catsim) ## ---- chisqcorrect, results='asis'---------------------------------------------------------------- cat.correct <- tableby(arm ~ sex + race, cat.test="chisq", subset = !grepl("^F", arm), data=mockstudy) tests(cat.correct) cat.nocorrect <- tableby(arm ~ sex + race, cat.test="chisq", subset = !grepl("^F", arm), chisq.correct=FALSE, data=mockstudy) tests(cat.nocorrect) ## ---- nobyvar, results='asis'--------------------------------------------------------------------- tab.noby <- tableby(~ bmi + sex + age, data=mockstudy) summary(tab.noby) ## ---- results="asis"------------------------------------------------------------------------------ summary(tab.test, pfootnote=TRUE) ## ------------------------------------------------------------------------------------------------- mockstudy$age.ordnew <- ordered(c("a",NA,as.character(mockstudy$age.ord[-(1:2)]))) table(mockstudy$age.ord, mockstudy$sex) table(mockstudy$age.ordnew, mockstudy$sex) class(mockstudy$age.ord) ## ---- results="asis", eval=requireNamespace("coin", quietly = TRUE)------------------------------- summary(tableby(sex ~ age.ordnew, data = mockstudy), pfootnote = TRUE) summary(tableby(sex ~ age.ord, data = mockstudy), pfootnote = TRUE) ## ---- eval=ge330---------------------------------------------------------------------------------- survfit(Surv(fu.time, fu.stat)~sex, data=mockstudy) survdiff(Surv(fu.time, fu.stat)~sex, data=mockstudy) ## ---- results='asis'------------------------------------------------------------------------------ summary(tableby(sex ~ Surv(fu.time, fu.stat), data=mockstudy)) ## ---- eval=ge330---------------------------------------------------------------------------------- summary(survfit(Surv(fu.time/365.25, fu.stat)~sex, data=mockstudy), times=1:5) ## ---- results='asis', eval=ge330------------------------------------------------------------------ summary(tableby(sex ~ Surv(fu.time/365.25, fu.stat), data=mockstudy, times=1:5, surv.stats=c("NeventsSurv","NriskSurv"))) ## ---- results='asis'------------------------------------------------------------------------------ set.seed(100) N <- nrow(mockstudy) mockstudy$dtentry <- mdy.Date(month=sample(1:12,N,replace=T), day=sample(1:29,N,replace=T), year=sample(2005:2009,N,replace=T)) summary(tableby(sex ~ dtentry, data=mockstudy)) ## ---- results='asis'------------------------------------------------------------------------------ ## create a vector specifying the variable names myvars <- names(mockstudy) ## select the 8th through the last variables ## paste them together, separated by the + sign RHS <- paste(myvars[8:10], collapse="+") RHS ## create a formula using the as.formula function as.formula(paste('arm ~ ', RHS)) ## use the formula in the tableby function summary(tableby(as.formula(paste('arm ~', RHS)), data=mockstudy)) ## ---- results='asis'------------------------------------------------------------------------------ ## The formulize function does the paste and as.formula steps tmp <- formulize('arm',myvars[8:10]) tmp ## More complex formulas could also be written using formulize tmp2 <- formulize('arm',c('ps','hgb^2','bmi')) ## use the formula in the tableby function summary(tableby(tmp, data=mockstudy)) ## ----results='asis'------------------------------------------------------------------------------- varlist1 <- c('age','sex','hgb') varlist2 <- paste0("anova(", c('bmi','alk.phos','ast'), ", 'meansd')") summary(tableby(formulize("arm", c(varlist1, varlist2)), data = mockstudy, numeric.test = "kwt"), pfootnote = TRUE) ## ------------------------------------------------------------------------------------------------- newdata <- subset(mockstudy, subset=age>50 & arm=='F: FOLFOX', select = c(sex,ps:bmi)) dim(mockstudy) table(mockstudy$arm) dim(newdata) names(newdata) ## ---- results='asis'------------------------------------------------------------------------------ summary(tableby(sex ~ ., data=newdata)) ## ---- results='asis'------------------------------------------------------------------------------ summary(tableby(sex ~ ps + hgb + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy)) ## ------------------------------------------------------------------------------------------------- ## create a variable combining the levels of mdquality.s and sex with(mockstudy, table(interaction(mdquality.s,sex))) ## ---- results='asis'------------------------------------------------------------------------------ summary(tableby(arm ~ interaction(mdquality.s,sex), data=mockstudy)) ## ---- results='asis'------------------------------------------------------------------------------ ## create a new grouping variable with combined levels of arm and sex summary(tableby(interaction(mdquality.s, sex) ~ age + bmi, data=mockstudy, subset=arm=="F: FOLFOX")) ## ---- maketrans, results='asis'------------------------------------------------------------------- trans <- tableby(arm ~ I(age/10) + log(bmi) + factor(mdquality.s, levels=0:1, labels=c('N','Y')), data=mockstudy) summary(trans) ## ---- assignlabels2------------------------------------------------------------------------------- labels(trans) labels(trans)[2:4] <- c('Age per 10 yrs', 'log(BMI)', 'MD Quality') labels(trans) ## ---- transsummary, results='asis'---------------------------------------------------------------- summary(trans) ## ---- results='asis'------------------------------------------------------------------------------ class(mockstudy$mdquality.s) summary(tableby(arm~mdquality.s, data=mockstudy)) ## ---- results='asis'------------------------------------------------------------------------------ summary(tableby(arm ~ chisq(mdquality.s, "Nmiss","countpct"), data=mockstudy)) ## ---- results='asis'------------------------------------------------------------------------------ mytab <- tableby(arm ~ sex + alk.phos + age, data=mockstudy) mytab2 <- mytab[c('age','sex','alk.phos')] summary(mytab2) summary(mytab[c('age','sex')], digits = 2) summary(mytab[c(3,1)], digits = 3) summary(sort(mytab, decreasing = TRUE)) summary(mytab[mytab < 0.5]) head(mytab, 1) # can also use tail() ## ---- results="asis"------------------------------------------------------------------------------ ## demographics tab1 <- tableby(arm ~ sex + age, data=mockstudy, control=tableby.control(numeric.stats=c("Nmiss","meansd"), total=FALSE)) ## lab data tab2 <- tableby(arm ~ hgb + alk.phos, data=mockstudy, control=tableby.control(numeric.stats=c("Nmiss","median","q1q3"), numeric.test="kwt", total=FALSE)) tab12 <- merge(tab1, tab2) class(tab12) summary(tab12) ## ---- results='asis'------------------------------------------------------------------------------ summary(merge( tableby(sex ~ age, data = mockstudy), tableby(arm ~ bmi, data = mockstudy), all = TRUE )) ## ---- results='asis'------------------------------------------------------------------------------ t1 <- tableby(arm ~ sex + age, data=mockstudy) summary(t1, title='Demographics') ## ---- results='asis'------------------------------------------------------------------------------ summary(tableby(list(arm, sex) ~ age, data = mockstudy), title = c("arm table", "sex table")) ## ------------------------------------------------------------------------------------------------- ## look at how many missing values there are for each variable apply(is.na(mockstudy),2,sum) ## ---- results='asis'------------------------------------------------------------------------------ ## Show how many subjects have each variable (non-missing) summary(tableby(sex ~ ast + age, data=mockstudy, control=tableby.control(numeric.stats=c("N","median"), total=FALSE))) ## Always list the number of missing values summary(tableby(sex ~ ast + age, data=mockstudy, control=tableby.control(numeric.stats=c("Nmiss2","median"), total=FALSE))) ## Only show the missing values if there are some (default) summary(tableby(sex ~ ast + age, data=mockstudy, control=tableby.control(numeric.stats=c("Nmiss","mean"),total=FALSE))) ## Don't show N at all summary(tableby(sex ~ ast + age, data=mockstudy, control=tableby.control(numeric.stats=c("mean"),total=FALSE))) ## ---- results = 'asis'---------------------------------------------------------------------------- mockstudy$ps.cat <- factor(mockstudy$ps) attr(mockstudy$ps.cat, "label") <- "ps" summary(tableby(sex ~ includeNA(ps.cat), data = mockstudy, cat.stats = "countpct")) ## ---- results='asis'------------------------------------------------------------------------------ summary(tableby(arm ~ sex + age + fu.time, data=mockstudy), digits=4, digits.p=2, digits.pct=1) ## ----results='asis'------------------------------------------------------------------------------- summary(tableby(arm ~ chisq(sex, digits.pct=1) + anova(age, digits=4) + anova(fu.time, digits = 1), data=mockstudy)) ## ---- results='asis'------------------------------------------------------------------------------ trim10 <- function(x, weights=rep(1,length(x)), ...){ mean(x, trim=.1, ...) } summary(tableby(sex ~ hgb, data=mockstudy, control=tableby.control(numeric.stats=c("Nmiss","trim10"), numeric.test="kwt", stats.labels=list(Nmiss='Missing values', trim10="Trimmed Mean, 10%")))) ## ---- results='asis'------------------------------------------------------------------------------ trim510comma <- function(x, weights=rep(1,length(x)), ...){ tmp <- c(mean(x, trim = 0.05, ...), mean(x, trim = 0.1, ...)) as.tbstat(tmp, sep = ", ") } trim510bracket <- function(x, weights=rep(1,length(x)), ...){ tmp <- c(mean(x, trim = 0.05, ...), mean(x, trim = 0.1, ...)) as.tbstat(tmp, sep = " ", parens = c("[", "]")) } summary(tableby(sex ~ hgb, data=mockstudy, numeric.stats=c("Nmiss", "trim510comma"), test = FALSE)) summary(tableby(sex ~ hgb, data=mockstudy, numeric.stats=c("Nmiss", "trim510bracket"), test = FALSE)) ## ----results='asis'------------------------------------------------------------------------------- trim10pct <- function(x, weights=rep(1,length(x)), ...){ tmp <- mean(x, trim = 0.05, ...) as.countpct(c(tmp, 10), sep = " ", parens = c("(", ")"), which.count = 0, which.pct = 2, pct = "%") } summary(tableby(sex ~ hgb, data=mockstudy, numeric.stats=c("Nmiss", "trim10pct"), digits = 2, digits.pct = 0, test = FALSE)) ## ------------------------------------------------------------------------------------------------- ##create fake group that is not balanced by age/sex set.seed(200) mockstudy$fake_arm <- ifelse(mockstudy$age>60 & mockstudy$sex=='Female',sample(c('A','B'),replace=T, prob=c(.2,.8)), sample(c('A','B'),replace=T, prob=c(.8,.4))) mockstudy$agegp <- cut(mockstudy$age, breaks=c(18,50,60,70,90), right=FALSE) ## create weights based on agegp and sex distribution tab1 <- with(mockstudy,table(agegp, sex)) tab2 <- with(mockstudy, table(agegp, sex, fake_arm)) tab2 gpwts <- rep(tab1, length(unique(mockstudy$fake_arm)))/tab2 gpwts[gpwts>50] <- 30 ## apply weights to subjects index <- with(mockstudy, cbind(as.numeric(agegp), as.numeric(sex), as.numeric(as.factor(fake_arm)))) mockstudy$wts <- gpwts[index] ## show weights by treatment arm group tapply(mockstudy$wts,mockstudy$fake_arm, summary) ## ---- results='asis', eval=ge330------------------------------------------------------------------ orig <- tableby(fake_arm ~ age + sex + Surv(fu.time/365, fu.stat), data=mockstudy, test=FALSE) summary(orig, title='No Case Weights used') tab1 <- tableby(fake_arm ~ age + sex + Surv(fu.time/365, fu.stat), data=mockstudy, weights=wts) summary(tab1, title='Case Weights used') ## ---- results='asis', eval=ge330------------------------------------------------------------------ mypval <- data.frame( byvar = "fake_arm", variable = c('age','sex','Surv(fu.time/365, fu.stat)'), adj.pvalue = c(.953,.811,.01), method = c('Age/Sex adjusted model results') ) tab2 <- modpval.tableby(tab1, mypval, use.pname=TRUE) summary(tab2, title='Case Weights used, p-values added', pfootnote=TRUE) ## ---- results='asis'------------------------------------------------------------------------------ table2 <- tableby(arm~sex + factor(mdquality.s), data=mockstudy, cat.simplify=TRUE) summary(table2, labelTranslations=c(sex="Female", "factor(mdquality.s)"="MD Quality")) ## ----results='asis'------------------------------------------------------------------------------- summary(tableby(arm ~ age + ast, data = mockstudy, numeric.simplify=TRUE, numeric.stats=c("Nmiss", "meansd"))) ## ----results='asis'------------------------------------------------------------------------------- summary(tableby(arm ~ anova(age, "meansd", numeric.simplify=TRUE) + chisq(sex, cat.simplify=TRUE), data = mockstudy)) ## ----results='asis'------------------------------------------------------------------------------- summary(tableby(arm ~ sex, cat.simplify = "label", data = mockstudy)) ## ------------------------------------------------------------------------------------------------- tab1 <- summary(tableby(arm~sex+age, data=mockstudy), text = NULL) as.data.frame(tab1) # write.csv(tab1, '/my/path/here/my_table.csv') ## ----eval = FALSE--------------------------------------------------------------------------------- # ## write to an HTML document # tab1 <- tableby(arm ~ sex + age, data=mockstudy) # write2html(tab1, "~/trash.html") # # ## write to a Word document # write2word(tab1, "~/trash.doc", title="My table in Word") ## ----eval=FALSE----------------------------------------------------------------------------------- # # A standalone shiny app # library(shiny) # library(arsenal) # data(mockstudy) # # shinyApp( # ui = fluidPage(tableOutput("table")), # server = function(input, output) { # output$table <- renderTable({ # as.data.frame(summary(tableby(sex ~ age, data = mockstudy), text = "html")) # }, sanitize.text.function = function(x) x) # } # ) ## ----eval=FALSE----------------------------------------------------------------------------------- # summary(tableby(sex ~ age, data = mockstudy), title="(\\#tab:mytableby) Caption here") ## ----results='asis'------------------------------------------------------------------------------- tab <- summary(tableby(sex ~ age + fu.time + bmi + mdquality.s, data = mockstudy)) tab padjust(tab, method = "bonferroni") ## ----results='asis'------------------------------------------------------------------------------- summary(tableby(list(sex, mdquality.s, ps) ~ age + bmi, data = mockstudy)) ## ----results='asis'------------------------------------------------------------------------------- summary(tableby(list(sex, mdquality.s, ps) ~ age + bmi, data = mockstudy), term.name = TRUE) ## ----results='asis'------------------------------------------------------------------------------- summary(tableby(list(sex, ps) ~ age + bmi, strata = arm, data = mockstudy)) ## ------------------------------------------------------------------------------------------------- args(tableby.control) ## ------------------------------------------------------------------------------------------------- args(arsenal:::summary.tableby) arsenal/inst/doc/paired.Rmd0000644000176200001440000000660013656527336015353 0ustar liggesusers--- title: "The paired function" author: "Ethan Heinzen, Beth Atkinson, Jason Sinnwell" output: rmarkdown::html_vignette: toc: yes toc_depth: 3 vignette: | %\VignetteIndexEntry{The paired function} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r echo = FALSE} options(width = 100) ``` # Introduction Another one of the most common tables in medical literature includes summary statistics for a set of variables paired across two time points. Locally at Mayo, the SAS macro `%paired` was written to create summary tables with a single call. With the increasing interest in R, we have developed the function `paired()` to create similar tables within the R environment. This vignette is light on purpose; `paired()` piggybacks off of tableby, so most documentation there applies here, too. # Simple Example The first step when using the `paired()` function is to load the `arsenal` package. We can't use `mockstudy` here because we need a dataset with paired observations, so we'll create our own dataset. ```{r, load-data} library(arsenal) dat <- data.frame( tp = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)), id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6), Cat = c("A", "A", "A", "B", "B", "B", "B", "A", NA, "B"), Fac = factor(c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A")), Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA), Ord = ordered(c("I", "II", "II", "III", "III", "III", "I", "III", "II", "I")), Lgl = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE), Dat = as.Date("2018-05-01") + c(1, 1, 2, 2, 3, 4, 5, 6, 3, 4), stringsAsFactors = FALSE ) ``` To create a simple table stratified by time point, use a `formula=` statement to specify the variables that you want summarized and the `id=` argument to specify the paired observations. ```{r results = 'asis'} p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE) summary(p) ``` The third column shows the difference between time point 1 and time point 2. For categorical variables, it reports the percent of observations from time point 1 which changed in time point 2. # NAs Note that by default, observations which do not have both timepoints are removed. This is easily changed using the `na.action = na.paired("")` argument. For example: ```{r results = 'asis'} p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE, na.action = na.paired("fill")) summary(p) ``` For more details, see the help page for `na.paired()`. # Available Function Options ## Testing options The tests used to calculate p-values differ by the variable type, but can be specified explicitly in the formula statement or in the control function. The following tests are accepted: * `paired.t`: A paired t-test. * `mcnemar`: McNemar's test. * `signed.rank`: the signed-rank test. * `sign.test`: the sign test. * `notest`: Don't perform a test. ## `paired.control` settings A quick way to see what arguments are possible to utilize in a function is to use the `args()` command. Settings involving the number of digits can be set in `paired.control` or in `summary.tableby`. ```{r} args(paired.control) ``` ## `summary.tableby` settings Since the "paired" object inherits "tableby", the `summary.tableby` function is what's actually used to format and print the table. ```{r} args(arsenal:::summary.tableby) ``` arsenal/inst/doc/freqlist.Rmd0000644000176200001440000002277213656527336015750 0ustar liggesusers--- title: "The freqlist function" author: "Tina Gunderson and Ethan Heinzen" output: rmarkdown::html_vignette: toc: yes toc_depth: 3 vignette: | %\VignetteIndexEntry{The freqlist function} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, tidy.opts=list(width.cutoff=80), tidy=TRUE, comment=NA) options(width=80, max.print=1000) ``` # Overview `freqlist()` is a function meant to produce output similar to SAS's `PROC FREQ` procedure when using the `/list` option of the `TABLE` statement. `freqlist()` provides options for handling missing or sparse data and can provide cumulative counts and percentages based on subgroups. It depends on the `knitr` package for printing. ```{r message = FALSE} require(arsenal) ``` ## Sample dataset For our examples, we'll load the `mockstudy` data included with this package and use it to create a basic table. Because they have fewer levels, for brevity, we'll use the variables arm, sex, and mdquality.s to create the example table. We'll retain NAs in the table creation. See the appendix for notes regarding default NA handling and other useful information regarding tables in R. ```{r loading.data} # load the data data(mockstudy) # retain NAs when creating the table using the useNA argument tab.ex <- table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA="ifany") ``` # The `freqlist` object The `freqlist()` function is an S3 generic (with methods for tables and formulas) which returns an object of class `"freqlist"`. ```{r console.output} example1 <- freqlist(tab.ex) str(example1) # view the data frame portion of freqlist output head(as.data.frame(example1)) ## or use as.data.frame(example1) ``` # Basic output using `summary()` The `summary` method for `freqlist()` relies on the `kable()` function (in the `knitr` package) for printing. `knitr::kable()` converts the output to markdown which can be printed in the console or easily rendered in Word, PDF, or HTML documents. Note that you must supply `results="asis"` to properly format the markdown output. ```{r, results = 'asis'} summary(example1) ``` You can print a title for the table using the `title=` argument. ```{r, results = 'asis'} summary(example1, title="Basic freqlist output") ``` You can also easily pull out the `freqlist` data frame for more complicated formatting or manipulation (e.g. with another function such as `xtable()` or `pander()`) using `as.data.frame(summary())`: ```{r} head(as.data.frame(summary(example1))) ``` # Using a formula with `freqlist` Instead of passing a pre-computed table to `freqlist()`, you can instead pass a formula, which will be in turn passed to the `xtabs()` function. Additional `freqlist()` arguments are passed through the `...` to the `freqlist()` table method. Note that `freqlist()` sets the `addNA=TRUE` argument by default: ```{r results='asis'} summary(freqlist(~ arm + sex + mdquality.s, data = mockstudy)) ``` One can also set NAs to an explicit value using `includeNA()`. ```{r results='asis'} summary(freqlist(~ arm + sex + includeNA(mdquality.s, "Missing"), data = mockstudy)) ``` In fact, since `xtabs()` allows for left-hand-side weights, so does `freqlist()`! ```{r results='asis'} mockstudy$weights <- c(10000, rep(1, nrow(mockstudy) - 1)) summary(freqlist(weights ~ arm + sex + addNA(mdquality.s), data = mockstudy)) ``` You can also specify multiple weights: ```{r results='asis'} mockstudy$weights2 <- c(rep(1, nrow(mockstudy) - 1), 10000) summary(freqlist(list(weights, weights2) ~ arm + sex + addNA(mdquality.s), data = mockstudy)) ``` # Rounding percentage digits or changing variable names for printing The `digits.pct=` argument takes a single numeric value and controls the number of digits of percentages in the output. The `digits.count=` argument takes a similar argument and controls the number of digits of the count columns. The `labelTranslations=` argument is a named character vector or list. Both options are applied in the following example. ```{r labelTranslations, results = 'asis'} example2 <- freqlist(tab.ex, labelTranslations = c(arm = "Treatment Arm", sex = "Gender", mdquality.s = "LASA QOL"), digits.pct = 1, digits.count = 1) summary(example2) ``` # Additional examples ## Including combinations with frequencies of zero The `sparse=` argument takes a single logical value as input. The default option is `FALSE`. If set to `TRUE`, the sparse option will include combinations with frequencies of zero in the list of results. As our initial table did not have any such levels, we create a second table to use in our example. ```{r sparse, results = 'asis'} summary(freqlist(~ race + sex + arm, data = mockstudy, sparse = TRUE, digits.pct=1)) ``` ## Options for NA handling The various `na.options=` allow you to include or exclude data with missing values for one or more factor levels in the counts and percentages, as well as show the missing data but exclude it from the cumulative counts and percentages. The default option is to include all combinations with missing values. ```{r na.options, results = 'asis'} summary(freqlist(tab.ex, na.options="include")) summary(freqlist(tab.ex, na.options="showexclude")) summary(freqlist(tab.ex, na.options="remove")) ``` ## Frequency counts and percentages subset by factor levels The `strata=` argument internally subsets the data by the specified factor prior to calculating cumulative counts and percentages. By default, when used each subset will print in a separate table. Using the `single = TRUE` option when printing will collapse the subsetted result into a single table. ```{r freq.counts, results='asis'} example3 <- freqlist(tab.ex, strata = c("arm","sex")) summary(example3) #using the single = TRUE argument will collapse results into a single table for printing summary(example3, single = TRUE) ``` ## Show only the "n" most common combinations in each table (`head()` and `sort()`) You can now sort `freqlist()` objects, and, by taking the `head()` of the summary, output the most common frequencies. This looks the prettiest with `dupLabels=TRUE`. ```{r} head(summary(sort(example1, decreasing = TRUE), dupLabels = TRUE)) ``` ## Change labels on the fly ```{r changelabs, results = 'asis'} labs <- c(arm = "Arm", sex = "Sex", mdquality.s = "QOL", freqPercent = "%") labels(example1) <- labs summary(example1) ``` You can also supply `labelTranslations=` to `summary()`. ```{r, results = 'asis'} summary(example1, labelTranslations = labs) ``` ## Using `xtable()` to format and print `freqlist()` results Fair warning: `xtable()` has kind of a steep learning curve. These examples are given without explanation, for more advanced users. ```{r results='asis'} require(xtable) # set up custom function for xtable text italic <- function(x) paste0('', x, '') xftbl <- xtable(as.data.frame(summary(example1)), caption = "xtable formatted output of freqlist data frame", align="|r|r|r|r|c|c|c|r|") # change the column names names(xftbl)[1:3] <- c("Arm", "Gender", "LASA QOL") print(xftbl, sanitize.colnames.function = italic, include.rownames = FALSE, type = "html", comment = FALSE) ``` ## Use `freqlist` in bookdown Since the backbone of `freqlist()` is `knitr::kable()`, tables still render well in bookdown. However, `print.summary.freqlist()` doesn't use the `caption=` argument of `kable()`, so some tables may not have a properly numbered caption. To fix this, use the method described [on the bookdown site](https://bookdown.org/yihui/bookdown/tables.html) to give the table a tag/ID. ```{r eval=FALSE} summary(freqlist(~ sex + age, data = mockstudy), title="(\\#tab:mytableby) Caption here") ``` # Appendix: Notes regarding table options in R ## NAs There are several widely used options for basic tables in R. The `table()` function in base R is probably the most common; by default it excludes NA values. You can change NA handling in `base::table()` using the `useNA=` or `exclude=` arguments. ```{r} # base table default removes NAs tab.d1 <- base::table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA="ifany") tab.d1 ``` `xtabs()` is similar to `table()`, but uses a formula-based syntax. However, NAs must be explicitly added to each factor using the `addNA()` function or using the argument `addNA = TRUE`. ```{r} # without specifying addNA tab.d2 <- xtabs(formula = ~ arm + sex + mdquality.s, data = mockstudy) tab.d2 # now with addNA tab.d3 <- xtabs(~ arm + sex + addNA(mdquality.s), data = mockstudy) tab.d3 ``` Since the formula method of `freqlist()` uses `xtabs()`, NAs should be treated in the same way. `includeNA()` can also be helpful here for setting explicit NA values. ## Table dimname names (dnn) Supplying a data.frame to the `table()` function without giving columns individually will create a contingency table using all variables in the data.frame. However, if the columns of a data.frame or matrix are supplied separately (i.e., as vectors), column names will not be preserved. ```{r} # providing variables separately (as vectors) drops column names table(mockstudy$arm, mockstudy$sex, mockstudy$mdquality.s) ``` If desired, you can use the `dnn=` argument to pass variable names. ```{r} # add the column name labels back using dnn option in base::table table(mockstudy$arm, mockstudy$sex, mockstudy$mdquality.s, dnn=c("Arm", "Sex", "QOL")) ``` You can also name the arguments to `table()`: ```{r} table(Arm = mockstudy$arm, Sex = mockstudy$sex, QOL = mockstudy$mdquality.s) ``` If using `freqlist()`, you can provide the labels directly to `freqlist()` or to `summary()` using `labelTranslations=`. arsenal/inst/doc/tableby.Rmd0000755000176200001440000010606514045272104015521 0ustar liggesusers--- title: "The tableby function" author: "Beth Atkinson, Ethan Heinzen, Jason Sinnwell, Shannon McDonnell and Greg Dougherty" output: rmarkdown::html_vignette: toc: yes toc_depth: 3 vignette: | %\VignetteIndexEntry{The tableby function} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r echo = FALSE} options(width = 100) ge330 <- getRversion() >= "3.3.0" ``` # Introduction One of the most common tables in medical literature includes summary statistics for a set of variables, often stratified by some group (e.g. treatment arm). Locally at Mayo, the SAS macros `%table` and `%summary` were written to create summary tables with a single call. With the increasing interest in R, we have developed the function `tableby` to create similar tables within the R environment. In developing the `tableby()` function, the goal was to bring the best features of these macros into an R function. However, the task was not simply to duplicate all the functionality, but rather to make use of R's strengths (modeling, method dispersion, flexibility in function definition and output format) and make a tool that fits the needs of R users. Additionally, the results needed to fit within the general reproducible research framework so the tables could be displayed within an R markdown report. This report provides step-by-step directions for using the functions associated with `tableby()`. All functions presented here are available within the `arsenal` package. An assumption is made that users are somewhat familiar with R Markdown documents. For those who are new to the topic, a good initial resource is available at [rmarkdown.rstudio.com](https://rmarkdown.rstudio.com/). # Simple Example The first step when using the `tableby` function is to load the `arsenal` package. All the examples in this report use a dataset called `mockstudy` made available by Paul Novotny which includes a variety of types of variables (character, numeric, factor, ordered factor, survival) to use as examples. ```{r, load-data} library(arsenal) require(knitr) require(survival) data(mockstudy) ##load data dim(mockstudy) ##look at how many subjects and variables are in the dataset # help(mockstudy) ##learn more about the dataset and variables str(mockstudy) ##quick look at the data ``` To create a simple table stratified by treatment arm, use a formula statement to specify the variables that you want summarized. The example below uses age (a continuous variable) and sex (a factor). ```{r, simple1} tab1 <- tableby(arm ~ sex + age, data=mockstudy) ``` If you want to take a quick look at the table, you can use `summary()` on your tableby object and the table will print out as text in your R console window. If you use `summary()` without any options you will see a number of $\ $ statements which translates to "space" in HTML. ## Pretty text version of table If you want a nicer version in your console window then add the `text=TRUE` option. ```{r, simple-text} summary(tab1, text=TRUE) ``` ## Pretty Rmarkdown version of table In order for the report to look nice within an R markdown (knitr) report, you just need to specify `results="asis"` when creating the R chunk. This changes the layout slightly (compresses it) and bolds the variable names. ```{r, simple-markdown, results='asis'} summary(tab1) ``` ## Data frame version of table If you want a data.frame version, simply use `as.data.frame`. ```{r} as.data.frame(tab1) ``` ## Summaries using standard R code ```{r} ## base R frequency example tmp <- table(Gender=mockstudy$sex, "Study Arm"=mockstudy$arm) tmp # Note: The continuity correction is applied by default in R (not used in %table) chisq.test(tmp) ## base R numeric summary example tapply(mockstudy$age, mockstudy$arm, summary) summary(aov(age ~ arm, data=mockstudy)) ``` # Modifying Output ## Add labels In the above example, age is shown with a label (Age in Years), but sex is listed "as is" with lower case letters. This is because the data was created in SAS and in the SAS dataset, age had a label but sex did not. The label is stored as an attribute within R. ```{r, check-labels} ## Look at one variable's label attr(mockstudy$age,'label') ## See all the variables with a label unlist(lapply(mockstudy,'attr','label')) # Can also use labels(mockstudy) ``` If you want to add labels to other variables, there are a couple of options. First, you could add labels to the variables in your dataset. ```{r, add-label, results='asis'} attr(mockstudy$sex,'label') <- 'Gender' tab1 <- tableby(arm ~ sex + age, data=mockstudy) summary(tab1) ``` You can also use the built-in `data.frame` method for `labels<-`: ```{r, results = 'asis'} labels(mockstudy) <- c(age = 'Age, yrs', sex = "Gender") tab1 <- tableby(arm ~ sex + age, data=mockstudy) summary(tab1) ``` Another option is to add labels after you have created the table ```{r, results='asis'} mylabels <- list(sex = "SEX", age = "Age, yrs") summary(tab1, labelTranslations = mylabels) ``` Alternatively, you can check the variable labels and manipulate them with a function called `labels`, which works on the `tableby` object. ```{r, assignlabels} labels(tab1) labels(tab1) <- c(arm="Treatment Assignment", age="Baseline Age (yrs)") labels(tab1) ``` ```{r, results='asis'} summary(tab1) ``` ## Change summary statistics globally Currently the default behavior is to summarize continuous variables with: Number of missing values, Mean (SD), 25th - 75th quantiles, and Minimum-Maximum values with an ANOVA (t-test with equal variances) p-value. For categorical variables the default is to show: Number of missing values and count (column percent) with a chi-square p-value. This behavior can be modified using the tableby.control function. In fact, you can save your standard settings and use that for future tables. Note that `test=FALSE` and `total=FALSE` results in the total column and p-value column not being printed. ```{r, results='asis'} mycontrols <- tableby.control(test=FALSE, total=FALSE, numeric.test="kwt", cat.test="chisq", numeric.stats=c("N", "median", "q1q3"), cat.stats=c("countpct"), stats.labels=list(N='Count', median='Median', q1q3='Q1,Q3')) tab2 <- tableby(arm ~ sex + age, data=mockstudy, control=mycontrols) summary(tab2) ``` You can also change these settings directly in the tableby call. ```{r, results='asis'} tab3 <- tableby(arm ~ sex + age, data=mockstudy, test=FALSE, total=FALSE, numeric.stats=c("median","q1q3"), numeric.test="kwt") summary(tab3) ``` ## Change summary statistics within the formula In addition to modifying summary options globally, it is possible to modify the test and summary statistics for specific variables within the formula statement. For example, both the kwt (Kruskal-Wallis rank-based) and anova (asymptotic analysis of variance) tests apply to numeric variables, and we can use one for the variable "age", another for the variable "bmi", and no test for the variable "ast". A list of all the options is shown at the end of the vignette. The `tests` function can do a quick check on what tests were performed on each variable in tableby. ```{r, testformula} tab.test <- tableby(arm ~ kwt(age) + anova(bmi) + notest(ast), data=mockstudy) tests(tab.test) ``` ```{r, results='asis'} summary(tab.test) ``` Summary statistics for any individual variable can also be modified, but it must be done as secondary arguments to the test function. The function names must be strings that are functions already written for tableby, built-in R functions like mean and range, or user-defined functions. ```{r, testsAndStats, results='asis'} tab.test <- tableby(arm ~ kwt(ast, "Nmiss2","median") + anova(age, "N","mean") + notest(bmi, "Nmiss","median"), data=mockstudy) summary(tab.test) ``` ## Controlling Options for Categorical Tests (Chisq and Fisher's) The formal tests for categorical variables against the levels of the by variable, chisq and fe, have options to simulate p-values. We show how to turn on the simulations for these with 500 replicates for the Fisher's test (fe). ```{r, simfe, results='asis'} set.seed(100) tab.catsim <- tableby(arm ~ sex + race, cat.test="fe", simulate.p.value=TRUE, B=500, data=mockstudy) tests(tab.catsim) ``` The chi-square test on 2x2 tables applies Yates' continuity correction by default, so we provide an option to turn off the correction. We show the results with and without the correction that is applied to treatment arm by sex, if we use subset to ignore one of the three treatment arms. ```{r, chisqcorrect, results='asis'} cat.correct <- tableby(arm ~ sex + race, cat.test="chisq", subset = !grepl("^F", arm), data=mockstudy) tests(cat.correct) cat.nocorrect <- tableby(arm ~ sex + race, cat.test="chisq", subset = !grepl("^F", arm), chisq.correct=FALSE, data=mockstudy) tests(cat.nocorrect) ``` ## Modifying the look & feel in Word documents You can easily create Word versions of `tableby` output via an Rmarkdown report and the default options will give you a reasonable table in Word - just select the "Knit Word" option in RStudio. **The functionality listed in this next paragraph is coming soon but needs an upgraded version of RStudio** If you want to modify fonts used for the table, then you'll need to add an extra line to your header at the beginning of your file. You can take the `WordStylesReference01.docx` file and modify the fonts (storing the format preferences in your project directory). To see how this works, run your report once using WordStylesReference01.docx and then WordStylesReference02.docx. ``` output: word_document reference_docx: /projects/bsi/gentools/R/lib320/arsenal/doc/WordStylesReference01.docx ``` For more information on changing the look/feel of your Word document, see the [Rmarkdown documentation](https://bookdown.org/yihui/rmarkdown/word-document.html) website. # Additional Examples Here are multiple examples showing how to use some of the different options. ## 1. Summarize without a group/by variable ```{r, nobyvar, results='asis'} tab.noby <- tableby(~ bmi + sex + age, data=mockstudy) summary(tab.noby) ``` ## 2. Display footnotes indicating which "test" was used ```{r, results="asis"} summary(tab.test, pfootnote=TRUE) ``` ## 3. Summarize an ordered factor When comparing groups of ordered data there are a couple of options. The **default** uses a general independence test available from the `coin` package. For two-group comparisons, this is essentially the Armitage trend test. The other option is to specify the Kruskal Wallis test. The example below shows both options. ```{r} mockstudy$age.ordnew <- ordered(c("a",NA,as.character(mockstudy$age.ord[-(1:2)]))) table(mockstudy$age.ord, mockstudy$sex) table(mockstudy$age.ordnew, mockstudy$sex) class(mockstudy$age.ord) ``` ```{r, results="asis", eval=requireNamespace("coin", quietly = TRUE)} summary(tableby(sex ~ age.ordnew, data = mockstudy), pfootnote = TRUE) summary(tableby(sex ~ age.ord, data = mockstudy), pfootnote = TRUE) ``` ## 4. Summarize a survival variable First look at the information that is presented by the `survfit()` function, then see how the same results can be seen with tableby. The default is to show the median survival (time at which the probability of survival = 50%). ```{r, eval=ge330} survfit(Surv(fu.time, fu.stat)~sex, data=mockstudy) survdiff(Surv(fu.time, fu.stat)~sex, data=mockstudy) ``` ```{r, results='asis'} summary(tableby(sex ~ Surv(fu.time, fu.stat), data=mockstudy)) ``` It is also possible to obtain summaries of the % survival at certain time points (say the probability of surviving 1-year). ```{r, eval=ge330} summary(survfit(Surv(fu.time/365.25, fu.stat)~sex, data=mockstudy), times=1:5) ``` ```{r, results='asis', eval=ge330} summary(tableby(sex ~ Surv(fu.time/365.25, fu.stat), data=mockstudy, times=1:5, surv.stats=c("NeventsSurv","NriskSurv"))) ``` ## 5. Summarize date variables Date variables by default are summarized with the number of missing values, the median, and the range. For example purposes we've created a random date. Missing values are introduced for impossible February dates. ```{r, results='asis'} set.seed(100) N <- nrow(mockstudy) mockstudy$dtentry <- mdy.Date(month=sample(1:12,N,replace=T), day=sample(1:29,N,replace=T), year=sample(2005:2009,N,replace=T)) summary(tableby(sex ~ dtentry, data=mockstudy)) ``` ## 6. Summarize multiple variables without typing them out Often one wants to summarize a number of variables. Instead of typing by hand each individual variable, an alternative approach is to create a formula using the `paste` command with the `collapse="+"` option. ```{r, results='asis'} ## create a vector specifying the variable names myvars <- names(mockstudy) ## select the 8th through the last variables ## paste them together, separated by the + sign RHS <- paste(myvars[8:10], collapse="+") RHS ## create a formula using the as.formula function as.formula(paste('arm ~ ', RHS)) ## use the formula in the tableby function summary(tableby(as.formula(paste('arm ~', RHS)), data=mockstudy)) ``` These steps can also be done using the `formulize` function. ```{r, results='asis'} ## The formulize function does the paste and as.formula steps tmp <- formulize('arm',myvars[8:10]) tmp ## More complex formulas could also be written using formulize tmp2 <- formulize('arm',c('ps','hgb^2','bmi')) ## use the formula in the tableby function summary(tableby(tmp, data=mockstudy)) ``` To change summary statistics or statistical tests en masse, consider using `paste0()` together with `formulize()`: ```{r results='asis'} varlist1 <- c('age','sex','hgb') varlist2 <- paste0("anova(", c('bmi','alk.phos','ast'), ", 'meansd')") summary(tableby(formulize("arm", c(varlist1, varlist2)), data = mockstudy, numeric.test = "kwt"), pfootnote = TRUE) ``` ## 7. Subset the dataset used in the analysis Here are two ways to get the same result (limit the analysis to subjects age>5 and in the F: FOLFOX treatment group). * The first approach uses the subset function applied to the dataset `mockstudy`. This example also selects a subset of variables. The `tableby` function is then applied to this subsetted data. ```{r} newdata <- subset(mockstudy, subset=age>50 & arm=='F: FOLFOX', select = c(sex,ps:bmi)) dim(mockstudy) table(mockstudy$arm) dim(newdata) names(newdata) ``` ```{r, results='asis'} summary(tableby(sex ~ ., data=newdata)) ``` * The second approach does the same analysis but uses the subset argument within `tableby` to subset the data. ```{r, results='asis'} summary(tableby(sex ~ ps + hgb + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy)) ``` ## 8. Create combinations of variables on the fly ```{r} ## create a variable combining the levels of mdquality.s and sex with(mockstudy, table(interaction(mdquality.s,sex))) ``` ```{r, results='asis'} summary(tableby(arm ~ interaction(mdquality.s,sex), data=mockstudy)) ``` ```{r, results='asis'} ## create a new grouping variable with combined levels of arm and sex summary(tableby(interaction(mdquality.s, sex) ~ age + bmi, data=mockstudy, subset=arm=="F: FOLFOX")) ``` ## 9. Transform variables on the fly Certain transformations need to be surrounded by `I()` so that R knows to treat it as a variable transformation and not some special model feature. If the transformation includes any of the symbols `/ - + ^ *` then surround the new variable by `I()`. ```{r, maketrans, results='asis'} trans <- tableby(arm ~ I(age/10) + log(bmi) + factor(mdquality.s, levels=0:1, labels=c('N','Y')), data=mockstudy) summary(trans) ``` The labels for these variables aren't exactly what we'd like, so we can change modify those after the fact. Instead of typing out the very long variable names, you can modify specific labels by position. ```{r, assignlabels2} labels(trans) labels(trans)[2:4] <- c('Age per 10 yrs', 'log(BMI)', 'MD Quality') labels(trans) ``` ```{r, transsummary, results='asis'} summary(trans) ``` Note that if we had not changed `mdquality.s` to a factor, it would have been summarized as though it were a continuous variable. ```{r, results='asis'} class(mockstudy$mdquality.s) summary(tableby(arm~mdquality.s, data=mockstudy)) ``` Another option would be to specify the test and summary statistics. In fact, if I had a set of variables coded 0/1 and that was all I was summarizing, then I could change the global option for continuous variables to use the chi-square test and show countpct. ```{r, results='asis'} summary(tableby(arm ~ chisq(mdquality.s, "Nmiss","countpct"), data=mockstudy)) ``` ## 10. Subsetting (change the ordering of the variables, delete a variable, sort by p-value, filter by p-value) ```{r, results='asis'} mytab <- tableby(arm ~ sex + alk.phos + age, data=mockstudy) mytab2 <- mytab[c('age','sex','alk.phos')] summary(mytab2) summary(mytab[c('age','sex')], digits = 2) summary(mytab[c(3,1)], digits = 3) summary(sort(mytab, decreasing = TRUE)) summary(mytab[mytab < 0.5]) head(mytab, 1) # can also use tail() ``` ## 11. Merge two `tableby` objects together It is possible to combine two tableby objects so that they print out together. Overlapping by-variables will have their x-variables concatenated, and (if `all=TRUE`) non-overlapping by-variables will have their tables printed separately. ```{r, results="asis"} ## demographics tab1 <- tableby(arm ~ sex + age, data=mockstudy, control=tableby.control(numeric.stats=c("Nmiss","meansd"), total=FALSE)) ## lab data tab2 <- tableby(arm ~ hgb + alk.phos, data=mockstudy, control=tableby.control(numeric.stats=c("Nmiss","median","q1q3"), numeric.test="kwt", total=FALSE)) tab12 <- merge(tab1, tab2) class(tab12) summary(tab12) ``` For tables with two different outcomes, consider the `all=TRUE` argument: ```{r, results='asis'} summary(merge( tableby(sex ~ age, data = mockstudy), tableby(arm ~ bmi, data = mockstudy), all = TRUE )) ``` ## 12. Add a title to the table When creating a pdf the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles appear un-numbered and above the table. ```{r, results='asis'} t1 <- tableby(arm ~ sex + age, data=mockstudy) summary(t1, title='Demographics') ``` With multiple left-hand sides, you can pass a vector or list to determine labels for each table: ```{r, results='asis'} summary(tableby(list(arm, sex) ~ age, data = mockstudy), title = c("arm table", "sex table")) ``` ## 13. Modify how missing values are displayed Depending on the report you are writing you have the following options: * Show how many subjects have each variable * Show how many subjects are missing each variable * Show how many subjects are missing each variable only if there are any missing values * Don't indicate missing values at all ```{r} ## look at how many missing values there are for each variable apply(is.na(mockstudy),2,sum) ``` ```{r, results='asis'} ## Show how many subjects have each variable (non-missing) summary(tableby(sex ~ ast + age, data=mockstudy, control=tableby.control(numeric.stats=c("N","median"), total=FALSE))) ## Always list the number of missing values summary(tableby(sex ~ ast + age, data=mockstudy, control=tableby.control(numeric.stats=c("Nmiss2","median"), total=FALSE))) ## Only show the missing values if there are some (default) summary(tableby(sex ~ ast + age, data=mockstudy, control=tableby.control(numeric.stats=c("Nmiss","mean"),total=FALSE))) ## Don't show N at all summary(tableby(sex ~ ast + age, data=mockstudy, control=tableby.control(numeric.stats=c("mean"),total=FALSE))) ``` One might also consider the use of `includeNA()` to include NAs in the counts and percents for categorical variables. ```{r, results = 'asis'} mockstudy$ps.cat <- factor(mockstudy$ps) attr(mockstudy$ps.cat, "label") <- "ps" summary(tableby(sex ~ includeNA(ps.cat), data = mockstudy, cat.stats = "countpct")) ``` ## 14. Modify the number of digits used Within tableby.control function there are 4 options for controlling the number of significant digits shown. * digits: controls the number of digits after the decimal place for continuous values * digits.count: controls the number of digits after the decimal point for counts * digits.pct: controls the number of digits after the decimal point for percents * digits.p: controls the number of digits after the decimal point for p-values ```{r, results='asis'} summary(tableby(arm ~ sex + age + fu.time, data=mockstudy), digits=4, digits.p=2, digits.pct=1) ``` With the exception of `digits.p`, all of these can be specified on a per-variable basis using the in-formula functions that specify which tests are run: ```{r results='asis'} summary(tableby(arm ~ chisq(sex, digits.pct=1) + anova(age, digits=4) + anova(fu.time, digits = 1), data=mockstudy)) ``` ## 15. Create a user-defined summary statistic For purposes of this example, the code below creates a trimmed mean function (trims 10%) and use that to summarize the data. Note the use of the `...` which tells R to pass extra arguments on - this is required for user-defined functions. In this case, `na.rm=T` is passed to `myfunc`. The *weights* argument is also required, even though it isn't passed on to the internal function in this particular example. ```{r, results='asis'} trim10 <- function(x, weights=rep(1,length(x)), ...){ mean(x, trim=.1, ...) } summary(tableby(sex ~ hgb, data=mockstudy, control=tableby.control(numeric.stats=c("Nmiss","trim10"), numeric.test="kwt", stats.labels=list(Nmiss='Missing values', trim10="Trimmed Mean, 10%")))) ``` For statistics to be formatted appropriately, you may want to use `as.tbstat()` or `as.countpct()`. For example, suppose you want to create a trimmed mean function that trims by both 5 and 10 percent. The first example shows them separated by a comma; the second puts the 10% trimmed mean in brackets ```{r, results='asis'} trim510comma <- function(x, weights=rep(1,length(x)), ...){ tmp <- c(mean(x, trim = 0.05, ...), mean(x, trim = 0.1, ...)) as.tbstat(tmp, sep = ", ") } trim510bracket <- function(x, weights=rep(1,length(x)), ...){ tmp <- c(mean(x, trim = 0.05, ...), mean(x, trim = 0.1, ...)) as.tbstat(tmp, sep = " ", parens = c("[", "]")) } summary(tableby(sex ~ hgb, data=mockstudy, numeric.stats=c("Nmiss", "trim510comma"), test = FALSE)) summary(tableby(sex ~ hgb, data=mockstudy, numeric.stats=c("Nmiss", "trim510bracket"), test = FALSE)) ``` Or perhaps it's useful to put the amount of trimming in parentheses. Since it is a percent, we can flag it as such: ```{r results='asis'} trim10pct <- function(x, weights=rep(1,length(x)), ...){ tmp <- mean(x, trim = 0.05, ...) as.countpct(c(tmp, 10), sep = " ", parens = c("(", ")"), which.count = 0, which.pct = 2, pct = "%") } summary(tableby(sex ~ hgb, data=mockstudy, numeric.stats=c("Nmiss", "trim10pct"), digits = 2, digits.pct = 0, test = FALSE)) ``` ## 16. Use case-weights for creating summary statistics When comparing groups, they are often unbalanced when it comes to nuisances such as age and sex. The `tableby` function allows you to create weighted summary statistics. If this option us used then p-values are not calculated (`test=FALSE`). ```{r} ##create fake group that is not balanced by age/sex set.seed(200) mockstudy$fake_arm <- ifelse(mockstudy$age>60 & mockstudy$sex=='Female',sample(c('A','B'),replace=T, prob=c(.2,.8)), sample(c('A','B'),replace=T, prob=c(.8,.4))) mockstudy$agegp <- cut(mockstudy$age, breaks=c(18,50,60,70,90), right=FALSE) ## create weights based on agegp and sex distribution tab1 <- with(mockstudy,table(agegp, sex)) tab2 <- with(mockstudy, table(agegp, sex, fake_arm)) tab2 gpwts <- rep(tab1, length(unique(mockstudy$fake_arm)))/tab2 gpwts[gpwts>50] <- 30 ## apply weights to subjects index <- with(mockstudy, cbind(as.numeric(agegp), as.numeric(sex), as.numeric(as.factor(fake_arm)))) mockstudy$wts <- gpwts[index] ## show weights by treatment arm group tapply(mockstudy$wts,mockstudy$fake_arm, summary) ``` ```{r, results='asis', eval=ge330} orig <- tableby(fake_arm ~ age + sex + Surv(fu.time/365, fu.stat), data=mockstudy, test=FALSE) summary(orig, title='No Case Weights used') tab1 <- tableby(fake_arm ~ age + sex + Surv(fu.time/365, fu.stat), data=mockstudy, weights=wts) summary(tab1, title='Case Weights used') ``` ## 17. Create your own p-value and add it to the table When using weighted summary statistics, it is often desirable to then show a p-value from a model that corresponds to the weighted analysis. It is possible to add your own p-value and modify the column title for that new p-value. Another use for this would be to add standardized differences or confidence intervals instead of a p-value. To add the p-value, you simply need to create a data frame and use the function `modpval.tableby()`. The first few columns in the data.frame are required: (1) the by-variable, (2) the strata value (if the table has a strata term), (3) the x-variable, and (4) the new p-value (or test statistic). Another optional column can be used to indicate what method was used to calculate the p-value. If you specify `use.pname=TRUE` then the column name indicating the p-value will be also be used in the tableby summary. ```{r, results='asis', eval=ge330} mypval <- data.frame( byvar = "fake_arm", variable = c('age','sex','Surv(fu.time/365, fu.stat)'), adj.pvalue = c(.953,.811,.01), method = c('Age/Sex adjusted model results') ) tab2 <- modpval.tableby(tab1, mypval, use.pname=TRUE) summary(tab2, title='Case Weights used, p-values added', pfootnote=TRUE) ``` ## 18. For two-level categorical variables or one-line numeric variables, simplify the output. If the `cat.simplify` option is set to `TRUE`, then only the second level of two-level categorical varialbes is shown. In the example below, `sex` has two levels, and "Female" is the second level, hence only the counts and percents for Female are shown. Similarly, "mdquality.s" was turned to a factor, and "1" is the second level, but since there are missings, the table ignores `cat.simplify` and displays all levels (since the output can no longer be displayed on one line). ```{r, results='asis'} table2 <- tableby(arm~sex + factor(mdquality.s), data=mockstudy, cat.simplify=TRUE) summary(table2, labelTranslations=c(sex="Female", "factor(mdquality.s)"="MD Quality")) ``` Similarly, if `numeric.simplify` is set to `TRUE`, then any numerics which only have one row of summary statistics are simplified into a single row. Note again that `ast` has missing values and so is not simplified to a single row. ```{r results='asis'} summary(tableby(arm ~ age + ast, data = mockstudy, numeric.simplify=TRUE, numeric.stats=c("Nmiss", "meansd"))) ``` The in-formula functions to change which tests are run can also be used to specify these options for each variable at a time. ```{r results='asis'} summary(tableby(arm ~ anova(age, "meansd", numeric.simplify=TRUE) + chisq(sex, cat.simplify=TRUE), data = mockstudy)) ``` The `cat.simplify` and `ord.simplify` argument also accept the special string `"label"`, which appends the shown level's label to the overall label: ```{r results='asis'} summary(tableby(arm ~ sex, cat.simplify = "label", data = mockstudy)) ``` ## 19. Use `tableby` within an Sweave document For those users who wish to create tables within an Sweave document, the following code seems to work. ``` \documentclass{article} \usepackage{longtable} \usepackage{pdfpages} \begin{document} \section{Read in Data} <>= require(arsenal) require(knitr) require(rmarkdown) data(mockstudy) tab1 <- tableby(arm~sex+age, data=mockstudy) @ \section{Convert Summary.Tableby to LaTeX} <>= capture.output(summary(tab1), file="Test.md") ## Convert R Markdown Table to LaTeX render("Test.md", pdf_document(keep_tex=TRUE)) @ \includepdf{Test.pdf} \end{document} ``` ## 20. Export `tableby` object to a .CSV file When looking at multiple variables it is sometimes useful to export the results to a csv file. The `as.data.frame` function creates a data frame object that can be exported or further manipulated within R. ```{r} tab1 <- summary(tableby(arm~sex+age, data=mockstudy), text = NULL) as.data.frame(tab1) # write.csv(tab1, '/my/path/here/my_table.csv') ``` ## 21. Write `tableby` object to a separate Word or HTML file ```{r eval = FALSE} ## write to an HTML document tab1 <- tableby(arm ~ sex + age, data=mockstudy) write2html(tab1, "~/trash.html") ## write to a Word document write2word(tab1, "~/trash.doc", title="My table in Word") ``` ## 22. Use `tableby` in R Shiny The easiest way to output a `tableby()` object in an R Shiny app is to use the `tableOutput()` UI in combination with the `renderTable()` server function and `as.data.frame(summary(tableby()))`: ```{r eval=FALSE} # A standalone shiny app library(shiny) library(arsenal) data(mockstudy) shinyApp( ui = fluidPage(tableOutput("table")), server = function(input, output) { output$table <- renderTable({ as.data.frame(summary(tableby(sex ~ age, data = mockstudy), text = "html")) }, sanitize.text.function = function(x) x) } ) ``` This can be especially powerful if you feed the selections from a `selectInput(multiple = TRUE)` into `formulize()` to make the table dynamic! ## 23. Use `tableby` in bookdown Since the backbone of `tableby()` is `knitr::kable()`, tables still render well in bookdown. However, `print.summary.tableby()` doesn't use the `caption=` argument of `kable()`, so some tables may not have a properly numbered caption. To fix this, use the method described [on the bookdown site](https://bookdown.org/yihui/bookdown/tables.html) to give the table a tag/ID. ```{r eval=FALSE} summary(tableby(sex ~ age, data = mockstudy), title="(\\#tab:mytableby) Caption here") ``` ## 24. Adjust `tableby` for multiple p-values The `padjust()` function is a new S3 generic piggybacking off of `p.adjust()`. It works on both `tableby` and `summary.tableby` objects: ```{r results='asis'} tab <- summary(tableby(sex ~ age + fu.time + bmi + mdquality.s, data = mockstudy)) tab padjust(tab, method = "bonferroni") ``` ## 25. Tabulate multiple endpoints You can now use `list()` on the left-hand side of `tableby()` to give multiple endpoints. ```{r results='asis'} summary(tableby(list(sex, mdquality.s, ps) ~ age + bmi, data = mockstudy)) ``` To avoid confusion about which table is which endpoint, you can set `term.name=TRUE` in `summary()`. This takes the labels for each by-variable and puts them in the top-left of the table. ```{r results='asis'} summary(tableby(list(sex, mdquality.s, ps) ~ age + bmi, data = mockstudy), term.name = TRUE) ``` ## 26. Tabulate data by a non-test group (strata) You can also specify a second grouping variable that doesn't get tested (but instead separates results): a *strata* variable. ```{r results='asis'} summary(tableby(list(sex, ps) ~ age + bmi, strata = arm, data = mockstudy)) ``` # Available Function Options ## Summary statistics The **default** summary statistics, by varible type, are: * `numeric.stats`: Continuous variables will show by default `Nmiss, meansd, range` * `cat.stats`: Categorical and factor variables will show by default `Nmiss, countpct` * `ordered.stats`: Ordered factors will show by default `Nmiss, countpct` * `surv.stats`: Survival variables will show by default `Nmiss, Nevents, medsurv` * `date.stats`: Date variables will show by default `Nmiss, median, range` There are a number of extra functions defined specifically for the tableby function. * `N`: a count of the number of observations for a particular group * `Nmiss`: only show the count of the number of missing values if there are some missing values * `Nmiss2`: always show a count of the number of missing values for a variable within each group * `meansd`: print the mean and standard deviation in the format `mean(sd)` * `meanse`: print the mean and standard error in the format `mean(se)` * `meanCI`: print the mean and a (t) confidence interval * `count`: print the number of values in a category * `countN`: print the number of values in a category plus the total N for the group in the format `N/Total` * `countpct`: print the number of values in a category plus the column-percentage in the format `N (%)` * `countrowpct`: print the number of values in a category plus the row-percentage in the format `N (%)` * `countcellpct`: print the number of values in a category plus the cell-percentage in the format `N (%)` * `binomCI`: print the proportion in a category plus a binomial confidence interval. * `rowbinomCI`: print the row proportion in a category plus a binomial confidence interval. * `medianq1q3`: print the median, 25th, and 75th quantiles `median (Q1, Q3)` * `q1q3`: print the 25th and 75th quantiles `Q1, Q3` * `iqr`: print the inter-quartile range. * `medianrange`: print the median, minimum and maximum values `median (minimum, maximum)` * `medianmad`: print the median and median absolute deviation (mad) * `Nevents`: print number of events for a survival object within each grouping level * `medSurv`: print the median survival * `NeventsSurv`: print number of events and survival at given times * `NriskSurv`: print the number still at risk and survival at given times * `Nrisk`: print the number still at risk at given times * `medTime`: print the median follow-up time * `sum` * `max` * `min` * `mean` * `sd` * `var` * `median` * `range` * `gmean`, `gsd`, `gmeansd`, `gmeanCI`: geometric means, sds, and confidence intervals. ## Testing options The tests used to calculate p-values differ by the variable type, but can be specified explicitly in the formula statement or in the control function. The following tests are accepted: * `anova`: analysis of variance test; the default test for continuous variables. When the grouping variable has two levels, it is equivalent to the two-sample t-test with equal variance. * `kwt`: Kruskal-Wallis test, optional test for continuous variables. When the grouping variable has two levels, it is equivalent to the Wilcoxon Rank Sum test. * `wt`: An explicit Wilcoxcon test. * `medtest`: Median test test, optional test for continuous variables. * `chisq`: chi-square goodness of fit test for equal counts of a categorical variable across categories; the default for categorical or factor variables * `fe`: Fisher's exact test for categorical variables; optional * `logrank`: log-rank test, the default test for time-to-event variables * `trend`: The `independence_test` function from the `coin` is used to test for trends. Whenthe grouping variable has two levels, it is equivalent to the Armitage trend test. This is the default for ordered factors * `notest`: Don't perform a test. ## `tableby.control` settings A quick way to see what arguments are possible to utilize in a function is to use the `args()` command. Settings involving the number of digits can be set in `tableby.control` or in `summary.tableby`. ```{r} args(tableby.control) ``` ## `summary.tableby` settings The `summary.tableby` function has options that modify how the table appears (such as adding a title or modifying labels). ```{r} args(arsenal:::summary.tableby) ``` arsenal/inst/doc/labels.Rmd0000644000176200001440000000717513656527336015361 0ustar liggesusers--- title: "A Few Notes on Labels" author: "Ethan Heinzen" output: rmarkdown::html_vignette: toc: true vignette: | %\VignetteIndexEntry{A Few Notes on Labels} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r include = FALSE} knitr::opts_chunk$set(message = FALSE, results = 'asis') ``` # Introduction The `arsenal` package relies somewhat heavily on variable labels to make output more "pretty". A `label` here is understood to be a single character string with "pretty" text (i.e., not an "ugly" variable name). Three of the main `arsenal` function use labels in their `summary()` output. There are several ways to set these labels. We'll use the `mockstudy` dataset for all examples here: ```{r} library(arsenal) data(mockstudy) library(magrittr) # for 'freqlist' examples tab.ex <- table(mockstudy[c("arm", "sex", "mdquality.s")], useNA="ifany") ``` # Examples ## Set labels in the function call The `summary()` method for `tableby()`, `modelsum()`, and `freqlist()` objects contains a `labelTranslations = ` argument to specify labels in the function call. Note that the `freqlist()` function matches labels in order, whereas the other two match labels by name. The labels can be input as a list or a character vector. ```{r} summary(freqlist(tab.ex), labelTranslations = c(arm = "Treatment Arm", sex = "Gender", mdquality.s = "LASA QOL")) summary(tableby(arm ~ sex + age, data = mockstudy), labelTranslations = c(sex = "SEX", age = "Age, yrs")) summary(modelsum(bmi ~ age, adjust = ~sex, data = mockstudy), labelTranslations = list(sexFemale = "Female", age = "Age, yrs")) ``` ## Modify labels after the fact Another option is to add labels after you have created the object. To do this, you can use the form `labels(x) <- value` or use the pipe-able version, `set_labels()`. ```{r} # the non-pipe version; somewhat clunky tmp <- freqlist(tab.ex) labels(tmp) <- c(arm = "Treatment Arm", sex = "Gender", mdquality.s = "LASA QOL") summary(tmp) # piped--much cleaner mockstudy %>% tableby(arm ~ sex + age, data = .) %>% set_labels(c(sex = "SEX", age = "Age, yrs")) %>% summary() mockstudy %>% modelsum(bmi ~ age, adjust = ~ sex, data = .) %>% set_labels(list(sexFemale = "Female", age = "Age, yrs")) %>% summary() ``` ## Add labels to a `data.frame` `tableby()` and `modelsum()` also allow you to have label attributes on the data. Note that by default these attributes usually get dropped upon subsetting, but `tableby()` and `modelsum()` use the `keep.labels()` function to retain them. ```{r} mockstudy.lab <- keep.labels(mockstudy) class(mockstudy$age) class(mockstudy.lab$age) ``` To undo this, simply `loosen.labels()`: ```{r} class(loosen.labels(mockstudy.lab)$age) ``` You can set attributes one at a time in two ways: ```{r} attr(mockstudy.lab$sex, "label") <- "Sex" labels(mockstudy.lab$age) <- "Age, yrs" ``` ...or all at once: ```{r} labels(mockstudy.lab) <- list(sex = "Sex", age = "Age, yrs") summary(tableby(arm ~ sex + age, data = mockstudy.lab)) ``` You can pipe this, too. ```{r} mockstudy %>% set_labels(list(sex = "SEX", age = "Age, yrs")) %>% modelsum(bmi ~ age, adjust = ~ sex, data = .) %>% summary() ``` To extract labels from a `data.frame`, simply use the `labels()` function: ```{r results='markdown'} labels(mockstudy.lab) ``` ## When labels get long `tableby()` and `modelsum()` both support the wrapping of long labels. Consider the `width=` argument in the `print()` function: ```{r} mockstudy %>% set_labels(list(age = "This is a really long label for the arm variable")) %>% tableby(sex ~ age, data = .) %>% summary() %>% print(width = 20) ``` arsenal/inst/doc/tableby.html0000644000176200001440000075406314056514664015763 0ustar liggesusers The tableby function

    The tableby function

    Beth Atkinson, Ethan Heinzen, Jason Sinnwell, Shannon McDonnell and Greg Dougherty

    Introduction

    One of the most common tables in medical literature includes summary statistics for a set of variables, often stratified by some group (e.g. treatment arm). Locally at Mayo, the SAS macros %table and %summary were written to create summary tables with a single call. With the increasing interest in R, we have developed the function tableby to create similar tables within the R environment.

    In developing the tableby() function, the goal was to bring the best features of these macros into an R function. However, the task was not simply to duplicate all the functionality, but rather to make use of R’s strengths (modeling, method dispersion, flexibility in function definition and output format) and make a tool that fits the needs of R users. Additionally, the results needed to fit within the general reproducible research framework so the tables could be displayed within an R markdown report.

    This report provides step-by-step directions for using the functions associated with tableby(). All functions presented here are available within the arsenal package. An assumption is made that users are somewhat familiar with R Markdown documents. For those who are new to the topic, a good initial resource is available at rmarkdown.rstudio.com.

    Simple Example

    The first step when using the tableby function is to load the arsenal package. All the examples in this report use a dataset called mockstudy made available by Paul Novotny which includes a variety of types of variables (character, numeric, factor, ordered factor, survival) to use as examples.

    library(arsenal)
    require(knitr)
    require(survival)
    data(mockstudy) ##load data
    dim(mockstudy)  ##look at how many subjects and variables are in the dataset 
    ## [1] 1499   14
    # help(mockstudy) ##learn more about the dataset and variables
    str(mockstudy) ##quick look at the data
    ## 'data.frame':    1499 obs. of  14 variables:
    ##  $ case       : int  110754 99706 105271 105001 112263 86205 99508 90158 88989 90515 ...
    ##  $ age        : int  67 74 50 71 69 56 50 57 51 63 ...
    ##   ..- attr(*, "label")= chr "Age in Years"
    ##  $ arm        : chr  "F: FOLFOX" "A: IFL" "A: IFL" "G: IROX" ...
    ##   ..- attr(*, "label")= chr "Treatment Arm"
    ##  $ sex        : Factor w/ 2 levels "Male","Female": 1 2 2 2 2 1 1 1 2 1 ...
    ##  $ race       : chr  "Caucasian" "Caucasian" "Caucasian" "Caucasian" ...
    ##   ..- attr(*, "label")= chr "Race"
    ##  $ fu.time    : int  922 270 175 128 233 120 369 421 387 363 ...
    ##  $ fu.stat    : int  2 2 2 2 2 2 2 2 2 2 ...
    ##  $ ps         : int  0 1 1 1 0 0 0 0 1 1 ...
    ##  $ hgb        : num  11.5 10.7 11.1 12.6 13 10.2 13.3 12.1 13.8 12.1 ...
    ##  $ bmi        : num  25.1 19.5 NA 29.4 26.4 ...
    ##   ..- attr(*, "label")= chr "Body Mass Index (kg/m^2)"
    ##  $ alk.phos   : int  160 290 700 771 350 569 162 152 231 492 ...
    ##  $ ast        : int  35 52 100 68 35 27 16 12 25 18 ...
    ##  $ mdquality.s: int  NA 1 1 1 NA 1 1 1 1 1 ...
    ##  $ age.ord    : Ord.factor w/ 8 levels "10-19"<"20-29"<..: 6 7 4 7 6 5 4 5 5 6 ...

    To create a simple table stratified by treatment arm, use a formula statement to specify the variables that you want summarized. The example below uses age (a continuous variable) and sex (a factor).

    tab1 <- tableby(arm ~ sex + age, data=mockstudy)

    If you want to take a quick look at the table, you can use summary() on your tableby object and the table will print out as text in your R console window. If you use summary() without any options you will see a number of \(\&nbsp;\) statements which translates to “space” in HTML.

    Pretty text version of table

    If you want a nicer version in your console window then add the text=TRUE option.

    summary(tab1, text=TRUE)
    ## 
    ## 
    ## |             | A: IFL (N=428)  | F: FOLFOX (N=691) | G: IROX (N=380) | Total (N=1499)  | p value|
    ## |:------------|:---------------:|:-----------------:|:---------------:|:---------------:|-------:|
    ## |sex          |                 |                   |                 |                 |   0.190|
    ## |-  Male      |   277 (64.7%)   |    411 (59.5%)    |   228 (60.0%)   |   916 (61.1%)   |        |
    ## |-  Female    |   151 (35.3%)   |    280 (40.5%)    |   152 (40.0%)   |   583 (38.9%)   |        |
    ## |Age in Years |                 |                   |                 |                 |   0.614|
    ## |-  Mean (SD) | 59.673 (11.365) |  60.301 (11.632)  | 59.763 (11.499) | 59.985 (11.519) |        |
    ## |-  Range     | 27.000 - 88.000 |  19.000 - 88.000  | 26.000 - 85.000 | 19.000 - 88.000 |        |

    Pretty Rmarkdown version of table

    In order for the report to look nice within an R markdown (knitr) report, you just need to specify results="asis" when creating the R chunk. This changes the layout slightly (compresses it) and bolds the variable names.

    summary(tab1)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    sex 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    Age in Years 0.614
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
       Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000

    Data frame version of table

    If you want a data.frame version, simply use as.data.frame.

    as.data.frame(tab1)
    ##   group.term   group.label strata.term variable     term        label variable.type
    ## 1        arm Treatment Arm                  sex      sex          sex   categorical
    ## 2        arm Treatment Arm                  sex countpct         Male   categorical
    ## 3        arm Treatment Arm                  sex countpct       Female   categorical
    ## 4        arm Treatment Arm                  age      age Age in Years       numeric
    ## 5        arm Treatment Arm                  age   meansd    Mean (SD)       numeric
    ## 6        arm Treatment Arm                  age    range        Range       numeric
    ##                A: IFL           F: FOLFOX            G: IROX              Total
    ## 1                                                                              
    ## 2 277.00000, 64.71963 411.00000, 59.47902            228, 60  916.0000, 61.1074
    ## 3 151.00000, 35.28037 280.00000, 40.52098            152, 40  583.0000, 38.8926
    ## 4                                                                              
    ## 5  59.67290, 11.36454  60.30101, 11.63225 59.76316, 11.49930 59.98532, 11.51877
    ## 6              27, 88              19, 88             26, 85             19, 88
    ##                         test   p.value
    ## 1 Pearson's Chi-squared test 0.1904388
    ## 2 Pearson's Chi-squared test 0.1904388
    ## 3 Pearson's Chi-squared test 0.1904388
    ## 4         Linear Model ANOVA 0.6143859
    ## 5         Linear Model ANOVA 0.6143859
    ## 6         Linear Model ANOVA 0.6143859

    Summaries using standard R code

    ## base R frequency example
    tmp <- table(Gender=mockstudy$sex, "Study Arm"=mockstudy$arm)
    tmp
    ##         Study Arm
    ## Gender   A: IFL F: FOLFOX G: IROX
    ##   Male      277       411     228
    ##   Female    151       280     152
    # Note: The continuity correction is applied by default in R (not used in %table)
    chisq.test(tmp)
    ## 
    ##  Pearson's Chi-squared test
    ## 
    ## data:  tmp
    ## X-squared = 3.3168, df = 2, p-value = 0.1904
    ## base R numeric summary example
    tapply(mockstudy$age, mockstudy$arm, summary)
    ## $`A: IFL`
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ##   27.00   53.00   61.00   59.67   68.00   88.00 
    ## 
    ## $`F: FOLFOX`
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ##    19.0    52.0    61.0    60.3    69.0    88.0 
    ## 
    ## $`G: IROX`
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ##   26.00   52.00   61.00   59.76   68.00   85.00
    summary(aov(age ~ arm, data=mockstudy))
    ##               Df Sum Sq Mean Sq F value Pr(>F)
    ## arm            2    129    64.7   0.487  0.614
    ## Residuals   1496 198628   132.8

    Modifying Output

    Add labels

    In the above example, age is shown with a label (Age in Years), but sex is listed “as is” with lower case letters. This is because the data was created in SAS and in the SAS dataset, age had a label but sex did not. The label is stored as an attribute within R.

    ## Look at one variable's label
    attr(mockstudy$age,'label')
    ## [1] "Age in Years"
    ## See all the variables with a label
    unlist(lapply(mockstudy,'attr','label'))
    ##                        age                        arm                       race 
    ##             "Age in Years"            "Treatment Arm"                     "Race" 
    ##                        bmi 
    ## "Body Mass Index (kg/m^2)"
    # Can also use labels(mockstudy)

    If you want to add labels to other variables, there are a couple of options. First, you could add labels to the variables in your dataset.

    attr(mockstudy$sex,'label')  <- 'Gender'
    
    tab1 <- tableby(arm ~ sex + age, data=mockstudy)
    summary(tab1)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Gender 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    Age in Years 0.614
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
       Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000

    You can also use the built-in data.frame method for labels<-:

    labels(mockstudy)  <- c(age = 'Age, yrs', sex = "Gender")
    
    tab1 <- tableby(arm ~ sex + age, data=mockstudy)
    summary(tab1)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Gender 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    Age, yrs 0.614
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
       Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000

    Another option is to add labels after you have created the table

    mylabels <- list(sex = "SEX", age = "Age, yrs")
    summary(tab1, labelTranslations = mylabels)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    SEX 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    Age, yrs 0.614
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
       Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000

    Alternatively, you can check the variable labels and manipulate them with a function called labels, which works on the tableby object.

    labels(tab1)
    ##             arm             sex             age 
    ## "Treatment Arm"        "Gender"      "Age, yrs"
    labels(tab1) <- c(arm="Treatment Assignment", age="Baseline Age (yrs)")
    labels(tab1)
    ##                    arm                    sex                    age 
    ## "Treatment Assignment"               "Gender"   "Baseline Age (yrs)"
    summary(tab1)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Gender 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    Baseline Age (yrs) 0.614
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
       Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000

    Change summary statistics globally

    Currently the default behavior is to summarize continuous variables with: Number of missing values, Mean (SD), 25th - 75th quantiles, and Minimum-Maximum values with an ANOVA (t-test with equal variances) p-value. For categorical variables the default is to show: Number of missing values and count (column percent) with a chi-square p-value. This behavior can be modified using the tableby.control function. In fact, you can save your standard settings and use that for future tables. Note that test=FALSE and total=FALSE results in the total column and p-value column not being printed.

    mycontrols  <- tableby.control(test=FALSE, total=FALSE,
                                   numeric.test="kwt", cat.test="chisq",
                                   numeric.stats=c("N", "median", "q1q3"),
                                   cat.stats=c("countpct"),
                                   stats.labels=list(N='Count', median='Median', q1q3='Q1,Q3'))
    tab2 <- tableby(arm ~ sex + age, data=mockstudy, control=mycontrols)
    summary(tab2)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380)
    Gender
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%)
    Age, yrs
       Count 428 691 380
       Median 61.000 61.000 61.000
       Q1,Q3 53.000, 68.000 52.000, 69.000 52.000, 68.000

    You can also change these settings directly in the tableby call.

    tab3 <- tableby(arm ~ sex + age, data=mockstudy, test=FALSE, total=FALSE, 
                    numeric.stats=c("median","q1q3"), numeric.test="kwt")
    summary(tab3)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380)
    Gender
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%)
    Age, yrs
       Median 61.000 61.000 61.000
       Q1, Q3 53.000, 68.000 52.000, 69.000 52.000, 68.000

    Change summary statistics within the formula

    In addition to modifying summary options globally, it is possible to modify the test and summary statistics for specific variables within the formula statement. For example, both the kwt (Kruskal-Wallis rank-based) and anova (asymptotic analysis of variance) tests apply to numeric variables, and we can use one for the variable “age”, another for the variable “bmi”, and no test for the variable “ast”. A list of all the options is shown at the end of the vignette.

    The tests function can do a quick check on what tests were performed on each variable in tableby.

    tab.test <- tableby(arm ~ kwt(age) + anova(bmi) + notest(ast), data=mockstudy)
    tests(tab.test)
    ##           Group Variable   p.value                       Method
    ## 1 Treatment Arm      age 0.6390614 Kruskal-Wallis rank sum test
    ## 2 Treatment Arm      bmi 0.8916552           Linear Model ANOVA
    ## 3 Treatment Arm      ast        NA                      No test
    summary(tab.test)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Age, yrs 0.639
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
       Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000
    Body Mass Index (kg/m^2) 0.892
       N-Miss 9 20 4 33
       Mean (SD) 27.290 (5.552) 27.210 (5.173) 27.106 (5.751) 27.206 (5.432)
       Range 14.053 - 53.008 16.649 - 49.130 15.430 - 60.243 14.053 - 60.243
    ast
       N-Miss 69 141 56 266
       Mean (SD) 37.292 (28.036) 35.202 (26.659) 35.670 (25.807) 35.933 (26.843)
       Range 10.000 - 205.000 7.000 - 174.000 5.000 - 176.000 5.000 - 205.000

    Summary statistics for any individual variable can also be modified, but it must be done as secondary arguments to the test function. The function names must be strings that are functions already written for tableby, built-in R functions like mean and range, or user-defined functions.

    tab.test <- tableby(arm ~ kwt(ast, "Nmiss2","median") + anova(age, "N","mean") +
                        notest(bmi, "Nmiss","median"), data=mockstudy)
    summary(tab.test)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    ast 0.039
       N-Miss 69 141 56 266
       Median 29.000 25.500 27.000 27.000
    Age, yrs 0.614
       N 428 691 380 1499
       Mean 59.673 60.301 59.763 59.985
    Body Mass Index (kg/m^2)
       N-Miss 9 20 4 33
       Median 26.234 26.525 25.978 26.325

    Controlling Options for Categorical Tests (Chisq and Fisher’s)

    The formal tests for categorical variables against the levels of the by variable, chisq and fe, have options to simulate p-values. We show how to turn on the simulations for these with 500 replicates for the Fisher’s test (fe).

    set.seed(100)
    tab.catsim <- tableby(arm ~ sex + race, cat.test="fe", simulate.p.value=TRUE, B=500, data=mockstudy)
    tests(tab.catsim)
          Group Variable   p.value

    1 Treatment Arm sex 0.2195609 2 Treatment Arm race 0.3093812 Method 1 Fisher’s Exact Test for Count Data with simulated p-valuebased on 500 replicates) 2 Fisher’s Exact Test for Count Data with simulated p-valuebased on 500 replicates)

    The chi-square test on 2x2 tables applies Yates’ continuity correction by default, so we provide an option to turn off the correction. We show the results with and without the correction that is applied to treatment arm by sex, if we use subset to ignore one of the three treatment arms.

    cat.correct <- tableby(arm ~ sex + race, cat.test="chisq", subset = !grepl("^F", arm), data=mockstudy)
    tests(cat.correct)
          Group Variable   p.value                     Method

    1 Treatment Arm sex 0.1666280 Pearson’s Chi-squared test 2 Treatment Arm race 0.8108543 Pearson’s Chi-squared test

    cat.nocorrect <- tableby(arm ~ sex + race, cat.test="chisq", subset = !grepl("^F", arm),
         chisq.correct=FALSE, data=mockstudy)
    tests(cat.nocorrect)
          Group Variable   p.value                     Method

    1 Treatment Arm sex 0.1666280 Pearson’s Chi-squared test 2 Treatment Arm race 0.8108543 Pearson’s Chi-squared test

    Modifying the look & feel in Word documents

    You can easily create Word versions of tableby output via an Rmarkdown report and the default options will give you a reasonable table in Word - just select the “Knit Word” option in RStudio.

    The functionality listed in this next paragraph is coming soon but needs an upgraded version of RStudio If you want to modify fonts used for the table, then you’ll need to add an extra line to your header at the beginning of your file. You can take the WordStylesReference01.docx file and modify the fonts (storing the format preferences in your project directory). To see how this works, run your report once using WordStylesReference01.docx and then WordStylesReference02.docx.

    output: word_document
      reference_docx: /projects/bsi/gentools/R/lib320/arsenal/doc/WordStylesReference01.docx 

    For more information on changing the look/feel of your Word document, see the Rmarkdown documentation website.

    Additional Examples

    Here are multiple examples showing how to use some of the different options.

    1. Summarize without a group/by variable

    tab.noby <- tableby(~ bmi + sex + age, data=mockstudy)
    summary(tab.noby)
    Overall (N=1499)
    Body Mass Index (kg/m^2)
       N-Miss 33
       Mean (SD) 27.206 (5.432)
       Range 14.053 - 60.243
    Gender
       Male 916 (61.1%)
       Female 583 (38.9%)
    Age, yrs
       Mean (SD) 59.985 (11.519)
       Range 19.000 - 88.000

    2. Display footnotes indicating which “test” was used

    summary(tab.test, pfootnote=TRUE)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    ast 0.0391
       N-Miss 69 141 56 266
       Median 29.000 25.500 27.000 27.000
    Age, yrs 0.6142
       N 428 691 380 1499
       Mean 59.673 60.301 59.763 59.985
    Body Mass Index (kg/m^2)
       N-Miss 9 20 4 33
       Median 26.234 26.525 25.978 26.325
    1. Kruskal-Wallis rank sum test
    2. Linear Model ANOVA

    3. Summarize an ordered factor

    When comparing groups of ordered data there are a couple of options. The default uses a general independence test available from the coin package. For two-group comparisons, this is essentially the Armitage trend test. The other option is to specify the Kruskal Wallis test. The example below shows both options.

    mockstudy$age.ordnew <- ordered(c("a",NA,as.character(mockstudy$age.ord[-(1:2)])))
    table(mockstudy$age.ord, mockstudy$sex)
    ##        
    ##         Male Female
    ##   10-19    1      0
    ##   20-29    8     11
    ##   30-39   37     30
    ##   40-49  127     83
    ##   50-59  257    179
    ##   60-69  298    170
    ##   70-79  168    101
    ##   80-89   20      9
    table(mockstudy$age.ordnew, mockstudy$sex)
    ##        
    ##         Male Female
    ##   10-19    1      0
    ##   20-29    8     11
    ##   30-39   37     30
    ##   40-49  127     83
    ##   50-59  257    179
    ##   60-69  297    170
    ##   70-79  168    100
    ##   80-89   20      9
    ##   a        1      0
    class(mockstudy$age.ord)
    ## [1] "ordered" "factor"
    summary(tableby(sex ~ age.ordnew, data = mockstudy), pfootnote = TRUE)
    Male (N=916) Female (N=583) Total (N=1499) p value
    age.ordnew 0.0401
       N-Miss 0 1 1
       10-19 1 (0.1%) 0 (0.0%) 1 (0.1%)
       20-29 8 (0.9%) 11 (1.9%) 19 (1.3%)
       30-39 37 (4.0%) 30 (5.2%) 67 (4.5%)
       40-49 127 (13.9%) 83 (14.3%) 210 (14.0%)
       50-59 257 (28.1%) 179 (30.8%) 436 (29.1%)
       60-69 297 (32.4%) 170 (29.2%) 467 (31.2%)
       70-79 168 (18.3%) 100 (17.2%) 268 (17.9%)
       80-89 20 (2.2%) 9 (1.5%) 29 (1.9%)
       a 1 (0.1%) 0 (0.0%) 1 (0.1%)
    1. Trend test for ordinal variables
    summary(tableby(sex ~ age.ord, data = mockstudy), pfootnote = TRUE)
    Male (N=916) Female (N=583) Total (N=1499) p value
    age.ord 0.0491
       10-19 1 (0.1%) 0 (0.0%) 1 (0.1%)
       20-29 8 (0.9%) 11 (1.9%) 19 (1.3%)
       30-39 37 (4.0%) 30 (5.1%) 67 (4.5%)
       40-49 127 (13.9%) 83 (14.2%) 210 (14.0%)
       50-59 257 (28.1%) 179 (30.7%) 436 (29.1%)
       60-69 298 (32.5%) 170 (29.2%) 468 (31.2%)
       70-79 168 (18.3%) 101 (17.3%) 269 (17.9%)
       80-89 20 (2.2%) 9 (1.5%) 29 (1.9%)
    1. Trend test for ordinal variables

    4. Summarize a survival variable

    First look at the information that is presented by the survfit() function, then see how the same results can be seen with tableby. The default is to show the median survival (time at which the probability of survival = 50%).

    survfit(Surv(fu.time, fu.stat)~sex, data=mockstudy)
    ## Call: survfit(formula = Surv(fu.time, fu.stat) ~ sex, data = mockstudy)
    ## 
    ##              n events median 0.95LCL 0.95UCL
    ## sex=Male   916    829    550     515     590
    ## sex=Female 583    527    543     511     575
    survdiff(Surv(fu.time, fu.stat)~sex, data=mockstudy)
    ## Call:
    ## survdiff(formula = Surv(fu.time, fu.stat) ~ sex, data = mockstudy)
    ## 
    ##              N Observed Expected (O-E)^2/E (O-E)^2/V
    ## sex=Male   916      829      830  0.000370  0.000956
    ## sex=Female 583      527      526  0.000583  0.000956
    ## 
    ##  Chisq= 0  on 1 degrees of freedom, p= 1
    summary(tableby(sex ~ Surv(fu.time, fu.stat), data=mockstudy))
    Male (N=916) Female (N=583) Total (N=1499) p value
    Surv(fu.time, fu.stat) 0.975
       Events 829 527 1356
       Median Survival 550.000 543.000 546.000

    It is also possible to obtain summaries of the % survival at certain time points (say the probability of surviving 1-year).

    summary(survfit(Surv(fu.time/365.25, fu.stat)~sex, data=mockstudy), times=1:5)
    ## Call: survfit(formula = Surv(fu.time/365.25, fu.stat) ~ sex, data = mockstudy)
    ## 
    ##                 sex=Male 
    ##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
    ##     1    626     286   0.6870  0.0153       0.6576       0.7177
    ##     2    309     311   0.3437  0.0158       0.3142       0.3761
    ##     3    152     151   0.1748  0.0127       0.1516       0.2015
    ##     4     57      61   0.0941  0.0104       0.0759       0.1168
    ##     5     24      16   0.0628  0.0095       0.0467       0.0844
    ## 
    ##                 sex=Female 
    ##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
    ##     1    380     202   0.6531  0.0197       0.6155        0.693
    ##     2    190     189   0.3277  0.0195       0.2917        0.368
    ##     3     95      90   0.1701  0.0157       0.1420        0.204
    ##     4     51      32   0.1093  0.0133       0.0861        0.139
    ##     5     18      12   0.0745  0.0126       0.0534        0.104
    summary(tableby(sex ~ Surv(fu.time/365.25, fu.stat), data=mockstudy, times=1:5, surv.stats=c("NeventsSurv","NriskSurv")))
    Male (N=916) Female (N=583) Total (N=1499) p value
    Surv(fu.time/365.25, fu.stat) 0.975
       time = 1 286 (68.7) 202 (65.3) 488 (67.4)
       time = 2 597 (34.4) 391 (32.8) 988 (33.7)
       time = 3 748 (17.5) 481 (17.0) 1229 (17.3)
       time = 4 809 (9.4) 513 (10.9) 1322 (10.1)
       time = 5 825 (6.3) 525 (7.4) 1350 (6.8)
       time = 1 626 (68.7) 380 (65.3) 1006 (67.4)
       time = 2 309 (34.4) 190 (32.8) 499 (33.7)
       time = 3 152 (17.5) 95 (17.0) 247 (17.3)
       time = 4 57 (9.4) 51 (10.9) 108 (10.1)
       time = 5 24 (6.3) 18 (7.4) 42 (6.8)

    5. Summarize date variables

    Date variables by default are summarized with the number of missing values, the median, and the range. For example purposes we’ve created a random date. Missing values are introduced for impossible February dates.

    set.seed(100)
    N <- nrow(mockstudy)
    mockstudy$dtentry <- mdy.Date(month=sample(1:12,N,replace=T), day=sample(1:29,N,replace=T), 
                                  year=sample(2005:2009,N,replace=T))
    summary(tableby(sex ~ dtentry, data=mockstudy))
    Male (N=916) Female (N=583) Total (N=1499) p value
    dtentry 0.661
       N-Miss 2 3 5
       Median 2007-05-25 2007-05-08 2007-05-22
       Range 2005-01-02 - 2009-12-28 2005-01-01 - 2009-12-25 2005-01-01 - 2009-12-28

    6. Summarize multiple variables without typing them out

    Often one wants to summarize a number of variables. Instead of typing by hand each individual variable, an alternative approach is to create a formula using the paste command with the collapse="+" option.

    ## create a vector specifying the variable names
    myvars <- names(mockstudy)
    
    ## select the 8th through the last variables
    ## paste them together, separated by the + sign
    RHS <- paste(myvars[8:10], collapse="+")
    RHS

    [1] “ps+hgb+bmi”

    ## create a formula using the as.formula function
    as.formula(paste('arm ~ ', RHS))

    arm ~ ps + hgb + bmi

    ## use the formula in the tableby function
    summary(tableby(as.formula(paste('arm ~', RHS)), data=mockstudy))
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    ps 0.903
       N-Miss 69 141 56 266
       Mean (SD) 0.529 (0.597) 0.547 (0.595) 0.537 (0.606) 0.539 (0.598)
       Range 0.000 - 2.000 0.000 - 2.000 0.000 - 2.000 0.000 - 2.000
    hgb 0.639
       N-Miss 69 141 56 266
       Mean (SD) 12.276 (1.686) 12.381 (1.763) 12.373 (1.680) 12.348 (1.719)
       Range 9.060 - 17.300 9.000 - 18.200 9.000 - 17.000 9.000 - 18.200
    Body Mass Index (kg/m^2) 0.892
       N-Miss 9 20 4 33
       Mean (SD) 27.290 (5.552) 27.210 (5.173) 27.106 (5.751) 27.206 (5.432)
       Range 14.053 - 53.008 16.649 - 49.130 15.430 - 60.243 14.053 - 60.243

    These steps can also be done using the formulize function.

    ## The formulize function does the paste and as.formula steps
    tmp <- formulize('arm',myvars[8:10])
    tmp

    arm ~ ps + hgb + bmi

    ## More complex formulas could also be written using formulize
    tmp2 <- formulize('arm',c('ps','hgb^2','bmi'))
    
    ## use the formula in the tableby function
    summary(tableby(tmp, data=mockstudy))
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    ps 0.903
       N-Miss 69 141 56 266
       Mean (SD) 0.529 (0.597) 0.547 (0.595) 0.537 (0.606) 0.539 (0.598)
       Range 0.000 - 2.000 0.000 - 2.000 0.000 - 2.000 0.000 - 2.000
    hgb 0.639
       N-Miss 69 141 56 266
       Mean (SD) 12.276 (1.686) 12.381 (1.763) 12.373 (1.680) 12.348 (1.719)
       Range 9.060 - 17.300 9.000 - 18.200 9.000 - 17.000 9.000 - 18.200
    Body Mass Index (kg/m^2) 0.892
       N-Miss 9 20 4 33
       Mean (SD) 27.290 (5.552) 27.210 (5.173) 27.106 (5.751) 27.206 (5.432)
       Range 14.053 - 53.008 16.649 - 49.130 15.430 - 60.243 14.053 - 60.243

    To change summary statistics or statistical tests en masse, consider using paste0() together with formulize():

    varlist1 <- c('age','sex','hgb')
    varlist2 <- paste0("anova(", c('bmi','alk.phos','ast'), ", 'meansd')")
    
    summary(tableby(formulize("arm", c(varlist1, varlist2)),
                    data = mockstudy, numeric.test = "kwt"), pfootnote = TRUE)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Age, yrs 0.6391
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
       Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000
    Gender 0.1902
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    hgb 0.5701
       N-Miss 69 141 56 266
       Mean (SD) 12.276 (1.686) 12.381 (1.763) 12.373 (1.680) 12.348 (1.719)
       Range 9.060 - 17.300 9.000 - 18.200 9.000 - 17.000 9.000 - 18.200
    Body Mass Index (kg/m^2) 0.8923
       Mean (SD) 27.290 (5.552) 27.210 (5.173) 27.106 (5.751) 27.206 (5.432)
    alk.phos 0.2263
       Mean (SD) 175.577 (128.608) 161.984 (121.978) 173.506 (138.564) 168.969 (128.492)
    ast 0.5073
       Mean (SD) 37.292 (28.036) 35.202 (26.659) 35.670 (25.807) 35.933 (26.843)
    1. Kruskal-Wallis rank sum test
    2. Pearson’s Chi-squared test
    3. Linear Model ANOVA

    7. Subset the dataset used in the analysis

    Here are two ways to get the same result (limit the analysis to subjects age>5 and in the F: FOLFOX treatment group).

    • The first approach uses the subset function applied to the dataset mockstudy. This example also selects a subset of variables. The tableby function is then applied to this subsetted data.
    newdata <- subset(mockstudy, subset=age>50 & arm=='F: FOLFOX', select = c(sex,ps:bmi))
    dim(mockstudy)
    ## [1] 1499   16
    table(mockstudy$arm)
    ## 
    ##    A: IFL F: FOLFOX   G: IROX 
    ##       428       691       380
    dim(newdata)
    ## [1] 557   4
    names(newdata)
    ## [1] "sex" "ps"  "hgb" "bmi"
    summary(tableby(sex ~ ., data=newdata))
    Male (N=333) Female (N=224) Total (N=557) p value
    ps 0.652
       N-Miss 64 44 108
       Mean (SD) 0.554 (0.600) 0.528 (0.602) 0.543 (0.600)
       Range 0.000 - 2.000 0.000 - 2.000 0.000 - 2.000
    hgb < 0.001
       N-Miss 64 44 108
       Mean (SD) 12.720 (1.925) 12.063 (1.395) 12.457 (1.760)
       Range 9.000 - 18.200 9.100 - 15.900 9.000 - 18.200
    bmi 0.650
       N-Miss 9 6 15
       Mean (SD) 27.539 (4.780) 27.337 (5.508) 27.458 (5.081)
       Range 17.927 - 47.458 16.649 - 49.130 16.649 - 49.130
    • The second approach does the same analysis but uses the subset argument within tableby to subset the data.
    summary(tableby(sex ~ ps + hgb + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy))
    Male (N=333) Female (N=224) Total (N=557) p value
    ps 0.652
       N-Miss 64 44 108
       Mean (SD) 0.554 (0.600) 0.528 (0.602) 0.543 (0.600)
       Range 0.000 - 2.000 0.000 - 2.000 0.000 - 2.000
    hgb < 0.001
       N-Miss 64 44 108
       Mean (SD) 12.720 (1.925) 12.063 (1.395) 12.457 (1.760)
       Range 9.000 - 18.200 9.100 - 15.900 9.000 - 18.200
    Body Mass Index (kg/m^2) 0.650
       N-Miss 9 6 15
       Mean (SD) 27.539 (4.780) 27.337 (5.508) 27.458 (5.081)
       Range 17.927 - 47.458 16.649 - 49.130 16.649 - 49.130

    8. Create combinations of variables on the fly

    ## create a variable combining the levels of mdquality.s and sex
    with(mockstudy, table(interaction(mdquality.s,sex)))
    ## 
    ##   0.Male   1.Male 0.Female 1.Female 
    ##       77      686       47      437
    summary(tableby(arm ~ interaction(mdquality.s,sex), data=mockstudy))
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    interaction(mdquality.s, sex) 0.493
       N-Miss 55 156 41 252
       0.Male 29 (7.8%) 31 (5.8%) 17 (5.0%) 77 (6.2%)
       1.Male 214 (57.4%) 285 (53.3%) 187 (55.2%) 686 (55.0%)
       0.Female 12 (3.2%) 21 (3.9%) 14 (4.1%) 47 (3.8%)
       1.Female 118 (31.6%) 198 (37.0%) 121 (35.7%) 437 (35.0%)
    ## create a new grouping variable with combined levels of arm and sex
    summary(tableby(interaction(mdquality.s, sex) ~  age + bmi, data=mockstudy, subset=arm=="F: FOLFOX"))
    0.Male (N=31) 1.Male (N=285) 0.Female (N=21) 1.Female (N=198) Total (N=535) p value
    Age, yrs 0.190
       Mean (SD) 63.065 (11.702) 60.653 (11.833) 60.810 (10.103) 58.924 (11.366) 60.159 (11.612)
       Range 41.000 - 82.000 19.000 - 88.000 42.000 - 81.000 29.000 - 83.000 19.000 - 88.000
    Body Mass Index (kg/m^2) 0.894
       N-Miss 0 6 1 5 12
       Mean (SD) 26.633 (5.094) 27.387 (4.704) 27.359 (4.899) 27.294 (5.671) 27.307 (5.100)
       Range 20.177 - 41.766 17.927 - 47.458 19.801 - 39.369 16.799 - 44.841 16.799 - 47.458

    9. Transform variables on the fly

    Certain transformations need to be surrounded by I() so that R knows to treat it as a variable transformation and not some special model feature. If the transformation includes any of the symbols / - + ^ * then surround the new variable by I().

    trans <- tableby(arm ~ I(age/10) + log(bmi) + factor(mdquality.s, levels=0:1, labels=c('N','Y')),
                     data=mockstudy)
    summary(trans)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Age, yrs 0.614
       Mean (SD) 5.967 (1.136) 6.030 (1.163) 5.976 (1.150) 5.999 (1.152)
       Range 2.700 - 8.800 1.900 - 8.800 2.600 - 8.500 1.900 - 8.800
    Body Mass Index (kg/m^2) 0.811
       N-Miss 9 20 4 33
       Mean (SD) 3.287 (0.197) 3.286 (0.183) 3.279 (0.200) 3.285 (0.192)
       Range 2.643 - 3.970 2.812 - 3.894 2.736 - 4.098 2.643 - 4.098
    factor(mdquality.s, levels = 0:1, labels = c(“N”, “Y”)) 0.694
       N-Miss 55 156 41 252
       N 41 (11.0%) 52 (9.7%) 31 (9.1%) 124 (9.9%)
       Y 332 (89.0%) 483 (90.3%) 308 (90.9%) 1123 (90.1%)

    The labels for these variables aren’t exactly what we’d like, so we can change modify those after the fact. Instead of typing out the very long variable names, you can modify specific labels by position.

    labels(trans)
    ##                                                           arm 
    ##                                               "Treatment Arm" 
    ##                                                     I(age/10) 
    ##                                                    "Age, yrs" 
    ##                                                      log(bmi) 
    ##                                    "Body Mass Index (kg/m^2)" 
    ##       factor(mdquality.s, levels = 0:1, labels = c("N", "Y")) 
    ## "factor(mdquality.s, levels = 0:1, labels = c(\"N\", \"Y\"))"
    labels(trans)[2:4] <- c('Age per 10 yrs', 'log(BMI)', 'MD Quality')
    labels(trans)
    ##                                                     arm 
    ##                                         "Treatment Arm" 
    ##                                               I(age/10) 
    ##                                        "Age per 10 yrs" 
    ##                                                log(bmi) 
    ##                                              "log(BMI)" 
    ## factor(mdquality.s, levels = 0:1, labels = c("N", "Y")) 
    ##                                            "MD Quality"
    summary(trans)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Age per 10 yrs 0.614
       Mean (SD) 5.967 (1.136) 6.030 (1.163) 5.976 (1.150) 5.999 (1.152)
       Range 2.700 - 8.800 1.900 - 8.800 2.600 - 8.500 1.900 - 8.800
    log(BMI) 0.811
       N-Miss 9 20 4 33
       Mean (SD) 3.287 (0.197) 3.286 (0.183) 3.279 (0.200) 3.285 (0.192)
       Range 2.643 - 3.970 2.812 - 3.894 2.736 - 4.098 2.643 - 4.098
    MD Quality 0.694
       N-Miss 55 156 41 252
       N 41 (11.0%) 52 (9.7%) 31 (9.1%) 124 (9.9%)
       Y 332 (89.0%) 483 (90.3%) 308 (90.9%) 1123 (90.1%)

    Note that if we had not changed mdquality.s to a factor, it would have been summarized as though it were a continuous variable.

    class(mockstudy$mdquality.s)

    [1] “integer”

    summary(tableby(arm~mdquality.s, data=mockstudy))
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    mdquality.s 0.695
       N-Miss 55 156 41 252
       Mean (SD) 0.890 (0.313) 0.903 (0.297) 0.909 (0.289) 0.901 (0.299)
       Range 0.000 - 1.000 0.000 - 1.000 0.000 - 1.000 0.000 - 1.000

    Another option would be to specify the test and summary statistics. In fact, if I had a set of variables coded 0/1 and that was all I was summarizing, then I could change the global option for continuous variables to use the chi-square test and show countpct.

    summary(tableby(arm ~ chisq(mdquality.s, "Nmiss","countpct"), data=mockstudy))
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    mdquality.s 0.694
       N-Miss 55 156 41 252
       0 41 (11.0%) 52 (9.7%) 31 (9.1%) 124 (9.9%)
       1 332 (89.0%) 483 (90.3%) 308 (90.9%) 1123 (90.1%)

    10. Subsetting (change the ordering of the variables, delete a variable, sort by p-value, filter by p-value)

    mytab <- tableby(arm ~ sex + alk.phos + age, data=mockstudy)
    mytab2 <- mytab[c('age','sex','alk.phos')]
    summary(mytab2)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Age, yrs 0.614
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
       Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000
    Gender 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    alk.phos 0.226
       N-Miss 69 141 56 266
       Mean (SD) 175.577 (128.608) 161.984 (121.978) 173.506 (138.564) 168.969 (128.492)
       Range 11.000 - 858.000 10.000 - 1014.000 7.000 - 982.000 7.000 - 1014.000
    summary(mytab[c('age','sex')], digits = 2)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Age, yrs 0.614
       Mean (SD) 59.67 (11.36) 60.30 (11.63) 59.76 (11.50) 59.99 (11.52)
       Range 27.00 - 88.00 19.00 - 88.00 26.00 - 85.00 19.00 - 88.00
    Gender 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    summary(mytab[c(3,1)], digits = 3)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Age, yrs 0.614
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
       Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000
    Gender 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    summary(sort(mytab, decreasing = TRUE))
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Age, yrs 0.614
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
       Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000
    alk.phos 0.226
       N-Miss 69 141 56 266
       Mean (SD) 175.577 (128.608) 161.984 (121.978) 173.506 (138.564) 168.969 (128.492)
       Range 11.000 - 858.000 10.000 - 1014.000 7.000 - 982.000 7.000 - 1014.000
    Gender 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    summary(mytab[mytab < 0.5])
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Gender 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    alk.phos 0.226
       N-Miss 69 141 56 266
       Mean (SD) 175.577 (128.608) 161.984 (121.978) 173.506 (138.564) 168.969 (128.492)
       Range 11.000 - 858.000 10.000 - 1014.000 7.000 - 982.000 7.000 - 1014.000
    head(mytab, 1) # can also use tail()

    tableby Object

    Function Call: tableby(formula = arm ~ sex + alk.phos + age, data = mockstudy)

    Variable(s): arm ~ sex

    11. Merge two tableby objects together

    It is possible to combine two tableby objects so that they print out together. Overlapping by-variables will have their x-variables concatenated, and (if all=TRUE) non-overlapping by-variables will have their tables printed separately.

    ## demographics
    tab1 <- tableby(arm ~ sex + age, data=mockstudy,
                    control=tableby.control(numeric.stats=c("Nmiss","meansd"), total=FALSE))
    ## lab data
    tab2 <- tableby(arm ~ hgb + alk.phos, data=mockstudy,
                    control=tableby.control(numeric.stats=c("Nmiss","median","q1q3"),
                                            numeric.test="kwt", total=FALSE))
    tab12 <- merge(tab1, tab2)
    class(tab12)

    [1] “tableby” “arsenal_table”

    summary(tab12)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) p value
    Gender 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%)
    Age, yrs 0.614
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499)
    hgb 0.570
       N-Miss 69 141 56
       Median 12.100 12.200 12.400
       Q1, Q3 11.000, 13.450 11.100, 13.600 11.175, 13.625
    alk.phos 0.104
       N-Miss 69 141 56
       Median 133.000 116.000 122.000
       Q1, Q3 89.000, 217.000 85.000, 194.750 87.750, 210.250

    For tables with two different outcomes, consider the all=TRUE argument:

    summary(merge(
      tableby(sex ~ age, data = mockstudy),
      tableby(arm ~ bmi, data = mockstudy),
      all = TRUE
    ))
    Male (N=916) Female (N=583) Total (N=1499) p value
    Age, yrs 0.048
       Mean (SD) 60.455 (11.369) 59.247 (11.722) 59.985 (11.519)
       Range 19.000 - 88.000 22.000 - 88.000 19.000 - 88.000
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Body Mass Index (kg/m^2) 0.892
       N-Miss 9 20 4 33
       Mean (SD) 27.290 (5.552) 27.210 (5.173) 27.106 (5.751) 27.206 (5.432)
       Range 14.053 - 53.008 16.649 - 49.130 15.430 - 60.243 14.053 - 60.243

    12. Add a title to the table

    When creating a pdf the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles appear un-numbered and above the table.

    t1 <- tableby(arm ~ sex + age, data=mockstudy)
    summary(t1, title='Demographics')
    Demographics
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Gender 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    Age, yrs 0.614
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
       Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000

    With multiple left-hand sides, you can pass a vector or list to determine labels for each table:

    summary(tableby(list(arm, sex) ~ age, data = mockstudy), title = c("arm table", "sex table"))
    arm table
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Age, yrs 0.614
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
       Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000
    sex table
    Male (N=916) Female (N=583) Total (N=1499) p value
    Age, yrs 0.048
       Mean (SD) 60.455 (11.369) 59.247 (11.722) 59.985 (11.519)
       Range 19.000 - 88.000 22.000 - 88.000 19.000 - 88.000

    13. Modify how missing values are displayed

    Depending on the report you are writing you have the following options:

    • Show how many subjects have each variable

    • Show how many subjects are missing each variable

    • Show how many subjects are missing each variable only if there are any missing values

    • Don’t indicate missing values at all

    ## look at how many missing values there are for each variable
    apply(is.na(mockstudy),2,sum)
    ##        case         age         arm         sex        race     fu.time     fu.stat          ps 
    ##           0           0           0           0           7           0           0         266 
    ##         hgb         bmi    alk.phos         ast mdquality.s     age.ord  age.ordnew     dtentry 
    ##         266          33         266         266         252           0           1           5
    ## Show how many subjects have each variable (non-missing)
    summary(tableby(sex ~ ast + age, data=mockstudy,
                    control=tableby.control(numeric.stats=c("N","median"), total=FALSE)))
    Male (N=916) Female (N=583) p value
    ast 0.921
       N 754 479
       Median 27.000 27.000
    Age, yrs 0.048
       N 916 583
       Median 61.000 60.000
    ## Always list the number of missing values
    summary(tableby(sex ~ ast + age, data=mockstudy,
                    control=tableby.control(numeric.stats=c("Nmiss2","median"), total=FALSE)))
    Male (N=916) Female (N=583) p value
    ast 0.921
       N-Miss 162 104
       Median 27.000 27.000
    Age, yrs 0.048
       N-Miss 0 0
       Median 61.000 60.000
    ## Only show the missing values if there are some (default)
    summary(tableby(sex ~ ast + age, data=mockstudy, 
                    control=tableby.control(numeric.stats=c("Nmiss","mean"),total=FALSE)))
    Male (N=916) Female (N=583) p value
    ast 0.921
       N-Miss 162 104
       Mean 35.873 36.029
    Age, yrs 0.048
       Mean 60.455 59.247
    ## Don't show N at all
    summary(tableby(sex ~ ast + age, data=mockstudy, 
                    control=tableby.control(numeric.stats=c("mean"),total=FALSE)))
    Male (N=916) Female (N=583) p value
    ast 0.921
       Mean 35.873 36.029
    Age, yrs 0.048
       Mean 60.455 59.247

    One might also consider the use of includeNA() to include NAs in the counts and percents for categorical variables.

    mockstudy$ps.cat <- factor(mockstudy$ps)
    attr(mockstudy$ps.cat, "label") <- "ps"
    summary(tableby(sex ~ includeNA(ps.cat), data = mockstudy, cat.stats = "countpct"))
    Male (N=916) Female (N=583) Total (N=1499) p value
    ps 0.354
       0 391 (42.7%) 244 (41.9%) 635 (42.4%)
       1 329 (35.9%) 202 (34.6%) 531 (35.4%)
       2 34 (3.7%) 33 (5.7%) 67 (4.5%)
       (Missing) 162 (17.7%) 104 (17.8%) 266 (17.7%)

    14. Modify the number of digits used

    Within tableby.control function there are 4 options for controlling the number of significant digits shown.

    • digits: controls the number of digits after the decimal place for continuous values

    • digits.count: controls the number of digits after the decimal point for counts

    • digits.pct: controls the number of digits after the decimal point for percents

    • digits.p: controls the number of digits after the decimal point for p-values

    summary(tableby(arm ~ sex + age + fu.time, data=mockstudy), digits=4, digits.p=2, digits.pct=1)
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Gender 0.19
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    Age, yrs 0.61
       Mean (SD) 59.6729 (11.3645) 60.3010 (11.6323) 59.7632 (11.4993) 59.9853 (11.5188)
       Range 27.0000 - 88.0000 19.0000 - 88.0000 26.0000 - 85.0000 19.0000 - 88.0000
    fu.time < 0.01
       Mean (SD) 553.5841 (419.6065) 731.2460 (487.7443) 607.2421 (435.5092) 649.0841 (462.5109)
       Range 9.0000 - 2170.0000 0.0000 - 2472.0000 17.0000 - 2118.0000 0.0000 - 2472.0000

    With the exception of digits.p, all of these can be specified on a per-variable basis using the in-formula functions that specify which tests are run:

    summary(tableby(arm ~ chisq(sex, digits.pct=1) + anova(age, digits=4) +
                      anova(fu.time, digits = 1), data=mockstudy))
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Gender 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    Age, yrs 0.614
       Mean (SD) 59.6729 (11.3645) 60.3010 (11.6323) 59.7632 (11.4993) 59.9853 (11.5188)
       Range 27.0000 - 88.0000 19.0000 - 88.0000 26.0000 - 85.0000 19.0000 - 88.0000
    fu.time < 0.001
       Mean (SD) 553.6 (419.6) 731.2 (487.7) 607.2 (435.5) 649.1 (462.5)
       Range 9.0 - 2170.0 0.0 - 2472.0 17.0 - 2118.0 0.0 - 2472.0

    15. Create a user-defined summary statistic

    For purposes of this example, the code below creates a trimmed mean function (trims 10%) and use that to summarize the data. Note the use of the ... which tells R to pass extra arguments on - this is required for user-defined functions. In this case, na.rm=T is passed to myfunc. The weights argument is also required, even though it isn’t passed on to the internal function in this particular example.

    trim10 <- function(x, weights=rep(1,length(x)), ...){
      mean(x, trim=.1, ...)
    }
    
    summary(tableby(sex ~ hgb, data=mockstudy, 
                    control=tableby.control(numeric.stats=c("Nmiss","trim10"), numeric.test="kwt",
                        stats.labels=list(Nmiss='Missing values', trim10="Trimmed Mean, 10%"))))
    Male (N=916) Female (N=583) Total (N=1499) p value
    hgb < 0.001
       Missing values 162 104 266
       Trimmed Mean, 10% 12.6 11.9 12.3

    For statistics to be formatted appropriately, you may want to use as.tbstat() or as.countpct(). For example, suppose you want to create a trimmed mean function that trims by both 5 and 10 percent. The first example shows them separated by a comma; the second puts the 10% trimmed mean in brackets

    trim510comma <- function(x, weights=rep(1,length(x)), ...){
      tmp <- c(mean(x, trim = 0.05, ...), mean(x, trim = 0.1, ...))
      as.tbstat(tmp, sep = ", ")
    }
    trim510bracket <- function(x, weights=rep(1,length(x)), ...){
      tmp <- c(mean(x, trim = 0.05, ...), mean(x, trim = 0.1, ...))
      as.tbstat(tmp, sep = " ", parens = c("[", "]"))
    }
    
    summary(tableby(sex ~ hgb, data=mockstudy, numeric.stats=c("Nmiss", "trim510comma"), test = FALSE))
    Male (N=916) Female (N=583) Total (N=1499)
    hgb
       N-Miss 162 104 266
       trim510comma 12.570, 12.564 11.924, 11.910 12.308, 12.291
    summary(tableby(sex ~ hgb, data=mockstudy, numeric.stats=c("Nmiss", "trim510bracket"), test = FALSE))
    Male (N=916) Female (N=583) Total (N=1499)
    hgb
       N-Miss 162 104 266
       trim510bracket 12.570 [12.564] 11.924 [11.910] 12.308 [12.291]

    Or perhaps it’s useful to put the amount of trimming in parentheses. Since it is a percent, we can flag it as such:

    trim10pct <- function(x, weights=rep(1,length(x)), ...){
      tmp <- mean(x, trim = 0.05, ...)
      as.countpct(c(tmp, 10), sep = " ", parens = c("(", ")"), which.count = 0, which.pct = 2, pct = "%")
    }
    summary(tableby(sex ~ hgb, data=mockstudy, numeric.stats=c("Nmiss", "trim10pct"),
                    digits = 2, digits.pct = 0, test = FALSE))
    Male (N=916) Female (N=583) Total (N=1499)
    hgb
       N-Miss 162 104 266
       trim10pct 12.57 (10%) 11.92 (10%) 12.31 (10%)

    16. Use case-weights for creating summary statistics

    When comparing groups, they are often unbalanced when it comes to nuisances such as age and sex. The tableby function allows you to create weighted summary statistics. If this option us used then p-values are not calculated (test=FALSE).

    ##create fake group that is not balanced by age/sex 
    set.seed(200)
    mockstudy$fake_arm <- ifelse(mockstudy$age>60 & mockstudy$sex=='Female',sample(c('A','B'),replace=T, prob=c(.2,.8)),
                                sample(c('A','B'),replace=T, prob=c(.8,.4)))
    
    mockstudy$agegp <- cut(mockstudy$age, breaks=c(18,50,60,70,90), right=FALSE)
    
    ## create weights based on agegp and sex distribution
    tab1 <- with(mockstudy,table(agegp, sex))
    tab2 <- with(mockstudy, table(agegp, sex, fake_arm))
    tab2
    ## , , fake_arm = A
    ## 
    ##          sex
    ## agegp     Male Female
    ##   [18,50)   73     62
    ##   [50,60)  128     94
    ##   [60,70)  139      7
    ##   [70,90)  102      0
    ## 
    ## , , fake_arm = B
    ## 
    ##          sex
    ## agegp     Male Female
    ##   [18,50)   79     48
    ##   [50,60)  130     84
    ##   [60,70)  156    166
    ##   [70,90)  109    122
    gpwts <- rep(tab1, length(unique(mockstudy$fake_arm)))/tab2
    gpwts[gpwts>50] <- 30
    
    ## apply weights to subjects
    index <- with(mockstudy, cbind(as.numeric(agegp), as.numeric(sex), as.numeric(as.factor(fake_arm)))) 
    mockstudy$wts <- gpwts[index]
    
    ## show weights by treatment arm group
    tapply(mockstudy$wts,mockstudy$fake_arm, summary)
    ## $A
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ##   1.774   1.894   2.069   2.276   2.082  24.714 
    ## 
    ## $B
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    ##   1.000   1.042   1.924   1.677   1.985   2.292
    orig <- tableby(fake_arm ~ age + sex + Surv(fu.time/365, fu.stat), data=mockstudy, test=FALSE)
    summary(orig, title='No Case Weights used')
    No Case Weights used
    A (N=605) B (N=894) Total (N=1499)
    Age, yrs
       Mean (SD) 57.413 (11.618) 61.726 (11.125) 59.985 (11.519)
       Range 22.000 - 85.000 19.000 - 88.000 19.000 - 88.000
    Gender
       Male 442 (73.1%) 474 (53.0%) 916 (61.1%)
       Female 163 (26.9%) 420 (47.0%) 583 (38.9%)
    Surv(fu.time/365, fu.stat)
       Events 554 802 1356
       Median Survival 1.504 1.493 1.496
    tab1 <- tableby(fake_arm ~ age + sex + Surv(fu.time/365, fu.stat), data=mockstudy, weights=wts)
    summary(tab1, title='Case Weights used')
    Case Weights used
    A (N=1377) B (N=1499) Total (N=2876)
    Age, yrs
       Mean (SD) 58.009 (10.925) 60.151 (11.428) 59.126 (11.235)
       Range 22.000 - 85.000 19.000 - 88.000 19.000 - 88.000
    Gender
       Male 916 (66.5%) 916 (61.1%) 1832 (63.7%)
       Female 461 (33.5%) 583 (38.9%) 1044 (36.3%)
    Surv(fu.time/365, fu.stat)
       Events 1252 1348 2599
       Median Survival 1.534 1.496 1.532

    17. Create your own p-value and add it to the table

    When using weighted summary statistics, it is often desirable to then show a p-value from a model that corresponds to the weighted analysis. It is possible to add your own p-value and modify the column title for that new p-value. Another use for this would be to add standardized differences or confidence intervals instead of a p-value.

    To add the p-value, you simply need to create a data frame and use the function modpval.tableby(). The first few columns in the data.frame are required: (1) the by-variable, (2) the strata value (if the table has a strata term), (3) the x-variable, and (4) the new p-value (or test statistic). Another optional column can be used to indicate what method was used to calculate the p-value. If you specify use.pname=TRUE then the column name indicating the p-value will be also be used in the tableby summary.

    mypval <- data.frame(
      byvar = "fake_arm",
      variable = c('age','sex','Surv(fu.time/365, fu.stat)'), 
      adj.pvalue = c(.953,.811,.01), 
      method = c('Age/Sex adjusted model results')
    )
    tab2 <- modpval.tableby(tab1, mypval, use.pname=TRUE)
    summary(tab2, title='Case Weights used, p-values added', pfootnote=TRUE)
    Case Weights used, p-values added
    A (N=1377) B (N=1499) Total (N=2876) adj.pvalue
    Age, yrs 0.9531
       Mean (SD) 58.009 (10.925) 60.151 (11.428) 59.126 (11.235)
       Range 22.000 - 85.000 19.000 - 88.000 19.000 - 88.000
    Gender 0.8111
       Male 916 (66.5%) 916 (61.1%) 1832 (63.7%)
       Female 461 (33.5%) 583 (38.9%) 1044 (36.3%)
    Surv(fu.time/365, fu.stat) 0.0101
       Events 1252 1348 2599
       Median Survival 1.534 1.496 1.532
    1. Age/Sex adjusted model results

    18. For two-level categorical variables or one-line numeric variables, simplify the output.

    If the cat.simplify option is set to TRUE, then only the second level of two-level categorical varialbes is shown. In the example below, sex has two levels, and “Female” is the second level, hence only the counts and percents for Female are shown. Similarly, “mdquality.s” was turned to a factor, and “1” is the second level, but since there are missings, the table ignores cat.simplify and displays all levels (since the output can no longer be displayed on one line).

    table2 <- tableby(arm~sex + factor(mdquality.s), data=mockstudy, cat.simplify=TRUE)
    summary(table2, labelTranslations=c(sex="Female", "factor(mdquality.s)"="MD Quality"))
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%) 0.190
    MD Quality 0.694
       N-Miss 55 156 41 252
       0 41 (11.0%) 52 (9.7%) 31 (9.1%) 124 (9.9%)
       1 332 (89.0%) 483 (90.3%) 308 (90.9%) 1123 (90.1%)

    Similarly, if numeric.simplify is set to TRUE, then any numerics which only have one row of summary statistics are simplified into a single row. Note again that ast has missing values and so is not simplified to a single row.

    summary(tableby(arm ~ age + ast, data = mockstudy,
                    numeric.simplify=TRUE, numeric.stats=c("Nmiss", "meansd")))
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Age, yrs 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519) 0.614
    ast 0.507
       N-Miss 69 141 56 266
       Mean (SD) 37.292 (28.036) 35.202 (26.659) 35.670 (25.807) 35.933 (26.843)

    The in-formula functions to change which tests are run can also be used to specify these options for each variable at a time.

    summary(tableby(arm ~ anova(age, "meansd", numeric.simplify=TRUE) +
                      chisq(sex, cat.simplify=TRUE), data = mockstudy))
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Age, yrs 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519) 0.614
    Gender 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%) 0.190

    The cat.simplify and ord.simplify argument also accept the special string "label", which appends the shown level’s label to the overall label:

    summary(tableby(arm ~ sex, cat.simplify = "label", data = mockstudy))
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Gender (Female) 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%) 0.190

    19. Use tableby within an Sweave document

    For those users who wish to create tables within an Sweave document, the following code seems to work.

    \documentclass{article}
    
    \usepackage{longtable}
    \usepackage{pdfpages}
    
    \begin{document}
    
    \section{Read in Data}
    <<echo=TRUE>>=
    require(arsenal)
    require(knitr)
    require(rmarkdown)
    data(mockstudy)
    
    tab1 <- tableby(arm~sex+age, data=mockstudy)
    @
    
    \section{Convert Summary.Tableby to LaTeX}
    <<echo=TRUE, results='hide', message=FALSE>>=
    capture.output(summary(tab1), file="Test.md")
    
    ## Convert R Markdown Table to LaTeX
    render("Test.md", pdf_document(keep_tex=TRUE))
    @ 
    
    \includepdf{Test.pdf}
    
    \end{document}

    20. Export tableby object to a .CSV file

    When looking at multiple variables it is sometimes useful to export the results to a csv file. The as.data.frame function creates a data frame object that can be exported or further manipulated within R.

    tab1 <- summary(tableby(arm~sex+age, data=mockstudy), text = NULL)
    as.data.frame(tab1)
    ##              A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380)  Total (N=1499) p value
    ## 1    Gender                                                                     0.190
    ## 2      Male     277 (64.7%)       411 (59.5%)     228 (60.0%)     916 (61.1%)        
    ## 3    Female     151 (35.3%)       280 (40.5%)     152 (40.0%)     583 (38.9%)        
    ## 4  Age, yrs                                                                     0.614
    ## 5 Mean (SD) 59.673 (11.365)   60.301 (11.632) 59.763 (11.499) 59.985 (11.519)        
    ## 6     Range 27.000 - 88.000   19.000 - 88.000 26.000 - 85.000 19.000 - 88.000
    # write.csv(tab1, '/my/path/here/my_table.csv')

    21. Write tableby object to a separate Word or HTML file

    ## write to an HTML document
    tab1 <- tableby(arm ~ sex + age, data=mockstudy)
    write2html(tab1, "~/trash.html")
    
    ## write to a Word document
    write2word(tab1, "~/trash.doc", title="My table in Word")

    22. Use tableby in R Shiny

    The easiest way to output a tableby() object in an R Shiny app is to use the tableOutput() UI in combination with the renderTable() server function and as.data.frame(summary(tableby())):

    # A standalone shiny app
    library(shiny)
    library(arsenal)
    data(mockstudy)
    
    shinyApp(
      ui = fluidPage(tableOutput("table")),
      server = function(input, output) {
        output$table <- renderTable({
          as.data.frame(summary(tableby(sex ~ age, data = mockstudy), text = "html"))
        }, sanitize.text.function = function(x) x)
      }
    )

    This can be especially powerful if you feed the selections from a selectInput(multiple = TRUE) into formulize() to make the table dynamic!

    23. Use tableby in bookdown

    Since the backbone of tableby() is knitr::kable(), tables still render well in bookdown. However, print.summary.tableby() doesn’t use the caption= argument of kable(), so some tables may not have a properly numbered caption. To fix this, use the method described on the bookdown site to give the table a tag/ID.

    summary(tableby(sex ~ age, data = mockstudy), title="(\\#tab:mytableby) Caption here")

    24. Adjust tableby for multiple p-values

    The padjust() function is a new S3 generic piggybacking off of p.adjust(). It works on both tableby and summary.tableby objects:

    tab <- summary(tableby(sex ~ age + fu.time + bmi + mdquality.s, data = mockstudy))
    tab
    Male (N=916) Female (N=583) Total (N=1499) p value
    Age, yrs 0.048
       Mean (SD) 60.455 (11.369) 59.247 (11.722) 59.985 (11.519)
       Range 19.000 - 88.000 22.000 - 88.000 19.000 - 88.000
    fu.time 0.978
       Mean (SD) 649.345 (454.332) 648.674 (475.472) 649.084 (462.511)
       Range 0.000 - 2472.000 9.000 - 2441.000 0.000 - 2472.000
    Body Mass Index (kg/m^2) 0.012
       N-Miss 22 11 33
       Mean (SD) 27.491 (5.030) 26.760 (5.984) 27.206 (5.432)
       Range 14.053 - 60.243 15.430 - 53.008 14.053 - 60.243
    mdquality.s 0.827
       N-Miss 153 99 252
       Mean (SD) 0.899 (0.301) 0.903 (0.296) 0.901 (0.299)
       Range 0.000 - 1.000 0.000 - 1.000 0.000 - 1.000
    padjust(tab, method = "bonferroni")
    Male (N=916) Female (N=583) Total (N=1499) p value
    Age, yrs 0.191
       Mean (SD) 60.455 (11.369) 59.247 (11.722) 59.985 (11.519)
       Range 19.000 - 88.000 22.000 - 88.000 19.000 - 88.000
    fu.time 1.000
       Mean (SD) 649.345 (454.332) 648.674 (475.472) 649.084 (462.511)
       Range 0.000 - 2472.000 9.000 - 2441.000 0.000 - 2472.000
    Body Mass Index (kg/m^2) 0.048
       N-Miss 22 11 33
       Mean (SD) 27.491 (5.030) 26.760 (5.984) 27.206 (5.432)
       Range 14.053 - 60.243 15.430 - 53.008 14.053 - 60.243
    mdquality.s 1.000
       N-Miss 153 99 252
       Mean (SD) 0.899 (0.301) 0.903 (0.296) 0.901 (0.299)
       Range 0.000 - 1.000 0.000 - 1.000 0.000 - 1.000

    25. Tabulate multiple endpoints

    You can now use list() on the left-hand side of tableby() to give multiple endpoints.

    summary(tableby(list(sex, mdquality.s, ps) ~ age + bmi, data = mockstudy))
    Male (N=916) Female (N=583) Total (N=1499) p value
    Age, yrs 0.048
       Mean (SD) 60.455 (11.369) 59.247 (11.722) 59.985 (11.519)
       Range 19.000 - 88.000 22.000 - 88.000 19.000 - 88.000
    Body Mass Index (kg/m^2) 0.012
       N-Miss 22 11 33
       Mean (SD) 27.491 (5.030) 26.760 (5.984) 27.206 (5.432)
       Range 14.053 - 60.243 15.430 - 53.008 14.053 - 60.243
    0 (N=124) 1 (N=1123) Total (N=1247) p value
    Age, yrs 0.766
       Mean (SD) 60.089 (11.627) 59.763 (11.537) 59.796 (11.542)
       Range 29.000 - 82.000 19.000 - 88.000 19.000 - 88.000
    Body Mass Index (kg/m^2) 0.225
       N-Miss 3 18 21
       Mean (SD) 26.684 (6.331) 27.309 (5.274) 27.247 (5.388)
       Range 16.071 - 60.243 14.053 - 53.008 14.053 - 60.243
    0 (N=635) 1 (N=531) 2 (N=67) Total (N=1233) p value
    Age, yrs 0.335
       Mean (SD) 59.935 (11.261) 60.800 (11.721) 59.254 (12.090) 60.271 (11.507)
       Range 22.000 - 85.000 26.000 - 88.000 28.000 - 80.000 22.000 - 88.000
    Body Mass Index (kg/m^2) 0.028
       N-Miss 7 20 1 28
       Mean (SD) 27.539 (5.222) 26.842 (5.436) 26.178 (5.808) 27.169 (5.358)
       Range 14.053 - 48.384 15.430 - 60.243 16.071 - 44.922 14.053 - 60.243

    To avoid confusion about which table is which endpoint, you can set term.name=TRUE in summary(). This takes the labels for each by-variable and puts them in the top-left of the table.

    summary(tableby(list(sex, mdquality.s, ps) ~ age + bmi, data = mockstudy), term.name = TRUE)
    Gender Male (N=916) Female (N=583) Total (N=1499) p value
    Age, yrs 0.048
       Mean (SD) 60.455 (11.369) 59.247 (11.722) 59.985 (11.519)
       Range 19.000 - 88.000 22.000 - 88.000 19.000 - 88.000
    Body Mass Index (kg/m^2) 0.012
       N-Miss 22 11 33
       Mean (SD) 27.491 (5.030) 26.760 (5.984) 27.206 (5.432)
       Range 14.053 - 60.243 15.430 - 53.008 14.053 - 60.243
    mdquality.s 0 (N=124) 1 (N=1123) Total (N=1247) p value
    Age, yrs 0.766
       Mean (SD) 60.089 (11.627) 59.763 (11.537) 59.796 (11.542)
       Range 29.000 - 82.000 19.000 - 88.000 19.000 - 88.000
    Body Mass Index (kg/m^2) 0.225
       N-Miss 3 18 21
       Mean (SD) 26.684 (6.331) 27.309 (5.274) 27.247 (5.388)
       Range 16.071 - 60.243 14.053 - 53.008 14.053 - 60.243
    ps 0 (N=635) 1 (N=531) 2 (N=67) Total (N=1233) p value
    Age, yrs 0.335
       Mean (SD) 59.935 (11.261) 60.800 (11.721) 59.254 (12.090) 60.271 (11.507)
       Range 22.000 - 85.000 26.000 - 88.000 28.000 - 80.000 22.000 - 88.000
    Body Mass Index (kg/m^2) 0.028
       N-Miss 7 20 1 28
       Mean (SD) 27.539 (5.222) 26.842 (5.436) 26.178 (5.808) 27.169 (5.358)
       Range 14.053 - 48.384 15.430 - 60.243 16.071 - 44.922 14.053 - 60.243

    26. Tabulate data by a non-test group (strata)

    You can also specify a second grouping variable that doesn’t get tested (but instead separates results): a strata variable.

    summary(tableby(list(sex, ps) ~ age + bmi, strata = arm, data = mockstudy))
    Treatment Arm Male (N=916) Female (N=583) Total (N=1499) p value
    A: IFL Age, yrs 0.572
       Mean (SD) 59.903 (11.347) 59.252 (11.422) 59.673 (11.365)
       Range 28.000 - 83.000 27.000 - 88.000 27.000 - 88.000
    Body Mass Index (kg/m^2) 0.050
       N-Miss 7 2 9
       Mean (SD) 27.685 (5.072) 26.575 (6.287) 27.290 (5.552)
       Range 14.053 - 48.384 16.880 - 53.008 14.053 - 53.008
    F: FOLFOX Age, yrs 0.286
       Mean (SD) 60.691 (11.598) 59.729 (11.679) 60.301 (11.632)
       Range 19.000 - 88.000 22.000 - 83.000 19.000 - 88.000
    Body Mass Index (kg/m^2) 0.768
       N-Miss 12 8 20
       Mean (SD) 27.259 (4.715) 27.139 (5.789) 27.210 (5.173)
       Range 17.927 - 47.458 16.649 - 49.130 16.649 - 49.130
    G: IROX Age, yrs 0.051
       Mean (SD) 60.702 (10.999) 58.355 (12.113) 59.763 (11.499)
       Range 29.000 - 85.000 26.000 - 82.000 26.000 - 85.000
    Body Mass Index (kg/m^2) 0.020
       N-Miss 3 1 4
       Mean (SD) 27.672 (5.505) 26.262 (6.021) 27.106 (5.751)
       Range 17.377 - 60.243 15.430 - 45.354 15.430 - 60.243
    Treatment Arm 0 (N=635) 1 (N=531) 2 (N=67) Total (N=1233) p value
    A: IFL Age, yrs 0.413
       Mean (SD) 60.101 (10.948) 60.579 (12.026) 56.842 (13.226) 60.131 (11.535)
       Range 27.000 - 81.000 28.000 - 88.000 34.000 - 75.000 27.000 - 88.000
    Body Mass Index (kg/m^2) 0.023
       N-Miss 1 6 1 8
       Mean (SD) 27.850 (5.318) 26.224 (5.347) 26.954 (5.560) 27.128 (5.385)
       Range 14.053 - 48.384 17.029 - 53.008 17.177 - 37.223 14.053 - 53.008
    F: FOLFOX Age, yrs 0.272
       Mean (SD) 60.173 (11.096) 61.342 (11.918) 63.138 (9.303) 60.845 (11.391)
       Range 22.000 - 82.000 26.000 - 88.000 44.000 - 80.000 22.000 - 88.000
    Body Mass Index (kg/m^2) 0.225
       N-Miss 5 11 0 16
       Mean (SD) 27.569 (5.004) 27.192 (5.248) 25.904 (5.338) 27.315 (5.134)
       Range 16.649 - 43.867 16.799 - 49.130 20.833 - 44.922 16.649 - 49.130
    G: IROX Age, yrs 0.312
       Mean (SD) 59.361 (11.904) 60.081 (11.037) 55.737 (13.523) 59.451 (11.653)
       Range 26.000 - 85.000 28.000 - 84.000 28.000 - 76.000 26.000 - 85.000
    Body Mass Index (kg/m^2) 0.642
       N-Miss 1 3 0 4
       Mean (SD) 27.143 (5.462) 26.910 (5.824) 25.861 (6.890) 26.970 (5.694)
       Range 17.615 - 46.204 15.430 - 60.243 16.071 - 44.734 15.430 - 60.243

    Available Function Options

    Summary statistics

    The default summary statistics, by varible type, are:

    • numeric.stats: Continuous variables will show by default Nmiss, meansd, range
    • cat.stats: Categorical and factor variables will show by default Nmiss, countpct
    • ordered.stats: Ordered factors will show by default Nmiss, countpct
    • surv.stats: Survival variables will show by default Nmiss, Nevents, medsurv
    • date.stats: Date variables will show by default Nmiss, median, range

    There are a number of extra functions defined specifically for the tableby function.

    • N: a count of the number of observations for a particular group
    • Nmiss: only show the count of the number of missing values if there are some missing values
    • Nmiss2: always show a count of the number of missing values for a variable within each group
    • meansd: print the mean and standard deviation in the format mean(sd)
    • meanse: print the mean and standard error in the format mean(se)
    • meanCI: print the mean and a (t) confidence interval
    • count: print the number of values in a category
    • countN: print the number of values in a category plus the total N for the group in the format N/Total
    • countpct: print the number of values in a category plus the column-percentage in the format N (%)
    • countrowpct: print the number of values in a category plus the row-percentage in the format N (%)
    • countcellpct: print the number of values in a category plus the cell-percentage in the format N (%)
    • binomCI: print the proportion in a category plus a binomial confidence interval.
    • rowbinomCI: print the row proportion in a category plus a binomial confidence interval.
    • medianq1q3: print the median, 25th, and 75th quantiles median (Q1, Q3)
    • q1q3: print the 25th and 75th quantiles Q1, Q3
    • iqr: print the inter-quartile range.
    • medianrange: print the median, minimum and maximum values median (minimum, maximum)
    • medianmad: print the median and median absolute deviation (mad)
    • Nevents: print number of events for a survival object within each grouping level
    • medSurv: print the median survival
    • NeventsSurv: print number of events and survival at given times
    • NriskSurv: print the number still at risk and survival at given times
    • Nrisk: print the number still at risk at given times
    • medTime: print the median follow-up time
    • sum
    • max
    • min
    • mean
    • sd
    • var
    • median
    • range
    • gmean, gsd, gmeansd, gmeanCI: geometric means, sds, and confidence intervals.

    Testing options

    The tests used to calculate p-values differ by the variable type, but can be specified explicitly in the formula statement or in the control function.

    The following tests are accepted:

    • anova: analysis of variance test; the default test for continuous variables. When the grouping variable has two levels, it is equivalent to the two-sample t-test with equal variance.

    • kwt: Kruskal-Wallis test, optional test for continuous variables. When the grouping variable has two levels, it is equivalent to the Wilcoxon Rank Sum test.

    • wt: An explicit Wilcoxcon test.

    • medtest: Median test test, optional test for continuous variables.

    • chisq: chi-square goodness of fit test for equal counts of a categorical variable across categories; the default for categorical or factor variables

    • fe: Fisher’s exact test for categorical variables; optional

    • logrank: log-rank test, the default test for time-to-event variables

    • trend: The independence_test function from the coin is used to test for trends. Whenthe grouping variable has two levels, it is equivalent to the Armitage trend test. This is the default for ordered factors

    • notest: Don’t perform a test.

    tableby.control settings

    A quick way to see what arguments are possible to utilize in a function is to use the args() command. Settings involving the number of digits can be set in tableby.control or in summary.tableby.

    args(tableby.control)
    ## function (test = TRUE, total = TRUE, total.pos = c("after", "before"), 
    ##     test.pname = NULL, numeric.simplify = FALSE, cat.simplify = FALSE, 
    ##     cat.droplevels = FALSE, ordered.simplify = FALSE, date.simplify = FALSE, 
    ##     numeric.test = "anova", cat.test = "chisq", ordered.test = "trend", 
    ##     surv.test = "logrank", date.test = "kwt", selectall.test = "notest", 
    ##     test.always = FALSE, numeric.stats = c("Nmiss", "meansd", 
    ##         "range"), cat.stats = c("Nmiss", "countpct"), ordered.stats = c("Nmiss", 
    ##         "countpct"), surv.stats = c("Nmiss", "Nevents", "medSurv"), 
    ##     date.stats = c("Nmiss", "median", "range"), selectall.stats = c("Nmiss", 
    ##         "countpct"), stats.labels = list(), digits = 3L, digits.count = 0L, 
    ##     digits.pct = 1L, digits.p = 3L, format.p = TRUE, digits.n = 0L, 
    ##     conf.level = 0.95, wilcox.correct = FALSE, wilcox.exact = NULL, 
    ##     chisq.correct = FALSE, simulate.p.value = FALSE, B = 2000, 
    ##     times = 1:5, ...) 
    ## NULL

    summary.tableby settings

    The summary.tableby function has options that modify how the table appears (such as adding a title or modifying labels).

    args(arsenal:::summary.tableby)
    ## function (object, ..., labelTranslations = NULL, text = FALSE, 
    ##     title = NULL, pfootnote = FALSE, term.name = "") 
    ## NULL
    arsenal/inst/doc/labels.html0000644000176200001440000012473414056514635015575 0ustar liggesusers A Few Notes on Labels

    A Few Notes on Labels

    Ethan Heinzen

    Introduction

    The arsenal package relies somewhat heavily on variable labels to make output more “pretty”. A label here is understood to be a single character string with “pretty” text (i.e., not an “ugly” variable name). Three of the main arsenal function use labels in their summary() output. There are several ways to set these labels.

    We’ll use the mockstudy dataset for all examples here:

    library(arsenal)
    data(mockstudy)
    library(magrittr)
    ## Warning: package 'magrittr' was built under R version 4.0.2
    # for 'freqlist' examples
    tab.ex <- table(mockstudy[c("arm", "sex", "mdquality.s")], useNA="ifany")

    Examples

    Set labels in the function call

    The summary() method for tableby(), modelsum(), and freqlist() objects contains a labelTranslations = argument to specify labels in the function call. Note that the freqlist() function matches labels in order, whereas the other two match labels by name. The labels can be input as a list or a character vector.

    summary(freqlist(tab.ex),
            labelTranslations = c(arm = "Treatment Arm", sex = "Gender", mdquality.s = "LASA QOL"))
    Treatment Arm Gender LASA QOL Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Male 0 29 29 1.93 1.93
    1 214 243 14.28 16.21
    NA 34 277 2.27 18.48
    Female 0 12 289 0.80 19.28
    1 118 407 7.87 27.15
    NA 21 428 1.40 28.55
    F: FOLFOX Male 0 31 459 2.07 30.62
    1 285 744 19.01 49.63
    NA 95 839 6.34 55.97
    Female 0 21 860 1.40 57.37
    1 198 1058 13.21 70.58
    NA 61 1119 4.07 74.65
    G: IROX Male 0 17 1136 1.13 75.78
    1 187 1323 12.47 88.26
    NA 24 1347 1.60 89.86
    Female 0 14 1361 0.93 90.79
    1 121 1482 8.07 98.87
    NA 17 1499 1.13 100.00
    summary(tableby(arm ~ sex + age, data = mockstudy),
            labelTranslations = c(sex = "SEX", age = "Age, yrs"))
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    SEX 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    Age, yrs 0.614
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
       Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000
    summary(modelsum(bmi ~ age, adjust = ~sex, data = mockstudy),
            labelTranslations = list(sexFemale = "Female", age = "Age, yrs"))
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 26.793 0.766 < 0.001 0.004 33
    Age, yrs 0.012 0.012 0.348
    Female -0.718 0.291 0.014

    Modify labels after the fact

    Another option is to add labels after you have created the object. To do this, you can use the form labels(x) <- value or use the pipe-able version, set_labels().

    # the non-pipe version; somewhat clunky
    tmp <- freqlist(tab.ex)
    labels(tmp) <- c(arm = "Treatment Arm", sex = "Gender", mdquality.s = "LASA QOL")
    summary(tmp)
    Treatment Arm Gender LASA QOL Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Male 0 29 29 1.93 1.93
    1 214 243 14.28 16.21
    NA 34 277 2.27 18.48
    Female 0 12 289 0.80 19.28
    1 118 407 7.87 27.15
    NA 21 428 1.40 28.55
    F: FOLFOX Male 0 31 459 2.07 30.62
    1 285 744 19.01 49.63
    NA 95 839 6.34 55.97
    Female 0 21 860 1.40 57.37
    1 198 1058 13.21 70.58
    NA 61 1119 4.07 74.65
    G: IROX Male 0 17 1136 1.13 75.78
    1 187 1323 12.47 88.26
    NA 24 1347 1.60 89.86
    Female 0 14 1361 0.93 90.79
    1 121 1482 8.07 98.87
    NA 17 1499 1.13 100.00
    # piped--much cleaner
    mockstudy %>% 
      tableby(arm ~ sex + age, data = .) %>% 
      set_labels(c(sex = "SEX", age = "Age, yrs")) %>% 
      summary()
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    SEX 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    Age, yrs 0.614
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
       Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000
    mockstudy %>% 
      modelsum(bmi ~ age, adjust = ~ sex, data = .) %>% 
      set_labels(list(sexFemale = "Female", age = "Age, yrs")) %>% 
      summary()
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 26.793 0.766 < 0.001 0.004 33
    Age, yrs 0.012 0.012 0.348
    Female -0.718 0.291 0.014

    Add labels to a data.frame

    tableby() and modelsum() also allow you to have label attributes on the data. Note that by default these attributes usually get dropped upon subsetting, but tableby() and modelsum() use the keep.labels() function to retain them.

    mockstudy.lab <- keep.labels(mockstudy)
    class(mockstudy$age)

    [1] “integer”

    class(mockstudy.lab$age)

    [1] “keep_labels” “integer”

    To undo this, simply loosen.labels():

    class(loosen.labels(mockstudy.lab)$age)

    [1] “integer”

    You can set attributes one at a time in two ways:

    attr(mockstudy.lab$sex, "label") <- "Sex"
    labels(mockstudy.lab$age) <- "Age, yrs"

    …or all at once:

    labels(mockstudy.lab) <- list(sex = "Sex", age = "Age, yrs")
    summary(tableby(arm ~ sex + age, data = mockstudy.lab))
    A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
    Sex 0.190
       Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
       Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
    Age, yrs 0.614
       Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
       Range 27.000 - 88.000 19.000 - 88.000 26.000 - 85.000 19.000 - 88.000

    You can pipe this, too.

    mockstudy %>% 
      set_labels(list(sex = "SEX", age = "Age, yrs")) %>% 
      modelsum(bmi ~ age, adjust = ~ sex, data = .) %>% 
      summary()
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 26.793 0.766 < 0.001 0.004 33
    Age, yrs 0.012 0.012 0.348
    SEX Female -0.718 0.291 0.014

    To extract labels from a data.frame, simply use the labels() function:

    labels(mockstudy.lab)
    ## $case
    ## NULL
    ## 
    ## $age
    ## [1] "Age, yrs"
    ## 
    ## $arm
    ## [1] "Treatment Arm"
    ## 
    ## $sex
    ## [1] "Sex"
    ## 
    ## $race
    ## [1] "Race"
    ## 
    ## $fu.time
    ## NULL
    ## 
    ## $fu.stat
    ## NULL
    ## 
    ## $ps
    ## NULL
    ## 
    ## $hgb
    ## NULL
    ## 
    ## $bmi
    ## [1] "Body Mass Index (kg/m^2)"
    ## 
    ## $alk.phos
    ## NULL
    ## 
    ## $ast
    ## NULL
    ## 
    ## $mdquality.s
    ## NULL
    ## 
    ## $age.ord
    ## NULL

    When labels get long

    tableby() and modelsum() both support the wrapping of long labels. Consider the width= argument in the print() function:

    mockstudy %>% 
      set_labels(list(age = "This is a really long label for the arm variable")) %>% 
      tableby(sex ~ age, data = .) %>% 
      summary() %>% 
      print(width = 20)
    Male (N=916) Female (N=583) Total (N=1499) p value
    This is a really 0.048
    long label for the
    arm variable
       Mean (SD) 60.455 (11.369) 59.247 (11.722) 59.985 (11.519)
       Range 19.000 - 88.000 22.000 - 88.000 19.000 - 88.000
    arsenal/inst/doc/modelsum.R0000644000176200001440000004102014056514653015377 0ustar liggesusers## ---- echo=FALSE, message=FALSE, results='hide', warning=FALSE---------------- require(knitr) require(broom) require(MASS) require(pROC) require(rpart) opts_chunk$set(comment = NA, echo=TRUE, prompt=TRUE, collapse=TRUE) ## ---- load-data--------------------------------------------------------------- require(arsenal) data(mockstudy) # load data dim(mockstudy) # look at how many subjects and variables are in the dataset # help(mockstudy) # learn more about the dataset and variables str(mockstudy) # quick look at the data ## ----simple1------------------------------------------------------------------ tab1 <- modelsum(bmi ~ sex + age, data=mockstudy) ## ----simple-text-------------------------------------------------------------- summary(tab1, text=TRUE) ## ----simple-markdown, results='asis'------------------------------------------ summary(tab1) ## ----------------------------------------------------------------------------- as.data.frame(tab1) ## ----adjust, results="asis"--------------------------------------------------- tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~age + sex, data=mockstudy) summary(tab2) ## ----------------------------------------------------------------------------- fit <- lm(alk.phos ~ arm + age + sex, data=mockstudy) summary(fit) plot(fit) ## ----------------------------------------------------------------------------- require(MASS) boxcox(fit) ## ----------------------------------------------------------------------------- fit2 <- lm(log(alk.phos) ~ arm + age + sex, data=mockstudy) summary(fit2) plot(fit2) ## ----------------------------------------------------------------------------- require(splines) fit3 <- lm(log(alk.phos) ~ arm + ns(age, df=2) + sex, data=mockstudy) # test whether there is a difference between models stats::anova(fit2,fit3) # look at functional form of age termplot(fit3, term=2, se=T, rug=T) ## ----------------------------------------------------------------------------- tmp <- tidy(fit3) # coefficients, p-values class(tmp) tmp glance(fit3) ## ---- results="asis"---------------------------------------------------------- ms.logy <- modelsum(log(alk.phos) ~ arm + ps + hgb, data=mockstudy, adjust= ~age + sex, family=gaussian, gaussian.stats=c("estimate","CI.lower.estimate","CI.upper.estimate","p.value")) summary(ms.logy) ## ----------------------------------------------------------------------------- boxplot(age ~ mdquality.s, data=mockstudy, ylab=attr(mockstudy$age,'label'), xlab='mdquality.s') fit <- glm(mdquality.s ~ age + sex, data=mockstudy, family=binomial) summary(fit) # create Odd's ratio w/ confidence intervals tmp <- data.frame(summary(fit)$coef) tmp tmp$OR <- round(exp(tmp[,1]),2) tmp$lower.CI <- round(exp(tmp[,1] - 1.96* tmp[,2]),2) tmp$upper.CI <- round(exp(tmp[,1] + 1.96* tmp[,2]),2) names(tmp)[4] <- 'P-value' kable(tmp[,c('OR','lower.CI','upper.CI','P-value')]) # Assess the predictive ability of the model # code using the pROC package require(pROC) pred <- predict(fit, type='response') tmp <- pROC::roc(mockstudy$mdquality.s[!is.na(mockstudy$mdquality.s)]~ pred, plot=TRUE, percent=TRUE) tmp$auc ## ----------------------------------------------------------------------------- tidy(fit, exp=T, conf.int=T) # coefficients, p-values, conf.intervals glance(fit) # model summary statistics ## ---- results="asis"---------------------------------------------------------- summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial)) fitall <- modelsum(mdquality.s ~ age, data=mockstudy, family=binomial, binomial.stats=c("Nmiss2","OR","p.value")) summary(fitall) ## ----survival----------------------------------------------------------------- require(survival) # multivariable model with all 3 terms fit <- coxph(Surv(fu.time, fu.stat) ~ age + sex + arm, data=mockstudy) summary(fit) # check proportional hazards assumption fit.z <- cox.zph(fit) fit.z plot(fit.z[1], resid=FALSE) # makes for a cleaner picture in this case abline(h=coef(fit)[1], col='red') # check functional form for age using pspline (penalized spline) # results are returned for the linear and non-linear components fit2 <- coxph(Surv(fu.time, fu.stat) ~ pspline(age) + sex + arm, data=mockstudy) fit2 # plot smoothed age to visualize why significant termplot(fit2, se=T, terms=1) abline(h=0) # The c-statistic comes out in the summary of the fit summary(fit2)$concordance # It can also be calculated using the survConcordance function survConcordance(Surv(fu.time, fu.stat) ~ predict(fit2), data=mockstudy) ## ----------------------------------------------------------------------------- tidy(fit) # coefficients, p-values glance(fit) # model summary statistics ## ----results="asis"----------------------------------------------------------- ##Note: You must use quotes when specifying family="survival" ## family=survival will not work summary(modelsum(Surv(fu.time, fu.stat) ~ arm, adjust=~age + sex, data=mockstudy, family="survival")) ##Note: the pspline term is not working yet #summary(modelsum(Surv(fu.time, fu.stat) ~ arm, # adjust=~pspline(age) + sex, data=mockstudy, family='survival')) ## ----poisson------------------------------------------------------------------ require(rpart) ##just to get access to solder dataset data(solder) hist(solder$skips) fit <- glm(skips ~ Opening + Solder + Mask , data=solder, family=poisson) stats::anova(fit, test='Chi') summary(fit) ## ----------------------------------------------------------------------------- 1-pchisq(fit$deviance, fit$df.residual) ## ----------------------------------------------------------------------------- fit2 <- glm(skips ~ Opening + Solder + Mask, data=solder, family=quasipoisson) summary(fit2) ## ----------------------------------------------------------------------------- tidy(fit) # coefficients, p-values glance(fit) # model summary statistics ## ----results='asis'----------------------------------------------------------- summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="quasipoisson")) summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="poisson")) ## ----------------------------------------------------------------------------- # add .01 to the follow-up time (.01*1 day) in order to keep everyone in the analysis fit <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, data=mockstudy, family=poisson) summary(fit) 1-pchisq(fit$deviance, fit$df.residual) coef(coxph(Surv(fu.time,fu.stat) ~ age + sex + arm, data=mockstudy)) coef(fit)[-1] # results from the Poisson model can then be described as risk ratios (similar to the hazard ratio) exp(coef(fit)[-1]) # As before, we can model the dispersion which alters the standard error fit2 <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, data=mockstudy, family=quasipoisson) summary(fit2) ## ----------------------------------------------------------------------------- tidy(fit) ##coefficients, p-values glance(fit) ##model summary statistics ## ----results="asis", eval=TRUE------------------------------------------------ summary(modelsum(fu.stat ~ age, adjust=~offset(log(fu.time+.01))+ sex + arm, data=mockstudy, family=poisson)) ## ---- results='asis'---------------------------------------------------------- mycontrols <- modelsum.control(gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"), show.adjust=FALSE, show.intercept=FALSE) tab2 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy, control=mycontrols) summary(tab2) ## ---- results='asis'---------------------------------------------------------- tab3 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy, gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"), show.intercept=FALSE, show.adjust=FALSE) summary(tab3) ## ----check-labels------------------------------------------------------------- ## Look at one variable's label attr(mockstudy$age,'label') ## See all the variables with a label unlist(lapply(mockstudy,'attr','label')) ## or cbind(sapply(mockstudy,attr,'label')) ## ----add-label, results='asis'------------------------------------------------ attr(mockstudy$age,'label') <- 'Age, yrs' tab1 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy) summary(tab1) ## ---- results = 'asis'-------------------------------------------------------- labels(mockstudy) <- c(age = 'Age, yrs') tab1 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy) summary(tab1) ## ---- results='asis'---------------------------------------------------------- mylabels <- list(sexFemale = "Female", age ="Age, yrs") summary(tab1, labelTranslations = mylabels) ## ---- eval=TRUE--------------------------------------------------------------- labels(tab1) labels(tab1) <- c(sexFemale="Female", age="Baseline Age (yrs)") labels(tab1) ## ---- results='asis'---------------------------------------------------------- summary(tab1) ## ---- results='asis'---------------------------------------------------------- summary(modelsum(age~mdquality.s+sex, data=mockstudy), show.intercept=FALSE) ## ---- results='asis'---------------------------------------------------------- summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial), show.adjust=FALSE) ## ---- results='asis'---------------------------------------------------------- # create a vector specifying the variable names myvars <- names(mockstudy) # select the 8th through the 12th # paste them together, separated by the + sign RHS <- paste(myvars[8:12], collapse="+") RHS # create a formula using the as.formula function as.formula(paste('mdquality.s ~ ', RHS)) # use the formula in the modelsum function summary(modelsum(as.formula(paste('mdquality.s ~', RHS)), family=binomial, data=mockstudy)) ## ---- results='asis'---------------------------------------------------------- ## The formulize function does the paste and as.formula steps tmp <- formulize('mdquality.s',myvars[8:10]) tmp ## More complex formulas could also be written using formulize tmp2 <- formulize('mdquality.s',c('ps','hgb','sqrt(bmi)')) ## use the formula in the modelsum function summary(modelsum(tmp, data=mockstudy, family=binomial)) ## ----------------------------------------------------------------------------- newdata <- subset(mockstudy, subset=age>50 & arm=='F: FOLFOX', select = c(age,sex, bmi:alk.phos)) dim(mockstudy) table(mockstudy$arm) dim(newdata) names(newdata) ## ---- results='asis'---------------------------------------------------------- summary(modelsum(alk.phos ~ ., data=newdata)) ## ---- results='asis', eval=TRUE----------------------------------------------- summary(modelsum(log(alk.phos) ~ sex + ps + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy)) summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset = age>50 & bmi<24, data=mockstudy)) summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset=1:30, data=mockstudy)) ## ----------------------------------------------------------------------------- ## create a variable combining the levels of mdquality.s and sex with(mockstudy, table(interaction(mdquality.s,sex))) ## ---- results='asis'---------------------------------------------------------- summary(modelsum(age ~ interaction(mdquality.s,sex), data=mockstudy)) ## ---- results='asis'---------------------------------------------------------- summary(modelsum(arm=="F: FOLFOX" ~ I(age/10) + log(bmi) + mdquality.s, data=mockstudy, family=binomial)) ## ---- results='asis'---------------------------------------------------------- mytab <- modelsum(bmi ~ sex + alk.phos + age, data=mockstudy) mytab2 <- mytab[c('age','sex','alk.phos')] summary(mytab2) summary(mytab[c('age','sex')]) summary(mytab[c(3,1)]) ## ---- results="asis"---------------------------------------------------------- ## demographics tab1 <- modelsum(bmi ~ sex + age, data=mockstudy) ## lab data tab2 <- modelsum(mdquality.s ~ hgb + alk.phos, data=mockstudy, family=binomial) tab12 <- merge(tab1, tab2, all = TRUE) class(tab12) summary(tab12) ## ---- results='asis'---------------------------------------------------------- t1 <- modelsum(bmi ~ sex + age, data=mockstudy) summary(t1, title='Demographics') ## ----------------------------------------------------------------------------- ## look at how many missing values there are for each variable apply(is.na(mockstudy),2,sum) ## ---- results='asis'---------------------------------------------------------- ## Show how many subjects have each variable (non-missing) summary(modelsum(bmi ~ ast + age, data=mockstudy, control=modelsum.control(gaussian.stats=c("N","estimate")))) ## Always list the number of missing values summary(modelsum(bmi ~ ast + age, data=mockstudy, control=modelsum.control(gaussian.stats=c("Nmiss2","estimate")))) ## Only show the missing values if there are some (default) summary(modelsum(bmi ~ ast + age, data=mockstudy, control=modelsum.control(gaussian.stats=c("Nmiss","estimate")))) ## Don't show N at all summary(modelsum(bmi ~ ast + age, data=mockstudy, control=modelsum.control(gaussian.stats=c("estimate")))) ## ---- results='asis'---------------------------------------------------------- summary(modelsum(bmi ~ sex + age + fu.time, data=mockstudy), digits=4, digits.test=2) ## ----------------------------------------------------------------------------- mockstudy$agegp <- cut(mockstudy$age, breaks=c(18,50,60,70,90), right=FALSE) ## create weights based on agegp and sex distribution tab1 <- with(mockstudy,table(agegp, sex)) tab1 tab2 <- with(mockstudy, table(agegp, sex, arm)) gpwts <- rep(tab1, length(unique(mockstudy$arm)))/tab2 ## apply weights to subjects index <- with(mockstudy, cbind(as.numeric(agegp), as.numeric(sex), as.numeric(as.factor(arm)))) mockstudy$wts <- gpwts[index] ## show weights by treatment arm group tapply(mockstudy$wts,mockstudy$arm, summary) ## ----results='asis'----------------------------------------------------------- mockstudy$newvarA <- as.numeric(mockstudy$arm=='A: IFL') tab1 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), family=binomial) summary(tab1, title='No Case Weights used') suppressWarnings({ tab2 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), weights=wts, family=binomial) summary(tab2, title='Case Weights used') }) ## ----------------------------------------------------------------------------- summary(tab2, text=T) tmp <- as.data.frame(summary(tab2, text = TRUE)) tmp # write.csv(tmp, '/my/path/here/mymodel.csv') ## ----eval = FALSE------------------------------------------------------------- # ## write to an HTML document # write2html(tab2, "~/ibm/trash.html") # # ## write to a Word document # write2word(tab2, "~/ibm/trash.doc", title="My table in Word") ## ----eval=FALSE--------------------------------------------------------------- # # A standalone shiny app # library(shiny) # library(arsenal) # data(mockstudy) # # shinyApp( # ui = fluidPage(tableOutput("table")), # server = function(input, output) { # output$table <- renderTable({ # as.data.frame(summary(modelsum(age ~ sex, data = mockstudy), text = "html")) # }, sanitize.text.function = function(x) x) # } # ) ## ----eval=FALSE--------------------------------------------------------------- # summary(modelsum(age ~ sex, data = mockstudy), title="(\\#tab:mytableby) Caption here") ## ----results='asis'----------------------------------------------------------- summary(modelsum(list(age, hgb) ~ bmi + sex, adjust = ~ arm, data = mockstudy)) ## ----results='asis'----------------------------------------------------------- summary(modelsum(list(age, hgb) ~ bmi + sex, adjust = ~ arm, data = mockstudy), term.name = TRUE) ## ----results='asis'----------------------------------------------------------- summary(modelsum(list(age, hgb) ~ bmi + sex, strata = arm, data = mockstudy)) ## ----------------------------------------------------------------------------- adj.list <- list( Unadjusted = ~ 1, # can also specify NULL here "Adjusted for Arm" = ~ arm ) multi.adjust <- modelsum(list(age, bmi) ~ fu.time + ast, adjust = adj.list, data = mockstudy) summary(multi.adjust, adjustment.names = TRUE) summary(multi.adjust, adjustment.names = TRUE, show.intercept = FALSE, show.adjust = FALSE) ## ----------------------------------------------------------------------------- args(modelsum.control) ## ----------------------------------------------------------------------------- args(arsenal:::summary.modelsum) arsenal/inst/doc/write2.R0000644000176200001440000002100414056514664014770 0ustar liggesusers## ----include = FALSE------------------------------------------------------------------------------ knitr::opts_chunk$set(eval = FALSE, message = FALSE) ## ------------------------------------------------------------------------------------------------- # library(arsenal) # library(magrittr) # data(mockstudy) # tmpdir <- tempdir() ## ------------------------------------------------------------------------------------------------- # mylabels <- list(sex = "SEX", age ="Age, yrs") # tab1 <- tableby(arm ~ sex + age, data=mockstudy) # # write2html( # tab1, paste0(tmpdir, "/test.tableby.html"), quiet = TRUE, # title = "My test table", # passed to summary.tableby # labelTranslations = mylabels, # passed to summary.tableby # total = FALSE # passed to summary.tableby # ) ## ------------------------------------------------------------------------------------------------- # tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~ age + sex, family = "gaussian", data = mockstudy) # # write2pdf( # tab2, paste0(tmpdir, "/test.modelsum.pdf"), quiet = TRUE, # title = "My test table", # passed to summary.modelsum # show.intercept = FALSE, # passed to summary.modelsum # digits = 5 # passed to summary.modelsum # ) ## ------------------------------------------------------------------------------------------------- # mockstudy[, c("arm", "sex", "mdquality.s")] %>% # table(useNA = "ifany") %>% # freqlist(groupBy = c("arm", "sex")) %>% # write2word( # paste0(tmpdir, "/test.freqlist.doc"), quiet = TRUE, # single = FALSE, # passed to summary.freqlist # title = "My cool title" # passed to summary.freqlist # ) ## ------------------------------------------------------------------------------------------------- # mockstudy %>% # head() %>% # knitr::kable() %>% # write2html(paste0(tmpdir, "/test.kable.html"), quiet = TRUE) ## ------------------------------------------------------------------------------------------------- # mockstudy %>% # head() %>% # xtable::xtable(caption = "My xtable") %>% # write2pdf( # paste0(tmpdir, "/test.xtable.pdf"), quiet = TRUE, # comment = FALSE, # passed to print.xtable to turn off the default message about xtable version # include.rownames = FALSE, # passed to print.xtable # caption.placement = "top" # passed to print.xtable # ) ## ------------------------------------------------------------------------------------------------- # mockstudy %>% # head() %>% # xtable::xtable(caption = "My xtable") %>% # write2html( # paste0(tmpdir, "/test.xtable.html"), quiet = TRUE, # type = "html", # passed to print.xtable # comment = FALSE, # passed to print.xtable to turn off the default message about xtable version # include.rownames = FALSE, # passed to print.xtable # caption.placement = "top" # passed to print.xtable # ) ## ------------------------------------------------------------------------------------------------- # write2word(pander::pander_return(head(mockstudy)), file = paste0(tmpdir, "/test.pander.doc"), quiet = TRUE) ## ------------------------------------------------------------------------------------------------- # mylist <- list( # tableby(sex ~ age, data = mockstudy), # freqlist(table(mockstudy[, c("sex", "arm")])), # knitr::kable(head(mockstudy)) # ) # # write2pdf(mylist, paste0(tmpdir, "/test.mylist.pdf"), quiet = TRUE) # ## ------------------------------------------------------------------------------------------------- # mylist2 <- list( # "# Header 1", # "This is a small paragraph introducing tableby.", # tableby(sex ~ age, data = mockstudy), # "
    ", # "# Header 2", # "I can change color of my text!" # ) # write2html(mylist2, paste0(tmpdir, "/test.mylist2.html"), quiet = TRUE) ## ------------------------------------------------------------------------------------------------- # write2pdf(list(mylist2, mylist), paste0(tmpdir, "/test.mylists.pdf"), quiet = TRUE) ## ------------------------------------------------------------------------------------------------- # lm(age ~ sex, data = mockstudy) %>% # summary() %>% # write2pdf(paste0(tmpdir, "/test.lm.pdf"), quiet = TRUE) ## ------------------------------------------------------------------------------------------------- # tab4 <- tableby(arm ~ sex + age, data=mockstudy) # write2html(verbatim(tab4), paste0(tmpdir, "/test.print.tableby.html"), quiet = TRUE) ## ------------------------------------------------------------------------------------------------- # chr <- paste0("MyVector", 1:10) # write2pdf(verbatim(chr), paste0(tmpdir, "/test.character.pdf"), quiet = TRUE) ## ------------------------------------------------------------------------------------------------- # write2pdf(verbatim(tab4, chr), paste0(tmpdir, "/test.verbatim.pdf"), quiet = TRUE) ## ------------------------------------------------------------------------------------------------- # mylist3 <- list( # yaml(title = "Test YAML Title", author = "My cool author name"), # "# Header 1", # "This is a small paragraph introducing tableby.", # tableby(sex ~ age, data = mockstudy) # ) # write2html(mylist3, paste0(tmpdir, "/test.yaml.html"), quiet = TRUE) ## ------------------------------------------------------------------------------------------------- # mylist4 <- list( # "# Header 1", # "This is a small paragraph introducing tableby.", # yaml(title = "Test YAML Title"), # tableby(sex ~ age, data = mockstudy), # yaml(author = "My cool author name") # ) # write2html(mylist4, paste0(tmpdir, "/test.yaml2.html"), quiet = TRUE) ## ------------------------------------------------------------------------------------------------- # mylist5 <- list( # "# What is 1 + 2?", # code.chunk(a <- 1, b <- 2), # code.chunk(a + b, chunk.opts = "r echo=FALSE, eval=TRUE") # ) # write2html(mylist5, paste0(tmpdir, "/test.code.chunk.html"), quiet = TRUE) ## ------------------------------------------------------------------------------------------------- # write2html( # knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.quiet.html"), # quiet = TRUE # passed to rmarkdown::render # ) ## ------------------------------------------------------------------------------------------------- # write2html( # knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.keep.rmd.html"), # quiet = TRUE, # passed to rmarkdown::render # keep.rmd = TRUE # ) ## ------------------------------------------------------------------------------------------------- # write2html( # knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.dont.render.html"), # render. = FALSE # ) ## ------------------------------------------------------------------------------------------------- # mylist2 <- list( # "# Header 1", # "This is a small paragraph introducing tableby.", # tableby(sex ~ age, data = mockstudy), # "
    ", # "# Header 2", # "I can change color of my text!" # ) # write2html(mylist2, paste0(tmpdir, "/test.mylist2.html"), quiet = TRUE) ## ------------------------------------------------------------------------------------------------- # write2html( # knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.theme.html"), # quiet = TRUE, # passed to rmarkdown::render # theme = "yeti" # passed to rmarkdown::html_document # ) ## ------------------------------------------------------------------------------------------------- # write2( # knitr::kable(head(mockstudy[, 1:4])), paste0(tmpdir, "/test.kable.rtf"), # quiet = TRUE, # passed to rmarkdown::render # output_format = rmarkdown::rtf_document # ) ## ------------------------------------------------------------------------------------------------- # mylist5 <- list( # yaml("header-includes" = list("\\usepackage[labelformat=empty]{caption}")), # "# Header 1", # "This is a small paragraph introducing tableby.", # tableby(sex ~ age, data = mockstudy) # ) # write2pdf(mylist5, paste0(tmpdir, "/test.noprefixes.pdf"), title = "My tableby") ## ------------------------------------------------------------------------------------------------- # mylist6 <- list( # summary(tableby(sex ~ age, data = mockstudy), title = "A Title for tableby"), # summary(modelsum(age ~ sex, data = mockstudy), title = "A Title for modelsum"), # summary(freqlist(~ sex, data = mockstudy), title = "A Title for freqlist") # ) # write2pdf(mylist6, paste0(tmpdir, "/test.multiple.titles.pdf")) arsenal/inst/doc/freqlist.html0000644000176200001440000035767414056514633016175 0ustar liggesusers The freqlist function

    The freqlist function

    Tina Gunderson and Ethan Heinzen

    Overview

    freqlist() is a function meant to produce output similar to SAS’s PROC FREQ procedure when using the /list option of the TABLE statement. freqlist() provides options for handling missing or sparse data and can provide cumulative counts and percentages based on subgroups. It depends on the knitr package for printing.

    require(arsenal)

    Sample dataset

    For our examples, we’ll load the mockstudy data included with this package and use it to create a basic table. Because they have fewer levels, for brevity, we’ll use the variables arm, sex, and mdquality.s to create the example table. We’ll retain NAs in the table creation. See the appendix for notes regarding default NA handling and other useful information regarding tables in R.

    # load the data
    data(mockstudy)
    
    # retain NAs when creating the table using the useNA argument
    tab.ex <- table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA = "ifany")

    The freqlist object

    The freqlist() function is an S3 generic (with methods for tables and formulas) which returns an object of class "freqlist".

    example1 <- freqlist(tab.ex)
    
    str(example1)
    List of 3
     $ Call   : language freqlist.table(object = tab.ex)
     $ control:List of 5
      ..$ sparse      : logi FALSE
      ..$ single      : logi FALSE
      ..$ dupLabels   : logi FALSE
      ..$ digits.count: int 0
      ..$ digits.pct  : int 2
     $ tables :List of 1
      ..$ :List of 6
      .. ..$ y         :List of 2
      .. .. ..$ term : chr ""
      .. .. ..$ label: chr ""
      .. ..$ strata    :List of 4
      .. .. ..$ term     : chr ""
      .. .. ..$ values   : chr ""
      .. .. ..$ label    : chr ""
      .. .. ..$ hasStrata: logi FALSE
      .. ..$ x         :List of 7
      .. .. ..$ arm        :List of 3
      .. .. .. ..$ variable: chr "arm"
      .. .. .. ..$ label   : chr "arm"
      .. .. .. ..$ term    : chr "arm"
      .. .. ..$ sex        :List of 3
      .. .. .. ..$ variable: chr "sex"
      .. .. .. ..$ label   : chr "sex"
      .. .. .. ..$ term    : chr "sex"
      .. .. ..$ mdquality.s:List of 3
      .. .. .. ..$ variable: chr "mdquality.s"
      .. .. .. ..$ label   : chr "mdquality.s"
      .. .. .. ..$ term    : chr "mdquality.s"
      .. .. ..$ Freq       :List of 3
      .. .. .. ..$ variable: chr "Freq"
      .. .. .. ..$ label   : chr "Freq"
      .. .. .. ..$ term    : chr "Freq"
      .. .. ..$ cumFreq    :List of 3
      .. .. .. ..$ variable: chr "cumFreq"
      .. .. .. ..$ label   : chr "Cumulative Freq"
      .. .. .. ..$ term    : chr "cumFreq"
      .. .. ..$ freqPercent:List of 3
      .. .. .. ..$ variable: chr "freqPercent"
      .. .. .. ..$ label   : chr "Percent"
      .. .. .. ..$ term    : chr "freqPercent"
      .. .. ..$ cumPercent :List of 3
      .. .. .. ..$ variable: chr "cumPercent"
      .. .. .. ..$ label   : chr "Cumulative Percent"
      .. .. .. ..$ term    : chr "cumPercent"
      .. ..$ tables    :List of 1
      .. .. ..$ :'data.frame':  18 obs. of  7 variables:
      .. .. .. ..$ arm        : Factor w/ 3 levels "A: IFL","F: FOLFOX",..: 1 1 1 1 1 1 2 2 2 2 ...
      .. .. .. ..$ sex        : Factor w/ 2 levels "Male","Female": 1 1 1 2 2 2 1 1 1 2 ...
      .. .. .. ..$ mdquality.s: Factor w/ 2 levels "0","1": 1 2 NA 1 2 NA 1 2 NA 1 ...
      .. .. .. ..$ Freq       : int [1:18] 29 214 34 12 118 21 31 285 95 21 ...
      .. .. .. ..$ cumFreq    : int [1:18] 29 243 277 289 407 428 459 744 839 860 ...
      .. .. .. ..$ freqPercent: num [1:18] 1.935 14.276 2.268 0.801 7.872 ...
      .. .. .. ..$ cumPercent : num [1:18] 1.93 16.21 18.48 19.28 27.15 ...
      .. ..$ hasWeights: logi FALSE
      .. ..$ na.options: chr "include"
     - attr(*, "class")= chr [1:2] "freqlist" "arsenal_table"
    # view the data frame portion of freqlist output
    head(as.data.frame(example1))  ## or use as.data.frame(example1)
         arm    sex mdquality.s Freq cumFreq freqPercent cumPercent
    1 A: IFL   Male           0   29      29   1.9346231   1.934623
    2 A: IFL   Male           1  214     243  14.2761841  16.210807
    3 A: IFL   Male        <NA>   34     277   2.2681788  18.478986
    4 A: IFL Female           0   12     289   0.8005337  19.279520
    5 A: IFL Female           1  118     407   7.8719146  27.151434
    6 A: IFL Female        <NA>   21     428   1.4009340  28.552368

    Basic output using summary()

    The summary method for freqlist() relies on the kable() function (in the knitr package) for printing. knitr::kable() converts the output to markdown which can be printed in the console or easily rendered in Word, PDF, or HTML documents.

    Note that you must supply results="asis" to properly format the markdown output.

    summary(example1)
    arm sex mdquality.s Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Male 0 29 29 1.93 1.93
    1 214 243 14.28 16.21
    NA 34 277 2.27 18.48
    Female 0 12 289 0.80 19.28
    1 118 407 7.87 27.15
    NA 21 428 1.40 28.55
    F: FOLFOX Male 0 31 459 2.07 30.62
    1 285 744 19.01 49.63
    NA 95 839 6.34 55.97
    Female 0 21 860 1.40 57.37
    1 198 1058 13.21 70.58
    NA 61 1119 4.07 74.65
    G: IROX Male 0 17 1136 1.13 75.78
    1 187 1323 12.47 88.26
    NA 24 1347 1.60 89.86
    Female 0 14 1361 0.93 90.79
    1 121 1482 8.07 98.87
    NA 17 1499 1.13 100.00

    You can print a title for the table using the title= argument.

    summary(example1, title = "Basic freqlist output")
    Basic freqlist output
    arm sex mdquality.s Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Male 0 29 29 1.93 1.93
    1 214 243 14.28 16.21
    NA 34 277 2.27 18.48
    Female 0 12 289 0.80 19.28
    1 118 407 7.87 27.15
    NA 21 428 1.40 28.55
    F: FOLFOX Male 0 31 459 2.07 30.62
    1 285 744 19.01 49.63
    NA 95 839 6.34 55.97
    Female 0 21 860 1.40 57.37
    1 198 1058 13.21 70.58
    NA 61 1119 4.07 74.65
    G: IROX Male 0 17 1136 1.13 75.78
    1 187 1323 12.47 88.26
    NA 24 1347 1.60 89.86
    Female 0 14 1361 0.93 90.79
    1 121 1482 8.07 98.87
    NA 17 1499 1.13 100.00

    You can also easily pull out the freqlist data frame for more complicated formatting or manipulation (e.g. with another function such as xtable() or pander()) using as.data.frame(summary()):

    head(as.data.frame(summary(example1)))
         arm    sex mdquality.s Freq Cumulative Freq Percent Cumulative Percent
    1 A: IFL   Male           0   29              29    1.93               1.93
    2                         1  214             243   14.28              16.21
    3                      <NA>   34             277    2.27              18.48
    4        Female           0   12             289    0.80              19.28
    5                         1  118             407    7.87              27.15
    6                      <NA>   21             428    1.40              28.55

    Using a formula with freqlist

    Instead of passing a pre-computed table to freqlist(), you can instead pass a formula, which will be in turn passed to the xtabs() function. Additional freqlist() arguments are passed through the ... to the freqlist() table method.

    Note that freqlist() sets the addNA=TRUE argument by default:

    summary(freqlist(~arm + sex + mdquality.s, data = mockstudy))
    Treatment Arm sex mdquality.s Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Male 0 29 29 1.93 1.93
    1 214 243 14.28 16.21
    NA 34 277 2.27 18.48
    Female 0 12 289 0.80 19.28
    1 118 407 7.87 27.15
    NA 21 428 1.40 28.55
    F: FOLFOX Male 0 31 459 2.07 30.62
    1 285 744 19.01 49.63
    NA 95 839 6.34 55.97
    Female 0 21 860 1.40 57.37
    1 198 1058 13.21 70.58
    NA 61 1119 4.07 74.65
    G: IROX Male 0 17 1136 1.13 75.78
    1 187 1323 12.47 88.26
    NA 24 1347 1.60 89.86
    Female 0 14 1361 0.93 90.79
    1 121 1482 8.07 98.87
    NA 17 1499 1.13 100.00

    One can also set NAs to an explicit value using includeNA().

    summary(freqlist(~arm + sex + includeNA(mdquality.s, "Missing"), data = mockstudy))
    Treatment Arm sex includeNA(mdquality.s, “Missing”) Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Male 0 29 29 1.93 1.93
    1 214 243 14.28 16.21
    Missing 34 277 2.27 18.48
    Female 0 12 289 0.80 19.28
    1 118 407 7.87 27.15
    Missing 21 428 1.40 28.55
    F: FOLFOX Male 0 31 459 2.07 30.62
    1 285 744 19.01 49.63
    Missing 95 839 6.34 55.97
    Female 0 21 860 1.40 57.37
    1 198 1058 13.21 70.58
    Missing 61 1119 4.07 74.65
    G: IROX Male 0 17 1136 1.13 75.78
    1 187 1323 12.47 88.26
    Missing 24 1347 1.60 89.86
    Female 0 14 1361 0.93 90.79
    1 121 1482 8.07 98.87
    Missing 17 1499 1.13 100.00

    In fact, since xtabs() allows for left-hand-side weights, so does freqlist()!

    mockstudy$weights <- c(10000, rep(1, nrow(mockstudy) - 1))
    summary(freqlist(weights ~ arm + sex + addNA(mdquality.s), data = mockstudy))
    Treatment Arm sex addNA(mdquality.s) Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Male 0 29 29 0.25 0.25
    1 214 243 1.86 2.11
    NA 34 277 0.30 2.41
    Female 0 12 289 0.10 2.51
    1 118 407 1.03 3.54
    NA 21 428 0.18 3.72
    F: FOLFOX Male 0 31 459 0.27 3.99
    1 285 744 2.48 6.47
    NA 10094 10838 87.79 94.26
    Female 0 21 10859 0.18 94.44
    1 198 11057 1.72 96.16
    NA 61 11118 0.53 96.70
    G: IROX Male 0 17 11135 0.15 96.84
    1 187 11322 1.63 98.47
    NA 24 11346 0.21 98.68
    Female 0 14 11360 0.12 98.80
    1 121 11481 1.05 99.85
    NA 17 11498 0.15 100.00

    You can also specify multiple weights:

    mockstudy$weights2 <- c(rep(1, nrow(mockstudy) - 1), 10000)
    summary(freqlist(list(weights, weights2) ~ arm + sex + addNA(mdquality.s), data = mockstudy))
    Treatment Arm sex addNA(mdquality.s) Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Male 0 29 29 0.25 0.25
    1 214 243 1.86 2.11
    NA 34 277 0.30 2.41
    Female 0 12 289 0.10 2.51
    1 118 407 1.03 3.54
    NA 21 428 0.18 3.72
    F: FOLFOX Male 0 31 459 0.27 3.99
    1 285 744 2.48 6.47
    NA 10094 10838 87.79 94.26
    Female 0 21 10859 0.18 94.44
    1 198 11057 1.72 96.16
    NA 61 11118 0.53 96.70
    G: IROX Male 0 17 11135 0.15 96.84
    1 187 11322 1.63 98.47
    NA 24 11346 0.21 98.68
    Female 0 14 11360 0.12 98.80
    1 121 11481 1.05 99.85
    NA 17 11498 0.15 100.00
    Treatment Arm sex addNA(mdquality.s) Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Male 0 29 29 0.25 0.25
    1 214 243 1.86 2.11
    NA 34 277 0.30 2.41
    Female 0 12 289 0.10 2.51
    1 118 407 1.03 3.54
    NA 21 428 0.18 3.72
    F: FOLFOX Male 0 31 459 0.27 3.99
    1 285 744 2.48 6.47
    NA 95 839 0.83 7.30
    Female 0 21 860 0.18 7.48
    1 198 1058 1.72 9.20
    NA 10060 11118 87.49 96.70
    G: IROX Male 0 17 11135 0.15 96.84
    1 187 11322 1.63 98.47
    NA 24 11346 0.21 98.68
    Female 0 14 11360 0.12 98.80
    1 121 11481 1.05 99.85
    NA 17 11498 0.15 100.00

    Rounding percentage digits or changing variable names for printing

    The digits.pct= argument takes a single numeric value and controls the number of digits of percentages in the output. The digits.count= argument takes a similar argument and controls the number of digits of the count columns. The labelTranslations= argument is a named character vector or list. Both options are applied in the following example.

    example2 <- freqlist(tab.ex, labelTranslations = c(arm = "Treatment Arm", sex = "Gender", 
        mdquality.s = "LASA QOL"), digits.pct = 1, digits.count = 1)
    summary(example2)
    Treatment Arm Gender LASA QOL Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Male 0 29.0 29.0 1.9 1.9
    1 214.0 243.0 14.3 16.2
    NA 34.0 277.0 2.3 18.5
    Female 0 12.0 289.0 0.8 19.3
    1 118.0 407.0 7.9 27.2
    NA 21.0 428.0 1.4 28.6
    F: FOLFOX Male 0 31.0 459.0 2.1 30.6
    1 285.0 744.0 19.0 49.6
    NA 95.0 839.0 6.3 56.0
    Female 0 21.0 860.0 1.4 57.4
    1 198.0 1058.0 13.2 70.6
    NA 61.0 1119.0 4.1 74.6
    G: IROX Male 0 17.0 1136.0 1.1 75.8
    1 187.0 1323.0 12.5 88.3
    NA 24.0 1347.0 1.6 89.9
    Female 0 14.0 1361.0 0.9 90.8
    1 121.0 1482.0 8.1 98.9
    NA 17.0 1499.0 1.1 100.0

    Additional examples

    Including combinations with frequencies of zero

    The sparse= argument takes a single logical value as input. The default option is FALSE. If set to TRUE, the sparse option will include combinations with frequencies of zero in the list of results. As our initial table did not have any such levels, we create a second table to use in our example.

    summary(freqlist(~race + sex + arm, data = mockstudy, sparse = TRUE, digits.pct = 1))
    Race sex Treatment Arm Freq Cumulative Freq Percent Cumulative Percent
    African-Am Male A: IFL 25 25 1.7 1.7
    F: FOLFOX 24 49 1.6 3.3
    G: IROX 16 65 1.1 4.3
    Female A: IFL 14 79 0.9 5.3
    F: FOLFOX 25 104 1.7 6.9
    G: IROX 11 115 0.7 7.7
    Asian Male A: IFL 0 115 0.0 7.7
    F: FOLFOX 10 125 0.7 8.3
    G: IROX 1 126 0.1 8.4
    Female A: IFL 1 127 0.1 8.5
    F: FOLFOX 4 131 0.3 8.7
    G: IROX 2 133 0.1 8.9
    Caucasian Male A: IFL 240 373 16.0 24.9
    F: FOLFOX 352 725 23.5 48.4
    G: IROX 195 920 13.0 61.4
    Female A: IFL 131 1051 8.7 70.1
    F: FOLFOX 234 1285 15.6 85.7
    G: IROX 136 1421 9.1 94.8
    Hawaii/Pacific Male A: IFL 1 1422 0.1 94.9
    F: FOLFOX 1 1423 0.1 94.9
    G: IROX 0 1423 0.0 94.9
    Female A: IFL 0 1423 0.0 94.9
    F: FOLFOX 2 1425 0.1 95.1
    G: IROX 1 1426 0.1 95.1
    Hispanic Male A: IFL 8 1434 0.5 95.7
    F: FOLFOX 17 1451 1.1 96.8
    G: IROX 12 1463 0.8 97.6
    Female A: IFL 4 1467 0.3 97.9
    F: FOLFOX 11 1478 0.7 98.6
    G: IROX 2 1480 0.1 98.7
    Native-Am/Alaska Male A: IFL 1 1481 0.1 98.8
    F: FOLFOX 0 1481 0.0 98.8
    G: IROX 2 1483 0.1 98.9
    Female A: IFL 1 1484 0.1 99.0
    F: FOLFOX 1 1485 0.1 99.1
    G: IROX 0 1485 0.0 99.1
    Other Male A: IFL 2 1487 0.1 99.2
    F: FOLFOX 2 1489 0.1 99.3
    G: IROX 1 1490 0.1 99.4
    Female A: IFL 0 1490 0.0 99.4
    F: FOLFOX 2 1492 0.1 99.5
    G: IROX 0 1492 0.0 99.5
    NA Male A: IFL 0 1492 0.0 99.5
    F: FOLFOX 5 1497 0.3 99.9
    G: IROX 1 1498 0.1 99.9
    Female A: IFL 0 1498 0.0 99.9
    F: FOLFOX 1 1499 0.1 100.0
    G: IROX 0 1499 0.0 100.0

    Options for NA handling

    The various na.options= allow you to include or exclude data with missing values for one or more factor levels in the counts and percentages, as well as show the missing data but exclude it from the cumulative counts and percentages. The default option is to include all combinations with missing values.

    summary(freqlist(tab.ex, na.options = "include"))
    arm sex mdquality.s Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Male 0 29 29 1.93 1.93
    1 214 243 14.28 16.21
    NA 34 277 2.27 18.48
    Female 0 12 289 0.80 19.28
    1 118 407 7.87 27.15
    NA 21 428 1.40 28.55
    F: FOLFOX Male 0 31 459 2.07 30.62
    1 285 744 19.01 49.63
    NA 95 839 6.34 55.97
    Female 0 21 860 1.40 57.37
    1 198 1058 13.21 70.58
    NA 61 1119 4.07 74.65
    G: IROX Male 0 17 1136 1.13 75.78
    1 187 1323 12.47 88.26
    NA 24 1347 1.60 89.86
    Female 0 14 1361 0.93 90.79
    1 121 1482 8.07 98.87
    NA 17 1499 1.13 100.00
    summary(freqlist(tab.ex, na.options = "showexclude"))
    arm sex mdquality.s Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Male 0 29 29 2.33 2.33
    1 214 243 17.16 19.49
    NA 34 NA NA NA
    Female 0 12 255 0.96 20.45
    1 118 373 9.46 29.91
    NA 21 NA NA NA
    F: FOLFOX Male 0 31 404 2.49 32.40
    1 285 689 22.85 55.25
    NA 95 NA NA NA
    Female 0 21 710 1.68 56.94
    1 198 908 15.88 72.81
    NA 61 NA NA NA
    G: IROX Male 0 17 925 1.36 74.18
    1 187 1112 15.00 89.17
    NA 24 NA NA NA
    Female 0 14 1126 1.12 90.30
    1 121 1247 9.70 100.00
    NA 17 NA NA NA
    summary(freqlist(tab.ex, na.options = "remove"))
    arm sex mdquality.s Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Male 0 29 29 2.33 2.33
    1 214 243 17.16 19.49
    Female 0 12 255 0.96 20.45
    1 118 373 9.46 29.91
    F: FOLFOX Male 0 31 404 2.49 32.40
    1 285 689 22.85 55.25
    Female 0 21 710 1.68 56.94
    1 198 908 15.88 72.81
    G: IROX Male 0 17 925 1.36 74.18
    1 187 1112 15.00 89.17
    Female 0 14 1126 1.12 90.30
    1 121 1247 9.70 100.00

    Frequency counts and percentages subset by factor levels

    The strata= argument internally subsets the data by the specified factor prior to calculating cumulative counts and percentages. By default, when used each subset will print in a separate table. Using the single = TRUE option when printing will collapse the subsetted result into a single table.

    example3 <- freqlist(tab.ex, strata = c("arm", "sex"))
    summary(example3)
    arm sex mdquality.s Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Male 0 29 29 10.47 10.47
    1 214 243 77.26 87.73
    NA 34 277 12.27 100.00
    arm sex mdquality.s Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Female 0 12 12 7.95 7.95
    1 118 130 78.15 86.09
    NA 21 151 13.91 100.00
    arm sex mdquality.s Freq Cumulative Freq Percent Cumulative Percent
    F: FOLFOX Male 0 31 31 7.54 7.54
    1 285 316 69.34 76.89
    NA 95 411 23.11 100.00
    arm sex mdquality.s Freq Cumulative Freq Percent Cumulative Percent
    F: FOLFOX Female 0 21 21 7.50 7.50
    1 198 219 70.71 78.21
    NA 61 280 21.79 100.00
    arm sex mdquality.s Freq Cumulative Freq Percent Cumulative Percent
    G: IROX Male 0 17 17 7.46 7.46
    1 187 204 82.02 89.47
    NA 24 228 10.53 100.00
    arm sex mdquality.s Freq Cumulative Freq Percent Cumulative Percent
    G: IROX Female 0 14 14 9.21 9.21
    1 121 135 79.61 88.82
    NA 17 152 11.18 100.00
    # using the single = TRUE argument will collapse results into a single table for
    # printing
    summary(example3, single = TRUE)
    arm sex mdquality.s Freq Cumulative Freq Percent Cumulative Percent
    A: IFL Male 0 29 29 10.47 10.47
    1 214 243 77.26 87.73
    NA 34 277 12.27 100.00
    Female 0 12 12 7.95 7.95
    1 118 130 78.15 86.09
    NA 21 151 13.91 100.00
    F: FOLFOX Male 0 31 31 7.54 7.54
    1 285 316 69.34 76.89
    NA 95 411 23.11 100.00
    Female 0 21 21 7.50 7.50
    1 198 219 70.71 78.21
    NA 61 280 21.79 100.00
    G: IROX Male 0 17 17 7.46 7.46
    1 187 204 82.02 89.47
    NA 24 228 10.53 100.00
    Female 0 14 14 9.21 9.21
    1 121 135 79.61 88.82
    NA 17 152 11.18 100.00

    Show only the “n” most common combinations in each table (head() and sort())

    You can now sort freqlist() objects, and, by taking the head() of the summary, output the most common frequencies. This looks the prettiest with dupLabels=TRUE.

    head(summary(sort(example1, decreasing = TRUE), dupLabels = TRUE))
    
    
    |arm       |sex    |mdquality.s | Freq| Cumulative Freq| Percent| Cumulative Percent|
    |:---------|:------|:-----------|----:|---------------:|-------:|------------------:|
    |F: FOLFOX |Male   |1           |  285|             285|   19.01|              19.01|
    |A: IFL    |Male   |1           |  214|             499|   14.28|              33.29|
    |F: FOLFOX |Female |1           |  198|             697|   13.21|              46.50|
    |G: IROX   |Male   |1           |  187|             884|   12.47|              58.97|
    |G: IROX   |Female |1           |  121|            1005|    8.07|              67.04|
    |A: IFL    |Female |1           |  118|            1123|    7.87|              74.92|

    Change labels on the fly

    labs <- c(arm = "Arm", sex = "Sex", mdquality.s = "QOL", freqPercent = "%")
    labels(example1) <- labs
    summary(example1)
    Arm Sex QOL Freq Cumulative Freq % Cumulative Percent
    A: IFL Male 0 29 29 1.93 1.93
    1 214 243 14.28 16.21
    NA 34 277 2.27 18.48
    Female 0 12 289 0.80 19.28
    1 118 407 7.87 27.15
    NA 21 428 1.40 28.55
    F: FOLFOX Male 0 31 459 2.07 30.62
    1 285 744 19.01 49.63
    NA 95 839 6.34 55.97
    Female 0 21 860 1.40 57.37
    1 198 1058 13.21 70.58
    NA 61 1119 4.07 74.65
    G: IROX Male 0 17 1136 1.13 75.78
    1 187 1323 12.47 88.26
    NA 24 1347 1.60 89.86
    Female 0 14 1361 0.93 90.79
    1 121 1482 8.07 98.87
    NA 17 1499 1.13 100.00

    You can also supply labelTranslations= to summary().

    summary(example1, labelTranslations = labs)
    Arm Sex QOL Freq Cumulative Freq % Cumulative Percent
    A: IFL Male 0 29 29 1.93 1.93
    1 214 243 14.28 16.21
    NA 34 277 2.27 18.48
    Female 0 12 289 0.80 19.28
    1 118 407 7.87 27.15
    NA 21 428 1.40 28.55
    F: FOLFOX Male 0 31 459 2.07 30.62
    1 285 744 19.01 49.63
    NA 95 839 6.34 55.97
    Female 0 21 860 1.40 57.37
    1 198 1058 13.21 70.58
    NA 61 1119 4.07 74.65
    G: IROX Male 0 17 1136 1.13 75.78
    1 187 1323 12.47 88.26
    NA 24 1347 1.60 89.86
    Female 0 14 1361 0.93 90.79
    1 121 1482 8.07 98.87
    NA 17 1499 1.13 100.00

    Using xtable() to format and print freqlist() results

    Fair warning: xtable() has kind of a steep learning curve. These examples are given without explanation, for more advanced users.

    require(xtable)
    Loading required package: xtable
    # set up custom function for xtable text
    italic <- function(x) paste0("<i>", x, "</i>")
    
    xftbl <- xtable(as.data.frame(summary(example1)), caption = "xtable formatted output of freqlist data frame", 
        align = "|r|r|r|r|c|c|c|r|")
    
    # change the column names
    names(xftbl)[1:3] <- c("Arm", "Gender", "LASA QOL")
    
    print(xftbl, sanitize.colnames.function = italic, include.rownames = FALSE, type = "html", 
        comment = FALSE)
    xtable formatted output of freqlist data frame
    Arm Gender LASA QOL Freq Cumulative Freq % Cumulative Percent
    A: IFL Male 0 29 29 1.93 1.93
    1 214 243 14.28 16.21
    34 277 2.27 18.48
    Female 0 12 289 0.80 19.28
    1 118 407 7.87 27.15
    21 428 1.40 28.55
    F: FOLFOX Male 0 31 459 2.07 30.62
    1 285 744 19.01 49.63
    95 839 6.34 55.97
    Female 0 21 860 1.40 57.37
    1 198 1058 13.21 70.58
    61 1119 4.07 74.65
    G: IROX Male 0 17 1136 1.13 75.78
    1 187 1323 12.47 88.26
    24 1347 1.60 89.86
    Female 0 14 1361 0.93 90.79
    1 121 1482 8.07 98.87
    17 1499 1.13 100.00

    Use freqlist in bookdown

    Since the backbone of freqlist() is knitr::kable(), tables still render well in bookdown. However, print.summary.freqlist() doesn’t use the caption= argument of kable(), so some tables may not have a properly numbered caption. To fix this, use the method described on the bookdown site to give the table a tag/ID.

    summary(freqlist(~sex + age, data = mockstudy), title = "(\\#tab:mytableby) Caption here")

    Appendix: Notes regarding table options in R

    NAs

    There are several widely used options for basic tables in R. The table() function in base R is probably the most common; by default it excludes NA values. You can change NA handling in base::table() using the useNA= or exclude= arguments.

    # base table default removes NAs
    tab.d1 <- base::table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA = "ifany")
    tab.d1
    , , mdquality.s = 0
    
               sex
    arm         Male Female
      A: IFL      29     12
      F: FOLFOX   31     21
      G: IROX     17     14
    
    , , mdquality.s = 1
    
               sex
    arm         Male Female
      A: IFL     214    118
      F: FOLFOX  285    198
      G: IROX    187    121
    
    , , mdquality.s = NA
    
               sex
    arm         Male Female
      A: IFL      34     21
      F: FOLFOX   95     61
      G: IROX     24     17

    xtabs() is similar to table(), but uses a formula-based syntax. However, NAs must be explicitly added to each factor using the addNA() function or using the argument addNA = TRUE.

    # without specifying addNA
    tab.d2 <- xtabs(formula = ~arm + sex + mdquality.s, data = mockstudy)
    tab.d2
    , , mdquality.s = 0
    
               sex
    arm         Male Female
      A: IFL      29     12
      F: FOLFOX   31     21
      G: IROX     17     14
    
    , , mdquality.s = 1
    
               sex
    arm         Male Female
      A: IFL     214    118
      F: FOLFOX  285    198
      G: IROX    187    121
    # now with addNA
    tab.d3 <- xtabs(~arm + sex + addNA(mdquality.s), data = mockstudy)
    tab.d3
    , , addNA(mdquality.s) = 0
    
               sex
    arm         Male Female
      A: IFL      29     12
      F: FOLFOX   31     21
      G: IROX     17     14
    
    , , addNA(mdquality.s) = 1
    
               sex
    arm         Male Female
      A: IFL     214    118
      F: FOLFOX  285    198
      G: IROX    187    121
    
    , , addNA(mdquality.s) = NA
    
               sex
    arm         Male Female
      A: IFL      34     21
      F: FOLFOX   95     61
      G: IROX     24     17

    Since the formula method of freqlist() uses xtabs(), NAs should be treated in the same way. includeNA() can also be helpful here for setting explicit NA values.

    Table dimname names (dnn)

    Supplying a data.frame to the table() function without giving columns individually will create a contingency table using all variables in the data.frame.

    However, if the columns of a data.frame or matrix are supplied separately (i.e., as vectors), column names will not be preserved.

    # providing variables separately (as vectors) drops column names
    table(mockstudy$arm, mockstudy$sex, mockstudy$mdquality.s)
    , ,  = 0
    
               
                Male Female
      A: IFL      29     12
      F: FOLFOX   31     21
      G: IROX     17     14
    
    , ,  = 1
    
               
                Male Female
      A: IFL     214    118
      F: FOLFOX  285    198
      G: IROX    187    121

    If desired, you can use the dnn= argument to pass variable names.

    # add the column name labels back using dnn option in base::table
    table(mockstudy$arm, mockstudy$sex, mockstudy$mdquality.s, dnn = c("Arm", "Sex", 
        "QOL"))
    , , QOL = 0
    
               Sex
    Arm         Male Female
      A: IFL      29     12
      F: FOLFOX   31     21
      G: IROX     17     14
    
    , , QOL = 1
    
               Sex
    Arm         Male Female
      A: IFL     214    118
      F: FOLFOX  285    198
      G: IROX    187    121

    You can also name the arguments to table():

    table(Arm = mockstudy$arm, Sex = mockstudy$sex, QOL = mockstudy$mdquality.s)
    , , QOL = 0
    
               Sex
    Arm         Male Female
      A: IFL      29     12
      F: FOLFOX   31     21
      G: IROX     17     14
    
    , , QOL = 1
    
               Sex
    Arm         Male Female
      A: IFL     214    118
      F: FOLFOX  285    198
      G: IROX    187    121

    If using freqlist(), you can provide the labels directly to freqlist() or to summary() using labelTranslations=.

    arsenal/inst/doc/freqlist.R0000644000176200001440000001177514056514633015417 0ustar liggesusers## ----setup, include=FALSE----------------------------------------------------- knitr::opts_chunk$set(echo = TRUE, tidy.opts=list(width.cutoff=80), tidy=TRUE, comment=NA) options(width=80, max.print=1000) ## ----message = FALSE---------------------------------------------------------- require(arsenal) ## ----loading.data------------------------------------------------------------- # load the data data(mockstudy) # retain NAs when creating the table using the useNA argument tab.ex <- table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA="ifany") ## ----console.output----------------------------------------------------------- example1 <- freqlist(tab.ex) str(example1) # view the data frame portion of freqlist output head(as.data.frame(example1)) ## or use as.data.frame(example1) ## ---- results = 'asis'-------------------------------------------------------- summary(example1) ## ---- results = 'asis'-------------------------------------------------------- summary(example1, title="Basic freqlist output") ## ----------------------------------------------------------------------------- head(as.data.frame(summary(example1))) ## ----results='asis'----------------------------------------------------------- summary(freqlist(~ arm + sex + mdquality.s, data = mockstudy)) ## ----results='asis'----------------------------------------------------------- summary(freqlist(~ arm + sex + includeNA(mdquality.s, "Missing"), data = mockstudy)) ## ----results='asis'----------------------------------------------------------- mockstudy$weights <- c(10000, rep(1, nrow(mockstudy) - 1)) summary(freqlist(weights ~ arm + sex + addNA(mdquality.s), data = mockstudy)) ## ----results='asis'----------------------------------------------------------- mockstudy$weights2 <- c(rep(1, nrow(mockstudy) - 1), 10000) summary(freqlist(list(weights, weights2) ~ arm + sex + addNA(mdquality.s), data = mockstudy)) ## ----labelTranslations, results = 'asis'-------------------------------------- example2 <- freqlist(tab.ex, labelTranslations = c(arm = "Treatment Arm", sex = "Gender", mdquality.s = "LASA QOL"), digits.pct = 1, digits.count = 1) summary(example2) ## ----sparse, results = 'asis'------------------------------------------------- summary(freqlist(~ race + sex + arm, data = mockstudy, sparse = TRUE, digits.pct=1)) ## ----na.options, results = 'asis'--------------------------------------------- summary(freqlist(tab.ex, na.options="include")) summary(freqlist(tab.ex, na.options="showexclude")) summary(freqlist(tab.ex, na.options="remove")) ## ----freq.counts, results='asis'---------------------------------------------- example3 <- freqlist(tab.ex, strata = c("arm","sex")) summary(example3) #using the single = TRUE argument will collapse results into a single table for printing summary(example3, single = TRUE) ## ----------------------------------------------------------------------------- head(summary(sort(example1, decreasing = TRUE), dupLabels = TRUE)) ## ----changelabs, results = 'asis'--------------------------------------------- labs <- c(arm = "Arm", sex = "Sex", mdquality.s = "QOL", freqPercent = "%") labels(example1) <- labs summary(example1) ## ---- results = 'asis'-------------------------------------------------------- summary(example1, labelTranslations = labs) ## ----results='asis'----------------------------------------------------------- require(xtable) # set up custom function for xtable text italic <- function(x) paste0('', x, '') xftbl <- xtable(as.data.frame(summary(example1)), caption = "xtable formatted output of freqlist data frame", align="|r|r|r|r|c|c|c|r|") # change the column names names(xftbl)[1:3] <- c("Arm", "Gender", "LASA QOL") print(xftbl, sanitize.colnames.function = italic, include.rownames = FALSE, type = "html", comment = FALSE) ## ----eval=FALSE--------------------------------------------------------------- # summary(freqlist(~ sex + age, data = mockstudy), title="(\\#tab:mytableby) Caption here") ## ----------------------------------------------------------------------------- # base table default removes NAs tab.d1 <- base::table(mockstudy[, c("arm", "sex", "mdquality.s")], useNA="ifany") tab.d1 ## ----------------------------------------------------------------------------- # without specifying addNA tab.d2 <- xtabs(formula = ~ arm + sex + mdquality.s, data = mockstudy) tab.d2 # now with addNA tab.d3 <- xtabs(~ arm + sex + addNA(mdquality.s), data = mockstudy) tab.d3 ## ----------------------------------------------------------------------------- # providing variables separately (as vectors) drops column names table(mockstudy$arm, mockstudy$sex, mockstudy$mdquality.s) ## ----------------------------------------------------------------------------- # add the column name labels back using dnn option in base::table table(mockstudy$arm, mockstudy$sex, mockstudy$mdquality.s, dnn=c("Arm", "Sex", "QOL")) ## ----------------------------------------------------------------------------- table(Arm = mockstudy$arm, Sex = mockstudy$sex, QOL = mockstudy$mdquality.s) arsenal/inst/doc/labels.R0000644000176200001440000000502714056514635015023 0ustar liggesusers## ----include = FALSE---------------------------------------------------------- knitr::opts_chunk$set(message = FALSE, results = 'asis') ## ----------------------------------------------------------------------------- library(arsenal) data(mockstudy) library(magrittr) # for 'freqlist' examples tab.ex <- table(mockstudy[c("arm", "sex", "mdquality.s")], useNA="ifany") ## ----------------------------------------------------------------------------- summary(freqlist(tab.ex), labelTranslations = c(arm = "Treatment Arm", sex = "Gender", mdquality.s = "LASA QOL")) summary(tableby(arm ~ sex + age, data = mockstudy), labelTranslations = c(sex = "SEX", age = "Age, yrs")) summary(modelsum(bmi ~ age, adjust = ~sex, data = mockstudy), labelTranslations = list(sexFemale = "Female", age = "Age, yrs")) ## ----------------------------------------------------------------------------- # the non-pipe version; somewhat clunky tmp <- freqlist(tab.ex) labels(tmp) <- c(arm = "Treatment Arm", sex = "Gender", mdquality.s = "LASA QOL") summary(tmp) # piped--much cleaner mockstudy %>% tableby(arm ~ sex + age, data = .) %>% set_labels(c(sex = "SEX", age = "Age, yrs")) %>% summary() mockstudy %>% modelsum(bmi ~ age, adjust = ~ sex, data = .) %>% set_labels(list(sexFemale = "Female", age = "Age, yrs")) %>% summary() ## ----------------------------------------------------------------------------- mockstudy.lab <- keep.labels(mockstudy) class(mockstudy$age) class(mockstudy.lab$age) ## ----------------------------------------------------------------------------- class(loosen.labels(mockstudy.lab)$age) ## ----------------------------------------------------------------------------- attr(mockstudy.lab$sex, "label") <- "Sex" labels(mockstudy.lab$age) <- "Age, yrs" ## ----------------------------------------------------------------------------- labels(mockstudy.lab) <- list(sex = "Sex", age = "Age, yrs") summary(tableby(arm ~ sex + age, data = mockstudy.lab)) ## ----------------------------------------------------------------------------- mockstudy %>% set_labels(list(sex = "SEX", age = "Age, yrs")) %>% modelsum(bmi ~ age, adjust = ~ sex, data = .) %>% summary() ## ----results='markdown'------------------------------------------------------- labels(mockstudy.lab) ## ----------------------------------------------------------------------------- mockstudy %>% set_labels(list(age = "This is a really long label for the arm variable")) %>% tableby(sex ~ age, data = .) %>% summary() %>% print(width = 20) arsenal/inst/doc/modelsum.Rmd0000644000176200001440000007664414051207602015730 0ustar liggesusers--- title: "The modelsum function" author: "Beth Atkinson, Ethan Heinzen, Pat Votruba, Jason Sinnwell, Shannon McDonnell and Greg Dougherty" output: rmarkdown::html_vignette: toc: yes toc_depth: 3 vignette: | %\VignetteIndexEntry{The modelsum function} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r, echo=FALSE, message=FALSE, results='hide', warning=FALSE} require(knitr) require(broom) require(MASS) require(pROC) require(rpart) opts_chunk$set(comment = NA, echo=TRUE, prompt=TRUE, collapse=TRUE) ``` # Introduction Very often we are asked to summarize model results from multiple fits into a nice table. The endpoint might be of different types (e.g., survival, case/control, continuous) and there may be several independent variables that we want to examine univariately or adjusted for certain variables such as age and sex. Locally at Mayo, the SAS macros `%modelsum`, `%glmuniv`, and `%logisuni` were written to create such summary tables. With the increasing interest in R, we have developed the function `modelsum` to create similar tables within the R environment. In developing the `modelsum` function, the goal was to bring the best features of these macros into an R function. However, the task was not simply to duplicate all the functionality, but rather to make use of R's strengths (modeling, method dispersion, flexibility in function definition and output format) and make a tool that fits the needs of R users. Additionally, the results needed to fit within the general reproducible research framework so the tables could be displayed within an R markdown report. This report provides step-by-step directions for using the functions associated with `modelsum`. All functions presented here are available within the `arsenal` package. An assumption is made that users are somewhat familiar with R markdown documents. For those who are new to the topic, a good initial resource is available at [rmarkdown.rstudio.com](https://rmarkdown.rstudio.com/). # Simple Example The first step when using the `modelsum` function is to load the `arsenal` package. All the examples in this report use a dataset called `mockstudy` made available by Paul Novotny which includes a variety of types of variables (character, numeric, factor, ordered factor, survival) to use as examples. ```{r, load-data} require(arsenal) data(mockstudy) # load data dim(mockstudy) # look at how many subjects and variables are in the dataset # help(mockstudy) # learn more about the dataset and variables str(mockstudy) # quick look at the data ``` To create a simple linear regression table (the default), use a formula statement to specify the variables that you want summarized. The example below predicts BMI with the variables sex and age. ```{r simple1} tab1 <- modelsum(bmi ~ sex + age, data=mockstudy) ``` If you want to take a quick look at the table, you can use `summary` on your modelsum object and the table will print out as text in your R console window. If you use `summary` without any options you will see a number of $\ $ statements which translates to "space" in HTML. ## Pretty text version of table If you want a nicer version in your console window then adding the `text=TRUE` option. ```{r simple-text} summary(tab1, text=TRUE) ``` ## Pretty Rmarkdown version of table In order for the report to look nice within an R markdown (knitr) report, you just need to specify `results="asis"` when creating the r chunk. This changes the layout slightly (compresses it) and bolds the variable names. ```{r simple-markdown, results='asis'} summary(tab1) ``` ## Data frame version of table If you want a data.frame version, simply use `as.data.frame`. ```{r} as.data.frame(tab1) ``` ## Add an adjustor to the model The argument `adjust` allows the user to indicate that all the variables should be adjusted for these terms. To adjust each model for age and sex (for instance), we use `adjust = ~ age + sex`: ```{r adjust, results="asis"} tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~age + sex, data=mockstudy) summary(tab2) ``` # Models for each endpoint type To make sure the correct model is run you need to specify "family". The options available right now are : gaussian, binomial, survival, and poisson. If there is enough interest, additional models can be added. ## Gaussian ### Fit and summarize linear regression model Look at whether there is any evidence that AlkPhos values vary by study arm after adjusting for sex and age (assuming a linear age relationship). ```{r} fit <- lm(alk.phos ~ arm + age + sex, data=mockstudy) summary(fit) plot(fit) ``` The results suggest that the endpoint may need to be transformed. Calculating the Box-Cox transformation suggests a log transformation. ```{r} require(MASS) boxcox(fit) ``` ```{r} fit2 <- lm(log(alk.phos) ~ arm + age + sex, data=mockstudy) summary(fit2) plot(fit2) ``` Finally, look to see whether there there is a non-linear relationship with age. ```{r} require(splines) fit3 <- lm(log(alk.phos) ~ arm + ns(age, df=2) + sex, data=mockstudy) # test whether there is a difference between models stats::anova(fit2,fit3) # look at functional form of age termplot(fit3, term=2, se=T, rug=T) ``` In this instance it looks like there isn't enough evidence to say that the relationship is non-linear. ### Extract data using the `broom` package The `broom` package makes it easy to extract information from the fit. ```{r} tmp <- tidy(fit3) # coefficients, p-values class(tmp) tmp glance(fit3) ``` ### Create a summary table using modelsum ```{r, results="asis"} ms.logy <- modelsum(log(alk.phos) ~ arm + ps + hgb, data=mockstudy, adjust= ~age + sex, family=gaussian, gaussian.stats=c("estimate","CI.lower.estimate","CI.upper.estimate","p.value")) summary(ms.logy) ``` ## Binomial ### Fit and summarize logistic regression model ```{r} boxplot(age ~ mdquality.s, data=mockstudy, ylab=attr(mockstudy$age,'label'), xlab='mdquality.s') fit <- glm(mdquality.s ~ age + sex, data=mockstudy, family=binomial) summary(fit) # create Odd's ratio w/ confidence intervals tmp <- data.frame(summary(fit)$coef) tmp tmp$OR <- round(exp(tmp[,1]),2) tmp$lower.CI <- round(exp(tmp[,1] - 1.96* tmp[,2]),2) tmp$upper.CI <- round(exp(tmp[,1] + 1.96* tmp[,2]),2) names(tmp)[4] <- 'P-value' kable(tmp[,c('OR','lower.CI','upper.CI','P-value')]) # Assess the predictive ability of the model # code using the pROC package require(pROC) pred <- predict(fit, type='response') tmp <- pROC::roc(mockstudy$mdquality.s[!is.na(mockstudy$mdquality.s)]~ pred, plot=TRUE, percent=TRUE) tmp$auc ``` ### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. ```{r} tidy(fit, exp=T, conf.int=T) # coefficients, p-values, conf.intervals glance(fit) # model summary statistics ``` ### Create a summary table using modelsum ```{r, results="asis"} summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial)) fitall <- modelsum(mdquality.s ~ age, data=mockstudy, family=binomial, binomial.stats=c("Nmiss2","OR","p.value")) summary(fitall) ``` ## Survival ### Fit and summarize a Cox regression model ```{r survival} require(survival) # multivariable model with all 3 terms fit <- coxph(Surv(fu.time, fu.stat) ~ age + sex + arm, data=mockstudy) summary(fit) # check proportional hazards assumption fit.z <- cox.zph(fit) fit.z plot(fit.z[1], resid=FALSE) # makes for a cleaner picture in this case abline(h=coef(fit)[1], col='red') # check functional form for age using pspline (penalized spline) # results are returned for the linear and non-linear components fit2 <- coxph(Surv(fu.time, fu.stat) ~ pspline(age) + sex + arm, data=mockstudy) fit2 # plot smoothed age to visualize why significant termplot(fit2, se=T, terms=1) abline(h=0) # The c-statistic comes out in the summary of the fit summary(fit2)$concordance # It can also be calculated using the survConcordance function survConcordance(Surv(fu.time, fu.stat) ~ predict(fit2), data=mockstudy) ``` ### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. ```{r} tidy(fit) # coefficients, p-values glance(fit) # model summary statistics ``` ### Create a summary table using modelsum ```{r results="asis"} ##Note: You must use quotes when specifying family="survival" ## family=survival will not work summary(modelsum(Surv(fu.time, fu.stat) ~ arm, adjust=~age + sex, data=mockstudy, family="survival")) ##Note: the pspline term is not working yet #summary(modelsum(Surv(fu.time, fu.stat) ~ arm, # adjust=~pspline(age) + sex, data=mockstudy, family='survival')) ``` ## Poisson Poisson regression is useful when predicting an outcome variable representing counts. It can also be useful when looking at survival data. Cox models and Poisson models are very closely related and survival data can be summarized using Poisson regression. If you have overdispersion (see if the residual deviance is much larger than degrees of freedom), you may want to use `quasipoisson()` instead of `poisson()`. Some of these diagnostics need to be done outside of `modelsum`. ### Example 1: fit and summarize a Poisson regression model For the first example, use the solder dataset available in the `rpart` package. The endpoint `skips` has a definite Poisson look. ```{r poisson} require(rpart) ##just to get access to solder dataset data(solder) hist(solder$skips) fit <- glm(skips ~ Opening + Solder + Mask , data=solder, family=poisson) stats::anova(fit, test='Chi') summary(fit) ``` Overdispersion is when the Residual deviance is larger than the degrees of freedom. This can be tested, approximately using the following code. The goal is to have a p-value that is $>0.05$. ```{r} 1-pchisq(fit$deviance, fit$df.residual) ``` One possible solution is to use the quasipoisson family instead of the poisson family. This adjusts for the overdispersion. ```{r} fit2 <- glm(skips ~ Opening + Solder + Mask, data=solder, family=quasipoisson) summary(fit2) ``` ### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. ```{r} tidy(fit) # coefficients, p-values glance(fit) # model summary statistics ``` ### Create a summary table using modelsum ```{r results='asis'} summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="quasipoisson")) summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="poisson")) ``` ### Example 2: fit and summarize a Poisson regression model This second example uses the survival endpoint available in the `mockstudy` dataset. There is a close relationship between survival and Poisson models, and often it is easier to fit the model using Poisson regression, especially if you want to present absolute risk. ```{r} # add .01 to the follow-up time (.01*1 day) in order to keep everyone in the analysis fit <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, data=mockstudy, family=poisson) summary(fit) 1-pchisq(fit$deviance, fit$df.residual) coef(coxph(Surv(fu.time,fu.stat) ~ age + sex + arm, data=mockstudy)) coef(fit)[-1] # results from the Poisson model can then be described as risk ratios (similar to the hazard ratio) exp(coef(fit)[-1]) # As before, we can model the dispersion which alters the standard error fit2 <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, data=mockstudy, family=quasipoisson) summary(fit2) ``` ### Extract data using `broom` package The `broom` package makes it easy to extract information from the fit. ```{r} tidy(fit) ##coefficients, p-values glance(fit) ##model summary statistics ``` ### Create a summary table using `modelsum` Remember that the result from `modelsum` is different from the `fit` above. The `modelsum` summary shows the results for `age + offset(log(fu.time+.01))` then `sex + offset(log(fu.time+.01))` instead of `age + sex + arm + offset(log(fu.time+.01))`. ```{r results="asis", eval=TRUE} summary(modelsum(fu.stat ~ age, adjust=~offset(log(fu.time+.01))+ sex + arm, data=mockstudy, family=poisson)) ``` # Additional Examples Here are multiple examples showing how to use some of the different options. ## 1. Change summary statistics globally There are standard settings for each type of model regarding what information is summarized in the table. This behavior can be modified using the modelsum.control function. In fact, you can save your standard settings and use that for future tables. ```{r, results='asis'} mycontrols <- modelsum.control(gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"), show.adjust=FALSE, show.intercept=FALSE) tab2 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy, control=mycontrols) summary(tab2) ``` You can also change these settings directly in the modelsum call. ```{r, results='asis'} tab3 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy, gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"), show.intercept=FALSE, show.adjust=FALSE) summary(tab3) ``` ## 2. Add labels to independent variables In the above example, age is shown with a label (Age in Years), but sex is listed "as is". This is because the data was created in SAS and in the SAS dataset, age had a label but sex did not. The label is stored as an attribute within R. ```{r check-labels} ## Look at one variable's label attr(mockstudy$age,'label') ## See all the variables with a label unlist(lapply(mockstudy,'attr','label')) ## or cbind(sapply(mockstudy,attr,'label')) ``` If you want to add labels to other variables, there are a couple of options. First, you could add labels to the variables in your dataset. ```{r add-label, results='asis'} attr(mockstudy$age,'label') <- 'Age, yrs' tab1 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy) summary(tab1) ``` You can also use the built-in `data.frame` method for `labels<-`: ```{r, results = 'asis'} labels(mockstudy) <- c(age = 'Age, yrs') tab1 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy) summary(tab1) ``` Another option is to add labels after you have created the table ```{r, results='asis'} mylabels <- list(sexFemale = "Female", age ="Age, yrs") summary(tab1, labelTranslations = mylabels) ``` Alternatively, you can check the variable labels and manipulate them with a function called `labels`, which works on the `modelsum` object. ```{r, eval=TRUE} labels(tab1) labels(tab1) <- c(sexFemale="Female", age="Baseline Age (yrs)") labels(tab1) ``` ```{r, results='asis'} summary(tab1) ``` ## 3. Don't show intercept values ```{r, results='asis'} summary(modelsum(age~mdquality.s+sex, data=mockstudy), show.intercept=FALSE) ``` ## 4. Don't show results for adjustment variables ```{r, results='asis'} summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial), show.adjust=FALSE) ``` ## 5. Summarize multiple variables without typing them out Often one wants to summarize a number of variables. Instead of typing by hand each individual variable, an alternative approach is to create a formula using the `paste` command with the `collapse="+"` option. ```{r, results='asis'} # create a vector specifying the variable names myvars <- names(mockstudy) # select the 8th through the 12th # paste them together, separated by the + sign RHS <- paste(myvars[8:12], collapse="+") RHS # create a formula using the as.formula function as.formula(paste('mdquality.s ~ ', RHS)) # use the formula in the modelsum function summary(modelsum(as.formula(paste('mdquality.s ~', RHS)), family=binomial, data=mockstudy)) ``` These steps can also be done using the `formulize` function. ```{r, results='asis'} ## The formulize function does the paste and as.formula steps tmp <- formulize('mdquality.s',myvars[8:10]) tmp ## More complex formulas could also be written using formulize tmp2 <- formulize('mdquality.s',c('ps','hgb','sqrt(bmi)')) ## use the formula in the modelsum function summary(modelsum(tmp, data=mockstudy, family=binomial)) ``` ## 6. Subset the dataset used in the analysis Here are two ways to get the same result (limit the analysis to subjects age>50 and in the F: FOLFOX treatment group). * The first approach uses the subset function applied to the dataset `mockstudy`. This example also selects a subset of variables. The `modelsum` function is then applied to this subsetted data. ```{r} newdata <- subset(mockstudy, subset=age>50 & arm=='F: FOLFOX', select = c(age,sex, bmi:alk.phos)) dim(mockstudy) table(mockstudy$arm) dim(newdata) names(newdata) ``` ```{r, results='asis'} summary(modelsum(alk.phos ~ ., data=newdata)) ``` * The second approach does the same analysis but uses the subset argument within `modelsum` to subset the data. ```{r, results='asis', eval=TRUE} summary(modelsum(log(alk.phos) ~ sex + ps + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy)) summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset = age>50 & bmi<24, data=mockstudy)) summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset=1:30, data=mockstudy)) ``` ## 7. Create combinations of variables on the fly ```{r} ## create a variable combining the levels of mdquality.s and sex with(mockstudy, table(interaction(mdquality.s,sex))) ``` ```{r, results='asis'} summary(modelsum(age ~ interaction(mdquality.s,sex), data=mockstudy)) ``` ## 8. Transform variables on the fly Certain transformations need to be surrounded by `I()` so that R knows to treat it as a variable transformation and not some special model feature. If the transformation includes any of the symbols `/ - + ^ *` then surround the new variable by `I()`. ```{r, results='asis'} summary(modelsum(arm=="F: FOLFOX" ~ I(age/10) + log(bmi) + mdquality.s, data=mockstudy, family=binomial)) ``` ## 9. Change the ordering of the variables or delete a variable ```{r, results='asis'} mytab <- modelsum(bmi ~ sex + alk.phos + age, data=mockstudy) mytab2 <- mytab[c('age','sex','alk.phos')] summary(mytab2) summary(mytab[c('age','sex')]) summary(mytab[c(3,1)]) ``` ## 10. Merge two `modelsum` objects together It is possible to merge two modelsum objects so that they print out together, however you need to pay attention to the columns that are being displayed. It is sometimes easier to combine two models of the same family (such as two sets of linear models). Overlapping y-variables will have their x-variables concatenated, and (if `all=TRUE`) non-overlapping y-variables will have their tables printed separately. ```{r, results="asis"} ## demographics tab1 <- modelsum(bmi ~ sex + age, data=mockstudy) ## lab data tab2 <- modelsum(mdquality.s ~ hgb + alk.phos, data=mockstudy, family=binomial) tab12 <- merge(tab1, tab2, all = TRUE) class(tab12) summary(tab12) ``` ## 11. Add a title to the table When creating a pdf the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles appear un-numbered and above the table. ```{r, results='asis'} t1 <- modelsum(bmi ~ sex + age, data=mockstudy) summary(t1, title='Demographics') ``` ## 12. Modify how missing values are treated Depending on the report you are writing you have the following options: * Use all values available for each variable * Use only those subjects who have measurements available for all the variables ```{r} ## look at how many missing values there are for each variable apply(is.na(mockstudy),2,sum) ``` ```{r, results='asis'} ## Show how many subjects have each variable (non-missing) summary(modelsum(bmi ~ ast + age, data=mockstudy, control=modelsum.control(gaussian.stats=c("N","estimate")))) ## Always list the number of missing values summary(modelsum(bmi ~ ast + age, data=mockstudy, control=modelsum.control(gaussian.stats=c("Nmiss2","estimate")))) ## Only show the missing values if there are some (default) summary(modelsum(bmi ~ ast + age, data=mockstudy, control=modelsum.control(gaussian.stats=c("Nmiss","estimate")))) ## Don't show N at all summary(modelsum(bmi ~ ast + age, data=mockstudy, control=modelsum.control(gaussian.stats=c("estimate")))) ``` ## 13. Modify the number of digits used Within modelsum.control function there are 3 options for controlling the number of significant digits shown. * digits: controls the number of digits after the decimal point for continuous values * digits.ratio: controls the number of digits after the decimal point for continuous values * digits.p: controls the number of digits after the decimal point for continuous values ```{r, results='asis'} summary(modelsum(bmi ~ sex + age + fu.time, data=mockstudy), digits=4, digits.test=2) ``` ## 14. Use case-weights in the models Occasionally it is of interest to fit models using case weights. The `modelsum` function allows you to pass on the weights to the models and it will do the appropriate fit. ```{r} mockstudy$agegp <- cut(mockstudy$age, breaks=c(18,50,60,70,90), right=FALSE) ## create weights based on agegp and sex distribution tab1 <- with(mockstudy,table(agegp, sex)) tab1 tab2 <- with(mockstudy, table(agegp, sex, arm)) gpwts <- rep(tab1, length(unique(mockstudy$arm)))/tab2 ## apply weights to subjects index <- with(mockstudy, cbind(as.numeric(agegp), as.numeric(sex), as.numeric(as.factor(arm)))) mockstudy$wts <- gpwts[index] ## show weights by treatment arm group tapply(mockstudy$wts,mockstudy$arm, summary) ``` ```{r results='asis'} mockstudy$newvarA <- as.numeric(mockstudy$arm=='A: IFL') tab1 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), family=binomial) summary(tab1, title='No Case Weights used') suppressWarnings({ tab2 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), weights=wts, family=binomial) summary(tab2, title='Case Weights used') }) ``` ## 15. Use `modelsum` within an Sweave document For those users who wish to create tables within an Sweave document, the following code seems to work. ``` \documentclass{article} \usepackage{longtable} \usepackage{pdfpages} \begin{document} \section{Read in Data} <>= require(arsenal) require(knitr) require(rmarkdown) data(mockstudy) tab1 <- modelsum(bmi~sex+age, data=mockstudy) @ \section{Convert Summary.modelsum to LaTeX} <>= capture.output(summary(tab1), file="Test.md") ## Convert R Markdown Table to LaTeX render("Test.md", pdf_document(keep_tex=TRUE)) @ \includepdf{Test.pdf} \end{document} ``` ## 16. Export `modelsum` results to a .CSV file When looking at multiple variables it is sometimes useful to export the results to a csv file. The `as.data.frame` function creates a data frame object that can be exported or further manipulated within R. ```{r} summary(tab2, text=T) tmp <- as.data.frame(summary(tab2, text = TRUE)) tmp # write.csv(tmp, '/my/path/here/mymodel.csv') ``` ## 17. Write `modelsum` object to a separate Word or HTML file ```{r eval = FALSE} ## write to an HTML document write2html(tab2, "~/ibm/trash.html") ## write to a Word document write2word(tab2, "~/ibm/trash.doc", title="My table in Word") ``` ## 18. Use `modelsum` in R Shiny The easiest way to output a `modelsum()` object in an R Shiny app is to use the `tableOutput()` UI in combination with the `renderTable()` server function and `as.data.frame(summary(modelsum()))`: ```{r eval=FALSE} # A standalone shiny app library(shiny) library(arsenal) data(mockstudy) shinyApp( ui = fluidPage(tableOutput("table")), server = function(input, output) { output$table <- renderTable({ as.data.frame(summary(modelsum(age ~ sex, data = mockstudy), text = "html")) }, sanitize.text.function = function(x) x) } ) ``` This can be especially powerful if you feed the selections from a `selectInput(multiple = TRUE)` into `formulize()` to make the table dynamic! ## 23. Use `modelsum` in bookdown Since the backbone of `modelsum()` is `knitr::kable()`, tables still render well in bookdown. However, `print.summary.modelsum()` doesn't use the `caption=` argument of `kable()`, so some tables may not have a properly numbered caption. To fix this, use the method described [on the bookdown site](https://bookdown.org/yihui/bookdown/tables.html) to give the table a tag/ID. ```{r eval=FALSE} summary(modelsum(age ~ sex, data = mockstudy), title="(\\#tab:mytableby) Caption here") ``` ## 24. Model multiple endpoints You can now use `list()` on the left-hand side of `modelsum()` to give multiple endpoints. Note that only one "family" can be specified this way (use `merge()` instead if you want multiple families). ```{r results='asis'} summary(modelsum(list(age, hgb) ~ bmi + sex, adjust = ~ arm, data = mockstudy)) ``` To avoid confusion about which table is which endpoint, you can set `term.name=TRUE` in `summary()`. This takes the labels for each endpoint and puts them in the top-left of the table. ```{r results='asis'} summary(modelsum(list(age, hgb) ~ bmi + sex, adjust = ~ arm, data = mockstudy), term.name = TRUE) ``` ## 25. Model data by a non-test group (strata) You can also specify a grouping variable that doesn't get tested (but instead separates results): a *strata* variable. ```{r results='asis'} summary(modelsum(list(age, hgb) ~ bmi + sex, strata = arm, data = mockstudy)) ``` ## 26. Add multiple sets of adjustors to the model By putting multiple formulas into a list, you can use multiple sets of adjustors. Use `~ 1` or `NULL` for an "unadjusted" model. By using the `adjustment.names=TRUE` argument and giving names to your adjustor sets in the list, you can name the various analyses. ```{r} adj.list <- list( Unadjusted = ~ 1, # can also specify NULL here "Adjusted for Arm" = ~ arm ) multi.adjust <- modelsum(list(age, bmi) ~ fu.time + ast, adjust = adj.list, data = mockstudy) summary(multi.adjust, adjustment.names = TRUE) summary(multi.adjust, adjustment.names = TRUE, show.intercept = FALSE, show.adjust = FALSE) ``` # Available Function Options ## Summary statistics The available summary statistics, by varible type, are: * `ordinal`: Ordinal logistic regression models + default: `Nmiss, OR, CI.lower.OR, CI.upper.OR, p.value` + optional: `estimate, CI.OR, CI.estimate, CI.lower.estimate, CI.upper.estimate,` `N, Nmiss2, endpoint, std.error, statistic, logLik, AIC, BIC, edf, deviance, df.residual, p.value.lrt` * `binomial`,`quasibinomial`: Logistic regression models + default: `OR, CI.lower.OR, CI.upper.OR, p.value, concordance, Nmiss` + optional: `estimate, CI.OR, CI.estimate, CI.lower.estimate, CI.upper.estimate,` `CI.wald, CI.lower.wald, CI.upper.wald, CI.OR.wald, CI.lower.OR.wald, CI.upper.OR.wald,` `N, Nmiss2, Nevents, endpoint, std.error, statistic, logLik, AIC, BIC, null.deviance, deviance, df.residual, df.null, p.value.lrt` * `gaussian`: Linear regression models + default: `estimate, std.error, p.value, adj.r.squared, Nmiss` + optional: `CI.estimate, CI.lower.estimate, CI.upper.estimate, N, Nmiss2, statistic,` `standard.estimate, endpoint, r.squared, AIC, BIC, logLik, statistic.F, p.value.F, p.value.lrt` * `poisson`, `quasipoisson`: Poisson regression models + default: `RR, CI.lower.RR, CI.upper.RR, p.value, Nmiss` + optional: `CI.RR, CI.estimate, CI.lower.estimate, CI.upper.estimate, CI.RR, Nmiss2, std.error,` `estimate, statistic, endpoint, AIC, BIC, logLik, dispersion, null.deviance, deviance, df.residual, df.null, p.value.lrt` * `negbin`: Negative binomial regression models + default: `RR, CI.lower.RR, CI.upper.RR, p.value, Nmiss` + optional: `CI.RR, CI.estimate, CI.lower.estimate, CI.upper.estimate, CI.RR, Nmiss2, std.error, estimate,` `statistic, endpoint, AIC, BIC, logLik, dispersion, null.deviance, deviance, df.residual, df.null, theta, SE.theta, p.value.lrt` * `clog`: Conditional Logistic models + default: `OR, CI.lower.OR, CI.upper.OR, p.value, concordance, Nmiss` + optional: `CI.OR, CI.estimate, CI.lower.estimate, CI.upper.estimate, N, Nmiss2, estimate, std.error, endpoint, Nevents, statistic,` `r.squared, r.squared.max, logLik, AIC, BIC, statistic.log, p.value.log, statistic.sc, p.value.sc,` `statistic.wald, p.value.wald, N, std.error.concordance, p.value.lrt` * `survival`: Cox models + default: `HR, CI.lower.HR, CI.upper.HR, p.value, concordance, Nmiss` + optional: `CI.HR, CI.estimate, CI.lower.estimate, CI.upper.estimate, N, Nmiss2, estimate, std.error, endpoint,` `Nevents, statistic, r.squared, r.squared.max, logLik, AIC, BIC, statistic.log, p.value.log, statistic.sc, p.value.sc,` `statistic.wald, p.value.wald, N, std.error.concordance, p.value.lrt` The full description of these parameters that can be shown for models include: * `N`: a count of the number of observations used in the analysis * `Nmiss`: only show the count of the number of missing values if there are some missing values * `Nmiss2`: always show a count of the number of missing values for a model * `endpoint`: dependent variable used in the model * `std.err`: print the standard error * `statistic`: test statistic * `statistic.F`: test statistic (F test) * `p.value`: print the p-value * `p.value.lrt`: print the likelihood ratio p-value for *the main effect only* (not the adjustors) * `r.squared`: print the model R-square * `adj.r.squared`: print the model adjusted R-square * `r.squared.max`: print the model R-square * `concordance`: print the model C statistic (which is the AUC for logistic models) * `logLik`: print the loglikelihood value * `p.value.log`: print the p-value for the overall model likelihood test * `p.value.wald`: print the p-value for the overall model wald test * `p.value.sc`: print the p-value for overall model score test * `AIC`: print the Akaike information criterion * `BIC`: print the Bayesian information criterion * `null.deviance`: null deviance * `deviance`: model deviance * `df.residual`: degrees of freedom for the residual * `df.null`: degrees of freedom for the null model * `dispersion`: This is used in Poisson models and is defined as the deviance/df.residual * `statistic.sc`: overall model score statistic * `statistic.wald`: overall model score statistic * `statistic.log`: overall model score statistic * `std.error.concordance`: standard error for the C statistic * `HR`: print the hazard ratio (for survival models), i.e. exp(beta) * `CI.lower.HR, CI.upper.HR`: print the confidence interval for the HR * `OR`: print the odd's ratio (for logistic models), i.e. exp(beta) * `CI.lower.OR, CI.upper.OR`: print the confidence interval for the OR * `CI.lower.OR.wald, CI.upper.OR.wald`: print the Wald confidence interval for the OR * `RR`: print the risk ratio (for poisson models), i.e. exp(beta) * `CI.lower.RR, CI.upper.RR`: print the confidence interval for the RR * `estimate`: print beta coefficient * `standardized.estimate`: print the standardized beta coefficient * `CI.lower.estimate, CI.upper.estimate`: print the confidence interval for the beta coefficient * `CI.lower.wald, CI.upper.wald`: print the Wald confidence interval for the beta coefficient * `edf`: print the effective degrees of freedom. * `theta`: print the estimate of theta. * `SE.theta`: print the estimate of theta's standard error. ## `modelsum.control` settings A quick way to see what arguments are possible to utilize in a function is to use the `args()` command. Settings involving the number of digits can be set in `modelsum.control` or in `summary.modelsum`. ```{r} args(modelsum.control) ``` ## `summary.modelsum` settings The summary.modelsum function has options that modify how the table appears (such as adding a title or modifying labels). ```{r} args(arsenal:::summary.modelsum) ``` arsenal/inst/doc/paired.R0000644000176200001440000000275014056514656015030 0ustar liggesusers## ----echo = FALSE--------------------------------------------------------------------------------- options(width = 100) ## ---- load-data----------------------------------------------------------------------------------- library(arsenal) dat <- data.frame( tp = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)), id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6), Cat = c("A", "A", "A", "B", "B", "B", "B", "A", NA, "B"), Fac = factor(c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A")), Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA), Ord = ordered(c("I", "II", "II", "III", "III", "III", "I", "III", "II", "I")), Lgl = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE), Dat = as.Date("2018-05-01") + c(1, 1, 2, 2, 3, 4, 5, 6, 3, 4), stringsAsFactors = FALSE ) ## ----results = 'asis'----------------------------------------------------------------------------- p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE) summary(p) ## ----results = 'asis'----------------------------------------------------------------------------- p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE, na.action = na.paired("fill")) summary(p) ## ------------------------------------------------------------------------------------------------- args(paired.control) ## ------------------------------------------------------------------------------------------------- args(arsenal:::summary.tableby) arsenal/inst/doc/comparedf.Rmd0000644000176200001440000003260413656527336016052 0ustar liggesusers--- title: "The comparedf function" author: "Ethan Heinzen, Ryan Lennon, Andrew Hanson" output: rmarkdown::html_vignette: toc: yes toc_depth: 3 vignette: | %\VignetteIndexEntry{The comparedf function} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r include = FALSE} knitr::opts_chunk$set(eval = TRUE, message = FALSE, results = 'asis', comment='') options(width = 120) ``` # Introduction The `comparedf()` function can be used to determine and report differences between two data.frames. It was written in the spirit of replacing `PROC COMPARE` from SAS. ```{r results = 'asis'} library(arsenal) ``` Why "comparedf"? We originally called this function `compare.data.frame()`, using `testthat::compare()` as our S3 generic, but that ended up getting us in trouble because of conflicting object structures. Why this didn't occur to us at the time remains a mystery. To replace it, we brainstormed several ideas (`comparedf()`, `dfcompare()`, `collate()`, `comparison()`) but settled on the former for three reasons: 1. There were no other objects with that generic or class (see `testthat::compare()` and `compare::compare()`). 2. It is mnemonically easy to remember (we "compare data.frames", not "data.frames compare"). 3. It tab auto-completes from the original "compare". # Basic examples We first build two similar data.frames to compare. ```{r} df1 <- data.frame(id = paste0("person", 1:3), a = c("a", "b", "c"), b = c(1, 3, 4), c = c("f", "e", "d"), row.names = paste0("rn", 1:3), stringsAsFactors = FALSE) df2 <- data.frame(id = paste0("person", 3:1), a = c("c", "b", "a"), b = c(1, 3, 4), d = paste0("rn", 1:3), row.names = paste0("rn", c(1,3,2)), stringsAsFactors = FALSE) ``` To compare these datasets, simply pass them to the `comparedf()` function: ```{r results='markup'} comparedf(df1, df2) ``` Use `summary()` to get a more detailed summary ```{r} summary(comparedf(df1, df2)) ``` By default, the datasets are compared row-by-row. To change this, use the `by=` or `by.x=` and `by.y=` arguments: ```{r} summary(comparedf(df1, df2, by = "id")) ``` # A larger example Let's muck up the `mockstudy` data. ```{r} data(mockstudy) mockstudy2 <- muck_up_mockstudy() ``` We've changed row order, so let's compare by the case ID: ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case")) ``` # Column name comparison options It is possible to change which column names are considered "the same variable". ## Ignoring case For example, to ignore case in variable names (so that `Arm` and `arm` are considered the same), pass `tol.vars = "case"`. You can do this using `comparedf.control()` ```{r eval = FALSE} summary(comparedf(mockstudy, mockstudy2, by = "case", control = comparedf.control(tol.vars = "case"))) ``` or pass it through the `...` arguments. ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = "case")) ``` ## Treating dots and underscores the same (equivalence classes) It is possible to treat certain characters or sets of characters as the same by passing a character vector of equivalence classes to the `tol.vars=` argument. In short, each string in the vector is split into single characters, and the resulting set of characters is replaced by the first character in the string. For example, passing `c("._")` would replace all underscores with dots in the column names of both datasets. Similarly, passing `c("aA", "BbCc")` would replace all instances of `"A"` with `"a"` and all instances of `"b"`, `"C"`, or `"c"` with `"B"`. This is one way to ignore case for certain letters. Otherwise, it's possible to combine the equivalence classes with ignoring case, by passing (e.g.) `c("._", "case")`. Passing a single character as an element this vector will replace that character with the empty string. For example, passing c(" ", ".") would remove all spaces and dots from the column names. For mockstudy, let's treat dots, underscores, and spaces as the same, and ignore case: ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case") # dots=underscores=spaces, ignore case )) ``` ## Manually specifying columns to match together If you pass a named vector to the `tol.vars=` argument, `comparedf()` will line up the names of that vector to the column names of `x` and the values of that vector to the column names of `y`. In this way, you can manually specify which non-identically-named columns to compare. For mockstudy, let's specify our variables manually in this way: ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c(arm = "Arm", fu.stat = "fu stat", fu.time = "fu_time") )) ``` # Column comparison options ## Logical tolerance Use the `tol.logical=` argument to change how logicals are compared. By default, they're expected to be equal to each other. ## Numeric tolerance To allow numeric differences of a certain tolerance, use the `tol.num=` and `tol.num.val=` options. `tol.num.val=` determines the maximum (unsigned) difference tolerated if `tol.num="absolute"` (default), and determines the maximum (unsigned) percent difference tolerated if `tol.num="percent"`. Also note the option `int.as.num=`, which determines whether integers and numerics should be compared despite their class difference. If `TRUE`, the integers are coerced to numeric. Note that `mockstudy$ast` is integer, while `mockstudy2$ast` is numeric: ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE # compare integers and numerics )) ``` Suppose a tolerance of up to 10 is allowed for `ast`: ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10 # allow absolute differences <= 10 )) ``` ## Factor tolerance By default, factors are compared to each other based on both the labels and the underlying numeric levels. Set `tol.factor="levels"` to match only the numeric levels, or set `tol.factor="labels"` to match only the labels. ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10, # allow absolute differences <= 10 tol.factor = "labels" # match only factor labels )) ``` Also note the option `factor.as.char=`, which determines whether factors and characters should be compared despite their class difference. If `TRUE`, the factors are coerced to characters. Note that `mockstudy$race` is a character, while `mockstudy2$race` is a factor: ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10, # allow absolute differences <= 10 tol.factor = "labels", # match only factor labels factor.as.char = TRUE # compare factors and characters )) ``` ## Character tolerance Use the `tol.char=` argument to change how character variables are compared. By default, they are compared as-is, but they can be compared after ignoring case or trimming whitespace or both. ```{r} summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10, # allow absolute differences <= 10 tol.factor = "labels", # match only factor labels factor.as.char = TRUE, # compare factors and characters tol.char = "case" # ignore case in character vectors )) ``` ## Date tolerance Use the `tol.date=` argument to change how dates are compared. By default, they're expected to be equal to each other. ## Other data type tolerances Use the `tol.other=` argument to change how other objects are compared. By default, they're expected to be `identical()`. ## Specifying tolerances for each variable You can also provide a list of tolerance functions to `comparedf()`: ```{r eval=FALSE} comparedf.control(tol.char = list( "none", # the default x1 = "case", # be case-insensitive for the variable "x1" x2 = function(x, y) tol.NA(x, y, x != y | y == "NA") # a custom-defined tolerance )) ``` ## User-defined tolerance functions ### Details The `comparedf.control()` function accepts functions for any of the tolerance arguments in addition to the short-hand character strings. This allows the user to create custom tolerance functions to suit his/her needs. Any custom tolerance function must accept two vectors as arguments and return a logical vector of the same length. The `TRUE`s in the results should correspond to elements which are deemed "different". Note that the numeric and date tolerance functions should also include a third argument for tolerance size (even if it's not used). CAUTION: the results should not include NAs, since the logical vector is used to subset the input data.frames. The `tol.NA()` function is useful for considering any NAs in the two vectors (but not both) as differences, in addition to other criteria. The `tol.NA()` function is used in all default tolerance functions to help handle NAs. ### Example 1 Suppose we want to ignore any dates which are later in the second dataset than the first. We define a custom tolerance function. ```{r results = 'markup'} my.tol <- function(x, y, tol) { tol.NA(x, y, x > y) } date.df1 <- data.frame(dt = as.Date(c("2017-09-07", "2017-08-08", "2017-07-09", NA))) date.df2 <- data.frame(dt = as.Date(c("2017-10-01", "2017-08-08", "2017-07-10", "2017-01-01"))) n.diffs(comparedf(date.df1, date.df2)) # default finds any differences n.diffs(comparedf(date.df1, date.df2, tol.date = my.tol)) # our function identifies only the NA as different... n.diffs(comparedf(date.df2, date.df1, tol.date = my.tol)) # ... until we change the argument order ``` ### Example 2 (Continuing our mockstudy example) Suppose we're okay with NAs getting replaced by -9. ```{r} tol.minus9 <- function(x, y, tol) { idx1 <- is.na(x) & !is.na(y) & y == -9 idx2 <- tol.num.absolute(x, y, tol) # find other absolute differences return(!idx1 & idx2) } summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10, # allow absolute differences <= 10 tol.factor = "labels", # match only factor labels factor.as.char = TRUE, # compare factors and characters tol.char = "case", # ignore case in character vectors tol.num = tol.minus9 # ignore NA -> -9 changes )) ``` # Extract Differences Differences can be easily extracted using the `diffs()` function. If you only want to determine how many differences were found, use the `n.diffs()` function. ```{r results = 'markup'} cmp <- comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), int.as.num = TRUE) n.diffs(cmp) head(diffs(cmp)) ``` Differences can also be summarized by variable. ```{r results = 'markup'} diffs(cmp, by.var = TRUE) ``` To report differences from only a few variables, one can pass a list of variable names to `diffs()`. ```{r results = 'markup'} diffs(cmp, vars = c("ps", "ast"), by.var = TRUE) diffs(cmp, vars = c("ps", "ast")) ``` # Appendix ## Stucture of the Object (This section is just as much for my use as for yours!) ```{r} obj <- comparedf(mockstudy, mockstudy2, by = "case") ``` There are two main objects in the `"comparedf"` object, each with its own print method. The `frame.summary` contains: - the substituted-deparsed arguments - information about the number of columns and rows in each dataset - the by-variables for each dataset (which may not be the same) - the attributes for each dataset (which get counted in the print method) - a data.frame of by-variables and row numbers of observations not shared between datasets - the number of shared observations ```{r results='markup'} print(obj$frame.summary) ``` The `vars.summary` contains: - variable name, column number, and class vector (with possibly more than one element) for each x and y. These are all `NA` if there isn't a match in both datasets. - values, a list-column of the text string `"by-variable"` for the by-variables, `NULL` for columns that aren't compared, or a data.frame containing: - The by-variables for differences found - The values which are different for x and y - The row numbers for differences found - attrs, a list-column of `NULL` if there are no attributes, or a data.frame containing: - The name of the attributes - The attributes for x and y, set to `NA` if non-existant - The actual attributes (if `show.attr=TRUE`). ```{r results='markup'} print(obj$vars.summary) ``` arsenal/inst/doc/write2.html0000644000176200001440000015401414056514665015544 0ustar liggesusers The write2 function

    The write2 function

    Ethan Heinzen

    Introduction

    The write2*() functions were designed as an alternative to SAS’s ODS procedure for useRs who want to save R Markdown tables to separate Word, HTML, or PDF files without needing separate R Markdown programs.

    There are three shortcut functions for the most common output types: HTML, PDF, and Word. Each of these three functions calls write2(), an S3 function which accepts many file output types (see the help pages for rmarkdown::render()). Methods have been implemented for tableby(), modelsum(), and freqlist(), but also knitr::kable(), xtable::xtable(), and pander::pander_return().

    The two most important things to recognize with write2() are the following:

    1. Which function is being used to output the object. Sometimes the write2 functions use summary(), while other times they will use print(). The details for each object specifically are described below.

    2. How the ... arguments are passed. To change the options for the summary-like or print-like function, you can pass named arguments which will in turn get passed to the appropriate function. Details for each object specifically are described below.

    A note on piping

    arsenal is piping-compatible!

    The write2*() functions are probably the most useful place to take advantage of the magrittr package’s piping framework, since commands are often nested several functions deep in the context of write2*(). Piping also allows the arsenal package to become a part of more standard analysis pipelines; instead of needing to write separate R Markdown programs, intermediate analysis tables and output can be easily incorporated into piped statements.

    This vignette will sprinkle the foward pipe (%>%) throughout as a hint at the power and flexibility of arsenal and piping.

    Examples Using arsenal Objects

    library(arsenal)
    library(magrittr)
    data(mockstudy)
    tmpdir <- tempdir()

    tableby

    For tableby objects, the output function in write2() is summary(). For summary.tableby objects, the output function is print(). For available arguments, see the help pages for summary.tableby(). Don’t use the option text = TRUE with the write2 functions.

    mylabels <- list(sex = "SEX", age ="Age, yrs")
    tab1 <- tableby(arm ~ sex + age, data=mockstudy)
    
    write2html(
      tab1, paste0(tmpdir, "/test.tableby.html"), quiet = TRUE,
      title = "My test table",      # passed to summary.tableby
      labelTranslations = mylabels, # passed to summary.tableby
      total = FALSE                 # passed to summary.tableby
    )

    modelsum

    For modelsum objects, the output function in write2() is summary(). For summary.modelsum objects, the output function is print(). For available arguments, see the help pages for summary.modelsum(). Don’t use the option text = TRUE with the write2 functions.

    tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~ age + sex, family = "gaussian", data = mockstudy)
    
    write2pdf(
      tab2, paste0(tmpdir, "/test.modelsum.pdf"), quiet = TRUE,
      title = "My test table", # passed to summary.modelsum
      show.intercept = FALSE,  # passed to summary.modelsum
      digits = 5               # passed to summary.modelsum
    )

    freqlist

    For freqlist objects, the output function in write2() is summary(). For summary.freqlist objects, the output function is print(). For available arguments, see the help pages for summary.freqlist().

    mockstudy[, c("arm", "sex", "mdquality.s")] %>% 
      table(useNA = "ifany") %>% 
      freqlist(groupBy = c("arm", "sex")) %>% 
      write2word(
        paste0(tmpdir, "/test.freqlist.doc"), quiet = TRUE,
        single = FALSE,         # passed to summary.freqlist
        title = "My cool title" # passed to summary.freqlist
      )

    comparedf

    For comparedf objects, the output function in write2() is summary(). For summary.comparedf objects, the output function is print().

    Examples Using Other Objects

    knitr::kable()

    For objects resulting from a call to kable(), the output function in write2() is print(). There aren’t any arguments to the print.knitr_kable() function.

    mockstudy %>% 
      head() %>% 
      knitr::kable() %>% 
      write2html(paste0(tmpdir, "/test.kable.html"), quiet = TRUE)

    xtable::xtable()

    For xtable objects, the output function in write2() is print(). For available arguments, see the help pages for print.xtable().

    mockstudy %>% 
      head() %>% 
      xtable::xtable(caption = "My xtable") %>% 
      write2pdf(
        paste0(tmpdir, "/test.xtable.pdf"), quiet = TRUE,
        comment = FALSE, # passed to print.xtable to turn off the default message about xtable version
        include.rownames = FALSE, # passed to print.xtable
        caption.placement = "top" # passed to print.xtable
      )

    To make an HTML document, use the print.xtable() option type = "html".

    mockstudy %>% 
      head() %>% 
      xtable::xtable(caption = "My xtable") %>% 
      write2html(
        paste0(tmpdir, "/test.xtable.html"), quiet = TRUE,
        type = "html",            # passed to print.xtable
        comment = FALSE, # passed to print.xtable to turn off the default message about xtable version
        include.rownames = FALSE, # passed to print.xtable
        caption.placement = "top" # passed to print.xtable
      )

    User beware! xtable() is not compatible with write2word().

    pander::pander_return()

    Pander is a little bit more tricky. Since pander::pander() doesn’t return an object, the useR should instead use pander::pander_return(). For this (and for all character vectors), the the output function in write2() is cat(sep = '\n').

    write2word(pander::pander_return(head(mockstudy)), file = paste0(tmpdir, "/test.pander.doc"), quiet = TRUE)

    Output Multiple Tables to One Document

    To output multiple tables into a document, simply make a list of them and call the same function as before.

    mylist <- list(
      tableby(sex ~ age, data = mockstudy),
      freqlist(table(mockstudy[, c("sex", "arm")])),
      knitr::kable(head(mockstudy))
    )
    
    write2pdf(mylist, paste0(tmpdir, "/test.mylist.pdf"), quiet = TRUE)

    One neat side-effect of this function is that you can output text and headers, etc. The possibilities are endless!

    mylist2 <- list(
      "# Header 1",
      "This is a small paragraph introducing tableby.",
      tableby(sex ~ age, data = mockstudy),
      "<hr>",
      "# Header 2",
      "<font color='red'>I can change color of my text!</font>"
    )
    write2html(mylist2, paste0(tmpdir, "/test.mylist2.html"), quiet = TRUE)

    In fact, you can even recurse on the lists!

    write2pdf(list(mylist2, mylist), paste0(tmpdir, "/test.mylists.pdf"), quiet = TRUE)

    Output Other Objects Monospaced (as if in a terminal)

    It may be useful at times to write output that would normally be copied from the terminal. The default method for write2() does this automatically. To output the results of summary.lm(), for example:

    lm(age ~ sex, data = mockstudy) %>% 
      summary() %>% 
      write2pdf(paste0(tmpdir, "/test.lm.pdf"), quiet = TRUE)

    The verbatim() function is another option to explicitly alert write2() to do this. This becomes particularly helpful to overrule existing S3 methods.

    For example, suppose you wanted to just print a tableby object (as if it were to print in the terminal):

    tab4 <- tableby(arm ~ sex + age, data=mockstudy)
    write2html(verbatim(tab4), paste0(tmpdir, "/test.print.tableby.html"), quiet = TRUE)

    Or suppose you wanted to print a character vector (as if it were to print in the terminal):

    chr <- paste0("MyVector", 1:10)
    write2pdf(verbatim(chr), paste0(tmpdir, "/test.character.pdf"), quiet = TRUE)

    Note that you can combine multiple objects in one call:

    write2pdf(verbatim(tab4, chr), paste0(tmpdir, "/test.verbatim.pdf"), quiet = TRUE)

    Add a YAML Header to the Output

    You can add a YAML header to write2() output using the yaml() function.

    mylist3 <- list(
      yaml(title = "Test YAML Title", author = "My cool author name"),
      "# Header 1",
      "This is a small paragraph introducing tableby.",
      tableby(sex ~ age, data = mockstudy)
    )
    write2html(mylist3, paste0(tmpdir, "/test.yaml.html"), quiet = TRUE)

    In fact, all detected YAML pieces will be moved as the first output, so that the above code chunk gives the same output as this one:

    mylist4 <- list(
      "# Header 1",
      "This is a small paragraph introducing tableby.",
      yaml(title = "Test YAML Title"),
      tableby(sex ~ age, data = mockstudy),
      yaml(author = "My cool author name")
    )
    write2html(mylist4, paste0(tmpdir, "/test.yaml2.html"), quiet = TRUE)

    Add a Code Chunk to the Output

    It is now possible to add code chunks to the output .Rmd:

    mylist5 <- list(
      "# What is 1 + 2?",
      code.chunk(a <- 1, b <- 2),
      code.chunk(a + b, chunk.opts = "r echo=FALSE, eval=TRUE")
    )
    write2html(mylist5, paste0(tmpdir, "/test.code.chunk.html"), quiet = TRUE)

    This allow flexibility to create objects on-the-fly, to read in saved objects to the temporary .Rmd, etc. The possibilities are endless!

    FAQs

    How do I suppress the note about my document getting rendered?

    This is easily accomplished by using the argument quiet = TRUE (passed to the rmarkdown::render() function).

    write2html(
      knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.quiet.html"),
      quiet = TRUE # passed to rmarkdown::render
    )

    How do I look at the temporary .Rmd file?

    This is easily accomplished by using the option keep.rmd = TRUE.

    write2html(
      knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.keep.rmd.html"),
      quiet = TRUE, # passed to rmarkdown::render
      keep.rmd = TRUE
    )

    How do I prevent my document from being rendered?

    This is easily accomplished by using the option render. = FALSE. Note that this will then default to keep.rmd = TRUE.

    write2html(
      knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.dont.render.html"),
      render. = FALSE
    )

    How do I output headers, raw HTML/LaTeX, paragraphs, etc.?

    One can simply abuse the list S3 method for write2()!

    mylist2 <- list(
      "# Header 1",
      "This is a small paragraph introducing tableby.",
      tableby(sex ~ age, data = mockstudy),
      "<hr>",
      "# Header 2",
      "<font color='red'>I can change color of my text!</font>"
    )
    write2html(mylist2, paste0(tmpdir, "/test.mylist2.html"), quiet = TRUE)

    How do I tweak the default format from write2word(), write2html(), or write2pdf()?

    You can pass arguments to the format functions used behind the scenes.

    write2html(
      knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.theme.html"),
      quiet = TRUE,  # passed to rmarkdown::render
      theme = "yeti" # passed to rmarkdown::html_document
    )

    See the help pages for rmarkdown::word_document(), rmarkdown::html_document(), and rmarkdown::pdf_document().

    How do I output to a file format other than word, HTML, and PDF?

    This can be done using the generic write2() function. The last argument in the function can be another format specification. For details on the acceptable inputs, see the help page for write2().

    write2(
      knitr::kable(head(mockstudy[, 1:4])), paste0(tmpdir, "/test.kable.rtf"),
      quiet = TRUE,  # passed to rmarkdown::render
      output_format = rmarkdown::rtf_document
    )

    How do I avoid prefixes on my table captions in PDF?

    You can do this pretty easily with the yaml() function:

    mylist5 <- list(
      yaml("header-includes" = list("\\usepackage[labelformat=empty]{caption}")),
      "# Header 1",
      "This is a small paragraph introducing tableby.",
      tableby(sex ~ age, data = mockstudy)
    )
    write2pdf(mylist5, paste0(tmpdir, "/test.noprefixes.pdf"), title = "My tableby")

    How do I output multiple tables with different titles?

    There are now write2() methods for the summary objects of arsenal functions. This allows you to specify a title for each table:

    mylist6 <- list(
      summary(tableby(sex ~ age, data = mockstudy), title = "A Title for tableby"),
      summary(modelsum(age ~ sex, data = mockstudy), title = "A Title for modelsum"),
      summary(freqlist(~ sex, data = mockstudy), title = "A Title for freqlist")
    )
    write2pdf(mylist6, paste0(tmpdir, "/test.multiple.titles.pdf"))

    Why is write2() not working in R Markdown/R Studio?

    It’s possible that a global option in R Studio is preventing the tables from rendering. Consider turning off (i.e., unchecking) the option Tools > Global Options > R Markdown > Show output inline for all R Markdown documents.

    arsenal/inst/doc/comparedf.R0000644000176200001440000001777014056514631015525 0ustar liggesusers## ----include = FALSE-------------------------------------------------------------------------------------------------- knitr::opts_chunk$set(eval = TRUE, message = FALSE, results = 'asis', comment='') options(width = 120) ## ----results = 'asis'------------------------------------------------------------------------------------------------- library(arsenal) ## --------------------------------------------------------------------------------------------------------------------- df1 <- data.frame(id = paste0("person", 1:3), a = c("a", "b", "c"), b = c(1, 3, 4), c = c("f", "e", "d"), row.names = paste0("rn", 1:3), stringsAsFactors = FALSE) df2 <- data.frame(id = paste0("person", 3:1), a = c("c", "b", "a"), b = c(1, 3, 4), d = paste0("rn", 1:3), row.names = paste0("rn", c(1,3,2)), stringsAsFactors = FALSE) ## ----results='markup'------------------------------------------------------------------------------------------------- comparedf(df1, df2) ## --------------------------------------------------------------------------------------------------------------------- summary(comparedf(df1, df2)) ## --------------------------------------------------------------------------------------------------------------------- summary(comparedf(df1, df2, by = "id")) ## --------------------------------------------------------------------------------------------------------------------- data(mockstudy) mockstudy2 <- muck_up_mockstudy() ## --------------------------------------------------------------------------------------------------------------------- summary(comparedf(mockstudy, mockstudy2, by = "case")) ## ----eval = FALSE----------------------------------------------------------------------------------------------------- # summary(comparedf(mockstudy, mockstudy2, by = "case", control = comparedf.control(tol.vars = "case"))) ## --------------------------------------------------------------------------------------------------------------------- summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = "case")) ## --------------------------------------------------------------------------------------------------------------------- summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case") # dots=underscores=spaces, ignore case )) ## --------------------------------------------------------------------------------------------------------------------- summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c(arm = "Arm", fu.stat = "fu stat", fu.time = "fu_time") )) ## --------------------------------------------------------------------------------------------------------------------- summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE # compare integers and numerics )) ## --------------------------------------------------------------------------------------------------------------------- summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10 # allow absolute differences <= 10 )) ## --------------------------------------------------------------------------------------------------------------------- summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10, # allow absolute differences <= 10 tol.factor = "labels" # match only factor labels )) ## --------------------------------------------------------------------------------------------------------------------- summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10, # allow absolute differences <= 10 tol.factor = "labels", # match only factor labels factor.as.char = TRUE # compare factors and characters )) ## --------------------------------------------------------------------------------------------------------------------- summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10, # allow absolute differences <= 10 tol.factor = "labels", # match only factor labels factor.as.char = TRUE, # compare factors and characters tol.char = "case" # ignore case in character vectors )) ## ----eval=FALSE------------------------------------------------------------------------------------------------------- # comparedf.control(tol.char = list( # "none", # the default # x1 = "case", # be case-insensitive for the variable "x1" # x2 = function(x, y) tol.NA(x, y, x != y | y == "NA") # a custom-defined tolerance # )) ## ----results = 'markup'----------------------------------------------------------------------------------------------- my.tol <- function(x, y, tol) { tol.NA(x, y, x > y) } date.df1 <- data.frame(dt = as.Date(c("2017-09-07", "2017-08-08", "2017-07-09", NA))) date.df2 <- data.frame(dt = as.Date(c("2017-10-01", "2017-08-08", "2017-07-10", "2017-01-01"))) n.diffs(comparedf(date.df1, date.df2)) # default finds any differences n.diffs(comparedf(date.df1, date.df2, tol.date = my.tol)) # our function identifies only the NA as different... n.diffs(comparedf(date.df2, date.df1, tol.date = my.tol)) # ... until we change the argument order ## --------------------------------------------------------------------------------------------------------------------- tol.minus9 <- function(x, y, tol) { idx1 <- is.na(x) & !is.na(y) & y == -9 idx2 <- tol.num.absolute(x, y, tol) # find other absolute differences return(!idx1 & idx2) } summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case int.as.num = TRUE, # compare integers and numerics tol.num.val = 10, # allow absolute differences <= 10 tol.factor = "labels", # match only factor labels factor.as.char = TRUE, # compare factors and characters tol.char = "case", # ignore case in character vectors tol.num = tol.minus9 # ignore NA -> -9 changes )) ## ----results = 'markup'----------------------------------------------------------------------------------------------- cmp <- comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), int.as.num = TRUE) n.diffs(cmp) head(diffs(cmp)) ## ----results = 'markup'----------------------------------------------------------------------------------------------- diffs(cmp, by.var = TRUE) ## ----results = 'markup'----------------------------------------------------------------------------------------------- diffs(cmp, vars = c("ps", "ast"), by.var = TRUE) diffs(cmp, vars = c("ps", "ast")) ## --------------------------------------------------------------------------------------------------------------------- obj <- comparedf(mockstudy, mockstudy2, by = "case") ## ----results='markup'------------------------------------------------------------------------------------------------- print(obj$frame.summary) ## ----results='markup'------------------------------------------------------------------------------------------------- print(obj$vars.summary) arsenal/inst/doc/write2.Rmd0000644000176200001440000003121313741603743015311 0ustar liggesusers--- title: "The write2 function" author: "Ethan Heinzen" output: rmarkdown::html_vignette: toc: yes toc_depth: 3 vignette: | %\VignetteIndexEntry{The write2 function} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r include = FALSE} knitr::opts_chunk$set(eval = FALSE, message = FALSE) ``` # Introduction The `write2*()` functions were designed as an alternative to SAS's `ODS` procedure for useRs who want to save R Markdown tables to separate Word, HTML, or PDF files without needing separate R Markdown programs. There are three shortcut functions for the most common output types: HTML, PDF, and Word. Each of these three functions calls `write2()`, an S3 function which accepts many file output types (see the help pages for `rmarkdown::render()`). Methods have been implemented for `tableby()`, `modelsum()`, and `freqlist()`, but also `knitr::kable()`, `xtable::xtable()`, and `pander::pander_return()`. The two most important things to recognize with `write2()` are the following: 1. Which function is being used to output the object. Sometimes the `write2` functions use `summary()`, while other times they will use `print()`. The details for each object specifically are described below. 2. How the `...` arguments are passed. To change the options for the summary-like or print-like function, you can pass named arguments which will in turn get passed to the appropriate function. Details for each object specifically are described below. # A note on piping `arsenal` is piping-compatible! The `write2*()` functions are probably the most useful place to take advantage of the `magrittr` package's piping framework, since commands are often nested several functions deep in the context of `write2*()`. Piping also allows the `arsenal` package to become a part of more standard analysis pipelines; instead of needing to write separate R Markdown programs, intermediate analysis tables and output can be easily incorporated into piped statements. This vignette will sprinkle the foward pipe (`%>%`) throughout as a hint at the power and flexibility of `arsenal` and piping. # Examples Using `arsenal` Objects ```{r} library(arsenal) library(magrittr) data(mockstudy) tmpdir <- tempdir() ``` ## `tableby` For `tableby` objects, the output function in `write2()` is `summary()`. For `summary.tableby` objects, the output function is `print()`. For available arguments, see the help pages for `summary.tableby()`. Don't use the option `text = TRUE` with the `write2` functions. ```{r} mylabels <- list(sex = "SEX", age ="Age, yrs") tab1 <- tableby(arm ~ sex + age, data=mockstudy) write2html( tab1, paste0(tmpdir, "/test.tableby.html"), quiet = TRUE, title = "My test table", # passed to summary.tableby labelTranslations = mylabels, # passed to summary.tableby total = FALSE # passed to summary.tableby ) ``` ## `modelsum` For `modelsum` objects, the output function in `write2()` is `summary()`. For `summary.modelsum` objects, the output function is `print()`. For available arguments, see the help pages for `summary.modelsum()`. Don't use the option `text = TRUE` with the `write2` functions. ```{r} tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~ age + sex, family = "gaussian", data = mockstudy) write2pdf( tab2, paste0(tmpdir, "/test.modelsum.pdf"), quiet = TRUE, title = "My test table", # passed to summary.modelsum show.intercept = FALSE, # passed to summary.modelsum digits = 5 # passed to summary.modelsum ) ``` ## `freqlist` For `freqlist` objects, the output function in `write2()` is `summary()`. For `summary.freqlist` objects, the output function is `print()`. For available arguments, see the help pages for `summary.freqlist()`. ```{r} mockstudy[, c("arm", "sex", "mdquality.s")] %>% table(useNA = "ifany") %>% freqlist(groupBy = c("arm", "sex")) %>% write2word( paste0(tmpdir, "/test.freqlist.doc"), quiet = TRUE, single = FALSE, # passed to summary.freqlist title = "My cool title" # passed to summary.freqlist ) ``` ## `comparedf` For `comparedf` objects, the output function in `write2()` is `summary()`. For `summary.comparedf` objects, the output function is `print()`. # Examples Using Other Objects ## `knitr::kable()` For objects resulting from a call to `kable()`, the output function in `write2()` is `print()`. There aren't any arguments to the `print.knitr_kable()` function. ```{r} mockstudy %>% head() %>% knitr::kable() %>% write2html(paste0(tmpdir, "/test.kable.html"), quiet = TRUE) ``` ## `xtable::xtable()` For `xtable` objects, the output function in `write2()` is `print()`. For available arguments, see the help pages for `print.xtable()`. ```{r} mockstudy %>% head() %>% xtable::xtable(caption = "My xtable") %>% write2pdf( paste0(tmpdir, "/test.xtable.pdf"), quiet = TRUE, comment = FALSE, # passed to print.xtable to turn off the default message about xtable version include.rownames = FALSE, # passed to print.xtable caption.placement = "top" # passed to print.xtable ) ``` To make an HTML document, use the `print.xtable()` option `type = "html"`. ```{r} mockstudy %>% head() %>% xtable::xtable(caption = "My xtable") %>% write2html( paste0(tmpdir, "/test.xtable.html"), quiet = TRUE, type = "html", # passed to print.xtable comment = FALSE, # passed to print.xtable to turn off the default message about xtable version include.rownames = FALSE, # passed to print.xtable caption.placement = "top" # passed to print.xtable ) ``` User beware! `xtable()` is not compatible with `write2word()`. ## `pander::pander_return()` Pander is a little bit more tricky. Since `pander::pander()` doesn't return an object, the useR should instead use `pander::pander_return()`. For this (and for all character vectors), the the output function in `write2()` is `cat(sep = '\n')`. ```{r} write2word(pander::pander_return(head(mockstudy)), file = paste0(tmpdir, "/test.pander.doc"), quiet = TRUE) ``` # Output Multiple Tables to One Document To output multiple tables into a document, simply make a list of them and call the same function as before. ```{r} mylist <- list( tableby(sex ~ age, data = mockstudy), freqlist(table(mockstudy[, c("sex", "arm")])), knitr::kable(head(mockstudy)) ) write2pdf(mylist, paste0(tmpdir, "/test.mylist.pdf"), quiet = TRUE) ``` One neat side-effect of this function is that you can output text and headers, etc. The possibilities are endless! ```{r} mylist2 <- list( "# Header 1", "This is a small paragraph introducing tableby.", tableby(sex ~ age, data = mockstudy), "
    ", "# Header 2", "I can change color of my text!" ) write2html(mylist2, paste0(tmpdir, "/test.mylist2.html"), quiet = TRUE) ``` In fact, you can even recurse on the lists! ```{r} write2pdf(list(mylist2, mylist), paste0(tmpdir, "/test.mylists.pdf"), quiet = TRUE) ``` # Output Other Objects Monospaced (as if in a terminal) It may be useful at times to write output that would normally be copied from the terminal. The default method for `write2()` does this automatically. To output the results of `summary.lm()`, for example: ```{r} lm(age ~ sex, data = mockstudy) %>% summary() %>% write2pdf(paste0(tmpdir, "/test.lm.pdf"), quiet = TRUE) ``` The `verbatim()` function is another option to explicitly alert `write2()` to do this. This becomes particularly helpful to overrule existing S3 methods. For example, suppose you wanted to just print a tableby object (as if it were to print in the terminal): ```{r} tab4 <- tableby(arm ~ sex + age, data=mockstudy) write2html(verbatim(tab4), paste0(tmpdir, "/test.print.tableby.html"), quiet = TRUE) ``` Or suppose you wanted to print a character vector (as if it were to print in the terminal): ```{r} chr <- paste0("MyVector", 1:10) write2pdf(verbatim(chr), paste0(tmpdir, "/test.character.pdf"), quiet = TRUE) ``` Note that you can combine multiple objects in one call: ```{r} write2pdf(verbatim(tab4, chr), paste0(tmpdir, "/test.verbatim.pdf"), quiet = TRUE) ``` # Add a YAML Header to the Output You can add a YAML header to `write2()` output using the `yaml()` function. ```{r} mylist3 <- list( yaml(title = "Test YAML Title", author = "My cool author name"), "# Header 1", "This is a small paragraph introducing tableby.", tableby(sex ~ age, data = mockstudy) ) write2html(mylist3, paste0(tmpdir, "/test.yaml.html"), quiet = TRUE) ``` In fact, all detected YAML pieces will be moved as the first output, so that the above code chunk gives the same output as this one: ```{r} mylist4 <- list( "# Header 1", "This is a small paragraph introducing tableby.", yaml(title = "Test YAML Title"), tableby(sex ~ age, data = mockstudy), yaml(author = "My cool author name") ) write2html(mylist4, paste0(tmpdir, "/test.yaml2.html"), quiet = TRUE) ``` # Add a Code Chunk to the Output It is now possible to add code chunks to the output `.Rmd`: ```{r} mylist5 <- list( "# What is 1 + 2?", code.chunk(a <- 1, b <- 2), code.chunk(a + b, chunk.opts = "r echo=FALSE, eval=TRUE") ) write2html(mylist5, paste0(tmpdir, "/test.code.chunk.html"), quiet = TRUE) ``` This allow flexibility to create objects on-the-fly, to read in saved objects to the temporary `.Rmd`, etc. The possibilities are endless! # FAQs ## How do I suppress the note about my document getting rendered? This is easily accomplished by using the argument `quiet = TRUE` (passed to the `rmarkdown::render()` function). ```{r} write2html( knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.quiet.html"), quiet = TRUE # passed to rmarkdown::render ) ``` ## How do I look at the temporary `.Rmd` file? This is easily accomplished by using the option `keep.rmd = TRUE`. ```{r} write2html( knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.keep.rmd.html"), quiet = TRUE, # passed to rmarkdown::render keep.rmd = TRUE ) ``` ## How do I prevent my document from being rendered? This is easily accomplished by using the option `render. = FALSE`. Note that this will then default to `keep.rmd = TRUE`. ```{r} write2html( knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.dont.render.html"), render. = FALSE ) ``` ## How do I output headers, raw HTML/LaTeX, paragraphs, etc.? One can simply abuse the list S3 method for `write2()`! ```{r} mylist2 <- list( "# Header 1", "This is a small paragraph introducing tableby.", tableby(sex ~ age, data = mockstudy), "
    ", "# Header 2", "I can change color of my text!" ) write2html(mylist2, paste0(tmpdir, "/test.mylist2.html"), quiet = TRUE) ``` ## How do I tweak the default format from `write2word()`, `write2html()`, or `write2pdf()`? You can pass arguments to the format functions used behind the scenes. ```{r} write2html( knitr::kable(head(mockstudy)), paste0(tmpdir, "/test.kable.theme.html"), quiet = TRUE, # passed to rmarkdown::render theme = "yeti" # passed to rmarkdown::html_document ) ``` See the help pages for `rmarkdown::word_document()`, `rmarkdown::html_document()`, and `rmarkdown::pdf_document()`. ## How do I output to a file format other than word, HTML, and PDF? This can be done using the generic `write2()` function. The last argument in the function can be another format specification. For details on the acceptable inputs, see the help page for `write2()`. ```{r} write2( knitr::kable(head(mockstudy[, 1:4])), paste0(tmpdir, "/test.kable.rtf"), quiet = TRUE, # passed to rmarkdown::render output_format = rmarkdown::rtf_document ) ``` ## How do I avoid prefixes on my table captions in PDF? You can do this pretty easily with the `yaml()` function: ```{r} mylist5 <- list( yaml("header-includes" = list("\\usepackage[labelformat=empty]{caption}")), "# Header 1", "This is a small paragraph introducing tableby.", tableby(sex ~ age, data = mockstudy) ) write2pdf(mylist5, paste0(tmpdir, "/test.noprefixes.pdf"), title = "My tableby") ``` ## How do I output multiple tables with different titles? There are now `write2()` methods for the summary objects of `arsenal` functions. This allows you to specify a title for each table: ```{r} mylist6 <- list( summary(tableby(sex ~ age, data = mockstudy), title = "A Title for tableby"), summary(modelsum(age ~ sex, data = mockstudy), title = "A Title for modelsum"), summary(freqlist(~ sex, data = mockstudy), title = "A Title for freqlist") ) write2pdf(mylist6, paste0(tmpdir, "/test.multiple.titles.pdf")) ``` ## Why is `write2()` not working in R Markdown/R Studio? It's possible that a global option in R Studio is preventing the tables from rendering. Consider turning off (i.e., unchecking) the option Tools > Global Options > R Markdown > Show output inline for all R Markdown documents. arsenal/inst/doc/comparedf.html0000644000176200001440000050326214056514632016265 0ustar liggesusers The comparedf function

    The comparedf function

    Ethan Heinzen, Ryan Lennon, Andrew Hanson

    Introduction

    The comparedf() function can be used to determine and report differences between two data.frames. It was written in the spirit of replacing PROC COMPARE from SAS.

    library(arsenal)

    Why “comparedf”? We originally called this function compare.data.frame(), using testthat::compare() as our S3 generic, but that ended up getting us in trouble because of conflicting object structures. Why this didn’t occur to us at the time remains a mystery. To replace it, we brainstormed several ideas (comparedf(), dfcompare(), collate(), comparison()) but settled on the former for three reasons:

    1. There were no other objects with that generic or class (see testthat::compare() and compare::compare()).

    2. It is mnemonically easy to remember (we “compare data.frames”, not “data.frames compare”).

    3. It tab auto-completes from the original “compare”.

    Basic examples

    We first build two similar data.frames to compare.

    df1 <- data.frame(id = paste0("person", 1:3),
                      a = c("a", "b", "c"),
                      b = c(1, 3, 4),
                      c = c("f", "e", "d"),
                      row.names = paste0("rn", 1:3),
                      stringsAsFactors = FALSE)
    df2 <- data.frame(id = paste0("person", 3:1),
                      a = c("c", "b", "a"),
                      b = c(1, 3, 4),
                      d = paste0("rn", 1:3),
                      row.names = paste0("rn", c(1,3,2)),
                      stringsAsFactors = FALSE)

    To compare these datasets, simply pass them to the comparedf() function:

    comparedf(df1, df2)
    Compare Object
    
    Function Call: 
    comparedf(x = df1, y = df2)
    
    Shared: 3 non-by variables and 3 observations.
    Not shared: 2 variables and 0 observations.
    
    Differences found in 2/3 variables compared.
    0 variables compared have non-identical attributes.

    Use summary() to get a more detailed summary

    summary(comparedf(df1, df2))
    Summary of data.frames
    version arg ncol nrow
    x df1 4 3
    y df2 4 3
    Summary of overall comparison
    statistic value
    Number of by-variables 0
    Number of non-by variables in common 3
    Number of variables compared 3
    Number of variables in x but not y 1
    Number of variables in y but not x 1
    Number of variables compared with some values unequal 2
    Number of variables compared with all values equal 1
    Number of observations in common 3
    Number of observations in x but not y 0
    Number of observations in y but not x 0
    Number of observations with some compared variables unequal 2
    Number of observations with all compared variables equal 1
    Number of values unequal 4
    Variables not shared
    version variable position class
    x c 4 character
    y d 4 character
    Other variables not compared
    No other variables not compared
    Observations not shared
    No observations not shared
    Differences detected by variable
    var.x var.y n NAs
    id id 2 0
    a a 2 0
    b b 0 0
    Differences detected
    var.x var.y ..row.names.. values.x values.y row.x row.y
    id id 1 person1 person3 1 1
    id id 3 person3 person1 3 3
    a a 1 a c 1 1
    a a 3 c a 3 3
    Non-identical attributes
    No non-identical attributes

    By default, the datasets are compared row-by-row. To change this, use the by= or by.x= and by.y= arguments:

    summary(comparedf(df1, df2, by = "id"))
    Summary of data.frames
    version arg ncol nrow
    x df1 4 3
    y df2 4 3
    Summary of overall comparison
    statistic value
    Number of by-variables 1
    Number of non-by variables in common 2
    Number of variables compared 2
    Number of variables in x but not y 1
    Number of variables in y but not x 1
    Number of variables compared with some values unequal 1
    Number of variables compared with all values equal 1
    Number of observations in common 3
    Number of observations in x but not y 0
    Number of observations in y but not x 0
    Number of observations with some compared variables unequal 2
    Number of observations with all compared variables equal 1
    Number of values unequal 2
    Variables not shared
    version variable position class
    x c 4 character
    y d 4 character
    Other variables not compared
    No other variables not compared
    Observations not shared
    No observations not shared
    Differences detected by variable
    var.x var.y n NAs
    a a 0 0
    b b 2 0
    Differences detected
    var.x var.y id values.x values.y row.x row.y
    b b person1 1 4 1 3
    b b person3 4 1 3 1
    Non-identical attributes
    No non-identical attributes

    A larger example

    Let’s muck up the mockstudy data.

    data(mockstudy)
    mockstudy2 <- muck_up_mockstudy()

    We’ve changed row order, so let’s compare by the case ID:

    summary(comparedf(mockstudy, mockstudy2, by = "case"))
    Summary of data.frames
    version arg ncol nrow
    x mockstudy 14 1499
    y mockstudy2 13 1495
    Summary of overall comparison
    statistic value
    Number of by-variables 1
    Number of non-by variables in common 9
    Number of variables compared 7
    Number of variables in x but not y 4
    Number of variables in y but not x 3
    Number of variables compared with some values unequal 3
    Number of variables compared with all values equal 4
    Number of observations in common 1495
    Number of observations in x but not y 4
    Number of observations in y but not x 0
    Number of observations with some compared variables unequal 1495
    Number of observations with all compared variables equal 0
    Number of values unequal 1762
    Variables not shared
    version variable position class
    x age 2 integer
    x arm 3 character
    x fu.time 6 integer
    x fu.stat 7 integer
    y fu_time 11 integer
    y fu stat 12 integer
    y Arm 13 character
    Other variables not compared
    var.x pos.x class.x var.y pos.y class.y
    race 5 character race 3 factor
    ast 12 integer ast 8 numeric
    Observations not shared
    version case observation
    x 88989 9
    x 90158 8
    x 99508 7
    x 112263 5
    Differences detected by variable
    var.x var.y n NAs
    sex sex 1495 0
    ps ps 1 1
    hgb hgb 266 266
    bmi bmi 0 0
    alk.phos alk.phos 0 0
    mdquality.s mdquality.s 0 0
    age.ord age.ord 0 0
    Differences detected (1741 not shown)
    var.x var.y case values.x values.y row.x row.y
    sex sex 76170 Male Male 26 20
    sex sex 76240 Male Male 27 21
    sex sex 76431 Female Female 28 22
    sex sex 76712 Male Male 29 23
    sex sex 76780 Female Female 30 24
    sex sex 77066 Female Female 31 25
    sex sex 77316 Male Male 32 26
    sex sex 77355 Male Male 33 27
    sex sex 77591 Male Male 34 28
    sex sex 77851 Male Male 35 29
    ps ps 86205 0 NA 6 3
    hgb hgb 88714 NA -9 192 186
    hgb hgb 88955 NA -9 204 198
    hgb hgb 89549 NA -9 229 223
    hgb hgb 89563 NA -9 231 225
    hgb hgb 89584 NA -9 237 231
    hgb hgb 89591 NA -9 238 232
    hgb hgb 89595 NA -9 239 233
    hgb hgb 89647 NA -9 243 237
    hgb hgb 89665 NA -9 244 238
    hgb hgb 89827 NA -9 255 249
    Non-identical attributes
    var.x var.y name
    sex sex label
    sex sex levels
    race race class
    race race label
    race race levels
    bmi bmi label

    Column name comparison options

    It is possible to change which column names are considered “the same variable”.

    Ignoring case

    For example, to ignore case in variable names (so that Arm and arm are considered the same), pass tol.vars = "case".

    You can do this using comparedf.control()

    summary(comparedf(mockstudy, mockstudy2, by = "case", control = comparedf.control(tol.vars = "case")))

    or pass it through the ... arguments.

    summary(comparedf(mockstudy, mockstudy2, by = "case", tol.vars = "case"))
    Summary of data.frames
    version arg ncol nrow
    x mockstudy 14 1499
    y mockstudy2 13 1495
    Summary of overall comparison
    statistic value
    Number of by-variables 1
    Number of non-by variables in common 10
    Number of variables compared 8
    Number of variables in x but not y 3
    Number of variables in y but not x 2
    Number of variables compared with some values unequal 3
    Number of variables compared with all values equal 5
    Number of observations in common 1495
    Number of observations in x but not y 4
    Number of observations in y but not x 0
    Number of observations with some compared variables unequal 1495
    Number of observations with all compared variables equal 0
    Number of values unequal 1762
    Variables not shared
    version variable position class
    x age 2 integer
    x fu.time 6 integer
    x fu.stat 7 integer
    y fu_time 11 integer
    y fu stat 12 integer
    Other variables not compared
    var.x pos.x class.x var.y pos.y class.y
    race 5 character race 3 factor
    ast 12 integer ast 8 numeric
    Observations not shared
    version case observation
    x 88989 9
    x 90158 8
    x 99508 7
    x 112263 5
    Differences detected by variable
    var.x var.y n NAs
    arm Arm 0 0
    sex sex 1495 0
    ps ps 1 1
    hgb hgb 266 266
    bmi bmi 0 0
    alk.phos alk.phos 0 0
    mdquality.s mdquality.s 0 0
    age.ord age.ord 0 0
    Differences detected (1741 not shown)
    var.x var.y case values.x values.y row.x row.y
    sex sex 76170 Male Male 26 20
    sex sex 76240 Male Male 27 21
    sex sex 76431 Female Female 28 22
    sex sex 76712 Male Male 29 23
    sex sex 76780 Female Female 30 24
    sex sex 77066 Female Female 31 25
    sex sex 77316 Male Male 32 26
    sex sex 77355 Male Male 33 27
    sex sex 77591 Male Male 34 28
    sex sex 77851 Male Male 35 29
    ps ps 86205 0 NA 6 3
    hgb hgb 88714 NA -9 192 186
    hgb hgb 88955 NA -9 204 198
    hgb hgb 89549 NA -9 229 223
    hgb hgb 89563 NA -9 231 225
    hgb hgb 89584 NA -9 237 231
    hgb hgb 89591 NA -9 238 232
    hgb hgb 89595 NA -9 239 233
    hgb hgb 89647 NA -9 243 237
    hgb hgb 89665 NA -9 244 238
    hgb hgb 89827 NA -9 255 249
    Non-identical attributes
    var.x var.y name
    arm Arm label
    sex sex label
    sex sex levels
    race race class
    race race label
    race race levels
    bmi bmi label

    Treating dots and underscores the same (equivalence classes)

    It is possible to treat certain characters or sets of characters as the same by passing a character vector of equivalence classes to the tol.vars= argument.

    In short, each string in the vector is split into single characters, and the resulting set of characters is replaced by the first character in the string. For example, passing c("._") would replace all underscores with dots in the column names of both datasets. Similarly, passing c("aA", "BbCc") would replace all instances of "A" with "a" and all instances of "b", "C", or "c" with "B". This is one way to ignore case for certain letters. Otherwise, it’s possible to combine the equivalence classes with ignoring case, by passing (e.g.) c("._", "case").

    Passing a single character as an element this vector will replace that character with the empty string. For example, passing c(" “,”.") would remove all spaces and dots from the column names.

    For mockstudy, let’s treat dots, underscores, and spaces as the same, and ignore case:

    summary(comparedf(mockstudy, mockstudy2, by = "case",
                    tol.vars = c("._ ", "case") # dots=underscores=spaces, ignore case
    ))
    Summary of data.frames
    version arg ncol nrow
    x mockstudy 14 1499
    y mockstudy2 13 1495
    Summary of overall comparison
    statistic value
    Number of by-variables 1
    Number of non-by variables in common 12
    Number of variables compared 10
    Number of variables in x but not y 1
    Number of variables in y but not x 0
    Number of variables compared with some values unequal 3
    Number of variables compared with all values equal 7
    Number of observations in common 1495
    Number of observations in x but not y 4
    Number of observations in y but not x 0
    Number of observations with some compared variables unequal 1495
    Number of observations with all compared variables equal 0
    Number of values unequal 1762
    Variables not shared
    version variable position class
    x age 2 integer
    Other variables not compared
    var.x pos.x class.x var.y pos.y class.y
    race 5 character race 3 factor
    ast 12 integer ast 8 numeric
    Observations not shared
    version case observation
    x 88989 9
    x 90158 8
    x 99508 7
    x 112263 5
    Differences detected by variable
    var.x var.y n NAs
    arm Arm 0 0
    sex sex 1495 0
    fu.time fu_time 0 0
    fu.stat fu stat 0 0
    ps ps 1 1
    hgb hgb 266 266
    bmi bmi 0 0
    alk.phos alk.phos 0 0
    mdquality.s mdquality.s 0 0
    age.ord age.ord 0 0
    Differences detected (1741 not shown)
    var.x var.y case values.x values.y row.x row.y
    sex sex 76170 Male Male 26 20
    sex sex 76240 Male Male 27 21
    sex sex 76431 Female Female 28 22
    sex sex 76712 Male Male 29 23
    sex sex 76780 Female Female 30 24
    sex sex 77066 Female Female 31 25
    sex sex 77316 Male Male 32 26
    sex sex 77355 Male Male 33 27
    sex sex 77591 Male Male 34 28
    sex sex 77851 Male Male 35 29
    ps ps 86205 0 NA 6 3
    hgb hgb 88714 NA -9 192 186
    hgb hgb 88955 NA -9 204 198
    hgb hgb 89549 NA -9 229 223
    hgb hgb 89563 NA -9 231 225
    hgb hgb 89584 NA -9 237 231
    hgb hgb 89591 NA -9 238 232
    hgb hgb 89595 NA -9 239 233
    hgb hgb 89647 NA -9 243 237
    hgb hgb 89665 NA -9 244 238
    hgb hgb 89827 NA -9 255 249
    Non-identical attributes
    var.x var.y name
    arm Arm label
    sex sex label
    sex sex levels
    race race class
    race race label
    race race levels
    bmi bmi label

    Manually specifying columns to match together

    If you pass a named vector to the tol.vars= argument, comparedf() will line up the names of that vector to the column names of x and the values of that vector to the column names of y. In this way, you can manually specify which non-identically-named columns to compare.

    For mockstudy, let’s specify our variables manually in this way:

    summary(comparedf(mockstudy, mockstudy2, by = "case",
                    tol.vars = c(arm = "Arm", fu.stat = "fu stat", fu.time = "fu_time")
    ))
    Summary of data.frames
    version arg ncol nrow
    x mockstudy 14 1499
    y mockstudy2 13 1495
    Summary of overall comparison
    statistic value
    Number of by-variables 1
    Number of non-by variables in common 12
    Number of variables compared 10
    Number of variables in x but not y 1
    Number of variables in y but not x 0
    Number of variables compared with some values unequal 3
    Number of variables compared with all values equal 7
    Number of observations in common 1495
    Number of observations in x but not y 4
    Number of observations in y but not x 0
    Number of observations with some compared variables unequal 1495
    Number of observations with all compared variables equal 0
    Number of values unequal 1762
    Variables not shared
    version variable position class
    x age 2 integer
    Other variables not compared
    var.x pos.x class.x var.y pos.y class.y
    race 5 character race 3 factor
    ast 12 integer ast 8 numeric
    Observations not shared
    version case observation
    x 88989 9
    x 90158 8
    x 99508 7
    x 112263 5
    Differences detected by variable
    var.x var.y n NAs
    arm Arm 0 0
    sex sex 1495 0
    fu.time fu_time 0 0
    fu.stat fu stat 0 0
    ps ps 1 1
    hgb hgb 266 266
    bmi bmi 0 0
    alk.phos alk.phos 0 0
    mdquality.s mdquality.s 0 0
    age.ord age.ord 0 0
    Differences detected (1741 not shown)
    var.x var.y case values.x values.y row.x row.y
    sex sex 76170 Male Male 26 20
    sex sex 76240 Male Male 27 21
    sex sex 76431 Female Female 28 22
    sex sex 76712 Male Male 29 23
    sex sex 76780 Female Female 30 24
    sex sex 77066 Female Female 31 25
    sex sex 77316 Male Male 32 26
    sex sex 77355 Male Male 33 27
    sex sex 77591 Male Male 34 28
    sex sex 77851 Male Male 35 29
    ps ps 86205 0 NA 6 3
    hgb hgb 88714 NA -9 192 186
    hgb hgb 88955 NA -9 204 198
    hgb hgb 89549 NA -9 229 223
    hgb hgb 89563 NA -9 231 225
    hgb hgb 89584 NA -9 237 231
    hgb hgb 89591 NA -9 238 232
    hgb hgb 89595 NA -9 239 233
    hgb hgb 89647 NA -9 243 237
    hgb hgb 89665 NA -9 244 238
    hgb hgb 89827 NA -9 255 249
    Non-identical attributes
    var.x var.y name
    arm Arm label
    sex sex label
    sex sex levels
    race race class
    race race label
    race race levels
    bmi bmi label

    Column comparison options

    Logical tolerance

    Use the tol.logical= argument to change how logicals are compared. By default, they’re expected to be equal to each other.

    Numeric tolerance

    To allow numeric differences of a certain tolerance, use the tol.num= and tol.num.val= options. tol.num.val= determines the maximum (unsigned) difference tolerated if tol.num="absolute" (default), and determines the maximum (unsigned) percent difference tolerated if tol.num="percent".

    Also note the option int.as.num=, which determines whether integers and numerics should be compared despite their class difference. If TRUE, the integers are coerced to numeric. Note that mockstudy$ast is integer, while mockstudy2$ast is numeric:

    summary(comparedf(mockstudy, mockstudy2, by = "case",
                    tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case
                    int.as.num = TRUE            # compare integers and numerics
    ))
    Summary of data.frames
    version arg ncol nrow
    x mockstudy 14 1499
    y mockstudy2 13 1495
    Summary of overall comparison
    statistic value
    Number of by-variables 1
    Number of non-by variables in common 12
    Number of variables compared 11
    Number of variables in x but not y 1
    Number of variables in y but not x 0
    Number of variables compared with some values unequal 4
    Number of variables compared with all values equal 7
    Number of observations in common 1495
    Number of observations in x but not y 4
    Number of observations in y but not x 0
    Number of observations with some compared variables unequal 1495
    Number of observations with all compared variables equal 0
    Number of values unequal 1765
    Variables not shared
    version variable position class
    x age 2 integer
    Other variables not compared
    var.x pos.x class.x var.y pos.y class.y
    race 5 character race 3 factor
    Observations not shared
    version case observation
    x 88989 9
    x 90158 8
    x 99508 7
    x 112263 5
    Differences detected by variable
    var.x var.y n NAs
    arm Arm 0 0
    sex sex 1495 0
    fu.time fu_time 0 0
    fu.stat fu stat 0 0
    ps ps 1 1
    hgb hgb 266 266
    bmi bmi 0 0
    alk.phos alk.phos 0 0
    ast ast 3 0
    mdquality.s mdquality.s 0 0
    age.ord age.ord 0 0
    Differences detected (1741 not shown)
    var.x var.y case values.x values.y row.x row.y
    sex sex 76170 Male Male 26 20
    sex sex 76240 Male Male 27 21
    sex sex 76431 Female Female 28 22
    sex sex 76712 Male Male 29 23
    sex sex 76780 Female Female 30 24
    sex sex 77066 Female Female 31 25
    sex sex 77316 Male Male 32 26
    sex sex 77355 Male Male 33 27
    sex sex 77591 Male Male 34 28
    sex sex 77851 Male Male 35 29
    ps ps 86205 0 NA 6 3
    hgb hgb 88714 NA -9 192 186
    hgb hgb 88955 NA -9 204 198
    hgb hgb 89549 NA -9 229 223
    hgb hgb 89563 NA -9 231 225
    hgb hgb 89584 NA -9 237 231
    hgb hgb 89591 NA -9 238 232
    hgb hgb 89595 NA -9 239 233
    hgb hgb 89647 NA -9 243 237
    hgb hgb 89665 NA -9 244 238
    hgb hgb 89827 NA -9 255 249
    ast ast 86205 27 36 6 3
    ast ast 105271 100 36 3 2
    ast ast 110754 35 36 1 1
    Non-identical attributes
    var.x var.y name
    arm Arm label
    sex sex label
    sex sex levels
    race race class
    race race label
    race race levels
    bmi bmi label

    Suppose a tolerance of up to 10 is allowed for ast:

    summary(comparedf(mockstudy, mockstudy2, by = "case",
                    tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case
                    int.as.num = TRUE,           # compare integers and numerics
                    tol.num.val = 10             # allow absolute differences <= 10
    ))
    Summary of data.frames
    version arg ncol nrow
    x mockstudy 14 1499
    y mockstudy2 13 1495
    Summary of overall comparison
    statistic value
    Number of by-variables 1
    Number of non-by variables in common 12
    Number of variables compared 11
    Number of variables in x but not y 1
    Number of variables in y but not x 0
    Number of variables compared with some values unequal 4
    Number of variables compared with all values equal 7
    Number of observations in common 1495
    Number of observations in x but not y 4
    Number of observations in y but not x 0
    Number of observations with some compared variables unequal 1495
    Number of observations with all compared variables equal 0
    Number of values unequal 1763
    Variables not shared
    version variable position class
    x age 2 integer
    Other variables not compared
    var.x pos.x class.x var.y pos.y class.y
    race 5 character race 3 factor
    Observations not shared
    version case observation
    x 88989 9
    x 90158 8
    x 99508 7
    x 112263 5
    Differences detected by variable
    var.x var.y n NAs
    arm Arm 0 0
    sex sex 1495 0
    fu.time fu_time 0 0
    fu.stat fu stat 0 0
    ps ps 1 1
    hgb hgb 266 266
    bmi bmi 0 0
    alk.phos alk.phos 0 0
    ast ast 1 0
    mdquality.s mdquality.s 0 0
    age.ord age.ord 0 0
    Differences detected (1741 not shown)
    var.x var.y case values.x values.y row.x row.y
    sex sex 76170 Male Male 26 20
    sex sex 76240 Male Male 27 21
    sex sex 76431 Female Female 28 22
    sex sex 76712 Male Male 29 23
    sex sex 76780 Female Female 30 24
    sex sex 77066 Female Female 31 25
    sex sex 77316 Male Male 32 26
    sex sex 77355 Male Male 33 27
    sex sex 77591 Male Male 34 28
    sex sex 77851 Male Male 35 29
    ps ps 86205 0 NA 6 3
    hgb hgb 88714 NA -9 192 186
    hgb hgb 88955 NA -9 204 198
    hgb hgb 89549 NA -9 229 223
    hgb hgb 89563 NA -9 231 225
    hgb hgb 89584 NA -9 237 231
    hgb hgb 89591 NA -9 238 232
    hgb hgb 89595 NA -9 239 233
    hgb hgb 89647 NA -9 243 237
    hgb hgb 89665 NA -9 244 238
    hgb hgb 89827 NA -9 255 249
    ast ast 105271 100 36 3 2
    Non-identical attributes
    var.x var.y name
    arm Arm label
    sex sex label
    sex sex levels
    race race class
    race race label
    race race levels
    bmi bmi label

    Factor tolerance

    By default, factors are compared to each other based on both the labels and the underlying numeric levels. Set tol.factor="levels" to match only the numeric levels, or set tol.factor="labels" to match only the labels.

    summary(comparedf(mockstudy, mockstudy2, by = "case",
                    tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case
                    int.as.num = TRUE,           # compare integers and numerics
                    tol.num.val = 10,            # allow absolute differences <= 10
                    tol.factor = "labels"        # match only factor labels
    ))
    Summary of data.frames
    version arg ncol nrow
    x mockstudy 14 1499
    y mockstudy2 13 1495
    Summary of overall comparison
    statistic value
    Number of by-variables 1
    Number of non-by variables in common 12
    Number of variables compared 11
    Number of variables in x but not y 1
    Number of variables in y but not x 0
    Number of variables compared with some values unequal 3
    Number of variables compared with all values equal 8
    Number of observations in common 1495
    Number of observations in x but not y 4
    Number of observations in y but not x 0
    Number of observations with some compared variables unequal 268
    Number of observations with all compared variables equal 1227
    Number of values unequal 268
    Variables not shared
    version variable position class
    x age 2 integer
    Other variables not compared
    var.x pos.x class.x var.y pos.y class.y
    race 5 character race 3 factor
    Observations not shared
    version case observation
    x 88989 9
    x 90158 8
    x 99508 7
    x 112263 5
    Differences detected by variable
    var.x var.y n NAs
    arm Arm 0 0
    sex sex 0 0
    fu.time fu_time 0 0
    fu.stat fu stat 0 0
    ps ps 1 1
    hgb hgb 266 266
    bmi bmi 0 0
    alk.phos alk.phos 0 0
    ast ast 1 0
    mdquality.s mdquality.s 0 0
    age.ord age.ord 0 0
    Differences detected (256 not shown)
    var.x var.y case values.x values.y row.x row.y
    ps ps 86205 0 NA 6 3
    hgb hgb 88714 NA -9 192 186
    hgb hgb 88955 NA -9 204 198
    hgb hgb 89549 NA -9 229 223
    hgb hgb 89563 NA -9 231 225
    hgb hgb 89584 NA -9 237 231
    hgb hgb 89591 NA -9 238 232
    hgb hgb 89595 NA -9 239 233
    hgb hgb 89647 NA -9 243 237
    hgb hgb 89665 NA -9 244 238
    hgb hgb 89827 NA -9 255 249
    ast ast 105271 100 36 3 2
    Non-identical attributes
    var.x var.y name
    arm Arm label
    sex sex label
    sex sex levels
    race race class
    race race label
    race race levels
    bmi bmi label

    Also note the option factor.as.char=, which determines whether factors and characters should be compared despite their class difference. If TRUE, the factors are coerced to characters. Note that mockstudy$race is a character, while mockstudy2$race is a factor:

    summary(comparedf(mockstudy, mockstudy2, by = "case",
                    tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case
                    int.as.num = TRUE,           # compare integers and numerics
                    tol.num.val = 10,            # allow absolute differences <= 10
                    tol.factor = "labels",       # match only factor labels
                    factor.as.char = TRUE        # compare factors and characters
    ))
    Summary of data.frames
    version arg ncol nrow
    x mockstudy 14 1499
    y mockstudy2 13 1495
    Summary of overall comparison
    statistic value
    Number of by-variables 1
    Number of non-by variables in common 12
    Number of variables compared 12
    Number of variables in x but not y 1
    Number of variables in y but not x 0
    Number of variables compared with some values unequal 4
    Number of variables compared with all values equal 8
    Number of observations in common 1495
    Number of observations in x but not y 4
    Number of observations in y but not x 0
    Number of observations with some compared variables unequal 1339
    Number of observations with all compared variables equal 156
    Number of values unequal 1553
    Variables not shared
    version variable position class
    x age 2 integer
    Other variables not compared
    No other variables not compared
    Observations not shared
    version case observation
    x 88989 9
    x 90158 8
    x 99508 7
    x 112263 5
    Differences detected by variable
    var.x var.y n NAs
    arm Arm 0 0
    sex sex 0 0
    race race 1285 0
    fu.time fu_time 0 0
    fu.stat fu stat 0 0
    ps ps 1 1
    hgb hgb 266 266
    bmi bmi 0 0
    alk.phos alk.phos 0 0
    ast ast 1 0
    mdquality.s mdquality.s 0 0
    age.ord age.ord 0 0
    Differences detected (1531 not shown)
    var.x var.y case values.x values.y row.x row.y
    race race 76170 Caucasian caucasian 26 20
    race race 76240 Caucasian caucasian 27 21
    race race 76431 Caucasian caucasian 28 22
    race race 76712 Caucasian caucasian 29 23
    race race 76780 Caucasian caucasian 30 24
    race race 77066 Caucasian caucasian 31 25
    race race 77316 Caucasian caucasian 32 26
    race race 77591 Caucasian caucasian 34 28
    race race 77851 Caucasian caucasian 35 29
    race race 77956 Caucasian caucasian 36 30
    ps ps 86205 0 NA 6 3
    hgb hgb 88714 NA -9 192 186
    hgb hgb 88955 NA -9 204 198
    hgb hgb 89549 NA -9 229 223
    hgb hgb 89563 NA -9 231 225
    hgb hgb 89584 NA -9 237 231
    hgb hgb 89591 NA -9 238 232
    hgb hgb 89595 NA -9 239 233
    hgb hgb 89647 NA -9 243 237
    hgb hgb 89665 NA -9 244 238
    hgb hgb 89827 NA -9 255 249
    ast ast 105271 100 36 3 2
    Non-identical attributes
    var.x var.y name
    arm Arm label
    sex sex label
    sex sex levels
    race race class
    race race label
    race race levels
    bmi bmi label

    Character tolerance

    Use the tol.char= argument to change how character variables are compared. By default, they are compared as-is, but they can be compared after ignoring case or trimming whitespace or both.

    summary(comparedf(mockstudy, mockstudy2, by = "case",
                    tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case
                    int.as.num = TRUE,           # compare integers and numerics
                    tol.num.val = 10,            # allow absolute differences <= 10
                    tol.factor = "labels",       # match only factor labels
                    factor.as.char = TRUE,       # compare factors and characters
                    tol.char = "case"            # ignore case in character vectors
    ))
    Summary of data.frames
    version arg ncol nrow
    x mockstudy 14 1499
    y mockstudy2 13 1495
    Summary of overall comparison
    statistic value
    Number of by-variables 1
    Number of non-by variables in common 12
    Number of variables compared 12
    Number of variables in x but not y 1
    Number of variables in y but not x 0
    Number of variables compared with some values unequal 3
    Number of variables compared with all values equal 9
    Number of observations in common 1495
    Number of observations in x but not y 4
    Number of observations in y but not x 0
    Number of observations with some compared variables unequal 268
    Number of observations with all compared variables equal 1227
    Number of values unequal 268
    Variables not shared
    version variable position class
    x age 2 integer
    Other variables not compared
    No other variables not compared
    Observations not shared
    version case observation
    x 88989 9
    x 90158 8
    x 99508 7
    x 112263 5
    Differences detected by variable
    var.x var.y n NAs
    arm Arm 0 0
    sex sex 0 0
    race race 0 0
    fu.time fu_time 0 0
    fu.stat fu stat 0 0
    ps ps 1 1
    hgb hgb 266 266
    bmi bmi 0 0
    alk.phos alk.phos 0 0
    ast ast 1 0
    mdquality.s mdquality.s 0 0
    age.ord age.ord 0 0
    Differences detected (256 not shown)
    var.x var.y case values.x values.y row.x row.y
    ps ps 86205 0 NA 6 3
    hgb hgb 88714 NA -9 192 186
    hgb hgb 88955 NA -9 204 198
    hgb hgb 89549 NA -9 229 223
    hgb hgb 89563 NA -9 231 225
    hgb hgb 89584 NA -9 237 231
    hgb hgb 89591 NA -9 238 232
    hgb hgb 89595 NA -9 239 233
    hgb hgb 89647 NA -9 243 237
    hgb hgb 89665 NA -9 244 238
    hgb hgb 89827 NA -9 255 249
    ast ast 105271 100 36 3 2
    Non-identical attributes
    var.x var.y name
    arm Arm label
    sex sex label
    sex sex levels
    race race class
    race race label
    race race levels
    bmi bmi label

    Date tolerance

    Use the tol.date= argument to change how dates are compared. By default, they’re expected to be equal to each other.

    Other data type tolerances

    Use the tol.other= argument to change how other objects are compared. By default, they’re expected to be identical().

    Specifying tolerances for each variable

    You can also provide a list of tolerance functions to comparedf():

    comparedf.control(tol.char = list(
      "none",      # the default
      x1 = "case", # be case-insensitive for the variable "x1"
      x2 = function(x, y) tol.NA(x, y, x != y | y == "NA") # a custom-defined tolerance
    ))

    User-defined tolerance functions

    Details

    The comparedf.control() function accepts functions for any of the tolerance arguments in addition to the short-hand character strings. This allows the user to create custom tolerance functions to suit his/her needs.

    Any custom tolerance function must accept two vectors as arguments and return a logical vector of the same length. The TRUEs in the results should correspond to elements which are deemed “different”. Note that the numeric and date tolerance functions should also include a third argument for tolerance size (even if it’s not used).

    CAUTION: the results should not include NAs, since the logical vector is used to subset the input data.frames. The tol.NA() function is useful for considering any NAs in the two vectors (but not both) as differences, in addition to other criteria.

    The tol.NA() function is used in all default tolerance functions to help handle NAs.

    Example 1

    Suppose we want to ignore any dates which are later in the second dataset than the first. We define a custom tolerance function.

    my.tol <- function(x, y, tol)
    {
      tol.NA(x, y, x > y)
    }
    
    date.df1 <- data.frame(dt = as.Date(c("2017-09-07", "2017-08-08", "2017-07-09", NA)))
    date.df2 <- data.frame(dt = as.Date(c("2017-10-01", "2017-08-08", "2017-07-10", "2017-01-01")))
    n.diffs(comparedf(date.df1, date.df2)) # default finds any differences
    [1] 3
    n.diffs(comparedf(date.df1, date.df2, tol.date = my.tol)) # our function identifies only the NA as different...
    [1] 1
    n.diffs(comparedf(date.df2, date.df1, tol.date = my.tol)) # ... until we change the argument order
    [1] 3

    Example 2

    (Continuing our mockstudy example)

    Suppose we’re okay with NAs getting replaced by -9.

    tol.minus9 <- function(x, y, tol)
    {
      idx1 <- is.na(x) & !is.na(y) & y == -9
      idx2 <- tol.num.absolute(x, y, tol) # find other absolute differences
      return(!idx1 & idx2)
    }
    
    summary(comparedf(mockstudy, mockstudy2, by = "case",
                    tol.vars = c("._ ", "case"), # dots=underscores=spaces, ignore case
                    int.as.num = TRUE,           # compare integers and numerics
                    tol.num.val = 10,            # allow absolute differences <= 10
                    tol.factor = "labels",       # match only factor labels
                    factor.as.char = TRUE,       # compare factors and characters
                    tol.char = "case",           # ignore case in character vectors
                    tol.num = tol.minus9         # ignore NA -> -9 changes
    ))
    Summary of data.frames
    version arg ncol nrow
    x mockstudy 14 1499
    y mockstudy2 13 1495
    Summary of overall comparison
    statistic value
    Number of by-variables 1
    Number of non-by variables in common 12
    Number of variables compared 12
    Number of variables in x but not y 1
    Number of variables in y but not x 0
    Number of variables compared with some values unequal 2
    Number of variables compared with all values equal 10
    Number of observations in common 1495
    Number of observations in x but not y 4
    Number of observations in y but not x 0
    Number of observations with some compared variables unequal 2
    Number of observations with all compared variables equal 1493
    Number of values unequal 2
    Variables not shared
    version variable position class
    x age 2 integer
    Other variables not compared
    No other variables not compared
    Observations not shared
    version case observation
    x 88989 9
    x 90158 8
    x 99508 7
    x 112263 5
    Differences detected by variable
    var.x var.y n NAs
    arm Arm 0 0
    sex sex 0 0
    race race 0 0
    fu.time fu_time 0 0
    fu.stat fu stat 0 0
    ps ps 1 1
    hgb hgb 0 0
    bmi bmi 0 0
    alk.phos alk.phos 0 0
    ast ast 1 0
    mdquality.s mdquality.s 0 0
    age.ord age.ord 0 0
    Differences detected
    var.x var.y case values.x values.y row.x row.y
    ps ps 86205 0 NA 6 3
    ast ast 105271 100 36 3 2
    Non-identical attributes
    var.x var.y name
    arm Arm label
    sex sex label
    sex sex levels
    race race class
    race race label
    race race levels
    bmi bmi label

    Extract Differences

    Differences can be easily extracted using the diffs() function. If you only want to determine how many differences were found, use the n.diffs() function.

    cmp <- comparedf(mockstudy, mockstudy2, by = "case", tol.vars = c("._ ", "case"), int.as.num = TRUE)
    n.diffs(cmp)
    [1] 1765
    head(diffs(cmp))
      var.x var.y  case values.x values.y row.x row.y
    1   sex   sex 76170     Male     Male    26    20
    2   sex   sex 76240     Male     Male    27    21
    3   sex   sex 76431   Female   Female    28    22
    4   sex   sex 76712     Male     Male    29    23
    5   sex   sex 76780   Female   Female    30    24
    6   sex   sex 77066   Female   Female    31    25

    Differences can also be summarized by variable.

    diffs(cmp, by.var = TRUE)
             var.x       var.y    n NAs
    1          arm         Arm    0   0
    2          sex         sex 1495   0
    3      fu.time     fu_time    0   0
    4      fu.stat     fu stat    0   0
    5           ps          ps    1   1
    6          hgb         hgb  266 266
    7          bmi         bmi    0   0
    8     alk.phos    alk.phos    0   0
    9          ast         ast    3   0
    10 mdquality.s mdquality.s    0   0
    11     age.ord     age.ord    0   0

    To report differences from only a few variables, one can pass a list of variable names to diffs().

    diffs(cmp, vars = c("ps", "ast"), by.var = TRUE)
      var.x var.y n NAs
    5    ps    ps 1   1
    9   ast   ast 3   0
    diffs(cmp, vars = c("ps", "ast"))
         var.x var.y   case values.x values.y row.x row.y
    1496    ps    ps  86205        0       NA     6     3
    1763   ast   ast  86205       27       36     6     3
    1764   ast   ast 105271      100       36     3     2
    1765   ast   ast 110754       35       36     1     1

    Appendix

    Stucture of the Object

    (This section is just as much for my use as for yours!)

    obj <- comparedf(mockstudy, mockstudy2, by = "case")

    There are two main objects in the "comparedf" object, each with its own print method.

    The frame.summary contains:

    • the substituted-deparsed arguments

    • information about the number of columns and rows in each dataset

    • the by-variables for each dataset (which may not be the same)

    • the attributes for each dataset (which get counted in the print method)

    • a data.frame of by-variables and row numbers of observations not shared between datasets

    • the number of shared observations

    print(obj$frame.summary)
      version        arg ncol nrow   by        attrs       unique n.shared
    1       x  mockstudy   14 1499 case 3 attributes 4 unique obs     1495
    2       y mockstudy2   13 1495 case 3 attributes 0 unique obs     1495

    The vars.summary contains:

    • variable name, column number, and class vector (with possibly more than one element) for each x and y. These are all NA if there isn’t a match in both datasets.

    • values, a list-column of the text string "by-variable" for the by-variables, NULL for columns that aren’t compared, or a data.frame containing:

      • The by-variables for differences found

      • The values which are different for x and y

      • The row numbers for differences found

    • attrs, a list-column of NULL if there are no attributes, or a data.frame containing:

      • The name of the attributes

      • The attributes for x and y, set to NA if non-existant

      • The actual attributes (if show.attr=TRUE).

    print(obj$vars.summary)
             var.x pos.x         class.x       var.y pos.y         class.y           values        attrs
    1         case     1         integer        case     1         integer      by-variable 0 attributes
    2          sex     4          factor         sex     2          factor 1495 differences 2 attributes
    3         race     5       character        race     3          factor     Not compared 3 attributes
    4           ps     8         integer          ps     4         integer    1 differences 0 attributes
    5          hgb     9         numeric         hgb     5         numeric  266 differences 0 attributes
    6          bmi    10         numeric         bmi     6         numeric    0 differences 1 attributes
    7     alk.phos    11         integer    alk.phos     7         integer    0 differences 0 attributes
    8          ast    12         integer         ast     8         numeric     Not compared 0 attributes
    9  mdquality.s    13         integer mdquality.s     9         integer    0 differences 0 attributes
    10     age.ord    14 ordered, factor     age.ord    10 ordered, factor    0 differences 0 attributes
    11         age     2         integer        <NA>    NA              NA     Not compared 0 attributes
    12         arm     3       character        <NA>    NA              NA     Not compared 0 attributes
    13     fu.time     6         integer        <NA>    NA              NA     Not compared 0 attributes
    14     fu.stat     7         integer        <NA>    NA              NA     Not compared 0 attributes
    15        <NA>    NA              NA     fu_time    11         integer     Not compared 0 attributes
    16        <NA>    NA              NA     fu stat    12         integer     Not compared 0 attributes
    17        <NA>    NA              NA         Arm    13       character     Not compared 0 attributes
    arsenal/inst/doc/modelsum.html0000644000176200001440000230237014056514655016156 0ustar liggesusers The modelsum function

    The modelsum function

    Beth Atkinson, Ethan Heinzen, Pat Votruba, Jason Sinnwell, Shannon McDonnell and Greg Dougherty

    Introduction

    Very often we are asked to summarize model results from multiple fits into a nice table. The endpoint might be of different types (e.g., survival, case/control, continuous) and there may be several independent variables that we want to examine univariately or adjusted for certain variables such as age and sex. Locally at Mayo, the SAS macros %modelsum, %glmuniv, and %logisuni were written to create such summary tables. With the increasing interest in R, we have developed the function modelsum to create similar tables within the R environment.

    In developing the modelsum function, the goal was to bring the best features of these macros into an R function. However, the task was not simply to duplicate all the functionality, but rather to make use of R’s strengths (modeling, method dispersion, flexibility in function definition and output format) and make a tool that fits the needs of R users. Additionally, the results needed to fit within the general reproducible research framework so the tables could be displayed within an R markdown report.

    This report provides step-by-step directions for using the functions associated with modelsum. All functions presented here are available within the arsenal package. An assumption is made that users are somewhat familiar with R markdown documents. For those who are new to the topic, a good initial resource is available at rmarkdown.rstudio.com.

    Simple Example

    The first step when using the modelsum function is to load the arsenal package. All the examples in this report use a dataset called mockstudy made available by Paul Novotny which includes a variety of types of variables (character, numeric, factor, ordered factor, survival) to use as examples.

    > require(arsenal)
    > data(mockstudy) # load data
    > dim(mockstudy)  # look at how many subjects and variables are in the dataset 
    [1] 1499   14
    > # help(mockstudy) # learn more about the dataset and variables
    > str(mockstudy) # quick look at the data
    'data.frame':   1499 obs. of  14 variables:
     $ case       : int  110754 99706 105271 105001 112263 86205 99508 90158 88989 90515 ...
     $ age        : int  67 74 50 71 69 56 50 57 51 63 ...
      ..- attr(*, "label")= chr "Age in Years"
     $ arm        : chr  "F: FOLFOX" "A: IFL" "A: IFL" "G: IROX" ...
      ..- attr(*, "label")= chr "Treatment Arm"
     $ sex        : Factor w/ 2 levels "Male","Female": 1 2 2 2 2 1 1 1 2 1 ...
     $ race       : chr  "Caucasian" "Caucasian" "Caucasian" "Caucasian" ...
      ..- attr(*, "label")= chr "Race"
     $ fu.time    : int  922 270 175 128 233 120 369 421 387 363 ...
     $ fu.stat    : int  2 2 2 2 2 2 2 2 2 2 ...
     $ ps         : int  0 1 1 1 0 0 0 0 1 1 ...
     $ hgb        : num  11.5 10.7 11.1 12.6 13 10.2 13.3 12.1 13.8 12.1 ...
     $ bmi        : num  25.1 19.5 NA 29.4 26.4 ...
      ..- attr(*, "label")= chr "Body Mass Index (kg/m^2)"
     $ alk.phos   : int  160 290 700 771 350 569 162 152 231 492 ...
     $ ast        : int  35 52 100 68 35 27 16 12 25 18 ...
     $ mdquality.s: int  NA 1 1 1 NA 1 1 1 1 1 ...
     $ age.ord    : Ord.factor w/ 8 levels "10-19"<"20-29"<..: 6 7 4 7 6 5 4 5 5 6 ...

    To create a simple linear regression table (the default), use a formula statement to specify the variables that you want summarized. The example below predicts BMI with the variables sex and age.

    > tab1 <- modelsum(bmi ~ sex + age, data=mockstudy)

    If you want to take a quick look at the table, you can use summary on your modelsum object and the table will print out as text in your R console window. If you use summary without any options you will see a number of \(\&nbsp;\) statements which translates to “space” in HTML.

    Pretty text version of table

    If you want a nicer version in your console window then adding the text=TRUE option.

    > summary(tab1, text=TRUE)
    
    
    |             |estimate |std.error |p.value |adj.r.squared |Nmiss |
    |:------------|:--------|:---------|:-------|:-------------|:-----|
    |(Intercept)  |27.491   |0.181     |< 0.001 |0.004         |33    |
    |sex Female   |-0.731   |0.290     |0.012   |              |      |
    |(Intercept)  |26.424   |0.752     |< 0.001 |0.000         |33    |
    |Age in Years |0.013    |0.012     |0.290   |              |      |

    Pretty Rmarkdown version of table

    In order for the report to look nice within an R markdown (knitr) report, you just need to specify results="asis" when creating the r chunk. This changes the layout slightly (compresses it) and bolds the variable names.

    > summary(tab1)
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 27.491 0.181 < 0.001 0.004 33
    sex Female -0.731 0.290 0.012
    (Intercept) 26.424 0.752 < 0.001 0.000 33
    Age in Years 0.013 0.012 0.290

    Data frame version of table

    If you want a data.frame version, simply use as.data.frame.

    > as.data.frame(tab1)
      y.term                  y.label strata.term adjustment model        term
    1    bmi Body Mass Index (kg/m^2)             unadjusted     1 (Intercept)
    2    bmi Body Mass Index (kg/m^2)             unadjusted     1   sexFemale
    3    bmi Body Mass Index (kg/m^2)             unadjusted     2 (Intercept)
    4    bmi Body Mass Index (kg/m^2)             unadjusted     2         age
             label term.type    estimate  std.error       p.value adj.r.squared
    1  (Intercept) Intercept 27.49147713 0.18134740  0.000000e+00  3.632258e-03
    2   sex Female      Term -0.73105055 0.29032223  1.190605e-02  3.632258e-03
    3  (Intercept) Intercept 26.42372272 0.75211474 1.279109e-196  8.354809e-05
    4 Age in Years      Term  0.01304859 0.01231653  2.895753e-01  8.354809e-05
      Nmiss
    1    33
    2    33
    3    33
    4    33

    Add an adjustor to the model

    The argument adjust allows the user to indicate that all the variables should be adjusted for these terms. To adjust each model for age and sex (for instance), we use adjust = ~ age + sex:

    > tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~age + sex, data=mockstudy)
    > summary(tab2)
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 175.548 20.587 < 0.001 -0.001 266
    Treatment Arm F: FOLFOX -13.701 8.730 0.117
    Treatment Arm G: IROX -2.245 9.860 0.820
    Age in Years -0.017 0.319 0.956
    sex Female 3.016 7.521 0.688
    (Intercept) 148.391 19.585 < 0.001 0.045 266
    ps 46.721 5.987 < 0.001
    Age in Years -0.084 0.311 0.787
    sex Female 1.169 7.343 0.874
    (Intercept) 336.554 32.239 < 0.001 0.031 266
    hgb -13.845 2.137 < 0.001
    Age in Years 0.095 0.314 0.763
    sex Female -5.980 7.516 0.426

    Models for each endpoint type

    To make sure the correct model is run you need to specify “family”. The options available right now are : gaussian, binomial, survival, and poisson. If there is enough interest, additional models can be added.

    Gaussian

    Fit and summarize linear regression model

    Look at whether there is any evidence that AlkPhos values vary by study arm after adjusting for sex and age (assuming a linear age relationship).

    > fit <- lm(alk.phos ~ arm + age + sex, data=mockstudy)
    > summary(fit)
    
    Call:
    lm(formula = alk.phos ~ arm + age + sex, data = mockstudy)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -168.80  -81.45  -47.17   37.39  853.56 
    
    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  175.54808   20.58665   8.527   <2e-16 ***
    armF: FOLFOX -13.70062    8.72963  -1.569    0.117    
    armG: IROX    -2.24498    9.86004  -0.228    0.820    
    age           -0.01741    0.31878  -0.055    0.956    
    sexFemale      3.01598    7.52097   0.401    0.688    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 128.5 on 1228 degrees of freedom
      (266 observations deleted due to missingness)
    Multiple R-squared:  0.002552,  Adjusted R-squared:  -0.0006969 
    F-statistic: 0.7855 on 4 and 1228 DF,  p-value: 0.5346
    > plot(fit)

    The results suggest that the endpoint may need to be transformed. Calculating the Box-Cox transformation suggests a log transformation.

    > require(MASS)
    > boxcox(fit)

    > fit2 <- lm(log(alk.phos) ~ arm + age + sex, data=mockstudy)
    > summary(fit2)
    
    Call:
    lm(formula = log(alk.phos) ~ arm + age + sex, data = mockstudy)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -3.0098 -0.4470 -0.1065  0.4205  2.0620 
    
    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
    (Intercept)   4.9692474  0.1025239  48.469   <2e-16 ***
    armF: FOLFOX -0.0766798  0.0434746  -1.764    0.078 .  
    armG: IROX   -0.0192828  0.0491041  -0.393    0.695    
    age          -0.0004058  0.0015876  -0.256    0.798    
    sexFemale     0.0179253  0.0374553   0.479    0.632    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 0.6401 on 1228 degrees of freedom
      (266 observations deleted due to missingness)
    Multiple R-squared:  0.003121,  Adjusted R-squared:  -0.0001258 
    F-statistic: 0.9613 on 4 and 1228 DF,  p-value: 0.4278
    > plot(fit2)

    Finally, look to see whether there there is a non-linear relationship with age.

    > require(splines)
    Loading required package: splines
    > fit3 <- lm(log(alk.phos) ~ arm + ns(age, df=2) + sex, data=mockstudy)
    > 
    > # test whether there is a difference between models 
    > stats::anova(fit2,fit3)
    Analysis of Variance Table
    
    Model 1: log(alk.phos) ~ arm + age + sex
    Model 2: log(alk.phos) ~ arm + ns(age, df = 2) + sex
      Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
    1   1228 503.19                              
    2   1227 502.07  1    1.1137 2.7218 0.09924 .
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    > 
    > # look at functional form of age
    > termplot(fit3, term=2, se=T, rug=T)

    In this instance it looks like there isn’t enough evidence to say that the relationship is non-linear.

    Extract data using the broom package

    The broom package makes it easy to extract information from the fit.

    > tmp <- tidy(fit3) # coefficients, p-values
    > class(tmp)
    [1] "tbl_df"     "tbl"        "data.frame"
    > tmp
    # A tibble: 6 x 5
      term             estimate std.error statistic   p.value
      <chr>               <dbl>     <dbl>     <dbl>     <dbl>
    1 (Intercept)        4.76      0.141     33.8   1.93e-177
    2 armF: FOLFOX      -0.0767    0.0434    -1.77  7.78e-  2
    3 armG: IROX        -0.0195    0.0491    -0.396 6.92e-  1
    4 ns(age, df = 2)1   0.330     0.260      1.27  2.04e-  1
    5 ns(age, df = 2)2  -0.101     0.0935    -1.08  2.82e-  1
    6 sexFemale          0.0183    0.0374     0.489 6.25e-  1
    > 
    > glance(fit3)
    # A tibble: 1 x 12
      r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC
          <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>
    1   0.00533       0.00127 0.640      1.31   0.255     5 -1196. 2405. 2441.
    # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

    Create a summary table using modelsum

    > ms.logy <- modelsum(log(alk.phos) ~ arm + ps + hgb, data=mockstudy, adjust= ~age + sex, 
    +                     family=gaussian,  
    +                     gaussian.stats=c("estimate","CI.lower.estimate","CI.upper.estimate","p.value"))
    > summary(ms.logy)
    estimate CI.lower.estimate CI.upper.estimate p.value
    (Intercept) 4.969 4.768 5.170 < 0.001
    Treatment Arm F: FOLFOX -0.077 -0.162 0.009 0.078
    Treatment Arm G: IROX -0.019 -0.116 0.077 0.695
    Age in Years -0.000 -0.004 0.003 0.798
    sex Female 0.018 -0.056 0.091 0.632
    (Intercept) 4.832 4.640 5.023 < 0.001
    ps 0.226 0.167 0.284 < 0.001
    Age in Years -0.001 -0.004 0.002 0.636
    sex Female 0.009 -0.063 0.081 0.814
    (Intercept) 5.765 5.450 6.080 < 0.001
    hgb -0.069 -0.090 -0.048 < 0.001
    Age in Years 0.000 -0.003 0.003 0.925
    sex Female -0.027 -0.101 0.046 0.468

    Binomial

    Fit and summarize logistic regression model

    > boxplot(age ~ mdquality.s, data=mockstudy, ylab=attr(mockstudy$age,'label'), xlab='mdquality.s')

    > 
    > fit <- glm(mdquality.s ~ age + sex, data=mockstudy, family=binomial)
    > summary(fit)
    
    Call:
    glm(formula = mdquality.s ~ age + sex, family = binomial, data = mockstudy)
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -2.1832   0.4500   0.4569   0.4626   0.4756  
    
    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
    (Intercept)  2.329442   0.514684   4.526 6.01e-06 ***
    age         -0.002353   0.008256  -0.285    0.776    
    sexFemale    0.039227   0.195330   0.201    0.841    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    (Dispersion parameter for binomial family taken to be 1)
    
        Null deviance: 807.68  on 1246  degrees of freedom
    Residual deviance: 807.55  on 1244  degrees of freedom
      (252 observations deleted due to missingness)
    AIC: 813.55
    
    Number of Fisher Scoring iterations: 4
    > 
    > # create Odd's ratio w/ confidence intervals
    > tmp <- data.frame(summary(fit)$coef)
    > tmp
                    Estimate  Std..Error    z.value     Pr...z..
    (Intercept)  2.329441734 0.514683688  4.5259677 6.011977e-06
    age         -0.002353404 0.008255814 -0.2850602 7.755980e-01
    sexFemale    0.039227292 0.195330166  0.2008256 8.408350e-01
    > 
    > tmp$OR <- round(exp(tmp[,1]),2)
    > tmp$lower.CI <- round(exp(tmp[,1] - 1.96* tmp[,2]),2)
    > tmp$upper.CI <- round(exp(tmp[,1] + 1.96* tmp[,2]),2)
    > names(tmp)[4] <- 'P-value'
    > 
    > kable(tmp[,c('OR','lower.CI','upper.CI','P-value')])
    OR lower.CI upper.CI P-value
    (Intercept) 10.27 3.75 28.17 0.000006
    age 1.00 0.98 1.01 0.775598
    sexFemale 1.04 0.71 1.53 0.840835
    > 
    > # Assess the predictive ability of the model
    > 
    > # code using the pROC package
    > require(pROC)
    > pred <- predict(fit, type='response')
    > tmp <- pROC::roc(mockstudy$mdquality.s[!is.na(mockstudy$mdquality.s)]~ pred, plot=TRUE, percent=TRUE)
    Setting levels: control = 0, case = 1
    Setting direction: controls < cases

    > tmp$auc
    Area under the curve: 50.69%

    Extract data using broom package

    The broom package makes it easy to extract information from the fit.

    > tidy(fit, exp=T, conf.int=T) # coefficients, p-values, conf.intervals
    # A tibble: 3 x 7
      term        estimate std.error statistic    p.value conf.low conf.high
      <chr>          <dbl>     <dbl>     <dbl>      <dbl>    <dbl>     <dbl>
    1 (Intercept)   10.3     0.515       4.53  0.00000601    3.83      28.9 
    2 age            0.998   0.00826    -0.285 0.776         0.981      1.01
    3 sexFemale      1.04    0.195       0.201 0.841         0.712      1.53
    > 
    > glance(fit) # model summary statistics
    # A tibble: 1 x 8
      null.deviance df.null logLik   AIC   BIC deviance df.residual  nobs
              <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
    1          808.    1246  -404.  814.  829.     808.        1244  1247

    Create a summary table using modelsum

    > summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial))
    OR CI.lower.OR CI.upper.OR p.value concordance Nmiss
    (Intercept) 10.272 3.831 28.876 < 0.001 0.507 252
    Age in Years 0.998 0.981 1.014 0.776
    sex Female 1.040 0.712 1.534 0.841
    (Intercept) 4.814 1.709 13.221 0.003 0.550 273
    Body Mass Index (kg/m^2) 1.023 0.987 1.063 0.220
    sex Female 1.053 0.717 1.561 0.794
    > 
    > fitall <- modelsum(mdquality.s ~ age, data=mockstudy, family=binomial,
    +                    binomial.stats=c("Nmiss2","OR","p.value"))
    > summary(fitall)
    OR p.value Nmiss2
    (Intercept) 10.493 < 0.001 252
    Age in Years 0.998 0.766

    Survival

    Fit and summarize a Cox regression model

    > require(survival)
    Loading required package: survival
    > 
    > # multivariable model with all 3 terms
    > fit  <- coxph(Surv(fu.time, fu.stat) ~ age + sex + arm, data=mockstudy)
    > summary(fit)
    Call:
    coxph(formula = Surv(fu.time, fu.stat) ~ age + sex + arm, data = mockstudy)
    
      n= 1499, number of events= 1356 
    
                      coef exp(coef)  se(coef)      z Pr(>|z|)    
    age           0.004600  1.004611  0.002501  1.839   0.0659 .  
    sexFemale     0.039893  1.040699  0.056039  0.712   0.4765    
    armF: FOLFOX -0.454650  0.634670  0.064878 -7.008 2.42e-12 ***
    armG: IROX   -0.140785  0.868676  0.072760 -1.935   0.0530 .  
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
                 exp(coef) exp(-coef) lower .95 upper .95
    age             1.0046     0.9954    0.9997    1.0095
    sexFemale       1.0407     0.9609    0.9324    1.1615
    armF: FOLFOX    0.6347     1.5756    0.5589    0.7207
    armG: IROX      0.8687     1.1512    0.7532    1.0018
    
    Concordance= 0.563  (se = 0.009 )
    Likelihood ratio test= 56.21  on 4 df,   p=2e-11
    Wald test            = 56.26  on 4 df,   p=2e-11
    Score (logrank) test = 56.96  on 4 df,   p=1e-11
    > 
    > # check proportional hazards assumption
    > fit.z <- cox.zph(fit)
    > fit.z
           chisq df    p
    age     1.41  1 0.24
    sex     1.08  1 0.30
    arm     1.80  2 0.41
    GLOBAL  4.68  4 0.32
    > plot(fit.z[1], resid=FALSE) # makes for a cleaner picture in this case
    > abline(h=coef(fit)[1], col='red')

    > 
    > # check functional form for age using pspline (penalized spline)
    > # results are returned for the linear and non-linear components
    > fit2 <- coxph(Surv(fu.time, fu.stat) ~ pspline(age) + sex + arm, data=mockstudy)
    > fit2
    Call:
    coxph(formula = Surv(fu.time, fu.stat) ~ pspline(age) + sex + 
        arm, data = mockstudy)
    
                             coef se(coef)      se2    Chisq   DF       p
    pspline(age), linear  0.00443  0.00237  0.00237  3.48989 1.00  0.0617
    pspline(age), nonlin                            13.11270 3.08  0.0047
    sexFemale             0.03993  0.05610  0.05607  0.50663 1.00  0.4766
    armF: FOLFOX         -0.46240  0.06494  0.06493 50.69608 1.00 1.1e-12
    armG: IROX           -0.15243  0.07301  0.07299  4.35876 1.00  0.0368
    
    Iterations: 6 outer, 16 Newton-Raphson
         Theta= 0.954 
    Degrees of freedom for terms= 4.1 1.0 2.0 
    Likelihood ratio test=70.1  on 7.08 df, p=2e-12
    n= 1499, number of events= 1356 
    > 
    > # plot smoothed age to visualize why significant
    > termplot(fit2, se=T, terms=1)
    > abline(h=0)

    > 
    > # The c-statistic comes out in the summary of the fit
    > summary(fit2)$concordance
              C       se(C) 
    0.568432549 0.008487495 
    > 
    > # It can also be calculated using the survConcordance function
    > survConcordance(Surv(fu.time, fu.stat) ~ predict(fit2), data=mockstudy)
    Warning: 'survConcordance' is deprecated.
    Use 'concordance' instead.
    See help("Deprecated")
    Warning: 'survConcordance.fit' is deprecated.
    Use 'concordancefit' instead.
    See help("Deprecated")
    $concordance
    concordant 
     0.5684325 
    
    $stats
    concordant discordant  tied.risk  tied.time   std(c-d) 
     620221.00  470282.00    5021.00     766.00   19235.49 
    
    $n
    [1] 1499
    
    $std.err
       std(c-d) 
    0.008779125 
    
    $call
    survConcordance(formula = Surv(fu.time, fu.stat) ~ predict(fit2), 
        data = mockstudy)
    
    attr(,"class")
    [1] "survConcordance"

    Extract data using broom package

    The broom package makes it easy to extract information from the fit.

    > tidy(fit) # coefficients, p-values
    # A tibble: 4 x 5
      term         estimate std.error statistic  p.value
      <chr>           <dbl>     <dbl>     <dbl>    <dbl>
    1 age           0.00460   0.00250     1.84  6.59e- 2
    2 sexFemale     0.0399    0.0560      0.712 4.77e- 1
    3 armF: FOLFOX -0.455     0.0649     -7.01  2.42e-12
    4 armG: IROX   -0.141     0.0728     -1.93  5.30e- 2
    > 
    > glance(fit) # model summary statistics
    # A tibble: 1 x 18
          n nevent statistic.log p.value.log statistic.sc p.value.sc statistic.wald
      <int>  <dbl>         <dbl>       <dbl>        <dbl>      <dbl>          <dbl>
    1  1499   1356          56.2    1.81e-11         57.0   1.26e-11           56.3
    # … with 11 more variables: p.value.wald <dbl>, statistic.robust <dbl>,
    #   p.value.robust <dbl>, r.squared <dbl>, r.squared.max <dbl>,
    #   concordance <dbl>, std.error.concordance <dbl>, logLik <dbl>, AIC <dbl>,
    #   BIC <dbl>, nobs <int>

    Create a summary table using modelsum

    > ##Note: You must use quotes when specifying family="survival" 
    > ##      family=survival will not work
    > summary(modelsum(Surv(fu.time, fu.stat) ~ arm, 
    +                  adjust=~age + sex, data=mockstudy, family="survival"))
    HR CI.lower.HR CI.upper.HR p.value concordance
    Treatment Arm F: FOLFOX 0.635 0.559 0.721 < 0.001 0.563
    Treatment Arm G: IROX 0.869 0.753 1.002 0.053
    Age in Years 1.005 1.000 1.010 0.066
    sex Female 1.041 0.932 1.162 0.477
    > 
    > ##Note: the pspline term is not working yet
    > #summary(modelsum(Surv(fu.time, fu.stat) ~ arm, 
    > #                adjust=~pspline(age) + sex, data=mockstudy, family='survival'))

    Poisson

    Poisson regression is useful when predicting an outcome variable representing counts. It can also be useful when looking at survival data. Cox models and Poisson models are very closely related and survival data can be summarized using Poisson regression. If you have overdispersion (see if the residual deviance is much larger than degrees of freedom), you may want to use quasipoisson() instead of poisson(). Some of these diagnostics need to be done outside of modelsum.

    Example 1: fit and summarize a Poisson regression model

    For the first example, use the solder dataset available in the rpart package. The endpoint skips has a definite Poisson look.

    > require(rpart) ##just to get access to solder dataset
    > data(solder)
    > hist(solder$skips)

    > 
    > fit <- glm(skips ~ Opening + Solder + Mask , data=solder, family=poisson)
    > stats::anova(fit, test='Chi')
    Analysis of Deviance Table
    
    Model: poisson, link: log
    
    Response: skips
    
    Terms added sequentially (first to last)
    
            Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
    NULL                      899     8788.2              
    Opening  2   2920.5       897     5867.7 < 2.2e-16 ***
    Solder   1   1168.4       896     4699.3 < 2.2e-16 ***
    Mask     4   2015.7       892     2683.7 < 2.2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    > summary(fit)
    
    Call:
    glm(formula = skips ~ Opening + Solder + Mask, family = poisson, 
        data = solder)
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -6.1251  -1.4720  -0.7826   0.5986   6.6031  
    
    Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
    (Intercept) -1.12220    0.07742  -14.50  < 2e-16 ***
    OpeningM     0.57161    0.05707   10.02  < 2e-16 ***
    OpeningS     1.81475    0.05044   35.98  < 2e-16 ***
    SolderThin   0.84682    0.03327   25.45  < 2e-16 ***
    MaskA3       0.51315    0.07098    7.23 4.83e-13 ***
    MaskA6       1.81103    0.06609   27.40  < 2e-16 ***
    MaskB3       1.20225    0.06697   17.95  < 2e-16 ***
    MaskB6       1.86648    0.06310   29.58  < 2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    (Dispersion parameter for poisson family taken to be 1)
    
        Null deviance: 8788.2  on 899  degrees of freedom
    Residual deviance: 2683.7  on 892  degrees of freedom
    AIC: 4802.2
    
    Number of Fisher Scoring iterations: 5

    Overdispersion is when the Residual deviance is larger than the degrees of freedom. This can be tested, approximately using the following code. The goal is to have a p-value that is \(>0.05\).

    > 1-pchisq(fit$deviance, fit$df.residual)
    [1] 0

    One possible solution is to use the quasipoisson family instead of the poisson family. This adjusts for the overdispersion.

    > fit2 <- glm(skips ~ Opening + Solder + Mask, data=solder, family=quasipoisson)
    > summary(fit2)
    
    Call:
    glm(formula = skips ~ Opening + Solder + Mask, family = quasipoisson, 
        data = solder)
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -6.1251  -1.4720  -0.7826   0.5986   6.6031  
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept) -1.12220    0.13483  -8.323 3.19e-16 ***
    OpeningM     0.57161    0.09939   5.751 1.22e-08 ***
    OpeningS     1.81475    0.08784  20.660  < 2e-16 ***
    SolderThin   0.84682    0.05794  14.615  < 2e-16 ***
    MaskA3       0.51315    0.12361   4.151 3.62e-05 ***
    MaskA6       1.81103    0.11510  15.735  < 2e-16 ***
    MaskB3       1.20225    0.11663  10.308  < 2e-16 ***
    MaskB6       1.86648    0.10989  16.984  < 2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    (Dispersion parameter for quasipoisson family taken to be 3.033198)
    
        Null deviance: 8788.2  on 899  degrees of freedom
    Residual deviance: 2683.7  on 892  degrees of freedom
    AIC: NA
    
    Number of Fisher Scoring iterations: 5

    Extract data using broom package

    The broom package makes it easy to extract information from the fit.

    > tidy(fit) # coefficients, p-values
    # A tibble: 8 x 5
      term        estimate std.error statistic   p.value
      <chr>          <dbl>     <dbl>     <dbl>     <dbl>
    1 (Intercept)   -1.12     0.0774    -14.5  1.29e- 47
    2 OpeningM       0.572    0.0571     10.0  1.29e- 23
    3 OpeningS       1.81     0.0504     36.0  1.66e-283
    4 SolderThin     0.847    0.0333     25.5  6.47e-143
    5 MaskA3         0.513    0.0710      7.23 4.83e- 13
    6 MaskA6         1.81     0.0661     27.4  2.45e-165
    7 MaskB3         1.20     0.0670     18.0  4.55e- 72
    8 MaskB6         1.87     0.0631     29.6  2.71e-192
    > 
    > glance(fit) # model summary statistics
    # A tibble: 1 x 8
      null.deviance df.null logLik   AIC   BIC deviance df.residual  nobs
              <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
    1         8788.     899 -2393. 4802. 4841.    2684.         892   900

    Create a summary table using modelsum

    > summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="quasipoisson"))
    RR CI.lower.RR CI.upper.RR p.value
    (Intercept) 1.533 1.179 1.952 < 0.001
    Opening M 2.328 1.733 3.167 < 0.001
    Opening S 7.491 5.780 9.888 < 0.001
    (Intercept) 2.904 2.423 3.446 < 0.001
    Solder Thin 2.808 2.295 3.458 < 0.001
    (Intercept) 1.611 1.135 2.204 0.005
    Mask A3 1.469 0.995 2.214 0.059
    Mask A6 8.331 5.839 12.222 < 0.001
    Mask B3 3.328 2.309 4.920 < 0.001
    Mask B6 6.466 4.598 9.378 < 0.001
    > summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="poisson"))
    RR CI.lower.RR CI.upper.RR p.value
    (Intercept) 1.533 1.397 1.678 < 0.001
    Opening M 2.328 2.089 2.599 < 0.001
    Opening S 7.491 6.805 8.267 < 0.001
    (Intercept) 2.904 2.750 3.065 < 0.001
    Solder Thin 2.808 2.637 2.992 < 0.001
    (Intercept) 1.611 1.433 1.804 < 0.001
    Mask A3 1.469 1.280 1.690 < 0.001
    Mask A6 8.331 7.341 9.487 < 0.001
    Mask B3 3.328 2.923 3.800 < 0.001
    Mask B6 6.466 5.724 7.331 < 0.001

    Example 2: fit and summarize a Poisson regression model

    This second example uses the survival endpoint available in the mockstudy dataset. There is a close relationship between survival and Poisson models, and often it is easier to fit the model using Poisson regression, especially if you want to present absolute risk.

    > # add .01 to the follow-up time (.01*1 day) in order to keep everyone in the analysis
    > fit <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, data=mockstudy, family=poisson)
    > summary(fit)
    
    Call:
    glm(formula = fu.stat ~ offset(log(fu.time + 0.01)) + age + sex + 
        arm, family = poisson, data = mockstudy)
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -3.1188  -0.4041   0.3242   0.9727   4.3588  
    
    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
    (Intercept)  -5.875627   0.108984 -53.913  < 2e-16 ***
    age           0.003724   0.001705   2.184   0.0290 *  
    sexFemale     0.027321   0.038575   0.708   0.4788    
    armF: FOLFOX -0.335141   0.044600  -7.514 5.72e-14 ***
    armG: IROX   -0.107776   0.050643  -2.128   0.0333 *  
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    (Dispersion parameter for poisson family taken to be 1)
    
        Null deviance: 2113.5  on 1498  degrees of freedom
    Residual deviance: 2048.0  on 1494  degrees of freedom
    AIC: 5888.2
    
    Number of Fisher Scoring iterations: 5
    > 1-pchisq(fit$deviance, fit$df.residual)
    [1] 0
    > 
    > coef(coxph(Surv(fu.time,fu.stat) ~ age + sex + arm, data=mockstudy))
             age    sexFemale armF: FOLFOX   armG: IROX 
     0.004600011  0.039892735 -0.454650445 -0.140784996 
    > coef(fit)[-1]
             age    sexFemale armF: FOLFOX   armG: IROX 
     0.003723763  0.027320917 -0.335141090 -0.107775577 
    > 
    > # results from the Poisson model can then be described as risk ratios (similar to the hazard ratio)
    > exp(coef(fit)[-1])
             age    sexFemale armF: FOLFOX   armG: IROX 
       1.0037307    1.0276976    0.7152372    0.8978291 
    > 
    > # As before, we can model the dispersion which alters the standard error
    > fit2 <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, 
    +             data=mockstudy, family=quasipoisson)
    > summary(fit2)
    
    Call:
    glm(formula = fu.stat ~ offset(log(fu.time + 0.01)) + age + sex + 
        arm, family = quasipoisson, data = mockstudy)
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -3.1188  -0.4041   0.3242   0.9727   4.3588  
    
    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  -5.875627   0.566666 -10.369   <2e-16 ***
    age           0.003724   0.008867   0.420    0.675    
    sexFemale     0.027321   0.200572   0.136    0.892    
    armF: FOLFOX -0.335141   0.231899  -1.445    0.149    
    armG: IROX   -0.107776   0.263318  -0.409    0.682    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    (Dispersion parameter for quasipoisson family taken to be 27.03493)
    
        Null deviance: 2113.5  on 1498  degrees of freedom
    Residual deviance: 2048.0  on 1494  degrees of freedom
    AIC: NA
    
    Number of Fisher Scoring iterations: 5

    Extract data using broom package

    The broom package makes it easy to extract information from the fit.

    > tidy(fit) ##coefficients, p-values
    # A tibble: 5 x 5
      term         estimate std.error statistic  p.value
      <chr>           <dbl>     <dbl>     <dbl>    <dbl>
    1 (Intercept)  -5.88      0.109     -53.9   0.      
    2 age           0.00372   0.00171     2.18  2.90e- 2
    3 sexFemale     0.0273    0.0386      0.708 4.79e- 1
    4 armF: FOLFOX -0.335     0.0446     -7.51  5.72e-14
    5 armG: IROX   -0.108     0.0506     -2.13  3.33e- 2
    > 
    > glance(fit) ##model summary statistics
    # A tibble: 1 x 8
      null.deviance df.null logLik   AIC   BIC deviance df.residual  nobs
              <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
    1         2114.    1498 -2939. 5888. 5915.    2048.        1494  1499

    Create a summary table using modelsum

    Remember that the result from modelsum is different from the fit above. The modelsum summary shows the results for age + offset(log(fu.time+.01)) then sex + offset(log(fu.time+.01)) instead of age + sex + arm + offset(log(fu.time+.01)).

    > summary(modelsum(fu.stat ~ age, adjust=~offset(log(fu.time+.01))+ sex + arm, 
    +                  data=mockstudy, family=poisson))
    RR CI.lower.RR CI.upper.RR p.value
    (Intercept) 0.003 0.002 0.003 < 0.001
    Age in Years 1.004 1.000 1.007 0.029
    sex Female 1.028 0.953 1.108 0.479
    Treatment Arm F: FOLFOX 0.715 0.656 0.781 < 0.001
    Treatment Arm G: IROX 0.898 0.813 0.991 0.033

    Additional Examples

    Here are multiple examples showing how to use some of the different options.

    1. Change summary statistics globally

    There are standard settings for each type of model regarding what information is summarized in the table. This behavior can be modified using the modelsum.control function. In fact, you can save your standard settings and use that for future tables.

    > mycontrols  <- modelsum.control(gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"),
    +                                 show.adjust=FALSE, show.intercept=FALSE)                            
    > tab2 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy, control=mycontrols)
    > summary(tab2)
    estimate std.error adj.r.squared Nmiss
    Age in Years 0.012 0.012 0.004 33

    You can also change these settings directly in the modelsum call.

    > tab3 <- modelsum(bmi ~  age, adjust=~sex, data=mockstudy,
    +                  gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"), 
    +                  show.intercept=FALSE, show.adjust=FALSE)
    > summary(tab3)
    estimate std.error adj.r.squared Nmiss
    Age in Years 0.012 0.012 0.004 33

    2. Add labels to independent variables

    In the above example, age is shown with a label (Age in Years), but sex is listed “as is”. This is because the data was created in SAS and in the SAS dataset, age had a label but sex did not. The label is stored as an attribute within R.

    > ## Look at one variable's label
    > attr(mockstudy$age,'label')
    [1] "Age in Years"
    > 
    > ## See all the variables with a label
    > unlist(lapply(mockstudy,'attr','label'))
                           age                        arm 
                "Age in Years"            "Treatment Arm" 
                          race                        bmi 
                        "Race" "Body Mass Index (kg/m^2)" 
    > 
    > ## or
    > cbind(sapply(mockstudy,attr,'label'))
                [,1]                      
    case        NULL                      
    age         "Age in Years"            
    arm         "Treatment Arm"           
    sex         NULL                      
    race        "Race"                    
    fu.time     NULL                      
    fu.stat     NULL                      
    ps          NULL                      
    hgb         NULL                      
    bmi         "Body Mass Index (kg/m^2)"
    alk.phos    NULL                      
    ast         NULL                      
    mdquality.s NULL                      
    age.ord     NULL                      

    If you want to add labels to other variables, there are a couple of options. First, you could add labels to the variables in your dataset.

    > attr(mockstudy$age,'label')  <- 'Age, yrs'
    > 
    > tab1 <- modelsum(bmi ~  age, adjust=~sex, data=mockstudy)
    > summary(tab1)
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 26.793 0.766 < 0.001 0.004 33
    Age, yrs 0.012 0.012 0.348
    sex Female -0.718 0.291 0.014

    You can also use the built-in data.frame method for labels<-:

    > labels(mockstudy)  <- c(age = 'Age, yrs')
    > 
    > tab1 <- modelsum(bmi ~  age, adjust=~sex, data=mockstudy)
    > summary(tab1)
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 26.793 0.766 < 0.001 0.004 33
    Age, yrs 0.012 0.012 0.348
    sex Female -0.718 0.291 0.014

    Another option is to add labels after you have created the table

    > mylabels <- list(sexFemale = "Female", age ="Age, yrs")
    > summary(tab1, labelTranslations = mylabels)
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 26.793 0.766 < 0.001 0.004 33
    Age, yrs 0.012 0.012 0.348
    Female -0.718 0.291 0.014

    Alternatively, you can check the variable labels and manipulate them with a function called labels, which works on the modelsum object.

    > labels(tab1)
                           bmi                        age 
    "Body Mass Index (kg/m^2)"                 "Age, yrs" 
                           sex 
                  "sex Female" 
    > labels(tab1) <- c(sexFemale="Female", age="Baseline Age (yrs)")
    > labels(tab1)
                           bmi                        age 
    "Body Mass Index (kg/m^2)"       "Baseline Age (yrs)" 
                           sex 
                      "Female" 
    > summary(tab1)
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 26.793 0.766 < 0.001 0.004 33
    Baseline Age (yrs) 0.012 0.012 0.348
    Female -0.718 0.291 0.014

    3. Don’t show intercept values

    > summary(modelsum(age~mdquality.s+sex, data=mockstudy), show.intercept=FALSE)
    estimate std.error p.value adj.r.squared Nmiss
    mdquality.s -0.326 1.093 0.766 -0.001 252
    sex Female -1.208 0.610 0.048 0.002 0

    4. Don’t show results for adjustment variables

    > summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial),
    +         show.adjust=FALSE)  
    OR CI.lower.OR CI.upper.OR p.value concordance Nmiss
    (Intercept) 10.272 3.831 28.876 < 0.001 0.507 252
    Age, yrs 0.998 0.981 1.014 0.776
    (Intercept) 4.814 1.709 13.221 0.003 0.550 273
    Body Mass Index (kg/m^2) 1.023 0.987 1.063 0.220

    5. Summarize multiple variables without typing them out

    Often one wants to summarize a number of variables. Instead of typing by hand each individual variable, an alternative approach is to create a formula using the paste command with the collapse="+" option.

    > # create a vector specifying the variable names
    > myvars <- names(mockstudy)
    > 
    > # select the 8th through the 12th
    > # paste them together, separated by the + sign
    > RHS <- paste(myvars[8:12], collapse="+")
    > RHS

    [1] “ps+hgb+bmi+alk.phos+ast”

    > 
    > # create a formula using the as.formula function
    > as.formula(paste('mdquality.s ~ ', RHS))

    mdquality.s ~ ps + hgb + bmi + alk.phos + ast

    > 
    > # use the formula in the modelsum function
    > summary(modelsum(as.formula(paste('mdquality.s ~', RHS)), family=binomial, data=mockstudy))
    OR CI.lower.OR CI.upper.OR p.value concordance Nmiss
    (Intercept) 14.628 10.755 20.399 < 0.001 0.620 460
    ps 0.461 0.332 0.639 < 0.001
    (Intercept) 1.236 0.272 5.560 0.783 0.573 460
    hgb 1.176 1.040 1.334 0.011
    (Intercept) 4.963 1.818 13.292 0.002 0.549 273
    Body Mass Index (kg/m^2) 1.023 0.987 1.062 0.225
    (Intercept) 10.622 7.687 14.794 < 0.001 0.552 460
    alk.phos 0.999 0.998 1.000 0.159
    (Intercept) 10.936 7.912 15.232 < 0.001 0.545 460
    ast 0.995 0.988 1.001 0.099

    These steps can also be done using the formulize function.

    > ## The formulize function does the paste and as.formula steps
    > tmp <- formulize('mdquality.s',myvars[8:10])
    > tmp

    mdquality.s ~ ps + hgb + bmi

    > 
    > ## More complex formulas could also be written using formulize
    > tmp2 <- formulize('mdquality.s',c('ps','hgb','sqrt(bmi)'))
    > 
    > ## use the formula in the modelsum function
    > summary(modelsum(tmp, data=mockstudy, family=binomial))
    OR CI.lower.OR CI.upper.OR p.value concordance Nmiss
    (Intercept) 14.628 10.755 20.399 < 0.001 0.620 460
    ps 0.461 0.332 0.639 < 0.001
    (Intercept) 1.236 0.272 5.560 0.783 0.573 460
    hgb 1.176 1.040 1.334 0.011
    (Intercept) 4.963 1.818 13.292 0.002 0.549 273
    Body Mass Index (kg/m^2) 1.023 0.987 1.062 0.225

    6. Subset the dataset used in the analysis

    Here are two ways to get the same result (limit the analysis to subjects age>50 and in the F: FOLFOX treatment group).

    • The first approach uses the subset function applied to the dataset mockstudy. This example also selects a subset of variables. The modelsum function is then applied to this subsetted data.
    > newdata <- subset(mockstudy, subset=age>50 & arm=='F: FOLFOX', select = c(age,sex, bmi:alk.phos))
    > dim(mockstudy)
    [1] 1499   14
    > table(mockstudy$arm)
    
       A: IFL F: FOLFOX   G: IROX 
          428       691       380 
    > dim(newdata)
    [1] 557   4
    > names(newdata)
    [1] "age"      "sex"      "bmi"      "alk.phos"
    > summary(modelsum(alk.phos ~ ., data=newdata))
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 122.577 46.924 0.009 -0.001 108
    age 0.619 0.719 0.390
    (Intercept) 164.814 7.673 < 0.001 -0.002 108
    sex Female -5.497 12.118 0.650
    (Intercept) 238.658 33.705 < 0.001 0.010 119
    bmi -2.776 1.207 0.022
    • The second approach does the same analysis but uses the subset argument within modelsum to subset the data.
    > summary(modelsum(log(alk.phos) ~ sex + ps + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy))
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 4.872 0.039 < 0.001 -0.002 108
    sex Female -0.005 0.062 0.931
    (Intercept) 4.770 0.040 < 0.001 0.027 108
    ps 0.183 0.050 < 0.001
    (Intercept) 5.207 0.172 < 0.001 0.007 119
    Body Mass Index (kg/m^2) -0.012 0.006 0.044
    > summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset = age>50 & bmi<24, data=mockstudy))
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 178.812 14.550 < 0.001 0.007 77
    ps 20.834 13.440 0.122
    sex Female -17.542 16.656 0.293
    (Intercept) 373.008 104.272 < 0.001 0.009 77
    Body Mass Index (kg/m^2) -8.239 4.727 0.083
    sex Female -24.058 16.855 0.155
    > summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset=1:30, data=mockstudy))
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 169.112 57.013 0.006 0.294 0
    ps 254.901 68.100 < 0.001
    sex Female 49.566 67.643 0.470
    (Intercept) 453.070 200.651 0.033 -0.049 1
    Body Mass Index (kg/m^2) -5.993 7.408 0.426
    sex Female -22.308 79.776 0.782

    7. Create combinations of variables on the fly

    > ## create a variable combining the levels of mdquality.s and sex
    > with(mockstudy, table(interaction(mdquality.s,sex)))
    
      0.Male   1.Male 0.Female 1.Female 
          77      686       47      437 
    > summary(modelsum(age ~ interaction(mdquality.s,sex), data=mockstudy))
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 59.714 1.314 < 0.001 0.003 252
    interaction(mdquality.s, sex) 1.Male 0.730 1.385 0.598
    interaction(mdquality.s, sex) 0.Female 0.988 2.134 0.643
    interaction(mdquality.s, sex) 1.Female -1.021 1.425 0.474

    8. Transform variables on the fly

    Certain transformations need to be surrounded by I() so that R knows to treat it as a variable transformation and not some special model feature. If the transformation includes any of the symbols / - + ^ * then surround the new variable by I().

    > summary(modelsum(arm=="F: FOLFOX" ~ I(age/10) + log(bmi) + mdquality.s,
    +                  data=mockstudy, family=binomial))
    OR CI.lower.OR CI.upper.OR p.value concordance Nmiss
    (Intercept) 0.656 0.382 1.124 0.126 0.514 0
    Age, yrs 1.045 0.957 1.142 0.326
    (Intercept) 0.633 0.108 3.698 0.611 0.508 33
    Body Mass Index (kg/m^2) 1.092 0.638 1.867 0.748
    (Intercept) 0.722 0.503 1.029 0.074 0.502 252
    mdquality.s 1.045 0.719 1.527 0.819

    9. Change the ordering of the variables or delete a variable

    > mytab <- modelsum(bmi ~ sex + alk.phos + age, data=mockstudy)
    > mytab2 <- mytab[c('age','sex','alk.phos')]
    > summary(mytab2)
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 26.424 0.752 < 0.001 0.000 33
    Age, yrs 0.013 0.012 0.290
    (Intercept) 27.491 0.181 < 0.001 0.004 33
    sex Female -0.731 0.290 0.012
    (Intercept) 27.944 0.253 < 0.001 0.011 294
    alk.phos -0.005 0.001 < 0.001
    > summary(mytab[c('age','sex')])
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 26.424 0.752 < 0.001 0.000 33
    Age, yrs 0.013 0.012 0.290
    (Intercept) 27.491 0.181 < 0.001 0.004 33
    sex Female -0.731 0.290 0.012
    > summary(mytab[c(3,1)])
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 26.424 0.752 < 0.001 0.000 33
    Age, yrs 0.013 0.012 0.290
    (Intercept) 27.491 0.181 < 0.001 0.004 33
    sex Female -0.731 0.290 0.012

    10. Merge two modelsum objects together

    It is possible to merge two modelsum objects so that they print out together, however you need to pay attention to the columns that are being displayed. It is sometimes easier to combine two models of the same family (such as two sets of linear models). Overlapping y-variables will have their x-variables concatenated, and (if all=TRUE) non-overlapping y-variables will have their tables printed separately.

    > ## demographics
    > tab1 <- modelsum(bmi ~ sex + age, data=mockstudy)
    > ## lab data
    > tab2 <- modelsum(mdquality.s ~ hgb + alk.phos, data=mockstudy, family=binomial)
    >                 
    > tab12 <- merge(tab1, tab2, all = TRUE)
    > class(tab12)

    [1] “modelsum” “arsenal_table”

    > summary(tab12)
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 27.491 0.181 < 0.001 0.004 33
    sex Female -0.731 0.290 0.012
    (Intercept) 26.424 0.752 < 0.001 0.000 33
    Age, yrs 0.013 0.012 0.290
    OR CI.lower.OR CI.upper.OR p.value concordance Nmiss
    (Intercept) 1.236 0.272 5.560 0.783 0.573 460
    hgb 1.176 1.040 1.334 0.011
    (Intercept) 10.622 7.687 14.794 < 0.001 0.552 460
    alk.phos 0.999 0.998 1.000 0.159

    11. Add a title to the table

    When creating a pdf the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles appear un-numbered and above the table.

    > t1 <- modelsum(bmi ~ sex + age, data=mockstudy)
    > summary(t1, title='Demographics')
    Demographics
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 27.491 0.181 < 0.001 0.004 33
    sex Female -0.731 0.290 0.012
    (Intercept) 26.424 0.752 < 0.001 0.000 33
    Age, yrs 0.013 0.012 0.290

    12. Modify how missing values are treated

    Depending on the report you are writing you have the following options:

    • Use all values available for each variable

    • Use only those subjects who have measurements available for all the variables

    > ## look at how many missing values there are for each variable
    > apply(is.na(mockstudy),2,sum)
           case         age         arm         sex        race     fu.time 
              0           0           0           0           7           0 
        fu.stat          ps         hgb         bmi    alk.phos         ast 
              0         266         266          33         266         266 
    mdquality.s     age.ord 
            252           0 
    > ## Show how many subjects have each variable (non-missing)
    > summary(modelsum(bmi ~ ast + age, data=mockstudy,
    +                 control=modelsum.control(gaussian.stats=c("N","estimate"))))
    estimate N
    (Intercept) 27.331 1205
    ast -0.005
    (Intercept) 26.424 1466
    Age, yrs 0.013
    > 
    > ## Always list the number of missing values
    > summary(modelsum(bmi ~ ast + age, data=mockstudy,
    +                 control=modelsum.control(gaussian.stats=c("Nmiss2","estimate"))))
    estimate Nmiss2
    (Intercept) 27.331 294
    ast -0.005
    (Intercept) 26.424 33
    Age, yrs 0.013
    > 
    > ## Only show the missing values if there are some (default)
    > summary(modelsum(bmi ~ ast + age, data=mockstudy, 
    +                 control=modelsum.control(gaussian.stats=c("Nmiss","estimate"))))
    estimate Nmiss
    (Intercept) 27.331 294
    ast -0.005
    (Intercept) 26.424 33
    Age, yrs 0.013
    > 
    > ## Don't show N at all
    > summary(modelsum(bmi ~ ast + age, data=mockstudy, 
    +                 control=modelsum.control(gaussian.stats=c("estimate"))))
    estimate
    (Intercept) 27.331
    ast -0.005
    (Intercept) 26.424
    Age, yrs 0.013

    13. Modify the number of digits used

    Within modelsum.control function there are 3 options for controlling the number of significant digits shown.

    • digits: controls the number of digits after the decimal point for continuous values

    • digits.ratio: controls the number of digits after the decimal point for continuous values

    • digits.p: controls the number of digits after the decimal point for continuous values

    > summary(modelsum(bmi ~ sex + age + fu.time, data=mockstudy), digits=4, digits.test=2)
    Warning: Using 'digits.test = ' is deprecated. Use 'digits.p = ' instead.
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 27.4915 0.1813 < 0.001 0.0036 33
    sex Female -0.7311 0.2903 0.012
    (Intercept) 26.4237 0.7521 < 0.001 0.0001 33
    Age, yrs 0.0130 0.0123 0.290
    (Intercept) 26.4937 0.2447 < 0.001 0.0079 33
    fu.time 0.0011 0.0003 < 0.001

    14. Use case-weights in the models

    Occasionally it is of interest to fit models using case weights. The modelsum function allows you to pass on the weights to the models and it will do the appropriate fit.

    > mockstudy$agegp <- cut(mockstudy$age, breaks=c(18,50,60,70,90), right=FALSE)
    > 
    > ## create weights based on agegp and sex distribution
    > tab1 <- with(mockstudy,table(agegp, sex))
    > tab1
             sex
    agegp     Male Female
      [18,50)  152    110
      [50,60)  258    178
      [60,70)  295    173
      [70,90)  211    122
    > tab2 <- with(mockstudy, table(agegp, sex, arm))
    > gpwts <- rep(tab1, length(unique(mockstudy$arm)))/tab2
    > 
    > ## apply weights to subjects
    > index <- with(mockstudy, cbind(as.numeric(agegp), as.numeric(sex), as.numeric(as.factor(arm)))) 
    > mockstudy$wts <- gpwts[index]
    > 
    > ## show weights by treatment arm group
    > tapply(mockstudy$wts,mockstudy$arm, summary)
    $`A: IFL`
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      2.923   3.225   3.548   3.502   3.844   4.045 
    
    $`F: FOLFOX`
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      2.033   2.070   2.201   2.169   2.263   2.303 
    
    $`G: IROX`
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      3.667   3.734   4.023   3.945   4.031   4.471 
    > mockstudy$newvarA <- as.numeric(mockstudy$arm=='A: IFL')
    > tab1 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), 
    +                  family=binomial)
    > summary(tab1, title='No Case Weights used')
    No Case Weights used
    OR CI.lower.OR CI.upper.OR p.value concordance Nmiss
    (Intercept) 0.590 0.473 0.735 < 0.001 0.550 210
    ast 1.003 0.998 1.008 0.258
    (Intercept) 0.578 0.306 1.093 0.091 0.500 29
    Body Mass Index (kg/m^2) 1.003 0.980 1.026 0.808
    (Intercept) 1.006 0.386 2.631 0.990 0.514 210
    hgb 0.965 0.894 1.043 0.372
    > 
    > suppressWarnings({
    + tab2 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'), 
    +                  weights=wts, family=binomial)
    + summary(tab2, title='Case Weights used')
    + })
    Case Weights used
    OR CI.lower.OR CI.upper.OR p.value concordance Nmiss
    (Intercept) 0.956 0.837 1.091 0.504 0.550 210
    ast 1.003 1.000 1.006 0.068
    (Intercept) 0.957 0.658 1.393 0.820 0.500 29
    Body Mass Index (kg/m^2) 1.002 0.988 1.016 0.780
    (Intercept) 1.829 1.031 3.248 0.039 0.514 210
    hgb 0.956 0.913 1.001 0.058

    15. Use modelsum within an Sweave document

    For those users who wish to create tables within an Sweave document, the following code seems to work.

    \documentclass{article}
    
    \usepackage{longtable}
    \usepackage{pdfpages}
    
    \begin{document}
    
    \section{Read in Data}
    <<echo=TRUE>>=
    require(arsenal)
    require(knitr)
    require(rmarkdown)
    data(mockstudy)
    
    tab1 <- modelsum(bmi~sex+age, data=mockstudy)
    @
    
    \section{Convert Summary.modelsum to LaTeX}
    <<echo=TRUE, results='hide', message=FALSE>>=
    capture.output(summary(tab1), file="Test.md")
    
    ## Convert R Markdown Table to LaTeX
    render("Test.md", pdf_document(keep_tex=TRUE))
    @ 
    
    \includepdf{Test.pdf}
    
    \end{document}

    16. Export modelsum results to a .CSV file

    When looking at multiple variables it is sometimes useful to export the results to a csv file. The as.data.frame function creates a data frame object that can be exported or further manipulated within R.

    > summary(tab2, text=T)
    
    
    |                         |OR    |CI.lower.OR |CI.upper.OR |p.value |concordance |Nmiss |
    |:------------------------|:-----|:-----------|:-----------|:-------|:-----------|:-----|
    |(Intercept)              |0.956 |0.837       |1.091       |0.504   |0.550       |210   |
    |ast                      |1.003 |1.000       |1.006       |0.068   |            |      |
    |(Intercept)              |0.957 |0.658       |1.393       |0.820   |0.500       |29    |
    |Body Mass Index (kg/m^2) |1.002 |0.988       |1.016       |0.780   |            |      |
    |(Intercept)              |1.829 |1.031       |3.248       |0.039   |0.514       |210   |
    |hgb                      |0.956 |0.913       |1.001       |0.058   |            |      |
    > tmp <- as.data.frame(summary(tab2, text = TRUE))
    > tmp
                                  OR CI.lower.OR CI.upper.OR p.value concordance
    1              (Intercept) 0.956       0.837       1.091   0.504       0.550
    2                      ast 1.003       1.000       1.006   0.068            
    3              (Intercept) 0.957       0.658       1.393   0.820       0.500
    4 Body Mass Index (kg/m^2) 1.002       0.988       1.016   0.780            
    5              (Intercept) 1.829       1.031       3.248   0.039       0.514
    6                      hgb 0.956       0.913       1.001   0.058            
      Nmiss
    1   210
    2      
    3    29
    4      
    5   210
    6      
    > # write.csv(tmp, '/my/path/here/mymodel.csv')

    17. Write modelsum object to a separate Word or HTML file

    > ## write to an HTML document
    > write2html(tab2, "~/ibm/trash.html")
    > 
    > ## write to a Word document
    > write2word(tab2, "~/ibm/trash.doc", title="My table in Word")

    18. Use modelsum in R Shiny

    The easiest way to output a modelsum() object in an R Shiny app is to use the tableOutput() UI in combination with the renderTable() server function and as.data.frame(summary(modelsum())):

    > # A standalone shiny app
    > library(shiny)
    > library(arsenal)
    > data(mockstudy)
    > 
    > shinyApp(
    +   ui = fluidPage(tableOutput("table")),
    +   server = function(input, output) {
    +     output$table <- renderTable({
    +       as.data.frame(summary(modelsum(age ~ sex, data = mockstudy), text = "html"))
    +     }, sanitize.text.function = function(x) x)
    +   }
    + )

    This can be especially powerful if you feed the selections from a selectInput(multiple = TRUE) into formulize() to make the table dynamic!

    23. Use modelsum in bookdown

    Since the backbone of modelsum() is knitr::kable(), tables still render well in bookdown. However, print.summary.modelsum() doesn’t use the caption= argument of kable(), so some tables may not have a properly numbered caption. To fix this, use the method described on the bookdown site to give the table a tag/ID.

    > summary(modelsum(age ~ sex, data = mockstudy), title="(\\#tab:mytableby) Caption here")

    24. Model multiple endpoints

    You can now use list() on the left-hand side of modelsum() to give multiple endpoints. Note that only one “family” can be specified this way (use merge() instead if you want multiple families).

    > summary(modelsum(list(age, hgb) ~ bmi + sex, adjust = ~ arm, data = mockstudy))
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 58.053 1.614 < 0.001 -0.001 33
    Body Mass Index (kg/m^2) 0.059 0.055 0.289
    Treatment Arm F: FOLFOX 0.593 0.718 0.408
    Treatment Arm G: IROX 0.171 0.819 0.834
    (Intercept) 60.108 0.597 < 0.001 0.001 0
    sex Female -1.232 0.611 0.044
    Treatment Arm F: FOLFOX 0.693 0.709 0.329
    Treatment Arm G: IROX 0.148 0.812 0.855
    estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 11.565 0.267 < 0.001 0.005 294
    Body Mass Index (kg/m^2) 0.028 0.009 0.003
    Treatment Arm F: FOLFOX 0.046 0.118 0.699
    Treatment Arm G: IROX 0.065 0.133 0.624
    (Intercept) 12.505 0.096 < 0.001 0.032 266
    sex Female -0.642 0.099 < 0.001
    Treatment Arm F: FOLFOX 0.131 0.115 0.256
    Treatment Arm G: IROX 0.131 0.130 0.313

    To avoid confusion about which table is which endpoint, you can set term.name=TRUE in summary(). This takes the labels for each endpoint and puts them in the top-left of the table.

    > summary(modelsum(list(age, hgb) ~ bmi + sex, adjust = ~ arm, data = mockstudy), term.name = TRUE)
    Age, yrs estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 58.053 1.614 < 0.001 -0.001 33
    Body Mass Index (kg/m^2) 0.059 0.055 0.289
    Treatment Arm F: FOLFOX 0.593 0.718 0.408
    Treatment Arm G: IROX 0.171 0.819 0.834
    (Intercept) 60.108 0.597 < 0.001 0.001 0
    sex Female -1.232 0.611 0.044
    Treatment Arm F: FOLFOX 0.693 0.709 0.329
    Treatment Arm G: IROX 0.148 0.812 0.855
    hgb estimate std.error p.value adj.r.squared Nmiss
    (Intercept) 11.565 0.267 < 0.001 0.005 294
    Body Mass Index (kg/m^2) 0.028 0.009 0.003
    Treatment Arm F: FOLFOX 0.046 0.118 0.699
    Treatment Arm G: IROX 0.065 0.133 0.624
    (Intercept) 12.505 0.096 < 0.001 0.032 266
    sex Female -0.642 0.099 < 0.001
    Treatment Arm F: FOLFOX 0.131 0.115 0.256
    Treatment Arm G: IROX 0.131 0.130 0.313

    25. Model data by a non-test group (strata)

    You can also specify a grouping variable that doesn’t get tested (but instead separates results): a strata variable.

    > summary(modelsum(list(age, hgb) ~ bmi + sex, strata = arm, data = mockstudy))
    Treatment Arm estimate std.error p.value adj.r.squared Nmiss
    A: IFL (Intercept) 59.147 2.783 < 0.001 -0.002 9
    Body Mass Index (kg/m^2) 0.019 0.100 0.851
    (Intercept) 59.903 0.683 < 0.001 -0.002 0
    sex Female -0.651 1.151 0.572
    F: FOLFOX (Intercept) 57.194 2.407 < 0.001 0.001 20
    Body Mass Index (kg/m^2) 0.112 0.087 0.197
    (Intercept) 60.691 0.574 < 0.001 0.000 0
    sex Female -0.962 0.901 0.286
    G: IROX (Intercept) 59.188 2.873 < 0.001 -0.003 4
    Body Mass Index (kg/m^2) 0.023 0.104 0.822
    (Intercept) 60.702 0.759 < 0.001 0.007 0
    sex Female -2.346 1.200 0.051
    Treatment Arm estimate std.error p.value adj.r.squared Nmiss
    A: IFL (Intercept) 11.247 0.459 < 0.001 0.013 77
    Body Mass Index (kg/m^2) 0.039 0.017 0.018
    (Intercept) 12.527 0.109 < 0.001 0.037 69
    sex Female -0.703 0.182 < 0.001
    F: FOLFOX (Intercept) 11.661 0.414 < 0.001 0.004 157
    Body Mass Index (kg/m^2) 0.026 0.015 0.085
    (Intercept) 12.661 0.095 < 0.001 0.037 141
    sex Female -0.707 0.151 < 0.001
    G: IROX (Intercept) 11.874 0.457 < 0.001 0.001 60
    Body Mass Index (kg/m^2) 0.019 0.017 0.264
    (Intercept) 12.565 0.121 < 0.001 0.016 56
    sex Female -0.470 0.188 0.013

    26. Add multiple sets of adjustors to the model

    By putting multiple formulas into a list, you can use multiple sets of adjustors. Use ~ 1 or NULL for an “unadjusted” model. By using the adjustment.names=TRUE argument and giving names to your adjustor sets in the list, you can name the various analyses.

    > adj.list <- list(
    +   Unadjusted = ~ 1, # can also specify NULL here
    +   "Adjusted for Arm" = ~ arm
    + )
    > multi.adjust <- modelsum(list(age, bmi) ~ fu.time + ast, adjust = adj.list, data = mockstudy)
    > summary(multi.adjust, adjustment.names = TRUE)
    
    
    |adjustment       |                            |estimate |std.error |p.value |adj.r.squared |Nmiss |
    |:----------------|:---------------------------|:--------|:---------|:-------|:-------------|:-----|
    |Unadjusted       |(Intercept)                 |60.766   |0.512     |< 0.001 |0.002         |0     |
    |                 |**fu.time**                 |-0.001   |0.001     |0.061   |              |      |
    |Adjusted for Arm |(Intercept)                 |60.420   |0.663     |< 0.001 |0.002         |0     |
    |                 |**fu.time**                 |-0.001   |0.001     |0.039   |              |      |
    |                 |**Treatment Arm F: FOLFOX** |0.868    |0.717     |0.227   |              |      |
    |                 |**Treatment Arm G: IROX**   |0.163    |0.812     |0.841   |              |      |
    |Unadjusted       |(Intercept)                 |61.343   |0.547     |< 0.001 |0.004         |266   |
    |                 |**ast**                     |-0.030   |0.012     |0.014   |              |      |
    |Adjusted for Arm |(Intercept)                 |61.236   |0.757     |< 0.001 |0.005         |266   |
    |                 |**ast**                     |-0.030   |0.012     |0.015   |              |      |
    |                 |**Treatment Arm F: FOLFOX** |0.653    |0.779     |0.403   |              |      |
    |                 |**Treatment Arm G: IROX**   |-0.728   |0.880     |0.408   |              |      |
    
    
    |adjustment       |                            |estimate |std.error |p.value |adj.r.squared |Nmiss |
    |:----------------|:---------------------------|:--------|:---------|:-------|:-------------|:-----|
    |Unadjusted       |(Intercept)                 |26.494   |0.245     |< 0.001 |0.008         |33    |
    |                 |**fu.time**                 |0.001    |0.000     |< 0.001 |              |      |
    |Adjusted for Arm |(Intercept)                 |26.658   |0.317     |< 0.001 |0.007         |33    |
    |                 |**fu.time**                 |0.001    |0.000     |< 0.001 |              |      |
    |                 |**Treatment Arm F: FOLFOX** |-0.280   |0.341     |0.413   |              |      |
    |                 |**Treatment Arm G: IROX**   |-0.237   |0.385     |0.538   |              |      |
    |Unadjusted       |(Intercept)                 |27.331   |0.259     |< 0.001 |-0.000        |294   |
    |                 |**ast**                     |-0.005   |0.006     |0.433   |              |      |
    |Adjusted for Arm |(Intercept)                 |27.291   |0.356     |< 0.001 |-0.001        |294   |
    |                 |**ast**                     |-0.004   |0.006     |0.440   |              |      |
    |                 |**Treatment Arm F: FOLFOX** |0.181    |0.368     |0.623   |              |      |
    |                 |**Treatment Arm G: IROX**   |-0.161   |0.414     |0.698   |              |      |
    > summary(multi.adjust, adjustment.names = TRUE, show.intercept = FALSE, show.adjust = FALSE)
    
    
    |adjustment       |            |estimate |std.error |p.value |adj.r.squared |Nmiss |
    |:----------------|:-----------|:--------|:---------|:-------|:-------------|:-----|
    |Unadjusted       |**fu.time** |-0.001   |0.001     |0.061   |0.002         |0     |
    |Adjusted for Arm |**fu.time** |-0.001   |0.001     |0.039   |0.002         |0     |
    |Unadjusted       |**ast**     |-0.030   |0.012     |0.014   |0.004         |266   |
    |Adjusted for Arm |**ast**     |-0.030   |0.012     |0.015   |0.005         |266   |
    
    
    |adjustment       |            |estimate |std.error |p.value |adj.r.squared |Nmiss |
    |:----------------|:-----------|:--------|:---------|:-------|:-------------|:-----|
    |Unadjusted       |**fu.time** |0.001    |0.000     |< 0.001 |0.008         |33    |
    |Adjusted for Arm |**fu.time** |0.001    |0.000     |< 0.001 |0.007         |33    |
    |Unadjusted       |**ast**     |-0.005   |0.006     |0.433   |-0.000        |294   |
    |Adjusted for Arm |**ast**     |-0.004   |0.006     |0.440   |-0.001        |294   |

    Available Function Options

    Summary statistics

    The available summary statistics, by varible type, are:

    • ordinal: Ordinal logistic regression models
      • default: Nmiss, OR, CI.lower.OR, CI.upper.OR, p.value
      • optional: estimate, CI.OR, CI.estimate, CI.lower.estimate, CI.upper.estimate, N, Nmiss2, endpoint, std.error, statistic, logLik, AIC, BIC, edf, deviance, df.residual, p.value.lrt
    • binomial,quasibinomial: Logistic regression models
      • default: OR, CI.lower.OR, CI.upper.OR, p.value, concordance, Nmiss
      • optional: estimate, CI.OR, CI.estimate, CI.lower.estimate, CI.upper.estimate, CI.wald, CI.lower.wald, CI.upper.wald, CI.OR.wald, CI.lower.OR.wald, CI.upper.OR.wald, N, Nmiss2, Nevents, endpoint, std.error, statistic, logLik, AIC, BIC, null.deviance, deviance, df.residual, df.null, p.value.lrt
    • gaussian: Linear regression models
      • default: estimate, std.error, p.value, adj.r.squared, Nmiss
      • optional: CI.estimate, CI.lower.estimate, CI.upper.estimate, N, Nmiss2, statistic, standard.estimate, endpoint, r.squared, AIC, BIC, logLik, statistic.F, p.value.F, p.value.lrt
    • poisson, quasipoisson: Poisson regression models
      • default: RR, CI.lower.RR, CI.upper.RR, p.value, Nmiss
      • optional: CI.RR, CI.estimate, CI.lower.estimate, CI.upper.estimate, CI.RR, Nmiss2, std.error, estimate, statistic, endpoint, AIC, BIC, logLik, dispersion, null.deviance, deviance, df.residual, df.null, p.value.lrt
    • negbin: Negative binomial regression models
      • default: RR, CI.lower.RR, CI.upper.RR, p.value, Nmiss
      • optional: CI.RR, CI.estimate, CI.lower.estimate, CI.upper.estimate, CI.RR, Nmiss2, std.error, estimate, statistic, endpoint, AIC, BIC, logLik, dispersion, null.deviance, deviance, df.residual, df.null, theta, SE.theta, p.value.lrt
    • clog: Conditional Logistic models
      • default: OR, CI.lower.OR, CI.upper.OR, p.value, concordance, Nmiss
      • optional: CI.OR, CI.estimate, CI.lower.estimate, CI.upper.estimate, N, Nmiss2, estimate, std.error, endpoint, Nevents, statistic, r.squared, r.squared.max, logLik, AIC, BIC, statistic.log, p.value.log, statistic.sc, p.value.sc, statistic.wald, p.value.wald, N, std.error.concordance, p.value.lrt
    • survival: Cox models
      • default: HR, CI.lower.HR, CI.upper.HR, p.value, concordance, Nmiss
      • optional: CI.HR, CI.estimate, CI.lower.estimate, CI.upper.estimate, N, Nmiss2, estimate, std.error, endpoint, Nevents, statistic, r.squared, r.squared.max, logLik, AIC, BIC, statistic.log, p.value.log, statistic.sc, p.value.sc, statistic.wald, p.value.wald, N, std.error.concordance, p.value.lrt

    The full description of these parameters that can be shown for models include:

    • N: a count of the number of observations used in the analysis
    • Nmiss: only show the count of the number of missing values if there are some missing values
    • Nmiss2: always show a count of the number of missing values for a model
    • endpoint: dependent variable used in the model
    • std.err: print the standard error
    • statistic: test statistic
    • statistic.F: test statistic (F test)
    • p.value: print the p-value
    • p.value.lrt: print the likelihood ratio p-value for the main effect only (not the adjustors)
    • r.squared: print the model R-square
    • adj.r.squared: print the model adjusted R-square
    • r.squared.max: print the model R-square
    • concordance: print the model C statistic (which is the AUC for logistic models)
    • logLik: print the loglikelihood value
    • p.value.log: print the p-value for the overall model likelihood test
    • p.value.wald: print the p-value for the overall model wald test
    • p.value.sc: print the p-value for overall model score test
    • AIC: print the Akaike information criterion
    • BIC: print the Bayesian information criterion
    • null.deviance: null deviance
    • deviance: model deviance
    • df.residual: degrees of freedom for the residual
    • df.null: degrees of freedom for the null model
    • dispersion: This is used in Poisson models and is defined as the deviance/df.residual
    • statistic.sc: overall model score statistic
    • statistic.wald: overall model score statistic
    • statistic.log: overall model score statistic
    • std.error.concordance: standard error for the C statistic
    • HR: print the hazard ratio (for survival models), i.e. exp(beta)
    • CI.lower.HR, CI.upper.HR: print the confidence interval for the HR
    • OR: print the odd’s ratio (for logistic models), i.e. exp(beta)
    • CI.lower.OR, CI.upper.OR: print the confidence interval for the OR
    • CI.lower.OR.wald, CI.upper.OR.wald: print the Wald confidence interval for the OR
    • RR: print the risk ratio (for poisson models), i.e. exp(beta)
    • CI.lower.RR, CI.upper.RR: print the confidence interval for the RR
    • estimate: print beta coefficient
    • standardized.estimate: print the standardized beta coefficient
    • CI.lower.estimate, CI.upper.estimate: print the confidence interval for the beta coefficient
    • CI.lower.wald, CI.upper.wald: print the Wald confidence interval for the beta coefficient
    • edf: print the effective degrees of freedom.
    • theta: print the estimate of theta.
    • SE.theta: print the estimate of theta’s standard error.

    modelsum.control settings

    A quick way to see what arguments are possible to utilize in a function is to use the args() command. Settings involving the number of digits can be set in modelsum.control or in summary.modelsum.

    > args(modelsum.control)
    function (digits = 3L, digits.ratio = 3L, digits.p = 3L, format.p = TRUE, 
        show.adjust = TRUE, show.intercept = TRUE, conf.level = 0.95, 
        ordinal.stats = c("OR", "CI.lower.OR", "CI.upper.OR", "p.value", 
            "Nmiss"), binomial.stats = c("OR", "CI.lower.OR", "CI.upper.OR", 
            "p.value", "concordance", "Nmiss"), gaussian.stats = c("estimate", 
            "std.error", "p.value", "adj.r.squared", "Nmiss"), poisson.stats = c("RR", 
            "CI.lower.RR", "CI.upper.RR", "p.value", "Nmiss"), negbin.stats = c("RR", 
            "CI.lower.RR", "CI.upper.RR", "p.value", "Nmiss"), relrisk.stats = c("RR", 
            "CI.lower.RR", "CI.upper.RR", "p.value", "Nmiss"), clog.stats = c("OR", 
            "CI.lower.OR", "CI.upper.OR", "p.value", "concordance", 
            "Nmiss"), survival.stats = c("HR", "CI.lower.HR", "CI.upper.HR", 
            "p.value", "concordance", "Nmiss"), stat.labels = list(), 
        ...) 
    NULL

    summary.modelsum settings

    The summary.modelsum function has options that modify how the table appears (such as adding a title or modifying labels).

    > args(arsenal:::summary.modelsum)
    function (object, ..., labelTranslations = NULL, text = FALSE, 
        title = NULL, term.name = "", adjustment.names = FALSE) 
    NULL
    arsenal/inst/doc/paired.html0000644000176200001440000007645214056514656015605 0ustar liggesusers The paired function

    The paired function

    Ethan Heinzen, Beth Atkinson, Jason Sinnwell

    Introduction

    Another one of the most common tables in medical literature includes summary statistics for a set of variables paired across two time points. Locally at Mayo, the SAS macro %paired was written to create summary tables with a single call. With the increasing interest in R, we have developed the function paired() to create similar tables within the R environment.

    This vignette is light on purpose; paired() piggybacks off of tableby, so most documentation there applies here, too.

    Simple Example

    The first step when using the paired() function is to load the arsenal package. We can’t use mockstudy here because we need a dataset with paired observations, so we’ll create our own dataset.

    library(arsenal)
    dat <- data.frame(
      tp = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
      id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
      Cat = c("A", "A", "A", "B", "B", "B", "B", "A", NA, "B"),
      Fac = factor(c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A")),
      Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA),
      Ord = ordered(c("I", "II", "II", "III", "III", "III", "I", "III", "II", "I")),
      Lgl = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
      Dat = as.Date("2018-05-01") + c(1, 1, 2, 2, 3, 4, 5, 6, 3, 4),
      stringsAsFactors = FALSE
    )

    To create a simple table stratified by time point, use a formula= statement to specify the variables that you want summarized and the id= argument to specify the paired observations.

    p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE)
    summary(p)
    Time Point 1 (N=4) Time Point 2 (N=4) Difference (N=4) p value
    Cat 1.000
       A 2 (50.0%) 2 (50.0%) 1 (50.0%)
       B 2 (50.0%) 2 (50.0%) 1 (50.0%)
    Fac 0.261
       A 2 (50.0%) 1 (25.0%) 2 (100.0%)
       B 1 (25.0%) 2 (50.0%) 1 (100.0%)
       C 1 (25.0%) 1 (25.0%) 1 (100.0%)
    Num 0.391
       Mean (SD) 2.750 (1.258) 3.250 (0.957) 0.500 (1.000)
       Range 1.000 - 4.000 2.000 - 4.000 -1.000 - 1.000
    Ord 0.174
       I 2 (50.0%) 0 (0.0%) 2 (100.0%)
       II 1 (25.0%) 1 (25.0%) 1 (100.0%)
       III 1 (25.0%) 3 (75.0%) 0 (0.0%)
    Lgl 1.000
       FALSE 2 (50.0%) 1 (25.0%) 2 (100.0%)
       TRUE 2 (50.0%) 3 (75.0%) 1 (50.0%)
    Dat 0.182
       Median 2018-05-03 2018-05-04 0.500
       Range 2018-05-02 - 2018-05-06 2018-05-02 - 2018-05-07 0.000 - 1.000

    The third column shows the difference between time point 1 and time point 2. For categorical variables, it reports the percent of observations from time point 1 which changed in time point 2.

    NAs

    Note that by default, observations which do not have both timepoints are removed. This is easily changed using the na.action = na.paired("<arg>") argument. For example:

    p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id,
                signed.rank.exact = FALSE, na.action = na.paired("fill"))
    summary(p)
    Time Point 1 (N=6) Time Point 2 (N=6) Difference (N=6) p value
    Cat 1.000
       N-Miss 2 1 2
       A 2 (50.0%) 2 (40.0%) 1 (50.0%)
       B 2 (50.0%) 3 (60.0%) 1 (50.0%)
    Fac 0.261
       N-Miss 1 1 2
       A 2 (40.0%) 2 (40.0%) 2 (100.0%)
       B 1 (20.0%) 2 (40.0%) 1 (100.0%)
       C 2 (40.0%) 1 (20.0%) 1 (100.0%)
    Num 0.391
       N-Miss 1 2 2
       Mean (SD) 2.200 (1.643) 3.250 (0.957) 0.500 (1.000)
       Range 0.000 - 4.000 2.000 - 4.000 -1.000 - 1.000
    Ord 0.174
       N-Miss 1 1 2
       I 2 (40.0%) 1 (20.0%) 2 (100.0%)
       II 2 (40.0%) 1 (20.0%) 1 (100.0%)
       III 1 (20.0%) 3 (60.0%) 0 (0.0%)
    Lgl 1.000
       N-Miss 1 1 2
       FALSE 3 (60.0%) 2 (40.0%) 2 (100.0%)
       TRUE 2 (40.0%) 3 (60.0%) 1 (50.0%)
    Dat 0.182
       N-Miss 1 1 2
       Median 2018-05-04 2018-05-05 0.500
       Range 2018-05-02 - 2018-05-06 2018-05-02 - 2018-05-07 0.000 - 1.000

    For more details, see the help page for na.paired().

    Available Function Options

    Testing options

    The tests used to calculate p-values differ by the variable type, but can be specified explicitly in the formula statement or in the control function.

    The following tests are accepted:

    • paired.t: A paired t-test.

    • mcnemar: McNemar’s test.

    • signed.rank: the signed-rank test.

    • sign.test: the sign test.

    • notest: Don’t perform a test.

    paired.control settings

    A quick way to see what arguments are possible to utilize in a function is to use the args() command. Settings involving the number of digits can be set in paired.control or in summary.tableby.

    args(paired.control)
    ## function (diff = TRUE, numeric.test = "paired.t", cat.test = "mcnemar", 
    ##     ordered.test = "signed.rank", date.test = "paired.t", mcnemar.correct = TRUE, 
    ##     signed.rank.exact = NULL, signed.rank.correct = TRUE, ...) 
    ## NULL

    summary.tableby settings

    Since the “paired” object inherits “tableby”, the summary.tableby function is what’s actually used to format and print the table.

    args(arsenal:::summary.tableby)
    ## function (object, ..., labelTranslations = NULL, text = FALSE, 
    ##     title = NULL, pfootnote = FALSE, term.name = "") 
    ## NULL