mitml/NAMESPACE
import(stats)
import(jomo)
import(pan)
importFrom(graphics, abline, layout, par, plot, plot.new, axTicks, axis, lines, text, title)
importFrom(grDevices, dev.new, dev.off, devAskNewPage)
importFrom(methods, slot)
importFrom(utils, flush.console, tail, write.table)
export(panImpute, jomoImpute, mitmlComplete, clusterMeans, plot.mitml, read.mitml,
       summary.mitml, multilevelR2, testEstimates, testModels, testConstraints,
       with.mitml.list, within.mitml.list, long2mitml.list, jomo2mitml.list,
       mids2mitml.list, mitml.list2mids, amelia2mitml.list, as.mitml.list,
       is.mitml.list, c.mitml.list, cbind.mitml.list, rbind.mitml.list,
       sort.mitml.list, subset.mitml.list, anova.mitml.result,
       confint.mitml.testEstimates, write.mitml, write.mitmlMplus,
       write.mitmlSAV, write.mitmlSPSS)
S3method(c, mitml.list)
S3method(cbind, mitml.list)
S3method(rbind, mitml.list)
S3method(sort, mitml.list)
S3method(subset, mitml.list)
S3method(with, mitml.list)
S3method(within, mitml.list)
S3method(plot, mitml)
S3method(anova, mitml.result)
S3method(confint, mitml.testEstimates)
S3method(print, mitml)
S3method(print, mitml.summary)
S3method(print, mitml.testEstimates)
S3method(print, mitml.testModels)
S3method(print, mitml.testConstraints)
S3method(print, mitml.anova)
S3method(summary, mitml)
S3method(summary, mitml.testEstimates)
S3method(summary, mitml.testModels)
S3method(summary, mitml.testConstraints)
S3method(.getCoef, default)
S3method(.getCoef, merMod)
S3method(.getCoef, lme)
S3method(.getCoef, lavaan)
S3method(.getCoef, coxph.null)
S3method(.getCoef, polr)
S3method(.getVcov, default)
S3method(.getVcov, lavaan)
S3method(.getVcov, coxph.null)
S3method(.getMisc, default)
S3method(.getMisc, lm)
S3method(.getMisc, glm)
S3method(.getMisc, merMod)
S3method(.getMisc, lme)
S3method(.getMisc, geeglm)
S3method(.getMisc, lavaan)
S3method(.getLL, default)
S3method(.getLL, geeglm)
S3method(.getLL, lavaan)
S3method(.getArgsLL, default)
S3method(.getArgsLL, lm)
S3method(.getArgsLL, glm)
S3method(.getArgsLL, geeglm)
S3method(.getArgsLL, lmerMod)
S3method(.getArgsLL, lme)
S3method(.getArgsLL, lavaan)
S3method(.getUserLL, default)
S3method(.getUserLL, lm)
S3method(.getUserLL, lmerMod)
S3method(.getUserLL, lme)
S3method(.getUserLL, lavaan)
S3method(.getDataLL, default)
S3method(.getDataLL, lme)
S3method(.getDataLL, lavaan)
S3method(.updateStackedLL, default)
S3method(.updateStackedLL, merMod)
S3method(.updateStackedLL, lme)
S3method(.updateStackedLL, lavaan)
S3method(.checkREML, default)
S3method(.checkREML, merMod)
S3method(.checkREML, lme)
S3method(.updateML, default)
S3method(.updateML, merMod)
S3method(.updateML, lme)
S3method(.getDFs, default)
S3method(.getDFs, lavaan)
S3method(.getFormula, default)
S3method(.getFormula, lme)
S3method(.getFormula, lavaan)

mitml/README.md
# mitml

#### Tools for multiple imputation in multilevel modeling

This [R](https://www.r-project.org/) package provides tools for multiple imputation of missing data in multilevel modeling. It includes a user-friendly interface to the packages `pan` and `jomo`, and several functions for visualization, data management, and the analysis of multiply imputed data sets.
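For example, a minimal sketch of the typical workflow (imputation, analysis of each imputed data set, and pooling of the results), using the `studentratings` example data included in the package, might look as follows:

```r
library(mitml)
data(studentratings)

# impute missing data with a random-intercept imputation model
fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

# extract the list of imputed data sets and fit a model to each of them
implist <- mitmlComplete(imp)
fit <- with(implist, lm(ReadDis ~ ReadAchiev))

# pool the results across imputations ("Rubin's rules")
testEstimates(fit)
```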
The purpose of `mitml` is to provide users with a set of effective and user-friendly tools for multiple imputation of multilevel data without requiring advanced knowledge of its statistical underpinnings. Examples and additional information can be found in the official [documentation](https://cran.r-project.org/package=mitml/mitml.pdf) of the package and in the [Wiki](https://github.com/simongrund1/mitml/wiki) pages on GitHub.

If you use `mitml` and have suggestions for improvement, please email me (see [here](https://cran.r-project.org/package=mitml)) or file an [issue](https://github.com/simongrund1/mitml/issues) at the GitHub repository.

#### CRAN version

The official version of `mitml` is hosted on CRAN and may be found [here](https://cran.r-project.org/package=mitml). The CRAN version can be installed from within R using:

```r
install.packages("mitml")
```

[![CRAN release](http://www.r-pkg.org/badges/version/mitml)](https://cran.r-project.org/package=mitml) [![CRAN downloads](http://cranlogs.r-pkg.org/badges/mitml)](https://cran.r-project.org/package=mitml)

#### GitHub version

The version hosted here is the development version of `mitml`. It allows better tracking of [issues](https://github.com/simongrund1/mitml/issues) and may contain new features and changes ahead of the CRAN release. The GitHub version can be installed using `devtools` as:

```r
install.packages("devtools")
devtools::install_github("simongrund1/mitml")
```

![Github commits](https://img.shields.io/github/commits-since/simongrund1/mitml/latest.svg?colorB=green)

mitml/data/justice.rda
[binary R data file; contents omitted]
mitml/man/is.mitml.list.Rd
\name{is.mitml.list}
\alias{is.mitml.list}
\title{Check if an object is of class \code{mitml.list}}
\description{
This function checks if its argument is a list of class \code{mitml.list}.
}
\usage{
is.mitml.list(x)
}
\arguments{
  \item{x}{An R object.}
}
\value{
\code{TRUE} or \code{FALSE}. A warning message is displayed if the contents of \code{x} do not appear to be data frames.
}
\author{Simon Grund}
\seealso{\code{\link{as.mitml.list}}}
\examples{
l <- list(data.frame(x = rnorm(20)))
l <- as.mitml.list(l)
is.mitml.list(l) # TRUE

l <- as.list(1:10)
is.mitml.list(l) # FALSE

class(l) <- "mitml.list"
is.mitml.list(l) # TRUE, with a warning
}

mitml/man/long2mitml.list.Rd
\name{long2mitml.list}
\alias{long2mitml.list}
\alias{jomo2mitml.list}
\title{Convert imputations from long format to \code{mitml.list}}
\description{
These functions convert data sets containing multiple imputations in long format to objects of class \code{mitml.list}. The resulting object can be used in further analyses.
}
\usage{
long2mitml.list(x, split, exclude = NULL)

jomo2mitml.list(x)
}
\arguments{
  \item{x}{A data frame in long format containing multiple imputations (see 'Details').}
  \item{split}{A character string denoting the column in \code{x} that identifies different imputations (see 'Details').}
  \item{exclude}{A vector denoting the values of \code{split} that should be excluded.}
}
\details{
The function \code{long2mitml.list} converts data frames from the long format to \code{mitml.list} (i.e., a list of imputed data sets). In long format, all imputations are contained in a single data frame, where different imputations are denoted by \code{split}. This function splits the data frame into a list of imputed data sets according to \code{split}, excluding the values specified by \code{exclude} (see the 'Examples').

The \code{jomo2mitml.list} function is a special case of \code{long2mitml.list} which converts imputations that have been generated with \code{jomo} (see the \code{jomo} package).
}
\value{
A list of imputed data sets with class \code{mitml.list}.
}
\author{Simon Grund}
\seealso{\code{\link{mitmlComplete}}}
\examples{
data(studentratings)
require(jomo)

# impute data using jomo (native functions)
clus <- studentratings[, "ID"]
Y <- studentratings[, c("ReadAchiev", "ReadDis")]
imp <- jomo(Y = Y, clus = clus, nburn = 1000, nbetween = 100, nimp = 5)

# split imputations
impList <- long2mitml.list(imp, split = "Imputation", exclude = 0)
impList <- jomo2mitml.list(imp)
}

mitml/man/plot.mitml.Rd
\name{plot.mitml}
\alias{plot.mitml}
\title{Print diagnostic plots}
\description{
Generates diagnostic plots for assessing the convergence and autocorrelation behavior of \code{pan}'s and \code{jomo}'s MCMC algorithms.
}
\usage{
\method{plot}{mitml}(x, print = c("beta", "beta2", "psi", "sigma"),
  pos = NULL, group = "all", trace = c("imputation", "burnin", "all"),
  thin = 1, smooth = 3, n.Rhat = 3, export = c("none", "png", "pdf"),
  dev.args = list(), ...)
}
\arguments{
  \item{x}{An object of class \code{mitml} as produced by \code{panImpute} or \code{jomoImpute}.}
  \item{print}{A character vector containing one or several of \code{"beta"}, \code{"beta2"}, \code{"psi"} or \code{"sigma"} denoting which parameters should be plotted. Default is to plot all parameters.}
  \item{pos}{Either \code{NULL} or an integer vector denoting a specific entry in \code{"beta"}, \code{"beta2"}, \code{"psi"} or \code{"sigma"}. Default is \code{NULL}, which plots all entries.}
  \item{group}{Either \code{"all"} or an integer denoting for which group the plots should be generated. Used only when groupwise imputation was used. Default is \code{"all"}.}
  \item{trace}{One of \code{"imputation"}, \code{"burnin"} or \code{"all"} denoting which part of the chain should be used for the trace plot. Default is \code{"imputation"}, which plots only the iterations after burn-in.}
  \item{thin}{An integer denoting the thinning factor that is applied before plotting. Default is \code{1}, which corresponds to no thinning.}
  \item{smooth}{A numeric value denoting the smoothing factor for the trend line in trace plots. Higher values correspond to less smoothing. Default is \code{3}. If set to \code{0} or \code{NULL}, the trend line is suppressed.}
  \item{n.Rhat}{An integer denoting the number of segments of each chain used for calculating the potential scale reduction factor. Default is \code{3}.}
  \item{export}{(optional) A character string specifying if plots should be exported to a file. If \code{"png"} or \code{"pdf"}, then plots are printed into a folder named "mitmlPlots" in the current directory using either the \code{png} or the \code{pdf} device. Default is \code{"none"}, which does not export files.}
  \item{dev.args}{(optional) A named list containing additional arguments that are passed to the graphics device.}
  \item{\dots}{Parameters passed to the plotting functions.}
}
\details{
The \code{plot} method generates a series of plots for the parameters of the imputation model which can be used for diagnostic purposes. In addition, a short summary of the parameter chain is displayed.

Setting \code{print} to \code{"beta"}, \code{"beta2"}, \code{"psi"} and \code{"sigma"} will plot the fixed effects, the variances and covariances of the random effects, and the variances and covariances of the residuals, respectively. In this context, \code{"beta2"} refers to the fixed effects for target variables at level 2 and is only used when a two-part model was specified (see \code{\link{jomoImpute}}).
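For example, a call such as \code{plot(imp, print = c("beta", "psi"))} would restrict the plots to the fixed effects and the variances and covariances of the random effects.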
Each plotting window contains a trace plot (upper left), an autocorrelation plot (lower left), a kernel density approximation of the posterior distribution (upper right), and a posterior summary (lower right). The summary includes the following quantities:
\describe{
  \item{\code{EAP}:}{Expected value a posteriori (i.e., the mean of the parameter chain)}
  \item{\code{MAP}:}{Mode a posteriori (i.e., the mode of the parameter chain)}
  \item{\code{SD}:}{Standard deviation of the parameter chain}
  \item{\code{2.5\%}:}{The 2.5\% quantile of parameter values}
  \item{\code{97.5\%}:}{The 97.5\% quantile of parameter values}
  \item{\code{Rhat}:}{Estimated potential scale reduction factor (\eqn{\hat{R}})}
  \item{\code{ACF-k}:}{Smoothed autocorrelation at lag \eqn{k}, where \eqn{k} is the number of iterations between imputations (see \code{\link{summary.mitml}})}
}
The \code{trace} and \code{smooth} arguments can be used to influence how the trace plot is drawn and what part of the chain should be used for it. The \code{thin} argument can be used for thinning the chain before plotting, in which case the number of data points is reduced in the trace plot, and the autocorrelation is calculated up to lag \eqn{k/thin} (see above). The \code{n.Rhat} argument controls the number of segments that are used for calculating the potential scale reduction factor (\eqn{\hat{R}}) in each plot (see \code{summary.mitml}). Further arguments to the graphics device are supplied using the \code{dev.args} argument.

The \code{plot} function computes and displays diagnostic information primarily for the imputation phase (i.e., for iterations after burn-in). This is the default in the \code{plot} function and the recommended method for most users. If \code{trace = "all"}, the full chain is displayed with emphasis on the imputation phase, and the posterior summary is calculated based only on the iterations after burn-in (as recommended). If \code{trace = "burnin"}, the posterior summary and the trace plots are calculated based only on the burn-in iterations, which is generally not sufficient to establish convergence and should be used with caution.
}
\note{
The plots are presented on-screen one at a time. To proceed with the next plot, the user may left-click in the plotting window or press the "enter" key while in the R console, depending on the operating system. No plots are displayed when exporting to file.
}
\value{
None (invisible \code{NULL}).
}
\author{Simon Grund}
\seealso{\code{\link{panImpute}}, \code{\link{jomoImpute}}, \code{\link{summary.mitml}}}
\examples{
\dontrun{
data(studentratings)

# * Example 1: simple imputation

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

# plot all parameters (default)
plot(imp)

# plot fixed effects only
plot(imp, print = "beta")

# export plots to file (using pdf device)
plot(imp, export = "pdf", dev.args = list(width = 9, height = 4, pointsize = 12))

# * Example 2: groupwise imputation

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, group = FedState,
                 n.burn = 1000, n.iter = 100, m = 5)

# plot fixed effects for all groups (default for 'group')
plot(imp, print = "beta", group = "all")

# plot fixed effects for first group only
plot(imp, print = "beta", group = 1)
}
}
\keyword{methods}

mitml/man/write.mitml.Rd
\name{write.mitml}
\alias{write.mitml}
\title{Write \code{mitml} objects to file}
\description{
Saves objects of class \code{mitml} in R binary formats (similar to \code{save}).
}
\usage{
write.mitml(x, filename, drop = FALSE)
}
\arguments{
  \item{x}{An object of class \code{mitml} as produced by \code{panImpute} or \code{jomoImpute}.}
  \item{filename}{Name of the destination file, specified with file extension (e.g., \code{.R}, \code{.Rdata}).}
  \item{drop}{Logical flag indicating if the parameters of the imputation model should be dropped to reduce file size. Default is \code{FALSE}.}
}
\value{
None (invisible \code{NULL}).
}
\author{Simon Grund}
\seealso{\code{\link{panImpute}}, \code{\link{jomoImpute}}, \code{\link{read.mitml}}}
\examples{
\dontrun{
data(studentratings)

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

# write full 'mitml' object (default)
write.mitml(imp, filename = "imputation.Rdata")

# drop parameters of the imputation model
write.mitml(imp, filename = "imputation.Rdata", drop = TRUE)
}
}

mitml/man/mitmlComplete.Rd
\name{mitmlComplete}
\alias{mitmlComplete}
\title{Extract imputed data sets}
\description{
This function extracts imputed data sets from \code{mitml} class objects as produced by \code{panImpute} and \code{jomoImpute}.
}
\usage{
mitmlComplete(x, print = "all", force.list = FALSE)
}
\arguments{
  \item{x}{An object of class \code{mitml} as produced by \code{panImpute} or \code{jomoImpute}.}
  \item{print}{Either an integer vector, \code{"list"}, or \code{"all"} denoting which data sets to extract. If set to \code{"list"} or \code{"all"}, all imputed data sets will be returned as a list. Negative values and zero return the original (incomplete) data set. Default is \code{"all"}.}
  \item{force.list}{(optional) Logical flag indicating if single data sets should be enclosed in a list. Default is \code{FALSE}.}
}
\value{
Usually a list of imputed data sets with class \code{mitml.list}. If only one data set is extracted, a data frame is returned unless \code{force.list = TRUE}.
}
\author{Simon Grund}
\seealso{\code{\link{panImpute}}, \code{\link{jomoImpute}}}
\examples{
data(studentratings)

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

# extract original (incomplete) data set
mitmlComplete(imp, print = 0)

# extract first imputed data set (returned as mitml.list)
mitmlComplete(imp, print = 1, force.list = TRUE)

# extract all imputed data sets at once
implist <- mitmlComplete(imp, print = "all")

\dontrun{
# ... alternatives with same results
implist <- mitmlComplete(imp, print = 1:5)
implist <- mitmlComplete(imp, print = "list")
}
}

mitml/man/mitml.list2mids.Rd
\name{mitml.list2mids}
\alias{mitml.list2mids}
\title{Convert objects of class \code{mitml.list} to \code{mids}}
\description{
This function converts a \code{mitml.list} class object to \code{mids} (as used in the \code{mice} package).
}
\usage{
mitml.list2mids(x, data, fill = FALSE, where = NULL)
}
\arguments{
  \item{x}{A list of imputed data sets with class \code{mitml.list} (as produced by \code{\link{mitmlComplete}}, \code{\link{mids2mitml.list}}, or similar).}
  \item{data}{A data frame containing the original (incomplete) data (see 'Details').}
  \item{fill}{A logical flag indicating whether variables in the imputed data that are not in the original data should be added and filled with \code{NA} (default is \code{FALSE}).}
  \item{where}{(optional) A data frame or matrix of logicals indicating the location of missing values (see 'Details').}
}
\details{
This function converts objects of class \code{mitml.list} into \code{mids} objects (as used in the \code{mice} package). The conversion requires a list of imputed data sets and the original (incomplete) data set. If the imputed data sets have been appended with new variables (e.g., by \code{\link{within.mitml.list}}), the new variables can be added to the original data set by setting \code{fill = TRUE}.

This function is essentially a wrapper around \code{\link[mice:as.mids]{as.mids}} that sets the case and imputation identifiers automatically and passes the \code{where} argument as is (see also the documentation of \code{\link[mice:as.mids]{as.mids}}).
}
\value{
An object of class \code{mids}.
}
\author{Simon Grund}
\seealso{\code{\link{mitmlComplete}}, \code{\link{mids2mitml.list}}, \code{\link{within.mitml.list}}}
\examples{
data(studentratings)

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

implist <- mitmlComplete(imp)

# * Example 1: simple conversion

# convert to 'mids'
impmids <- mitml.list2mids(implist, data = studentratings)

# * Example 2: conversion with additional variables

# compute new variables
implist <- within(implist, {
  M.ReadAchiev <- clusterMeans(ReadAchiev, ID)
  C.ReadAchiev <- ReadAchiev - M.ReadAchiev
})

# convert to 'mids'
impmids <- mitml.list2mids(implist, data = studentratings, fill = TRUE)
}

mitml/man/mids2mitml.list.Rd
\name{mids2mitml.list}
\alias{mids2mitml.list}
\title{Convert objects of class \code{mids} to \code{mitml.list}}
\description{
This function converts a \code{mids} class object (as produced by the \code{mice} package) to \code{mitml.list}. The resulting object may be used in further analyses.
}
\usage{
mids2mitml.list(x)
}
\arguments{
  \item{x}{An object of class \code{mids} as produced by \code{mice} (see the \code{mice} package).}
}
\value{
A list of imputed data sets with class \code{mitml.list}.
}
\author{Simon Grund}
\seealso{\code{\link{mitmlComplete}}}
\examples{
data(studentratings)

# imputation using mice
require(mice)
imp <- mice(studentratings)

implist <- mids2mitml.list(imp)
}

mitml/man/write.mitmlSPSS.Rd
\name{write.mitmlSPSS}
\alias{write.mitmlSPSS}
\title{Write \code{mitml} objects to SPSS compatible format}
\description{
Saves objects of class \code{mitml} as a text and a syntax file which can be processed by the statistical software SPSS (IBM Corp., 2013).
}
\usage{
write.mitmlSPSS(x, filename, sep = "\t", dec = ".", na.value = -999,
  syntax = TRUE, locale = NULL)
}
\arguments{
  \item{x}{An object of class \code{mitml} or \code{mitml.list} (as produced by \code{panImpute}, \code{jomoImpute}, \code{mitmlComplete}, or similar).}
  \item{filename}{File base name of the data and syntax files, specified without file extension.}
  \item{sep}{The field separator.}
  \item{dec}{The decimal separator.}
  \item{na.value}{A numeric value coding the missing data in the resulting data file.}
  \item{syntax}{A logical flag indicating if an SPSS syntax file should be generated. This file contains instructions for SPSS for reading in the data file. Default is \code{TRUE}.}
  \item{locale}{(optional) A character string specifying the localization to be used in SPSS (e.g., \code{"en_US"}, \code{"de_DE"}; see 'Details').}
}
\details{
In SPSS, multiply imputed data are contained in a single file, in which an \code{Imputation_} variable separates the original data and the various imputed data sets. During export, factors are converted to numeric, whereas character variables are left ``as is''.

By default, \code{write.mitmlSPSS} generates a raw text file containing the data, along with a syntax file containing instructions for SPSS. This syntax file mimics SPSS's functionality to read text files with sensible settings. In order to read in the data, the syntax file must be opened and executed using SPSS, or opened using the GUI. Manual changes to the syntax file may be required, for example, if the file path of the data file is not correctly represented in the syntax. The \code{locale} argument can be used to ensure that SPSS reads the data in the correct locale.

Alternatively, \code{\link{write.mitmlSAV}} may be used for exporting directly to the SPSS native \code{.sav} format.
}
\value{
None (invisible \code{NULL}).
}
\references{
IBM Corp. (2013). \emph{IBM SPSS Statistics for Windows}. Armonk, NY: IBM Corp.
}
\author{Simon Grund}
\seealso{\code{\link{panImpute}}, \code{\link{jomoImpute}}, \code{\link{mitmlComplete}}, \code{\link{write.mitmlSAV}}}
\examples{
\dontrun{
data(studentratings)

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

# write data file and SPSS syntax
write.mitmlSPSS(imp, filename = "imputation", sep = "\t", dec = ".",
                na.value = -999, locale = "en_US")
}
}

mitml/man/anova.mitml.result.Rd
\name{anova.mitml.result}
\alias{anova.mitml.result}
\title{Compare several nested models}
\description{
Performs model comparisons for a series of nested statistical models fitted using \code{with.mitml.list}.
}
\usage{
\method{anova}{mitml.result}(object, ...,
  method = c("D3", "D4", "D2"),
  ariv = c("default", "positive", "robust"),
  data = NULL)
}
\arguments{
  \item{object}{An object of class \code{mitml.result} as produced by \code{\link{with.mitml.list}}.}
  \item{\dots}{Additional objects of class \code{mitml.result} to be compared.}
  \item{method}{A character string denoting the method used for the model comparison. Can be \code{"D3"}, \code{"D4"} or \code{"D2"} (see 'Details'). Default is \code{"D3"}.}
  \item{ariv}{A character string denoting how the ARIV is calculated. Can be \code{"default"}, \code{"positive"}, or \code{"robust"} (see 'Details').}
  \item{data}{(optional) A list of imputed data sets (see 'Details').}
}
\details{
This function performs likelihood-based comparisons between multiple statistical models fitted with \code{\link{with.mitml.list}}. If possible, the models are compared using the \eqn{D_3} statistic (Meng & Rubin, 1992). If this method is unavailable, the \eqn{D_4} or \eqn{D_2} statistic is used instead (Chan & Meng, 2019; Li, Meng, Raghunathan, & Rubin, 1991).

This function is essentially a wrapper for \code{\link{testModels}} with the advantage that several models can be compared simultaneously. For a list of supported models and further options for more specific model comparisons, see \code{testModels}.

The \code{ariv} argument affects how the average relative increase in variance is calculated (see also \code{testModels}). Note that the \eqn{D_4} method can fail if the data to which the model was fitted cannot be found. In such a case, the \code{data} argument can be used to specify the list of imputed data sets directly (see also \code{testModels}).
}
\value{
A list containing the results of each model comparison. A \code{print} method is used for more readable output.
}
\author{Simon Grund}
\seealso{\code{\link{with.mitml.list}}, \code{\link{testModels}}}
\examples{
require(lme4)
data(studentratings)

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

implist <- mitmlComplete(imp)

# simple comparison (same as testModels)
fit0 <- with(implist, lmer(ReadAchiev ~ (1|ID), REML = FALSE))
fit1 <- with(implist, lmer(ReadAchiev ~ ReadDis + (1|ID), REML = FALSE))
anova(fit1, fit0)

\dontrun{
# multiple comparisons
fit2 <- with(implist, lmer(ReadAchiev ~ ReadDis + (1 + ReadDis|ID), REML = FALSE))
anova(fit2, fit1, fit0)
}
}
\references{
Laird, N., Lange, N., & Stram, D. (1987). Maximum likelihood computations with repeated measures: Application of the EM algorithm. \emph{Journal of the American Statistical Association, 82}, 97-105.

Li, K. H., Raghunathan, T. E., & Rubin, D. B. (1991). Large-sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution. \emph{Journal of the American Statistical Association, 86}, 1065-1073.

Meng, X.-L., & Rubin, D. B. (1992). Performing likelihood ratio tests with multiply-imputed data sets. \emph{Biometrika, 79}, 103-111.
}
\keyword{methods}

mitml/man/testEstimates.Rd
\name{testEstimates}
\alias{testEstimates}
\title{Compute final estimates and inferences}
\description{
Computes final parameter estimates and inferences from multiply imputed data sets.
}
\usage{
testEstimates(model, qhat, uhat, extra.pars = FALSE, df.com = NULL, ...)
}
\arguments{
  \item{model}{A list of fitted statistical models as produced by \code{\link{with.mitml.list}} or similar.}
  \item{qhat}{A matrix or list containing the point estimates of the parameters for each imputed data set (see 'Details').}
  \item{uhat}{An array, matrix, or list containing the variance estimates (i.e., squared standard errors) of the parameters for each imputed data set (see 'Details').}
  \item{extra.pars}{A logical flag indicating if estimates of additional parameters (e.g., variance components) should be calculated. Default is \code{FALSE}.}
  \item{df.com}{(optional) A numeric vector denoting the complete-data degrees of freedom for the hypothesis tests (see 'Details').}
  \item{\dots}{Not used.}
}
\details{
This function calculates pooled parameter estimates and inferences as suggested by Rubin (1987, "Rubin's rules") for each parameter of the fitted model. The parameters can either be extracted automatically from the fitted statistical models (\code{model}) or provided manually as matrices, arrays, or lists (\code{qhat} and \code{uhat}, see 'Examples').

Rubin's original method assumes that the complete-data degrees of freedom are infinite, which is reasonable in larger samples. Alternatively, the degrees of freedom can be adjusted for smaller samples by specifying \code{df.com} (Barnard & Rubin, 1999). The \code{df.com} argument can either be a single number if the degrees of freedom are equal for all parameters being tested, or a numeric vector with one element per parameter.

Using the \code{extra.pars} argument, pooled estimates for additional parameters can be requested (e.g., variance components). This option is available for a number of models but may not provide estimates for all parameters in all model types. In such a case, users may extract the estimates of additional parameters by hand and pool them with the \code{qhat} argument (see 'Examples'). No inferences are calculated for pooled additional parameters.

Currently, the procedure supports automatic extraction of model parameters from models that define \code{coef} and \code{vcov} methods (e.g., \code{lm}, \code{glm}, \code{lavaan} and others) as well as multilevel models estimated with \code{lme4} or \code{nlme} and GEEs estimated with \code{geepack}. The arguments \code{qhat} and \code{uhat} provide a general method for pooling parameter estimates regardless of model type (see 'Examples'). Support for further models may be added in future releases.
}
\value{
A list containing the pooled parameter estimates and inferences. A \code{print} method is used for more readable output.
}
\references{
Barnard, J., & Rubin, D. B. (1999). Small-sample degrees of freedom with multiple imputation. \emph{Biometrika, 86}, 948-955.

Rubin, D. B. (1987). \emph{Multiple imputation for nonresponse in surveys}. Hoboken, NJ: Wiley.
}
\author{Simon Grund}
\seealso{\code{\link{with.mitml.list}}, \code{\link{confint.mitml.testEstimates}}}
\examples{
data(studentratings)

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

implist <- mitmlComplete(imp)

# fit multilevel model using lme4
require(lme4)
fit.lmer <- with(implist, lmer(SES ~ (1|ID)))

# * Example 1: pool estimates of fitted models (automatic)
# pooled estimates and inferences separately for each parameter (Rubin's rules)
testEstimates(fit.lmer)

# ... adjusted df for finite samples
testEstimates(fit.lmer, df.com = 49)

# ... with additional table for variance components and ICCs
testEstimates(fit.lmer, extra.pars = TRUE)

# * Example 2: pool estimates using matrices or lists (qhat, uhat)
fit.lmer <- with(implist, lmer(SES ~ ReadAchiev + (1|ID)))

qhat <- sapply(fit.lmer, fixef)
uhat <- sapply(fit.lmer, function(x) diag(vcov(x)))

testEstimates(qhat = qhat, uhat = uhat)
}

mitml/man/subset.mitml.list.Rd
\name{subset.mitml.list}
\alias{subset.mitml.list}
\title{Subset a list of imputed data sets}
\description{
Creates data subsets for a list of multiply imputed data sets.
}
\usage{
\method{subset}{mitml.list}(x, subset, select, ...)
}
\arguments{
  \item{x}{A list of imputed data sets with class \code{mitml.list} as produced by \code{mitmlComplete} (or similar).}
  \item{subset}{An R expression by which to subset each data set.}
  \item{select}{An R expression by which to select columns.}
  \item{\dots}{Not used.}
}
\details{
This function can be used to create subsets and select variables for a list of multiply imputed data sets according to the R expressions given in the \code{subset} and \code{select} arguments. The function is based on the \code{subset} function for regular data sets and works in a similar manner. Note that subsetting is performed individually for each data set. For this reason, the cases included may differ across data sets if the variables used for subsetting contain different values.
}
\value{
A list of imputed data sets with class \code{mitml.list}.
}
\author{Simon Grund}
\examples{
data(studentratings)

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

implist <- mitmlComplete(imp)

# * Example 1: subset by SES, select variables by name
subset(implist, SES < 25, select = c(ID, FedState, Sex, SES, ReadAchiev, ReadDis))

# * Example 2: subset by FedState, select variables by column number
subset(implist, FedState == "SH", select = -c(6:7, 9:10))

\dontrun{
# * Example 3: subset by ID and Sex
subset(implist, ID %in% 1001:1005 & Sex == "Boy")

# * Example 4: select variables by name range
subset(implist, select = ID:Sex)
}
}
\keyword{methods}

mitml/man/c.mitml.list.Rd
\name{c.mitml.list}
\alias{c.mitml.list}
\alias{rbind.mitml.list}
\alias{cbind.mitml.list}
\title{Concatenate lists of imputed data sets}
\description{
These functions allow concatenating lists of imputed data sets by data set, row, or column.
}
\usage{
\method{c}{mitml.list}(...)
\method{rbind}{mitml.list}(...)
\method{cbind}{mitml.list}(...)
}
\arguments{
  \item{\dots}{One or several lists of imputed data sets with class \code{mitml.list} as produced by \code{mitmlComplete} (or similar).}
}
\details{
The \code{c}, \code{cbind}, and \code{rbind} functions allow combining multiple lists of imputed data sets in different ways. The \code{c} method concatenates by data set (i.e., by appending additional data sets to the list), \code{rbind} concatenates by row (i.e., by appending additional rows to each data set), and \code{cbind} concatenates by column (i.e., by appending additional columns to each data set).

These functions are intended for experienced users and should be used with caution. Appending rows or columns from multiple imputation procedures is usually unsafe except in special applications (see 'Examples').
}
\value{
A list of imputed data sets with an additional class attribute \code{mitml.list}.
}
\author{Simon Grund}
\examples{
# Example 1: manual imputation by grouping variable

data(studentratings)
fml <- ReadDis + SES ~ ReadAchiev + (1|ID)

imp1 <- panImpute(subset(studentratings, FedState == "SH"), formula = fml,
                  n.burn = 1000, n.iter = 100, m = 5)
imp2 <- panImpute(subset(studentratings, FedState == "B"), formula = fml,
                  n.burn = 1000, n.iter = 100, m = 5)

implist1 <- mitmlComplete(imp1)
implist2 <- mitmlComplete(imp2)

rbind(implist1, implist2)

# Example 2: predicted values from linear model

imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)
implist <- mitmlComplete(imp)

pred <- with(implist, predict(lm(ReadDis ~ ReadAchiev)))
cbind(implist, pred.ReadDis = pred)
}
\keyword{methods}

mitml/man/as.mitml.list.Rd
\name{as.mitml.list}
\alias{as.mitml.list}
\title{Convert a list of data sets to \code{mitml.list}}
\description{
This function adds a \code{mitml.list} class attribute to a list of data frames. The resulting object can be used in further analyses.
}
\usage{
as.mitml.list(x)
}
\arguments{
  \item{x}{A list of data frames.}
}
\value{
The original list with an additional class attribute \code{mitml.list}. The list entries are converted into a \code{data.frame} if necessary, in which case a note is printed.
}
\author{Simon Grund}
\seealso{\code{\link{is.mitml.list}}, \code{\link{long2mitml.list}}}
\examples{
# data frame with 'imputation' indicator
dat <- data.frame(imputation = rep(1:10, each = 20), x = rnorm(200))

# split into a list and convert to 'mitml.list'
l <- split(dat, dat$imputation)
l <- as.mitml.list(l)

is.mitml.list(l) # TRUE
}

mitml/man/testConstraints.Rd
\name{testConstraints}
\alias{testConstraints}
\title{Test functions and constraints of model parameters}
\description{
Performs hypothesis tests for arbitrary functions of the model parameters using the Delta method.
}
\usage{
testConstraints(model, qhat, uhat, constraints,
  method = c("D1", "D2"), ariv = c("default", "positive"),
  df.com = NULL)
}
\arguments{
  \item{model}{A list of fitted statistical models as produced by \code{\link{with.mitml.list}} or similar.}
  \item{qhat}{A matrix or list containing the point estimates of the parameters for each imputed data set (see 'Details').}
  \item{uhat}{An array or list containing the variance-covariance matrix of the parameters for each imputed data set (see 'Details').}
  \item{constraints}{A character vector specifying constraints or functions of the vector of model parameters to be tested (see 'Details').}
  \item{method}{A character string denoting the method by which the test is performed. Can be \code{"D1"} or \code{"D2"} (see 'Details'). Default is \code{"D1"}.}
  \item{ariv}{A character string denoting how the ARIV is calculated. Can be \code{"default"} or \code{"positive"} (see 'Details').}
  \item{df.com}{(optional) A single number or a numeric vector denoting the complete-data degrees of freedom for the hypothesis test (see 'Details'). Only used if \code{method = "D1"}.}
}
\details{
This function performs tests of arbitrary functions (or constraints) of the model parameters using similar methods as \code{\link{testModels}}. The function relies on the Delta method (e.g., Casella & Berger, 2002) for testing functions of the parameters and assumes that their sampling distribution is approximately normal.
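For example, given a model with coefficients \code{ReadDis} and \code{MathDis}, the constraint \code{"ReadDis-MathDis"} tests the hypothesis that the two coefficients are equal (see 'Examples').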
The parameters can either be extracted automatically from the fitted statistical models (\code{model}) or provided manually as matrices, arrays, or lists (\code{qhat} and \code{uhat}, see 'Examples').

Constraints and other functions of the model parameters are specified in the \code{constraints} argument. The constraints must be supplied as a character vector, where each string denotes a function or a constraint to be tested (see 'Examples').

The Wald-like tests that are carried out by \code{testConstraints} are pooled across the imputed data sets with the \eqn{D_1} (Li, Raghunathan & Rubin, 1991) or \eqn{D_2} (Li, Meng, Raghunathan & Rubin, 1991) method, where \eqn{D_1} operates on the constrained point and variance estimates, and \eqn{D_2} operates on the Wald-statistics (for additional details, see \code{testModels}). The pooled estimates and standard errors reported in the output are always based on \eqn{D_1}.

For \eqn{D_1}, the complete-data degrees of freedom can be adjusted for smaller samples by specifying \code{df.com} (see \code{testModels}).

This function supports general statistical models that define \code{coef} and \code{vcov} methods (e.g., \code{lm}, \code{glm}, \code{lavaan} and others) as well as multilevel models estimated with \code{lme4} or \code{nlme} and GEEs estimated with \code{geepack}. The arguments \code{qhat} and \code{uhat} provide a general method for pooling parameter estimates regardless of model type (see 'Examples'). Support for further models may be added in future releases.

The \code{ariv} argument determines how the average relative increase in variance (ARIV) is calculated (see \code{testModels}). If \code{ariv = "default"}, the default estimators are used. If \code{ariv = "positive"}, the default estimators are used but constrained to take on strictly positive values.
}
\value{
A list containing the results of the model comparison. A \code{print} method is used for more readable output.
}
\references{
Casella, G., & Berger, R. L. (2002). \emph{Statistical inference (2nd ed.)}. Pacific Grove, CA: Duxbury.

Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. \emph{Statistica Sinica, 1}, 65-92.

Li, K. H., Raghunathan, T. E., & Rubin, D. B. (1991). Large-sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution. \emph{Journal of the American Statistical Association, 86}, 1065-1073.
}
\author{Simon Grund}
\seealso{\code{\link{testModels}}, \code{\link{with.mitml.list}}}
\examples{
data(studentratings)

fml <- MathDis + ReadDis + SchClimate ~ (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

implist <- mitmlComplete(imp)

# fit simple regression model
fit.lm <- with(implist, lm(SchClimate ~ ReadDis + MathDis))

# apply Rubin's rules
testEstimates(fit.lm)

# * Example 1: test 'identity' function of two parameters (automatic)
# test equivalent to model comparison with a restricted model (without 'ReadDis'
# and 'MathDis')

cons <- c("ReadDis", "MathDis")
testConstraints(fit.lm, constraints = cons)

# ... adjusting for finite samples
testConstraints(fit.lm, constraints = cons, df.com = 749)

# ... using D2
# using D2
testConstraints(fit.lm, constraints = cons, method = "D2")

# * Example 2: test for equality of two parameters
# tests the hypothesis that the coefficients pertaining to 'ReadDis' and 'MathDis'
# are equal (ReadDis = MathDis)
cons <- c("ReadDis-MathDis")
testConstraints(fit.lm, constraints = cons)

# * Example 3: test against a fixed value
# tests the hypothesis that the coefficient of "ReadDis" is equal to one
# (i.e., 'ReadDis' - 1 == 0)
cons <- c("ReadDis-1")
testConstraints(fit.lm, constraints = cons)

# * Example 4: test 'identity' function of two parameters (qhat, uhat)
fit.lm <- with(implist, lm(SchClimate ~ ReadDis + MathDis))

qhat <- sapply(fit.lm, coef)
uhat <- sapply(fit.lm, function(x) vcov(x), simplify = "array")

cons <- c("ReadDis", "MathDis")
testConstraints(qhat = qhat, uhat = uhat, constraints = cons)
}
mitml/man/justice.Rd0000644000176200001440000000230314001342027014103 0ustar liggesusers\name{justice}
\alias{justice}
\docType{data}
\title{Example data set on employees' justice perceptions and satisfaction}
\description{
Data set containing simulated data for employees nested within organizations, featuring employees' sex, ratings on individual justice orientation and ratings on job satisfaction.
The data set also includes scores for justice climate in each organization (defined at the level of organizations, level 2).
Different organizations are denoted by the variable \code{id}.

The data were simulated based on the results by Liao and Rupp (2005), as well as the secondary analyses of the same data given in Mathieu, Aguinis, Culpepper, and Chen (2012).
}
\format{A data frame containing 1400 observations on 4 variables.}
\usage{data(justice)}
\references{
Liao, H., & Rupp, D. E. (2005). The impact of justice climate and justice orientation on work outcomes: A cross-level multifoci framework. \emph{Journal of Applied Psychology}, 90, 242-256.

Mathieu, J. E., Aguinis, H., Culpepper, S. A., & Chen, G. (2012). Understanding and estimating the power to detect cross-level interaction effects in multilevel modeling. \emph{Journal of Applied Psychology}, 97, 951-966.
}
\keyword{datasets}
mitml/man/clusterMeans.Rd0000644000176200001440000000503314001334702015100 0ustar liggesusers\name{clusterMeans}
\alias{clusterMeans}
\title{Calculate cluster means}
\description{
Calculates the mean of a given variable within each cluster, possibly conditioning on an additional grouping variable.
}
\usage{
clusterMeans(x, cluster, adj = FALSE, group = NULL)
}
\arguments{
\item{x}{A numeric vector for which cluster means should be calculated. Can also be supplied as a character string denoting a variable in the current environment (see 'Details').}
\item{cluster}{A numeric vector or a factor denoting the cluster membership of each unit in \code{x}. Can also be supplied as a character string (see 'Details').}
\item{adj}{Logical flag indicating if person-adjusted group means should be calculated. If \code{TRUE}, the cluster mean for each unit is calculated with that unit excluded from the calculation. Default is \code{FALSE}.}
\item{group}{(optional) A grouping factor or a variable that can be interpreted as such. If specified, cluster means are calculated separately within the sub-groups defined by \code{group}. Can also be supplied as a character string (see 'Details').}
}
\details{
This function calculates the mean of a variable within each level of a cluster variable.
Any \code{NA} are omitted during calculation.
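With the default settings (\code{adj = FALSE} and no \code{group}), the result should be equivalent to a groupwise mean computed in base R.
A minimal sketch of this equivalence (for hypothetical vectors \code{x} and \code{cluster}):

\preformatted{
# groupwise means with NA omitted, matching clusterMeans(x, cluster)
ave(x, cluster, FUN = function(z) mean(z, na.rm = TRUE))
}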
The three main arguments of the function can also be supplied as (single) character strings, denoting the name of the respective variables in the current environment.
This is especially useful for calculating several cluster means simultaneously, for example using \code{\link{within.mitml.list}} (see 'Example 2' below).
}
\value{
A numeric vector with the same length as \code{x} containing the cluster mean for all units.
}
\author{Simon Grund, Alexander Robitzsch}
\seealso{\code{\link{within.mitml.list}}}
\examples{
data(studentratings)

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

implist <- mitmlComplete(imp)

# * Example 1: single cluster means

# calculate cluster means (for each data set)
with(implist, clusterMeans(ReadAchiev, ID))

# ... person-adjusted cluster means
with(implist, clusterMeans(ReadAchiev, ID, adj = TRUE))

# ... groupwise cluster means
with(implist, clusterMeans(ReadAchiev, ID, group = Sex))

# * Example 2: automated cluster means using 'for' and 'assign'

# calculate multiple cluster means within multiply imputed data sets
within(implist,{
  vars <- c("ReadAchiev", "MathAchiev", "CognAbility")
  for(i in vars) assign(paste(i, "Mean", sep = "."), clusterMeans(i, ID))
  rm(i, vars)
})
}
mitml/man/leadership.Rd0000644000176200001440000000256514001342154014560 0ustar liggesusers\name{leadership}
\alias{leadership}
\docType{data}
\title{Example data set on leadership style and job satisfaction}
\description{
Data set based on the data simulated by Paul D. Bliese as described in Klein et al. (2000) with slight modifications.
The data set consists of 750 employees, nested within 50 work groups, and includes employees' ratings on negative leadership style, job satisfaction, and workload as well as a measure for each work group's cohesion.

The original data set is available in the \code{multilevel} package and was modified by (a) transforming workload into a categorical variable, (b) transforming cohesion into a group-level variable, and (c) inducing missing values.
}
\usage{data(leadership)}
\format{A data frame containing 750 observations on 5 variables.}
\references{
Bliese, P. D. (2013). multilevel: Multilevel functions (Version 2.5) [Computer software]. Retrieved from \code{http://CRAN.R-project.org/package=multilevel}

Klein, K. J., Bliese, P. D., Kozlowski, S. W. J., Dansereau, F., Gavin, M. B., Griffin, M. A., ... Bligh, M. C. (2000). Multilevel analytical techniques: Commonalities, differences, and continuing questions. In K. J. Klein & S. W. J. Kozlowski (Eds.), \emph{Multilevel theory, research, and methods in organizations: Foundations, extensions, and new directions} (pp. 512-553). San Francisco, CA: Jossey-Bass.
}
\keyword{datasets}
mitml/man/amelia2mitml.list.Rd0000644000176200001440000000142114001321663015761 0ustar liggesusers\name{amelia2mitml.list}
\alias{amelia2mitml.list}
\title{Convert objects of class \code{amelia} to \code{mitml.list}}
\description{This function converts an \code{amelia} class object (as produced by the \code{Amelia} package) to \code{mitml.list}. The resulting object may be used in further analyses.}
\usage{
amelia2mitml.list(x)
}
\arguments{
\item{x}{An object of class \code{amelia} as produced by \code{amelia} (see the \code{Amelia} package).}
}
\value{
A list of imputed data sets with an additional class attribute \code{mitml.list}.
}
\author{Simon Grund}
\seealso{\code{\link{mitmlComplete}}}
\examples{
data(studentratings)

require(Amelia)
imp <- amelia(x = studentratings[, c("ID", "MathAchiev", "ReadAchiev")], cs = "ID")

implist <- amelia2mitml.list(imp)
}
mitml/man/sort.mitml.list.Rd0000644000176200001440000000300714001351773015521 0ustar liggesusers\name{sort.mitml.list}
\alias{sort.mitml.list}
\title{Sort a list of imputed data sets}
\description{
Sorts a list of multiply imputed data sets according to an R expression.
}
\usage{
\method{sort}{mitml.list}(x, decreasing = FALSE, by, ...)
}
\arguments{
\item{x}{A list of imputed data sets with class \code{mitml.list} as produced by \code{mitmlComplete} (or similar).}
\item{decreasing}{Logical flag indicating if data sets should be sorted in decreasing (i.e., reversed) order. Default is \code{FALSE}.}
\item{by}{An R expression or a list of multiple expressions by which to sort the imputed data sets (see 'Examples').}
\item{\dots}{Further arguments to \code{order} (see 'Details').}
}
\details{
This function sorts a list of imputed data sets according to the R expression given in the \code{by} argument.
The function is based on \code{order} and works in a similar manner.
Note that sorting is performed individually for each data set.
For this reason, the order of cases may differ across data sets if the variables used for sorting contain different values.
}
\value{
A list of imputed data sets with class \code{mitml.list}.
}
\author{Simon Grund}
\examples{
data(studentratings)

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

implist <- mitmlComplete(imp)

# * Example 1: sort by ID
sort(implist, by = ID)

# * Example 2: sort by combination of variables
sort(implist, by = list(FedState, ID, -SES))
}
\keyword{methods}
mitml/man/confint.mitml.testEstimates.Rd0000644000176200001440000000317514001335073020056 0ustar liggesusers\name{confint.mitml.testEstimates}
\alias{confint.mitml.testEstimates}
\title{Compute confidence intervals}
\description{
Computes confidence intervals on the basis of the final parameter estimates and inferences given by \code{\link{testEstimates}}.
}
\usage{
\method{confint}{mitml.testEstimates}(object, parm, level = 0.95, ...)
}
\arguments{
\item{object}{An object of class \code{mitml.testEstimates} as produced by \code{testEstimates}.}
\item{parm}{(optional) A reference to the parameters for which to calculate confidence intervals. Can be a character or integer vector denoting names or positions of parameters, respectively. If missing, all parameters are considered (the default).}
\item{level}{The confidence level. Default is \code{0.95} (i.e., 95\%).}
\item{\dots}{Not used.}
}
\details{
This function computes confidence intervals with the given confidence level for the pooled parameters on the basis of a \eqn{t}-distribution, with estimates, standard errors, and degrees of freedom as returned by \code{\link{testEstimates}}.
}
\value{
A matrix containing the lower and upper bounds of the confidence intervals.
}
\author{Simon Grund}
\seealso{\code{\link{testEstimates}}}
\examples{
data(studentratings)

fml <- ReadDis ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 500, n.iter = 100, m = 5)

implist <- mitmlComplete(imp)

# fit regression model
fit <- with(implist, lm(ReadDis ~ 1 + ReadAchiev))
est <- testEstimates(fit)

# compute confidence intervals
confint(est)

# ... with different confidence levels
confint(est, level = 0.90)
confint(est, level = 0.999)
}
mitml/man/jomoImpute.Rd0000644000176200001440000003347314001612615014574 0ustar liggesusers\name{jomoImpute}
\alias{jomoImpute}
\title{Impute single-level and multilevel missing data using \code{jomo}}
\description{
Performs single- and multilevel imputation for (mixed) continuous and categorical data using the \code{jomo} package.
Supports imputation of missing data at level 1 and 2 as well as imputation using random (residual) covariance matrices.
See 'Details' for further information.
}
\usage{
jomoImpute(data, type, formula, random.L1 = c("none", "mean", "full"),
  n.burn = 5000, n.iter = 100, m = 10, group = NULL, prior = NULL,
  seed = NULL, save.pred = FALSE, keep.chains = c("full", "diagonal"),
  silent = FALSE)
}
\arguments{
\item{data}{A data frame containing the incomplete data, the auxiliary variables, the cluster indicator variable, and any other variables that should be included in the imputed datasets.}
\item{type}{An integer vector specifying the role of each variable in the imputation model or a list of two vectors specifying a two-level model (see 'Details').}
\item{formula}{A formula specifying the role of each variable in the imputation model or a list of two formulas specifying a two-level model. The basic model is constructed by \code{model.matrix}, which allows including derived variables in the imputation model using \code{I()} (see 'Details' and 'Examples').}
\item{random.L1}{A character string denoting if the covariance matrix of residuals should vary across groups and how the values of these matrices are stored (see 'Details'). Can be \code{"none"} (common covariance matrix), \code{"mean"} (random covariance matrix, storing only mean values), or \code{"full"} (random covariance matrix, storing all values). Default is \code{"none"}.}
\item{n.burn}{The number of burn-in iterations before any imputations are drawn. Default is 5,000.}
\item{n.iter}{The number of iterations between imputations. Default is 100.}
\item{m}{The number of imputed data sets to generate. Default is 10.}
\item{group}{(optional) A character string denoting the name of an additional grouping variable to be used with the \code{formula} argument. If specified, the imputation is run separately within each of these groups.}
\item{prior}{(optional) A list with components \code{Binv}, \code{Dinv}, and \code{a} for specifying prior distributions for the covariance matrix of random effects and the covariance matrix of residuals (see 'Details'). Default is to use least-informative priors.}
\item{seed}{(optional) An integer value initializing R's random number generator for reproducible results. Default is to use the global seed.}
\item{save.pred}{(optional) Logical flag indicating if variables derived using \code{formula} should be included in the imputed data sets. Default is \code{FALSE}.}
\item{keep.chains}{(optional) A character string denoting which chains of the MCMC algorithm to store. Can be \code{"full"} (stores chains for all parameters) or \code{"diagonal"} (stores chains for fixed effects and diagonal entries of the covariance matrices). Default is \code{"full"} (see 'Details').}
\item{silent}{(optional) Logical flag indicating if console output should be suppressed.
Default is \code{FALSE}.}
}
\details{
This function serves as an interface to the \code{jomo} package and supports imputation of single-level and multilevel continuous and categorical data at both level 1 and 2 (see Carpenter & Kenward, 2013; Goldstein et al., 2009).
In order for categorical variables to be detected correctly, these must be formatted as \code{factor} variables (see 'Examples').

The imputation model can be specified using either the \code{type} or the \code{formula} argument.

The \code{type} interface is designed to provide quick-and-easy imputations using \code{jomo}. The \code{type} argument must be an integer vector denoting the role of each variable in the imputation model:
\itemize{
\item{\code{1}: target variables containing missing data}
\item{\code{2}: predictors with fixed effect on all targets (completely observed)}
\item{\code{3}: predictors with random effect on all targets (completely observed)}
\item{\code{-1}: grouping variable within which the imputation is run separately}
\item{\code{-2}: cluster indicator variable}
\item{\code{0}: variables not featured in the model}
}
At least one target variable and, for multilevel imputation, the cluster indicator must be specified.
If the cluster indicator is omitted, single-level imputation will be performed.
The intercept is automatically included as both a fixed and (for multilevel models) a random effect.
If a variable of type \code{-1} is found, then separate imputations are performed within each level of that variable.

The \code{formula} argument is intended as a more flexible and feature-rich interface to \code{jomo}. Specifying the \code{formula} argument is similar to specifying other formulae in R. Given below is a list of operators that \code{jomoImpute} currently understands:
\itemize{
\item{\code{~}: separates the target (left-hand) and predictor (right-hand) side of the model}
\item{\code{+}: adds target or predictor variables to the model}
\item{\code{*}: adds an interaction term of two or more predictors}
\item{\code{|}: denotes cluster-specific random effects and specifies the cluster indicator (e.g., \code{1|ID})}
\item{\code{I()}: defines functions to be interpreted by \code{model.matrix}}
}
If the cluster indicator is omitted, single-level imputation will be performed.
For multilevel imputation, predictors are allowed to have fixed effects, random effects, or both on all target variables.
The intercept is automatically included as both a fixed and (for multilevel models) a random effect. Both can be suppressed if needed (see \code{\link{panImpute}}).
Note that, when specifying random effects other than the intercept, these will \emph{not} be automatically added as fixed effects and must be included explicitly.
Any predictors defined by \code{I()} will be used for imputation but not included in the data set unless \code{save.pred = TRUE}.

If missing data occur at both level 1 and 2, the imputation model is specified as a list of two \code{formula}s or \code{type}s, respectively.
The first element of this list denotes the model specification for variables at level 1.
The second element denotes the model specification for variables at level 2.
Missing data are imputed jointly at both levels (see 'Examples', see also Carpenter and Kenward, 2013; Goldstein et al., 2009).

It is possible to model the covariance matrix of residuals at level 1 as random across clusters (Yucel, 2011; Carpenter & Kenward, 2013).
The \code{random.L1} argument determines this behavior and how the values of these matrices are stored.
If set to \code{"none"}, a common covariance matrix is assumed across groups (similar to \code{panImpute}).
If set to \code{"mean"}, the covariance matrices are random, but only the average covariance matrix is stored at each iteration.
If set to \code{"full"}, the covariance matrices are random, and all variances and covariances from all clusters are stored.

In order to run separate imputations for each level of an additional grouping variable, the \code{group} argument can be used.
The name of the grouping variable must be given as a character string (i.e., in quotation marks).

The default prior distributions for the covariance matrices in \code{jomoImpute} are "least informative" inverse-Wishart priors with minimum positive degrees of freedom (largest dispersion) and the identity matrix for scale.
The \code{prior} argument can be used to specify alternative prior distributions. These must be supplied as a list containing the following components:
\itemize{
\item{\code{Binv}: scale matrix for the covariance matrix of residuals at level 1}
\item{\code{Dinv}: scale matrix for the covariance matrix of random effects and residuals at level 2}
\item{\code{a}: starting value for the degrees of freedom of random covariance matrices of residuals (only used with \code{random.L1 = "mean"} or \code{random.L1 = "full"})}
}
Note that \code{jomo} does not allow for the degrees of freedom for the inverse-Wishart prior to be specified by the user.
These are always set to the lowest value possible (largest dispersion) or determined iteratively if the residuals at level 1 are modeled as random (see above).
For single-level imputation, only \code{Binv} is relevant.

In imputation models with many parameters, the number of chains in the MCMC algorithm being stored can be reduced with the \code{keep.chains} argument (see also \code{\link{panImpute}}).
This setting influences the storage mode of parameters (e.g., dimensions and indices of arrays) and should be used with caution.
}
\value{
An object of class \code{mitml}, containing the following components:
\item{data}{The original (incomplete) data set, sorted according to the cluster variable and (if given) the grouping variable, with several attributes describing the original order (\code{"sort"}), grouping (\code{"group"}) and factor levels of categorical variables.}
\item{replacement.mat}{A matrix containing the multiple replacements (i.e., imputations) for each missing value. The replacement matrix contains one row for each missing value and one column for each imputed data set.}
\item{index.mat}{A matrix containing the row and column index for each missing value.
The index matrix is used to \emph{link} the missing values in the data set with their corresponding rows in the replacement matrix.} \item{call}{The matched function call.} \item{model}{A list containing the names of the cluster variable, the target variables, and the predictor variables with fixed and random effects, at level 1 and level 2, respectively.} \item{random.L1}{A character string denoting the handling of the (random) covariance matrix of residuals at level 1 (see 'Details').} \item{prior}{The prior parameters used in the imputation model.} \item{iter}{A list containing the number of burn-in iterations, the number of iterations between imputations, and the number of imputed data sets.} \item{par.burnin}{A multi-dimensional array containing the parameters of the imputation model from the burn-in phase.} \item{par.imputation}{A multi-dimensional array containing the parameters of the imputation model from the imputation phase.} } \note{ For objects of class \code{mitml}, methods for the generic functions \code{print}, \code{summary}, and \code{plot} are available to inspect the fitted imputation model. \code{mitmlComplete} is used for extracting the imputed data sets. } \references{ Carpenter, J. R., & Kenward, M. G. (2013). \emph{Multiple imputation and its application}. Hoboken, NJ: Wiley. Goldstein, H., Carpenter, J., Kenward, M. G., & Levin, K. A. (2009). Multilevel models with multivariate mixed response types. \emph{Statistical Modelling}, 9, 173-197. Yucel, R. M. (2011). Random covariances and mixed-effects models for imputing multivariate multilevel continuous data. \emph{Statistical Modelling}, 11, 351-370. } \author{Simon Grund, Alexander Robitzsch, Oliver Luedtke} \seealso{\code{\link{panImpute}}, \code{\link{mitmlComplete}}, \code{\link{summary.mitml}}, \code{\link{plot.mitml}}} \examples{ # NOTE: The number of iterations in these examples is much lower than it # should be. This is done in order to comply with CRAN policies, and more # iterations are recommended for applications in practice! data(studentratings) data(leadership) # *** # for further examples, see "panImpute" # ?panImpute # *** ................................ # the 'type' interface # # * Example 1.1 (studentratings): 'ReadDis' and 'SES', predicted by 'ReadAchiev' # (random slope) type <- c(-2, 0, 0, 0, 0, 1, 3, 1, 0, 0) names(type) <- colnames(studentratings) type imp <- jomoImpute(studentratings, type = type, n.burn = 100, n.iter = 10, m = 5) # * Example 1.2 (leadership): all variables (mixed continuous and categorical # data with missing values at level 1 and level 2) type.L1 <- c(-2, 1, 0, 1, 1) # imputation model at level 1 type.L2 <- c(-2, 0, 1, 0, 0) # imputation model at level 2 names(type.L1) <- names(type.L2) <- colnames(leadership) type <- list(type.L1, type.L2) type imp <- jomoImpute(leadership, type = type, n.burn = 100, n.iter = 10, m = 5) # * Example 1.3 (studentratings): 'ReadDis', 'ReadAchiev', and 'SES' predicted # with empty model, groupwise for 'FedState' (single-level imputation) type <- c(0, -1, 0, 0, 0, 1, 1, 1, 0, 0) names(type) <- colnames(studentratings) type imp <- jomoImpute(studentratings, type = type, group = "FedState", n.burn = 100, n.iter = 10, m = 5) # *** ................................ 
# the 'formula' interface
#
# * Example 2.1 (studentratings): 'ReadDis' and 'SES' predicted by 'ReadAchiev'
# (random slope)
fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- jomoImpute(studentratings, formula = fml, n.burn = 100, n.iter = 10, m = 5)

# * Example 2.2 (studentratings): 'ReadDis' predicted by 'ReadAchiev' and
# the cluster mean of 'ReadAchiev'
fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev, ID)) + (1|ID)
imp <- jomoImpute(studentratings, formula = fml, n.burn = 100, n.iter = 10, m = 5)

# * Example 2.3 (studentratings): 'ReadDis' predicted by 'ReadAchiev', groupwise
# for 'FedState'
fml <- ReadDis ~ ReadAchiev + (1|ID)
imp <- jomoImpute(studentratings, formula = fml, group = "FedState", n.burn = 100,
                  n.iter = 10, m = 5)

# * Example 2.4 (leadership): all variables (mixed continuous and categorical
# data with missing values at level 1 and level 2)
fml <- list( JOBSAT + NEGLEAD + WLOAD ~ 1 + (1|GRPID) ,
             COHES ~ 1 )
imp <- jomoImpute(leadership, formula = fml, n.burn = 100, n.iter = 10, m = 5)

# * Example 2.5 (studentratings): 'ReadDis', 'ReadAchiev', and 'SES' predicted
# with empty model, groupwise for 'FedState' (single-level imputation)
fml <- ReadDis + ReadAchiev + SES ~ 1
imp <- jomoImpute(studentratings, formula = fml, group = "FedState", n.burn = 100,
                  n.iter = 10, m = 5)
}
\keyword{models}
mitml/man/panImpute.Rd0000644000176200001440000002603514001612610014375 0ustar liggesusers\name{panImpute}
\alias{panImpute}
\title{Impute multilevel missing data using \code{pan}}
\description{
Performs multiple imputation of multilevel data using the \code{pan} package (Schafer & Yucel, 2002).
Supports imputation of continuous multilevel data with missing values at level 1.
See 'Details' for further information.
}
\usage{
panImpute(data, type, formula, n.burn = 5000, n.iter = 100, m = 10, group = NULL,
  prior = NULL, seed = NULL, save.pred = FALSE, keep.chains = c("full", "diagonal"),
  silent = FALSE)
}
\arguments{
\item{data}{A data frame containing the incomplete data, the auxiliary variables, the cluster indicator variable, and any other variables that should be included in the imputed datasets.}
\item{type}{An integer vector specifying the role of each variable in the imputation model (see 'Details').}
\item{formula}{A formula specifying the role of each variable in the imputation model. The basic model is constructed by \code{model.matrix}, which allows including derived variables in the imputation model using \code{I()} (see 'Details' and 'Examples').}
\item{n.burn}{The number of burn-in iterations before any imputations are drawn. Default is 5,000.}
\item{n.iter}{The number of iterations between imputations. Default is 100.}
\item{m}{The number of imputed data sets to generate. Default is 10.}
\item{group}{(optional) A character string denoting the name of an additional grouping variable to be used with the \code{formula} argument. If specified, the imputation model is run separately within each of these groups.}
\item{prior}{(optional) A list with components \code{a}, \code{Binv}, \code{c}, and \code{Dinv} for specifying prior distributions for the covariance matrix of random effects and the covariance matrix of residuals (see 'Details'). Default is to use least-informative priors.}
\item{seed}{(optional) An integer value initializing \code{pan}'s random number generator for reproducible results. Default is to use a random seed.}
\item{save.pred}{(optional) Logical flag indicating if variables derived using \code{formula} should be included in the imputed data sets.
Default is \code{FALSE}.}
\item{keep.chains}{(optional) A character string denoting which chains of the MCMC algorithm to store. Can be \code{"full"} (stores chains for all parameters) or \code{"diagonal"} (stores chains for fixed effects and diagonal entries of the covariance matrices). Default is \code{"full"} (see 'Details').}
\item{silent}{(optional) Logical flag indicating if console output should be suppressed. Default is \code{FALSE}.}
}
\details{
This function serves as an interface to the \code{pan} package and supports imputation of continuous multilevel data at level 1 (Schafer & Yucel, 2002).

The imputation model can be specified using either the \code{type} or the \code{formula} argument.

The \code{type} interface is designed to provide quick-and-easy imputations using \code{pan}. The \code{type} argument must be an integer vector denoting the role of each variable in the imputation model:
\itemize{
\item{\code{1}: target variables containing missing data}
\item{\code{2}: predictors with fixed effect on all targets (completely observed)}
\item{\code{3}: predictors with random effect on all targets (completely observed)}
\item{\code{-1}: grouping variable within which the imputation is run separately}
\item{\code{-2}: cluster indicator variable}
\item{\code{0}: variables not featured in the model}
}
At least one target variable and the cluster indicator must be specified.
The intercept is automatically included as both a fixed and a random effect.
If a variable of type \code{-1} is found, then separate imputations are performed within each level of that variable.

The \code{formula} argument is intended as a more flexible and feature-rich interface to \code{pan}. Specifying the \code{formula} argument is similar to specifying other formulae in R. Given below is a list of operators that \code{panImpute} understands:
\itemize{
\item{\code{~}: separates the target (left-hand) and predictor (right-hand) side of the model}
\item{\code{+}: adds target or predictor variables to the model}
\item{\code{*}: adds an interaction term of two or more predictors}
\item{\code{|}: denotes cluster-specific random effects and specifies the cluster indicator (e.g., \code{1|ID})}
\item{\code{I()}: defines functions to be interpreted by \code{model.matrix}}
}
Predictors are allowed to have fixed effects, random effects, or both on all target variables.
The intercept is automatically included as both a fixed and a random effect, but it can be suppressed if needed (see 'Examples').
Note that, when specifying random effects other than the intercept, these will \emph{not} be automatically added as fixed effects and must be included explicitly.
Any predictors defined by \code{I()} will be used for imputation but not included in the data set unless \code{save.pred = TRUE}.

In order to run separate imputations for each level of an additional grouping variable, the \code{group} argument can be used.
The name of the grouping variable must be given as a character string (i.e., in quotation marks).

The default prior distributions for the covariance matrices in \code{panImpute} are "least informative" inverse-Wishart priors with minimum positive degrees of freedom (largest dispersion) and the identity matrix for scale.
The \code{prior} argument can be used to specify alternative prior distributions.
These must be supplied as a list containing the following components:
\itemize{
\item{\code{a}: degrees of freedom for the covariance matrix of residuals}
\item{\code{Binv}: scale matrix for the covariance matrix of residuals}
\item{\code{c}: degrees of freedom for the covariance matrix of random effects}
\item{\code{Dinv}: scale matrix for the covariance matrix of random effects}
}
A sensible choice for a diffuse non-default prior is to set the degrees of freedom to the lowest value possible, and the scale matrices according to a prior guess of the corresponding covariance matrices (see Schafer & Yucel, 2002).

In imputation models with many parameters, the number of chains in the MCMC algorithm being stored can be reduced with the \code{keep.chains} argument.
If set to \code{"full"} (the default), all chains are saved.
If set to \code{"diagonal"}, only chains pertaining to fixed effects and the diagonal entries of the covariance matrices are saved.
This setting influences the storage mode of parameters (e.g., dimensions and indices of arrays) and should be used with caution.
}
\value{
An object of class \code{mitml}, containing the following components:
\item{data}{The original (incomplete) data set, sorted according to the cluster variable and (if given) the grouping variable, with several attributes describing the original row order (\code{"sort"}) and grouping (\code{"group"}).}
\item{replacement.mat}{A matrix containing the multiple replacements (i.e., imputations) for each missing value. The replacement matrix contains one row for each missing value and one column for each imputed data set.}
\item{index.mat}{A matrix containing the row and column index for each missing value. The index matrix is used to \emph{link} the missing values in the data set with their corresponding rows in the replacement matrix.}
\item{call}{The matched function call.}
\item{model}{A list containing the names of the cluster variable, the target variables, and the predictor variables with fixed and random effects, respectively.}
\item{random.L1}{A character string denoting the handling of random residual covariance matrices (not used here; see \code{jomoImpute}).}
\item{prior}{The prior parameters used in the imputation model.}
\item{iter}{A list containing the number of burn-in iterations, the number of iterations between imputations, and the number of imputed data sets.}
\item{par.burnin}{A multi-dimensional array containing the parameters of the imputation model from the burn-in phase.}
\item{par.imputation}{A multi-dimensional array containing the parameters of the imputation model from the imputation phase.}
}
\note{
For objects of class \code{mitml}, methods for the generic functions \code{print}, \code{summary}, and \code{plot} are available to inspect the fitted imputation model.
\code{mitmlComplete} is used for extracting the imputed data sets.
}
\references{
Schafer, J. L., and Yucel, R. M. (2002). Computational strategies for multivariate linear mixed-effects models with missing values. \emph{Journal of Computational and Graphical Statistics, 11}, 437-457.
}
\author{Simon Grund, Alexander Robitzsch, Oliver Luedtke}
\seealso{\code{\link{jomoImpute}}, \code{\link{mitmlComplete}}, \code{\link{summary.mitml}}, \code{\link{plot.mitml}}}
\examples{
# NOTE: The number of iterations in these examples is much lower than it
# should be! This is done in order to comply with CRAN policies, and more
# iterations are recommended for applications in practice!
data(studentratings) # *** ................................ # the 'type' interface # # * Example 1.1: 'ReadDis' and 'SES', predicted by 'ReadAchiev' and # 'CognAbility', with random slope for 'ReadAchiev' type <- c(-2, 0, 0, 0, 0, 0, 3, 1, 2, 0) names(type) <- colnames(studentratings) type imp <- panImpute(studentratings, type = type, n.burn = 1000, n.iter = 100, m = 5) # * Example 1.2: 'ReadDis' and 'SES' groupwise for 'FedState', # and predicted by 'ReadAchiev' type <- c(-2, -1, 0, 0, 0, 0, 2, 1, 0, 0) names(type) <- colnames(studentratings) type imp <- panImpute(studentratings, type = type, n.burn = 1000, n.iter = 100, m = 5) # *** ................................ # the 'formula' interface # # * Example 2.1: imputation of 'ReadDis', predicted by 'ReadAchiev' # (random intercept) fml <- ReadDis ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # ... the intercept can be suppressed using '0' or '-1' (here for fixed intercept) fml <- ReadDis ~ 0 + ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # * Example 2.2: imputation of 'ReadDis', predicted by 'ReadAchiev' # (random slope) fml <- ReadDis ~ ReadAchiev + (1+ReadAchiev|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # * Example 2.3: imputation of 'ReadDis', predicted by 'ReadAchiev', # groupwise for 'FedState' fml <- ReadDis ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, group = "FedState", n.burn = 1000, n.iter = 100, m = 5) # * Example 2.4: imputation of 'ReadDis', predicted by 'ReadAchiev' # including the cluster mean of 'ReadAchiev' as an additional predictor fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev, ID)) + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # ... using 'save.pred' to save the calculated cluster means in the data set fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev, ID)) + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5, save.pred = TRUE) head(mitmlComplete(imp, print = 1)) } \keyword{models} mitml/man/summary.mitml.Rd0000644000176200001440000000561414001353131015251 0ustar liggesusers\name{summary.mitml} \alias{summary.mitml} \title{Summary measures for imputation models} \description{ Provides summary statistics and additional information on imputations in objects of class \code{mitml}. } \usage{ \method{summary}{mitml}(object, n.Rhat = 3, goodness.of.appr = FALSE, autocorrelation = FALSE, ...) } \arguments{ \item{object}{An object of class \code{mitml} as produced by \code{panImpute} or \code{jomoImpute}.} \item{n.Rhat}{(optional) An integer denoting the number of segments used for calculating the potential scale reduction factor. Default is \code{3}.} \item{goodness.of.appr}{(optional) A logical flag indicating if the goodness of approximation should be printed. Default is \code{FALSE} (see 'Details').} \item{autocorrelation}{(optional) A logical flag indicating if the autocorrelation should be printed. Default is \code{FALSE} (see 'Details').} \item{\dots}{Not used.} } \details{ The \code{summary} method calculates summary statistics for objects of class \code{mitml} as produced by \code{\link{panImpute}} or \code{\link{jomoImpute}}. The output includes the potential scale reduction factor (PSRF, or \eqn{\hat{R}}) and (optionally) the goodness of approximation and autocorrelation. 
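For example, all optional measures can be requested together for a fitted imputation model (denoted \code{imp} here, as in the 'Examples'):

\preformatted{
summary(imp, n.Rhat = 5, goodness.of.appr = TRUE, autocorrelation = TRUE)
}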
The PSRF is calculated for each parameter of the imputation model and can be used as a convergence diagnostic (Gelman and Rubin, 1992). Calculation of the PSRFs can be suppressed by setting \code{n.Rhat = NULL}. The PSRFs are not computed from different chains but by dividing each chain from the imputation phase into a number of segments as denoted by \code{n.Rhat}. This is slightly different from the original method proposed by Gelman and Rubin. The goodness of approximation measure indicates what proportion of the posterior standard deviation is due to simulation error. This is useful for assessing the accuracy of the posterior summaries (e.g., the EAP). The autocorrelation includes estimates of the autocorrelation in the chains at lag 1 (i.e., for consecutive draws) and for lags \eqn{k} and \eqn{2k}, where \eqn{k} is the number of iterations between imputations. For lag \eqn{k} and \eqn{2k}, the autocorrelation is slightly smoothed to reduce the influence of noise on the estimates (see \code{\link{plot.mitml}}). } \value{ An object of class \code{summary.mitml}. A print method is used for more readable output. } \references{ Gelman, A., and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. \emph{Statistical Science, 7}, 457-472. Hoff, P. D. (2009). \emph{A first course in Bayesian statistical methods}. New York, NY: Springer. } \author{Simon Grund} \seealso{\code{\link{panImpute}}, \code{\link{jomoImpute}}, \code{\link{plot.mitml}}} \examples{ data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # print summary summary(imp) } \keyword{methods} mitml/man/with.mitml.list.Rd0000644000176200001440000001041514127010430015475 0ustar liggesusers\name{with.mitml.list} \alias{with.mitml.list} \alias{within.mitml.list} \title{Evaluate an expression in a list of imputed data sets} \description{ The \code{with} and \code{within} methods evaluate R expressions in a list of multiply imputed data sets. } \usage{ \method{with}{mitml.list}(data, expr, include.data = FALSE, ...) \method{within}{mitml.list}(data, expr, ignore = NULL, ...) } \arguments{ \item{data}{A list of imputed data sets with class \code{mitml.list} as produced by \code{mitmlComplete} or \code{as.mitml.list}.} \item{expr}{An R expression to be evaluated for each data set.} \item{include.data}{Either a logical flag or a character string denoting how the data are included when \code{expr} is evaluated (see 'Details'). If \code{FALSE}, an environment is created from \code{data}, and \code{expr} is evaluated therein. If \code{TRUE}, a call is constructed from \code{expr} and evaluated with the imputed data in the \code{"data"} slot. If character, a call is constructed from \code{expr} and evaluated with the imputed data in the slot named by \code{include.data}. Default is \code{FALSE}.} \item{ignore}{A character vector naming objects that are created but should not be saved (see 'Details').} \item{\dots}{Not used.} } \details{ The two functions provide \code{with} and \code{within} methods for objects of class \code{mitml.list}. They evaluate an R expression repeatedly for each of the imputed data sets but return different values: \code{with} returns the result of the evaluated expression; \code{within} returns the resulting data sets. The \code{within} function is useful for transforming and computing variables in the imputed data (e.g., centering, calculating cluster means, etc.). 
The \code{with} function is useful, for example, for fitting statistical models. The list of fitted models can be analyzed using \code{\link{testEstimates}}, \code{\link{testModels}}, \code{\link{testConstraints}}, or \code{\link[=anova.mitml.result]{anova}}. The \code{include.data} argument can be used to include the imputed data sets in the call to fit statistical models (\code{expr}) using \code{with}. This is useful for fitting models that require that the fitting function be called with a proper \code{data} argument (e.g., \code{lavaan} or \code{nlme}; see 'Examples'). Setting \code{include.data = TRUE} will fit the model with the imputed data sets used as the \code{data} argument. Setting \code{include.data = "df"} (or similar) will fit the model with the imputed data sets as the \code{df} argument (useful if the function refers to the data by a nonstandard name, such as \code{"df"}). The \code{ignore} argument can be used to declare objects that are not to be saved in the data sets created by \code{within}. } \value{ \code{with}: A list of class \code{mitml.results} containing the evaluated expression for each data set. \code{within}: A list of class \code{mitml.list} containing the imputed data modified by the evaluated expression. } \author{Simon Grund} \seealso{\code{\link{mitmlComplete}}, \code{\link{anova.mitml.result}}, \code{\link{testEstimates}}, \code{\link{testModels}}, \code{\link{testConstraints}}} \examples{ data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) implist <- mitmlComplete(imp) # * Example 1: data transformation # calculate and save cluster means new1.implist <- within(implist, Means.ReadAchiev <- clusterMeans(ReadAchiev, ID)) # center variables, calculate interaction terms, ignore byproducts new2.implist <- within(implist, { M.SES <- mean(SES) M.CognAbility <- mean(CognAbility) C.SES <- SES - M.SES C.CognAbility <- CognAbility - M.CognAbility SES.CognAbility <- C.SES * C.CognAbility }, ignore = c("M.SES", "M.CognAbility")) # * Example 2: fitting statistical models # fit regression model fit.lm <- with(implist, lm(ReadAchiev ~ ReadDis)) # fit multilevel model with lme4 require(lme4) fit.lmer <- with(implist, lmer(ReadAchiev ~ ReadDis + (1|ID))) \dontrun{ # fit structural equation model with lavaan (with include.data = TRUE) require(lavaan) mod <- "ReadAchiev ~ ReadDis" fit.sem <- with(implist, sem(model = mod, cluster = "ID", estimator = "MLR"), include.data = TRUE) } } \keyword{methods} mitml/man/mitml-package.Rd0000644000176200001440000000565114001574042015155 0ustar liggesusers\name{mitml-package} \alias{mitml-package} \docType{package} \title{mitml: Tools for multiple imputation in multilevel modeling} \description{ Provides tools for multiple imputation of missing data in multilevel modeling. This package includes a user-friendly interface to the algorithms implemented in the R packages \code{pan} and \code{jomo} as well as several functions for visualizing, managing, and analyzing multiply imputed data sets. The main interface to \code{pan} is the function \code{\link{panImpute}}, which allows specifying imputation models for continuous variables with missing data at level 1. In addition, the function \code{\link{jomoImpute}} provides an interface to \code{jomo}, which allows specifying imputation models for both continuous and categorical variables with missing data at levels 1 and 2 as well as single-level data. 
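A typical workflow built from these tools may look as follows (a sketch only; see the linked help pages and the package vignettes for details):

\preformatted{
data(studentratings)

# impute missing data
fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000,
                 n.iter = 100, m = 5)

# extract the completed data sets
implist <- mitmlComplete(imp)

# fit the analysis model to each data set and pool the results
fit <- with(implist, lm(ReadDis ~ ReadAchiev))
testEstimates(fit)
}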
The imputations and parameter values are stored in objects of class \code{mitml}.
To obtain the completed (i.e., imputed) data sets, \code{\link{mitmlComplete}} is used, producing a list of imputed data sets of class \code{mitml.list} that can be used in further analyses.

Several additional functions allow for convenient analysis of multiply imputed data sets including (but not limited to) multilevel analyses with \code{lme4} or \code{nlme} and structural equation modeling with \code{lavaan}.
The functions \code{\link[=with.mitml.list]{within}}, \code{\link[=sort.mitml.list]{sort}}, and \code{\link[=subset.mitml.list]{subset}} can be used to manage and manipulate multiply imputed data sets.
Statistical models are fitted using \code{\link[=with.mitml.list]{with}}.
Pooled parameter estimates can be extracted with \code{\link{testEstimates}}, and model comparisons as well as single- and multi-parameter hypothesis tests can be performed using the functions \code{\link{testModels}} and \code{\link{testConstraints}}.
In addition, the \code{\link[=anova.mitml.result]{anova}} method provides a simple interface to model comparisons.

Data sets can be imported and exported from or to different statistical software packages.
Currently, \code{\link{mids2mitml.list}}, \code{\link{amelia2mitml.list}}, \code{\link{jomo2mitml.list}}, and \code{\link{long2mitml.list}} can be used for importing imputations from other packages in R.
In addition, \code{\link{write.mitmlMplus}}, \code{\link{write.mitmlSAV}}, and \code{\link{write.mitmlSPSS}} export data sets to M\emph{plus} and SPSS, respectively.

Finally, the package provides tools for summarizing and visualizing imputation models, which is useful for the assessment of convergence and the reporting of results.

The data sets contained in this package are published under the same license as the package itself.
They contain simulated data and may be used by anyone free of charge as long as reference to this package is given.
}
\author{
Authors: Simon Grund, Alexander Robitzsch, Oliver Luedtke

Maintainer: Simon Grund
}
\keyword{package}
mitml/man/multilevelR2.Rd0000644000176200001440000000507714001346370015033 0ustar liggesusers\name{multilevelR2}
\alias{multilevelR2}
\title{Calculate R-squared measures for multilevel models}
\description{
Calculates several measures for the proportion of explained variance in a fitted linear mixed-effects or multilevel model (or a list of fitted models).
}
\usage{
multilevelR2(model, print = c("RB1", "RB2", "SB", "MVP"))
}
\arguments{
\item{model}{Either a fitted linear mixed-effects model as produced by \code{lme4} or \code{nlme}, or a list of fitted models as produced by \code{with.mitml.list}.}
\item{print}{A character vector denoting which measures should be calculated (see 'Details'). Default is to print all measures.}
}
\details{
This function calculates several measures of explained variance (\eqn{R^2}) for linear mixed-effects models.
It can be used with a single model, as produced by the packages \code{lme4} or \code{nlme}, or a list of fitted models produced by \code{with.mitml.list}.
In the latter case, the \eqn{R^2} measures are calculated separately for each imputed data set and then averaged across data sets.

Different \eqn{R^2} measures can be requested using the \code{print} argument.
Specifying \code{RB1} and \code{RB2} returns the explained variance at level 1 and level 2, respectively, according to Raudenbush and Bryk (2002, pp. 74 and 79).
Specifying \code{SB} returns the total variance explained according to Snijders and Bosker (2012, p. 112).
Specifying \code{MVP} returns the total variance explained based on ``multilevel variance partitioning'' as proposed by LaHuis, Hartman, Hakoyama, and Clark (2014).
}
\value{
A numeric vector containing the \eqn{R^2} measures requested in \code{print}.
}
\note{
Calculating \eqn{R^2} measures is currently only supported for two-level models with a single cluster variable.
}
\author{Simon Grund}
\references{
LaHuis, D. M., Hartman, M. J., Hakoyama, S., & Clark, P. C. (2014). Explained variance measures for multilevel models. \emph{Organizational Research Methods}, 17, 433-451.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.

Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage.
}
\examples{
require(lme4)
data(studentratings)

fml <- MathAchiev + ReadAchiev + CognAbility ~ 1 + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

implist <- mitmlComplete(imp)

fit <- with(implist, lmer(MathAchiev ~ 1 + CognAbility + (1|ID)))
multilevelR2(fit)
}
mitml/man/testModels.Rd0000644000176200001440000001763614003555233014576 0ustar liggesusers\name{testModels}
\alias{testModels}
\title{Test multiple parameters and compare nested models}
\description{
Performs multi-parameter hypothesis tests for a vector of statistical parameters and compares nested statistical models obtained from multiply imputed data sets.
}
\usage{
testModels(model, null.model, method = c("D1", "D2", "D3", "D4"),
  use = c("wald", "likelihood"), ariv = c("default", "positive", "robust"),
  df.com = NULL, data = NULL)
}
\arguments{
\item{model}{A list of fitted statistical models (``full'' model) as produced by \code{\link{with.mitml.list}} or similar.}
\item{null.model}{A list of fitted statistical models (``restricted'' model) as produced by \code{\link{with.mitml.list}} or similar.}
\item{method}{A character string denoting the method by which the test is performed. Can be \code{"D1"}, \code{"D2"}, \code{"D3"}, or \code{"D4"} (see 'Details'). Default is \code{"D1"}.}
\item{use}{A character string denoting Wald- or likelihood-based tests. Can be either \code{"wald"} or \code{"likelihood"}. Only used if \code{method = "D2"}.}
\item{ariv}{A character string denoting how the ARIV is calculated. Can be \code{"default"}, \code{"positive"}, or \code{"robust"} (see 'Details').}
\item{df.com}{(optional) A number denoting the complete-data degrees of freedom for the hypothesis test. Only used if \code{method = "D1"}.}
\item{data}{(optional) A list of imputed data sets (see 'Details'). Only used if \code{method = "D4"}.}
}
\details{
This function compares two nested statistical models fitted to multiply imputed data sets by pooling Wald-like or likelihood-ratio tests.

Pooling methods for Wald-like tests of multiple parameters were introduced by Rubin (1987) and further developed by Li, Raghunathan and Rubin (1991). The pooled Wald test is referred to as \eqn{D_1} and can be used by setting \code{method = "D1"}.
\eqn{D_1} is the multi-parameter equivalent of \code{\link{testEstimates}}, that is, it tests multiple parameters simultaneously.
For \eqn{D_1}, the complete-data degrees of freedom are assumed to be infinite, but they can be adjusted for smaller samples by supplying \code{df.com} (Reiter, 2007).
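For orientation, the \eqn{D_1} statistic has the general form
\deqn{D_1 = \frac{(\bar{Q} - Q_0)' \bar{U}^{-1} (\bar{Q} - Q_0)}{k (1 + r)} , \quad r = (1 + 1/m) \, \mathrm{tr}(B \bar{U}^{-1}) / k ,}
where \eqn{k} is the number of tested parameters, \eqn{m} the number of imputed data sets, \eqn{\bar{Q}} the pooled parameter estimates with hypothesized values \eqn{Q_0}, \eqn{\bar{U}} the average within-imputation covariance matrix, and \eqn{B} the between-imputation covariance matrix (a sketch following Li, Raghunathan, & Rubin, 1991; see the original article for the reference distribution and further details).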
An alternative method for Wald-like hypothesis tests was suggested by Li, Meng, Raghunathan and Rubin (1991). The procedure is called \eqn{D_2} and can be used by setting \code{method = "D2"}.
\eqn{D_2} calculates the Wald-test directly for each data set and then pools the resulting \eqn{\chi^2} values. The source of these values is specified by the \code{use} argument.
If \code{use = "wald"} (the default), then a Wald test similar to \eqn{D_1} is performed.
If \code{use = "likelihood"}, then the two models are compared with a likelihood-ratio test instead.

Pooling methods for likelihood-ratio tests were suggested by Meng and Rubin (1992). This procedure is referred to as \eqn{D_3} and can be used by setting \code{method = "D3"}.
\eqn{D_3} compares the two models by pooling the likelihood-ratio test across multiply imputed data sets.

Finally, an improved method for pooling likelihood-ratio tests was recommended by Chan & Meng (2019). This method is referred to as \eqn{D_4} and can be used by setting \code{method = "D4"}.
\eqn{D_4} also compares models by pooling the likelihood-ratio test but does so in a more general and efficient manner.

The function supports different classes of statistical models depending on which \code{method} is chosen.
\eqn{D_1} supports models that define \code{coef} and \code{vcov} methods (or similar) for extracting the parameter estimates and their estimated covariance matrix.
\eqn{D_2} can be used for the same models (if \code{use = "wald"}) and models that define a \code{logLik} method (if \code{use = "likelihood"}).
\eqn{D_3} supports linear models, linear mixed-effects models (see Laird, Lange, & Stram, 1987) with an arbitrary cluster structure if estimated with \code{lme4} or a single cluster if estimated by \code{nlme}, and structural equation models estimated with \code{lavaan} (requires ML estimator, see 'Note').
Finally, \eqn{D_4} supports models that define a \code{logLik} method but can fail if the data to which the model was fitted cannot be found.
In such a case, users can provide the list of imputed data sets directly by specifying the \code{data} argument or refit the models using the \code{include.data} argument in \code{\link{with.mitml.list}}.
Support for other statistical models may be added in future releases.

The \eqn{D_4}, \eqn{D_3}, and \eqn{D_2} methods support different estimators of the average relative increase in variance (ARIV), which can be specified with the \code{ariv} argument.
If \code{ariv = "default"}, the default estimators are used.
If \code{ariv = "positive"}, the default estimators are used but constrained to take on strictly positive values. This is useful if the estimated ARIV is negative.
If \code{ariv = "robust"}, which is available only for \eqn{D_4}, the "robust" estimator proposed by Chan & Meng (2019) is used.
This method should be used with caution, because it requires much stronger assumptions and may result in liberal inferences if these assumptions are violated.
}
\value{
A list containing the results of the model comparison. A \code{print} method is used for more readable output.
}
\note{
The methods \eqn{D_4}, \eqn{D_3}, and the likelihood-based \eqn{D_2} assume that models were fit using maximum likelihood (ML).
Models fit using REML are automatically refit using ML.
Models fit in \code{'lavaan'} using the MLR estimator or similar techniques that require scaled \eqn{\chi^2} difference tests are currently not supported.
}
\references{
Chan, K. W., & Meng, X.-L. (2019).
Multiple improvements of multiple imputation likelihood ratio tests. ArXiv:1711.08822 [Math, Stat]. \url{https://arxiv.org/abs/1711.08822}

Laird, N., Lange, N., & Stram, D. (1987). Maximum likelihood computations with repeated measures: Application of the EM algorithm. \emph{Journal of the American Statistical Association, 82}, 97-105.

Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. \emph{Statistica Sinica, 1}, 65-92.

Li, K. H., Raghunathan, T. E., & Rubin, D. B. (1991). Large-sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution. \emph{Journal of the American Statistical Association, 86}, 1065-1073.

Meng, X.-L., & Rubin, D. B. (1992). Performing likelihood ratio tests with multiply-imputed data sets. \emph{Biometrika, 79}, 103-111.

Reiter, J. P. (2007). Small-sample degrees of freedom for multi-component significance tests with multiple imputation for missing data. \emph{Biometrika, 94}, 502-508.

Rubin, D. B. (1987). \emph{Multiple imputation for nonresponse in surveys}. Hoboken, NJ: Wiley.
}
\author{Simon Grund}
\seealso{\code{\link{testEstimates}}, \code{\link{testConstraints}}, \code{\link{with.mitml.list}}, \code{\link{anova.mitml.result}}}
\examples{
data(studentratings)

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

implist <- mitmlComplete(imp)

# * Example 1: multiparameter hypothesis test for 'ReadDis' and 'SES'
# This tests the hypothesis that both effects are zero.
require(lme4)
fit0 <- with(implist, lmer(ReadAchiev ~ (1|ID), REML = FALSE))
fit1 <- with(implist, lmer(ReadAchiev ~ ReadDis + (1|ID), REML = FALSE))

# apply Rubin's rules
testEstimates(fit1)

# multiparameter hypothesis test using D1 (default)
testModels(fit1, fit0)

# ... adjusting for finite samples
testModels(fit1, fit0, df.com = 47)

# ... using D2 ("wald", using estimates and covariance-matrix)
testModels(fit1, fit0, method = "D2")

# ... using D2 ("likelihood", using likelihood-ratio test)
testModels(fit1, fit0, method = "D2", use = "likelihood")

# ... using D3 (likelihood-ratio test, requires ML fit)
testModels(fit1, fit0, method = "D3")

# ... using D4 (likelihood-ratio test, requires ML fit)
testModels(fit1, fit0, method = "D4")
}
mitml/man/write.mitmlMplus.Rd0000644000176200001440000000374314002031054015726 0ustar liggesusers\name{write.mitmlMplus}
\alias{write.mitmlMplus}
\title{Write \code{mitml} objects to Mplus format}
\description{
Saves objects of class \code{mitml} as a series of text files which can be processed by the statistical software M\emph{plus} (Muthen & Muthen, 2012).
}
\usage{
write.mitmlMplus(x, filename, suffix = "list", sep = "\t", dec = ".",
  na.value = -999)
}
\arguments{
\item{x}{An object of class \code{mitml} or \code{mitml.list} (as produced by \code{panImpute}, \code{jomoImpute}, \code{mitmlComplete}, or similar).}
\item{filename}{File base name for the text files containing the imputed data sets, specified without file extension.}
\item{suffix}{File name suffix for the index file.}
\item{sep}{The field separator.}
\item{dec}{The decimal separator.}
\item{na.value}{A numeric value coding the missing data in the resulting data files.}
}
\details{
The M\emph{plus} format for multiply imputed data sets comprises a set of text files, each containing one imputed data set, and an index file containing the names of all data files.
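Once exported, the imputed data sets can be analyzed in M\emph{plus} by pointing to the index file and declaring the data as imputations. A minimal sketch of the relevant input lines (assuming, for illustration, an index file named \code{imputation.list}):

\preformatted{
DATA: FILE = imputation.list;
      TYPE = IMPUTATION;
}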
During export, factors and character variables are converted to numeric. To make this more transparent, \code{write.mitmlMplus} produces a log file which contains information about the data set and the factors that have been converted. In addition, a basic M\emph{plus} input file is generated that can be used for setting up subsequent analysis models. } \value{ None (invisible \code{NULL}). } \references{ Muthen, L. K., & Muthen, B. O. (2012). \emph{Mplus User's Guide. Seventh Edition.} Los Angeles, CA: Muthen & Muthen. } \author{Simon Grund} \seealso{\code{\link{panImpute}}, \code{\link{jomoImpute}}, \code{\link{mitmlComplete}}} \examples{ \dontrun{ data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # write imputation files, index file, and log file write.mitmlMplus(imp, filename = "imputation", suffix = "list", na.value = -999) } } mitml/man/write.mitmlSAV.Rd0000644000176200001440000000311314002031052015254 0ustar liggesusers\name{write.mitmlSAV} \alias{write.mitmlSAV} \title{Write \code{mitml} objects to native SPSS format} \description{ Saves objects of class \code{mitml} in the \code{.sav} format used by the statistical software SPSS (IBM Corp., 2013). The function serves as a front-end for \code{write_sav} from the \code{haven} package. } \usage{ write.mitmlSAV(x, filename) } \arguments{ \item{x}{An object of class \code{mitml} or \code{mitml.list} (as produced by \code{panImpute}, \code{jomoImpute}, \code{mitmlComplete}, or similar).} \item{filename}{Name of the destination file. The file extension (\code{.sav}) is appended if needed.} } \details{ This function exports multiply imputed data sets to a single \code{.sav} file, in which an \code{Imputation_} variable separates the original data and the various imputed data sets. This allows exporting imputed data directly to the native SPSS format. Alternatively, \code{\link{write.mitmlSPSS}} may be used for creating separate text and SPSS syntax files, which offers more control over the data format. } \value{ None (invisible \code{NULL}). } \references{ IBM Corp. (2013). \emph{IBM SPSS Statistics for Windows, Version 22.0}. Armonk, NY: IBM Corp. } \author{Simon Grund} \seealso{\code{\link{panImpute}}, \code{\link{jomoImpute}}, \code{\link{mitmlComplete}}, \code{\link{write.mitmlSPSS}}} \examples{ \dontrun{ data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # write data file and SPSS syntax write.mitmlSAV(imp, filename = "imputation") } } mitml/man/read.mitml.Rd0000644000176200001440000000160214002034134014460 0ustar liggesusers\name{read.mitml} \alias{read.mitml} \title{Read \code{mitml} objects from file} \description{ This function loads \code{mitml} class objects from R binary formats (similar to \code{?load}), usually produced by \code{write.mitml}. } \usage{ read.mitml(filename) } \arguments{ \item{filename}{Name of the file to read, to be specified with file extension (e.g., \code{.R}, \code{.Rdata}).} } \value{ An object of class \code{mitml}.
} \author{Simon Grund} \seealso{\code{\link{panImpute}}, \code{\link{jomoImpute}}, \code{\link{write.mitml}}} \examples{ \dontrun{ data(studentratings) fml <- ReadDis + SES ~ ReadAchiev + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5) # write 'mitml' object write.mitml(imp, filename = "imputation.R") # read previously saved 'mitml' object old.imp <- read.mitml("imputation.R") class(old.imp) old.imp } } mitml/man/studentratings.Rd0000644000176200001440000000154214001352237015513 0ustar liggesusers\name{studentratings} \alias{studentratings} \docType{data} \title{Example data set on student ratings and achievement} \description{ Contains simulated data for students nested within schools, featuring students' ratings of their teachers' behavior (i.e., disciplinary problems in mathematics and reading class) and their general learning environment (school climate) as well as mathematics and reading achievement scores, and scores for socio-economic status and cognitive ability. In addition, the data set features the ID of 50 different schools (i.e., clusters), the biological sex of all students, and a broad, additional grouping factor. Different amounts of missing data have been inserted into the data set in a completely random fashion. } \usage{data(studentratings)} \format{A data frame containing 750 observations on 10 variables.} \keyword{datasets} mitml/DESCRIPTION0000644000176200001440000000165014127042517013107 0ustar liggesusersPackage: mitml Type: Package Title: Tools for Multiple Imputation in Multilevel Modeling Version: 0.4-3 Date: 2021-10-05 Author: Simon Grund [aut,cre], Alexander Robitzsch [aut], Oliver Luedtke [aut] Maintainer: Simon Grund <grund@ipn.uni-kiel.de> BugReports: https://github.com/simongrund1/mitml/issues Imports: pan, jomo, haven, grDevices, graphics, stats, methods, utils Suggests: mice, miceadds, Amelia, lme4, nlme, lavaan, geepack, survival, knitr, rmarkdown LazyData: true LazyLoad: true Description: Provides tools for multiple imputation of missing data in multilevel modeling. Includes a user-friendly interface to the packages 'pan' and 'jomo', and several functions for visualization, data management and the analysis of multiply imputed data sets. License: GPL (>= 2) VignetteBuilder: knitr NeedsCompilation: no Packaged: 2021-10-05 12:00:27 UTC; simon Repository: CRAN Date/Publication: 2021-10-05 12:30:07 UTC mitml/build/0000755000176200001440000000000014127037132012473 5ustar liggesusersmitml/build/vignette.rds0000644000176200001440000000044414127037132015034 0ustar liggesusers[binary RDS content omitted] mitml/vignettes/Introduction.Rmd --- title: "Introduction" output: rmarkdown::html_vignette: css: "css/vignette.css" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Introduction} %\VignetteEncoding{UTF-8} --- --- ```{r setup, include=FALSE, cache=FALSE} library(knitr) set.seed(123) options(width=87) opts_chunk$set(background="#ffffff", comment="#", collapse=FALSE, fig.width=9, fig.height=9, warning=FALSE, message=FALSE) ``` This vignette is intended to provide a first introduction to the R package `mitml` for generating and analyzing multiple imputations for multilevel missing data. A usual application of the package may consist of the following steps. 1. Imputation 2. Assessment of convergence 3. Completion of the data 4. Analysis 5. Pooling The `mitml` package offers a set of tools to facilitate each of these steps. This vignette is intended as a step-by-step illustration of the basic features of `mitml`.
Further information can be found in the other [vignettes](https://github.com/simongrund1/mitml/wiki) and the package [documentation](https://cran.r-project.org/package=mitml/mitml.pdf). ## Example data For the purposes of this vignette, we employ a simple example that makes use of the `studentratings` data set, which is provided with `mitml`. To use it, the `mitml` package and the data set must be loaded as follows. ```{r} library(mitml) data(studentratings) ``` More information about the variables in the data set can be obtained from its `summary`. ```{r} summary(studentratings) ``` In addition, the correlations between variables (based on pairwise observations) may be useful for identifying possible sources of information that may be used during the treatment of missing data. ```{r, echo=FALSE} round(cor(studentratings[,-(1:3)], use="pairwise"),3) ``` This illustrates (a) that most variables in the data set are affected by missing data and (b) that substantial relations exist between variables. For simplicity, we focus on only a subset of these variables. ## Model of interest For the present example, we focus on the two variables `ReadDis` (disciplinary problems in reading class) and `ReadAchiev` (reading achievement). Assume we are interested in the relation between these variables. Specifically, we may be interested in the following analysis model $$ \mathit{ReadAchiev}_{ij} = \gamma_{00} + \gamma_{10} \mathit{ReadDis}_{ij} + u_{0j} + e_{ij} $$ On the basis of the syntax used in the R package `lme4`, this model may be written as follows. ```{r, results="hide"} ReadAchiev ~ 1 + ReadDis + (1|ID) ``` In this model, the relation between `ReadDis` and `ReadAchiev` is represented by a single fixed effect of `ReadDis`, and a random intercept is included to account for the clustered structure of the data and the group-level variance in `ReadAchiev` that is not explained by `ReadDis`. ## Generating imputations The `mitml` package includes wrapper functions for the R packages `pan` (`panImpute`) and `jomo` (`jomoImpute`). Here, we will use the first option. To generate imputations with `panImpute`, the user must specify (at least): 1. an imputation model 2. the number of iterations and imputations The easiest way of specifying the imputation model is to use the `formula` argument of `panImpute`. Generally speaking, the imputation model should include all variables that are either (a) part of the model of interest, (b) related to the variables in the model, or (c) related to whether the variables in the model are missing. In this simple example, we include only `ReadDis` and `ReadAchiev` as the main target variables and `SchClimate` as an auxiliary variable. ```{r} fml <- ReadAchiev + ReadDis + SchClimate ~ 1 + (1|ID) ``` Note that, in this specification of the imputation model, all variables are included on the left-hand side of the model, whereas the right-hand side is left "empty". This model allows for all relations between variables at Level 1 and 2 and is thus suitable for most applications of the multilevel random intercept model (for further discussion, see also Grund, Lüdtke, & Robitzsch, 2016, in press). The imputation procedure is then run for 5,000 iterations (burn-in), after which 100 imputations are drawn, with 100 iterations between them. ```{r, results="hide"} imp <- panImpute(studentratings, formula = fml, n.burn = 5000, n.iter = 100, m = 100) ``` This step may take a few seconds. Once the process is completed, the imputations are saved in the `imp` object.
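Note that the imputation procedure relies on random number generation. To make the results reproducible, a seed can be passed to `panImpute` through its `seed` argument (a minimal sketch; the seed value is arbitrary).

```{r, eval=FALSE}
# same imputation model as above, but with a fixed random seed
imp <- panImpute(studentratings, formula = fml, n.burn = 5000, n.iter = 100,
                 m = 100, seed = 1234)
```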
## Assessing convergence In `mitml`, there are two options for assessing the convergence of the imputation procedure. First, the `summary` calculates the "potential scale reduction factor" ($\hat{R}$) for each parameter in the imputation model. If this value is noticeably larger than 1 for some parameters (say $>1.05$), a longer burn-in period may be required. ```{r} summary(imp) ``` Second, diagnostic plots can be requested with the `plot` function. These plots consist of a trace plot, an autocorrelation plot, and some additional information about the posterior distribution. Convergence can be assumed if the trace plot is stationary (i.e., does not "drift"), and the autocorrelation is within reasonable bounds for the chosen number of iterations between imputations. For this example, we examine only the plot for the parameter `Beta[1,2]` (i.e., the intercept of `ReadDis`). ```{r conv, echo=FALSE} plot(imp, trace = "all", print = "beta", pos = c(1,2), export = "png", dev.args = list(width=720, height=380, pointsize=16)) ``` ```{r, eval=FALSE} plot(imp, trace = "all", print = "beta", pos = c(1,2)) ``` ![](mitmlPlots/BETA_ReadDis_ON_Intercept.png) Taken together, both $\hat{R}$ and the diagnostic plots indicate that the imputation model converged, providing the basis for the analysis of the imputed data sets. ## Completing the data In order to work with and analyze the imputed data sets, the data sets must be completed with the imputations generated in the previous steps. To do so, `mitml` provides the function `mitmlComplete`. ```{r} implist <- mitmlComplete(imp, "all") ``` The resulting object is a list that contains the 100 completed data sets. ## Analysis and pooling In order to obtain estimates for the model of interest, the model must be fit separately to each of the completed data sets, and the results must be pooled into a final set of estimates and inferences. The `mitml` package offers the `with` function to fit various statistical models to a list of completed data sets. In this example, we use the `lmer` function from the R package `lme4` to fit the model of interest. ```{r, message=FALSE} library(lme4) fit <- with(implist, lmer(ReadAchiev ~ 1 + ReadDis + (1|ID))) ``` The resulting object is a list containing the 100 fitted models. To pool the results of these models into a set of final estimates and inferences, `mitml` offers the `testEstimates` function. ```{r} testEstimates(fit, extra.pars = TRUE) ``` The estimates can be interpreted in a manner similar to the estimates from the corresponding complete-data procedure. In addition, the output includes diagnostic quantities such as the fraction of missing information (FMI), which can be helpful for interpreting the results and understanding problems with the imputation procedure. ###### References Grund, S., Lüdtke, O., & Robitzsch, A. (2016). Multiple imputation of multilevel missing data: An introduction to the R package pan. *SAGE Open*, *6*(4), 1–17. doi: 10.1177/2158244016668220 ([Link](https://doi.org/10.1177/2158244016668220)) Grund, S., Lüdtke, O., & Robitzsch, A. (in press). Multiple imputation of missing data for multilevel models: Simulations and recommendations. *Organizational Research Methods*.
doi: 10.1177/1094428117703686 ([Link](https://doi.org/10.1177/1094428117703686)) --- ```{r, echo=F} cat("Author: Simon Grund (grund@ipn.uni-kiel.de)\nDate: ", as.character(Sys.Date())) ``` mitml/vignettes/Analysis.Rmd0000644000176200001440000002302414127016114015631 0ustar liggesusers--- title: "Analysis of Multiply Imputed Data Sets" output: rmarkdown::html_vignette: css: "css/vignette.css" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Analysis of Multiply Imputed Data Sets} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE, cache=FALSE} library(knitr) set.seed(123) options(width=87) opts_chunk$set(background="#ffffff", comment="#", collapse=FALSE, fig.width=9, fig.height=9, warning=FALSE, message=FALSE) ``` This vignette is intended to provide an overview of the analysis of multiply imputed data sets with `mitml`. Specifically, this vignette addresses the following topics: 1. Working with multiply imputed data sets 2. Rubin's rules for pooling individual parameters 3. Model comparisons 4. Parameter constraints Further information can be found in the other [vignettes](https://github.com/simongrund1/mitml/wiki) and the package [documentation](https://cran.r-project.org/package=mitml/mitml.pdf). ## Example data (`studentratings`) For the purposes of this vignette, we make use of the `studentratings` data set, which contains simulated data from 750 students in 50 schools including scores on reading and math achievement, socioeconomic status (SES), and ratings on school and classroom environment. The package and the data set can be loaded as follows. ```{r} library(mitml) library(lme4) data(studentratings) ``` As evident from its `summary`, most variables in the data set contain missing values. ```{r} summary(studentratings) ``` In the present example, we investigate the differences in mathematics achievement that can be attributed to differences in SES when controlling for students' sex. Specifically, we are interested in the following model. $$ \mathit{MA}_{ij} = \gamma_{00} + \gamma_{10} \mathit{Sex}_{ij} + \gamma_{20} (\mathit{SES}_{ij}-\overline{\mathit{SES}}_{\bullet j}) + \gamma_{01} \overline{\mathit{SES}}_{\bullet j} + u_{0j} + e_{ij} $$ Note that this model also employs group-mean centering to separate the individual and group-level effects of SES. ## Generating imputations In the present example, we generate 20 imputations from the following imputation model. ```{r, results="hide"} fml <- ReadDis + SES ~ 1 + Sex + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 5000, n.iter = 200, m = 20) ``` The completed data are then extracted with `mitmlComplete`. ```{r} implist <- mitmlComplete(imp, "all") ``` ## Transforming the imputed data sets In empirical research, the raw data rarely enter the analyses but often need to be transformed beforehand. For this purpose, the `mitml` package provides the `within` function, which applies a given transformation directly to each data set. In the following, we use this to (a) calculate the group means of SES and (b) center the individual scores around their group means. ```{r} implist <- within(implist, { G.SES <- clusterMeans(SES, ID) # calculate group means I.SES <- SES - G.SES # center around group means }) ``` This method can be used to apply arbitrary transformations to all of the completed data sets simultaneously. > **Note regarding** `dplyr`**:** > Due to how it is implemented, `within` cannot be used directly with `dplyr`.
> Instead, users may use `with` instead of `within` with the following workaround. >```{r, eval=FALSE} >implist <- with(implist,{ > df <- data.frame(as.list(environment())) > df <- ... # dplyr commands > df >}) >implist <- as.mitml.list(implist) >``` > Advanced users may also consider using `lapply` for a similar workaround. ## Fitting the analysis model In order to analyze the imputed data, each data set is analyzed using regular complete-data techniques. For this purpose, `mitml` offers the `with` function. In the present example, we use it to fit the model of interest with the R package `lme4`. ```{r} fit <- with(implist, { lmer(MathAchiev ~ 1 + Sex + I.SES + G.SES + (1|ID)) }) ``` This results in a list of fitted models, one for each of the imputed data sets. ## Pooling The results obtained from the imputed data sets must be pooled in order to obtain a set of final parameter estimates and inferences. In the following, we employ a number of different pooling methods that can be used to address common statistical tasks, for example, (a) estimating and testing individual parameters, (b) comparing models, and (c) testing constraints on one or several parameters. #### Parameter estimates Individual parameters are commonly pooled with the rules developed by Rubin (1987). In `mitml`, Rubin's rules are implemented in the `testEstimates` function. ```{r} testEstimates(fit) ``` In addition, the argument `extra.pars = TRUE` can be used to obtain pooled estimates of variance components, and `df.com` can be used to specify the complete-data degrees of freedom, which provides more appropriate (i.e., conservative) inferences in smaller samples. For example, using a conservative value for the complete-data degrees of freedom for the fixed effects in the model of interest (Snijders & Bosker, 2012), the output changes as follows. ```{r} testEstimates(fit, extra.pars = TRUE, df.com = 46) ``` #### Multiple parameters and model comparisons Oftentimes, statistical inference concerns more than one parameter at a time. For example, the combined influence of SES (within and between groups) on mathematics achievement is represented by two parameters in the model of interest. Multiple pooling methods for Wald and likelihood ratio tests (LRTs) are implemented in the `testModels` function. This function requires the specification of a full model and a restricted model, which are then compared using (pooled) Wald tests or LRTs. Specifically, `testModels` allows users to pool Wald tests ($D_1$), $\chi^2$ test statistics ($D_2$), and LRTs ($D_3$ and $D_4$; for a comparison of these methods, see also Grund, Lüdtke, & Robitzsch, 2016b). To examine the combined influence of SES on mathematics achievement, the following restricted model can be specified and compared with the model of interest (using $D_1$). ```{r} fit.null <- with(implist, { lmer(MathAchiev ~ 1 + Sex + (1|ID)) }) testModels(fit, fit.null) ``` > **Note regarding the order of arguments:** > Please note that `testModels` expects that the first argument contains the full model, and the second argument contains the restricted model. > If the order of the arguments is reversed, the results will not be interpretable. Similar to the test for individual parameters, smaller samples can be accommodated with `testModels` (with method $D_1$) by specifying the complete-data degrees of freedom for the denominator of the $F$ statistic.
```{r} testModels(fit, fit.null, df.com = 46) ``` The pooling method used by `testModels` is determined by the `method` argument. For example, to calculate the pooled LRT corresponding to the Wald test above (i.e., $D_3$), the following command can be issued. ```{r} testModels(fit, fit.null, method = "D3") ``` #### Constraints on parameters Finally, it is often useful to investigate functions (or constraints) of the parameters in the model of interest. In complete data sets, this can be achieved with a test of linear hypotheses or the delta method. The `mitml` package implements a pooled version of the delta method in the `testConstraints` function. For example, the combined influence of SES on mathematics achievement can also be tested without model comparisons by testing the constraint that the parameters pertaining to `I.SES` and `G.SES` are both zero. This constraint is defined and tested as follows. ```{r} c1 <- c("I.SES", "G.SES") testConstraints(fit, constraints = c1) ``` This test is identical to the Wald test given in the previous section. Arbitrary constraints on the parameters can be specified and tested in this manner, where each character string denotes an expression to be tested against zero. In the present example, we are also interested in the *contextual* effect of SES on mathematics achievement (e.g., Snijders & Bosker, 2012). The contextual effect is simply the difference between the coefficients pertaining to `G.SES` and `I.SES` and can be tested as follows. ```{r} c2 <- c("G.SES - I.SES") testConstraints(fit, constraints = c2) ``` Similar to model comparisons, constraints can be tested with different methods ($D_1$ and $D_2$) and can accommodate smaller samples by specifying a value for `df.com`. Further examples for the analysis of multiply imputed data sets with `mitml` are given by Enders (2016) and Grund, Lüdtke, and Robitzsch (2016a). ###### References Enders, C. K. (2016). Multiple imputation as a flexible tool for missing data handling in clinical research. *Behaviour Research and Therapy*. doi: 10.1016/j.brat.2016.11.008 ([Link](https://doi.org/10.1016/j.brat.2016.11.008)) Grund, S., Lüdtke, O., & Robitzsch, A. (2016a). Multiple imputation of multilevel missing data: An introduction to the R package pan. *SAGE Open*, *6*(4), 1–17. doi: 10.1177/2158244016668220 ([Link](https://doi.org/10.1177/2158244016668220)) Grund, S., Lüdtke, O., & Robitzsch, A. (2016b). Pooling ANOVA results from multiply imputed datasets: A simulation study. *Methodology*, *12*, 75–88. doi: 10.1027/1614-2241/a000111 ([Link](https://doi.org/10.1027/1614-2241/a000111)) Rubin, D. B. (1987). *Multiple imputation for nonresponse in surveys*. Hoboken, NJ: Wiley. Snijders, T. A. B., & Bosker, R. J. (2012). *Multilevel analysis: An introduction to basic and advanced multilevel modeling*. Thousand Oaks, CA: Sage.
--- ```{r, echo=F} cat("Author: Simon Grund (grund@ipn.uni-kiel.de)\nDate: ", as.character(Sys.Date())) ``` mitml/vignettes/css/0000755000176200001440000000000013321365165014201 5ustar liggesusersmitml/vignettes/css/vignette.css0000644000176200001440000000741513321364322016541 0ustar liggesusersbody { background-color: #fff; max-width: 720px; overflow: visible; padding-left: 2em; padding-right: 2em; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 16px; font-weight: 500; line-height: 1.65; text-align: justify; text-justify: inter-word; margin: 2em auto; } #header { text-align: center; } #TOC { clear: both; margin: 0 0 10px 10px; padding: 4px; width: 400px; border: 1px solid #CCCCCC; border-radius: 5px; background-color: #f6f6f6; font-size: 16px; line-height: 1.5; text-align: left; } #TOC .toctitle { font-weight: bold; font-size: 18px; margin-left: 5px; } #TOC ul { padding-left: 40px; margin-left: -1.5em; margin-top: 5px; margin-bottom: 5px; } #TOC ul ul { margin-left: -2em; } #TOC li { line-height: 16px; } p { margin: 0.6em 0; } blockquote { border-left:3px dotted #e5e5e5; background-color: #fff; padding: 0 1em; margin: 0.9em 0; } hr { border-style: solid; border: none; border-top: 1px solid #777; margin: 28px 0; } dl { margin-left: 0; } dl dd { margin-bottom: 13px; margin-left: 13px; } dl dt { font-weight: bold; } ul, ol { text-align: left; } ul { margin-top: 0; } ul li { list-style: circle outside; } ul ul { margin-bottom: 0; } pre, code { background-color: #f7f7f7; line-height: 1.2; border-radius: 3px; color: #333; padding: 0px; white-space: pre; /* or: pre-wrap */ overflow-x: auto; } pre { border-radius: 3px; margin: 5px 0px 10px 0px; padding: 10px; } code { font-family: Consolas, monospace; font-size: 85%; } p > code, li > code { padding: 2px 2px; } h1, h2, h3, h4, h5, h6 { text-align: left; line-height: 1.2; } h1 { font-size: 2em; font-weight: 600; } h2 { color: #191919; font-size: 1.5em; font-weight: 600; } h3, h4, h5 { color: #292929; font-weight: 600; } /* Reference list */ h6 { color:#191919; font-size: 1.5em; font-weight: 600; margin-top: 0.83em; margin-bottom: 0.83em; } h6 ~ p { text-align: left; } a { color: #777; text-decoration: none; } a:hover { color: #aaa; text-decoration: underline; } /* a:visited { color: #777; } a:visited:hover { color: #aaa; text-decoration: underline; } */ /* tables */ table, table th, table td { border-left-style: none; border-right-style: none; } table { margin-top: 25px; margin-bottom: 25px; margin-left: auto; margin-right: auto; border-collapse: collapse; border-spacing: 0; } th { padding:5px 10px; border: 1px solid #b2b2b2; } td { padding:5px 10px; border: 1px solid #e5e5e5; } dt { color:#444; font-weight:500; } th { color:#444; } table thead, table tr.even { background-color: #f7f7f7; } /* images */ img { display: block; margin-left: auto; margin-right: auto; max-width:100%; } div.figure { text-align: center; } /* hovering behavior for images (e.g., play/pause GIFs) */ .gif_play, #gif:hover .gif_stop{ display:none } .gif_stop, #gif:hover .gif_play{ display:block } /* code highlighting */ pre code { color: #707070; } /* General Code w/o Class */ pre code.r { color: #333333; } /* General Code */ code span.kw { color: #558200; font-weight: normal; } /* Keyword */ code span.co { color: #707070; font-style: normal; } /* Comment */ code span.dt { color: #333333; } /* Data Type */ code span.fu { color: #558200; } /* Function calls */ code span.dv { color: #007878; } /* Decimal Values */ code span.bn { color: #007878; } /* Base N 
*/ code span.fl { color: #007878; } /* Float */ code span.ch { color: #985b00; } /* Character */ code span.st { color: #985b00; } /* String */ code span.ot { color: #007878; } /* Other Token */ code span.al { color: #a61717; font-weight: bold; } /* Alert Token */ code span.er { color: #a61717; background-color: #e3d2d2; } /* Error Token */ mitml/vignettes/Level2.Rmd0000644000176200001440000001442014002023537015176 0ustar liggesusers--- title: "Imputation of Missing Data at Level 2" output: rmarkdown::html_vignette: css: "css/vignette.css" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Imputation of Missing Data at Level 2} %\VignetteEncoding{UTF-8} --- --- ```{r setup, include=FALSE, cache=FALSE} library(knitr) set.seed(123) options(width=87) opts_chunk$set(background="#ffffff", comment="#", collapse=FALSE, fig.width=9, fig.height=9, warning=FALSE, message=FALSE) ``` This vignette illustrates the use of `mitml` for the treatment of missing data at Level 2. Specifically, the vignette addresses the following topics: 1. Specification of the two-level imputation model for missing data at both Level 1 and 2 2. Running the imputation procedure Further information can be found in the other [vignettes](https://github.com/simongrund1/mitml/wiki) and the package [documentation](https://cran.r-project.org/package=mitml/mitml.pdf). ## Example data For the purposes of this vignette, we make use of the `leadership` data set, which contains simulated data from 750 employees in 50 groups including ratings on job satisfaction, leadership style, and work load (Level 1) as well as cohesion (Level 2). The package and the data set can be loaded as follows. ```{r} library(mitml) data(leadership) ``` As the `summary` of the data shows, all variables are affected by missing data. ```{r} summary(leadership) ``` The following data segment illustrates this fact, including cases with missing data at Level 1 (e.g., job satisfaction) and 2 (e.g., cohesion). ```{r, echo=FALSE} leadership[73:78,] ``` In the following, we will employ a two-level model to address missing data at both levels simultaneously. ## Specifying the imputation model The specification of the two-level model involves two components, one pertaining to the variables at each level of the sample (Goldstein, Carpenter, Kenward, & Levin, 2009; for further discussion, see also Enders, Mistler, & Keller, 2016; Grund, Lüdtke, & Robitzsch, in press). Specifically, the imputation model is specified as a list with two components, where the first component denotes the model for the variables at Level 1, and the second component denotes the model for the variables at Level 2. For example, using the `formula` interface, an imputation model targeting all variables in the data set can be written as follows. ```{r} fml <- list( JOBSAT + NEGLEAD + WLOAD ~ 1 + (1|GRPID) , # Level 1 COHES ~ 1 ) # Level 2 ``` The first component of this list includes the three target variables at Level 1 and a fixed (`1`) as well as a random intercept (`1|GRPID`). The second component includes the target variable at Level 2 with a fixed intercept (`1`).
From a statistical point of view, this specification corresponds to the following model $$ \begin{aligned} \mathbf{y}_{1ij} &= \boldsymbol\mu_{1} + \mathbf{u}_{1j} + \mathbf{e}_{ij} \\ \mathbf{y}_{2j} &= \boldsymbol\mu_{2} + \mathbf{u}_{2j} \; , \end{aligned} $$ where $\mathbf{y}_{1ij}$ denotes the target variables at Level 1, $\mathbf{y}_{2j}$ the target variables at Level 2, and the right-hand side of the model contains the fixed effects, random effects, and residual terms as mentioned above. Note that, even though the two components of the model appear to be separated, they define a single (joint) model for all target variables at both Level 1 and 2. Specifically, this model employs a two-level covariance structure, which allows for relations between variables at both Level 1 (i.e., correlated residuals at Level 1) and 2 (i.e., correlated random effects residuals at Level 2). ## Generating imputations Because the data contain missing values at both levels, imputations will be generated with `jomoImpute` (and not `panImpute`). Except for the specification of the two-level model, the syntax is the same as in applications with missing data only at Level 1. Here, we will run 5,000 burn-in iterations and generate 20 imputations, each 250 iterations apart. ```{r, results="hide"} imp <- jomoImpute(leadership, formula = fml, n.burn = 5000, n.iter = 250, m = 20) ``` By looking at the `summary`, we can then review the imputation procedure and verify that the imputation model converged. ```{r} summary(imp) ``` Due to the greater complexity of the two-level model, the output includes more information than in applications with missing data only at Level 1. For example, the output features the model specification for variables at both Level 1 and 2. Furthermore, it provides convergence statistics for the additional regression coefficients for the target variables at Level 2 (i.e., `Beta2`). Finally, it is also apparent that the two-level model indeed allows for relations between target variables at Level 1 and 2. This can be seen from the fact that the potential scale reduction factor ($\hat{R}$) for the covariance matrix at Level 2 (`Psi`) was largest for `Psi[4,3]`, which is the covariance between cohesion and the random intercept of work load. ## Completing the data The completed data sets can then be extracted with `mitmlComplete`. ```{r} implist <- mitmlComplete(imp, "all") ``` When inspecting the completed data, it is easy to verify that the imputations for variables at Level 2 are constant within groups as intended, thus preserving the two-level structure of the data. ```{r, echo=FALSE} implist[[1]][73:78,] ``` ###### References Enders, C. K., Mistler, S. A., & Keller, B. T. (2016). Multilevel multiple imputation: A review and evaluation of joint modeling and chained equations imputation. *Psychological Methods*, *21*, 222–240. doi: 10.1037/met0000063 ([Link](https://doi.org/10.1037/met0000063)) Goldstein, H., Carpenter, J. R., Kenward, M. G., & Levin, K. A. (2009). Multilevel models with multivariate mixed response types. *Statistical Modelling*, *9*, 173–197. doi: 10.1177/1471082X0800900301 ([Link](https://doi.org/10.1177/1471082X0800900301)) Grund, S., Lüdtke, O., & Robitzsch, A. (in press). Multiple imputation of missing data for multilevel models: Simulations and recommendations. *Organizational Research Methods*.
doi: 10.1177/1094428117703686 ([Link](https://doi.org/10.1177/1094428117703686)) --- ```{r, echo=F} cat("Author: Simon Grund (grund@ipn.uni-kiel.de)\nDate: ", as.character(Sys.Date())) ``` mitml/NEWS0000644000176200001440000002033614127036773012105 0ustar liggesusers# * RELEASE HISTORY OF THE 'mitml' PACKAGE: -- # Version 0.4-3 (2021-10-05) -- * mitml.list2mids: new function (converts objects from 'mitml.list' to 'mids') * testEstimates, testConstraints, testModels: added features (support for ordinal models estimated with 'MASS::polr') * other: bugfix (fixes vignette build errors) # Version 0.4-2 (2021-09-10) -- * testModels: bugfix (fixes unneeded refits in 'D1', fixes scoping issue in 'D3' and 'D4') # Version 0.4-1 (2021-02-05) -- * fixed Solaris build error # Version 0.4-0 (2021-01-25) -- * testEstimates: added initial support for SEM ('lavaan'); argument 'var.comp' is now deprecated and was replaced by 'extra.pars' * testModels: added new pooling method for LRTs ('D4'); now adopts more accurate formula for small-sample degrees of freedom ('df.com'); added initial support for SEM ('lavaan'); expanded support of 'D3' for multilevel models ('lme4') with arbitrary number of clusters; added 'ariv' argument for different estimators * testConstraints: added initial support for SEM ('lavaan'); now adopts more accurate formula for small-sample degrees of freedom ('df.com') * anova.mitml.result: see 'testModels'; now uses 'D4' and 'D2' as fallback options (in that order) * other: general code improvements # Version 0.3-6 (2018-07-10) -- * confint: new function, calculating confidence intervals for pooled estimates (applicable to testEstimates() objects) * jomoImpute, panImpute: added features (option to save fewer parameters with 'keep.chains') * jomoImpute: added features (support for single-level imputation models) * testEstimates, testConstraints, testModels: added features (support for Cox-PH models using the 'survival' package) * testConstraints: added features (pooled estimates and SEs of specified constraints) * mitmlComplete: bugfix (fixes row ordering issue) * jomoImpute, panImpute: bugfix (fixes erroneous removal of global seed) * other: added vignettes (analysis, level-2 imputation) # Version 0.3-5 (2017-03-14) -- * testEstimates: now prints the two-tailed p-value (as opposed to one-tailed in earlier versions), revised label for p-values, improved output with constrained variance components * testModels: revised label for p-values * testEstimates: added features (support for GEEs using the 'geepack' package) * testModels: added features (support for GEEs using the 'geepack' package) * testConstraints: added features (support for GEEs using the 'geepack' package) * c.mitml.list, rbind.mitml.list, cbind.mitml.list: new functions, combining lists of multiply imputed data sets by data set (c.mitml.list), row (rbind.mitml.list), or column
(cbind.mitml.list) * sort.mitml.list: new function, sorting lists of multiply imputed data sets by one or several variables (similar to '?order') * subset.mitml.list: new function, generating subsets for lists of multiply imputed data sets (similar to '?subset') * amelia2mitml.list: new function, converting imputations generated by the 'Amelia' package to 'mitml.list' * justice: updated data set (added categorical variable, missing data at Level 2) # Version 0.3-4 (2016-09-12) -- * mitmlComplete: changed default arguments ('print' now defaults to 'all', returning list of completed data sets) * jomoImpute: bugfix (fixes error in which jomoImpute() rejected correctly specified priors when 'group=NULL') * mitmlComplete: bugfix (fixes error with categorical target variables when there are no missing data) * plot: adjusted warning message for 'pos' argument to include 'beta2'. # Version 0.3-3 (2016-07-04) -- * jomoImpute: added features (support for imputation of cluster-level variables, i.e., the two-level procedures "jomo2...") * print/summary: revised appearance in two-level models (model summary is displayed separately by level for two-level imputation procedures) * plot: additional value for print argument ("beta2", denoting the regression coefficients of the cluster-level imputation model) * jomoImpute: bugfix (fixes error in the usage of starting values in cases with only continuous/no categorical data) * plot: revised formatting of the plot title (order of variables) # Version 0.3-2 (2016-05-10) -- * plot: added features (requesting single parameters, thinning of the chain prior to plotting) * summary: added features (summary of autocorrelation) * plot: revised appearance and behavior (burn-in printed in grey, included Rhat and autocorrelation at lag k in the posterior summary; for trace="burnin", the summary is now calculated for the burn-in phase, not the imputation phase) # Version 0.3-1 (2016-05-10) -- * anova: new function based on objects of class 'mitml.result', facilitating comparisons for a sequence of models * long2mitml.list: new function, converting multiple imputations from "long" format to 'mitml.list' * jomo2mitml.list: new function, converting imputations generated by the 'jomo' package to 'mitml.list' * multilevelR2: new function, calculating measures of explained variance (R^2) for multilevel models based on observed or multiply imputed data sets * justice: new data set, including re-simulated data based on the study of procedural justice, justice climate, and organizational satisfaction by Liao and Rupp (2005) * plot: renamed export directory ("mitmlPlots", formerly "panPlots") * testModels: added automatic refit using ML for REML fitted models * mitmlComplete: bugfix (fixes error with mixed categorical and continuous variables) * plot: bugfix (fixes handling of latent background variables for categorical variables) # Version 0.3-0 (2016-03-15) -- * jomoImpute: new function, providing an interface to the jomo package for imputation of missing values at level 1 - includes adjustments in mitmlComplete as well as the summary, print, and plot methods - includes minor changes in the interpretation of the formula and type arguments - adds a few slots to the 'mitml' object class * summary: bugfix (fixes behavior when applied to fixed parameters with zero variance) * as.mitml.list: bugfix (fixes order of class attributes) # Version 0.2-4 (2015-10-19) -- * clusterMeans: code improvements * panImpute: code improvements * testConstraints: added features (model-independent
input) * testEstimates: added features (model-independent input) * testModels: comparisons of REML fits through D2 are now permitted * summary: bugfix (n.Rhat now takes effect), added features (goodness of approximation) # Version 0.2-3 (2015-07-09) -- * panImpute: added features (silent mode), bugfix (ordering of variables with nonstandard priors) * summary: added features (details on PSR) * plot: revised layout, added features (trend line, posterior summary), bugfix (plot labels) * testModels: bugfix (structural zeros in lmer objects) * studentratings: renamed variable (data set) # Version 0.2-2 (2015-05-23) -- * initial release mitml/R/0000755000176200001440000000000014127022467011602 5ustar liggesusersmitml/R/jomo2mitml.list.R0000644000176200001440000000020614001604017014752 0ustar liggesusersjomo2mitml.list <- function(x){ # convert jomo imputations to mitml.list long2mitml.list(x, split = "Imputation", exclude = 0) } mitml/R/panImpute.R0000644000176200001440000001345014001606200013655 0ustar liggesuserspanImpute <- function(data, type, formula, n.burn = 5000, n.iter = 100, m = 10, group = NULL, prior = NULL, seed = NULL, save.pred = FALSE, keep.chains = c("full", "diagonal"), silent = FALSE){ # wrapper function for the Gibbs sampler in the pan package # *** checks if(!missing(type) && !missing(formula)) stop("Only one of 'type' or 'formula' may be specified.") if(save.pred && !missing(type)){ warning("Option 'save.pred' is ignored if 'type' is specified.") save.pred <- FALSE } keep.chains <- match.arg(keep.chains) # convert type if(!missing(type)){ formula <- .type2formula(data, type) group <- attr(formula, "group") } # empty objects to assign to clname <- yvrs <- y <- ycat <- zcol <- xcol <- pred <- clus <- psave <- pvrs <- qvrs <- pnames <- qnames <- NULL # preserve original order if(!is.data.frame(data)) data <- as.data.frame(data) data <- cbind(data, original.order = 1:nrow(data)) # address additional grouping grname <- group if(is.null(group)){ group <- rep(1, nrow(data)) }else{ group <- data[,group] if(length(group) != nrow(data)) stop("Argument 'group' is not correctly specified.") } group.original <- group group <- as.numeric(factor(group, levels = unique(group))) # *** # model input # populate local frame .model.byFormula(data, formula, group, group.original, method = "pan") # check model input if(any(is.na(group))) stop("Grouping variable must not contain missing data.") if(any(is.na(pred))) stop("Predictor variables must not contain missing data.") if(sum(is.na(y)) == 0) stop("Target variables do not contain any missing data.") if(any(!sapply(y, is.numeric))) stop("Target variables must be numeric.
You may either convert them or use jomoImpute() instead.") if(any(duplicated(yvrs))) stop("Found duplicate target variables.") # reorder colums cc <- which(colnames(data) %in% c(clname, grname, yvrs)) data.ord <- cbind(data[c(clname, grname, yvrs)], data[-cc]) # *** # pan setup if(is.null(prior)){ prior <- list( a = ncol(y), Binv = diag(1, ncol(y)), c = ncol(y)*length(zcol), Dinv = diag(1, ncol(y)*length(zcol)) ) } if(is.null(seed)){ set.seed(as.integer(runif(1, 0, 10^6))) }else{ set.seed(as.integer(seed)) } rns <- sapply(unique(group), function(x, m) as.integer(runif(m+1, 0, 10^6)), m = m) # prepare output ind <- which(is.na(data.ord), arr.ind = TRUE, useNames = FALSE) ind <- ind[ ind[,2] %in% which(colnames(data.ord) %in% colnames(y)), ,drop = FALSE ] rpm <- matrix(NA, nrow(ind), m) # standard dimensions ng <- length(unique(group)) np <- length(xcol) nq <- length(zcol) nr <- ncol(y) # reduced dimensions dpsi <- nr*nq dsig <- nr if(keep.chains == "diagonal"){ dpsi <- dsig <- 1 } bpar <- list(beta = array( NA, c(np, nr, n.burn, ng) ), psi = array( NA, c(nr*nq, dpsi, n.burn, ng) ), sigma = array( NA, c(nr, dsig, n.burn, ng) )) ipar <- list(beta = array( NA, c(np, nr, n.iter*m, ng) ), psi = array( NA, c(nr*nq, dpsi, n.iter*m, ng) ), sigma = array( NA, c(nr, dsig, n.iter*m, ng) )) # burn-in if(!silent){ cat("Running burn-in phase ...\n") flush.console() } glast <- as.list(unique(group)) for(gg in unique(group)){ gi <- group == gg gy <- y[gi,] gpred <- pred[gi,] gclus <- clus[gi] # sort 1, ..., k gclus <- match(gclus, unique(gclus)) cur <- pan::pan(gy, subj = gclus, gpred, xcol, zcol, prior, seed = rns[1, gg], iter = n.burn) glast[[gg]] <- cur$last # save parameter chains bpar[["beta"]][,,,gg] <- cur$beta if(keep.chains == "diagonal"){ bpar[["psi"]][,,,gg] <- .adiag( cur$psi ) bpar[["sigma"]][,,,gg] <-.adiag( cur$sigma ) }else{ bpar[["psi"]][,,,gg] <- cur$psi bpar[["sigma"]][,,,gg] <- cur$sigma } } # imputation for(ii in 1:m){ if(!silent){ cat("Creating imputed data set (", ii, "/", m,") ...\n") flush.console() } gy.imp <- as.list(unique(group)) for(gg in unique(group)){ gi <- group == gg gy <- y[gi,] gpred <- pred[gi,] gclus <- clus[gi] # sort 1, ..., k gclus <- match(gclus, unique(gclus)) cur <- pan::pan(gy, subj = gclus, gpred, xcol, zcol, prior, seed = rns[ii+1, gg], iter = n.iter, start = glast[[gg]]) glast[[gg]] <- cur$last # save imputations gy.imp[[gg]] <- cur$y # save parameter chains i0 <- seq.int(n.iter*(ii-1)+1, n.iter*ii) ipar[["beta"]][,,i0, gg] <- cur$beta if(keep.chains == "diagonal"){ ipar[["psi"]][,,i0, gg] <- .adiag( cur$psi ) ipar[["sigma"]][,,i0, gg] <- .adiag( cur$sigma ) }else{ ipar[["psi"]][,,i0, gg] <- cur$psi ipar[["sigma"]][,,i0, gg] <- cur$sigma } } y.imp <- do.call(rbind, gy.imp) rpm[,ii] <- y.imp[is.na(y)] } if(!silent){ cat("Done!\n") } # clean up srt <- data.ord[,ncol(data.ord)] data.ord <- data.ord[,-ncol(data.ord)] # prepare output data if( save.pred && !missing(formula) ) data.ord <- cbind(data.ord, pred[, psave, drop = F]) # ordering attr(data.ord, "sort") <- srt attr(data.ord, "group") <- group.original # model summary model <- list(clus = clname, yvrs = yvrs, pvrs = pvrs, qvrs = qvrs) attr(model, "is.ML") <- TRUE attr(model, "is.L2") <- FALSE attr(model, "full.names") <- list(pvrs = pnames, qvrs = qnames) out <- list( data = data.ord, replacement.mat = rpm, index.mat = ind, call = match.call(), model = model, random.L1 = "none", prior = prior, iter = list(burn = n.burn, iter = n.iter, m = m), keep.chains = keep.chains, par.burnin = bpar, par.imputation = 
ipar ) class(out) <- c("mitml", "pan") return(out) } mitml/R/cbind.mitml.list.R0000644000176200001440000000021114001605565015066 0ustar liggesuserscbind.mitml.list <- function(...){ # merges two objects of class "mitml.list" by appending columns of list entries Map(cbind, ...) } mitml/R/zzz.R0000644000176200001440000000025514001610463012552 0ustar liggesusers.onAttach <- function(libname, pkgname){ packageStartupMessage("*** This is beta software. Please report any bugs!\n*** See the NEWS file for recent changes.") } mitml/R/summary.mitml.R0000644000176200001440000000645314001606021014534 0ustar liggesuserssummary.mitml <- function(object, n.Rhat = 3, goodness.of.appr = FALSE, autocorrelation = FALSE, ...){ # summary method for objects of class "mitml" inc <- object$data ngr <- length(unique(attr(object$data, "group"))) prm <- object$par.imputation iter <- dim(prm[[1]])[3] k <- object$iter$iter isML <- attr(object$model, "is.ML") isL2 <- attr(object$model, "is.L2") # parameter chains (for backwards compatibility) if(is.null(object$keep.chains)) object$keep.chains <- "full" # percent missing mdr <- sapply(inc, FUN = function(x){mean(is.na(x))}) mdr[] <- sprintf(mdr*100, fmt = "%.1f") mdr <- gsub("^0.0$", "0", mdr) # convergence for imputation phase conv <- NULL Rhat <- ifelse(is.null(n.Rhat), FALSE, n.Rhat >= 2) SDprop <- goodness.of.appr ACF <- autocorrelation if(Rhat|SDprop|ACF){ conv <- c(list(beta = NULL), if(isL2) list(beta2 = NULL), if(isML) list(psi = NULL), list(sigma = NULL)) for(pp in names(conv)){ ni <- dim(prm[[pp]])[1] nj <- dim(prm[[pp]])[2] nl <- dim(prm[[pp]])[4] cmat <- matrix(NA_real_, ni*nj*nl, 3+Rhat+SDprop+3*ACF) cmat[,1] <- rep(1:ni, nj*nl) cmat[,2] <- rep(1:nj, each = ni, times = nl) cmat[,3] <- rep(1:nl, each = ni*nj) colnames(cmat) <- c("i1", "i2", "grp", if(Rhat) "Rhat", if(SDprop) "SDprop", if(ACF) c("lag-1", "lag-k", "lag-2k")) for(ll in 1:nl){ # by group for(jj in 1:nj){ for(ii in 1:ni){ # check for redundant entries if(pp == "psi"){ if(jj > ii) next } if(pp == "sigma"){ if(jj > ((ii-1)%%nj)+1) next } ind <- ( cmat[,1] == ii & cmat[,2] == jj & cmat[,3] == ll ) chn <- matrix(prm[[pp]][ii,jj,,ll], 1, iter) # potential scale reduction (Rhat) if(Rhat) cmat[ind, "Rhat"] <- .GelmanRubin(chn, n.Rhat) # goodness of approximation if(SDprop) cmat[ind, "SDprop"] <- .SDprop(chn) # autocorrelation if(ACF){ cmat[ind, "lag-1"] <- .reducedACF(chn, lag = 1, smooth = 0) cmat[ind, "lag-k"] <- .reducedACF(chn, lag = k, smooth = 2, sd=.5) cmat[ind, "lag-2k"] <- .reducedACF(chn, lag = 2*k, smooth = 2, sd=.5) } } } } conv[[pp]] <- cmat[!apply(cmat, 1, function(x) any(is.na(x))),,drop = F] } attr(conv, "stats") <- c("Rhat", "SDprop", "ACF")[c(Rhat, SDprop, ACF)] } smr <- list( call = object$call, model = object$model, prior = object$prior, iter = object$iter, keep.chains = object$keep.chains, ngr = ngr, missing.rates = mdr, conv = conv ) class(smr) <- "mitml.summary" return(smr) } .reducedACF <- function(x, lag, smooth = 0, sd=.5){ # check NA if(all(is.na(x))) return(NA) n <- length(x) lag0 <- lag lag <- lag + (-smooth:smooth) ac <- numeric(length(lag)) y <- x - mean(x) ss.y <- sum(y^2) for(li in 1:length(lag)){ ll <- lag[li] # leave at 0 for constant value ac[li] <- if(ss.y>0) sum( y[1:(n-ll)] * y[1:(n-ll)+ll] ) / ss.y else 0 } if(smooth>0){ # weights based on normal density w <- dnorm(-smooth:smooth, 0, sd) ac <- sum( ac * (w/sum(w)) ) } ac } mitml/R/internal-methods-zzz.R0000644000176200001440000000454114053232734016037 0ustar liggesusers# *** # misc. 
methods # # * check for REML fit .checkREML <- function(object, ...) UseMethod(".checkREML", object) .checkREML.default <- function(object) return(FALSE) .checkREML.merMod <- function(object) return(lme4::isREML(object)) .checkREML.lme <- function(object) return(object$method == "REML") # * update REML fit with ML .updateML <- function(object, ...) UseMethod(".updateML", object) .updateML.default <- function(object) return(object) .updateML.merMod <- function(object) return(.localUpdate(object, REML = FALSE)) .updateML.lme <- function(object) return(.localUpdate(object, data = object$data, method = "ML")) # * determine degrees of freedom .getDFs <- function(object, ...) UseMethod(".getDFs", object) .getDFs.default <- function(object){ df <- NULL # try logLik df.try <- try(attr(logLik(object), "df"), silent = TRUE) if(!inherits(df.try, "try-error")){ df <- df.try attr(df, "type") <- "logLik" } # try df.residual and sample size (nobs, residuals) # NOTE: does not account for scale parameters (e.g., residual variance) if(is.null(df)){ rdf <- try(df.residual(object), silent = TRUE) n <- try(nobs(object), silent = TRUE) if(inherits(n, "try-error")) n <- try(length(predict(object)), silent = TRUE) if(inherits(n, "try-error")) n <- try(length(residuals(object)), silent = TRUE) if(!inherits(rdf, "try-error") && !inherits(n, "try-error")){ df <- n - rdf attr(df, "type") <- "df.residual" } } return(df) } .getDFs.lavaan <- function(object){ df <- attr(lavaan::logLik(object), "df") attr(df, "type") <- "logLik" return(df) } # * extract model formula .getFormula <- function(object, ...) UseMethod(".getFormula", object) .getFormula.default <- function(object){ fml <- try(deparse(formula(object))) if(inherits(fml, "try-error")) fml <- NULL fml <- Reduce(paste, fml) return(fml) } .getFormula.lme <- function(object){ fe.fml <- deparse(formula(object)) re.fml <- lapply(formula(object$modelStruct$reStruct), deparse) for(ff in seq_along(re.fml)) re.fml[[ff]] <- paste0(re.fml[[ff]], "|", names(re.fml)[ff]) fml <- paste(c(fe.fml, unlist(re.fml)), collapse = ", ") return(fml) } .getFormula.lavaan <- function(object){ cll <- getCall(object) fml <- deparse(cll[c(1, match("model", names(cll)))]) fml <- sub(")$", ", ...)", fml) return(fml) } mitml/R/plot.mitml.R0000644000176200001440000005714214002017105014016 0ustar liggesusersplot.mitml <- function(x, print = c("beta", "beta2", "psi", "sigma"), pos = NULL, group = "all", trace = c("imputation", "burnin", "all"), thin = 1, smooth = 3, n.Rhat = 3, export = c("none", "png", "pdf"), dev.args = list(), ...){ # plot method for objects of class "mitml" # retrieve data and variable names (predictors) vrs <- x$model clus <- x$model$clus pvrs <- seq_along(attr(vrs, "full.names")$pvrs) qvrs <- seq_along(attr(vrs, "full.names")$qvrs) names(pvrs) <- attr(vrs, "full.names")$pvrs names(qvrs) <- attr(vrs, "full.names")$qvrs isML <- attr(x$model, "is.ML") isL2 <- attr(x$model, "is.L2") if(isL2){ pvrs.L2 <- seq_along(attr(vrs, "full.names")$pvrs.L2) names(pvrs.L2) <- attr(vrs, "full.names")$pvrs.L2 } # match arguments print <- match.arg(print, several.ok = TRUE) trace <- match.arg(trace) export <- match.arg(export) # check for random L1 rl1 <- x$random.L1 == "full" # parameter chains (for backwards compatibility) kc <- x$keep.chains if(is.null(kc)) kc <- "full" # check print and position for selected parameters if(!is.null(pos) & length(print)>1){ pos <- NULL warning("The 'pos' argument may only be used when 'print' is clearly defined as one of 'beta', 'beta2', 'psi', or 'sigma' (see
'?plot').") } # grouping grp.labels <- unique(attr(x$data, "group")) if(is.numeric(group)) grp.labels <- grp.labels[group] grp <- length(grp.labels) # export, graphical parameters if(export != "none"){ wd <- getwd() out <- file.path(wd, "mitmlPlots") if(!file.exists(out)) dir.create(out) }else{ do.call(dev.new, dev.args) devAskNewPage(ask = FALSE) } oldpar <- par(no.readonly = TRUE) # *** # start plotting # for(gg in 1:grp){ # grouping if(grp>1){ glab <- paste(",Group:", grp.labels[gg], sep = "") gfile <- paste("Group-", grp.labels[gg], "_", sep = "") }else{ glab <- gfile <- "" } # expand targets for multiple categories yvrs <- vrs$yvrs yvrs.L2 <- vrs$yvrs.L2 # ... level 1 cvrs <- intersect(yvrs, attr(x$data, "cvrs")) nc <- length(cvrs) if(length(cvrs)>=1){ yvrs <- c(yvrs[!yvrs %in% cvrs], cvrs) for(cc in 1:nc){ cv <- cvrs[cc] ci <- which(yvrs == cv) yi <- 1:length(yvrs) nlev <- attr(x$data, "levels")[gg, cc] if(nlev>2){ newy <- paste0(cv, 1:(nlev-1)) }else{ newy <- cv } sel0 <- yi[yi < ci] sel1 <- yi[yi > ci] yvrs <- c(yvrs[sel0], newy, yvrs[sel1]) } } ynam <- yvrs yvrs <- seq_along(yvrs) names(yvrs) <- ynam # ... level 2 if(isL2){ cvrs.L2 <- intersect(yvrs.L2, attr(x$data, "cvrs")) nc.L2 <- length(cvrs.L2) if(length(cvrs.L2)>=1){ yvrs.L2 <- c(yvrs.L2[!yvrs.L2 %in% cvrs.L2], cvrs.L2) for(cc in 1:nc.L2){ cv <- cvrs.L2[cc] ci <- which(yvrs.L2 == cv) yi <- 1:length(yvrs.L2) nlev <- attr(x$data, "levels")[gg, nc+cc] if(nlev>2){ newy <- paste0(cv, 1:(nlev-1)) }else{ newy <- cv } sel0 <- yi[yi < ci] sel1 <- yi[yi > ci] yvrs.L2 <- c(yvrs.L2[sel0], newy, yvrs.L2[sel1]) } } ynam <- yvrs.L2 yvrs.L2 <- seq_along(yvrs.L2) names(yvrs.L2) <- ynam } # number of iterations n <- dim(x$par.burnin[["beta"]])[3]+dim(x$par.imputation[["beta"]])[3] nb <- dim(x$par.burnin[["beta"]])[3] ni <- dim(x$par.imputation[["beta"]])[3] niter <- x$iter[["iter"]] # thinned-sample indicators s <- seq.int(thin, n, by = thin) sb <- seq.int(thin, nb, by = thin) si <- seq.int(thin, ni, by = thin) lag <- ceiling(niter/thin) # *** plots for fixed regression coefficients at level 1 # if("beta" %in% print){ # check if pos is badly defined if(!is.null(pos)){ if(pos[1] > max(pvrs) | pos[1] < min(pvrs) | pos[2] > max(yvrs) | pos[2] < min(yvrs)){ .restoreDevice(oldpar, export, close = TRUE) stop("There is no entry [", pos[1], ",", pos[2], "] in 'beta'.") } } for(ic in yvrs){ for(ir in pvrs){ # skip if individual parameters requested if(!is.null(pos)){ if(!(pos[1] == ir & pos[2] == ic)) next } if(export != "none"){ filename <- paste("BETA_", gfile, names(yvrs[ic]), "_ON_", names(pvrs[ir]), ".", export, sep = "") filename <- gsub("[(),]", "", filename) filename <- gsub("[[:space:]]", "-", filename) out.args <- c(list(file = file.path(out, filename)), dev.args) do.call(export, out.args) } layout(matrix(c(1, 2, 3, 4), 2, 2), c(5, 1), c(1.13, 1)) # choose section of trace switch(trace, imputation={ trc <- x$par.imputation[["beta"]][ir,ic,,gg][si] }, burnin={ trc <- x$par.burnin[["beta"]][ir,ic,,gg][sb] }, all={ trc <- c(x$par.burnin[["beta"]][ir,ic,,gg], x$par.imputation[["beta"]][ir,ic,,gg])[s] } ) # trace plot par(mar = c(3, 3, 2, 0)+0.5, mgp = c(2, 1, 0), font.lab = 2) ymin <- min(trc) ymax <- max(trc) yr <- ymax-ymin plot(trc, type = ifelse(trace == "all", "n", "l"), ylab = "Trace", xlab = "Iteration", xaxt = "n", ylim = c(ymin-yr*.03, ymax+yr*.03), ...)
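# for the full trace ('all'), the burn-in (grey) and imputation phase (black) are drawn separately below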
if(trace == "all"){ lines(which(s<=nb), trc[s<=nb], col = "grey75") lines(which(s>=nb), trc[s>=nb], col = "black") } axt <- axTicks(1) title(main = paste("Beta [", ir, ",", ic, glab, "]: ", names(yvrs[ic]), " ON ", names(pvrs[ir]), sep = ""), cex.main = 1) if(trace == "imputation"){ axl <- sprintf("%d", thin*(axt+nb)) }else{ axl <- sprintf("%d", thin*axt) } axis(side = 1, at = axt, labels = axl) # trend line for trace (moving window average) if(all(is.numeric(smooth), smooth>0)){ B <- floor(niter/(smooth*thin)) mwa <- .movingAverage(trc, B, fill = TRUE) lines(mwa, col = "grey60") } # blue line if(trace == "all") abline(v = ceiling(nb/thin), col = "blue") # further plots if(trace == "burnin"){ drw <- x$par.burnin[["beta"]][ir, ic, sb, gg] }else{ drw <- x$par.imputation[["beta"]][ir, ic, si, gg] } # autocorrelation plot par(mar = c(3, 3, 1, 0)+0.5) ac <- acf(drw, lag.max = lag+2, plot = F) plot(ac[1:lag], ylim = c(-.1, 1), yaxt = "n", main = NULL, ylab = "ACF", ci = 0, ...) axis(side = 2, at = c(0, .5, 1)) abline(h = c(-.1, .1), col = "blue") # kernel density plot par(mar = c(3, 0, 2, 0)+0.5, mgp = c(2, 0, 0)) ddrw <- density(drw) plot(x = ddrw$y, y = ddrw$x, type = "l", xaxt = "n", yaxt = "n", xlab = "", ylab = "", ylim = c(ymin-yr*.03, ymax+yr*.03), ...) # posterior summary par(mar = c(1, -0.5, 0, -0.5)+0.5) plot.new() text(0, 0.5, paste("EAP: ", sprintf(fmt = "%.3f", mean(drw)), "\n", "MAP: ", sprintf(fmt = "%.3f", ddrw$x[which.max(ddrw$y)]), "\n", "SD: ", sprintf(fmt = "%.3f", sd(drw)), "\n", "2.5%: ", sprintf(fmt = "%.3f", quantile(drw, .025)), "\n", "97.5%: ", sprintf(fmt = "%.3f", quantile(drw, .975)), "\n", "Rhat: ", sprintf(fmt = "%.3f", .GelmanRubin(t(drw), n.Rhat)), "\n", "ACF-k: ", sprintf(fmt = "%.3f", .smoothedACF(ac, k = lag, sd=.5)), "\n", sep = ""), adj = c(0, .5), cex=.8, family = "mono", font = 2, ...) if(export != "none"){ dev.off() }else{ devAskNewPage(ask = TRUE) } }}} # *** plots for fixed regression coefficients at level 2 # if(isL2 & "beta2" %in% print){ # check if pos is badly defined if(!is.null(pos)){ if(pos[1] > max(pvrs.L2) | pos[1] < min(pvrs.L2) | pos[2] > max(yvrs.L2) | pos[2] < min(yvrs.L2)){ .restoreDevice(oldpar, export, close = TRUE) stop("There is no entry [", pos[1], ",", pos[2], "] in 'beta2'.") } } for(ic in yvrs.L2){ for(ir in pvrs.L2){ # skip if individual parameters requested if(!is.null(pos)){ if(!(pos[1] == ir & pos[2] == ic)) next } if(export != "none"){ filename <- paste("BETA2_", gfile, names(yvrs.L2[ic]), "_ON_", names(pvrs.L2[ir]), ".", export, sep = "") filename <- gsub("[(),]", "", filename) filename <- gsub("[[:space:]]", "-", filename) out.args <- c(list(file = file.path(out, filename)), dev.args) do.call(export, out.args) } layout(matrix(c(1, 2, 3, 4), 2, 2), c(5, 1), c(1.13, 1)) # choose section of trace switch(trace, imputation={ trc <- x$par.imputation[["beta2"]][ir,ic,,gg][si] }, burnin={ trc <- x$par.burnin[["beta2"]][ir,ic,,gg][sb] }, all={ trc <- c(x$par.burnin[["beta2"]][ir,ic,,gg], x$par.imputation[["beta2"]][ir,ic,,gg])[s] } ) # trace plot par(mar = c(3, 3, 2, 0)+0.5, mgp = c(2, 1, 0), font.lab = 2) ymin <- min(trc) ymax <- max(trc) yr <- ymax-ymin plot(trc, type = ifelse(trace == "all", "n", "l"), ylab = "Trace", xlab = "Iteration", xaxt = "n", ylim = c(ymin-yr*.03, ymax+yr*.03), ...) 
if(trace == "all"){ lines(which(s<=nb), trc[s<=nb], col = "grey75") lines(which(s>=nb), trc[s>=nb], col = "black") } axt <- axTicks(1) title(main = paste("Beta2 [", ir, ",", ic, glab, "]: ", names(yvrs.L2[ic]), " ON ", names(pvrs.L2[ir]), sep = ""), cex.main = 1) if(trace == "imputation"){ axl <- sprintf("%d", thin*(axt+nb)) }else{ axl <- sprintf("%d", thin*axt) } axis(side = 1, at = axt, labels = axl) # trend line for trace (moving window average) if(all(is.numeric(smooth), smooth>0)){ B <- floor(niter/(smooth*thin)) mwa <- .movingAverage(trc, B, fill = TRUE) lines(mwa, col = "grey60") } # blue line if(trace == "all") abline(v = ceiling(nb/thin), col = "blue") # further plots if(trace == "burnin"){ drw <- x$par.burnin[["beta2"]][ir,ic,sb,gg] }else{ drw <- x$par.imputation[["beta2"]][ir,ic,si,gg] } # autocorrelation plot par(mar = c(3, 3, 1, 0)+0.5) ac <- acf(drw, lag.max = lag+2, plot = F) plot(ac[1:lag], ylim = c(-.1, 1), yaxt = "n", main = NULL, ylab = "ACF", ci = 0, ...) axis(side = 2, at = c(0, .5, 1)) abline(h = c(-.1, .1), col = "blue") # kernel density plot par(mar = c(3, 0, 2, 0)+0.5, mgp = c(2, 0, 0)) ddrw <- density(drw) plot(x = ddrw$y, y = ddrw$x, type = "l", xaxt = "n", yaxt = "n", xlab = "", ylab = "", ylim = c(ymin-yr*.03, ymax+yr*.03), ...) # posterior summary par(mar = c(1, -0.5, 0, -0.5)+0.5) plot.new() text(0, 0.5, paste("EAP: ", sprintf(fmt = "%.3f", mean(drw)), "\n", "MAP: ", sprintf(fmt = "%.3f", ddrw$x[which.max(ddrw$y)]), "\n", "SD: ", sprintf(fmt = "%.3f", sd(drw)), "\n", "2.5%: ", sprintf(fmt = "%.3f", quantile(drw, .025)), "\n", "97.5%: ", sprintf(fmt = "%.3f", quantile(drw, .975)), "\n", "Rhat: ", sprintf(fmt = "%.3f", .GelmanRubin(t(drw), n.Rhat)), "\n", "ACF-k: ", sprintf(fmt = "%.3f", .smoothedACF(ac, k = lag, sd=.5)), "\n", sep = ""), adj = c(0, .5), cex=.8, family = "mono", font = 2, ...) if(export != "none"){ dev.off() }else{ devAskNewPage(ask = TRUE) } }}} # *** plots for random effects' variance components # if(isML & "psi" %in% print){ # joint set of variables at level 1 and 2 yvrs.comb <- c(yvrs, if(isL2) yvrs.L2+length(yvrs)) # index matrix bvec <- t(expand.grid(qvrs, yvrs)) if(isL2) bvec <- cbind(bvec, t(expand.grid(1, yvrs.L2+length(yvrs)))) # attempt to fix pos if badly defined if(!is.null(pos)){ pos0 <- pos if(pos[2]>pos[1]){ # fix if pos is redundant/transposed pos[1] <- pos0[2] pos[2] <- pos0[1] } if(any(pos0 > max(yvrs.comb)) | any(pos0 < min(yvrs.comb))){ .restoreDevice(oldpar, export, close = TRUE) stop("There is no entry [", pos0[1], ",", pos0[2], "] in 'psi'.") } if(!identical(pos, pos0)) warning("Could not use entry [", pos0[1], ",", pos0[2], "] in 'psi'. Used [", pos[1], ",", pos[2], "] instead.") } dpsi <- length(yvrs)*length(qvrs) if(isL2) dpsi <- dpsi+length(yvrs.L2) # if only "diagonal" entries, fix max. 
column index to 1 cpsi <- if(kc == "diagonal") 1 else dpsi for(ic in 1:cpsi){ for(ir in ic:dpsi){ # skip if different individual parameters requested if(!is.null(pos)){ if(!(pos[1] == ir & pos[2] == ic)) next } # if only "diagonal" entries, use ir for all labels ic2 <- if(kc == "diagonal") ir else ic # check for residual at L2 icL2 <- ic > (length(yvrs)*length(qvrs)) irL2 <- ir > (length(yvrs)*length(qvrs)) if(export != "none"){ filename <- paste0("PSI_", gfile, names(yvrs.comb[bvec[2, ir]]), if(!irL2) paste0("_ON_", names(qvrs[bvec[1, ir]])), "_WITH_", names(yvrs.comb[bvec[2, ic2]]), if(!icL2) paste0("_ON_", names(qvrs[bvec[1, ic2]])), ".", export) filename <- gsub("[(),]", "", filename) filename <- gsub("[[:space:]]", "-", filename) out.args <- c(list(file = file.path(out, filename)), dev.args) do.call(export, out.args) } layout(matrix(c(1, 2, 3, 4), 2, 2), c(5, 1), c(1.13, 1)) switch(trace, imputation={ trc <- x$par.imputation[["psi"]][ir,ic,,gg][si] }, burnin={ trc <- x$par.burnin[["psi"]][ir,ic,,gg][sb] }, all={ trc <- c(x$par.burnin[["psi"]][ir,ic,,gg], x$par.imputation[["psi"]][ir,ic,,gg])[s] } ) # trace plot par(mar = c(3, 3, 2, 0)+0.5, mgp = c(2, 1, 0), font.lab = 2) ymin <- min(trc) ymax <- max(trc) yr <- ymax-ymin plot(trc, type = ifelse(trace == "all", "n", "l"), ylab = "Trace", xlab = "Iteration", xaxt = "n", ylim = c(ymin-yr*.03, ymax+yr*.03), ...) if(trace == "all"){ lines(which(s<=nb), trc[s<=nb], col = "grey75") lines(which(s>=nb), trc[s>=nb], col = "black") } title(main = paste0("Psi [", ir, ",", ic, glab, "]: ", if(!irL2) "(", names(yvrs.comb[bvec[2, ir]]), if(!irL2) paste0(" ON ", names(qvrs[bvec[1, ir]]), ")"), " WITH ", if(!icL2) "(", names(yvrs.comb[bvec[2, ic2]]), if(!icL2) paste0(" ON ", names(qvrs[bvec[1, ic2]]), ")") ), cex.main = 1) axt <- axTicks(1) if(trace == "imputation"){ axl <- sprintf("%d", thin*(axt+nb)) }else{ axl <- sprintf("%d", thin*axt) } axis(side = 1, at = axt, labels = axl) # trend line for trace (moving window average) if(all(is.numeric(smooth), smooth>0)){ B <- floor(niter/(smooth*thin)) mwa <- .movingAverage(trc, B, fill = TRUE) lines(mwa, col = "grey60") } # blue line if(trace == "all") abline(v = ceiling(nb/thin), col = "blue") # further plots if(trace == "burnin"){ drw <- x$par.burnin[["psi"]][ir,ic,sb,gg] }else{ drw <- x$par.imputation[["psi"]][ir,ic,si,gg] } # autocorrelation plot par(mar = c(3, 3, 1, 0)+0.5) ac <- acf(drw, lag.max = lag+2, plot = F) plot(ac[1:lag], ylim = c(-.1, 1), yaxt = "n", main = NULL, ylab = "ACF", ci = 0, ...) axis(side = 2, at = c(0, .5, 1)) abline(h = c(-.1, .1), col = "blue") # kernel density plot par(mar = c(3, 0, 2, 0)+0.5, mgp = c(2, 0, 0)) ddrw <- density(drw) plot(x = ddrw$y, y = ddrw$x, type = "l", xaxt = "n", yaxt = "n", xlab = "", ylab = "", ylim = c(ymin-yr*.03, ymax+yr*.03), ...) # posterior summary par(mar = c(1, -0.5, 0, -0.5)+0.5) plot.new() text(0, 0.5, paste("EAP: ", sprintf(fmt = "%.3f", mean(drw)), "\n", "MAP: ", sprintf(fmt = "%.3f", ddrw$x[which.max(ddrw$y)]), "\n", "SD: ", sprintf(fmt = "%.3f", sd(drw)), "\n", "2.5%: ", sprintf(fmt = "%.3f", quantile(drw, .025)), "\n", "97.5%: ", sprintf(fmt = "%.3f", quantile(drw, .975)), "\n", "Rhat: ", sprintf(fmt = "%.3f", .GelmanRubin(t(drw), n.Rhat)), "\n", "ACF-k: ", sprintf(fmt = "%.3f", .smoothedACF(ac, k = lag, sd=.5)), "\n", sep = ""), adj = c(0, .5), cex=.8, family = "mono", font = 2, ...) 
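# close the export file or, in interactive mode, prompt before drawing the next plot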
if(export != "none"){ dev.off() }else{ devAskNewPage(ask = TRUE) } }}} # *** plots for residual variance components # if("sigma" %in% print){ # cluster-specific covariance matrices stacked in rows gind <- attr(x$data, "group") == grp.labels[gg] clus2 <- unique(x$data[gind, clus]) clus3 <- if(rl1) seq_along(clus2) else 1 # attempt to fix pos if badly defined if(!is.null(pos)){ pos0 <- pos dims <- dim(x$par.imputation$sigma) if(pos[2] > length(yvrs)){ # fix if pos is transposed pos[1] <- pos0[2] pos[2] <- pos0[1] pos0 <- pos } if(pos[2] > ((pos[1]-1)%%length(yvrs))+1){ # fix if pos is redundant pos[1] <- pos0[1] - pos0[1]%%length(yvrs) + pos0[2] pos[2] <- pos0[1]%%length(yvrs) } if(all(pos0 > max(yvrs)) | any(pos0 < min(yvrs)) | max(pos0) > dims[1]){ .restoreDevice(oldpar, export, close = TRUE) stop("There is no entry [", pos0[1], ",", pos0[2], "] in 'sigma'.") } if(!identical(pos, pos0)) warning("Could not use entry [", pos0[1], ",", pos0[2], "] in 'sigma'. Used [", pos[1], ",", pos[2], "] instead.") } # if only "diagonal" entries, fix max. column index to 1 csig <- if(kc == "diagonal") 1 else length(yvrs) for(icl in clus3){ for(ic in 1:csig){ for(ir in ic:length(yvrs)){ # adjust row index for cluster-specific covariance matrices ir2 <- ir+(icl-1)*length(yvrs) # if only "diagonal" entries, use ir for all labels ic2 <- if(kc == "diagonal") ir else ic # skip if individual parameters requested if(!is.null(pos)){ if(!(pos[1] == ir2 & pos[2] == ic)) next } if(export != "none"){ filename <- paste0("SIGMA_", gfile, names(yvrs[ir]), "_WITH_", names(yvrs[ic2]), if(rl1) paste0("_", clus, clus2[icl]), ".", export) filename <- gsub("[(),]", "", filename) filename <- gsub("[[:space:]]", "-", filename) out.args <- c(list(file = file.path(out, filename)), dev.args) do.call(export, out.args) } layout(matrix(c(1, 2, 3, 4), 2, 2), c(5, 1), c(1.13, 1)) switch(trace, imputation={ trc <- x$par.imputation[["sigma"]][ir2,ic,,gg][si] }, burnin={ trc <- x$par.burnin[["sigma"]][ir2,ic,,gg][sb] }, all={ trc <- c(x$par.burnin[["sigma"]][ir2,ic,,gg], x$par.imputation[["sigma"]][ir2,ic,,gg])[s] } ) # trace plots par(mar = c(3, 3, 2, 0)+0.5, mgp = c(2, 1, 0), font.lab = 2) ymin <- min(trc) ymax <- max(trc) yr <- ymax-ymin plot(trc, type = ifelse(trace == "all", "n", "l"), ylab = "Trace", xlab = "Iteration", xaxt = "n", ylim = c(ymin-yr*.03, ymax+yr*.03), ...) if(trace == "all"){ lines(which(s<=nb), trc[s<=nb], col = "grey75") lines(which(s>=nb), trc[s>=nb], col = "black") } title(main = paste0("Sigma [", ir2, ",", ic, glab, "]: ", names(yvrs[ir]), " WITH ", names(yvrs[ic2]), if(rl1) paste0(" [", clus, ":", clus2[icl], "]")), cex.main = 1) axt <- axTicks(1) if(trace == "imputation"){ axl <- sprintf("%d", thin*(axt+nb)) }else{ axl <- sprintf("%d", thin*axt) } axis(side = 1, at = axt, labels = axl) # trend line for trace (moving window average) if(all(is.numeric(smooth), smooth>0)){ B <- floor(niter/(smooth*thin)) mwa <- .movingAverage(trc, B, fill = TRUE) lines(mwa, col = "grey60") } # blue line if(trace == "all") abline(v = ceiling(nb/thin), col = "blue") # further plots if(trace == "burnin"){ drw <- x$par.burnin[["sigma"]][ir2,ic,sb,gg] }else{ drw <- x$par.imputation[["sigma"]][ir2,ic,si,gg] } # autocorrelation plot par(mar = c(3, 3, 1, 0)+0.5) ac <- acf(drw, lag.max = lag+2, plot = F) plot(ac[1:lag], ylim = c(-.1, 1), yaxt = "n", main = NULL, ylab = "ACF", ci = 0, ...) 
axis(side = 2, at = c(0, .5, 1)) abline(h = c(-.1, .1), col = "blue") # kernel density plot par(mar = c(3, 0, 2, 0)+0.5, mgp = c(2, 0, 0)) ddrw <- density(drw) plot(x = ddrw$y, y = ddrw$x, type = "l", xaxt = "n", yaxt = "n", xlab = "", ylab = "", ylim = c(ymin-yr*.03, ymax+yr*.03), ...) # posterior summary par(mar = c(1, -0.5, 0, -0.5)+0.5) plot.new() text(0, 0.5, paste("EAP: ", sprintf(fmt = "%.3f", mean(drw)), "\n", "MAP: ", sprintf(fmt = "%.3f", ddrw$x[which.max(ddrw$y)]), "\n", "SD: ", sprintf(fmt = "%.3f", sd(drw)), "\n", "2.5%: ", sprintf(fmt = "%.3f", quantile(drw, .025)), "\n", "97.5%: ", sprintf(fmt = "%.3f", quantile(drw, .975)), "\n", "Rhat: ", sprintf(fmt = "%.3f", .GelmanRubin(t(drw), n.Rhat)), "\n", "ACF-k: ", sprintf(fmt = "%.3f", .smoothedACF(ac, k = lag, sd=.5)), "\n", sep = ""), adj = c(0, .5), cex=.8, family = "mono", font = 2, ...) if(export != "none"){ dev.off() }else{ devAskNewPage(ask = TRUE) } }}}} } plot.new() par(oldpar) if(export == "none") devAskNewPage(ask = FALSE) dev.off() invisible() } # restore and shut down parameters upon error .restoreDevice <- function(pars, export, close = TRUE){ par(pars) if(export == "none") devAskNewPage(ask = FALSE) if(close) dev.off() invisible() } # moving window average for time series .movingAverage <- function(x, B, fill = TRUE){ x1 <- cumsum(x) N <- length(x) y <- rep(NA, N) i <- seq(B+1 , N-B) xdiff <- x1[ -seq(1, B) ] - x1[ -seq(N-B+1, N) ] xdiff <- xdiff[ - seq(1, B) ] y[i] <- ( x1[i] + xdiff - c(0, x1[ -seq(N-2*B, N) ]) ) / (2*B+1) # fill NAs at beginning and end of time series if(fill){ j <- seq(0, B-1) ybeg <- sapply(j, function(z) sum( x[ seq(1, (2*z+1)) ]) / (2*z+1) ) yend <- sapply(rev(j), function(z) sum( x[ seq(N-2*z, N) ] ) / (2*z+1) ) y[j+1] <- ybeg y[rev(N-j)] <- yend } y } # lag-k autocorrelation smoothed by values of a normal density .smoothedACF <- function(x, k, sd=.5){ x0 <- x$ac[-1, 1, 1] n <- length(x0) add <- n-k x0 <- x0[(k-add):n] # weights based on normal density w <- dnorm(-add:add, 0, sd) y <- sum( x0 * (w/sum(w)) ) y } mitml/R/internal-zzz.R0000644000176200001440000001012514127022177014371 0ustar liggesusers.localUpdate <- function(object, envir = parent.frame(), ...){ # update call in parent frame cll <- getCall(object) if (is.null(cll)) stop("Need an object with a call component.") # update call components based on additional arguments (...) extras <- match.call(expand.dots = FALSE)$... for(i in names(extras)) cll[[i]] <- extras[[i]] # update in local environment eval(cll, envir = envir) } .checkDeprecated <- function(x, arg.list, name){ # match argument list (arg.list, usually ...) by name against deprecated (name) # and return matching value if match is found, otherwise return original value # (x) cll <- match.call() nms <- names(arg.list) m <- sapply(nms, function(n, o){ m <- try(match.arg(n, o), silent = TRUE) return(if(inherits(m, "try-error")) NA else m) }, o = name) # if match is found, print message and assign value to new name if(any(!is.na(m))){ ans <- arg.list[[which(!is.na(m))[1]]] msg <- paste0("The '", name, "' argument is deprecated. 
Please use '", as.character(cll[[2]]), "' instead.") warning(msg) }else{ ans <- x } return(ans) } .checkNamespace <- function(x){ # check required packages for supported object types # specify class-package pairs cls.pkg <- list( "lme4" = "^g?l?merMod$", "nlme" = "^n?lme$", "geepack" = "^geeglm$", "survival" = "^coxph$", "MASS" = "^polr$" ) # match class to package names req.pkg <- lapply(cls.pkg, function(p, x) grep(pattern = p, x = x, value = TRUE), x = x) req.pkg <- req.pkg[sapply(req.pkg, length) > 0] for(i in seq_along(req.pkg)){ pkg.name <- names(req.pkg)[i] pkg.cls <- paste(req.pkg[[i]], collapse = "|") if(!requireNamespace(pkg.name, quietly = TRUE)) stop("The '", pkg.name, "' package must be installed in order to use this function with objects of class '", pkg.cls, "'.") } invisible(NULL) } .formatTable <- function(x, prefix = "%.", postfix = "f", digits = 3, sci.limit = 5, width, col.names, row.names, labels = NULL, labels.sep = 3){ # format table with common format and fixed width # row and column names if(missing(col.names)) col.names <- colnames(x) if(missing(row.names)) row.names <- rownames(x) # fotmat fmt <- paste0(prefix, digits, postfix) if(ncol(x) %% length(fmt)) stop("Format and table dimensions do not match.") fmt <- rep_len(fmt, length.out = ncol(x)) # format for large values isLarge <- apply(x, 2, function(z, a) any(z >= 10^a), a = sci.limit) fmt[isLarge] <- sub(paste0(postfix, "$"), "e", fmt[isLarge]) # make formatted matrix y <- matrix("", nrow(x), ncol(x)) for(i in seq_len(ncol(x))) y[,i] <- sprintf(fmt[i], x[,i] + 0) # find width if(missing(width)) width <- max(sapply(c(colnames(x), y), nchar)) # fill table out <- matrix("", nrow(x)+1, ncol(x)+1) out[,1] <- format(c("", row.names), justify = "left") out[1, -1] <- format(col.names, justify = "right", width = width) out[-1, -1] <- format(y, justify = "right", width = width) # add labels (if any) if(!is.null(labels)){ labels[nchar(labels) > 0] <- paste0("(", labels[nchar(labels) > 0], ")") pl <- format(labels, justify = "left") nc <- max(nchar(pl)) out[-1, 1] <- paste0(out[-1, 1], paste0(rep(" ", labels.sep), collapse = ""), pl) out[1, 1] <- paste0(out[1, 1], paste0(rep(" ", nc + labels.sep), collapse = "")) } return(out) } .extractMatrix <- function(x, ...){ # extract submatrix from array (indexed by ...) if(is.null(dim(x))) return(x) out <- `[`(x, , , ...) 
dim(out) <- dim(x)[1:2] dimnames(out) <- dimnames(x)[1:2] out } .adiag <- function(x, stacked = FALSE){ # extract diagonal elements of first two dimensions in three-dimensional array # containing either square (default) or stacked matrices d <- dim(x) # indices for diagonal entries (square or stacked-square) if(stacked){ i <- seq_len(d[2]) + d[1]*(seq_len(d[2])-1) i <- outer(i, (seq_len(d[1]/d[2])-1)*d[2], `+`) i <- outer(i, (seq_len(d[3])-1)*d[1]*d[2], `+`) }else{ i <- seq_len(d[1]) + d[1]*(seq_len(d[1])-1) i <- outer(i, (seq_len(d[3])-1)*d[1]^2, `+`) } x[as.vector(i)] } mitml/R/internal-convergence.R0000644000176200001440000000352114001604651016026 0ustar liggesusers# Gelman-Rubin (1992) criterion for convergence (Rhat) .GelmanRubin <- function(x, m){ # check NA if(all(is.na(x))) return(NA) # convert vector if(is.vector(x)) x <- matrix(x, 1, length(x)) iter <- ncol(x) mod <- iter %% m n <- rep( (iter-mod)/m , m ) nmat <- matrix(c(cumsum(n)-n+1, cumsum(n)), nrow = m) n <- n[1] Rhat <- numeric(nrow(x)) for(ii in 1:nrow(x)){ # values per chain chs <- apply(nmat, 1, function(j) x[ii, j[1]:j[2]]) mns <- apply(chs, 2, mean) vrs <- apply(chs, 2, var) Bdivn <- var(mns) W <- mean(vrs) muhat <- mean(chs) sighat2 <- (n-1)/n * W + Bdivn # sampling distribution Vhat <- sighat2 + Bdivn/m var.Vhat <- ((n-1)/n)^2*(1/m)*var(vrs) + ((m+1)/(m*n))^2*2/(m-1)*(Bdivn*n)^2 + 2*((m+1)*(n-1)/(m*n^2)) * (n/m)*(cov(vrs, mns^2)-2*muhat*cov(vrs, mns)) df <- 2*Vhat^2 / var.Vhat # compute Rhat if(Bdivn == 0 & identical(vrs, rep(0, m))){ # for zero variance defined as 1 Rhat[ii] <- 1 }else{ Rhat[ii] <- sqrt( (Vhat/W)*df/(df-2) ) } } Rhat } # criterion for goodness of approximation (Hoff, 2009) .SDprop <- function(x){ # check NA if(all(is.na(x))) return(NA) # convert vector if(is.vector(x)) x <- matrix(x, 1, length(x)) np <- nrow(x) v <- apply(x, 1, var) # variance of chain v0 <- v == 0 sdp <- sp0 <- neff <- numeric(np) for(i in 1:np){ arp <- try( ar(x[i,], aic = TRUE), silent = T ) if(!v0[i]) sp0[i] <- arp$var.pred/(1 - sum(arp$ar))^2 # spectral density at frequency 0 } n <- ncol(x) mcmc.v <- sp0/n # true variance of the mean (correcting for autocorrelation) neff[!v0] <- (v/mcmc.v)[!v0] # effective sample size neff[v0] <- n # proportion of variance due to sampling inefficiency sdp[!v0] <- sqrt(mcmc.v / v)[!v0] attr(sdp, "n.eff") <- neff sdp } mitml/R/multilevelR2.R0000644000176200001440000000762014002017337014310 0ustar liggesusersmultilevelR2 <- function(model, print = c("RB1", "RB2", "SB", "MVP")){ # print argument case insensitive print <- toupper(print) print <- match.arg(print, several.ok = TRUE) method <- NULL # select method if(is.list(model)){ cls <- class(model[[1]]) if(inherits(model[[1]], "merMod")) method <- "lmer" if(inherits(model[[1]], "lme")) method <- "nlme" }else{ cls <- class(model) if(inherits(model, "merMod")) method <- "lmer" if(inherits(model, "lme")) method <- "nlme" } if(is.null(method)) stop("Calculation of multilevel R-squared statistics not supported for models of class '", paste0(cls, collapse = "|"), "'.") # calculate R-squared if(is.list(model)){ out <- sapply(model, .getRsquared, print = print, method = method) if(is.null(dim(out))) out <- matrix(out, nrow = 1) out <- rowMeans(out) }else{ out <- .getRsquared(model, print, method) } out } .getRsquared <- function(model, print, method){ # R squared for single model fit (lme4) # check if refit is necessary refit <- any(c("RB1", "RB2", "SB") %in% print) if(method == "lmer"){ # model terms trm <- terms(model) if(!as.logical(attr(trm, 
"intercept"))) stop("Model must contain intercept.") yvr <- as.character(attr(trm, "variables")[-1])[attr(trm, "response")] cvr <- names(lme4::getME(model, "flist")) if(length(cvr)>1) stop("Calculation of R-squared only support for models with a single cluster variable.") cvr <- cvr[1] if(refit){ # fit null model fml0 <- formula(paste0(yvr, "~1+(1|", cvr, ")")) model0 <- update(model, fml0) # variance components under null vc0 <- lme4::VarCorr(model0) s0 <- attr(vc0, "sc")^2 t0.0 <- vc0[[cvr]][1, 1] } # alternative model components beta <- lme4::fixef(model)[-1] X <- lme4::getME(model, "X")[, -1, drop = F] Z <- lme4::getME(model, "mmList")[[1]][, -1, drop = F] muX <- colMeans(X) muZ <- colMeans(Z) vZ <- cov(Z) # predicted and total variance vc1 <- lme4::VarCorr(model) t0.1 <- vc1[[cvr]][1, 1] t10.1 <- vc1[[cvr]][1, -1] t11.1 <- vc1[[cvr]][-1, -1, drop = F] s1 <- attr(vc1, "sc")^2 } if(method == "nlme"){ # model terms trm <- terms(model) if(!as.logical(attr(trm, "intercept"))) stop("Model must contain intercept.") yvr <- as.character(attr(trm, "variables")[-1])[attr(trm, "response")] cvr <- attr(nlme::getGroups(model), "label") if(length(nlme::getGroupsFormula(model, asList = T))>1) stop("Calculation of R-squared only support for models with a single cluster variable.") if(refit){ # fit null model ffml0 <- formula(paste0(yvr, "~1")) rfml0 <- formula(paste0("~1|", cvr, "")) if(is.null(nlme::getData(model))) stop("No data sets found in 'lme' fit. See '?testModels' for an example.") model0 <- update(model, fixed = ffml0, random = rfml0, data = model$data) # variance components under null vc0 <- nlme::getVarCov(model0) s0 <- model0$sigma^2 t0.0 <- vc0[1, 1] } # alternative model components beta <- nlme::fixef(model)[-1] fe <- model$terms X <- model.matrix(fe, nlme::getData(model))[, -1, drop = F] re <- attr(model$modelStruct$reStruct[[1]], "formula") Z <- model.matrix(re, nlme::getData(model))[, -1, drop = F] muX <- colMeans(X) muZ <- colMeans(Z) vZ <- cov(Z) # predicted and total variance vc1 <- nlme::getVarCov(model) t0.1 <- vc1[1, 1] t10.1 <- vc1[1, -1] t11.1 <- vc1[-1, -1, drop = F] s1 <- model$sigma^2 } # calculate R2 vyhat <- var( X %*% beta ) vy <- vyhat + t0.1 + 2*(muZ %*% t10.1) + muZ%*%t11.1%*%muZ + sum(diag(t11.1%*%vZ)) + s1 if(refit){ rb1 <- 1 - s1/s0 rb2 <- 1 - t0.1/t0.0 sb <- 1 - (s1+t0.1)/(s0+t0.0) }else{ rb1 <- rb2 <- sb <- NA } mvp <- as.vector(vyhat/vy) c(RB1 = rb1, RB2 = rb2, SB = sb, MVP = mvp)[print] } mitml/R/testModels.R0000644000176200001440000001551614116623723014060 0ustar liggesuserstestModels <- function(model, null.model, method = c("D1", "D2", "D3", "D4"), use = c("wald", "likelihood"), ariv = c("default", "positive", "robust"), df.com = NULL, data = NULL){ # model comparison and hypothesis tests for k-dimensional estimands # *** # check input # # check model specification m <- length(model) if(!(is.list(model) && is.list(null.model))) stop("The 'model' and 'null.model' arguments must be lists of fitted statistical models.") if(length(null.model) != m) stop("The 'model' and 'null.model' arguments must be lists with the same length.") # match methods method <- match.arg(method) use <- match.arg(use) ariv <- match.arg(ariv) # check for incompatible arguments if(!is.null(df.com) && method != "D1") warning("Complete-data degrees of freedom are not available for use with '", method, "' and were ignored.") if(use == "likelihood" && method != "D2") warning("The 'likelihood' option is not available with method '", method ,"' and was ignored.") if(!is.null(data) && method 
!= "D4") warning("The 'data' argument is not used with method '", method ,"' and was ignored.") if(ariv == "positive" && method == "D1") warning("The 'positive' option is not available with method 'D1' and was ignored.") if(ariv == "robust" && method != "D4") warning("The 'robust' option is not available with method '", method ,"' and was ignored.") # check model classes cls <- class(model[[1]]) cls.null <- class(null.model[[1]]) if(cls[1] != cls.null[1]) warning("The 'model' and 'null.model' arguments appear to include objects of different classes. Results may not be trustworthy.") .checkNamespace(union(cls, cls.null)) # check for REML and refit (if needed) reml.model <- sapply(model, .checkREML) reml.null.model <- sapply(null.model, .checkREML) reml <- any(reml.model, reml.null.model) need.refit <- FALSE if(reml){ need.refit <- (method == "D2" && use == "likelihood") || method == "D3" || method == "D4" if(need.refit){ model[reml.model] <- lapply(model[reml.model], .updateML) null.model[reml.null.model] <- lapply(null.model[reml.null.model], .updateML) } } # *** # D1 # if(method == "D1"){ # FIXME: better way to handle this? if(inherits(model[[1]], "lavaan")) stop("The 'D1' method is currently not supported for objects of class 'lavaan'. Please see '?testModels' for a list of supported model types.") est <- .extractParameters(model, diagonal = FALSE) est.null <- .extractParameters(null.model, diagonal = FALSE) par.diff <- est$nms[!(est$nms %in% est.null$nms)] par.ind <- match(par.diff, est$nms) if(length(par.diff) == 0L) stop("The 'model' and 'null.model' appear not to be nested or include the same set of parameters.") k <- length(par.diff) Qhat <- est$Qhat[par.ind,, drop = FALSE] Uhat <- est$Uhat[par.ind, par.ind,, drop = FALSE] # D1 (Li et al., 1991) D <- .D1(Qhat = Qhat, Uhat = Uhat, df.com = df.com) r <- D$r val <- D$F v <- D$v } # *** # D2 # if(method == "D2"){ if(use == "wald"){ # FIXME: better way to handle this? if(inherits(model[[1]], "lavaan")) stop("The 'D2' method currently only supports likelihood-based comparisons for objects of class 'lavaan'. 
Please see '?testModels' for a list of supported model types.") # extract parameter estimates est <- .extractParameters(model, diagonal = FALSE) est.null <- .extractParameters(null.model, diagonal = FALSE) par.diff <- est$nms[!(est$nms %in% est.null$nms)] par.ind <- match(par.diff, est$nms) if(length(par.diff) == 0L) stop("The 'model' and 'null.model' appear not to be nested or include the same set of parameters.") # Wald tests k <- length(par.diff) Qhat <- est$Qhat[par.ind,, drop = FALSE] Uhat <- est$Uhat[par.ind, par.ind,, drop = FALSE] dW <- sapply(seq_len(m), function(z) t(Qhat[,z]) %*% solve(Uhat[,,z]) %*% Qhat[,z]) } if(use == "likelihood"){ # extract logLik ll <- .evaluateLogLik(model) ll.null <- .evaluateLogLik(null.model) ll.diff <- ll.null$LL - ll$LL if(is.null(ll$df) || is.null(ll.null$df)) stop("Degrees of freedom for the model comparison could not be detected.") k <- ll$df - ll.null$df # account for numerical imprecision isEqual <- mapply(function(x, y) isTRUE(all.equal(x, y)), x = ll$LL, y = ll.null$LL) ll.diff[isEqual] <- 0L # LR tests dW <- -2 * (ll.diff) } # D2 (Li, Meng et al., 1991) D <- .D2(d = dW, k = k) r <- D$r if(ariv == "positive") r <- max(0, r) val <- D$F v <- D$v } # *** # D3 # if(method == "D3"){ # evaluate log-likelihood at estimated and pooled values of model parameters ll <- .evaluateUserLogLik(model) ll.null <- .evaluateUserLogLik(null.model) k <- ll$df - ll.null$df # D3 (Meng & Rubin, 1992) dL.bar <- mean(-2 * (ll.null$LL - ll$LL)) dL.tilde <- mean(-2 * (ll.null$LL.pooled - ll$LL.pooled)) r <- (m+1) * (k*(m-1))^(-1) * (dL.bar - dL.tilde) if(ariv == "positive") r <- max(0, r) val <- dL.tilde / (k*(1+r)) t <- k*(m-1) if( t > 4 ){ v <- 4 + (t-4) * (1 + (1-2*t^(-1)) * r^(-1))^2 }else{ v <- t * (1+k^(-1)) * (1+r^(-1))^2 / 2 } use <- "likelihood" } # *** # D4 # if(method == "D4"){ # evaluate log-likelihood at estimated and pooled values of model parameters ll <- .evaluateStackedLogLik(model, datalist = data) ll.null <- .evaluateStackedLogLik(null.model, datalist = data) ll.diff <- ll.null$LL - ll$LL ll.stacked.diff <- ll.null$LL.stacked - ll$LL.stacked k <- ll$df - ll.null$df h <- ll$df # account for numerical imprecision if(isTRUE(all.equal(ll.stacked.diff[1], 0))) ll.stacked.diff <- 0L isEqual <- mapply(function(x, y) isTRUE(all.equal(x, y)), x = ll$LL, y = ll.null$LL) ll.diff[isEqual] <- 0L # D4 (Chan & Meng, 2019) dbar <- mean(-2 * ll.diff) dhat <- -2 * ll.stacked.diff if(ariv == "robust"){ deltabar <- 2 * mean(ll$LL) deltahat <- 2 * ll$LL.stacked r <- (m+1) / (h*(m-1)) * (deltabar - deltahat) v <- (h*(m-1)) * (1 + 1/r)^2 }else{ r <- (m+1) / (k*(m-1)) * (dbar - dhat) if(ariv == "positive") r <- max(0, r) v <- (k*(m-1)) * (1 + r^(-1))^2 } val <- dhat / (k*(1+r)) use <- "likelihood" } # create output pval <- pf(val, k, v, lower.tail = FALSE) out <- matrix(c(val, k, v, pval, r), ncol = 5) colnames(out) <- c("F.value", "df1", "df2", "P(>F)", "RIV") # new label for p-value, SiG 2017-02-09 out <- list( call = match.call(), test = out, m = m, method = method, adj.df = !is.null(df.com), df.com = df.com, use = use, ariv = ariv, data = !is.null(data), reml = reml, refit = need.refit ) class(out) <- "mitml.testModels" return(out) } mitml/R/amelia2mitml.list.R0000644000176200001440000000023414001605533015243 0ustar liggesusersamelia2mitml.list <- function(x){ # convert amelia to mitml.list out <- unname(x$imputations) class(out) <- c("mitml.list", "list") return(out) } mitml/R/internal-methods-estimates.R0000644000176200001440000002313514127022445017176 0ustar 
liggesusers# *** # global functions # # * parameter estimates and variance-covariance matrix .extractParameters <- function(model, diagonal = FALSE, include.extra.pars = FALSE){ # number of imputations m <- length(model) # extract parameter estimates and variance-covariance matrices Qhat <- lapply(model, .getCoef, include.extra.pars = include.extra.pars) Uhat <- lapply(model, .getVcov, include.extra.pars = include.extra.pars) p <- length(Qhat[[1]]) nms <- names(Qhat[[1]]) # preserve parameter labels (if any) attr(nms, "par.labels") <- attr(Qhat[[1]], "par.labels") # ensure proper dimensions stopifnot(all(p == dim(Uhat[[1]]))) Qhat <- matrix(unlist(Qhat), nrow = p, ncol = m) Uhat <- array(unlist(Uhat), dim = c(p, p, m)) # extract diagonal if(diagonal){ Uhat <- apply(Uhat, 3, diag) if(is.null(dim(Uhat))) dim(Uhat) <- dim(Qhat) } out <- list(Qhat = Qhat, Uhat = Uhat, nms = nms) return(out) } # * misc. parameter estimates (e.g., variance components) .extractMiscParameters <- function(model){ # number of imputations m <- length(model) # extract parameter estimates and variance-covariance matrices Qhat <- lapply(model, .getMisc) p <- length(Qhat[[1]]) nms <- names(Qhat[[1]]) # preserve parameter labels (if any) attr(nms, "par.labels") <- attr(Qhat[[1]], "par.labels") # ensure proper dimensions if(is.null(Qhat[[1]])){ Qhat <- NULL }else{ Qhat <- matrix(unlist(Qhat), nrow = p, ncol = m) } out <- list(Qhat = Qhat, nms = nms) return(out) } # *** # generic functions # .getCoef <- function(object, ...) UseMethod(".getCoef", object) .getVcov <- function(object, ...) UseMethod(".getVcov", object) .getMisc <- function(object, ...) UseMethod(".getMisc", object) # *** # default methods # .getCoef.default <- function(object, ...) return(coef(object)) .getVcov.default <- function(object, ...) return(as.matrix(vcov(object))) .getMisc.default <- function(object) return(NULL) # *** # class-specific methods # # * stats::lm .getMisc.lm <- function(object){ # residual variance res <- resid(object) rv <- sum(res^2) / df.residual(object) names(rv) <- "Residual~~Residual" return(rv) } # * stats::glm .getMisc.glm <- function(object){ fam <- tolower(object$family$family) if(fam == "gaussian") .getMisc.lm(object) return(NULL) } # * MASS::polr .getCoef.polr <- function(object, ...) return(summary(object)$coefficients[,1]) # * lme4::(g)lmer .getCoef.merMod <- function(object, ...) 
return(lme4::fixef(object)) .getMisc.merMod <- function(object){ # check if model uses scale useSc <- lme4::getME(object, "devcomp")$dims["useSc"] == 1 # variance components by cluster variable vc <- lme4::VarCorr(object) clus <- names(vc) # loop over cluster variables out.list <- list() for(cc in clus){ vc.cc <- vc[[cc]] if(is.null(dim(vc.cc))) dim(vc.cc) <- c(1, 1) nms <- sub("^[(]Intercept[)]$", "Intercept", rownames(vc.cc)) vc.out <- diag(vc.cc) names(vc.out) <- paste0(nms, "~~", nms, "|", cc) vc.ind <- which(upper.tri(vc.cc), arr.ind = TRUE) for(ii in seq_len(nrow(vc.ind))){ vc.ii <- vc.cc[vc.ind[ii, , drop = FALSE]] names(vc.ii) <- paste0(nms[vc.ind[ii, 1]], "~~", nms[vc.ind[ii, 2]], "|", cc) vc.out <- c(vc.out, vc.ii) } out.list[[cc]] <- vc.out } # residual variance (if model uses scale) if(useSc){ rv <- attr(vc, "sc")^2 names(rv) <- "Residual~~Residual" out.list[["Residual"]] <- rv } # get additional parameters (ICC; only for single clustering) if(useSc && length(clus) == 1){ hasIntercept <- "(Intercept)" %in% colnames(vc[[clus]]) if(hasIntercept){ iv <- vc[[clus]]["(Intercept)", "(Intercept)"] icc <- iv / (iv + rv) names(icc) <- paste("ICC|", clus, sep = "") } out.list[["ICC"]] <- icc } out <- do.call(c, unname(out.list)) return(out) } # * nlme::lme .getCoef.lme <- function(object, ...) return(nlme::fixef(object)) .getMisc.lme <- function(object){ # check if model uses fixed sigma (no scale) fixedSigma <- attr(object$modelStruct, "fixedSigma") # variance components by cluster variable vc.list <- .listVC_lme(object) out.list <- list() cl <- names(vc.list) for(cc in names(vc.list)){ vc.cc <- vc.list[[cc]] nms <- sub("^[(]Intercept[)]$", "Intercept", attr(vc.cc, "nms")) vc.out <- diag(vc.cc) names(vc.out) <- paste0(nms, "~~", nms, "|", cc) vc.ind <- which(upper.tri(vc.cc), arr.ind = TRUE) for(ii in seq_len(nrow(vc.ind))){ vc.ii <- vc.cc[vc.ind[ii, , drop = FALSE]] names(vc.ii) <- paste0(nms[vc.ind[ii, 1]], "~~", nms[vc.ind[ii, 2]], "|", cc) vc.out <- c(vc.out, vc.ii) } out.list[[cc]] <- vc.out } # residual variance (if model does not use fixed sigma) if(!fixedSigma){ rv <- object$sigma^2 names(rv) <- "Residual~~Residual" out.list[["Residual"]] <- rv } # get additional parameters (ICC; only for single clustering) if(!fixedSigma && length(cl) == 1){ vc <- vc.list[[cl]] rownames(vc) <- colnames(vc) <- attr(vc.list[[cl]], "nms") hasIntercept <- "(Intercept)" %in% rownames(vc) if(hasIntercept){ iv <- vc["(Intercept)", "(Intercept)"] icc <- iv / (iv + rv) names(icc) <- paste("ICC|", cl, sep = "") } out.list[["ICC"]] <- icc } out <- do.call(c, unname(out.list)) return(out) } .listVC_lme <- function(object){ # read random effects structure re <- rev(object$modelStruct$reStruct) # see nlme:::VarCorr.lme vc <- lapply(re, nlme::VarCorr, rdig = 10^6, sigma = object$sigma) cl <- names(vc) # loop over cluster variables vc.list <- list() for(cc in cl){ vc.cc <- vc[[cc]] if(is.null(dim(vc.cc))) dim(vc.cc) <- c(1, 1) # standard deviation of random effects vc.sd <- vc.cc[,"StdDev"] # correlation and covariance matrix of random effects if(length(vc.sd) == 1){ vc.cov <- as.matrix(vc.sd^2) attr(vc.cov, "nms") <- rownames(vc.cc)[1] }else{ vc.cor <- cbind(attr(vc.cc, "corr"), "") vc.cor[upper.tri(vc.cor, diag = TRUE)] <- "" storage.mode(vc.cor) <- "numeric" diag(vc.cor) <- 1 vc.cor[upper.tri(vc.cor)] <- t(vc.cor)[upper.tri(t(vc.cor))] # calculate covariance matrix vc.sd <- diag(vc.cc[,"StdDev"]) vc.cov <- vc.sd %*% vc.cor %*% vc.sd attr(vc.cov, "nms") <- rownames(vc.cc) } vc.list[[cc]] <- vc.cov } 
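# return one covariance matrix of random effects per cluster variable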
return(vc.list) } # * geepack::geeglm .getMisc.geeglm <- function(object){ fixedScale <- length(object$geese$gamma) == 0 fixedCor <- length(object$geese$alpha) == 0 # scale parameter (gamma) out.list <- list() if(!fixedScale){ gam <- object$geese$gamma nms <- gsub("^[(]Intercept[)]$", "Intercept", names(gam)) names(gam) <- paste0("Scale:", nms) out.list[["gamma"]] <- gam } # correlation parameters (alpha) if(!fixedCor){ alpha <- object$geese$alpha names(alpha) <- paste0("Correlation:", names(alpha)) out.list[["alpha"]] <- alpha } out <- do.call(c, unname(out.list)) return(out) } # * lavaan::lavaan .getCoef.lavaan <- function(object, include.extra.pars = FALSE, ...){ # extract parameter estimates pt <- lavaan::parTable(object) isFree <- pt[["free"]] > 0 & !duplicated(pt[["free"]]) # see lavaan:::lav_object_inspect_coef isDefined <- pt[["op"]] == ":=" isCoef <- if(include.extra.pars) isFree | isDefined else isFree hasGroups <- lavaan::lavInspect(object, "ngroups") > 1 hasLevels <- lavaan::lavInspect(object, "nlevels") > 1 # parameter names # NOTE: replaces names in coef() with names that are independent of the user- # assigned parameter labels (can be inconsistent across models) nms <- pt[, c("lhs", "op", "rhs")] if(hasLevels) nms[["level"]] <- paste0(".l", pt[, "level"]) if(hasGroups) nms[["group"]] <- paste0(".g", pt[, "group"]) nms <- do.call(mapply, c(as.list(nms), list(FUN = paste0))) out <- pt[isCoef, "est"] names(out) <- nms[isCoef] # preserve user-defined parameter labels hasLabels <- any(nchar(pt[isCoef, "label"]) > 0) if(hasLabels) attr(out, "par.labels") <- pt[isCoef, "label"] return(out) } .getVcov.lavaan <- function(object, include.extra.pars = FALSE, ...){ pt <- lavaan::parTable(object) hasDefined <- any(pt[["op"]] == ":=") if(hasDefined && include.extra.pars){ out <- lavaan::lavInspect(object, "vcov.def.joint") }else{ out <- lavaan::lavInspect(object, "vcov") } rownames(out) <- colnames(out) <- NULL return(out) } .getMisc.lavaan <- function(object){ # extract (nonfree) parameter estimates pt <- lavaan::parTable(object) isFree <- pt[["free"]] > 0 & !duplicated(pt[["free"]]) # see lavaan:::lav_object_inspect_coef isDefined <- pt[["op"]] == ":=" # NOTE: for now, exclude parameters referring to exogenous variables # (can be included in misc.) isExo <- pt[["exo"]] == 1 isCoef <- isFree | isDefined hasGroups <- lavaan::lavInspect(object, "ngroups") > 1 hasLevels <- lavaan::lavInspect(object, "nlevels") > 1 # parameter names nms <- pt[, c("lhs", "op", "rhs")] if(hasLevels) nms[["level"]] <- paste0(".l", pt[, "level"]) if(hasGroups) nms[["group"]] <- paste0(".g", pt[, "group"]) nms <- do.call(mapply, c(as.list(nms), list(FUN = paste0))) out <- pt[!isCoef & !isExo, "est"] names(out) <- nms[!isCoef & !isExo] # preserve user-defined parameter labels hasLabels <- any(nchar(pt[!isCoef & !isExo, "label"]) > 0) if(hasLabels) attr(out, "par.labels") <- pt[!isCoef & !isExo, "label"] return(out) } # * survival::coxph (null models only) .getCoef.coxph.null <- function(object, ...) return(numeric(0)) .getVcov.coxph.null <- function(object, ...) return(matrix(NA_real_, 0, 0)) mitml/R/internal-methods-likelihood.R0000644000176200001440000003407014116643515017330 0ustar liggesusers# *** # global functions # # * log-likelihood .evaluateLogLik <- function(model){ cls <- class(model[[1]]) # evaluate log-likelihood ll <- lapply(model, .getLL) if(any(sapply(ll, is.null))) stop("Could not evaluate likelihood for objects of class '", paste0(cls, collapse = "|"), "'. 
Please see '?testModels' for a list of supported model types.") df <- attr(ll[[1]], "df") # ensure proper dimensions ll <- unlist(ll) out <- list(LL = ll, df = df) return(out) } # * log-likelihood evaluated at user-defined values .evaluateUserLogLik <- function(model){ m <- length(model) cls <- class(model[[1]]) # extract arguments and function for likelihood evaluation ll.args <- lapply(model, .getArgsLL) if(any(sapply(ll.args, is.null))) stop("Could not evaluate likelihood for objects of class '", paste0(cls, collapse = "|"), "'. Please see '?testModels' for a list of supported model types.") narg <- length(ll.args[[1]][["parameters"]]) nms <- names(ll.args[[1]][["parameters"]]) # evaluate log-likelihood at imputation-specific parameter values ll <- lapply(lapply(ll.args, c, force.update = FALSE), do.call, what = .getUserLL) df <- attr(ll[[1]], "df") ll <- unlist(ll) # pool parameter estimates psi.bar <- vector("list", narg) names(psi.bar) <- nms for(i in seq_along(psi.bar)){ psi <- lapply(ll.args, function(x, .i) x$parameters[[.i]], .i = i) isMatrix <- is.matrix(psi[[1]]) if(isMatrix){ q <- nrow(psi[[1]]) pp <- array(unlist(psi), dim = c(q, q, m)) pp <- apply(pp, c(1, 2), mean) rownames(pp) <- colnames(pp) <- names(psi[[1]]) psi.bar[[i]] <- pp }else{ q <- length(psi[[1]]) pp <- matrix(unlist(psi), nrow = q, ncol = m) pp <- apply(pp, 1, mean) names(pp) <- names(psi[[1]]) psi.bar[[i]] <- pp } } # evaluate log-likelihood at pooled parameter estimates for(i in seq_len(m)) ll.args[[i]]$parameters <- psi.bar ll.pooled <- sapply(ll.args, do.call, what = .getUserLL) out <- list(LL = ll, LL.pooled = ll.pooled, df = df) return(out) } # * log-likelihood evaluated with stacked data sets .evaluateStackedLogLik <- function(model, datalist = NULL){ m <- length(model) cls <- class(model[[1]]) # evaluate log-likelihood ll <- lapply(model, .getLL) if(any(sapply(ll, is.null))) stop("Could not evaluate likelihood for objects of class '", paste0(cls, collapse = "|"), "'. Please see '?testModels' for a list of supported model types.") df <- attr(ll[[1]], "df") # ensure proper dimensions ll <- unlist(ll) # extract data for stacking nullData <- is.null(datalist) if(nullData) datalist <- lapply(model, .getDataLL) # check data if(!is.list(datalist) || length(datalist) != m || !all(sapply(datalist, is.data.frame))){ if(nullData){ stop("Could not extract data from fitted model objects. Please specify 'data' and see '?testModels' for details.") }else{ stop("The 'data' argument must be a list of imputed data sets that correspond to the fitted model objects. Please see '?testModels' for details.") } } # evaluate log-likelihood with stacked data model.stacked <- .updateStackedLL(model[[1]], datalist = datalist) ll.stacked <- .getLL(model.stacked) / m out <- list(LL = ll, LL.stacked = ll.stacked, df = df) return(out) } # *** # generic functions # .getLL <- function(object, ...) UseMethod(".getLL", object) .getArgsLL <- function(object, ...) UseMethod(".getArgsLL", object) .getUserLL <- function(object, ...) UseMethod(".getUserLL", object) .getDataLL <- function(object, ...) UseMethod(".getDataLL", object) .updateStackedLL <- function(object, ...) UseMethod(".updateStackedLL", object) # *** # default methods # .getLL.default <- function(object) return(logLik(object)) .getArgsLL.default <- function(object) return(NULL) .getUserLL.default <- function(object, ...) 
return(NULL) .getDataLL.default <- function(object) return(model.frame(object)) .updateStackedLL.default <- function(object, datalist) return(update(object, data = do.call(rbind, datalist))) # *** # class-specific methods # # * stats::lm .getArgsLL.lm <- function(object){ # extract arguments to evaluate LL n <- nrow(object$model) beta <- coef(object) sigma2 <- sum(resid(object)^2) / n out <- list(object = object, parameters = list(beta = beta, sigma2 = sigma2)) return(out) } .getUserLL.lm <- function(object, parameters, ...){ n <- nrow(object$model) df <- object$rank + 1 trm <- attributes(object$terms) # model matrices y <- eval(trm$variables, envir = object$model)[[trm$response]] X <- model.matrix(object) # parameters beta <- parameters[["beta"]] sigma2 <- parameters[["sigma2"]] ll <- .logLik_lm(y = y, X = X, beta = beta, sigma2 = sigma2) attr(ll, "df") <- df return(ll) } .logLik_lm <- function(y, X, beta, sigma2){ n <- length(y) - (n/2) * log(2*pi*sigma2) - (1/(2*sigma2)) * sum((y - X %*% beta)^2) } # * stats::glm .getArgsLL.glm <- function(object) return(NULL) # * geepack::geeglm .getLL.geeglm <- function(object) return(NULL) .getArgsLL.geeglm <- function(object) return(NULL) # * lme4::lmer (for only LMMs) .getArgsLL.lmerMod <- function(object){ beta <- lme4::getME(object, "fixef") theta <- lme4::getME(object, "theta") sig <- sigma(object) # split theta by clustering variables cl <- lme4::getME(object, "cnms") ncl <- length(cl) nvc <- lengths(cl) theta.cl <- split(theta, rep.int(seq_along(cl), (nvc * (nvc + 1))/2)) # transform theta from scaled Cholesky factors into variance-covariance matrices (for pooling) Tau <- vector("list", ncl) names(Tau) <- paste0("Tau", seq_len(ncl)) for(i in seq_len(ncl)){ q <- sqrt(2*length(theta.cl[[i]]) + 0.25) - 0.5 m <- matrix(0, nrow = q, ncol = q) m[lower.tri(m, diag = TRUE)] <- theta.cl[[i]] * sig Tau[[i]] <- m %*% t(m) } out <- list(object = object, parameters = c(list(beta = beta), Tau, list(sigma2 = sig^2))) return(out) } .getUserLL.lmerMod <- function(object, parameters, force.update = TRUE, ...){ if(any(abs(lme4::getME(object, "offset") - 0) > .Machine$double.eps)) stop("The 'D3' method cannot be used for 'lmerMod' objects fitted with an offset.") cl <- lme4::getME(object, "cnms") ncl <- length(cl) # evaluate standard logLik ll0 <- logLik(object) df <- attr(ll0, "df") if(force.update){ # get fixed-effects linear predictor X <- lme4::getME(object, "X") beta <- parameters$beta linpred <- X %*% beta # update formula fml <- as.formula(sub("~", "~ 0 +", deparse(formula(object, random.only = TRUE)))) # update model with fixed contribution of fixed effects newobj <- .localUpdate(object, formula = fml, data = model.frame(object), offset = linpred) # get variance components Tau <- parameters[grep("^Tau", names(parameters))] sig <- sqrt(parameters$sigma2) # transform variance-covariance matrices into correlations and SDs (for devfun) theta.cl <- vector("list", ncl) for(i in seq_len(ncl)){ v <- Tau[[i]] r <- lme4::cov2sdcor(v) theta.cl[[i]] <- r[lower.tri(r, diag = TRUE)] } theta <- c(do.call(c, theta.cl), sig) # evaluate (profiled) deviance with fixed theta dev.fun <- lme4::devfun2(newobj) ll <- -dev.fun(pars = theta) / 2 attr(ll, "df") <- df }else{ ll <- ll0[1] attr(ll, "df") <- df } return(ll) } # * lme4::(g)lmer (for both LMMs and GLMMs) .updateStackedLL.merMod <- function(object, datalist){ # create imputation-specific levels for clustering variables cl <- lme4::getME(object, "cnms") for(ii in seq_along(datalist)){ for(cc in names(cl)){ 
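# prefix cluster IDs with the imputation number so clusters stay distinct after stacking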
datalist[[ii]][,cc] <- paste0("imp", ii, "_", datalist[[ii]][,cc]) } } # stack data stackdat <- do.call(rbind, datalist) for(cc in names(cl)) stackdat[,cc] <- as.integer(as.factor(stackdat[,cc])) # update model with stacked data # NOTE: update.merMod will find global objects of the same name before local ones (very bad), # so we need to update in a separate environment env <- new.env() assign("stackdat", value = stackdat, envir = env) newobj <- .localUpdate(object, envir = env, data = stackdat) return(newobj) } # * nlme::lme .getArgsLL.lme <- function(object){ beta <- nlme::fixef(object) Tau <- .listVC_lme(object) names(Tau) <- paste0("Tau", seq_along(Tau)) sigma2 <- sigma(object)^2 out <- list(object = object, parameters = c(list(beta = beta), Tau, list(sigma2 = sigma2))) return(out) } .getUserLL.lme <- function(object, parameters, ...){ ncl <- object$dims$Q # see nlme:::print.summary.lme if(ncl > 1) stop("The 'D3' method is only supported for models of class 'lme' with a single cluster variable. Please see '?testModels' for a list of supported model types.") # evaluate standard logLik p <- object$dims$ncol[[object$dims$Q + 1]] # see nlme:::logLik.lme fixedSigma <- attr(object[["modelStruct"]], "fixedSigma") df <- p + length(coef(object[["modelStruct"]])) + as.integer(!fixedSigma) # response and cluster variables y <- nlme::getResponse(object) clus <- nlme::getGroups(object) # fixed and random effects formulas fe.fml <- eval(eval(object$call$fixed)[-2]) # see nlme:::predict.lme re.str <- object$modelStruct$reStruct # fixed effects and design matrix X <- model.matrix(fe.fml, object$data) beta <- parameters$beta # random effects variance components and design matrix Z <- model.matrix(re.str, object$data) Tau <- parameters[[grep("^Tau", names(parameters))]] sigma2 <- parameters$sigma2 # evaluate log-likelihood ll <- .logLik_lmm(y = y, X = X, Z = Z, cluster = clus, beta = beta, Tau = Tau, sigma2 = sigma2) attr(ll, "df") <- df return(ll) } .getDataLL.lme <- function(object){ out <- nlme::getData(object) return(out) } .updateStackedLL.lme <- function(object, datalist){ # add levels to clustering variables re <- rev(object$modelStruct$reStruct) # see nlme:::VarCorr.lme cl <- names(re) for(ii in seq_along(datalist)){ for(cc in cl){ datalist[[ii]][,cc] <- paste0("imp", ii, "_", datalist[[ii]][,cc]) } } # update model with stacked data stackdat <- do.call(rbind, datalist) for(cc in names(cl)) stackdat[,cc] <- as.integer(as.factor(stackdat[,cc])) newobj <- update(object, data = stackdat) return(newobj) } .logLik_lmm <- function(y, X, Z, cluster, beta, Tau, sigma2){ p <- length(beta) q <- dim(Tau)[1] y <- split(y, cluster) X <- split(X, cluster) Z <- split(Z, cluster) lvls <- unique(cluster) L <- numeric(length(lvls)) for(i in seq_along(lvls)){ yi <- y[[i]] ni <- length(yi) Xi <- matrix(X[[i]], nrow = ni, ncol = p) Ri <- yi - Xi%*%beta Zi <- matrix(Z[[i]], nrow = ni, ncol = q) V <- diag(sigma2, ni) + Zi %*% Tau %*% t(Zi) Vinv <- chol2inv(chol(V)) dV <- determinant(V, logarithm = TRUE) dV <- dV$modulus * dV$sign L[i] <- dV + t(Ri) %*% Vinv %*% (Ri) } -sum(L)/2 } # * lavaan::lavaan .getLL.lavaan <- function(object){ # FIXME: catch scaled LRT statistics (currently not supported) # see lavaan::lavTestLRT tests <- unlist(sapply(slot(object, "test"), "[", "test")) isScaled <- c("satorra.bentler", "yuan.bentler", "yuan.bentler.mplus", "mean.var.adjusted", "scaled.shifted") %in% tests if(any(isScaled)){ return(NULL) } ll <- lavaan::logLik(object) return(ll) } .getArgsLL.lavaan <- function(object){ # FIXME: 
catch scaled LRT statistics (currently not supported) # see lavaan::lavTestLRT tests <- unlist(sapply(slot(object, "test"), "[", "test")) isScaled <- c("satorra.bentler", "yuan.bentler", "yuan.bentler.mplus", "mean.var.adjusted", "scaled.shifted") %in% tests if(any(isScaled)){ return(NULL) } # get parameter table pt <- lavaan::parTable(object) isFree <- pt[["free"]] > 0 out <- list(object = object, parameters = list(free = pt[isFree, "est"])) return(out) } .getUserLL.lavaan <- function(object, parameters, force.update = TRUE, ...){ ll0 <- lavaan::logLik(object) df <- attr(ll0, "df") if(force.update){ # get parameter table pt <- lavaan::parTable(object) isFree <- pt[["free"]] > 0 isConstraint <- pt[["op"]] %in% c(":=", "==", "<", ">") # fix free parameters to user-defined values pt[isFree, c("est", "se", "start")] <- NA pt[isFree, "ustart"] <- parameters$free[pt[isFree, "free"]] pt[["free"]] <- 0 pt[["user"]] <- 1 # remove defined parameters pt <- pt[!isConstraint,] # extract data data <- .restoreData_lavaan(object) # update model with fixed parameters newobj <- .localUpdate(object, model = pt, data = data) ll <- lavaan::logLik(newobj)[1] }else{ ll <- ll0[1] } attr(ll, "df") <- df return(ll) } .getDataLL.lavaan <- function(object){ out <- .restoreData_lavaan(object) return(out) } .updateStackedLL.lavaan <- function(object, datalist){ # create imputation-specific levels for clustering variables cl <- lavaan::lavInspect(object, "cluster") hasLevels <- length(cl) > 0 if(hasLevels){ # add levels to clustering variables for(ii in seq_along(datalist)) datalist[[ii]][,cl] <- paste0("imp", ii, "_", datalist[[ii]][,cl]) # stack data stackdat <- do.call(rbind, datalist) stackdat[,cl] <- as.integer(as.factor(stackdat[,cl])) }else{ stackdat <- do.call(rbind, datalist) } # update model with stacked data newobj <- .localUpdate(object, data = stackdat) return(newobj) } .restoreData_lavaan <- function(object){ # extract data data <- lavaan::lavInspect(object, "data") grp <- lavaan::lavInspect(object, "group") cl <- lavaan::lavInspect(object, "cluster") # re-add group and cluster indicators hasGroups <- length(grp) > 0 hasLevels <- length(cl) > 0 data <- if(hasGroups) lapply(data, as.data.frame) else as.data.frame(data) if(hasGroups){ grp.nms <- lavaan::lavInspect(object, "group.label") for(ii in seq_along(grp.nms)) data[[ii]][,grp] <- grp.nms[ii] data <- do.call(rbind, data) } if(hasLevels){ cc <- lavaan::lavInspect(object, "cluster.label") if(hasGroups) cc <- do.call(c, cc) data[,cl] <- cc } return(data) } mitml/R/clusterMeans.R0000644000176200001440000000220014002024253014347 0ustar liggesusersclusterMeans <- function(x, cluster, adj = FALSE, group = NULL){ # calculate cluster means # get objects if names are given isname <- c(length(x) == 1, length(cluster) == 1, length(group) == 1) & c(is.character(x), is.character(cluster), is.character(group)) if(any(isname)){ parent <- parent.frame() if(isname[1]) x <- eval(parse(text = x), parent) if(isname[2]) cluster <- eval(parse(text = cluster), parent) if(isname[3]) group <- eval(parse(text = group), parent) } # prepare group if(!is.null(group)) { if(is.character(group)) group <- as.factor(group) if(is.factor(group)) group <- as.integer(group) ngr <- length(unique(group)) } # format cluster (and groups) if(!is.numeric(cluster)) cluster <- as.integer(cluster) if(!is.null(group)) cluster <- cluster + group/(ngr+1) cluster <- match(cluster, unique(cluster)) n.obs <- rowsum(as.integer(!is.na(x)), cluster) gm <- rowsum(x, cluster, na.rm = T)/n.obs gm[is.nan(gm)] <- 
NA gm <- gm[cluster] if(adj){ n.obs <- n.obs[cluster] ((n.obs * gm) - x)/(n.obs - 1) }else{ gm } } mitml/R/print.mitml.R0000644000176200001440000000243114001605756014201 0ustar liggesusersprint.mitml <- function(x, ...){ # print method for objects of class "mitml" cl <- x$call vrs <- x$model itr <- x$iter ngr <- length(unique(attr(x$data, "group"))) isML <- attr(x$model, "is.ML") isL2 <- attr(x$model, "is.L2") cat("\nCall:\n", paste(deparse(cl)), sep = "\n") cat("\n") if(isL2) cat("Level 1:\n", collapse = "\n") if(isML) cat(formatC("Cluster variable:", width=-25), vrs$clus, sep = " ", collapse = "\n") cat(formatC("Target variables:", width=-25), vrs$yvrs, collapse = "\n") cat(formatC("Fixed effect predictors:", width=-25), vrs$pvrs, collapse = "\n") if(isML) cat(formatC("Random effect predictors:", width=-25), vrs$qvrs, collapse = "\n") if(isL2){ cat("\n") cat(formatC("Level 2:\n", width=-25), collapse = "\n") cat(formatC("Target variables:", width=-25), vrs$yvrs.L2, collapse = "\n") cat(formatC("Fixed effect predictors:", width=-25), vrs$pvrs.L2, collapse = "\n") } cat("\nPerformed", sprintf("%.0f", itr$burn), "burn-in iterations, and generated", sprintf("%.0f", itr$m), "imputed data sets,\neach", sprintf("%.0f", itr$iter), "iterations apart.", if(ngr>1){c("\nImputations were carried out separately within", sprintf("%.0f", ngr), "groups.\n")}, "\n") invisible(NULL) } mitml/R/mitml.list2mids.R0000644000176200001440000000200314127016073014750 0ustar liggesusersmitml.list2mids <- function(x, data, fill = FALSE, where = NULL){ # convert objects of class "mitml.list" to "mids" # check for 'mice' if(!requireNamespace("mice", quietly = TRUE)) stop("The 'mice' package must be installed to use this function.") # check variable names nms.inc <- names(data) nms.imp <- unique(do.call(c, lapply(x, names))) if(any(c(".imp", ".id") %in% nms.inc)) stop("Columns named '.imp' or '.id' are not allowed in 'data'.") if(any(c(".imp", ".id") %in% nms.imp)) stop("Columns named '.imp' or '.id' are not allowed in 'x'.") nms.new <- nms.imp[!nms.imp %in% nms.inc] if(length(nms.new) > 0L){ if(!fill) stop("Some variables in the imputed data ('x') are not present in the original data ('data'). Use 'fill = TRUE' to include them.") data[, nms.new] <- NA } # prepare data z <- c(list(data), x) for(i in seq_along(z)){ z[[i]] <- cbind(.imp = i - 1, .id = seq.int(1, nrow(z[[i]])), z[[i]]) } return(mice::as.mids(long = do.call(rbind, z), where = where)) } mitml/R/print.mitml.anova.R0000644000176200001440000000354714002347671015305 0ustar liggesusersprint.mitml.anova <- function(x, digits = 3, sci.limit = 5, ...){ # print method for objects of class mitml.anova cll <- x$call test <- x$test fml <- x$formula method <- x$method data <- x$data ariv <- x$ariv order.method <- x$order.method use <- x$use reml <- x$reml m <- x$test[[1]]$m n.tests <- length(fml) # print header cat("\nCall:\n", paste(deparse(cll)), sep = "\n") cat("\nModel comparison calculated from", m, "imputed data sets.") # print method cat("\nCombination method:", method) if(method == "D2") cat(" (", use, ")", sep = "") if(method == "D4" && ariv == "robust") cat(" (robust)", sep = "") cat("\n") # print model formulas cat("\n") for(mm in seq.int(1, n.tests)) cat("Model ", mm, ": ", fml[mm], "\n", sep = "") cat("\n") # combine multiple tests in one table test.tab <- lapply(test, "[[", "test") test.tab <- do.call(rbind, test.tab) rn <- paste0(seq.int(1, n.tests - 1), " vs ", seq.int(2, n.tests), " ") rownames(test.tab) <- rn # format table test.digits <- c(digits, 0, rep(digits, 
ncol(test.tab)-2)) out <- .formatTable(test.tab, digits = test.digits, sci.limit = sci.limit) for(i in seq_len(nrow(out))) cat(" ", out[i,], "\n") cat("\n") # print footer if(is.null(order.method)){ cat("Models were ordered as provided by the user (by decreasing complexity).\n") }else{ cat("Models were automatically ordered via '", order.method, "' (by decreasing complexity).\n", sep = "") } if(method == "D4"){ if(data){ cat("Data for stacking were extracted from the `data` argument.\n") }else{ cat("Data for stacking were automatically extracted from the fitted models.\n") } } if(reml){ cat("Models originally fit with REML were automatically refit using ML.\n") } cat("\n") invisible() } mitml/R/rbind.mitml.list.R0000644000176200001440000000020614001605436015110 0ustar liggesusersrbind.mitml.list <- function(...){ # merges two or more objects of class "mitml.list" by appending rows of list entries Map(rbind, ...) } mitml/R/print.mitml.testConstraints.R0000644000176200001440000000306314001605106017400 0ustar liggesusersprint.mitml.testConstraints <- function(x, digits = 3, sci.limit = 5, ...){ # print method for objects of class mitml.testConstraints cll <- x$call test <- x$test constraints <- x$constraints method <- x$method m <- x$m adj.df <- x$adj.df df.com <- x$df.com # print header cat("\nCall:\n", paste(deparse(cll)), sep = "\n") cat("\nHypothesis test calculated from", m, "imputed data sets. The following\nconstraints were specified:\n\n") # print constrained estimates est <- cbind(x$Qbar, sqrt(diag(x$T))) colnames(est) <- c("Estimate", "Std. Error") rownames(est) <- paste0(constraints, ":") out <- .formatTable(est, digits = digits, sci.limit = sci.limit) for(i in seq_len(nrow(out))) cat(" ", out[i,], "\n") # print method cat("\nCombination method:", method, "\n\n") # print test results test.digits <- c(digits, 0, rep(digits, ncol(test)-2)) out <- .formatTable(test, digits = test.digits, sci.limit = sci.limit) for(i in seq_len(nrow(out))) cat(" ", out[i,], "\n") # print footer if(method == "D1"){ cat("\n") if(adj.df){ cat(c("Hypothesis test adjusted for small samples with", paste0("df=[", paste(df.com, collapse = ","), "]\ncomplete-data degrees of freedom."))) }else{ cat("Unadjusted hypothesis test as appropriate in larger samples.") } cat("\n") } cat("\n") invisible() } summary.mitml.testConstraints <- function(object, ...){ # summary method for objects of class mitml.testConstraints print.mitml.testConstraints(object, ...) } mitml/R/jomoImpute.R0000644000176200001440000004107514001606143014052 0ustar liggesusersjomoImpute <- function(data, type, formula, random.L1 = c("none", "mean", "full"), n.burn = 5000, n.iter = 100, m = 10, group = NULL, prior = NULL, seed = NULL, save.pred = FALSE, keep.chains = c("full", "diagonal"), silent = FALSE){ # wrapper function for the different samplers of the jomo package # check arguments if(!missing(type) & !missing(formula)) stop("Only one of 'type' or 'formula' may be specified.") if(save.pred & !missing(type)){ warning("Option 'save.pred' is ignored if 'type' is specified.") save.pred <- FALSE } random.L1 <- match.arg(random.L1) keep.chains <- match.arg(keep.chains) # convert type if(!missing(type)){ if(!is.null(group)){ gv <- match(group, colnames(data)) if(is.list(type)){ type[[1]][gv] <- -1 }else{ type[gv] <- -1 } warning("The 'group' argument is intended only for 'formula'. 
Setting 'type' of '", colnames(data)[gv], "' to '-1'.") } formula <- .type2formula(data, type) group <- attr(formula, "group") } # check for number of model equations formula <- .check.model( formula ) isML <- attr(formula, "is.ML") isL2 <- attr(formula, "is.L2") if(!isML && random.L1 != "none") stop("No cluster variable found. Random covariance matrices (random.L1) are not supported for single-level MI and require the specification of a cluster variable.") # objects to assign to clname <- yvrs <- y <- ycat <- zcol <- xcol <- pred <- clus <- psave <- pvrs <- qvrs <- pnames <- qnames <- yvrs.L2 <- y.L2 <- ycat.L2 <- xcol.L2 <- pred.L2 <- pvrs.L2 <- pnames.L2 <- NULL # preserve original order if(!is.data.frame(data)) as.data.frame(data) data <- cbind(data, original.order = 1:nrow(data)) # address additional grouping grname <- group if(is.null(group)){ group <- rep(1, nrow(data)) }else{ if(length(group) > 1) stop("Multiple 'group' variables found. There can be only one!") if(!group %in% colnames(data)) stop("Argument 'group' is not correctly specified.") group <- data[,group] } group.original <- group group <- as.numeric(factor(group, levels = unique(group))) # *** # model input # populate local frame .model.byFormula(data, formula, group, group.original, method = "jomo.matrix") # check model input if(any(is.na(group))) stop("Grouping variable must not contain missing data.") if(any(is.na(pred))) stop("Predictor variables must not contain missing data.") if(any(!sapply(data[yvrs], function(a) is.factor(a) || is.numeric(a)))) stop("Target variables must either be numeric or factors.") if((sum(is.na(y)) + sum(is.na(ycat)) + ifelse(isL2, sum(is.na(y.L2))+sum(is.na(ycat.L2)), 0)) == 0) stop("Target variables do not contain any missing data.") if(any(duplicated(c(yvrs, yvrs.L2)))) stop("Found duplicate target variables.") if(isL2){ if(any(is.na(pred.L2))) stop("Predictor variables must not contain missing data.") if(any(!sapply(data[yvrs.L2], function(a) is.factor(a) || is.numeric(a)))) stop("Target variables must either be numeric or factors.") } # check for L1 variables in L2 models if(isL2){ y.L1 <- !.check.variablesL2(y.L2, clus) x.L1 <- !.check.variablesL2(pred.L2, clus) if(any(y.L1)) stop("Target variables at level 1 are not allowed in level-2 equation.") if(any(x.L1)){ for(i in which(x.L1)) pred.L2[,i] <- clusterMeans(pred.L2[,i], clus) message("NOTE: Predictor variables at level 1 were found in level-2 equation and were replaced with cluster means (", paste0(pvrs.L2[x.L1], collapse = ", "), ").") } } # reorder colums cc <- which(colnames(data) %in% c(clname, grname, yvrs, yvrs.L2)) data.ord <- cbind(data[c(clname, grname, yvrs, yvrs.L2)], data[-cc]) # *** jomo setup # ycat.labels <- lapply(data[, c(colnames(ycat), colnames(ycat.L2)), drop = F], levels) # select function func <- if(ncol(ycat) == 0) "con" else if(ncol(y) == 0) "cat" else "mix" func <- paste0(ifelse(!isML, "jomo1", ifelse(!isL2, "jomo1ran", "jomo2")), if(!isL2) func, if(isL2 & random.L1 == "none") "com", if(random.L1 != "none") "hr", ".MCMCchain") func <- get(func, asNamespace("jomo")) # standard dimensions and data properties ng <- length(unique(group)) np <- length(xcol) nq <- length(zcol) ncon <- ncol(y) ncat <- ncol(ycat) nr <- ncon + ncat # combined con + cat (variables) ynumcat <- matrix(0, ng, ncat) nc <- nr2 <- integer(ng) if(isL2){ np.L2 <- length(xcol.L2) ncon.L2 <- ncol(y.L2) ncat.L2 <- ncol(ycat.L2) nr.L2 <- ncon.L2 + ncat.L2 # combined con + cat (variables) ynumcat.L2 <- matrix(0, ng, ncat.L2) nc.L2 <- nr2.L2 <- 
integer(ng) }else{ nr2.L2 <- integer(ng) # zero counts for compatibility ncon.L2 <- ncat.L2 <- 0 # of shared code } # ... manage categories groupwise for(gg in unique(group)){ ynumcat[gg,] <- apply(ycat[group == gg, , drop = F], 2, FUN = function(x) length(unique(x[!is.na(x)]))) nc[gg] <- length(unique(clus[group == gg])) nr2[gg] <- ncon+sum(ynumcat[gg,])-length(ynumcat[gg,]) # combined con + cat (indicators) if(isL2){ ynumcat.L2[gg,] <- apply(ycat.L2[group == gg, , drop = F], 2, FUN = function(x) length(unique(x[!is.na(x)]))) nc.L2[gg] <- length(unique(clus[group == gg])) nr2.L2[gg] <- ncon.L2+sum(ynumcat.L2[gg,])-length(ynumcat.L2[gg,]) } } # reduced dimensions dpsi <- max(nr2)*nq+max(nr2.L2) dsig1 <- ifelse(random.L1 == "full", max(nr2)*max(nc), max(nr2)) dsig2 <- max(nr2) if(keep.chains == "diagonal"){ dpsi <- dsig2 <- 1 } # * * * * * * * * * * * * * * * * * * * * # save original seed (if seed is provided) original.seed <- NULL if(!is.null(seed)){ if(exists(".Random.seed", .GlobalEnv)) original.seed <- .Random.seed set.seed(seed) } # priors if(is.null(prior)){ prior <- as.list(unique(group)) for(gg in unique(group)){ prior[[gg]] <- list( Binv = diag(1, nr2[gg]), Dinv = diag(1, nq*nr2[gg]+nr2.L2[gg]) ) if(random.L1 != "none") prior[[gg]]$a <- nr2[gg] if(!isML) prior[[gg]]$Dinv <- NULL } }else{ # check if prior is given as simple list if(!is.list(prior[[1]])) prior <- rep(list(prior), ng) } # prepare output ind <- which(is.na(data.ord), arr.ind = TRUE, useNames = FALSE) ind <- ind[ ind[,2] %in% which(colnames(data.ord) %in% c(yvrs, yvrs.L2)), , drop = FALSE ] rpm <- matrix(NA, nrow(ind), m) bpar <- c( list(beta = array( NA, c(np, max(nr2), n.burn, ng) )), if(isL2) list(beta2 = array( NA, c(np.L2, max(nr2.L2), n.burn, ng) )), if(isML) list(psi = array( NA, c(max(nr2)*nq+max(nr2.L2), dpsi, n.burn, ng) )), list(sigma = array( NA, c(dsig1, dsig2, n.burn, ng) )) ) ipar <- c( list(beta = array( NA, c(np, max(nr2), n.iter*m, ng) )), if(isL2) list(beta2 = array( NA, c(np.L2, max(nr2.L2), n.iter*m, ng) )), if(isML) list(psi = array( NA, c(max(nr2)*nq+max(nr2.L2), dpsi, n.iter*m, ng) )), list(sigma = array( NA, c(dsig1, dsig2, n.iter*m, ng) )) ) # burn-in if(!silent){ cat("Running burn-in phase ...\n") flush.console() } glast <- as.list(unique(group)) for(gg in unique(group)){ gi <- group == gg gprior <- prior[[gg]] # function arguments (group specific) gclus <- clus[gi] gclus <- matrix( match(gclus, unique(gclus))-1, ncol = 1 ) func.args <- list( Y = if(ncon>0 & ncat == 0 & !isL2) y[gi, , drop = F] else NULL, Y.con = if(ncon>0 & (ncat>0 | isL2)) y[gi, , drop = F] else NULL, Y.cat = if(ncat>0) ycat[gi, , drop = F] else NULL, Y.numcat = if(ncat>0) ynumcat[gg,] else NULL, Y2.con = if(ncon.L2>0) y.L2[gi, , drop = F] else NULL, Y2.cat = if(ncat.L2>0) ycat.L2[gi, , drop = F] else NULL, Y2.numcat = if(ncat.L2>0) ynumcat.L2[gg,] else NULL, X = pred[gi, xcol, drop = F], X2 = if(isL2) pred.L2[gi, xcol.L2, drop = F] else NULL, Z = if(isML) pred[gi, zcol, drop = F] else NULL, clus = if(isML) gclus else NULL, beta.start = matrix(0, np, nr2[gg]), l2.beta.start = if(isL2) matrix(0, np.L2, nr2.L2[gg]) else NULL, u.start = if(isML) matrix(0, nc[gg], nq*nr2[gg]+nr2.L2[gg]) else NULL, l1cov.start = if(random.L1 != "none"){ matrix(diag(1, nr2[gg]), nr2[gg]*nc[gg], nr2[gg], byrow = T) }else{ diag(1, nr2[gg]) }, l2cov.start = if(isML) diag(1, nq*nr2[gg]+nr2.L2[gg]) else NULL, start.imp = NULL, l2.start.imp = NULL, l1cov.prior = gprior$Binv, l2cov.prior = gprior$Dinv, a = gprior$a, meth = if(random.L1 != "none") 
"random" else NULL, nburn = n.burn, output = 0 ) func.args <- func.args[!sapply(func.args, is.null)] cur <- do.call( func, func.args ) glast[[gg]] <- cur # current parameter dimensions (group-specific) bdim <- dim(cur$collectbeta)[1:2] pdim <- dim(cur$collectcovu)[1:2] sdim <- dim(cur$collectomega)[1:2] # save chains for beta bpar[["beta"]][1:bdim[1], 1:bdim[2], , gg] <- cur$collectbeta # ... covariance matrix at L2 if(isML){ if(keep.chains == "diagonal"){ bpar[["psi"]][1:pdim[1], 1, , gg] <- .adiag(cur$collectcovu) }else{ bpar[["psi"]][1:pdim[1], 1:pdim[2], , gg] <- cur$collectcovu } } # ... random covariance matrices at L1 if(random.L1 == "mean"){ tmp <- cur$collectomega dim(tmp) <- c(nr2[gg], nc[gg], nr2[gg], n.burn) if(keep.chains == "diagonal"){ bpar[["sigma"]][1:sdim[2], 1, , gg] <- .adiag(apply(tmp, c(1, 3, 4), mean)) }else{ bpar[["sigma"]][1:sdim[2], 1:sdim[2], , gg] <- apply(tmp, c(1, 3, 4), mean) } }else{ if(keep.chains == "diagonal"){ bpar[["sigma"]][1:sdim[1], 1, , gg] <- .adiag(cur$collectomega, stacked=(random.L1 == "full")) }else{ bpar[["sigma"]][1:sdim[1], 1:sdim[2], , gg] <- cur$collectomega } } # ... L2 model if(isL2){ bdim2 <- dim(cur$collect.l2.beta)[1:2] bpar[["beta2"]][1:bdim2[1], 1:bdim2[2], , gg] <- cur$collect.l2.beta } } # imputation for(ii in 1:m){ if(!silent){ cat("Creating imputed data set (", ii, "/", m, ") ...\n") flush.console() } gy.imp <- as.list(unique(group)) for(gg in unique(group)){ gi <- group == gg gprior <- prior[[gg]] # last state (imp) last.imp <- if(isL2 | ncat>0) glast[[gg]]$finimp.latnorm else glast[[gg]]$finimp if(ncon>0 & ncat == 0 & !isL2){ last.imp <- last.imp[(nrow(y[gi, , drop = F])+1):nrow(last.imp), 1:ncon, drop = F] } last.imp.L2 <- if(isL2) glast[[gg]]$l2.finimp.latnorm else NULL # function arguments (group specific) gclus <- clus[gi] gclus <- matrix( match(gclus, unique(gclus))-1, ncol = 1 ) it <- dim(glast[[gg]]$collectbeta)[3] func.args <- list( Y = if(ncon>0 & ncat == 0 & !isL2) y[gi, , drop = F] else NULL, Y.con = if(ncon>0 & (ncat>0 | isL2)) y[gi, , drop = F] else NULL, Y.cat = if(ncat>0) ycat[gi, , drop = F] else NULL, Y.numcat = if(ncat>0) ynumcat[gg,] else NULL, Y2.con = if(ncon.L2>0) y.L2[gi, , drop = F] else NULL, Y2.cat = if(ncat.L2>0) ycat.L2[gi, , drop = F] else NULL, Y2.numcat = if(ncat.L2>0) ynumcat.L2[gg,] else NULL, X = pred[gi, xcol, drop = F], X2 = if(isL2) pred.L2[gi, xcol.L2, drop = F] else NULL, Z = if(isML) pred[gi, zcol, drop = F] else NULL, clus = if(isML) gclus else NULL, beta.start=.extractMatrix(glast[[gg]]$collectbeta, it), l2.beta.start=.extractMatrix(glast[[gg]]$collect.l2.beta, it), u.start=.extractMatrix(glast[[gg]]$collectu, it), l1cov.start=.extractMatrix(glast[[gg]]$collectomega, it), l2cov.start=.extractMatrix(glast[[gg]]$collectcovu, it), start.imp = last.imp, l2.start.imp = last.imp.L2, l1cov.prior = gprior$Binv, l2cov.prior = gprior$Dinv, a = gprior$a, meth = if(random.L1 != "none") "random" else NULL, nburn = n.iter, output = 0 ) func.args <- func.args[!sapply(func.args, is.null)] cur <- do.call( func, func.args ) glast[[gg]] <- cur # save imputations ri <- (sum(gi)+1):nrow(cur$finimp) ci <- which(colnames(cur$finimp) %in% c(yvrs, yvrs.L2)) gy.imp[[gg]] <- cur$finimp[ri, ci, drop = F] # current parameter dimensions (group-specific) bdim <- dim(cur$collectbeta)[1:2] pdim <- dim(cur$collectcovu)[1:2] sdim <- dim(cur$collectomega)[1:2] # save chains for beta iind <- (n.iter*(ii-1)+1):(n.iter*ii) ipar[["beta"]][1:bdim[1], 1:bdim[2], iind, gg] <- cur$collectbeta # ... 
covariance matrix at L2 if(isML){ if(keep.chains == "diagonal"){ ipar[["psi"]][1:pdim[1], 1, iind, gg] <- .adiag(cur$collectcovu) }else{ ipar[["psi"]][1:pdim[1], 1:pdim[2], iind, gg] <- cur$collectcovu } } # ... random covariance matrices at L1 if(random.L1 == "mean"){ tmp <- cur$collectomega dim(tmp) <- c(nr2[gg], nc[gg], nr2[gg], n.iter) if(keep.chains == "diagonal"){ ipar[["sigma"]][1:sdim[2], 1, iind, gg] <- .adiag(apply(tmp, c(1, 3, 4), mean)) }else{ ipar[["sigma"]][1:sdim[2], 1:sdim[2], iind, gg] <- apply(tmp, c(1, 3, 4), mean) } }else{ if(keep.chains == "diagonal"){ ipar[["sigma"]][1:sdim[1], 1, iind, gg] <- .adiag(cur$collectomega, stacked=(random.L1 == "full")) }else{ ipar[["sigma"]][1:sdim[1], 1:sdim[2], iind, gg] <- cur$collectomega } } # ... L2 model if(isL2){ bdim2 <- dim(cur$collect.l2.beta)[1:2] ipar[["beta2"]][1:bdim2[1], 1:bdim2[2], iind, gg] <- cur$collect.l2.beta } } y.imp <- data.matrix(do.call(rbind, gy.imp)) rpm[, ii] <- y.imp[, c(yvrs, yvrs.L2)][is.na(data.ord[, c(yvrs, yvrs.L2), drop = F])] } if(!silent){ cat("Done!\n") } # clean up srt <- data.ord[,ncol(data.ord)] data.ord <- data.ord[,-ncol(data.ord)] # restore original seed (if seed was provided) if(!is.null(seed)){ if(is.null(original.seed)){ rm(".Random.seed", envir = .GlobalEnv) }else{ assign(".Random.seed", original.seed, envir=.GlobalEnv) } } # *** prepare output # # save pred if( save.pred & !missing(formula) ){ ps1 <- colnames(pred) %in% psave ps2 <- (colnames(pred.L2) %in% psave) & !(colnames(pred.L2) %in% colnames(pred)[ps1]) data.ord <- cbind(data.ord, pred[,ps1, drop = F]) if(isL2) cbind(data.ord, pred.L2[,ps2, drop = F]) } # ordering attr(data.ord, "sort") <- srt attr(data.ord, "group") <- group.original # categorical variables if(ncat>0 | ncat.L2>0){ attr(data.ord, "cvrs") <- names(ycat.labels) attr(data.ord, "levels") <- cbind(ynumcat, if(isL2) ynumcat.L2) attr(data.ord, "labels") <- ycat.labels } # model summary model <- list(clus = clname, yvrs = yvrs, pvrs = pvrs, qvrs = qvrs, yvrs.L2 = if(isL2) yvrs.L2 else NULL, pvrs.L2 = if(isL2) pvrs.L2 else NULL) attr(model, "is.ML") <- isML attr(model, "is.L2") <- isL2 attr(model, "full.names") <- list(pvrs = pnames, qvrs = qnames, pvrs.L2 = if(isL2) pnames.L2 else NULL) out <- list( data = data.ord, replacement.mat = rpm, index.mat = ind, call = match.call(), model = model, random.L1 = random.L1, prior = prior, iter = list(burn = n.burn, iter = n.iter, m = m), keep.chains = keep.chains, par.burnin = bpar, par.imputation = ipar ) class(out) <- c("mitml", "jomo") return(out) } mitml/R/subset.mitml.list.R0000644000176200001440000000154314001604243015315 0ustar liggesuserssubset.mitml.list <- function(x, subset, select, ...){ # subset list of multiply imputed data sets # NOTE: code adapted from subset.data.frame (by Peter Dalgaard and Brian Ripley) rind <- if (missing(subset)) { lapply(x, function(i) rep(TRUE, nrow(i))) } else { ss <- substitute(subset) rind <- lapply(x, function(i) eval(ss, i, parent.frame())) if (!is.logical(unlist(rind))) stop("'subset' must be logical") lapply(rind, function(i) i & !is.na(i)) } cind <- if (missing(select)) { lapply(x, function(i) TRUE) } else { nl <- lapply(x, function(i){ l <- as.list(seq_along(i)) names(l) <- names(i) l }) se <- substitute(select) lapply(nl, function(i) eval(se, i, parent.frame())) } res <- lapply(seq_along(x), function(i) x[[i]][rind[[i]], cind[[i]], drop = FALSE]) as.mitml.list(res) } mitml/R/write.mitmlSPSS.R0000644000176200001440000000620714002017533014704 0ustar liggesuserswrite.mitmlSPSS <- 
function(x, filename, sep = "\t", dec = ".", na.value=-999, syntax = TRUE, locale = NULL){ # write text file to be read into SPSS if(!inherits(x, "mitml") && !inherits(x, "mitml.list")) stop("'x' must be of class 'mitml' or 'mitml.list'.") if(!dec %in% c(",", ".")) stop("Only a dot '.' or a comma ',' may be specified as decimal separator.") if(inherits(x, "mitml")){ x <- mitmlComplete(x, "all", force.list = TRUE) } for(ii in 1:length(x)){ x[[ii]] <- cbind(ii-1, x[[ii]]) colnames(x[[ii]])[1] <- "Imputation_" } out <- do.call(rbind, x) num <- sapply(out, is.numeric) chr <- sapply(out, is.character) fac <- sapply(out, is.factor) ord <- sapply(out, is.ordered) # convert factors conv <- as.list(which(fac)) for(ff in which(fac)){ out[,ff] <- as.factor(out[,ff]) conv[[colnames(out)[ff]]] <- matrix(c(levels(out[,ff]), 1:nlevels(out[,ff])), ncol = 2) out[,ff] <- as.numeric(out[,ff]) } ds <- paste(filename, ".dat", sep = "") out[is.na(out)] <- na.value write.table(out, file = ds, sep = sep, dec = dec, col.names = T, row.names = F, quote = F) # generate syntax if(syntax){ sf <- paste(filename, ".sps", sep = "") if(dec == ".") d <- "DOT" else d <- "COMMA" cat(file = sf, "SET DECIMAL", d, ".\n") if(!is.null(locale)) cat(file = sf, "SET LOCALE", locale, ".\n", append = T) cat(file = sf, "\n", append = T) cat(file = sf, append = T, "GET DATA\n", "/TYPE=TXT\n", paste("/FILE=\"", ds, "\"\n", sep = ""), "/DELCASE=LINE\n", paste("/DELIMITERS=\"", sub("\t", "\\\\t", sep), "\"\n", sep = ""), "/ARRANGEMENT=DELIMITED\n", "/FIRSTCASE=2\n", "/IMPORTCASE=ALL\n", "/VARIABLES=" ) # class specific format width <- sapply(as.matrix(out)[1,], nchar, type = "width") width[chr] <- sapply(out[,chr, drop = FALSE], function(z) max(nchar(z, type = "width"))) fmt <- data.frame(v = colnames(out), f = character(ncol(out)), stringsAsFactors = F) fmt[num|fac|ord, "f"] <- paste("F", width[num|fac|ord]+3, ".2", sep = "") fmt[chr, "f"] <- paste("A", width[chr], sep = "") fmt[num, "l"] <- "SCALE" fmt[fac|chr, "l"] <- "NOMINAL" fmt[ord, "l"] <- "ORDINAL" fmt[1, "l"] <- "NOMINAL" cat(file = sf, "\n ", append = T) cat(file = sf, paste(fmt$v, fmt$f, collapse = "\n "), ".\n\n", append = T) cat(file = sf, append = T, sep = "", "CACHE .\n", "EXECUTE .\n", "DATASET NAME panImpute1 WINDOW=FRONT .\n\n" ) # value labels cat(file = sf, "VALUE LABELS", append = T) for(cc in 1:length(conv)){ cat(file = sf, "\n", paste("/", names(conv)[cc], sep = ""), append = T) for(rr in 1:nrow(conv[[cc]])){ cat(file = sf, "\n", conv[[cc]][rr, 2], paste("\'", conv[[cc]][rr, 1], "\'", sep = ""), append = T) } } cat(file = sf, " .\n\n", append = T) # missing values cat(file = sf, append = T, "MISSING VALUES\n", paste(fmt$v[num|fac|ord], collapse = " "), paste("(", na.value, ")", sep = ""), "\n", paste(fmt$v[chr], collapse = " "), paste("(\"", na.value, "\")", sep = ""), ".\n" ) } invisible() } mitml/R/print.mitml.summary.R0000644000176200001440000001121714001605764015676 0ustar liggesusersprint.mitml.summary <- function(x, ...){ # print method for objects of class "summary.mitml" cl <- x$call vrs <- x$model itr <- x$iter ngr <- x$ngr mdr <- x$missing.rates conv <- x$conv isML <- attr(x$model, "is.ML") isL2 <- attr(x$model, "is.L2") # print general information cat("\nCall:\n", paste(deparse(cl)), sep = "\n") cat("\n") if(isL2) cat("Level 1:\n", collapse = "\n") if(isML) cat(formatC("Cluster variable:", width=-25), vrs$clus, sep = " ", collapse = "\n") cat(formatC("Target variables:", width=-25), vrs$yvrs, collapse = "\n") cat(formatC("Fixed effect predictors:", 
width=-25), vrs$pvrs, collapse = "\n") if(isML) cat(formatC("Random effect predictors:", width=-25), vrs$qvrs, collapse = "\n") if(isL2){ cat("\n") cat(formatC("Level 2:\n", width=-25), collapse = "\n") cat(formatC("Target variables:", width=-25), vrs$yvrs.L2, collapse = "\n") cat(formatC("Fixed effect predictors:", width=-25), vrs$pvrs.L2, collapse = "\n") } cat("\nPerformed", sprintf("%.0f", itr$burn), "burn-in iterations, and generated", sprintf("%.0f", itr$m), "imputed data sets,\neach", sprintf("%.0f", itr$iter), "iterations apart.", if(ngr>1){c("\nImputations were carried out separately within", sprintf("%.0f", ngr), "groups.")}, "\n") # print convergence diagnostics if(!is.null(conv)){ # note for reduced chains if(x$keep.chains != "full"){ cat("\nNote: Convergence criteria were calculated from a reduced set of\nparameters (setting: ", x$keep.chains, ").\n", sep = "") } for(cc in attr(conv, "stats")){ # summary for Rhat and SDprop if(cc == "Rhat" || cc == "SDprop"){ cout <- matrix(c( sapply(conv, function(z) min(z[,cc])), sapply(conv, function(z) quantile(z[,cc], .25)), sapply(conv, function(z) mean(z[,cc])), sapply(conv, function(z) median(z[,cc])), sapply(conv, function(z) quantile(z[,cc], .75)), sapply(conv, function(z) max(z[,cc])) ), ncol = 6 ) rownames(cout) <- c("Beta:", if(isL2) "Beta2:", if(isML) "Psi:", "Sigma:") colnames(cout) <- c("Min", "25%", "Mean", "Median", "75%", "Max") clab <- switch(cc, Rhat = "\nPotential scale reduction (Rhat, imputation phase):\n", SDprop = "\nGoodness of approximation (imputation phase):\n") cat(clab, "\n") print.table(round(cout, 3)) clab <- switch(cc, Rhat = "\nLargest potential scale reduction:\n", SDprop = "\nPoorest approximation:\n") cat(clab) maxval <- lapply(conv, function(a) a[which.max(a[,cc]), 1:2]) cat("Beta: [", paste(maxval$beta, collapse = ",") ,"], ", if(isL2) paste0("Beta2: [", paste(maxval$beta2, collapse = ",") ,"], "), if(isML) paste0("Psi: [", paste(maxval$psi, collapse = ",") ,"], "), "Sigma: [", paste(maxval$sigma, collapse = ",") ,"]\n", sep = "") } # summary for ACF if(cc == "ACF"){ cout <- c( sapply(conv, function(z) mean(z[,"lag-1"])), sapply(conv, function(z) mean(z[,"lag-k"])), sapply(conv, function(z) mean(z[,"lag-2k"])), sapply(conv, function(z) max(z[,"lag-1"])), sapply(conv, function(z) max(z[,"lag-k"])), sapply(conv, function(z) max(z[,"lag-2k"])) ) neg <- cout<0 cout <- sprintf(cout, fmt = "%.3f") cout[neg] <- gsub("^-0", "-", cout[neg]) cout[!neg] <- gsub("^0", " ", cout[!neg]) cout <- matrix(cout, 2+isML+isL2, 6) cout <- rbind(c(" Lag1", " Lagk", "Lag2k", " Lag1", " Lagk", "Lag2k"), cout) rownames(cout) <- c("", "Beta:", if(isL2) "Beta2:", if(isML) "Psi:", "Sigma:") colnames(cout) <- c(" Mean", "", "", " Max", "", "") cat("\nAutocorrelation (ACF, imputation phase):\n\n") print.table(cout) cat("\nLargest autocorrelation at lag k:\n") maxval <- lapply(conv, function(a) a[which.max(abs(a[,"lag-k"])), 1:2]) cat("Beta: [", paste(maxval$beta, collapse = ",") ,"], ", if(isL2) paste0("Beta2: [", paste(maxval$beta2, collapse = ",") ,"], "), if(isML) paste0("Psi: [", paste(maxval$psi, collapse = ",") ,"], "), "Sigma: [", paste(maxval$sigma, collapse = ",") ,"]\n", sep = "") } } } # missing data rates mdrout <- t(as.matrix(mdr)) rownames(mdrout) <- "MD%" cat("\nMissing data per variable:\n") print.table(mdrout) cat("\n") invisible(NULL) } mitml/R/with.mitml.list.R0000644000176200001440000000120314004241454014760 0ustar liggesuserswith.mitml.list <- function(data, expr, include.data = FALSE, ...){ # evaluates an 
expression for a list of data sets expr <- substitute(expr) pf <- parent.frame() # check include.data argument if(is.character(include.data)){ name.data <- include.data include.data <- TRUE }else{ name.data <- "data" } out <- if(include.data){ lapply(data, function(d, expr){ expr[[name.data]] <- substitute(d) eval(expr, parent.frame()) }, expr = expr) }else{ lapply(data, function(d, expr, pf) eval(expr, d, pf), expr = expr, pf = pf) } class(out) <- c("mitml.result", "list") return(out) } mitml/R/within.mitml.list.R0000644000176200001440000000112214001606040015300 0ustar liggesuserswithin.mitml.list <- function(data, expr, ignore = NULL, ...){ # evaluate an expression for a list of data sets, then return altered data sets expr <- substitute(expr) parent <- parent.frame() out <- lapply(data, function(x){ e <- evalq(environment(), x, parent) eval(expr, e) l <- as.list(e) l <- l[!sapply(l, is.null)] l[ignore] <- NULL nD <- length(del <- setdiff(names(x), (nl <- names(l)))) x[nl] <- l if(nD){ x[del] <- if(nD == 1){ NULL } else { vector("list", nD) } } x }) class(out) <- c("mitml.list", "list") return(out) } mitml/R/internal-model.R0000644000176200001440000001753714001605317014644 0ustar liggesusers# prepare model input by formula .model.byFormula <- function(data, formula, group, group.original, method = c("pan", "jomo", "jomo.matrix")){ # check model, separate equations formula <- .check.model(formula) isML <- attr(formula, "is.ML") isL2 <- attr(formula, "is.L2") if(isL2){ formula.L2 <- formula[[2]] formula <- formula[[1]] } method <- match.arg(method) # *** evaluate L1 model # ft <- terms(formula) tl <- attr(ft, "term.labels") vrs <- attr(ft, "variables")[-1] nms <- colnames(data) # responses yvrs <- as.character(vrs)[attr(ft, "response")] yvrs <- gsub("[\r\n]", "", yvrs) y.fml <- as.formula(paste0("~", yvrs)) yvrs <- attr(terms(y.fml), "term.labels") # check for untransformed yvrs err <- !(yvrs %in% nms) if(any(err)) stop("Could not find: ", paste0(yvrs[err], collapse = ", "), "). Target variables must be contained in the data set 'as is', and transformations must be applied beforehand.") # cluster id clt <- tl[grep("\\|", tl)] if(method == "pan" & !isML) stop("Cluster indicator not found in formula\n\n", .formula2char(formula), "\n\nPlease specify the cluster indicator and at least one random term using the '|' operator. 
Single-level imputation is supported by jomoImpute().") # extract and reorder if(isML){ clt <- strsplit( clt, split = "[[:blank:]]*\\|[[:blank:]]*" )[[1]] clname <- clt[2] # order data and grouping data <- data[ order(group, data[,clname]), ] group.original <- group.original[ order(group) ] group <- group[ order(group) ] }else{ clname <- NULL } # predictors: fixed pvrs <- c(if(attr(ft, "intercept")){"(Intercept)"}, tl[!grepl("\\|", tl)]) fe.fml <- c(if(attr(ft, "intercept")){"1"}else{"0"}, tl[!grepl("\\|", tl)]) fe.fml <- as.formula(paste0("~", paste0(fe.fml, collapse = "+"))) # predictors: random if(isML){ cl.fml <- as.formula(paste("~", clt[1])) cl.ft <- terms(cl.fml) qvrs <- c(if(attr(cl.ft, "intercept")){"(Intercept)"}, attr(cl.ft, "term.labels")) }else{ cl.fml <- ~0 qvrs <- NULL } # model matrix for fe and cl attr(data, "na.action") <- identity mmp <- suppressWarnings( model.matrix(fe.fml, data = data) ) mmq <- suppressWarnings( model.matrix(cl.fml, data = data) ) pnames <- colnames(mmp) qnames <- colnames(mmq) psave <- setdiff( c(pnames, qnames), c("(Intercept)", nms) ) switch( method , # panImpute (matrix input) pan={ y <- data.matrix(data[yvrs]) ycat <- NULL }, # jomoImpute, for higher-level functions (data frames, uses jomo for preprocessing) jomo={ y <- data[yvrs] ycat <- NULL }, # jomoImpute, for higher- and lower-level versions (preprocessed matrix input) jomo.matrix={ y <- data.matrix(data[yvrs]) cvrs <- sapply(data[, yvrs, drop = F], is.factor) ycat <- y[,cvrs, drop = F] y <- y[,!cvrs, drop = F] } ) clus <- if(isML) data[,clname] else NULL pred <- cbind(mmp, mmq[,!(qnames%in%pnames), drop = F]) xcol <- which(colnames(pred)%in%pnames) zcol <- which(colnames(pred)%in%qnames) # assign to parent.frame inp <- list( y = y, ycat = ycat, clus = clus, pred = pred, xcol = xcol, zcol = zcol, data = data, group = group, group.original = group.original, psave = psave, clname = clname, yvrs = yvrs, pvrs = pvrs, qvrs = qvrs, pnames = pnames, qnames = qnames ) for(i in names(inp)) assign(i, inp[[i]], pos = parent.frame()) # *** evaluate L2 model # if(isL2){ ft <- terms(formula.L2) tl <- attr(ft, "term.labels") vrs <- attr(ft, "variables")[-1] # responses yvrs <- as.character(vrs)[attr(ft, "response")] yvrs <- gsub("[\r\n]", "", yvrs) y.fml <- as.formula(paste0("~", yvrs)) yvrs <- attr(terms(y.fml), "term.labels") # check for untransformed yvrs err <- !(yvrs %in% nms) if(any(err)) stop("Could not find: ", paste0(yvrs[err], collapse = ", "), "). 
Target variables must be contained in the data set 'as is', and transformations must be applied beforehand.") # predictors: fixed only at L2 pvrs <- c(if(attr(ft, "intercept")){"(Intercept)"}, tl[!grepl("\\|", tl)]) fe.fml <- c(if(attr(ft, "intercept")){"1"}else{"0"}, tl[!grepl("\\|", tl)]) fe.fml <- as.formula(paste0("~", paste0(fe.fml, collapse = "+"))) # model matrix for FE only attr(data, "na.action") <- identity mmp <- suppressWarnings( model.matrix(fe.fml, data = data) ) pnames <- colnames(mmp) psave <- c( psave, setdiff( c(pnames), c("(Intercept)", nms) ) ) switch( method , jomo={ # jomoImpute, for higher-level functions (data input) y <- data[yvrs] ycat <- NULL }, jomo.matrix={ # jomoImpute, for lower-level versions (matrix input) y <- data.matrix(data[yvrs]) cvrs <- sapply(data[,yvrs, drop = F], is.factor) ycat <- y[,cvrs, drop = F] y <- y[,!cvrs, drop = F] } ) pred <- mmp xcol <- which(colnames(pred) %in% pnames) # assign to parent.frame inp <- list( y.L2 = y, ycat.L2 = ycat, pred.L2 = pred, xcol.L2 = xcol, yvrs.L2 = yvrs, pvrs.L2 = pvrs, pnames.L2 = pnames, psave = psave ) for(i in names(inp)) assign(i, inp[[i]], pos = parent.frame()) } invisible(NULL) } # convert formula to character .formula2char <- function(x){ chr <- as.character(x) paste(chr[c(3, 1, 2)]) } .check.model <- function(x){ # check model type and number of levels xnew <- x # ensure proper list format if(is.list(x) & length(x) > 2) stop("Cannot determine the number of levels. The 'formula' or 'type' argument must indicate either a single-level model, a model for responses at level 1, or two models for responses at level 1 and 2.") if(!is.list(x)) x <- list(x) # check cluster specification and model type clt <- lapply(x, function(z){ if(is.language(z)){ tl <- attr(terms(z), "term.labels") tl[grep("\\|", tl)] }else{ which(z == -2) } }) isML <- length(clt[[1]]) > 0 isL2 <- length(x) == 2 if(isL2 & !isML) stop("No cluster variable found. 
Imputation models for responses at level 1 and 2 require the specification of a cluster variable in the level-1 equation.") attr(xnew, "is.ML") <- isML attr(xnew, "is.L2") <- isL2 xnew } .check.variablesL2 <- function(x, clus){ # check for variables at L2 (constant at L1) apply(x, 2, function(a) all( abs(a-clusterMeans(a, clus)) < sqrt(.Machine$double.eps), na.rm = T)) } # convert type to formula .type2formula <- function(data, type){ # L2: separate model equations type <- .check.model(type) isML <- attr(type, "is.ML") isL2 <- attr(type, "is.L2") if(isL2){ type.L2 <- type[[2]] type <- type[[1]] } nms <- colnames(data) # grouping grp <- if(any(type == -1)) nms[type == -1] else NULL if(isL2 & is.null(grp)){ if(any(type.L2 == -1)) grp <- nms[type.L2 == -1] } # L1 model if(ncol(data) != length(type)) stop("Length of 'type' must be equal to the number of colums in 'data'.") if(sum(type == -2)>1) stop("Only one cluster indicator may be specified.") cls <- nms[type == -2] yvrs <- paste( nms[type == 1], collapse = "+" ) pvrs <- paste( c(1, nms[type%in%c(2, 3)]), collapse = "+" ) qvrs <- if(isML) paste( c(1, nms[type == 3]), collapse = "+" ) else NULL # build L1 formula cls.fml <- if(isML) paste("+ (", qvrs, "|", cls, ")") else NULL fml <- formula( paste(yvrs, "~", pvrs, cls.fml) ) # L2 model if(isL2){ if(ncol(data) != length(type.L2)) stop("Length of 'type' must be equal to the number of colums in 'data'.") yvrs <- paste( nms[type.L2 == 1], collapse = "+" ) pvrs <- paste( c(1, nms[type.L2%in%c(2, 3)]), collapse = "+" ) # build formula (make list) fml <- list( fml, formula( paste(yvrs, "~", pvrs) ) ) } attr(fml, "group") <- grp attr(fml, "is.ML") <- isML attr(fml, "is.L2") <- isL2 return(fml) } mitml/R/confint.mitml.testEstimates.R0000644000176200001440000000114014001603230017317 0ustar liggesusersconfint.mitml.testEstimates <- function(object, parm, level = 0.95, ...){ # calculate confidence intervals from pooled estimates est <- object$estimates pnames <- rownames(est) if(missing(parm)) parm <- pnames if(is.numeric(parm)) parm <- pnames[parm] cf <- est[parm, 1] se <- est[parm, 2] df <- est[parm, 4] a <- (1-level)/2 fac <- qt(1-a, est[parm, "df"]) pct <- paste(format(100*c(a, 1-a), trim = TRUE, scientific = FALSE, digits = 3), "%") ci <- matrix(NA_real_, length(parm), 2, dimnames = list(parm, pct)) ci[,1] <- cf - se*fac ci[,2] <- cf + se*fac return(ci) } mitml/R/mitmlComplete.R0000644000176200001440000000345214002016013014522 0ustar liggesusersmitmlComplete <- function(x, print = "all", force.list = FALSE){ if(sum(print <= 0) > 1) stop("Only one negative or zero value is allowed in 'print'.") dat <- x$data srt <- order( attr(x$data, "sort") ) labs <- attr(x$data, "labels") method <- class(x)[2] m <- x$iter$m ind <- x$index.mat rpm <- x$replacement.mat if(is.numeric(print)){ if(length(print) == 1){ if(print > 0){ com <- .completeOne(dat, print, ind, rpm, method) out <- com[srt,] }else{ out <- .stripDataAttributes(dat[srt,]) } if(force.list) out <- list(out) }else{ out <- list() for(ii in print){ if(ii > 0){ com <- .completeOne(dat, ii, ind, rpm, method) out <- c(out, list(com[srt,])) }else{ out <- c(out, list(.stripDataAttributes(dat[srt,]))) } } } }else{ if(!print %in% c("list", "all")) stop("Invalid 'print' argument.") out <- list() for(ii in 1:m){ com <- .completeOne(dat, ii, ind, rpm, method) out <- c(out, list(com[srt,])) } } if(is.list(out) && !is.data.frame(out)) class(out) <- c("mitml.list", "list") return(out) } .completeOne <- function(x, i, ind, rpm, method){ if(method == "jomo"){ fac <- 
which(colnames(x) %in% names(attr(x, "labels"))) nofac <- !(ind[,2] %in% fac) if(any(nofac)) x[ ind[nofac, ,drop = F] ] <- rpm[nofac, i, drop = F] for(ff in fac){ fi <- which(ind[,2] == ff) lev <- attr(x, "labels")[[colnames(x)[ff]]] if(length(fi)>0) x[ ind[fi, ,drop = F] ] <- lev[rpm[fi, i]] } }else{ x[ind] <- rpm[,i] } .stripDataAttributes(x) } .stripDataAttributes <- function(x){ attr(x, "sort") <- NULL attr(x, "group") <- NULL attr(x, "levels") <- NULL attr(x, "labels") <- NULL x } mitml/R/write.mitmlSAV.R0000644000176200001440000000116114002022335014533 0ustar liggesuserswrite.mitmlSAV <- function(x, filename){ # write to native SPSS format if(!inherits(x, "mitml") && !inherits(x, "mitml.list")) stop("'x' must be of class 'mitml' or 'mitml.list'.") if(!grepl(".sav$", tolower(filename))) filename <- paste(filename, ".sav", sep = "") # convert mitml to mitml.list if(inherits(x, "mitml")){ x <- mitmlComplete(x, "all", force.list = TRUE) } # add imputation indicator for(ii in 1:length(x)){ x[[ii]] <- cbind(ii-1, x[[ii]]) colnames(x[[ii]])[1] <- "Imputation_" } # write to file out <- do.call(rbind, x) haven::write_sav(out, filename) invisible() } mitml/R/print.mitml.testEstimates.R0000644000176200001440000000266014003773573017047 0ustar liggesusersprint.mitml.testEstimates <- function(x, digits = 3, sci.limit = 5, ...){ # print method for MI estimates cll <- x$call est <- x$estimates ep <- x$extra.pars m <- x$m adj.df <- x$adj.df df.com <- x$df.com # print header cat("\nCall:\n", paste(deparse(cll)), sep = "\n") cat("\nFinal parameter estimates and inferences obtained from", m, "imputed data sets.\n") cat("\n") # print results if(!is.null(est)){ # format numeric results pl <- attr(est, "par.labels") out <- .formatTable(est, digits = digits, sci.limit = sci.limit, labels = pl) for(i in seq_len(nrow(out))) cat(out[i,], "\n") } # print other results if(!is.null(ep)){ if(!is.null(est)) cat("\n") # format numeric results pl <- attr(ep, "par.labels") out <- .formatTable(ep, digits = digits, sci.limit = sci.limit, labels = pl) for(i in seq_len(nrow(out))) cat(out[i,], "\n") } cat("\n") # print footer if(adj.df){ cat(c("Hypothesis test adjusted for small samples with", paste0("df=[", paste(df.com, collapse = ","), "]\ncomplete-data degrees of freedom."))) }else{ cat("Unadjusted hypothesis test as appropriate in larger samples.") } cat("\n\n") invisible() } summary.mitml.testEstimates <- function(object, ...){ # summary method for objects of class mitml.testEstimates print.mitml.testEstimates(object, ...) } mitml/R/write.mitml.R0000644000176200001440000000032314001606247014171 0ustar liggesuserswrite.mitml <- function(x, filename, drop = FALSE){ # write mitml class object to file if(drop){ x <- x[!names(x) %in% c("par.burnin", "par.imputation")] } save(x, file = filename) invisible() } mitml/R/sort.mitml.list.R0000644000176200001440000000057414001605443015005 0ustar liggesuserssort.mitml.list <- function(x, decreasing = FALSE, by, ...){ # sort list of multiply imputed data sets expr <- substitute(by) args0 <- list(decreasing = decreasing, ...) 
res <- lapply(x, function(i){ args <- eval(expr, i, parent.frame()) if(!is.list(args)) args <- list(args) ind <- do.call("order", c(args, args0)) i[ind,] }) as.mitml.list(res) } mitml/R/write.mitmlMplus.R0000644000176200001440000000403114002017436015207 0ustar liggesuserswrite.mitmlMplus <- function(x, filename, suffix = "list", sep = "\t", dec = ".", na.value=-999){ # write text files that can be read into Mplus if(!inherits(x, "mitml") && !inherits(x, "mitml.list")) stop("'x' must be of class 'mitml' or 'mitml.list'.") if(inherits(x, "mitml")){ x <- mitmlComplete(x, "all", force.list = TRUE) } m <- length(x) if(!is.list(x)) x <- list(x) dnames <- paste(filename, 1:m, ".dat", sep = "") lname <- paste(filename, suffix, ".dat", sep = "") write.table(dnames, file = lname, col.names = FALSE, row.names = FALSE, quote = FALSE) for(ii in 1:m){ out <- x[[ii]] # convert factors notnum <- which(sapply(out, function(z) !is.numeric(z))) conv <- as.list(notnum) for(nn in notnum){ out[,nn] <- as.factor(out[,nn]) conv[[colnames(out)[nn]]] <- matrix(c(levels(out[,nn]), 1:nlevels(out[,nn])), ncol = 2) out[,nn] <- as.numeric(out[,nn]) } # write out[is.na(out)] <- na.value write.table(out, file = dnames[ii], sep = sep, dec = dec, col.names = F, row.names = F, quote = FALSE) } # log file cname <- paste(filename, ".log", sep = "") cat(file = cname, "The data set featured the following variables:") cat(file = cname, "\n\n", paste(colnames(out), collapse = " "), sep = "", append = T) if(length(conv)>0){ cat(file = cname, "\n\n", "Factors were converted to numeric values as follows:\n ", sep = "", append = T) for(cc in 1:length(conv)){ cat(file = cname, "\n", names(conv[cc]), ":\n", sep = "", append = T) write.table(conv[[cc]], file = cname, row.names = F, col.names = F, sep = " = ", quote = F, append = T) } } # input file iname <- paste(filename, ".inp", sep = "") cat(file = iname, sep = "", "TITLE:\n This Mplus input file for multiply imputed data sets was generated by mitml in R.\n", "DATA:\n file = ", lname, ";\n", " type = imputation;\n", "VARIABLE:\n names = ", paste(colnames(out), collapse = " "), ";\n", " missing = all (", na.value, ");" ) invisible() } mitml/R/anova.mitml.result.R0000644000176200001440000001147514002325715015471 0ustar liggesusersanova.mitml.result <- function(object, ..., method = c("D3", "D4", "D2"), ariv = c("default", "positive", "robust"), data = NULL){ # create list of models mod.list <- c(list(object), list(...)) # *** # check input # # check lists m <- length(object) if(length(mod.list) == 1) stop("Comparison requires at least two lists of fitted statistical models.") if(any(!sapply(mod.list, is.list))) stop("The 'object' and '...' arguments must be lists of fitted statistical models.") if(any(sapply(mod.list[-1], length) != m)) stop("The 'object' and '...' arguments must be lists with the same length.") # check method method.choices <- c("D3", "D4", "D2") method <- original.method <- match.arg(method, method.choices) # check model classes cls.list <- lapply(mod.list, function(x) class(x[[1]])) if(any(sapply(cls.list[-1], "[", 1) != cls.list[[1]][1])) warning("The 'object' and '...' arguments appear to include objects of different classes. 
Results may not be trustworthy.") .checkNamespace(unique(unlist(cls.list))) # check for REML and refit (if needed) reml.list <- lapply(mod.list, function(x) sapply(x, .checkREML)) reml <- any(unlist(reml.list)) if(reml){ for(ii in seq_along(mod.list)){ mod.list[[ii]][reml.list[[ii]]] <- lapply(mod.list[[ii]][reml.list[[ii]]], .updateML) } } # *** # check method and possible fallback methods # # find user-defined method and possible fallback options try.method <- method.choices[seq.int(which(method.choices == original.method), length(method.choices))] error.msg <- character() # try logLik evaluation methods until working method is found for(mm in seq_along(try.method)){ if(try.method[mm] == "D3") try.fun <- .evaluateUserLogLik if(try.method[mm] == "D4") try.fun <- .evaluateStackedLogLik if(try.method[mm] == "D2") try.fun <- .evaluateLogLik # check if method can be applied to specified objects res <- lapply(mod.list, function(x, fun){ tryCatch(expr = suppressMessages(suppressWarnings(fun(x[1]))), error = function(e) e ) }, fun = try.fun) # if applicable, proceed; otherwise, save error message and try next method (if any) notApplicable <- sapply(res, inherits, what = "error") if(any(notApplicable)){ # save error message err <- as.character(res[[which(notApplicable)[1]]]) error.msg[try.method[mm]] <- sub("^Error in .*: ", "", err) # try next method (if any) if(mm < length(try.method)){ next() }else{ stop("The '", original.method, "' method is not supported for the specified models, and no valid alternative was found. Problems were due to:\n", paste(error.msg, collapse = "")) } }else{ # set method, print warning if needed method <- try.method[mm] if(method != original.method) warning("The '", original.method, "' method is not supported for the specified models. Switching to '", method, "'.") break() } } # *** # find order of models # # try to determine (numerator) degrees of freedom for each model df.list <- lapply(lapply(mod.list, "[[", 1), .getDFs) # check if models can be ordered reorderModels <- FALSE if(all(!sapply(df.list, is.null))){ df.method <- sapply(df.list, attr, which = "type") # check if extraction method was consistent across models if(all(df.method[-1] == df.method[1])){ reorderModels <- TRUE } } # re-order models (if possible) if(reorderModels){ mod.list <- mod.list[order(unlist(df.list), decreasing = TRUE)] }else{ warning("Could not determine the order of models in 'object' and '...'. The order is therefore assumed to be as specified (with decreasing complexity). 
Please check whether this was intended, and see '?testModels' for specific comparisons between models.") } # *** # perform model comparisons # # model comparisons nmod <- length(mod.list) out.list <- vector("list", nmod-1) for(ii in seq_len(nmod-1)){ # make call cll <- call("testModels", model = quote(mod.list[[ii]]), null.model = quote(mod.list[[ii+1]])) cll[["method"]] <- method if(method == "D2") cll[["use"]] <- "likelihood" if(method == "D4"){ if(!is.null(data)) cll[["data"]] <- data cll[["ariv"]] <- ariv } # evaluate call out.list[[ii]] <- eval(cll) } # try to get model formulas fml <- character(nmod) for(ii in seq_len(nmod)){ f <- .getFormula(mod.list[[ii]][[1]]) fml[ii] <- f } out <- list( call = match.call(), test = out.list, m = m, method = method, use = "likelihood", ariv = ariv, data = !is.null(data), formula = fml, order.method = ifelse(reorderModels, df.method[1], NULL), reml = reml ) class(out) <- "mitml.anova" return(out) } mitml/R/testEstimates.R0000644000176200001440000000774514127022466014577 0ustar liggesuserstestEstimates <- function(model, qhat, uhat, extra.pars = FALSE, df.com = NULL, ...){ # combine scalar estimates from the analysis of multiply imputed data # *** # check input # # handle deprecated arguments dots <- list(...) extra.pars <- .checkDeprecated(extra.pars, arg.list = dots, name = "var.comp") # check model specification if(missing(model) == (missing(qhat) || missing(uhat))){ stop("Either 'model' or both 'qhat' and 'uhat' must be supplied.") } # check misc. parameters if(!extra.pars) ep.out <- NULL # *** # process matrix, array or list arguments # if(!missing(qhat)){ # check input if(missing(uhat)) stop("Either 'model' or both 'qhat' and 'uhat' must be supplied.") if(!is.matrix(qhat) && is.array(qhat)) stop("The 'qhat' argument must be either a matrix or a list.") # convert point estimates if(is.matrix(qhat)){ qhat <- lapply(seq_len(ncol(qhat)), function(i, Q) Q[,i], Q = qhat) } # convert variance estimates if(is.matrix(uhat)){ uhat <- lapply(seq_len(ncol(uhat)), function(i, U) U[,i], U = uhat) }else if(is.array(uhat)){ uhat <- lapply(seq_len(dim(uhat)[3]), function(i, U) diag(as.matrix(U[,,i])), U = uhat) } # ensure proper format m <- length(qhat) if(m != length(uhat)) stop("Dimensions of 'qhat' and 'uhat' do not match.") Qhat <- matrix(unlist(qhat), ncol = m) Uhat <- matrix(unlist(uhat), ncol = m) if(any(!is.finite(Uhat))) stop("Missing values in 'uhat' are not allowed.") nms <- names(qhat[[1]]) if(is.null(nms)) nms <- paste0("Parameter.", 1:nrow(Qhat)) cls <- NULL ep.out <- NULL if(extra.pars) warning("The 'extra.pars' argument is ignored when 'qhat' and 'uhat' are used.") } # *** # process fitted models # if(!missing(model)){ if(!is.list(model)) stop("The 'model' argument must be a list of fitted statistical models.") m <- length(model) # get class (check required packages) cls <- class(model[[1]]) .checkNamespace(cls) # extract parameter estimates est <- .extractParameters(model, diagonal = TRUE, include.extra.pars = TRUE) Qhat <- est$Qhat Uhat <- est$Uhat nms <- est$nms if(extra.pars){ ep.est <- .extractMiscParameters(model) ep.Qhat <- ep.est$Qhat ep.nms <- ep.est$nms }else{ ep.Qhat <- ep.nms <- NULL } } # *** # pool results # Qbar <- apply(Qhat, 1, mean) Ubar <- apply(Uhat, 1, mean) B <- apply(Qhat, 1, var) T <- Ubar + (1+m^(-1)) * B se <- sqrt(T) t <- Qbar/se r <- (1+m^(-1))*B/Ubar # compute degrees of freedom v <- vm <- (m-1)*(1+r^(-1))^2 if(!is.null(df.com)){ lam <- r/(r+1) vobs <- (1-lam)*((df.com+1)/(df.com+3))*df.com v <- 
(vm^(-1)+vobs^(-1))^(-1) } fmi <- (r+2/(v+3))/(r+1) # create output for parameter estimates pval <- 2 * (1 - pt(abs(t), df = v)) # two-tailed p-value, SiG 2017-02-09 out <- matrix(c(Qbar, se, t, v, pval, r, fmi), ncol = 7) colnames(out) <- c("Estimate", "Std.Error", "t.value", "df", "P(>|t|)", "RIV", "FMI") # two-tailed p-value, SiG 2017-02-09 rownames(out) <- nms # preserve parameter labels (if any) attr(out, "par.labels") <- attr(nms, "par.labels") # create output for other parameter estimates if(extra.pars && !missing(model)){ if(is.null(ep.Qhat)){ ep.out <- NULL warning("Computation of variance components not supported for objects of class '", paste(cls, collapse = "|"), "' (see ?with.mitml.list for manual calculation).") }else{ ep.Qbar <- apply(ep.Qhat, 1, mean) ep.out <- matrix(ep.Qbar, ncol = 1) colnames(ep.out) <- "Estimate" rownames(ep.out) <- ep.nms # parameter labales attr(ep.out, "par.labels") <- attr(ep.nms, "par.labels") } } out <- list( call = match.call(), estimates = out, extra.pars = ep.out, m = m, adj.df = !is.null(df.com), df.com = df.com, cls.method = cls ) class(out) <- "mitml.testEstimates" return(out) } mitml/R/mids2mitml.list.R0000644000176200001440000000054114127016073014754 0ustar liggesusersmids2mitml.list <- function(x){ # convert mids to mitml.list if(!requireNamespace("mice", quietly = TRUE)) stop("The 'mice' package must be installed to use this function.") m <- x$m out <- list() length(out) <- m for(ii in 1:m){ out[[ii]] <- mice::complete(x, action = ii) } class(out) <- c("mitml.list", "list") return(out) } mitml/R/long2mitml.list.R0000644000176200001440000000071514001606147014760 0ustar liggesuserslong2mitml.list <- function(x, split, exclude = NULL){ # convert data set in "long" format to mitml.list i1 <- which(colnames(x) == split) f <- x[,i1] if(!is.null(exclude)){ i2 <- if(length(exclude) == 1) f != exclude else !f %in% exclude x <- x[i2, , drop = F] f <- f[i2] if(is.factor(f)) f <- droplevels(f) } out <- split(x[, -i1, drop = F], f = f) names(out) <- NULL class(out) <- c("mitml.list", "list") return(out) } mitml/R/is.mitml.list.R0000644000176200001440000000044214002014375014422 0ustar liggesusersis.mitml.list <- function(x){ # checks if the argument is a list of class "mitml.list" l <- inherits(x, "mitml.list") & is.list(x) if(!l){ return(FALSE) }else{ if(any(!sapply(x, is.data.frame))) warning("Does not appear to be a list of data frames.") return(TRUE) } } mitml/R/c.mitml.list.R0000644000176200001440000000023614001605561014234 0ustar liggesusersc.mitml.list <- function(...){ # merges two objects of class "mitml.list" by appending list entries as.mitml.list(unlist(list(...), recursive = FALSE)) } mitml/R/print.mitml.testModels.R0000644000176200001440000000362414053231674016331 0ustar liggesusersprint.mitml.testModels <- function(x, digits = 3, sci.limit = 5, ...){ # print method for model comparisons cll <- x$call test <- x$test method <- x$method use <- x$use reml <- x$reml refit <- x$refit m <- x$m data <- x$data ariv <- x$ariv adj.df <- x$adj.df df.com <- x$df.com # print header cat("\nCall:\n", paste(deparse(cll)), sep = "\n") cat("\nModel comparison calculated from", m, "imputed data sets.") # print method cat("\nCombination method:", method) if(method == "D2") cat(" (", use, ")", sep = "") if(method == "D4" && ariv == "robust") cat(" (robust)", sep = "") cat("\n\n") # print test results test.digits <- c(digits, 0, rep(digits, ncol(test)-2)) out <- .formatTable(test, digits = test.digits, sci.limit = sci.limit) for(i in seq_len(nrow(out))) cat(" 
", out[i,], "\n") cat("\n") # print footer (if any) footer <- FALSE if(method == "D1"){ footer <- TRUE if(adj.df){ cat("Hypothesis test adjusted for small samples with ", paste0("df=[", paste(df.com, collapse = ","), "]\ncomplete-data degrees of freedom."), "\n", sep = "") }else{ cat("Unadjusted hypothesis test as appropriate in larger samples.\n") } } if(method == "D4"){ footer <- TRUE if(data){ cat("Data for stacking were extracted from the `data` argument.\n") }else{ cat("Data for stacking were automatically extracted from the fitted models.\n") } } if(reml){ footer <- TRUE if(refit){ cat("Models originally fitted with REML were refitted using ML.\n") }else{ cat("Models fitted with REML were used as is.\n") } } if(footer) cat("\n") invisible() } summary.mitml.testModels <- function(object, ...){ # summary method for objects of class mitml.testModels print.mitml.testModels(object, ...) } mitml/R/as.mitml.list.R0000644000176200001440000000053114001605555014416 0ustar liggesusersas.mitml.list <- function(x){ # adds a class attribute "mitml.list" to its argument if(!is.list(x)) stop("Argument must be a 'list'.") if(any(!sapply(x, is.data.frame))){ x <- lapply(x, as.data.frame) cat("Note: List entries were converted to class 'data.frame'.\n") } class(x) <- c("mitml.list", class(x)) return(x) } mitml/R/internal-pool.R0000644000176200001440000000264514001604710014503 0ustar liggesusers.D1 <- function(Qhat, Uhat, df.com){ # pooling for multidimensional estimands (D1, Li et al., 1991; Reiter, 2007) k <- dim(Qhat)[1] m <- dim(Qhat)[2] # D1 Qbar <- apply(Qhat, 1, mean) Ubar <- apply(Uhat, c(1, 2), mean) B <- cov(t(Qhat)) r <- (1+m^(-1))*sum(diag(B%*%solve(Ubar)))/k Ttilde <- (1 + r)*Ubar val <- t(Qbar) %*% solve(Ttilde) %*% Qbar / k # compute degrees of freedom (df2) t <- k*(m-1) if(!is.null(df.com)){ # small-sample degrees of freedom (Reiter, 2007; Eq. 
1-2) a <- r*t/(t-2) vstar <- ( (df.com+1) / (df.com+3) ) * df.com c0 <- 1 / (t-4) c1 <- vstar - 2 * (1+a) c2 <- vstar - 4 * (1+a) z <- 1 / c2 + c0 * (a^2 * c1 / ((1+a)^2 * c2)) + c0 * (8*a^2 * c1 / ((1+a) * c2^2) + 4*a^2 / ((1+a) * c2)) + c0 * (4*a^2 / (c2 * c1) + 16*a^2 * c1 / c2^3) + c0 * (8*a^2 / c2^2) v <- 4 + 1/z }else{ if (t > 4){ v <- 4 + (t-4) * (1 + (1 - 2*t^(-1)) * (r^(-1)))^2 }else{ v <- t * (1 + k^(-1)) * ((1 + r^(-1))^2) / 2 } } return(list(F = val, k = k, v = v, r = r)) } .D2 <- function(d, k){ # pooling for multidimensional estimands (D2, Li, Meng et al., 1991) m <- length(d) # D2 dbar <- mean(d) r <- (1+m^(-1)) * var(sqrt(d)) val <- (dbar/k - (m+1)/(m-1) * r) / (1+r) # compute degrees of freedom (df2) v <- k^(-3/m) * (m-1) * (1+r^(-1))^2 return(list(F = val, k = k, v = v, r = r)) } mitml/R/testConstraints.R0000644000176200001440000001075314001604313015125 0ustar liggesuserstestConstraints <- function(model, qhat, uhat, constraints, method = c("D1", "D2"), ariv = c("default", "positive"), df.com = NULL){ # test constraints with multiply imputed data # *** # check input # if(missing(model) == (missing(qhat) || missing(uhat))){ stop("Either 'model' or both 'qhat' and 'uhat' must be supplied.") } # match methods method <- match.arg(method) ariv <- match.arg(ariv) # warnings for ignored arguments if(!is.null(df.com) && method == "D2") warning("Complete-data degrees of freedom are not available for use with 'D2', and thus were ignored.") if(ariv == "positive" && method == "D1") warning("The 'positive' option is not available with method 'D1' and was ignored.") # clean names in constraints constraints <- gsub("\\(Intercept\\)", "Intercept", constraints) k <- length(constraints) # *** # process matrix, array or list arguments # if(!missing(qhat)){ # check input if(missing(uhat)) stop("Either 'model' or both 'qhat' and 'uhat' must be supplied.") if(!is.matrix(qhat) && is.array(qhat)) stop("The 'qhat' argument must be either a matrix or a list.") if(is.matrix(uhat)) stop("The 'uhat' argument must be either an array or a list.") # convert point estimates if(is.matrix(qhat)){ qhat <- lapply(seq_len(ncol(qhat)), function(i, Q) Q[,i], Q = qhat) } # convert variance estimates if(is.array(uhat)){ uhat <- lapply(seq_len(dim(uhat)[3]), function(i, U) as.matrix(U[,,i]), U = uhat) } # ensure proper format m <- length(qhat) p <- length(qhat[[1]]) if(m != length(uhat) || !is.matrix(uhat[[1]]) || p != ncol(uhat[[1]]) || p != nrow(uhat[[1]])) stop("Dimensions of 'qhat' and 'uhat' do not match.") Qhat <- matrix(unlist(qhat), ncol = m) Uhat <- array(unlist(uhat), dim = c(p, p, m)) if(any(!is.finite(Uhat))) stop("Missing values in 'uhat' are not allowed.") nms <- names(qhat[[1]]) } # *** # process fitted models # if(!missing(model)){ if(!is.list(model)) stop("The 'model' argument must be a list of fitted statistical models.") # get class (check required packages) cls <- class(model[[1]]) .checkNamespace(cls) # extract parameter estimates est <- .extractParameters(model) Qhat <- est$Qhat Uhat <- est$Uhat nms <- est$nms m <- length(model) p <- nrow(Qhat) } # *** # delta method # # prepare parameter names if(is.null(nms)) stop("Could not determine parameter names.") nms <- gsub("\\(Intercept\\)", "Intercept", nms) rownames(Qhat) <- nms dimnames(Uhat) <- list(nms, nms, NULL) newQhat <- array(NA, dim = c(k, m)) newUhat <- array(NA, dim = c(k, k, m)) for(ii in 1:m){ theta <- Qhat[,ii] Sigma <- as.matrix(Uhat[,,ii]) g <- parse(text = constraints) env.g <- new.env() for(pp in 1:p) assign(names(theta)[pp], 
theta[pp], pos = env.g) # new parameter estimates newtheta <- numeric(k) for(kk in seq_len(k)) newtheta[kk] <- eval(g[kk], envir = env.g) # derivative, new covariance matrix gdash.theta <- matrix(NA, k, p) for(kk in seq_len(k)){ tmp <- numericDeriv(g[[kk]], names(theta), env.g) gdash.theta[kk,] <- attr(tmp, "gradient") } newSigma <- gdash.theta %*% Sigma %*% t(gdash.theta) newQhat[,ii] <- newtheta newUhat[,,ii] <- newSigma } # *** # pool results # # calculate pooled estimates and covariance matrix (for output) Qbar <- apply(newQhat, 1, mean) Ubar <- apply(newUhat, c(1, 2), mean) B <- cov(t(newQhat)) r <- (1+m^(-1)) * sum(diag(B%*%solve(Ubar))) / k Ttilde <- (1+r) * Ubar # D1 (Li et al., 1991) if(method == "D1"){ D <- .D1(Qhat = newQhat, Uhat = newUhat, df.com = df.com) r <- D$r val <- D$F v <- D$v } # D2 (Li, Meng et al., 1991) if(method == "D2"){ dW <- numeric(m) for(ii in seq_len(m)) dW[ii] <- t(newQhat[,ii]) %*% solve(newUhat[,,ii]) %*% newQhat[,ii] D <- .D2(d = dW, k = k) r <- D$r if(ariv == "positive") r <- max(0, r) val <- D$F v <- D$v } # create output pval <- pf(val, k, v, lower.tail = FALSE) out <- matrix(c(val, k, v, pval, r), ncol = 5) colnames(out) <- c("F.value", "df1", "df2", "P(>F)", "RIV") # new label for p-value, SiG 2017-02-09 out <- list( call = match.call(), constraints = constraints, test = out, Qbar = Qbar, T = Ttilde, m = m, adj.df = !is.null(df.com), df.com = df.com, method = method ) class(out) <- "mitml.testConstraints" return(out) } mitml/R/read.mitml.R0000644000176200001440000000026614001604225013752 0ustar liggesusersread.mitml <- function(filename){ # read mitml objects from file env <- new.env(parent = parent.frame()) load(filename, env) obj <- ls(env) eval(parse(text = obj), env) } mitml/MD50000644000176200001440000001226514127042517011715 0ustar liggesusers3d3dd1ad567318cb3335c56fd26fcf33 *DESCRIPTION 048f52e6b8c4fe31207b73afd28d1b0d *NAMESPACE f150926e165d3e4596c4fafb64a126c3 *NEWS 9e2cb0672778a8b4e5baedf296e6f910 *R/amelia2mitml.list.R 026ab33af240eacd5325a96d43eda781 *R/anova.mitml.result.R 1057ec32cca180fa7d595eebe23e10de *R/as.mitml.list.R 48c5b4e1324382a1019ebceae5652065 *R/c.mitml.list.R 1fd07b1130bb634b2f273e4c66d9dc64 *R/cbind.mitml.list.R 6583b64c3ef3e36b5e8f5f41d95e756f *R/clusterMeans.R 42a496bae1e4d2e06c47b9478c7393fa *R/confint.mitml.testEstimates.R cb5850d9fd2826b17484e7d206e349b7 *R/internal-convergence.R 8f089070708103cad479aa7e463f2202 *R/internal-methods-estimates.R 9de81dbea5bc5c1d7ec88b97ffbc91f4 *R/internal-methods-likelihood.R 876a5d47a11f27f79d22fa124729b9ce *R/internal-methods-zzz.R 20e44bb67ffc173e74a9a9a82213e19b *R/internal-model.R 983ba9e8c5e16ada222c0f2703ef2419 *R/internal-pool.R 251f026771866292edd92de976357f57 *R/internal-zzz.R 1cc8184caece518cab74cd3b6bcec3a1 *R/is.mitml.list.R a47541f32c75c551ca3d8a302d553404 *R/jomo2mitml.list.R a2bd91aeaa472016edea7b7e7c3cb922 *R/jomoImpute.R 312dadbcc83d387450ced01882137c93 *R/long2mitml.list.R 2a6cad5f8755e8b7bd95fa669bd08d16 *R/mids2mitml.list.R 582e4cead9e097f4c627c57af5ab96ec *R/mitml.list2mids.R 0d9490747e60e275f00373ed2710c009 *R/mitmlComplete.R cc8d948b4e8973c0c21536cb49e5a4dc *R/multilevelR2.R 4e3fa946bd7faf6290c8c80644a311c9 *R/panImpute.R bbaded410f97f515bc2d9133b9f5517f *R/plot.mitml.R 64daf861f5f3be37ea89d752ddee4c89 *R/print.mitml.R 9b546f96a6561cf19077ec242cfdde97 *R/print.mitml.anova.R c4984c79b2e63160e3bb8a6ef222c0e0 *R/print.mitml.summary.R 38be19315e70661fba6fc966b4077ad2 *R/print.mitml.testConstraints.R f66cb1041ef51e2a520a4352b4d7e578 *R/print.mitml.testEstimates.R 
b232518de70bbaf3b21d492559eb605b *R/print.mitml.testModels.R 94c739e259a86af9b9e9ded3ffe96ccf *R/rbind.mitml.list.R 9e3ca5c9065a56c31896d4aac0c81bf3 *R/read.mitml.R 983555ebc47d9a68c11bdcbd9cad18db *R/sort.mitml.list.R 6376f391f20473a5be360976b7e2f098 *R/subset.mitml.list.R ef4f239aac0d708be5a5ae7e600ec6ff *R/summary.mitml.R 2410651b900ee8d65ac1fe4d6875846e *R/testConstraints.R 4a389ee875d04d6757c340a84f3f1245 *R/testEstimates.R 119ddf28787d132d5632162951732413 *R/testModels.R a868b90aa1c7e8e15f8507d124fde16e *R/with.mitml.list.R c57812eefeea73f36f8888769f708729 *R/within.mitml.list.R 0d51e817cdff9db18330650a80506123 *R/write.mitml.R 342094c7ffe2459319b1eecb4d73fd0c *R/write.mitmlMplus.R 342dcaf7a7d7dfafac78a73ac3c2094d *R/write.mitmlSAV.R a68e3e1f3f7935fc2a19a3dd86044dfe *R/write.mitmlSPSS.R aa7795f230bdf7460dadf7ffb248f2cf *R/zzz.R 1e859c273f3e2c8873a285ee19fb60a2 *README.md 97c9ddbd650597837469d0de5b5424ab *build/vignette.rds 7333385f16ef213729fcaae2bb48a5f2 *data/justice.rda 85f879a82e8215f2709c0d1e917b1a0d *data/leadership.rda 1856395651bfbea14af44ce56f1fa37e *data/studentratings.rda 4be38b5ab626fe9ffb233732ddb3c024 *inst/doc/Analysis.R b029945283a0dfa02189fc94e4282d96 *inst/doc/Analysis.Rmd 54f1ec42b25e13fbdf6e6703a81e39de *inst/doc/Analysis.html c96d94ae085ccb50190538802602dc71 *inst/doc/Introduction.R 3b805eada4bbaaf1fe5b011448a9a60d *inst/doc/Introduction.Rmd 8fac2f840452b00a4e80d6c2f0284c72 *inst/doc/Introduction.html f88914d5ec260593ea94284183d2467d *inst/doc/Level2.R 14b067e738e92adba38ce79703d7df92 *inst/doc/Level2.Rmd dc98f093e2c7e66448e9164a357cdbab *inst/doc/Level2.html fcc362a706fce653216dc2d8624aa6ce *man/amelia2mitml.list.Rd 833da5c9e2c0534920641034a38ba1c8 *man/anova.mitml.result.Rd b9c8438ef04243af5a1458c2613fbe96 *man/as.mitml.list.Rd 78ddac00fa9ab957c7994b8d20eb9bbd *man/c.mitml.list.Rd 889c33a18ca345b1e9fff408a6804edb *man/clusterMeans.Rd edbd24a05856fafca18d9cfaff67daca *man/confint.mitml.testEstimates.Rd d112a91bd39549ef77d9d07afe2f5ada *man/is.mitml.list.Rd 0d46acec02d6f35fd570373694bdd935 *man/jomoImpute.Rd b652663d1e5a7c3153163328e31ce8b2 *man/justice.Rd 20aea70548283d572b2570f14e1502ea *man/leadership.Rd 4a4ee1f2adf322bdb6c6516b881ce26a *man/long2mitml.list.Rd fc46e4c7ebc8bc5fa906dcee313d4f10 *man/mids2mitml.list.Rd 95c2e0897367f8d4aa174b2c52454a35 *man/mitml-package.Rd 700194843aa3c90aea72af12183e04d7 *man/mitml.list2mids.Rd 6c4258ea123bb039ed081ac8de2a52f8 *man/mitmlComplete.Rd 61237d8ecc67dc073e7498a583bf8782 *man/multilevelR2.Rd b9e18347e91ecabd0f8eaa878fb1c0f3 *man/panImpute.Rd e8b2fe34183838d79526ba7a81f50adc *man/plot.mitml.Rd a6be5aee47c10e0f12dd22bf20731729 *man/read.mitml.Rd a205de98c70c123ae6fbdd17b7bf7b51 *man/sort.mitml.list.Rd 7d70ff16df91ab57b5ac0ce4a047f14c *man/studentratings.Rd 513c2d805beec572b2e16b9228245f76 *man/subset.mitml.list.Rd d7e6f07f4cdaeafebf675a2df332058e *man/summary.mitml.Rd a997c65c9fd5388f783c0125210cd5e8 *man/testConstraints.Rd a8527cc833bdc1f8176f2e3d6e72ef4f *man/testEstimates.Rd eaec5e993d2417ca8a62dcb2f8a12d29 *man/testModels.Rd c0aa4f42e5c97070f1c498e528e523fa *man/with.mitml.list.Rd ab738a241c98727cfd75a68a8b98a1b4 *man/write.mitml.Rd 8db0c82ec77ce5b4e5b5757260fddcb5 *man/write.mitmlMplus.Rd db864a33f735879330b68e086b9e26f5 *man/write.mitmlSAV.Rd 8503ca18b3c0fd1acc6054f434d648cb *man/write.mitmlSPSS.Rd b029945283a0dfa02189fc94e4282d96 *vignettes/Analysis.Rmd 3b805eada4bbaaf1fe5b011448a9a60d *vignettes/Introduction.Rmd 14b067e738e92adba38ce79703d7df92 *vignettes/Level2.Rmd dea4ee80cd16d43382e0c184efa8b251 
*vignettes/css/vignette.css mitml/inst/0000755000176200001440000000000014127037132012351 5ustar liggesusersmitml/inst/doc/0000755000176200001440000000000014127037132013116 5ustar liggesusersmitml/inst/doc/Introduction.Rmd0000644000176200001440000001742414002023474016247 0ustar liggesusers--- title: "Introduction" output: rmarkdown::html_vignette: css: "css/vignette.css" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Introduction} %\VignetteEncoding{UTF-8} --- --- ```{r setup, include=FALSE, cache=FALSE} library(knitr) set.seed(123) options(width=87) opts_chunk$set(background="#ffffff", comment="#", collapse=FALSE, fig.width=9, fig.height=9, warning=FALSE, message=FALSE) ``` This vignette is intended to provide a first introduction to the R package `mitml` for generating and analyzing multiple imputations for multilevel missing data. A usual application of the package may consist of the following steps. 1. Imputation 2. Assessment of convergence 3. Completion of the data 4. Analysis 5. Pooling The `mitml` package offers a set of tools to facilitate each of these steps. This vignette is intended as a step-by-step illustration of the basic features of `mitml`. Further information can be found in the other [vignettes](https://github.com/simongrund1/mitml/wiki) and the package [documentation](https://cran.r-project.org/package=mitml/mitml.pdf). ## Example data For the purposes of this vignette, we employ a simple example that makes use of the `studentratings` data set, which is provided with `mitml`. To use it, the `mitml` package and the data set must be loaded as follows. ```{r} library(mitml) data(studentratings) ``` More information about the variables in the data set can be obtained from its `summary`. ```{r} summary(studentratings) ``` In addition, the correlations between variables (based on pairwise observations) may be useful for identifying possible sources of information that may be used during the treatment of missing data. ```{r, echo=FALSE} round(cor(studentratings[,-(1:3)], use="pairwise"),3) ``` This illustrates that (a) most variables in the data set are affected by missing data, but also (b) that substantial relations exist between variables. For simplicity, we focus on only a subset of these variables. ## Model of interest For the present example, we focus on the two variables `ReadDis` (disciplinary problems in reading class) and `ReadAchiev` (reading achievement). Assume we are interested in the relation between these variables. Specifically, we may be interested in the following analysis model $$ \mathit{ReadAchiev}_{ij} = \gamma_{00} + \gamma_{10} \mathit{ReadDis}_{ij} + u_{0j} + e_{ij} $$ On the basis of the syntax used in the R package `lme4`, this model may be written as follows. ```{r, results="hide"} ReadAchiev ~ 1 + ReadDis + (1|ID) ``` In this model, the relation between `ReadDis` and `ReadAchiev` is represented by a single fixed effect of `ReadDis`, and a random intercept is included to account for the clustered structure of the data and the group-level variance in `ReadAchiev` that is not explained by `ReadDis`. ## Generating imputations The `mitml` package includes wrapper functions for the R packages `pan` (`panImpute`) and `jomo` (`jomoImpute`). Here, we will use the first option. To generate imputations with `panImpute`, the user must specify (at least): 1. an imputation model 2. the number of iterations and imputations The easiest way of specifying the imputation model is to use the `formula` argument of `panImpute`. 
Generally speaking, the imputation model should include all variables that are either (a) part of the model of interest, (b) related to the variables in the model, or (c) related to whether the variables in the model are missing.

In this simple example, we include only `ReadDis` and `ReadAchiev` as the main target variables and `SchClimate` as an auxiliary variable.

```{r}
fml <- ReadAchiev + ReadDis + SchClimate ~ 1 + (1|ID)
```

Note that, in this specification of the imputation model, all variables are included on the left-hand side of the model, whereas the right-hand side is left "empty". This model allows for all relations between variables at Level 1 and 2 and is thus suitable for most applications of the multilevel random intercept model (for further discussion, see also Grund, Lüdtke, & Robitzsch, 2016, in press).

The imputation procedure is then run for 5,000 iterations (burn-in), after which 100 imputations are drawn every 100 iterations.

```{r, results="hide"}
imp <- panImpute(studentratings, formula = fml, n.burn = 5000, n.iter = 100, m = 100)
```

This step may take a few seconds. Once the process is completed, the imputations are saved in the `imp` object.

## Assessing convergence

In `mitml`, there are two options for assessing the convergence of the imputation procedure. First, the `summary` calculates the "potential scale reduction factor" ($\hat{R}$) for each parameter in the imputation model. If this value is noticeably larger than 1 for some parameters (say $>1.05$), a longer burn-in period may be required.

```{r}
summary(imp)
```

Second, diagnostic plots can be requested with the `plot` function. These plots consist of a trace plot, an autocorrelation plot, and some additional information about the posterior distribution. Convergence can be assumed if the trace plot is stationary (i.e., does not "drift"), and the autocorrelation is within reasonable bounds for the chosen number of iterations between imputations.

For this example, we examine only the plot for the parameter `Beta[1,2]` (i.e., the intercept of `ReadDis`).

```{r conv, echo=FALSE}
plot(imp, trace = "all", print = "beta", pos = c(1,2), export = "png", dev.args = list(width=720, height=380, pointsize=16))
```

```{r, eval=FALSE}
plot(imp, trace = "all", print = "beta", pos = c(1,2))
```

![](mitmlPlots/BETA_ReadDis_ON_Intercept.png)

Taken together, both $\hat{R}$ and the diagnostic plots indicate that the imputation model converged, setting the basis for the analysis of the imputed data sets.

## Completing the data

In order to work with and analyze the imputed data sets, the data sets must be completed with the imputations generated in the previous steps. To do so, `mitml` provides the function `mitmlComplete`.

```{r}
implist <- mitmlComplete(imp, "all")
```

The resulting object is a list that contains the 100 completed data sets.

## Analysis and pooling

In order to obtain estimates for the model of interest, the model must be fit separately to each of the completed data sets, and the results must be pooled into a final set of estimates and inferences. The `mitml` package offers the `with` function to fit various statistical models to a list of completed data sets.

In this example, we use the `lmer` function from the R package `lme4` to fit the model of interest.

```{r, message=FALSE}
library(lme4)
fit <- with(implist, lmer(ReadAchiev ~ 1 + ReadDis + (1|ID)))
```

The resulting object is a list containing the 100 fitted models.
To pool the results of these models into a set of final estimates and inferences, `mitml` offers the `testEstimates` function. ```{r} testEstimates(fit, extra.pars = TRUE) ``` The estimates can be interpreted in a manner similar to the estimates from the corresponding complete-data procedure. In addition, the output includes diagnostic quantities such as the fraction of missing information (FMI), which can be helpful for interpreting the results and understanding problems with the imputation procedure. ###### References Grund, S., Lüdtke, O., & Robitzsch, A. (2016). Multiple imputation of multilevel missing data: An introduction to the R package pan. *SAGE Open*, *6*(4), 1–17. doi: 10.1177/2158244016668220 ([Link](https://doi.org/10.1177/2158244016668220)) Grund, S., Lüdtke, O., & Robitzsch, A. (in press). Multiple imputation of missing data for multilevel models: Simulations and recommendations. *Organizational Research Methods*. doi: 10.1177/1094428117703686 ([Link](https://doi.org/10.1177/1094428117703686)) --- ```{r, echo=F} cat("Author: Simon Grund (grund@ipn.uni-kiel.de)\nDate: ", as.character(Sys.Date())) ``` mitml/inst/doc/Introduction.R0000644000176200001440000000417414127037122015727 0ustar liggesusers## ----setup, include=FALSE, cache=FALSE----------------------------------------------- library(knitr) set.seed(123) options(width=87) opts_chunk$set(background="#ffffff", comment="#", collapse=FALSE, fig.width=9, fig.height=9, warning=FALSE, message=FALSE) ## ------------------------------------------------------------------------------------ library(mitml) data(studentratings) ## ------------------------------------------------------------------------------------ summary(studentratings) ## ---- echo=FALSE--------------------------------------------------------------------- round(cor(studentratings[,-(1:3)], use="pairwise"),3) ## ---- results="hide"----------------------------------------------------------------- ReadAchiev ~ 1 + ReadDis + (1|ID) ## ------------------------------------------------------------------------------------ fml <- ReadAchiev + ReadDis + SchClimate ~ 1 + (1|ID) ## ---- results="hide"----------------------------------------------------------------- imp <- panImpute(studentratings, formula = fml, n.burn = 5000, n.iter = 100, m = 100) ## ------------------------------------------------------------------------------------ summary(imp) ## ----conv, echo=FALSE---------------------------------------------------------------- plot(imp, trace = "all", print = "beta", pos = c(1,2), export = "png", dev.args = list(width=720, height=380, pointsize=16)) ## ---- eval=FALSE--------------------------------------------------------------------- # plot(imp, trace = "all", print = "beta", pos = c(1,2)) ## ------------------------------------------------------------------------------------ implist <- mitmlComplete(imp, "all") ## ---- message=FALSE------------------------------------------------------------------ library(lme4) fit <- with(implist, lmer(ReadAchiev ~ 1 + ReadDis + (1|ID))) ## ------------------------------------------------------------------------------------ testEstimates(fit, extra.pars = TRUE) ## ---- echo=F------------------------------------------------------------------------- cat("Author: Simon Grund (grund@ipn.uni-kiel.de)\nDate: ", as.character(Sys.Date())) mitml/inst/doc/Level2.html0000644000176200001440000005334714127037132015151 0ustar liggesusers Imputation of Missing Data at Level 2

Imputation of Missing Data at Level 2


This vignette illustrates the use of mitml for the treatment of missing data at Level 2. Specifically, the vignette addresses the following topics:

  1. Specification of the two-level imputation model for missing data at both Level 1 and 2
  2. Running the imputation procedure

Further information can be found in the other vignettes and the package documentation.

Example data

For the purposes of this vignette, we make use of the leadership data set, which contains simulated data from 750 employees in 50 groups, including ratings on job satisfaction, leadership style, and work load (Level 1) as well as cohesion (Level 2).

The package and the data set can be loaded as follows.

library(mitml)
data(leadership)

The summary of the data shows that all variables are affected by missing data.

summary(leadership)
#      GRPID          JOBSAT             COHES            NEGLEAD          WLOAD    
#  Min.   : 1.0   Min.   :-7.32934   Min.   :-3.4072   Min.   :-3.13213   low :416  
#  1st Qu.:13.0   1st Qu.:-1.61932   1st Qu.:-0.4004   1st Qu.:-0.70299   high:248  
#  Median :25.5   Median :-0.02637   Median : 0.2117   Median : 0.08027   NA's: 86  
#  Mean   :25.5   Mean   :-0.03168   Mean   : 0.1722   Mean   : 0.04024             
#  3rd Qu.:38.0   3rd Qu.: 1.64571   3rd Qu.: 1.1497   3rd Qu.: 0.79111             
#  Max.   :50.0   Max.   :10.19227   Max.   : 2.5794   Max.   : 3.16116             
#                 NA's   :69         NA's   :30        NA's   :92

The following data segment illustrates this fact, including cases with missing data at Level 1 (e.g., job satisfaction) and 2 (e.g., cohesion).

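The listing below can be reproduced by printing the corresponding rows of the data; this is the hidden command from the vignette source:

leadership[73:78,]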
#    GRPID      JOBSAT     COHES     NEGLEAD WLOAD
# 73     5 -1.72143400 0.9023198  0.83025589  high
# 74     5          NA 0.9023198  0.15335056  high
# 75     5 -0.09541178 0.9023198  0.21886272   low
# 76     6  0.68626611        NA -0.38190591  high
# 77     6          NA        NA          NA   low
# 78     6 -1.86298201        NA -0.05351001  high

In the following, we will employ a two-level model to address missing data at both levels simultaneously.

Specifying the imputation model

The specification of the two-level model involves two components, one pertaining to the variables at each level of the sample (Goldstein, Carpenter, Kenward, & Levin, 2009; for further discussion, see also Enders, Mistler, & Keller, 2016; Grund, Lüdtke, & Robitzsch, in press).

Specifically, the imputation model is specified as a list with two components, where the first component denotes the model for the variables at Level 1, and the second component denotes the model for the variables at Level 2.

For example, using the formula interface, an imputation model targeting all variables in the data set can be written as follows.

fml <- list( JOBSAT + NEGLEAD + WLOAD ~ 1 + (1|GRPID) , # Level 1
             COHES ~ 1 )                                # Level 2

The first component of this list includes the three target variables at Level 1 and a fixed (1) as well as a random intercept (1|GRPID). The second component includes the target variable at Level 2 with a fixed intercept (1).

From a statistical point of view, this specification corresponds to the following model \[ \begin{aligned} \mathbf{y}_{1ij} &= \boldsymbol\mu_{1} + \mathbf{u}_{1j} + \mathbf{e}_{ij} \\ \mathbf{y}_{2j} &= \boldsymbol\mu_{2} + \mathbf{u}_{2j} \; , \end{aligned} \] where \(\mathbf{y}_{1ij}\) denotes the target variables at Level 1, \(\mathbf{y}_{2j}\) the target variables at Level 2, and the right-hand side of the model contains the fixed effects, random effects, and residual terms as mentioned above.

Note that, even though the two components of the model appear to be separated, they define a single (joint) model for all target variables at both Level 1 and 2. Specifically, this model employs a two-level covariance structure, which allows for relations between variables at both Level 1 (i.e., correlated residuals at Level 1) and 2 (i.e., correlated random effects residuals at Level 2).
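
Written out, this covariance structure collects the random effects from both components in a joint distribution (a sketch based on the Psi and Sigma matrices reported in the summary below, where \(\mathbf{u}_{2j}\) denotes the Level-2 residuals of the Level-2 target variables):

\[ \begin{pmatrix} \mathbf{u}_{1j} \\ \mathbf{u}_{2j} \end{pmatrix} \sim N(\mathbf{0}, \boldsymbol\Psi) \; , \quad \mathbf{e}_{ij} \sim N(\mathbf{0}, \boldsymbol\Sigma) \]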

Generating imputations

Because the data contain missing values at both levels, imputations will be generated with jomoImpute (and not panImpute). Except for the specification of the two-level model, the syntax is the same as in applications with missing data only at Level 1.

Here, we will run 5,000 burn-in iterations and generate 20 imputations, each 250 iterations apart.

imp <- jomoImpute(leadership, formula = fml, n.burn = 5000, n.iter = 250, m = 20)

By looking at the summary, we can then review the imputation procedure and verify that the imputation model converged.

summary(imp)
# 
# Call:
# 
# jomoImpute(data = leadership, formula = fml, n.burn = 5000, n.iter = 250, 
#     m = 20)
# 
# Level 1:
#  
# Cluster variable:         GRPID 
# Target variables:         JOBSAT NEGLEAD WLOAD 
# Fixed effect predictors:  (Intercept) 
# Random effect predictors: (Intercept) 
# 
# Level 2:
#                 
# Target variables:         COHES 
# Fixed effect predictors:  (Intercept) 
# 
# Performed 5000 burn-in iterations, and generated 20 imputed data sets,
# each 250 iterations apart. 
# 
# Potential scale reduction (Rhat, imputation phase):
#  
#          Min   25%  Mean Median   75%   Max
# Beta:  1.001 1.002 1.004  1.004 1.006 1.009
# Beta2: 1.001 1.001 1.001  1.001 1.001 1.001
# Psi:   1.000 1.001 1.002  1.001 1.002 1.006
# Sigma: 1.000 1.002 1.004  1.004 1.006 1.007
# 
# Largest potential scale reduction:
# Beta: [1,3], Beta2: [1,1], Psi: [1,1], Sigma: [2,1]
# 
# Missing data per variable:
#     GRPID JOBSAT NEGLEAD WLOAD COHES
# MD% 0     9.2    12.3    11.5  4.0

Due to the greater complexity of the two-level model, the output includes more information than in applications with missing data only at Level 1. For example, the output features the model specification for variables at both Level 1 and 2. Furthermore, it provides convergence statistics for the additional regression coefficients for the target variables at Level 2 (i.e., Beta2).

Finally, it also becomes visible that the two-level model indeed allows for relations between target variables at Level 1 and 2. This can be seen from the fact that the potential scale reduction factor (\(\hat{R}\)) for the covariance matrix at Level 2 (Psi) was largest for Psi[4,3], which is the covariance between cohesion and the random intercept of work load.
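
The trace of this parameter can be inspected in the same way as other parameters (a sketch; this assumes that the plot method accepts print = "psi" with a position index, analogous to the "beta" examples in the other vignettes):

plot(imp, trace = "all", print = "psi", pos = c(4,3))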

Completing the data

The completed data sets can then be extracted with mitmlComplete.

implist <- mitmlComplete(imp, "all")

When inspecting the completed data, it is easy to verify that the imputations for variables at Level 2 are constant within groups as intended, thus preserving the two-level structure of the data.

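For example, rows 73 to 78 of the first completed data set can be printed as follows (the hidden command from the vignette source that produced the listing below):

implist[[1]][73:78,]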
#    GRPID      JOBSAT     NEGLEAD WLOAD     COHES
# 73     5 -1.72143400  0.83025589  high 0.9023198
# 74     5  0.68223338  0.15335056  high 0.9023198
# 75     5 -0.09541178  0.21886272   low 0.9023198
# 76     6  0.68626611 -0.38190591  high 2.1086213
# 77     6 -2.97953478 -1.05236552   low 2.1086213
# 78     6 -1.86298201 -0.05351001  high 2.1086213
References

Enders, C. K., Mistler, S. A., & Keller, B. T. (2016). Multilevel multiple imputation: A review and evaluation of joint modeling and chained equations imputation. Psychological Methods, 21, 222–240. doi: 10.1037/met0000063 (Link)

Goldstein, H., Carpenter, J. R., Kenward, M. G., & Levin, K. A. (2009). Multilevel models with multivariate mixed response types. Statistical Modelling, 9, 173–197. doi: 10.1177/1471082X0800900301 (Link)

Grund, S., Lüdtke, O., & Robitzsch, A. (in press). Multiple imputation of missing data for multilevel models: Simulations and recommendations. Organizational Research Methods. doi: 10.1177/1094428117703686 (Link)


# Author: Simon Grund (grund@ipn.uni-kiel.de)
# Date:   2021-10-05
mitml/inst/doc/Level2.R0000644000176200001440000000277414127037132014404 0ustar liggesusers## ----setup, include=FALSE, cache=FALSE----------------------------------------------- library(knitr) set.seed(123) options(width=87) opts_chunk$set(background="#ffffff", comment="#", collapse=FALSE, fig.width=9, fig.height=9, warning=FALSE, message=FALSE) ## ------------------------------------------------------------------------------------ library(mitml) data(leadership) ## ------------------------------------------------------------------------------------ summary(leadership) ## ---- echo=FALSE--------------------------------------------------------------------- leadership[73:78,] ## ------------------------------------------------------------------------------------ fml <- list( JOBSAT + NEGLEAD + WLOAD ~ 1 + (1|GRPID) , # Level 1 COHES ~ 1 ) # Level 2 ## ---- results="hide"----------------------------------------------------------------- imp <- jomoImpute(leadership, formula = fml, n.burn = 5000, n.iter = 250, m = 20) ## ------------------------------------------------------------------------------------ summary(imp) ## ------------------------------------------------------------------------------------ implist <- mitmlComplete(imp, "all") ## ---- echo=FALSE--------------------------------------------------------------------- implist[[1]][73:78,] ## ---- echo=F------------------------------------------------------------------------- cat("Author: Simon Grund (grund@ipn.uni-kiel.de)\nDate: ", as.character(Sys.Date())) mitml/inst/doc/Analysis.Rmd0000644000176200001440000002302414127016114015343 0ustar liggesusers--- title: "Analysis of Multiply Imputed Data Sets" output: rmarkdown::html_vignette: css: "css/vignette.css" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Analysis of Multiply Imputed Data Sets} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE, cache=FALSE} library(knitr) set.seed(123) options(width=87) opts_chunk$set(background="#ffffff", comment="#", collapse=FALSE, fig.width=9, fig.height=9, warning=FALSE, message=FALSE) ``` This vignette is intended to provide an overview of the analysis of multiply imputed data sets with `mitml`. Specifically, this vignette addresses the following topics: 1. Working with multiply imputed data sets 2. Rubin's rules for pooling individual parameters 3. Model comparisons 4. Parameter constraints Further information can be found in the other [vignettes](https://github.com/simongrund1/mitml/wiki) and the package [documentation](https://cran.r-project.org/package=mitml/mitml.pdf). ## Example data (`studentratings`) For the purposes of this vignette, we make use of the `studentratings` data set, which contains simulated data from 750 students in 50 schools including scores on reading and math achievement, socioeconomic status (SES), and ratings on school and classroom environment. The package and the data set can be loaded as follows. ```{r} library(mitml) library(lme4) data(studentratings) ``` As evident from its `summary`, most variables in the data set contain missing values. ```{r} summary(studentratings) ``` In the present example, we investigate the differences in mathematics achievement that can be attributed to differences in SES when controlling for students' sex. Specifically, we are interested in the following model. 
$$ \mathit{MA}_{ij} = \gamma_{00} + \gamma_{10} \mathit{Sex}_{ij} + \gamma_{20} (\mathit{SES}_{ij}-\overline{\mathit{SES}}_{\bullet j}) + \gamma_{01} \overline{\mathit{SES}}_{\bullet j} + u_{0j} + e_{ij} $$

Note that this model also employs group-mean centering to separate the individual and group-level effects of SES.

## Generating imputations

In the present example, we generate 20 imputations from the following imputation model.

```{r, results="hide"}
fml <- ReadDis + SES ~ 1 + Sex + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 5000, n.iter = 200, m = 20)
```

The completed data are then extracted with `mitmlComplete`.

```{r}
implist <- mitmlComplete(imp, "all")
```

## Transforming the imputed data sets

In empirical research, the raw data rarely enter the analyses but often need to be transformed beforehand. For this purpose, the `mitml` package provides the `within` function, which applies a given transformation directly to each data set.

In the following, we use this to (a) calculate the group means of SES and (b) center the individual scores around their group means.

```{r}
implist <- within(implist, {
  G.SES <- clusterMeans(SES, ID) # calculate group means
  I.SES <- SES - G.SES           # center around group means
})
```

This method can be used to apply arbitrary transformations to all of the completed data sets simultaneously.

> **Note regarding** `dplyr`**:**
> Due to how it is implemented, `within` cannot be used directly with `dplyr`.
> Instead, users may use `with` instead of `within` with the following workaround.
>```{r, eval=FALSE}
>implist <- with(implist,{
>  df <- data.frame(as.list(environment()))
>  df <- ... # dplyr commands
>  df
>})
>implist <- as.mitml.list(implist)
>```
> Advanced users may also consider using `lapply` for a similar workaround.

## Fitting the analysis model

In order to analyze the imputed data, each data set is analyzed using regular complete-data techniques. For this purpose, `mitml` offers the `with` function. In the present example, we use it to fit the model of interest with the R package `lme4`.

```{r}
fit <- with(implist, {
  lmer(MathAchiev ~ 1 + Sex + I.SES + G.SES + (1|ID))
})
```

This results in a list of fitted models, one for each of the imputed data sets.

## Pooling

The results obtained from the imputed data sets must be pooled in order to obtain a set of final parameter estimates and inferences. In the following, we employ a number of different pooling methods that can be used to address common statistical tasks, for example, (a) estimating and testing individual parameters, (b) model comparisons, and (c) tests of constraints on one or several parameters.

#### Parameter estimates

Individual parameters are commonly pooled with the rules developed by Rubin (1987). In `mitml`, Rubin's rules are implemented in the `testEstimates` function.

```{r}
testEstimates(fit)
```

In addition, the argument `extra.pars = TRUE` can be used to obtain pooled estimates of variance components, and `df.com` can be used to specify the complete-data degrees of freedom, which provides more appropriate (i.e., conservative) inferences in smaller samples.

For example, using a conservative value for the complete-data degrees of freedom for the fixed effects in the model of interest (Snijders & Bosker, 2012), the output changes as follows.

```{r}
testEstimates(fit, extra.pars = TRUE, df.com = 46)
```

#### Multiple parameters and model comparisons

Oftentimes, statistical inference concerns more than one parameter at a time.
For example, the combined influence of SES (within and between groups) on mathematics achievement is represented by two parameters in the model of interest.

Multiple pooling methods for Wald and likelihood ratio tests (LRTs) are implemented in the `testModels` function. This function requires the specification of a full model and a restricted model, which are then compared using (pooled) Wald tests or LRTs. Specifically, `testModels` allows users to pool Wald tests ($D_1$), $\chi^2$ test statistics ($D_2$), and LRTs ($D_3$ and $D_4$; for a comparison of these methods, see also Grund, Lüdtke, & Robitzsch, 2016b).

To examine the combined influence of SES on mathematics achievement, the following restricted model can be specified and compared with the model of interest (using $D_1$).

```{r}
fit.null <- with(implist, {
  lmer(MathAchiev ~ 1 + Sex + (1|ID))
})

testModels(fit, fit.null)
```

> **Note regarding the order of arguments:**
> Please note that `testModels` expects that the first argument contains the full model, and the second argument contains the restricted model.
> If the order of the arguments is reversed, the results will not be interpretable.

Similar to the test for individual parameters, smaller samples can be accommodated with `testModels` (with method $D_1$) by specifying the complete-data degrees of freedom for the denominator of the $F$ statistic.

```{r}
testModels(fit, fit.null, df.com = 46)
```

The pooling method used by `testModels` is determined by the `method` argument. For example, to calculate the pooled LRT corresponding to the Wald test above (i.e., $D_3$), the following command can be issued.

```{r}
testModels(fit, fit.null, method="D3")
```

#### Constraints on parameters

Finally, it is often useful to investigate functions (or constraints) of the parameters in the model of interest. In complete data sets, this can be achieved with a test of linear hypotheses or the delta method. The `mitml` package implements a pooled version of the delta method in the `testConstraints` function.

For example, the combined influence of SES on mathematics achievement can also be tested without model comparisons by testing the constraint that the parameters pertaining to `I.SES` and `G.SES` are both zero. This constraint is defined and tested as follows.

```{r}
c1 <- c("I.SES", "G.SES")
testConstraints(fit, constraints = c1)
```

This test is identical to the Wald test given in the previous section. Arbitrary constraints on the parameters can be specified and tested in this manner, where each character string denotes an expression to be tested against zero.

In the present example, we are also interested in the *contextual* effect of SES on mathematics achievement (e.g., Snijders & Bosker, 2012). The contextual effect is simply the difference between the coefficients pertaining to `G.SES` and `I.SES` and can be tested as follows.

```{r}
c2 <- c("G.SES - I.SES")
testConstraints(fit, constraints = c2)
```

Similar to model comparisons, constraints can be tested with different methods ($D_1$ and $D_2$) and can accommodate smaller samples by specifying a value for `df.com`. Further examples for the analysis of multiply imputed data sets with `mitml` are given by Enders (2016) and Grund, Lüdtke, and Robitzsch (2016a).

###### References

Enders, C. K. (2016). Multiple imputation as a flexible tool for missing data handling in clinical research. *Behaviour Research and Therapy*. doi: 10.1016/j.brat.2016.11.008 ([Link](https://doi.org/10.1016/j.brat.2016.11.008))

Grund, S., Lüdtke, O., & Robitzsch, A.
(2016a). Multiple imputation of multilevel missing data: An introduction to the R package pan. *SAGE Open*, *6*(4), 1–17. doi: 10.1177/2158244016668220 ([Link](https://doi.org/10.1177/2158244016668220)) Grund, S., Lüdtke, O., & Robitzsch, A. (2016b). Pooling ANOVA results from multiply imputed datasets: A simulation study. *Methodology*, *12*, 75–88. doi: 10.1027/1614-2241/a000111 ([Link](https://doi.org/10.1027/1614-2241/a000111)) Rubin, D. B. (1987). *Multiple imputation for nonresponse in surveys*. Hoboken, NJ: Wiley. Snijders, T. A. B., & Bosker, R. J. (2012). *Multilevel analysis: An introduction to basic and advanced multilevel modeling*. Thousand Oaks, CA: Sage. --- ```{r, echo=F} cat("Author: Simon Grund (grund@ipn.uni-kiel.de)\nDate: ", as.character(Sys.Date())) ``` mitml/inst/doc/Analysis.html0000644000176200001440000010104114127037116015566 0ustar liggesusers Analysis of Multiply Imputed Data Sets

Analysis of Multiply Imputed Data Sets

This vignette is intended to provide an overview of the analysis of multiply imputed data sets with mitml. Specifically, this vignette addresses the following topics:

  1. Working with multiply imputed data sets
  2. Rubin’s rules for pooling individual parameters
  3. Model comparisons
  4. Parameter constraints

Further information can be found in the other vignettes and the package documentation.

Example data (studentratings)

For the purposes of this vignette, we make use of the studentratings data set, which contains simulated data from 750 students in 50 schools including scores on reading and math achievement, socioeconomic status (SES), and ratings on school and classroom environment.

The package and the data set can be loaded as follows.

library(mitml)
library(lme4)
data(studentratings)

As evident from its summary, most variables in the data set contain missing values.

summary(studentratings)
#        ID       FedState     Sex              MathAchiev       MathDis      
#  Min.   :1001   B :375   Length:750         Min.   :225.0   Min.   :0.2987  
#  1st Qu.:1013   SH:375   Class :character   1st Qu.:440.7   1st Qu.:1.9594  
#  Median :1513            Mode  :character   Median :492.7   Median :2.4350  
#  Mean   :1513                               Mean   :495.4   Mean   :2.4717  
#  3rd Qu.:2013                               3rd Qu.:553.2   3rd Qu.:3.0113  
#  Max.   :2025                               Max.   :808.1   Max.   :4.7888  
#                                             NA's   :132     NA's   :466     
#       SES          ReadAchiev       ReadDis        CognAbility      SchClimate     
#  Min.   :-9.00   Min.   :191.1   Min.   :0.7637   Min.   :28.89   Min.   :0.02449  
#  1st Qu.:35.00   1st Qu.:427.4   1st Qu.:2.1249   1st Qu.:43.80   1st Qu.:1.15338  
#  Median :46.00   Median :490.2   Median :2.5300   Median :48.69   Median :1.65636  
#  Mean   :46.55   Mean   :489.9   Mean   :2.5899   Mean   :48.82   Mean   :1.73196  
#  3rd Qu.:59.00   3rd Qu.:558.4   3rd Qu.:3.0663   3rd Qu.:53.94   3rd Qu.:2.24018  
#  Max.   :93.00   Max.   :818.5   Max.   :4.8554   Max.   :71.29   Max.   :4.19316  
#  NA's   :281                     NA's   :153                      NA's   :140

In the present example, we investigate the differences in mathematics achievement that can be attributed to differences in SES when controlling for students’ sex. Specifically, we are interested in the following model.

\[ \mathit{MA}_{ij} = \gamma_{00} + \gamma_{10} \mathit{Sex}_{ij} + \gamma_{20} (\mathit{SES}_{ij}-\overline{\mathit{SES}}_{\bullet j}) + \gamma_{01} \overline{\mathit{SES}}_{\bullet j} + u_{0j} + e_{ij} \]

Note that this model also employs group-mean centering to separate the individual and group-level effects of SES.

Generating imputations

In the present example, we generate 20 imputations from the following imputation model.

fml <- ReadDis + SES ~ 1 + Sex + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 5000, n.iter = 200, m = 20)

The completed data are then extracted with mitmlComplete.

implist <- mitmlComplete(imp, "all")

Transforming the imputed data sets

In empirical research, the raw data rarely enter the analyses but often need to be transformed beforehand. For this purpose, the mitml package provides the within function, which applies a given transformation directly to each data set.

In the following, we use this to (a) calculate the group means of SES and (b) center the individual scores around their group means.

implist <- within(implist, {
  G.SES <- clusterMeans(SES, ID) # calculate group means
  I.SES <- SES - G.SES           # center around group means
})

This method can be used to apply arbitrary transformations to all of the completed data sets simultaneously.

Note regarding dplyr: Due to how it is implemented, within cannot be used directly with dplyr. Instead, users may use with instead of within with the following workaround.

implist <- with(implist,{
  df <- data.frame(as.list(environment()))
  df <- ... # dplyr commands
  df
})
implist <- as.mitml.list(implist)

Advanced users may also consider using lapply for a similar workaround.
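
A minimal sketch of the lapply approach, mirroring the within example above (the transformed list is re-declared as a mitml.list so that the pooling functions can be used afterwards):

implist <- as.mitml.list(lapply(implist, function(d) {
  d$G.SES <- clusterMeans(d$SES, d$ID) # calculate group means
  d$I.SES <- d$SES - d$G.SES           # center around group means
  d
}))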

Fitting the analysis model

In order to analyze the imputed data, each data set is analyzed using regular complete-data techniques. For this purpose, mitml offers the with function. In the present example, we use it to fit the model of interest with the R package lme4.

fit <- with(implist, {
  lmer(MathAchiev ~ 1 + Sex + I.SES + G.SES + (1|ID))
})

This results in a list of fitted models, one for each of the imputed data sets.

Pooling

The results obtained from the imputed data sets must be pooled in order to obtain a set of final parameter estimates and inferences. In the following, we employ a number of different pooling methods that can be used to address common statistical tasks, for example, (a) estimating and testing individual parameters, (b) model comparisons, and (c) tests of constraints on one or several parameters.

Parameter estimates

Individual parameters are commonly pooled with the rules developed by Rubin (1987). In mitml, Rubin’s rules are implemented in the testEstimates function.

testEstimates(fit)
# 
# Call:
# 
# testEstimates(model = fit)
# 
# Final parameter estimates and inferences obtained from 20 imputed data sets.
# 
#              Estimate Std.Error   t.value        df   P(>|t|)       RIV       FMI 
# (Intercept)   433.015    28.481    15.203 1.081e+03     0.000     0.153     0.134 
# SexGirl         3.380     7.335     0.461 2.794e+05     0.645     0.008     0.008 
# I.SES           0.692     0.257     2.690 2.334e+02     0.008     0.399     0.291 
# G.SES           1.296     0.597     2.173 1.097e+03     0.030     0.152     0.133 
# 
# Unadjusted hypothesis test as appropriate in larger samples.

In addition, the argument extra.pars = TRUE can be used to obtain pooled estimates of variance components, and df.com can be used to specify the complete-data degrees of freedom, which provides more appropriate (i.e., conservative) inferences in smaller samples.

For example, using a conservative value for the complete-data degrees of freedom for the fixed effects in the model of interest (Snijders & Bosker, 2012), the output changes as follows.

testEstimates(fit, extra.pars = TRUE, df.com = 46)
# 
# Call:
# 
# testEstimates(model = fit, extra.pars = TRUE, df.com = 46)
# 
# Final parameter estimates and inferences obtained from 20 imputed data sets.
# 
#              Estimate Std.Error   t.value        df   P(>|t|)       RIV       FMI 
# (Intercept)   433.015    28.481    15.203    36.965     0.000     0.153     0.176 
# SexGirl         3.380     7.335     0.461    43.752     0.647     0.008     0.051 
# I.SES           0.692     0.257     2.690    27.781     0.012     0.399     0.332 
# G.SES           1.296     0.597     2.173    37.022     0.036     0.152     0.175 
# 
#                         Estimate 
# Intercept~~Intercept|ID  168.506 
# Residual~~Residual      8092.631 
# ICC|ID                     0.020 
# 
# Hypothesis test adjusted for small samples with df=[46]
# complete-data degrees of freedom.
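
Pooled confidence intervals for the individual parameters can be derived from the object returned by testEstimates (a brief sketch using the confint method that the package exports for these objects):

est <- testEstimates(fit, df.com = 46)
confint(est, level = 0.95)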

Multiple parameters and model comparisons

Oftentimes, statistical inference concerns more than one parameter at a time. For example, the combined influence of SES (within and between groups) on mathematics achievement is represented by two parameters in the model of interest.

Multiple pooling methods for Wald and likelihood ratio tests (LRTs) are implemented in the testModels function. This function requires the specification of a full model and a restricted model, which are then compared using (pooled) Wald tests or LRTs. Specifically, testModels allows users to pool Wald tests (\(D_1\)), \(\chi^2\) test statistics (\(D_2\)), and LRTs (\(D_3\) and \(D_4\); for a comparison of these methods, see also Grund, Lüdtke, & Robitzsch, 2016b).

To examine the combined influence of SES on mathematics achievement, the following restricted model can be specified and compared with the model of interest (using \(D_1\)).

fit.null <- with(implist, {
  lmer(MathAchiev ~ 1 + Sex + (1|ID))
})

testModels(fit, fit.null)
# 
# Call:
# 
# testModels(model = fit, null.model = fit.null)
# 
# Model comparison calculated from 20 imputed data sets.
# Combination method: D1
# 
#     F.value     df1     df2   P(>F)     RIV 
#       6.095       2 674.475   0.002   0.275 
# 
# Unadjusted hypothesis test as appropriate in larger samples.
# Models fitted with REML were used as is.

Note regarding the order of arguments: Please note that testModels expects that the first argument contains the full model, and the second argument contains the restricted model. If the order of the arguments is reversed, the results will not be interpretable.

Similar to the test for individual parameters, smaller samples can be accommodated with testModels (with method \(D_1\)) by specifying the complete-data degrees of freedom for the denominator of the \(F\) statistic.

testModels(fit, fit.null, df.com = 46)
# 
# Call:
# 
# testModels(model = fit, null.model = fit.null, df.com = 46)
# 
# Model comparison calculated from 20 imputed data sets.
# Combination method: D1
# 
#     F.value     df1     df2   P(>F)     RIV 
#       6.095       2  39.812   0.005   0.275 
# 
# Hypothesis test adjusted for small samples with df=[46]
# complete-data degrees of freedom.
# Models fitted with REML were used as is.

The pooling method used by testModels is determined by the method argument. For example, to calculate the pooled LRT corresponding to the Wald test above (i.e., \(D_3\)), the following command can be issued.

testModels(fit, fit.null, method="D3")
# 
# Call:
# 
# testModels(model = fit, null.model = fit.null, method = "D3")
# 
# Model comparison calculated from 20 imputed data sets.
# Combination method: D3
# 
#     F.value     df1     df2   P(>F)     RIV 
#       5.787       2 519.143   0.003   0.328 
# 
# Models originally fitted with REML were refitted using ML.
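
The remaining methods can be requested in the same way. For example, a sketch of the corresponding call for the \(D_4\) method (assuming the installed version of mitml supports it, as suggested by the list of methods above):

testModels(fit, fit.null, method = "D4")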

Constraints on parameters

Finally, it is often useful to investigate functions (or constraints) of the parameters in the model of interest. In complete data sets, this can be achieved with a test of linear hypotheses or the delta method. The mitml package implements a pooled version of the delta method in the testConstraints function.

For example, the combined influence of SES on mathematics achievement can also be tested without model comparisons by testing the constraint that the parameters pertaining to I.SES and G.SES are both zero. This constraint is defined and tested as follows.

c1 <- c("I.SES", "G.SES")
testConstraints(fit, constraints = c1)
# 
# Call:
# 
# testConstraints(model = fit, constraints = c1)
# 
# Hypothesis test calculated from 20 imputed data sets. The following
# constraints were specified:
# 
#             Estimate Std. Error 
#    I.SES:      0.692      0.245 
#    G.SES:      1.296      0.628 
# 
# Combination method: D1 
# 
#     F.value     df1     df2   P(>F)     RIV 
#       6.095       2 674.475   0.002   0.275 
# 
# Unadjusted hypothesis test as appropriate in larger samples.

This test is identical to the Wald test given in the previous section. Arbitrary constraints on the parameters can be specified and tested in this manner, where each character string denotes an expression to be tested against zero.

In the present example, we are also interested in the contextual effect of SES on mathematics achievement (e.g., Snijders & Bosker, 2012). The contextual effect is simply the difference between the coefficients pertaining to G.SES and I.SES and can be tested as follows.

c2 <- c("G.SES - I.SES")
testConstraints(fit, constraints = c2)
# 
# Call:
# 
# testConstraints(model = fit, constraints = c2)
# 
# Hypothesis test calculated from 20 imputed data sets. The following
# constraints were specified:
# 
#                     Estimate Std. Error 
#    G.SES - I.SES:      0.605      0.644 
# 
# Combination method: D1 
# 
#     F.value     df1     df2   P(>F)     RIV 
#       0.881       1 616.380   0.348   0.166 
# 
# Unadjusted hypothesis test as appropriate in larger samples.

Similar to model comparisons, constraints can be tested with different methods (\(D_1\) and \(D_2\)) and can accommodate smaller samples by specifying a value for df.com. Further examples for the analysis of multiply imputed data sets with mitml are given by Enders (2016) and Grund, Lüdtke, and Robitzsch (2016a).
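
For example, the contextual-effect constraint defined above can be tested with the \(D_2\) method or with a small-sample adjustment as follows (a brief sketch; the output is omitted here):

testConstraints(fit, constraints = c2, method = "D2") # pooled chi-square statistics
testConstraints(fit, constraints = c2, df.com = 46)   # small-sample version of D1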

References

Enders, C. K. (2016). Multiple imputation as a flexible tool for missing data handling in clinical research. Behaviour Research and Therapy. doi: 10.1016/j.brat.2016.11.008 (Link)

Grund, S., Lüdtke, O., & Robitzsch, A. (2016a). Multiple imputation of multilevel missing data: An introduction to the R package pan. SAGE Open, 6(4), 1–17. doi: 10.1177/2158244016668220 (Link)

Grund, S., Lüdtke, O., & Robitzsch, A. (2016b). Pooling ANOVA results from multiply imputed datasets: A simulation study. Methodology, 12, 75–88. doi: 10.1027/1614-2241/a000111 (Link)

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. Hoboken, NJ: Wiley.

Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage.


# Author: Simon Grund (grund@ipn.uni-kiel.de)
# Date:   2021-10-05
mitml/inst/doc/Analysis.R0000644000176200001440000000522614127037116015033 0ustar liggesusers## ----setup, include=FALSE, cache=FALSE----------------------------------------------- library(knitr) set.seed(123) options(width=87) opts_chunk$set(background="#ffffff", comment="#", collapse=FALSE, fig.width=9, fig.height=9, warning=FALSE, message=FALSE) ## ------------------------------------------------------------------------------------ library(mitml) library(lme4) data(studentratings) ## ------------------------------------------------------------------------------------ summary(studentratings) ## ---- results="hide"----------------------------------------------------------------- fml <- ReadDis + SES ~ 1 + Sex + (1|ID) imp <- panImpute(studentratings, formula = fml, n.burn = 5000, n.iter = 200, m = 20) ## ------------------------------------------------------------------------------------ implist <- mitmlComplete(imp, "all") ## ------------------------------------------------------------------------------------ implist <- within(implist, { G.SES <- clusterMeans(SES, ID) # calculate group means I.SES <- SES - G.SES # center around group means }) ## ---- eval=FALSE--------------------------------------------------------------------- # implist <- with(implist,{ # df <- data.frame(as.list(environment())) # df <- ... # dplyr commands # df # }) # implist <- as.mitml.list(implist) ## ------------------------------------------------------------------------------------ fit <- with(implist, { lmer(MathAchiev ~ 1 + Sex + I.SES + G.SES + (1|ID)) }) ## ------------------------------------------------------------------------------------ testEstimates(fit) ## ------------------------------------------------------------------------------------ testEstimates(fit, extra.pars = TRUE, df.com = 46) ## ------------------------------------------------------------------------------------ fit.null <- with(implist, { lmer(MathAchiev ~ 1 + Sex + (1|ID)) }) testModels(fit, fit.null) ## ------------------------------------------------------------------------------------ testModels(fit, fit.null, df.com = 46) ## ------------------------------------------------------------------------------------ testModels(fit, fit.null, method="D3") ## ------------------------------------------------------------------------------------ c1 <- c("I.SES", "G.SES") testConstraints(fit, constraints = c1) ## ------------------------------------------------------------------------------------ c2 <- c("G.SES - I.SES") testConstraints(fit, constraints = c2) ## ---- echo=F------------------------------------------------------------------------- cat("Author: Simon Grund (grund@ipn.uni-kiel.de)\nDate: ", as.character(Sys.Date())) mitml/inst/doc/Introduction.html0000644000176200001440000030613314127037122016472 0ustar liggesusers Introduction

Introduction


This vignette is intended to provide a first introduction to the R package mitml for generating and analyzing multiple imputations for multilevel missing data. A usual application of the package may consist of the following steps.

  1. Imputation
  2. Assessment of convergence
  3. Completion of the data
  4. Analysis
  5. Pooling

The mitml package offers a set of tools to facilitate each of these steps. This vignette is intended as a step-by-step illustration of the basic features of mitml. Further information can be found in the other vignettes and the package documentation.

Example data

For the purposes of this vignette, we employ a simple example that makes use of the studentratings data set, which is provided with mitml. To use it, the mitml package and the data set must be loaded as follows.

library(mitml)
data(studentratings)

More information about the variables in the data set can be obtained from its summary.

summary(studentratings)
#        ID       FedState     Sex              MathAchiev       MathDis      
#  Min.   :1001   B :375   Length:750         Min.   :225.0   Min.   :0.2987  
#  1st Qu.:1013   SH:375   Class :character   1st Qu.:440.7   1st Qu.:1.9594  
#  Median :1513            Mode  :character   Median :492.7   Median :2.4350  
#  Mean   :1513                               Mean   :495.4   Mean   :2.4717  
#  3rd Qu.:2013                               3rd Qu.:553.2   3rd Qu.:3.0113  
#  Max.   :2025                               Max.   :808.1   Max.   :4.7888  
#                                             NA's   :132     NA's   :466     
#       SES          ReadAchiev       ReadDis        CognAbility      SchClimate     
#  Min.   :-9.00   Min.   :191.1   Min.   :0.7637   Min.   :28.89   Min.   :0.02449  
#  1st Qu.:35.00   1st Qu.:427.4   1st Qu.:2.1249   1st Qu.:43.80   1st Qu.:1.15338  
#  Median :46.00   Median :490.2   Median :2.5300   Median :48.69   Median :1.65636  
#  Mean   :46.55   Mean   :489.9   Mean   :2.5899   Mean   :48.82   Mean   :1.73196  
#  3rd Qu.:59.00   3rd Qu.:558.4   3rd Qu.:3.0663   3rd Qu.:53.94   3rd Qu.:2.24018  
#  Max.   :93.00   Max.   :818.5   Max.   :4.8554   Max.   :71.29   Max.   :4.19316  
#  NA's   :281                     NA's   :153                      NA's   :140

In addition, the correlations between variables (based on pairwise observations) may be useful for identifying possible sources of information that may be used during the treatment of missing data.

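The correlation matrix shown below was computed with the following command from the vignette source, which excludes the first three columns (ID, federal state, and sex):

round(cor(studentratings[,-(1:3)], use="pairwise"),3)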
#             MathAchiev MathDis    SES ReadAchiev ReadDis CognAbility SchClimate
# MathAchiev       1.000  -0.106  0.260      0.497  -0.080       0.569     -0.206
# MathDis         -0.106   1.000 -0.206     -0.189   0.613      -0.203      0.412
# SES              0.260  -0.206  1.000      0.305  -0.153       0.138     -0.176
# ReadAchiev       0.497  -0.189  0.305      1.000  -0.297       0.413     -0.320
# ReadDis         -0.080   0.613 -0.153     -0.297   1.000      -0.162      0.417
# CognAbility      0.569  -0.203  0.138      0.413  -0.162       1.000     -0.266
# SchClimate      -0.206   0.412 -0.176     -0.320   0.417      -0.266      1.000

This illustrates that (a) most variables in the data set are affected by missing data, but also (b) that substantial relations exist between variables. For simplicity, we focus on only a subset of these variables.

Model of interest

For the present example, we focus on the two variables ReadDis (disciplinary problems in reading class) and ReadAchiev (reading achievement).

Assume we are interested in the relation between these variables. Specifically, we may be interested in the following analysis model

\[ \mathit{ReadAchiev}_{ij} = \gamma_{00} + \gamma_{10} \mathit{ReadDis}_{ij} + u_{0j} + e_{ij} \]

On the basis of the syntax used in the R package lme4, this model may be written as follows.

ReadAchiev ~ 1 + ReadDis + (1|ID)

In this model, the relation between ReadDis and ReadAchiev is represented by a single fixed effect of ReadDis, and a random intercept is included to account for the clustered structure of the data and the group-level variance in ReadAchiev that is not explained by ReadDis.

Generating imputations

The mitml package includes wrapper functions for the R packages pan (panImpute) and jomo (jomoImpute). Here, we will use the first option. To generate imputations with panImpute, the user must specify (at least):

  1. an imputation model
  2. the number of iterations and imputations

The easiest way of specifying the imputation model is to use the formula argument of panImpute. Generally speaking, the imputation model should include all variables that are either (a) part of the model of interest, (b) related to the variables in the model, or (c) related to whether the variables in the model are missing.

In this simple example, we include only ReadDis and ReadAchiev as the main target variables and SchClimate as an auxiliary variable.

fml <- ReadAchiev + ReadDis + SchClimate ~ 1 + (1|ID)

Note that, in this specification of the imputation model, all variables are included on the left-hand side of the model, whereas the right-hand side is left “empty”. This model allows for all relations between variables at Level 1 and 2 and is thus suitable for most applications of the multilevel random intercept model (for further discussion, see also Grund, Lüdtke, & Robitzsch, 2016, in press).

The imputation procedure is then run for 5,000 iterations (burn-in), after which 100 imputations are drawn every 100 iterations.

imp <- panImpute(studentratings, formula = fml, n.burn = 5000, n.iter = 100, m = 100)

This step may take a few seconds. Once the process is completed, the imputations are saved in the imp object.
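
Because this step can be time-consuming, it may be convenient to save the imputation object to disk and reload it in a later session (a sketch using the package's write.mitml and read.mitml functions; the file name is arbitrary):

write.mitml(imp, filename = "imputation.Rdata")
imp <- read.mitml("imputation.Rdata")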

Assessing convergence

In mitml, there are two options for assessing the convergence of the imputation procedure. First, the summary calculates the “potential scale reduction factor” (\(\hat{R}\)) for each parameter in the imputation model. If this value is noticeably larger than 1 for some parameters (say \(>1.05\)), a longer burn-in period may be required.

summary(imp)
# 
# Call:
# 
# panImpute(data = studentratings, formula = fml, n.burn = 5000, 
#     n.iter = 100, m = 100)
# 
# Cluster variable:         ID 
# Target variables:         ReadAchiev ReadDis SchClimate 
# Fixed effect predictors:  (Intercept) 
# Random effect predictors: (Intercept) 
# 
# Performed 5000 burn-in iterations, and generated 100 imputed data sets,
# each 100 iterations apart. 
# 
# Potential scale reduction (Rhat, imputation phase):
#  
#          Min   25%  Mean Median   75%   Max
# Beta:  1.000 1.001 1.001  1.001 1.002 1.003
# Psi:   1.000 1.001 1.001  1.001 1.001 1.002
# Sigma: 1.000 1.000 1.000  1.000 1.000 1.001
# 
# Largest potential scale reduction:
# Beta: [1,3], Psi: [2,1], Sigma: [2,1]
# 
# Missing data per variable:
#     ID ReadAchiev ReadDis SchClimate FedState Sex MathAchiev MathDis SES  CognAbility
# MD% 0  0          20.4    18.7       0        0   17.6       62.1    37.5 0

Second, diagnostic plots can be requested with the plot function. These plots consist of a trace plot, an autocorrelation plot, and some additional information about the posterior distribution. Convergence can be assumed if the trace plot is stationary (i.e., does not “drift”), and the autocorrelation is within reasonable bounds for the chosen number of iterations between imputations.

For this example, we examine only the plot for the parameter Beta[1,2] (i.e., the intercept of ReadDis).

plot(imp, trace = "all", print = "beta", pos = c(1,2))

Taken together, both \(\hat{R}\) and the diagnostic plots indicate that the imputation model converged, setting the basis for the analysis of the imputed data sets.

Completing the data

In order to work with and analyze the imputed data sets, the data sets must be completed with the imputations generated in the previous steps. To do so, mitml provides the function mitmlComplete.

implist <- mitmlComplete(imp, "all")

The resulting object is a list that contains the 100 completed data sets.

Analysis and pooling

In order to obtain estimates for the model of interest, the model must be fit separately to each of the completed data sets, and the results must be pooled into a final set of estimates and inferences. The mitml package offers the with function to fit various statistical models to a list of completed data sets.

In this example, we use the lmer function from the R package lme4 to fit the model of interest.

library(lme4)
fit <- with(implist, lmer(ReadAchiev ~ 1 + ReadDis + (1|ID)))

The resulting object is a list containing the 100 fitted models. To pool the results of these models into a set of final estimates and inferences, mitml offers the testEstimates function.

testEstimates(fit, extra.pars = TRUE)
# 
# Call:
# 
# testEstimates(model = fit, extra.pars = TRUE)
# 
# Final parameter estimates and inferences obtained from 100 imputed data sets.
# 
#              Estimate Std.Error   t.value        df   P(>|t|)       RIV       FMI 
# (Intercept)   582.186    14.501    40.147  4335.314     0.000     0.178     0.152 
# ReadDis       -35.689     5.231    -6.822  3239.411     0.000     0.212     0.175 
# 
#                         Estimate 
# Intercept~~Intercept|ID  902.868 
# Residual~~Residual      6996.303 
# ICC|ID                     0.114 
# 
# Unadjusted hypothesis test as appropriate in larger samples.

The estimates can be interpreted in a manner similar to the estimates from the corresponding complete-data procedure. In addition, the output includes diagnostic quantities such as the fraction of missing information (FMI), which can be helpful for interpreting the results and understanding problems with the imputation procedure.

References

Grund, S., Lüdtke, O., & Robitzsch, A. (2016). Multiple imputation of multilevel missing data: An introduction to the R package pan. SAGE Open, 6(4), 1–17. doi: 10.1177/2158244016668220 (Link)

Grund, S., Lüdtke, O., & Robitzsch, A. (in press). Multiple imputation of missing data for multilevel models: Simulations and recommendations. Organizational Research Methods. doi: 10.1177/1094428117703686 (Link)


# Author: Simon Grund (grund@ipn.uni-kiel.de)
# Date:   2021-10-05
mitml/inst/doc/Level2.Rmd0000644000176200001440000001442014002023537014710 0ustar liggesusers---
title: "Imputation of Missing Data at Level 2"
output:
  rmarkdown::html_vignette:
    css: "css/vignette.css"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Imputation of Missing Data at Level 2}
  %\VignetteEncoding{UTF-8}
---

---

```{r setup, include=FALSE, cache=FALSE}
library(knitr)
set.seed(123)
options(width=87)
opts_chunk$set(background="#ffffff", comment="#", collapse=FALSE, fig.width=9, fig.height=9, warning=FALSE, message=FALSE)
```

This vignette illustrates the use of `mitml` for the treatment of missing data at Level 2. Specifically, the vignette addresses the following topics:

1. Specification of the two-level imputation model for missing data at both Level 1 and 2
2. Running the imputation procedure

Further information can be found in the other [vignettes](https://github.com/simongrund1/mitml/wiki) and the package [documentation](https://cran.r-project.org/package=mitml/mitml.pdf).

## Example data

For the purposes of this vignette, we make use of the `leadership` data set, which contains simulated data from 750 employees in 50 groups, including ratings on job satisfaction, leadership style, and work load (Level 1) as well as cohesion (Level 2).

The package and the data set can be loaded as follows.

```{r}
library(mitml)
data(leadership)
```

The `summary` of the data shows that all variables are affected by missing data.

```{r}
summary(leadership)
```

The following data segment illustrates this fact, including cases with missing data at Level 1 (e.g., job satisfaction) and 2 (e.g., cohesion).

```{r, echo=FALSE}
leadership[73:78,]
```

In the following, we will employ a two-level model to address missing data at both levels simultaneously.

## Specifying the imputation model

The specification of the two-level model involves two components, one pertaining to the variables at each level of the sample (Goldstein, Carpenter, Kenward, & Levin, 2009; for further discussion, see also Enders, Mistler, & Keller, 2016; Grund, Lüdtke, & Robitzsch, in press).

Specifically, the imputation model is specified as a list with two components, where the first component denotes the model for the variables at Level 1, and the second component denotes the model for the variables at Level 2.

For example, using the `formula` interface, an imputation model targeting all variables in the data set can be written as follows.

```{r}
fml <- list( JOBSAT + NEGLEAD + WLOAD ~ 1 + (1|GRPID) , # Level 1
             COHES ~ 1 )                                # Level 2
```

The first component of this list includes the three target variables at Level 1 and a fixed (`1`) as well as a random intercept (`1|GRPID`). The second component includes the target variable at Level 2 with a fixed intercept (`1`).

From a statistical point of view, this specification corresponds to the following model
$$
\begin{aligned}
\mathbf{y}_{1ij} &= \boldsymbol\mu_{1} + \mathbf{u}_{1j} + \mathbf{e}_{ij} \\
\mathbf{y}_{2j} &= \boldsymbol\mu_{2} + \mathbf{u}_{2j} \; ,
\end{aligned}
$$
where $\mathbf{y}_{1ij}$ denotes the target variables at Level 1, $\mathbf{y}_{2j}$ the target variables at Level 2, and the right-hand side of the model contains the fixed effects, random effects, and residual terms as mentioned above.

Note that, even though the two components of the model appear to be separated, they define a single (joint) model for all target variables at both Level 1 and 2.
Specifically, this model employs a two-level covariance structure, which allows for relations between variables at both Level 1 (i.e., correlated residuals at Level 1) and 2 (i.e., correlated random effects residuals at Level 2). ## Generating imputations Because the data contain missing values at both levels, imputations will be generated with `jomoImpute` (and not `panImpute`). Except for the specification of the two-level model, the syntax is the same as in applications with missing data only at Level 1. Here, we will run 5,000 burn-in iterations and generate 20 imputations, each 250 iterations apart. ```{r, results="hide"} imp <- jomoImpute(leadership, formula = fml, n.burn = 5000, n.iter = 250, m = 20) ``` By looking at the `summary`, we can then review the imputation procedure and verify that the imputation model converged. ```{r} summary(imp) ``` Due to the greater complexity of the two-level model, the output includes more information than in applications with missing data only at Level 1. For example, the output features the model specification for variables at both Level 1 and 2. Furthermore, it provides convergence statistics for the additional regression coefficients for the target variables at Level 2 (i.e., `Beta2`). Finally, it also becomes visible that the two-level model indeed allows for relations between target variables at Level 1 and 2. This can be seen from the fact that the potential scale reduction factor ($\hat{R}$) for the covariance matrix at Level 2 (`Psi`) was largest for `Psi[4,3]`, which is the covariance between cohesion and the random intercept of work load. ## Completing the data The completed data sets can then be extracted with `mitmlComplete`. ```{r} implist <- mitmlComplete(imp, "all") ``` When inspecting the completed data, it is easy to verify that the imputations for variables at Level 2 are constant within groups as intended, thus preserving the two-level structure of the data. ```{r, echo=FALSE} implist[[1]][73:78,] ``` ###### References Enders, C. K., Mistler, S. A., & Keller, B. T. (2016). Multilevel multiple imputation: A review and evaluation of joint modeling and chained equations imputation. *Psychological Methods*, *21*, 222–240. doi: 10.1037/met0000063 ([Link](https://doi.org/10.1037/met0000063)) Goldstein, H., Carpenter, J. R., Kenward, M. G., & Levin, K. A. (2009). Multilevel models with multivariate mixed response types. *Statistical Modelling*, *9*, 173–197. doi: 10.1177/1471082X0800900301 ([Link](https://doi.org/10.1177/1471082X0800900301)) Grund, S., Lüdtke, O., & Robitzsch, A. (in press). Multiple imputation of missing data for multilevel models: Simulations and recommendations. *Organizational Research Methods*. doi: 10.1177/1094428117703686 ([Link](https://doi.org/10.1177/1094428117703686)) --- ```{r, echo=F} cat("Author: Simon Grund (grund@ipn.uni-kiel.de)\nDate: ", as.character(Sys.Date())) ```