Rtsne/0000755000176200001440000000000014534330563011353 5ustar liggesusersRtsne/NAMESPACE0000644000176200001440000000047214531100546012566 0ustar liggesusers# Generated by roxygen2: do not edit by hand S3method(Rtsne,data.frame) S3method(Rtsne,default) S3method(Rtsne,dist) export(Rtsne) export(Rtsne_neighbors) export(normalize_input) import(Rcpp) importFrom(stats,model.matrix) importFrom(stats,na.fail) importFrom(stats,prcomp) useDynLib(Rtsne, .registration = TRUE) Rtsne/LICENSE0000644000176200001440000000736514531100546012364 0ustar liggesusersAll the files in this package are covered by a BSD-style licence, but with different attribution requirements and conditions for: * The following files in the src directory: tsne.cpp, tsne.h, datapoint.h, sptree.cpp, sptree.h vptree.h * The rest of the files ================================================================= The following license applies to the following files in the src directory: tsne.cpp, tsne.h, datapoint.h, sptree.cpp, sptree.h vptree.h Copyright (c) 2014, Laurens van der Maaten (Delft University of Technology) All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. All advertising materials mentioning features or use of this software must display the following acknowledgement: This product includes software developed by the Delft University of Technology. 4. Neither the name of the Delft University of Technology nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY LAURENS VAN DER MAATEN ''AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL LAURENS VAN DER MAATEN BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ================================================================= The following license applies to the rest of the files: Copyright (c) 2014, Jesse Krijthe Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. Neither the name of Delft University of Technology nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Rtsne/tools/0000755000176200001440000000000014531111226012501 5ustar liggesusersRtsne/tools/example-1.png0000644000176200001440000012023014531111226014776 0ustar liggesusersPNG  IHDRz4iCCPkCGColorSpaceGenericRGB8U]hU>+$΃Ԧ5lRфem,lAݝi&3i)>A['!j-P(G 3k~s ,[%,-:t} }-+*&¿ gPG݅ج8"eŲ]A b ;l õWϙ2_E,(ۈ#Zsێ<5)"E6N#ӽEkۃO0}*rUt.iei #]r >cU{t7+ԙg߃xuWB_-%=^ t0uvW9 %/VBW'_tMۓP\>@y0`D i|[` hh)Tj0B#ЪhU# ~yhu fp#1I/I"0! 'Sdd:J5ǖ"sdy#R7wAgdJ7kʕn^:}nWFVst$gj-tԝr_װ_7Z ~V54V }o[G=Nd>-UlaY5V}xg[?k&>srq߀].r_r_qsGjy4k iQܟBZ-<(d=dKO a/zv7]ǰod}sn?TF'|3Nn#I?"mzv~K=گsl<b|_|4>?pߋQrib 2* (Ѧh{28oIyes8';Z9h6g>xRx'b8ՃWOϫ[xn%|^z}%x c8eXIfMM*i_@IDATxxz R(BE|TQT@UHU( 6DAE@$IH{qlv!4W33;{α( P(@ PHF:OC P(@%? P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P( P(@ 05*7OF P(`M{X`-Z77{;E P{{{|pqq`iQ(s(:Jƍ1}tj)@ P@֭nݺg=~5puuEӦM|(@ Px!!!Ay8 P(@ 0R(@ <`(@ Pp+(@ P`Ay8 P@i.Ƶ+HKCAaAi~R$,ReRƸ'ekY1do8?oV@=\AZT)@ Pf$p.:uñ俰$rޭ1^)nomjUp$O^=m)7[UX_IPv @om(@@nA {kDCysGsؗ-V_\lNT׳ w 4h'S׽\ϊ(*SHk}O*3^$ M??\oVATM =)A/yzm `ZT)@ Pf$PQdrm+*pJ9n-JX[XW; o K K ,7Rz?'kZc^zRv\q 0 SL@64ǿu-dy\ P@)hTgu7٨=735>-%Wȏ paFy(@ P:XGY{JV;41K% v]-Ш̋)ME#M:(@ P(pIt4)[\WYG18UApCd;h F#w})@ P&&zOFj^^OKsE_"KN)/L1/ǘ @sQ"x3UxGY{QC9QFf쫪&L<*L:3\܂OC}Ţ ru5V^^=ąx ސ Q@OQ^.E-БW,' {#z T=OFr^"6 kK⏄=X1tY!f ^%P(@ Pf*p.͢EOI\S|YVmj)k6qo5v:bLAѥ xU:o  (@ P|D#ٹВ*jK]ghͺb`hb^h*(r?K8zul yz P0czn՗?*e`Nۢ柪smz"Qɩ+#Vds@QE5(`FݛAn2hI<|'W U1K؟*"zv}.f]| nJ`j=\S},Ŀۆ",(@W0|R{-^xGFW+jhz^.rdUPs P X[`@#\P ^ \ЫƏb8|Գze~G;O0% ( `ZT)@ Pf"PXXEP|dg 5?YekOIqF&Zo%H2,t-K="EN-sGƣ|M0叀(` .No .X`3&*QNڏ^GR<[rSŗ: !P$R?T\\\JVn(@ P@&sUT)Q1϶rm5f]V /3^m ߳JM>@K ܳ@ @1qDTPPn];iiYo(@ @tVJ r<٭hVT`vVfLuB"#L:.$;p($fb*c [𑑑hѢ,,,Э[7S.]¦Myf޽*U2'E Pxd$i5cڹ1>h&*QN/#?;bl XZXkc^V9 J݂?v{S^ ;v꺹(`֖֘>E-gsq'A[ Y^6YlK|Q=j}v}m;a]s=r u(*Pf͚/z 0+)@ P.PnF~a]///}VV^ڝ\aX!=7NtMZӻxbWPΡ" [K;|yoP1X?܉y(u{ ĠW/QsL H*>""k֬!os(@ PJ okeѺ?2C{Sd JV$j.GDNʮGk}ra-o[2) KvLsC @ի'N`Сׯ mbΝhٲ޶V7nʲOOO/q?@ P(6DrmD,J;M8=NBGކωU#Ԇ*$G]+8V\~djW;W@ @Preន(Q\#00P%獎?ķl_2B P07ߩ6O|f\IU>9/'aXq,rfT_ε13H{8ՁFui T2jy!JZ\\\TnN P0IT͹6ƞOWKBp2-D^lh6lvHt >{|SY~nxlѾ]b:5F PxQũz/nG+g[{ =? !N5p>>wkOy15Di'[ ;\μwmriJ!zz۹´J]ij(@ PS@ƬGE*&<-PY>QSY5Jd5Z:EF5n,/(@ Pw PXX_~m</UhCŗcrVhY&6A4y/r޼yx |MU~V9$$V(@{(D!N:/~(fvYA^8 T Cޮ,np ޮ:N#q\Yz\L_=Y)??ƍ3oWH Pxd#w:~ϪT?@|B\):Ӫ.,m֢] >uBW] .΀̀Tsk.C{6yTjȑw}SLҿ˃SL@ :3 ~sagiTפb[cMI ǽ;snbn<vaE`*3D,j Wk7X9^hj^2R(zF QPx(@݇~Oح׽Og-8oBNA0b}>6:=?j})Iw5O0 8SDeTW$nZRM5g~\Qe{ c+MţnQބJmjee֭[WK(@ Ŝb`8T-6G95P]= O|1`i 6ILkzyLx!`v_5S$- _1[y75EOktSuL<32IǮv@ @MUQ9A[XP{}.̳55scA9m|"2UдӐ22W.7G%oa!&}Tq\?qF>W~W=/hNEkE7 `K(@ ]!F&no(q %#DqKTE뽢.C uAxwt$CL:uSĭAõe2DKCjˤ&C%;05B P,d_veM t<9=@#y= y;NOX%6E3C0h4ro=vS'׾q1ͺh#K(jqL oK(@ hd>gSϖUGyۻz'kgfNs$]DWqW|룿u7$|"}./@["rJlJ̍b߷ꋾMO#(@ Pzm|:b˵8|PgkkDQC>4|Eb\ͺCGxYZI/[yGva^)nV![s 0qv|4-ɫ(@ -.߯DeKa8zBG*m1ľu~Z{ 0(Ty"þCEuԓ79J/5,% $Y^&.Mz"#y<7Zگ(pwr:ԭo9K}Ձ[@2hګ=V^T%ct9Uk#x_X'nw-ɹ3e{苬LUKأɚgZ~viptK^ (@ PDYcs/! >, ݁gE`n5wPDe^DC?϶*ݰqh?nU?ģ枏sL?؎)GE-Tw|OOrME?P%oi1&} P= ȹO.FJ1鍰Rb Zq~݈q93B$5RguEew \d)K29X9d-?Qѱ vc=Ep JVK|4 =*(@ P- lDe룎\CdWNK V3|w}#n啮Vg q]]\}|5_s(FX{{neQ :m!{C/ (@ PDdtQEo}{~+.1S`ka'A!z9qLG^(J1]%>'uv} LE?^(@r EOJ8[j=~H.:'>\.G56܉Ĝx)7ed|%22Cvŏצ'PNyE(@{\[$;UrVyJ֜aq0i^.?oylYO}mө`%MvYxKK >ojʋ(@ ܽ̎5~'fM0H1yCQͩZxAs rDiIK!515)[u樢.S#r~(`j;(' ѯi(>):u'fPpqE\N zi'RާY!#K+M )'OUyh~ @;StO}h>!jz۔Q CT $6F /1m:׼hjnE 3P՜k!]ZWܢꈶyPIHf)@ P|=AuT/vg7f٪s(@) qQeo=h}Ɣja!rYUsj$t@KSFY*R.+|{^9(@ P@T8jkH 23A D]o? @NVV[kJ2ˎ %TEe U: e @S)z:6"HӔ.z5\jgQvl?Tt}?P@ p((@3πH>*4roi,uY<^EϥZ[.;5/zv֩,w|8z+(@ L4+qu^m|˵U8%:#,,w1>M[`"K=]CTGSDK͡ǻ᫫+Yt:JM]1{O(`ϋ֛K>_ {ZOSe][ @5|~Zg=6TEe[!sԜѰPTrn9,bj'(`UcN/0 *\\U'}ebU~)h t ?Gkڕʅ0xU v}g_Iŭ{w|b @SDVzenCXfgQG )͎G]䥊:uBtIt07+RY~h P,T4]w+h}/en 9q' oHUj!Сcuy pyz PAXNڏ3URѝD@)#8"^.rT&)ɑnsm秳/[#)@ P:X>&&oI w~Xk7P|]|ȄTe|oGVK95H&!7@ P?'?N:r=g?fX,ͪڝGAFzUUKZfMe} \'Ţ:U)[P(ⲯs]ßqzU]'$zˠOdsj}Ȍ AګT=P7rY Yx P?h(J-Vŗ-=fSzPDž`z;n(@ ٔNM[4q ]ʼkK0y7 T\( x A((@T E|D8X9aHPn܌Yt#3/X_EW篟znsE轨=(@R tEoA/Ukaif#,P*Ez.:}-yRLo^Cu ܫ(@ P!;Y9c,_ϩ ,=7Q|6z[Dy+"^ P@)(H׶,q]Dy[}*8V!4/H.[(@ <eq-;y)zh:[0Pc((@@m;aѧ]׳! ο0HO(`Lނ76E P0@rhz~AގǕG(@ <yz:>A/# ϏǦ+1$,V5VzxamÝv;jG}G: p R@^R l쾎7n㧧(@ R O]1H~`5. kOvx8|‚ kkTX46>8W"/! =Ҝf&$$3y(`>1.z\wq>4v~+>8v~[G%ڹ-QbX5#ovn3a @322pa!??߄2^(@ @xz(^[BSO׮<O9= G E_g_kk݊ç0e.n.pmv.)Pйs⭷{=xzzQF AժUsN}(P>9o`0i"|3U‚/} 2/! [oSGP׳bܩA2(}-욋T8+W>/7_ sr`ig=v'r}aNnU|nFn'/_I&M6ٳ'RRRn:<3?P^=|B P(ME 1BIٽ_BCtn{9n6h? 9|U HH¬!'154GPå."2.XZw+l- ]09}l-y5jVApnrnhTͮ|43R~,YMb׮]MGF GaŊw$)) W\)q?_v\(@ P-5hOyY:iyoEz7t 0teXu/mF^ca?؀܅հNDgFĽb9/膪ε0 mR9`ȉty>;E~J:fd$ de7&k#`0=~8O ݻwW&0+z2d/^{ջo=5(@ P-)Bb 7Q󚸍^A% yv"R>v~ˀuԁqX.i'5}TٸHĀ1n\ξ>G:=\+-=vIO]Qu2w^M##chIov0qe7o_@e𗛛 <ӈϢ Iw?[%(@{r<ռOy[[#s)ECbg>4c^Ohނt8ЮO׆M~ ޱi ĩ&g3ihW7 fC +G|7/ڔ L-:t»uEQJ6iDe/XGU_~eZ r4 (@ P4 XYXgkL RRnƝ(.srT$ƣW/:tOثLXj7'VnΈʺ P9tr%g,mSxJ<"{B#=R%><UbYy`0TWy;w_5 6?3<Ά |/\>[w)@ P9y ߍwBG["IBSѵhbX4*cIUf֛߮R<}y"R~6~;In<V!|iQ)V|/VIÐ7!0/918>P: Y -GA娣%Oג75A֭[ @ڪv޿gSO=N:VO(@ P [_߆]qbHD>OB95 *}6hő">'v./gFh/ۻgt}Z/sùY}V~1ZM7\njEFh~}b d\8Gݞ3O{Qv XX#{1[~Oc={{{c޼yE7ͭ?[)@ PA [|kF`gek7$[EWXzg+W5O4=/ }\y{>. úF85EQs^ e0u>;b Z~4>>B,DOUHg\ͽc}д4o>d2ҁ0vXOزe JK'(@ Y2^Y.*=? 2S^;ez/[_mް?@^T d^<3##28gG yR|04YOۛIY/#5 ]6lh)G;55MDѳpjpNT @oߎ\v ^^7+_E.] /`Ͱ*I P0n[k#݌\[vu Wc^A&}=?/E7#&ut8"2Eb t- ]D#yR WF&{YnD8֩s{y!aOUNs~2φ31oGDD$$M)/vءZ\5tx%(@R )_c ΥFQSZI_E@|nVA;j)GN[xw1tFv >_dG=[{@I8cXmd' ?Hw"+,9Wcpc]Q<׭] [;<w+G@xu. 3`_-ig$ %'],|rm P(e򖼷*L(dgYHRbVLٞhz!v{Τmu"ݿ+l|H- Wg-Fº7Dm}xέwnZOEȘbZj.Qf/_^RQZ6lz(@ P z9=I?rd# &FC}mT6}?Ɖ9k?+fX!ǟU[z07PNx'2AK`zs$ޏ+o}KڦrOE%k&}*>7%]![c6hm۶ٳKz S(pX|m8"F(r($[d[9߳xi%o/2K&^r4 UKr.ǵS"G -/J==>v~:Ǽ2HrGNotk(ڲiE$+g*T Wںuk֘1-[$*~X P@i%PW"4[#aoiȌbV 8|#>T%^gӱ^u{ކSîbt ٗaemP uN9j|Y_s@dQz|e >P,bU`~imb'Ĝqj#̆^Q?A~LK{;y *qgEl4V"#(z'8֯og㱌/`p4..|Ǐjժly7hܗ(/ HU\ K,vlrٝ3]0v!76K"=gXѪwpc6\}sX:9vzv?>1ʕ+{nȖw4jGAڵmܗ(@A@njZ_Ck↨pս#zBS^ "ȧ<г'l)?G^B2lW8dcTuKݽcK1"s {,<)`0U6m]}"wfL Pȹ}'gHwVDGafq{\K;[djF 'F1EON`A9"=YU\>ph85i#!TLk/Tw00Y#p»pYÝ)@ Px96e;Knt^xۀ(٫3^<D(tESwg"2ճB/¢p}k9Y i&e${H\s?Q.ooO| -ZK Q.(@ .hS.zˆIxDXȓPXym/NK>f,&3Su-+(' >5V=+i#jz LB*zuş~#F1cnVsr+\A P,Dˣi#Xf^=W]}QJ)x{r5֙vcCDOo˴@_Bz>o\Tu@_ y3#as\p[MRSws(1?,Xʖ-^zќRM P01y ]b'J'Uٶ{nF pZQJ(]T5A_&KnO6G(T w$NYOTNU䫊z Gmo耘+!K;q1}pQx@ӓLg+(@ ֹ,izot;A'I$aRQrоWbSF [ u]UΏ'-`0OݻxM[WG P0qHu9oFzdHnns?gy} TŹT]ϢT." I*9v,2_v`W)v-q(\SRWiY12*r1}||et|$x(@H?z'} [2~gn|a~mbTa),:Uϳok7Wr+dXt Hfo'F>m|85q&j7+'GTٸO C== Z GML7޻w^fLL `oo(@C SdG>G.pxT\:C<J%WDv$ ۔G6H~ܤxtn wEfW߻JeīOY0_b~oK˗/cHOOתoTʔ)ggg|B P)i|E܌@IDATDMMTL?t*KK|t` !_/BWxͬ{ nam-nWP5C+g`zTR{Ar͂ xTh"7wVA)@ PJ W_w -Ղ6ɹmyM +Q)q.UD^'ZsFM¼|CҮ=g,Bk3-g<'0.[ -[Dxx8*uaذa5k,YS(@XZ} -ѢdkwD^3K\FPm[uW++eD2U D0+o?ZlO4W9rtS^^^75KtqبuG+^ʾ\(@ Px\6Õ7Er8F<~hyK΢c"hmU'%tu0ɻ#~z}"y>R]d{8եyC\z-Qh~ +ӞOS@/{U5?ewZժU1h?oC֭sN (P d3;B2cLTlYEls>/u6"pNE٤QTޔA<hޕnI^}&`W!>TbzۧO5ڵk? /V?I&^z(_ )@ PD3.l) ە+>m dO[z؋6bSfLaϾKO5<мd8Ԩ=^'gEW ҡC1bf̘ ^fpuuUL7oV9+TLx(@'":?CK*KݵcH]B(ᔟAgsꭱK)-Arx⛐Q85+O`*G9̙!]~2,my52ZtS(Pdc.  {nTi<A(J}7獊,[k-oi" 0:>1_CвeoN$|}}_ff&<25#(@ PK@v;[? >X9!E{n.8KNUY>pT|Ue~YaX{ܼZ}|m^ @%?wyғZjYSo;WP( fv>H65fh)3\K HT MV.NV|ڟmZyG34++ }EZZT=Ў;.j^a:;(@ PB QMh/wm|X~çDKMXJ/XeȬz'[4B{akk k6KfE>-ZLH۹P(P:,D%fq. b%_F|ʍ2 mOiGҗQMLذCB 7 8P9WƂ J:S(PJ_ O gC>}ߋ3ᨶKfJtJ bJ-mmPc:\ݎ~;VMnjv.n'NPs=5?\(@ P(w(Lhɔ}e#Kq^d<ߗzh^<<3Q C%vvv: )))j+W0z~ C': Pn 1[Z>9y7橢7[)j.Kp,ީ3. ןl1J) ە++n^*e s:E7ek:4nzelӼz^(@ PD•µ#"Õ7e:5_(.^}s; {5No?4ף}*t օ#;~Sv~}o.%E/˪-gٲe˿m݇)@ PoHbt:URwHwDSdHL]day\d5?5 U6}F.׋qxvcG v !?})[r;ʂx"q+WN=V^...3f dF P;;EmdAb1=f}VX >F|meDg_fz#~~]@<.Fför?a-]o.q4''<*4qDtI_{f*]WOK P0aܫ^ e  7.7VmB'dfέ {_\1 V_}ϯj;#3P j=o,%vɓ'_Z裏BN>;w.jY( by IEv58Fr_xMe?!3?5TҊ(4Q)g@=ObϟGFP4ҿҥKU|(@ P?ȹ'ȹk39ĈnWf; zEb+\$*a!##ŗ"Pb*k|;v 2w eʔ)hԸ8Uh'(@ P,H4o!d'2H.Z:_>&X:ñ~ue DxQH+[ّш8G%y]-?)k^^^4h DeI|:u*v ZDGGCGP>֭cDz=QKGݥ#}'?K?e2L2 Y9! `}G jjQyJ(q+n݊޽{~ꇭ|o۶->N`7GFFEst Tew&9`ӦMؼy3vލJnxG(@ <<'!W>{fH/gdQgH?po|)JZdy<2τ΍j6ZOT~dp$}zSNE#\(@ PS_-g(2Hv6a-.1"*p3Bc9옔yZO6x=ijU>}3a)ݞl85b]fcȐ!=? @us*JZd!~vv P.ݿ+ε" r;G rt5֞3Sȑ6P':&̀ RvGu*Lܶ >]=>vSn@P>-J]ڳgO 4&&zRsE VcnG'0[w%,lYWTޖB;ɻ}ի'N`СV|/Lڹs'ZlY|_SLRFAljYz"XZN koT_u^>wQ^?jag=%͍}]Inl<<݊s RJ*W2eШ(5ꙛnLbn(@ N%\NPEom5}/T#"H̢tr@{Bq:Je[[[ʀ (@ PûȾ<&u^ŕ!g}Ox%!Qrs&W} jyO)PP~(@ ܹ@i/t8 XgZrh/[ާ l=Que潅(TkNo_70-(@SASխtv 3KbwVhW1&y Q` 5Veˀ5hx~C_0%*^(@ ܋O`OE}y<]E8dϘO"/Q MY}! clDQ\Z4D-_1çܨ EA! DKgz [t%Hދw"6.(@ <8+C 068N5 )DD5ڜMQk7+"BLCy"=qNT\XֻK{ eAg}Sk ܭW\ѣUuy`ڵ"yӦMS] .(@ < caQ5fY皨*Oꬅlb3թSnn_5삃Ը6ž}U5Cae U%ē$:#AvJ:{^z yq |w,p|e.]7o6|ǰtRD P}+C5h8zLJ=\gEdεX *?WpUj GA~*J2HFh_ ALx.G@e7hwQyꪊʒP(+w[/ݕSv@z92*zZIb9փT2!۔FvxJNv&J7'yC|D=PnJ C͚5!Kƍq5U6(@ Prj?aJHҼrS^yEQo!nkP֮E?al>jȻS瑗2>{".OxQZ+G@+Vqqq9<Ѐ|A P<&y *[cAlTq]'XX[̨~>ESri Neǯ1u0o %P"zCEHZR(*R(ѫli#}wY_ +4hÆ -6m*ݻw^Utԩ2j(i۶" 'P!oEyw+jjK2),ï/QZRdۋoDԷ3F-!da)]\-UOȵTK{{\%  `%pb wxQF~>NtSk*% Фt=uU]wT ۤ Ve 'YŒo:@*|<ج8Ѳ^y@s/#_!j$';H MtR4i0a_b~HH'<'-k(kԿՐNohN4~JMXKҔKm>W #>lZ_͘5HlvT$T )@7n,J& +vٳGkٹs{$@$@$@A@ _R rdVږ$Z*HvX@!RW7yȪ"ũe:Q׳~ +ّKL~R#9R|MYr_^j֬)N6#  :E;Rd'h%Gы4OSyjVB]#^WǃR莛5x_R)ANk4N/Zţ]ߗh?8c$,[򈚰>Q㒻Y71>)l)@׬Y#]vҿ4%ӠA${H'N; ճ h˷Uljg;n1sO? [=Xd/gdqO-%9/X,)$Q%gZA}9s?ui _U*T F;M:UNAUro!vŋ$%  8lRh V+m\(˹.rQ暴^=CV ϋZ7WSVw|XS"EP iǎ$ !   Q!>Lenu)֩8'Oߚ)5%*_ X: yV4}ZW۵B'FH\)@Yp̙3G^uɝ;|ra0`\wu.y. ɩ]dۋoJ=Q;ky*EkIϼV|nQ|O]O6 I %xTBo!)66V|ArロCx @p@4| d 'XN"V1ʂ$:J}CY,Ĕ,jMiH Çe˖YQ.]잝;wB '  bX/ݷ]EtNeC{oc,lr{fqLtYJjyzp#ҴҧO[g5d޼yf(7n @(@KYuKYQ!4MXvG0R\Ʋ$gx8R;\!&_zqyѣw^Yn4m4`iժUwHHHB@<#eݽ匦YBzX9ҩX4Jr^\*ْ|͘#4W\\~ʔ)zLLL1^zҮ];ޓ .,~4MCŦĎ4( /7E~70a<߇z   8o9iҋ,FE E>VZ/6   &;^yW[%4+hrbV+ˉhv6*i ЬC$@$@$xEK*t9䮽M+!kUbflRD d&3.M$@$@ABĖ&>+oZ?)'B֝r`ڏxXWf9/  "p|F߰FgӇ%lI4syD3Z=*OnA6,E%  "-w.IOKlVK?qHu4_A>;"4_ 4(C$@$@~QŸ%az9ld&wdM9uJ qcKWʾ&H P?CށHH@RJmnK;! p93:y.oC3 yqI44GM$@$@L NU)7HiqNFJ/LXID Sr*.5_( da ' )<_a`ߖ@yC:uc_Rr/B-9~  8GQTW%9?jbzTG*֭M%h`R_)s3O'M'E$@$@aEKxGB$@$@YNU*@_N#e_yZr_qi9S  KHoi_~O`' \`\@y;    P·GIHHH.0 #   Hh|xHHHH@y;    =v,ZH֮]+ϚGIHHHF 믿.{N_B I͚5/+̙3    ry@OٻwرcgϞҨQ#i۶>|X>iٲ,X@V9$@$@$@$@'rѣvs[nRzu6l7     %x_d߷[:v(K.M    I hBB:uJr!-Z۷'… 'OcΝ|N@'HHHHٳgKll\}R`A뮻Zjm60`?^Fqn4l<8N<) sɒ%xb̞=_7|#FΝ;C=tn4ƍWZB(Q"xHHHH@ ИlGt{=9}m7o\5k&K @ $%,UT={P|PI$@$@$@AH l-xHHHH@Ȇ xm    L'6xҷo_)[lCHHHH2N SjbŤw)f? @ h0HHHH @ &   (@/,OލHHHH ia    KHHHH @ &   & Ӆ»Q;jժ}'r )p$@$@$Qh~ׯs=?~+dȐ!;wn9tPM3"  0!@ hHƘ1c^#GH| ANEҤI7odoHIH-~8T=ZfΜ_ٳGVXvHHH PP΍mlUW]ul޼! h_qJUVO?$eʔ{$@$@$@%@X|yx妛ncǎ%K d߾}RJ$!  Qd@ǎe7o^8přOڵk-%2 @4 GW^˒%K,yɼ$@$@$@ENj@F_mNٳgիH"ҰaCx @(@Sã$7|SV\)Ҽysi׮]#ںu̚5K>,UVzIl)?Z~}gڵe~ҥrW,$  3g~.7ʥ^j%5{9iݺo>U:eKaiܸ*THND&Mm/;ml2U٦M=7pYƳn@|:ԒcεjՒ6m*wu_ރ$@$@$(@#; G:C|[N̙#Ǐ{Oy+yeYҥ8b -**Ĭ]t@k;v0˦1B|'쯽Z \_eڴi&rm o߾ޗrHHH nf ,~햻}bΜ9?'NDesv"<-*{M6D*}+R8˻g{;ӻ-D=zs-- 5Q߲rHH"h@N~]&"Q?X'}=|pٲeYP:$t|f AdRfMn.Vl^z/ĉ-pW_m&&&t /GvЄݻSVr  cL/7XZ1~O3wZ "ÄhmX҇UNG;R*k/1=,r"Zxq rݡ'<,͗(QB}Q˜WnQ8 n!x,\r]… '#  p#@ hƿK#WB8ºKD$<wD/9W^y%"9/i o*<Ԓ訿FևVZ9juh $romo.]_~ ZZMT|✻۽>5ߨ LOZH\>С+W.;O]wKs7Td*=Mi Aas8A춯VHG#ttĩV9r\a;wnjz6"F}VIJv:ygc4Zd/܁8} r41S<֭&wkBY hrԒ?3rHdHHJ5n̰<=x`y'/xflHWoǑ(̱\{FXolXGI#œz\@)|} V B7 _RDcAZ$ǘ݀'O$@$@F>h qT-!&ts(Ӊ'zi1^Q"2`~J{F$@$@aC] %t@JvNJ`-鹿ֶV}IHH%%Hf@LX2Aa4j#*VC3Z z a5*TK׀"['eյ<\'1cسPI,{7 nU}L* k%#m"ّIDYJ|\>&&FTZ{`U XNPHKo=O  5\7FU,Z&<<bIPsR~}O:%2D"Dz$LBu$TVj׮h7ÿm8e@QguKTmB$Tn =~Heߞq")\0FPE%<\zH9@jľp~B$ !˛ y6ESmH]" 6X 8/`"b +%DI,_'A̺5ui߬.*U|kZBzȬ= k(,xfٲem# ͛o+,S TT[-?), b ޲\5ϰ: aݢEdyGI$@$@F4HBKr"y; JBT$dɒ.z,eo޼aeoDCbm7D*~zQR}X^D>#PAS=@,c+u :CjL,cCL"]AZXuQcpP;DAypu' #.[1A8j2z4'* b{&M,C!4!Y8! a5R9"בe QӷT\?g>}LTB27|cBu_1jVN,kPUHɰ̢kJ7ްR?!=4w=,!|M4~ @P4_O @,)g">JaY~gv۳>kXnG9N,YaNJTe>nKV,P=Bb Voѓ `Dc%AQ]#G3 v݆y`ꊼPN!|1V[QM'oZin-_|5Qi*=cPqk}dyrbhWANQ7ϵ-VLG-v?*Gn:*nOާrHH BPZ@OΈ@"/NcZ"-bVF,7#'>ăk(,iЏ4I-n(婉ħp7(ƎjDX(,BѐX|tBR:Ɔܣhrڔ)Sd޼yMrvTp۹/ RBpT.]O6 h_Ad ~hD$/SkXFs}1SOYvAܡAAH/ȳ eqMtZy^xv),en+E#,ckSV>DP$/(0>,Cd \!,!lĎ%{AK#GewAV ;>>IHH p iE)LO{=G=ݩ?-F;xX%;M8yu,5QoTaԭ[Qq?j=RXǹh*ʎÍmt{gM@l\*R\`"G;\dVxr4prZX[Pm}x [ hd7pu%)Uj'yVTG^=$ g [P4B|ٳMLλ;mMEj2qF} 24_N͡h丣y;]w4ՒՍGw\kUZ9ʕ+;jyay:jttIc_P՜eZcմKqQG:jtX'4Օ޽6ғHH"hg stlKS9<,XQs^ XB`a$ﶭ<}!< ]✔,t aՄ%}jTAi#Fحp\bOV_ܧ{ |ՕшxG}WMXVXdb ǽԗ=h(}O7HHŸ@(АFo>C0 [PgDJdB:#5HC鲵<74I*n&6Մܹsg8}'TI?jv!!(EgΜ)C ߉'G K(%'*6TB@iɒ%2c O'—Cq[R@4)˪"ֽ%?IHH (EF` 7M\v- Q!\ҭ|%܃rj<=W4 { ?H8qRp3՗ԓC}n%@7GA7j9GJz"WHf[Bz,!%;Q GGrB"&X򮲑 @9_! Q ΪԠt")4IDAT8,\lKzB!;*HVJl@9L(+!_tKN,>!}RR#P`%w0`EC̢( -CsaMV^X!|G|{=5A<$҇飐r]HHH BɾU:jyJqDA34Ȟ}UY"@G+96Gpz X:,2惉 Ç[j~Vrʹg )<|PLM?e~2֧Kv|Kɯ}?bBd0Ym IVA9:uKGڽ:t瘧$@$@aK> |Rj9r%Xa)-r OF'0\3Kp=JiOv.}YjqD4%Y;m\K( W)Vk#Ma R$  @%x!cǎrC坌zc YzAvko}5§b"m?-H {?,#PsZ#)1|Iы <ؖQ  n%p;q}MчzoW _v @P5375Y`zjKV툼g)s2k~\,w_,w\H.cy? Kƣ;R(OMr+V1'ש-k]Dw?hkHF$@$Ba >$##~ ~~L.TAwsupi~N.j D\P߿MS9Y?Bu%h’#wZ9^z9W>裏4|UqHH r %xe-Z"ܱhM6XVeH@>M|e!"v dCJ f>ɭb bm:o<ׯo߾* >G;~&7onaF$@$@D $ ub#`'?QWqӄ&f):}o_1~ @(iaQEnƴ߻I:I%#-*|D?rB"yGPY QժU}]@ &X%$@$@$j(@}JBA 5-ևq|%?Q݉HHB@ К5kl&7kQN$TWZN:RvNqHBuېS-ZdkԎ'R 寽Z6mZP> ;ZEZje5 uY+Фt$dqm݉|7nݺUHHH 2 83H T 3f$2D&$}WI܎hv6  '!iJe˘OR &^  `e7!,ׯ_oP) X͝;ׂs %rPwXD l$*9kvj#M<8N  8!+@A3w^K!eb P   \K6%K;>    !6|r˧$l9     ?FHHHHhHHH™@ŋK߾}lٲ87   y!KXbһwoL?qV:1a7޺u@I!TTlCri)\pL3}e2{ ՠG)ƍa0Y P-r:t"gҜ9y(Q4"]_勠YgTvjb-L<s̑>}XI{ l>SHHHH,.e-    , @    (@ϲ @| YgYpHHHH Pfd>HHHH, г,E$@$@$@$(@2A$@$@$@$pY"   Y    8K, n3g|8iB@bb}W^ 7SN~ϸfիH |8ϴbmsAJ@?≘7ѽ{&l|%РAd_vm;m$'sJ{ 2{7X =/EG{NkM7 Z%,kȐ!I >kt[%SE<>Xju߷e4P s/g?|53-.JvS愲?$w'|<ܟÆ ]_VNo(|B݆z깻 BXf1bD}񽧇Rh>#-2g<2sZM7{=x HTܡCd4h@x޼yR~};yfY|@Բ/v9C{YD.ؼyJ5Bb3 ;D!U8Ls-;wtJRulmڴ1oDn4g`~J#Mԩ#'Pƍd|Ip ˕+g?HBl2iٲh>`6D<߫W/A;Ɵqo}v]32Y"aD?B?Pr!XCW.@c3{& m3_X4q7kLZhdd!p{Z@C%qbՈ|—"bmӦMRLteF?K?.w @@ p >p   <9c   ( ЀIHHH PF;IHHH (@'   #@y3&    (~>HHH"hsΘHHHJ4p   <9c   ( ЀIHHH PF;IHHH (@'   #@y3&    (~>HHH"hsΘHHHJ4p   <9c   ( ЀIHHH PF;IHHH (@'   #@y3&    (~>HHH"hsΘHHHJ4p ӧO+"O<$$$d&ɓy PH d 9sF~m9~x!Axҿ3f+VL|MDRR%9qDۣn]t(z瓝۱qF)Y|wC&LܣQF~n @J(@S"~ 'gIN;]{9yꩧdԨQҳgOys޽R`A1b^ZW֮]+ʕˎ_veb ֭[;6l ͛7{w[oܸq~ @J(@S"~ '`79Ν;ׄc=$::Ze&ؾ{IVZI˖-^?Ӄhɒ%׵kW=9sʕ+K"E<}#GZ (`ϟIH/ PXI$L"zap 2~xۆ~+\rtA8͜9|'˔)#ԩSsͲ! l9ލݻwK*eiѢ޽{Kǎ]f͚6fs,\|ܹs]޼yeRV-Os;#?+WN 9/o6< HhhxH bw-}QV@86mT#t|>T`|衇?e[oU^u>|vM7ɤIK^{m!a %!o.5jW#GWs=j_mŞt|IG-֧Ď@>::GOӦM}QQh/ .;*0=>|ع袋֧שW86 {={w]0m:i{9Kv4PшyöݹsgJ*Sغuk2>wuӰaCw$@$"Z@kr (+¬*dfiݺY7 XԶ  K4bV?|\,?{Ig  ٴi:tH*j'S-NmےgnY|]%w߆tK8AGe# ` @ oc `tY-[fVKD#izfd͞gx[C=l o>+uj(MR|ymЉ5lX5yRF$@@4H$+$]/"v ?X`-OoW^iNM3 A;H]+|އ;ܥ}Sg(3g*sDPr1}aHBh(%H/84@jp ;aBZ%,#|ܸq/N:~ۉzDC!]"_z%K;:} |H|Ɓe}$&RC!rbk.рd"wwqU8=p ,HkwڇhG&>^D$)' 9TZբ͍6ydnݺN٭ӧOQ4dž-G<=} .W-N:9 M%Ɓx:GsZ1t р%G+ٶoVQ?V;ש2d羾R _9꾐F'C 5<HrgR3D,}wu3ڐQw|Jd? [![CRAN version](http://www.r-pkg.org/badges/version/Rtsne)](https://cran.r-project.org/package=Rtsne/) [![R-CMD-check](https://github.com/jkrijthe/Rtsne/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/jkrijthe/Rtsne/actions/workflows/R-CMD-check.yaml) [![codecov.io](https://codecov.io/github/jkrijthe/Rtsne/coverage.svg?branch=master)](https://app.codecov.io/github/jkrijthe/Rtsne?branch=master) [![CRAN mirror downloads](http://cranlogs.r-pkg.org/badges/Rtsne)](https://cran.r-project.org/package=Rtsne/) # R wrapper for Van der Maaten’s Barnes-Hut implementation of t-Distributed Stochastic Neighbor Embedding ## Installation To install from CRAN: ``` r install.packages("Rtsne") # Install Rtsne package from CRAN ``` To install the latest version from the github repository, use: ``` r if(!require(devtools)) install.packages("devtools") # If not already installed devtools::install_github("jkrijthe/Rtsne") ``` ## Usage After installing the package, use the following code to run a simple example (to install, see below). ``` r library(Rtsne) # Load package iris_unique <- unique(iris) # Remove duplicates set.seed(42) # Sets seed for reproducibility tsne_out <- Rtsne(as.matrix(iris_unique[,1:4])) # Run TSNE plot(tsne_out$Y,col=iris_unique$Species,asp=1) # Plot the result ``` ![](tools/example-1.png) # Details This R package offers a wrapper around the Barnes-Hut TSNE C++ implementation of \[2\] \[3\]. Changes were made to the original code to allow it to function as an R package and to add additional functionality and speed improvements. # References \[1\] L.J.P. van der Maaten and G.E. Hinton. “Visualizing High-Dimensional Data Using t-SNE.” Journal of Machine Learning Research 9(Nov):2579-2605, 2008. \[2\] L.J.P van der Maaten. “Accelerating t-SNE using tree-based algorithms.” Journal of Machine Learning Research 15.1:3221-3245, 2014. \[3\] Rtsne/man/0000755000176200001440000000000014531106340012115 5ustar liggesusersRtsne/man/normalize_input.Rd0000644000176200001440000000216214531106340015624 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/utils.R \name{normalize_input} \alias{normalize_input} \title{Normalize input data matrix} \usage{ normalize_input(X) } \arguments{ \item{X}{matrix; Input data matrix with rows as observations and columns as variables/dimensions.} } \value{ A numeric matrix of the same dimensions as \code{X} but centred by column and scaled to have a maximum deviation of 1. } \description{ Mean centers each column of an input data matrix so that it has a mean of zero. Scales the entire matrix so that the largest absolute of the centered matrix is equal to unity. } \details{ Normalization avoids numerical problems when the coordinates (and thus the distances between observations) are very large. Directly computing distances on this scale may lead to underflow when computing the probabilities in the t-SNE algorithm. Rescaling the input values mitigates these problems to some extent. } \examples{ iris_unique <- unique(iris) # Remove duplicates iris_matrix <- as.matrix(iris_unique[,1:4]) X <- normalize_input(iris_matrix) colMeans(X) range(X) } \author{ Aaron Lun } Rtsne/man/Rtsne.Rd0000644000176200001440000002725214531106340013507 0ustar liggesusers% Generated by roxygen2: do not edit by hand % Please edit documentation in R/Rtsne.R, R/neighbors.R \name{Rtsne} \alias{Rtsne} \alias{Rtsne.default} \alias{Rtsne.dist} \alias{Rtsne.data.frame} \alias{Rtsne_neighbors} \title{Barnes-Hut implementation of t-Distributed Stochastic Neighbor Embedding} \usage{ Rtsne(X, ...) \method{Rtsne}{default}( X, dims = 2, initial_dims = 50, perplexity = 30, theta = 0.5, check_duplicates = TRUE, pca = TRUE, partial_pca = FALSE, max_iter = 1000, verbose = getOption("verbose", FALSE), is_distance = FALSE, Y_init = NULL, pca_center = TRUE, pca_scale = FALSE, normalize = TRUE, stop_lying_iter = ifelse(is.null(Y_init), 250L, 0L), mom_switch_iter = ifelse(is.null(Y_init), 250L, 0L), momentum = 0.5, final_momentum = 0.8, eta = 200, exaggeration_factor = 12, num_threads = 1, ... ) \method{Rtsne}{dist}(X, ..., is_distance = TRUE) \method{Rtsne}{data.frame}(X, ...) Rtsne_neighbors( index, distance, dims = 2, perplexity = 30, theta = 0.5, max_iter = 1000, verbose = getOption("verbose", FALSE), Y_init = NULL, stop_lying_iter = ifelse(is.null(Y_init), 250L, 0L), mom_switch_iter = ifelse(is.null(Y_init), 250L, 0L), momentum = 0.5, final_momentum = 0.8, eta = 200, exaggeration_factor = 12, num_threads = 1, ... ) } \arguments{ \item{X}{matrix; Data matrix (each row is an observation, each column is a variable)} \item{...}{Other arguments that can be passed to Rtsne} \item{dims}{integer; Output dimensionality (default: 2)} \item{initial_dims}{integer; the number of dimensions that should be retained in the initial PCA step (default: 50)} \item{perplexity}{numeric; Perplexity parameter (should not be bigger than 3 * perplexity < nrow(X) - 1, see details for interpretation)} \item{theta}{numeric; Speed/accuracy trade-off (increase for less accuracy), set to 0.0 for exact TSNE (default: 0.5)} \item{check_duplicates}{logical; Checks whether duplicates are present. It is best to make sure there are no duplicates present and set this option to FALSE, especially for large datasets (default: TRUE)} \item{pca}{logical; Whether an initial PCA step should be performed (default: TRUE)} \item{partial_pca}{logical; Whether truncated PCA should be used to calculate principal components (requires the irlba package). This is faster for large input matrices (default: FALSE)} \item{max_iter}{integer; Number of iterations (default: 1000)} \item{verbose}{logical; Whether progress updates should be printed (default: global "verbose" option, or FALSE if that is not set)} \item{is_distance}{logical; Indicate whether X is a distance matrix (default: FALSE)} \item{Y_init}{matrix; Initial locations of the objects. If NULL, random initialization will be used (default: NULL). Note that when using this, the initial stage with exaggerated perplexity values and a larger momentum term will be skipped.} \item{pca_center}{logical; Should data be centered before pca is applied? (default: TRUE)} \item{pca_scale}{logical; Should data be scaled before pca is applied? (default: FALSE)} \item{normalize}{logical; Should data be normalized internally prior to distance calculations with \code{\link{normalize_input}}? (default: TRUE)} \item{stop_lying_iter}{integer; Iteration after which the perplexities are no longer exaggerated (default: 250, except when Y_init is used, then 0)} \item{mom_switch_iter}{integer; Iteration after which the final momentum is used (default: 250, except when Y_init is used, then 0)} \item{momentum}{numeric; Momentum used in the first part of the optimization (default: 0.5)} \item{final_momentum}{numeric; Momentum used in the final part of the optimization (default: 0.8)} \item{eta}{numeric; Learning rate (default: 200.0)} \item{exaggeration_factor}{numeric; Exaggeration factor used to multiply the P matrix in the first part of the optimization (default: 12.0)} \item{num_threads}{integer; Number of threads to use when using OpenMP, default is 1. Setting to 0 corresponds to detecting and using all available cores} \item{index}{integer matrix; Each row contains the identity of the nearest neighbors for each observation} \item{distance}{numeric matrix; Each row contains the distance to the nearest neighbors in \code{index} for each observation} } \value{ List with the following elements: \item{Y}{Matrix containing the new representations for the objects} \item{N}{Number of objects} \item{origD}{Original Dimensionality before TSNE (only when \code{X} is a data matrix)} \item{perplexity}{See above} \item{theta}{See above} \item{costs}{The cost for every object after the final iteration} \item{itercosts}{The total costs (KL-divergence) for all objects in every 50th + the last iteration} \item{stop_lying_iter}{Iteration after which the perplexities are no longer exaggerated} \item{mom_switch_iter}{Iteration after which the final momentum is used} \item{momentum}{Momentum used in the first part of the optimization} \item{final_momentum}{Momentum used in the final part of the optimization} \item{eta}{Learning rate} \item{exaggeration_factor}{Exaggeration factor used to multiply the P matrix in the first part of the optimization} } \description{ Wrapper for the C++ implementation of Barnes-Hut t-Distributed Stochastic Neighbor Embedding. t-SNE is a method for constructing a low dimensional embedding of high-dimensional data, distances or similarities. Exact t-SNE can be computed by setting theta=0.0. } \details{ Given a distance matrix \eqn{D} between input objects (which by default, is the euclidean distances between two objects), we calculate a similarity score in the original space: \deqn{ p_{j | i} = \frac{\exp(-\|D_{ij}\|^2 / 2 \sigma_i^2)}{\sum_{k \neq i} \exp(-\|D_{ij}\|^2 / 2 \sigma_i^2)} } which is then symmetrized using: \deqn{ p_{i j}=\frac{p_{j|i} + p_{i|j}}{2n}.} The \eqn{\sigma} for each object is chosen in such a way that the perplexity of \eqn{p_{j|i}} has a value that is close to the user defined perplexity. This value effectively controls how many nearest neighbours are taken into account when constructing the embedding in the low-dimensional space. For the low-dimensional space we use the Cauchy distribution (t-distribution with one degree of freedom) as the distribution of the distances to neighbouring objects: \deqn{ q_{i j} = \frac{(1+ \| y_i-y_j\|^2)^{-1}}{\sum_{k \neq l} 1+ \| y_k-y_l\|^2)^{-1}}.} By changing the location of the objects y in the embedding to minimize the Kullback-Leibler divergence between these two distributions \eqn{ q_{i j}} and \eqn{ p_{i j}}, we create a map that focusses on small-scale structure, due to the asymmetry of the KL-divergence. The t-distribution is chosen to avoid the crowding problem: in the original high dimensional space, there are potentially many equidistant objects with moderate distance from a particular object, more than can be accounted for in the low dimensional representation. The t-distribution makes sure that these objects are more spread out in the new representation. For larger datasets, a problem with the a simple gradient descent to minimize the Kullback-Leibler divergence is the computational complexity of each gradient step (which is \eqn{O(n^2)}). The Barnes-Hut implementation of the algorithm attempts to mitigate this problem using two tricks: (1) approximating small similarities by 0 in the \eqn{p_{ij}} distribution, where the non-zero entries are computed by finding 3*perplexity nearest neighbours using an efficient tree search. (2) Using the Barnes-Hut algorithm in the computation of the gradient which approximates large distance similarities using a quadtree. This approximation is controlled by the \code{theta} parameter, with smaller values leading to more exact approximations. When \code{theta=0.0}, the implementation uses a standard t-SNE implementation. The Barnes-Hut approximation leads to a \eqn{O(n log(n))} computational complexity for each iteration. During the minimization of the KL-divergence, the implementation uses a trick known as early exaggeration, which multiplies the \eqn{p_{ij}}'s by 12 during the first 250 iterations. This leads to tighter clustering and more distance between clusters of objects. This early exaggeration is not used when the user gives an initialization of the objects in the embedding by setting \code{Y_init}. During the early exaggeration phase, a momentum term of 0.5 is used while this is changed to 0.8 after the first 250 iterations. All these default parameters can be changed by the user. After checking the correctness of the input, the \code{Rtsne} function (optionally) does an initial reduction of the feature space using \code{\link{prcomp}}, before calling the C++ TSNE implementation. Since R's random number generator is used, use \code{\link{set.seed}} before the function call to get reproducible results. If \code{X} is a data.frame, it is transformed into a matrix using \code{\link{model.matrix}}. If \code{X} is a \code{\link{dist}} object, it is currently first expanded into a full distance matrix. } \section{Methods (by class)}{ \itemize{ \item \code{Rtsne(default)}: Default Interface \item \code{Rtsne(dist)}: tsne on given dist object \item \code{Rtsne(data.frame)}: tsne on data.frame }} \section{Supplying precomputed distances}{ If a distance matrix is already available, this can be directly supplied to \code{Rtsne} by setting \code{is_distance=TRUE}. This improves efficiency by avoiding recalculation of distances, but requires some work to get the same results as running default \code{Rtsne} on a data matrix. Specifically, Euclidean distances should be computed from a normalized data matrix - see \code{\link{normalize_input}} for details. PCA arguments will also be ignored if \code{is_distance=TRUE}. NN search results can be directly supplied to \code{Rtsne_neighbors} to avoid repeating the (possibly time-consuming) search. To achieve the same results as \code{Rtsne} on the data matrix, the search should be conducted on the normalized data matrix. The number of nearest neighbors should also be equal to three-fold the \code{perplexity}, rounded down to the nearest integer. Note that pre-supplied NN results cannot be used when \code{theta=0} as they are only relevant for the approximate algorithm. Any kind of distance metric can be used as input. In contrast, running \code{Rtsne} on a data matrix will always use Euclidean distances. } \examples{ iris_unique <- unique(iris) # Remove duplicates iris_matrix <- as.matrix(iris_unique[,1:4]) # Set a seed if you want reproducible results set.seed(42) tsne_out <- Rtsne(iris_matrix,pca=FALSE,perplexity=30,theta=0.0) # Run TSNE # Show the objects in the 2D tsne representation plot(tsne_out$Y,col=iris_unique$Species, asp=1) # data.frame as input tsne_out <- Rtsne(iris_unique,pca=FALSE, theta=0.0) # Using a dist object set.seed(42) tsne_out <- Rtsne(dist(normalize_input(iris_matrix)), theta=0.0) plot(tsne_out$Y,col=iris_unique$Species, asp=1) set.seed(42) tsne_out <- Rtsne(as.matrix(dist(normalize_input(iris_matrix))),theta=0.0) plot(tsne_out$Y,col=iris_unique$Species, asp=1) # Supplying starting positions (example: continue from earlier embedding) set.seed(42) tsne_part1 <- Rtsne(iris_unique[,1:4], theta=0.0, pca=FALSE, max_iter=350) tsne_part2 <- Rtsne(iris_unique[,1:4], theta=0.0, pca=FALSE, max_iter=650, Y_init=tsne_part1$Y) plot(tsne_part2$Y,col=iris_unique$Species, asp=1) \dontrun{ # Fast PCA and multicore tsne_out <- Rtsne(iris_matrix, theta=0.1, partial_pca = TRUE, initial_dims=3) tsne_out <- Rtsne(iris_matrix, theta=0.1, num_threads = 2) } } \references{ Maaten, L. Van Der, 2014. Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research, 15, p.3221-3245. van der Maaten, L.J.P. & Hinton, G.E., 2008. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research, 9, pp.2579-2605. } Rtsne/DESCRIPTION0000644000176200001440000000176614534330563013073 0ustar liggesusersPackage: Rtsne Type: Package Title: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation Version: 0.17 Authors@R: c( person("Jesse", "Krijthe", ,"jkrijthe@gmail.com", role = c("aut", "cre")), person("Laurens", "van der Maaten", role = c("cph"), comment = "Author of original C++ code") ) Description: An R wrapper around the fast T-distributed Stochastic Neighbor Embedding implementation by Van der Maaten (see for more information on the original implementation). License: file LICENSE URL: https://github.com/jkrijthe/Rtsne Encoding: UTF-8 Imports: Rcpp (>= 0.11.0), stats LinkingTo: Rcpp Suggests: irlba, testthat RoxygenNote: 7.2.3 NeedsCompilation: yes Packaged: 2023-12-07 01:33:30 UTC; jkrijthe Author: Jesse Krijthe [aut, cre], Laurens van der Maaten [cph] (Author of original C++ code) Maintainer: Jesse Krijthe License_is_FOSS: yes Repository: CRAN Date/Publication: 2023-12-07 11:50:11 UTC Rtsne/tests/0000755000176200001440000000000014531100546012506 5ustar liggesusersRtsne/tests/testthat/0000755000176200001440000000000014534330563014355 5ustar liggesusersRtsne/tests/testthat/test_Rtsne.R0000644000176200001440000001535714531100546016636 0ustar liggesuserscontext("Rtsne main function") # Prepare iris dataset iris_unique <- unique(iris) # Remove duplicates Xscale<-normalize_input(as.matrix(iris_unique[,1:4])) distmat <- as.matrix(dist(Xscale)) # Run models to compare to iter_equal <- 500 test_that("Scaling gives the expected result", { Xscale2 <- scale(as.matrix(iris_unique[,1:4]), center = TRUE, scale=FALSE) Xscale2 <- scale(Xscale, center=FALSE, scale=rep(max(abs(Xscale)),4)) expect_equivalent(Xscale,Xscale2) }) test_that("Manual distance calculation equals C++ distance calculation", { # Does not work on 32 bit windows skip_on_cran() # Exact set.seed(50) tsne_matrix <- Rtsne(as.matrix(iris_unique[,1:4]),verbose=FALSE, is_distance = FALSE,theta=0.0,max_iter=iter_equal, pca=FALSE, normalize=TRUE) set.seed(50) tsne_dist <- Rtsne(distmat, verbose=FALSE, is_distance = TRUE, theta = 0.0, max_iter=iter_equal) expect_equal(tsne_matrix$Y,tsne_dist$Y) # Inexact set.seed(50) tsne_matrix <- Rtsne(as.matrix(iris_unique[,1:4]),verbose=FALSE, is_distance = FALSE, theta=0.1,max_iter=iter_equal,pca=FALSE) set.seed(50) tsne_dist <- Rtsne(distmat, verbose=FALSE, is_distance = TRUE, theta = 0.1, max_iter=iter_equal, pca=FALSE) expect_equal(tsne_matrix$Y,tsne_dist$Y) }) test_that("Accepts dist", { # Exact set.seed(50) tsne_out_dist_matrix <- Rtsne(distmat, is_distance = TRUE, theta=0.0, max_iter=iter_equal) set.seed(50) tsne_out_dist <- Rtsne(dist(Xscale),theta=0.0,max_iter=iter_equal) expect_equal(tsne_out_dist$Y,tsne_out_dist_matrix$Y) # Inexact set.seed(50) tsne_out_dist_matrix <- Rtsne(distmat, is_distance = TRUE, theta=0.1, max_iter=iter_equal) set.seed(50) tsne_out_dist <- Rtsne(dist(Xscale), theta=0.1, max_iter=iter_equal) expect_equal(tsne_out_dist$Y,tsne_out_dist_matrix$Y) }) test_that("Accepts data.frame", { # Exact set.seed(50) tsne_out_matrix <- Rtsne(as.matrix(iris_unique[,1:4]),dims=1,verbose=FALSE, is_distance = FALSE,theta=0.0,max_iter=iter_equal,pca=FALSE) set.seed(50) tsne_out_df <- Rtsne(iris_unique[,1:4],dims=1,verbose=FALSE, is_distance = FALSE,theta=0.0,pca=FALSE,max_iter=iter_equal,num_threads=1) expect_equal(tsne_out_matrix$Y,tsne_out_df$Y) # Inexact set.seed(50) tsne_out_matrix_bh <- Rtsne(as.matrix(iris_unique[,1:4]),verbose=FALSE, is_distance = FALSE,theta=0.1,pca=FALSE,max_iter=iter_equal) set.seed(50) tsne_out_df <- Rtsne(iris_unique[,1:4],verbose=FALSE, is_distance = FALSE,theta=0.1,pca=FALSE,max_iter=iter_equal,num_threads=1) expect_equal(tsne_out_matrix_bh$Y,tsne_out_df$Y) }) test_that("OpenMP with different threads returns same result",{ # Does not work on windows skip_on_cran() skip_on_ci() set.seed(50) tsne_out_df1 <- Rtsne(iris_unique[,1:4],dims=3,verbose=FALSE, is_distance = FALSE, theta=0.1,pca=FALSE,max_iter=iter_equal,num_threads=1) set.seed(50) tsne_out_df2 <- Rtsne(iris_unique[,1:4],dims=3,verbose=FALSE, is_distance = FALSE, theta=0.1,pca=FALSE,max_iter=iter_equal,num_threads=2) set.seed(50) tsne_out_df3 <- Rtsne(iris_unique[,1:4],dims=3,verbose=FALSE, is_distance = FALSE, theta=0.1,pca=FALSE,max_iter=iter_equal,num_threads=3) expect_equal(tsne_out_df1$Y,tsne_out_df2$Y) expect_equal(tsne_out_df2$Y,tsne_out_df3$Y) }) test_that("Continuing from initialization gives approximately the same result as direct run", { # Does not work exactly due to resetting of "gains". iter_equal <- 1000 extra_iter <- 200 #Exact set.seed(50) tsne_out_full <- Rtsne(iris_unique[,1:4], perplexity=3,theta=0.0,pca=FALSE, max_iter=iter_equal,final_momentum = 0) set.seed(50) tsne_out_part1 <- Rtsne(iris_unique[,1:4], perplexity=3,theta=0.0,pca=FALSE, max_iter=iter_equal-extra_iter,final_momentum = 0) tsne_out_part2 <- Rtsne(iris_unique[,1:4], perplexity=3,theta=0.0,pca=FALSE, max_iter=extra_iter,Y_init=tsne_out_part1$Y,final_momentum = 0) expect_equivalent(dist(tsne_out_full$Y),dist(tsne_out_part2$Y),tolerance=0.01) #Inexact set.seed(50) tsne_out_full <- Rtsne(iris_unique[,1:4],final_momentum=0,theta=0.1,pca=FALSE,max_iter=iter_equal) set.seed(50) tsne_out_part1 <- Rtsne(iris_unique[,1:4],final_momentum=0,theta=0.1,pca=FALSE,max_iter=iter_equal-extra_iter) set.seed(50) tsne_out_part2 <- Rtsne(iris_unique[,1:4],final_momentum=0,theta=0.1,pca=FALSE,max_iter=extra_iter,Y_init=tsne_out_part1$Y) expect_equivalent(dist(tsne_out_full$Y),dist(tsne_out_part2$Y),tolerance=0.01) }) test_that("partial_pca FALSE and TRUE give similar results", { # Only first few iterations iter_equal <- 5 set.seed(42) fat_data <- rbind(sapply(runif(200,-1,1), function(x) rnorm(200,x)), sapply(runif(200,-1,1), function(x) rnorm(200,x))) set.seed(42) tsne_out_prcomp <- Rtsne(fat_data, max_iter = iter_equal) set.seed(42) tsne_out_irlba <- Rtsne(fat_data, partial_pca = T, max_iter = iter_equal) # Sign of principal components are arbitrary so even with same seed tSNE coordinates are not the same expect_equal(tsne_out_prcomp$costs, tsne_out_irlba$costs, tolerance = .01, scale = 1) }) test_that("Error conditions", { expect_error(Rtsne("test", matrix(0, 50, 10), Y_init=Y_in, perplexity=10), "matrix") expect_error(Rtsne(distmat,is_distance = 3),"logical") expect_error(Rtsne(matrix(0,2,3),is_distance = TRUE),"Input") expect_error(Rtsne(matrix(0,100,3)),"duplicates") expect_error(Rtsne(matrix(0,2,3),pca_center = 2),"TRUE") expect_error(Rtsne(matrix(0,2,3),initial_dims=1.3),"dimensionality") expect_error(Rtsne(matrix(0,2,3),dims=4),"dims") expect_error(Rtsne(matrix(0,2,3),max_iter=1.5),"should") expect_error(Rtsne(matrix(0,2,3),Y_init=matrix(0,2,1)),"incorrect format") expect_error(Rtsne(matrix(0,2,3),perplexity = 0),"positive") expect_error(Rtsne(matrix(0,2,3),theta = -0.1),"lie") expect_error(Rtsne(matrix(0,2,3),theta = 1.001),"lie") expect_error(Rtsne(matrix(0,2,3),stop_lying_iter = -1),"positive") expect_error(Rtsne(matrix(0,2,3),mom_switch_iter = -1),"positive") expect_error(Rtsne(matrix(0,2,3),momentum = -0.1),"positive") expect_error(Rtsne(matrix(0,2,3),final_momentum = -0.1),"positive") expect_error(Rtsne(matrix(0,2,3),eta = 0.0),"positive") expect_error(Rtsne(matrix(0,2,3),exaggeration_factor = 0.0),"positive") expect_error(Rtsne(matrix(0,2,3)),"perplexity is too large") }) test_that("Verbose option", { expect_output(Rtsne(iris_unique[,1:4],pca=TRUE,verbose=TRUE,max_iter=150),"Fitting performed") }) Rtsne/tests/testthat/test_neighbors.R0000644000176200001440000001066614531100546017521 0ustar liggesuserscontext("Rtsne neighbor input") set.seed(101) # The original iris_matrix has a few tied distances, which alters the order of nearest neighbours. # This then alters the order of addition when computing various statistics; # which results in small rounding errors that are amplified across t-SNE iterations. # Hence, I have to simulate a case where no ties are possible. simdata <- matrix(rnorm(234*5), nrow=234) NNFUN <- function(D, K) # A quick reference function for computing NNs, to avoid depending on other packages. { all.indices <- matrix(0L, nrow(D), K) all.distances <- matrix(0, nrow(D), K) for (i in seq_len(nrow(D))) { current <- D[i,] by.dist <- setdiff(order(current), i) all.indices[i,] <- head(by.dist, ncol(all.indices)) all.distances[i,] <- current[all.indices[i,]] } list(index=all.indices, distance=all.distances) } test_that("Rtsne with nearest-neighbor input compares to distance matrix input", { D <- as.matrix(dist(simdata)) out <- NNFUN(D, 90) # 3 * perplexity all.indices <- out$index all.distances <- out$distance # The vptree involves a few R::runif calls, which alters the seed in the precomputed distance case. # This results in different random initialization of Y. # Thus, we need to supply a pre-computed Y as well. Y_in <- matrix(runif(nrow(simdata)*2), ncol=2) out <- Rtsne_neighbors(all.indices, all.distances, Y_init=Y_in, perplexity=30, max_iter=1) blah <- Rtsne(D, is_distance=TRUE, Y_init=Y_in, perplexity=30, max_iter=1) expect_equal(out$Y, blah$Y) # Trying again with a different number of neighbors. Y_in <- matrix(runif(nrow(simdata)*2), ncol=2) out <- Rtsne_neighbors(all.indices[,1:30], all.distances[,1:30], Y_init=Y_in, perplexity=10) blah <- Rtsne(D, is_distance=TRUE, Y_init=Y_in, perplexity=10) expect_equal(out$Y, blah$Y) }) test_that("Rtsne with nearest-neighbor input behaves upon normalization", { D <- as.matrix(dist(normalize_input(simdata))) out <- NNFUN(D, 90) # 3 * perplexity all.indices <- out$index all.distances <- out$distance # The vptree involves a few R::runif calls, which alters the seed in the precomputed distance case. # This results in different random initialization of Y. # Thus, we need to supply a pre-computed Y as well. Y_in <- matrix(runif(nrow(simdata)*2), ncol=2) out <- Rtsne_neighbors(all.indices, all.distances, Y_init=Y_in, perplexity=30, max_iter=1) blah <- Rtsne(D, is_distance=TRUE, Y_init=Y_in, perplexity=30, max_iter=1) expect_equal(out$Y, blah$Y) # Trying again with a different number of neighbors. Y_in <- matrix(runif(nrow(simdata)*2), ncol=2) out <- Rtsne_neighbors(all.indices[,1:30], all.distances[,1:30], Y_init=Y_in, perplexity=10) blah <- Rtsne(D, is_distance=TRUE, Y_init=Y_in, perplexity=10) expect_equal(out$Y, blah$Y) }) test_that("Rtsne with nearest-neighbor input gives no errors for no_dims 1 and 3", { D <- as.matrix(dist(simdata)) out <- NNFUN(D, 90) # 3 * perplexity all.indices <- out$index all.distances <- out$distance # The vptree involves a few R::runif calls, which alters the seed in the precomputed distance case. # This results in different random initialization of Y. # Thus, we need to supply a pre-computed Y as well. Y_in <- matrix(runif(nrow(simdata)*1), ncol=1) out <- Rtsne_neighbors(all.indices, all.distances, Y_init=Y_in, perplexity=30, max_iter=1, dims = 1) blah <- Rtsne(D, is_distance=TRUE, Y_init=Y_in, perplexity=30, max_iter=1,dims = 1) expect_equal(out$Y, blah$Y) Y_in <- matrix(runif(nrow(simdata)*3), ncol=3) out <- Rtsne_neighbors(all.indices, all.distances, Y_init=Y_in, perplexity=30, max_iter=1, dims = 3) blah <- Rtsne(D, is_distance=TRUE, Y_init=Y_in, perplexity=30, max_iter=1,dims = 3) expect_equal(out$Y, blah$Y) }) test_that("error conditions are correctly explored", { expect_error(Rtsne_neighbors("yay", matrix(0, 50, 10), Y_init=Y_in, perplexity=10), "matrix") expect_error(Rtsne_neighbors(matrix(0L, 50, 5), matrix(0, 50, 10), Y_init=Y_in, perplexity=10), "same dimensions") expect_error(Rtsne_neighbors(matrix(0L, 50, 10), matrix(0, 50, 10), Y_init=Y_in, perplexity=10), "invalid indices") expect_error(Rtsne_neighbors(matrix(51L, 50, 10), matrix(0, 50, 10), Y_init=Y_in, perplexity=10), "invalid indices") expect_error(Rtsne_neighbors(matrix(NA_real_, 50, 10), matrix(0, 50, 10), Y_init=Y_in, perplexity=10), "invalid indices") }) Rtsne/tests/testthat.R0000644000176200001440000000004514531100546014470 0ustar liggesuserslibrary(testthat) test_check("Rtsne")Rtsne/src/0000755000176200001440000000000014534220352012134 5ustar liggesusersRtsne/src/Rtsne.cpp0000644000176200001440000001161214534210736013741 0ustar liggesusers#include #include "tsne.h" using namespace Rcpp; Rcpp::List save_results(int N, int no_dims, const std::vector& Y, const std::vector& costs, const std::vector& itercosts, double theta, double perplexity, int D, int stop_lying_iter, int mom_switch_iter, double momentum, double final_momentum, double eta, double exaggeration_factor); // Function that runs the Barnes-Hut implementation of t-SNE // [[Rcpp::export]] Rcpp::List Rtsne_cpp(NumericMatrix X, int no_dims, double perplexity, double theta, bool verbose, int max_iter, bool distance_precomputed, NumericMatrix Y_in, bool init, int stop_lying_iter, int mom_switch_iter, double momentum, double final_momentum, double eta, double exaggeration_factor, unsigned int num_threads) { size_t N = X.ncol(), D = X.nrow(); double * data=X.begin(); if (verbose) Rprintf("Read the %lu x %lu data matrix successfully!\n", (unsigned long)N, (unsigned long)D); std::vector Y(N * no_dims), costs(N), itercosts(static_cast(std::ceil(max_iter/50.0))); // Providing user-supplied solution. if (init) { for (size_t i = 0; i < Y.size(); i++) Y[i] = Y_in[i]; if (verbose) Rprintf("Using user supplied starting positions\n"); } // Run tsne if (no_dims==1) { TSNE<1> tsne(perplexity, theta, verbose, max_iter, init, stop_lying_iter, mom_switch_iter, momentum, final_momentum, eta, exaggeration_factor, num_threads); tsne.run(data, N, D, Y.data(), distance_precomputed, costs.data(), itercosts.data()); } else if (no_dims==2) { TSNE<2> tsne(perplexity, theta, verbose, max_iter, init, stop_lying_iter, mom_switch_iter, momentum, final_momentum, eta, exaggeration_factor, num_threads); tsne.run(data, N, D, Y.data(), distance_precomputed, costs.data(), itercosts.data()); } else if (no_dims==3) { TSNE<3> tsne(perplexity, theta, verbose, max_iter, init, stop_lying_iter, mom_switch_iter, momentum, final_momentum, eta, exaggeration_factor, num_threads); tsne.run(data, N, D, Y.data(), distance_precomputed, costs.data(), itercosts.data()); } else { Rcpp::stop("Only 1, 2 or 3 dimensional output is suppported.\n"); } return Rcpp::List::create(Rcpp::_["Y"]=Rcpp::NumericMatrix(no_dims, N, Y.data()), Rcpp::_["costs"]=Rcpp::NumericVector(costs.begin(), costs.end()), Rcpp::_["itercosts"]=Rcpp::NumericVector(itercosts.begin(), itercosts.end())); } // Function that runs the Barnes-Hut implementation of t-SNE on nearest neighbor results. // [[Rcpp::export]] Rcpp::List Rtsne_nn_cpp(IntegerMatrix nn_dex, NumericMatrix nn_dist, int no_dims, double perplexity, double theta, bool verbose, int max_iter, NumericMatrix Y_in, bool init, int stop_lying_iter, int mom_switch_iter, double momentum, double final_momentum, double eta, double exaggeration_factor, unsigned int num_threads) { size_t N = nn_dex.ncol(), K=nn_dex.nrow(); // transposed - columns are points, rows are neighbors. if (verbose) Rprintf("Read the NN results for %lu points successfully!\n", (unsigned long)N); std::vector Y(N * no_dims), costs(N), itercosts(static_cast(std::ceil(max_iter/50.0))); // Providing user-supplied solution. if (init) { for (size_t i = 0; i < Y.size(); i++) Y[i] = Y_in[i]; if (verbose) Rprintf("Using user supplied starting positions\n"); } // Run tsne if (no_dims==1) { TSNE<1> tsne(perplexity, theta, verbose, max_iter, init, stop_lying_iter, mom_switch_iter, momentum, final_momentum, eta, exaggeration_factor, num_threads); tsne.run(nn_dex.begin(), nn_dist.begin(), N, K, Y.data(), costs.data(), itercosts.data()); } else if (no_dims==2) { TSNE<2> tsne(perplexity, theta, verbose, max_iter, init, stop_lying_iter, mom_switch_iter, momentum, final_momentum, eta, exaggeration_factor, num_threads); tsne.run(nn_dex.begin(), nn_dist.begin(), N, K, Y.data(), costs.data(), itercosts.data()); } else if (no_dims==3) { TSNE<3> tsne(perplexity, theta, verbose, max_iter, init, stop_lying_iter, mom_switch_iter, momentum, final_momentum, eta, exaggeration_factor, num_threads); tsne.run(nn_dex.begin(), nn_dist.begin(), N, K, Y.data(), costs.data(), itercosts.data()); } else { Rcpp::stop("Only 1, 2 or 3 dimensional output is suppported.\n"); } return Rcpp::List::create(Rcpp::_["Y"]=Rcpp::NumericMatrix(no_dims, N, Y.data()), Rcpp::_["costs"]=Rcpp::NumericVector(costs.begin(), costs.end()), Rcpp::_["itercosts"]=Rcpp::NumericVector(itercosts.begin(), itercosts.end())); } Rtsne/src/vptree.h0000644000176200001440000001740314531100546013616 0ustar liggesusers/* * * Copyright (c) 2014, Laurens van der Maaten (Delft University of Technology) * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the Delft University of Technology. * 4. Neither the name of the Delft University of Technology nor the names of * its contributors may be used to endorse or promote products derived from * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY LAURENS VAN DER MAATEN ''AS IS'' AND ANY EXPRESS * OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO * EVENT SHALL LAURENS VAN DER MAATEN BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY * OF SUCH DAMAGE. * */ /* This code was adopted with minor modifications from Steve Hanov's great tutorial at http://stevehanov.ca/blog/index.php?id=130 */ #include #include #include #include #include #include #include #include #include "datapoint.h" #ifndef VPTREE_H #define VPTREE_H double euclidean_distance(const DataPoint &t1, const DataPoint &t2) { double dd = .0; for(int d = 0; d < t1.dimensionality(); d++) dd += (t1.x(d) - t2.x(d)) * (t1.x(d) - t2.x(d)); return sqrt(dd); } double precomputed_distance(const DataPoint &t1, const DataPoint &t2) { double dd = .0; dd = t1.x(t2.index()); return dd; } template class VpTree { public: // Default constructor VpTree() : _root(0) {} // Destructor ~VpTree() { delete _root; } // Function to create a new VpTree from data void create(const std::vector& items) { delete _root; _items = items; _root = buildFromPoints(0, items.size()); } // Function that uses the tree to find the k nearest neighbors of target void search(const T& target, int k, std::vector* results, std::vector* distances) { // Use a priority queue to store intermediate results on std::priority_queue heap; // Variable that tracks the distance to the farthest point in our results double tau = DBL_MAX; // Perform the search search(_root, target, k, heap, &tau); // Gather final results results->clear(); distances->clear(); while(!heap.empty()) { results->push_back(_items[heap.top().index]); distances->push_back(heap.top().dist); heap.pop(); } // Results are in reverse order std::reverse(results->begin(), results->end()); std::reverse(distances->begin(), distances->end()); } private: std::vector _items; // Single node of a VP tree (has a point and radius; left children are closer to point than the radius) struct Node { int index; // index of point in node double threshold; // radius(?) Node* left; // points closer by than threshold Node* right; // points farther away than threshold Node() : index(0), threshold(0.), left(0), right(0) {} ~Node() { // destructor delete left; delete right; } }* _root; // An item on the intermediate result queue struct HeapItem { HeapItem( int index, double dist) : index(index), dist(dist) {} int index; double dist; bool operator<(const HeapItem& o) const { return dist < o.dist; } }; // Distance comparator for use in std::nth_element struct DistanceComparator { const T& item; DistanceComparator(const T& item) : item(item) {} bool operator()(const T& a, const T& b) { return distance(item, a) < distance(item, b); } }; // Function that (recursively) fills the tree Node* buildFromPoints( int lower, int upper ) { if (upper == lower) { // indicates that we're done here! return NULL; } // Lower index is center of current node Node* node = new Node(); node->index = lower; if (upper - lower > 1) { // if we did not arrive at leaf yet // Choose an arbitrary point and move it to the start Rcpp::RNGScope scope; int i = (int) ((double)R::runif(0,1) * (upper - lower - 1)) + lower; std::swap(_items[lower], _items[i]); // Partition around the median distance int median = (upper + lower) / 2; std::nth_element(_items.begin() + lower + 1, _items.begin() + median, _items.begin() + upper, DistanceComparator(_items[lower])); // Threshold of the new node will be the distance to the median node->threshold = distance(_items[lower], _items[median]); // Recursively build tree node->index = lower; node->left = buildFromPoints(lower + 1, median); node->right = buildFromPoints(median, upper); } // Return result return node; } // Helper function that searches the tree void search(Node* node, const T& target, unsigned int k, std::priority_queue& heap, double* ptau) { if(node == NULL) return; // indicates that we're done here // Compute distance between target and current node double dist = distance(_items[node->index], target); // If current node within radius tau if(dist < (*ptau)) { if(heap.size() == k) heap.pop(); // remove furthest node from result list (if we already have k results) heap.push(HeapItem(node->index, dist)); // add current node to result list if(heap.size() == k) *ptau = heap.top().dist; // update value of tau (farthest point in result list) } // Return if we arrived at a leaf if(node->left == NULL && node->right == NULL) { return; } // If the target lies within the radius of ball if(dist < node->threshold) { if(dist - (*ptau) <= node->threshold) { // if there can still be neighbors inside the ball, recursively search left child first search(node->left, target, k, heap, ptau); } if(dist + (*ptau) >= node->threshold) { // if there can still be neighbors outside the ball, recursively search right child search(node->right, target, k, heap, ptau); } // If the target lies outsize the radius of the ball } else { if(dist + (*ptau) >= node->threshold) { // if there can still be neighbors outside the ball, recursively search right child first search(node->right, target, k, heap, ptau); } if(dist - (*ptau) <= node->threshold) { // if there can still be neighbors inside the ball, recursively search left child search(node->left, target, k, heap, ptau); } } } }; #endif Rtsne/src/tsne.h0000644000176200001440000001044114531100546013255 0ustar liggesusers/* * * Copyright (c) 2014, Laurens van der Maaten (Delft University of Technology) * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the Delft University of Technology. * 4. Neither the name of the Delft University of Technology nor the names of * its contributors may be used to endorse or promote products derived from * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY LAURENS VAN DER MAATEN ''AS IS'' AND ANY EXPRESS * OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO * EVENT SHALL LAURENS VAN DER MAATEN BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY * OF SUCH DAMAGE. * */ #include "datapoint.h" #include #ifndef TSNE_H #define TSNE_H static inline double sign_tsne(double x) { return (x == .0 ? .0 : (x < .0 ? -1.0 : 1.0)); } template class TSNE { public: TSNE(double perplexity, double theta, bool verbose, int max_iter, bool init, int stop_lying_iter, int mom_switch_iter, double momentum, double final_momentum, double eta, double exaggeration_factor,int num_threads); void run(double* X, unsigned int N, int D, double* Y, bool distance_precomputed, double* cost, double* itercost); void run(const int* nn_index, const double* nn_dist, unsigned int N, int K, double* Y, double* cost, double* itercost); private: void symmetrizeMatrix(unsigned int N); void trainIterations(unsigned int N, double* Y, double* cost, double* itercost); void computeGradient(double* P, unsigned int* inp_row_P, unsigned int* inp_col_P, double* inp_val_P, double* Y, unsigned int N, int D, double* dC, double theta); void computeExactGradient(double* P, double* Y, unsigned int N, int D, double* dC); double evaluateError(double* P, double* Y, unsigned int N, int D); double evaluateError(unsigned int* row_P, unsigned int* col_P, double* val_P, double* Y, unsigned int N, int D, double theta); void getCost(double* P, double* Y, unsigned int N, int D, double* costs); void getCost(unsigned int* row_P, unsigned int* col_P, double* val_P, double* Y, unsigned int N, int D, double theta, double* costs); void zeroMean(double* X, unsigned int N, int D); void computeGaussianPerplexity(double* X, unsigned int N, int D, bool distance_precomputed); template void computeGaussianPerplexity(double* X, unsigned int N, int D, int K); void computeGaussianPerplexity(const int* nn_dex, const double* nn_dist, unsigned int N, int K); void setupApproximateMemory(unsigned int N, int K); void computeProbabilities(const double perplexity, const int K, const double* distances, double* cur_P); void computeSquaredEuclideanDistance(double* X, unsigned int N, int D, double* DD); void computeSquaredEuclideanDistanceDirect(double* X, unsigned int N, int D, double* DD); double randn(); // Member variables. double perplexity, theta, momentum, final_momentum, eta, exaggeration_factor; int max_iter, stop_lying_iter, mom_switch_iter, num_threads; bool verbose, init, exact; std::vector row_P, col_P; std::vector val_P, P; }; #endif Rtsne/src/datapoint.h0000644000176200001440000000223214531100546014266 0ustar liggesusers#ifndef DATAPOINT_H #define DATAPOINT_H class DataPoint { int _ind; public: double* _x; int _D; DataPoint() { _D = 1; _ind = -1; _x = NULL; } DataPoint(int D, int ind, double* x) { _D = D; _ind = ind; _x = (double*) malloc(_D * sizeof(double)); for(int d = 0; d < _D; d++) _x[d] = x[d]; } DataPoint(const DataPoint& other) { // this makes a deep copy -- should not free anything if(this != &other) { _D = other.dimensionality(); _ind = other.index(); _x = (double*) malloc(_D * sizeof(double)); for(int d = 0; d < _D; d++) _x[d] = other.x(d); } } ~DataPoint() { if(_x != NULL) free(_x); } DataPoint& operator= (const DataPoint& other) { // asignment should free old object if(this != &other) { if(_x != NULL) free(_x); _D = other.dimensionality(); _ind = other.index(); _x = (double*) malloc(_D * sizeof(double)); for(int d = 0; d < _D; d++) _x[d] = other.x(d); } return *this; } int index() const { return _ind; } int dimensionality() const { return _D; } double x(int d) const { return _x[d]; } }; #endif Rtsne/src/sptree.cpp0000644000176200001440000003173414531100546014151 0ustar liggesusers/* * * Copyright (c) 2014, Laurens van der Maaten (Delft University of Technology) * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the Delft University of Technology. * 4. Neither the name of the Delft University of Technology nor the names of * its contributors may be used to endorse or promote products derived from * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY LAURENS VAN DER MAATEN ''AS IS'' AND ANY EXPRESS * OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO * EVENT SHALL LAURENS VAN DER MAATEN BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY * OF SUCH DAMAGE. * */ #include #include #include #include #include #include #include #include "sptree.h" // Constructs cell template Cell::Cell() { } template Cell::Cell(double* inp_corner, double* inp_width) { for(int d = 0; d < NDims; d++) setCorner(d, inp_corner[d]); for(int d = 0; d < NDims; d++) setWidth( d, inp_width[d]); } // Destructs cell template Cell::~Cell() { } template double Cell::getCorner(unsigned int d) const { return corner[d]; } template double Cell::getWidth(unsigned int d) const { return width[d]; } template void Cell::setCorner(unsigned int d, double val) { corner[d] = val; } template void Cell::setWidth(unsigned int d, double val) { width[d] = val; } // Checks whether a point lies in a cell template bool Cell::containsPoint(double point[]) const { for(int d = 0; d < NDims; d++) { if(corner[d] - width[d] > point[d]) return false; if(corner[d] + width[d] < point[d]) return false; } return true; } // Default constructor for SPTree -- build tree, too! template SPTree::SPTree(double* inp_data, unsigned int N) { // Compute mean, width, and height of current map (boundaries of SPTree) int nD = 0; double* mean_Y = (double*) calloc(NDims, sizeof(double)); double* min_Y = (double*) malloc(NDims * sizeof(double)); double* max_Y = (double*) malloc(NDims * sizeof(double)); for(unsigned int d = 0; d < NDims; d++) { min_Y[d] = DBL_MAX; max_Y[d] = -DBL_MAX; } for(unsigned int n = 0; n < N; n++) { for(unsigned int d = 0; d < NDims; d++) { mean_Y[d] += inp_data[n * NDims + d]; if(inp_data[nD + d] < min_Y[d]) min_Y[d] = inp_data[nD + d]; if(inp_data[nD + d] > max_Y[d]) max_Y[d] = inp_data[nD + d]; } nD += NDims; } for(int d = 0; d < NDims; d++) mean_Y[d] /= (double) N; // Construct SPTree double* width = (double*) malloc(NDims * sizeof(double)); for(int d = 0; d < NDims; d++) width[d] = max_tsne(max_Y[d] - mean_Y[d], mean_Y[d] - min_Y[d]) + 1e-5; init(NULL, inp_data, mean_Y, width); fill(N); // Clean up memory free(mean_Y); free(max_Y); free(min_Y); free(width); } // Constructor for SPTree with particular size and parent -- build the tree, too! template SPTree::SPTree(double* inp_data, unsigned int N, double* inp_corner, double* inp_width) { init(NULL, inp_data, inp_corner, inp_width); fill(N); } // Constructor for SPTree with particular size (do not fill the tree) template SPTree::SPTree(double* inp_data, double* inp_corner, double* inp_width) { init(NULL, inp_data, inp_corner, inp_width); } // Constructor for SPTree with particular size and parent (do not fill tree) template SPTree::SPTree(SPTree* inp_parent, double* inp_data, double* inp_corner, double* inp_width) { init(inp_parent, inp_data, inp_corner, inp_width); } // Constructor for SPTree with particular size and parent -- build the tree, too! template SPTree::SPTree(SPTree* inp_parent, double* inp_data, unsigned int N, double* inp_corner, double* inp_width) { init(inp_parent, inp_data, inp_corner, inp_width); fill(N); } // Main initialization function template void SPTree::init(SPTree* inp_parent, double* inp_data, double* inp_corner, double* inp_width) { parent = inp_parent; data = inp_data; is_leaf = true; size = 0; cum_size = 0; for(unsigned int d = 0; d < NDims; d++) boundary.setCorner(d, inp_corner[d]); for(unsigned int d = 0; d < NDims; d++) boundary.setWidth( d, inp_width[d]); for(unsigned int i = 0; i < no_children; i++) children[i] = NULL; for(unsigned int d = 0; d < NDims; d++) center_of_mass[d] = .0; } // Destructor for SPTree template SPTree::~SPTree() { for(unsigned int i = 0; i < no_children; i++) { if(children[i] != NULL) delete children[i]; } } // Update the data underlying this tree template void SPTree::setData(double* inp_data) { data = inp_data; } // Get the parent of the current tree template SPTree* SPTree::getParent() { return parent; } // Insert a point into the SPTree template bool SPTree::insert(unsigned int new_index) { // Ignore objects which do not belong in this quad tree double* point = data + new_index * NDims; if(!boundary.containsPoint(point)) return false; // Online update of cumulative size and center-of-mass cum_size++; double mult1 = (double) (cum_size - 1) / (double) cum_size; double mult2 = 1.0 / (double) cum_size; for(unsigned int d = 0; d < NDims; d++) { center_of_mass[d] = center_of_mass[d] * mult1 + mult2 * point[d]; } // If there is space in this quad tree and it is a leaf, add the object here if(is_leaf && size < QT_NODE_CAPACITY) { index[size] = new_index; size++; return true; } // Don't add duplicates for now (this is not very nice) bool any_duplicate = false; for(unsigned int n = 0; n < size; n++) { bool duplicate = true; for(unsigned int d = 0; d < NDims; d++) { if(point[d] != data[index[n] * NDims + d]) { duplicate = false; break; } } any_duplicate = any_duplicate | duplicate; } if(any_duplicate) return true; // Otherwise, we need to subdivide the current cell if(is_leaf) subdivide(); // Find out where the point can be inserted for(unsigned int i = 0; i < no_children; i++) { if(children[i]->insert(new_index)) return true; } // Otherwise, the point cannot be inserted (this should never happen) return false; } // Create four children which fully divide this cell into four quads of equal area template void SPTree::subdivide() { // Create new children double new_corner[NDims]; double new_width[NDims]; for(unsigned int i = 0; i < no_children; i++) { unsigned int div = 1; for(unsigned int d = 0; d < NDims; d++) { new_width[d] = .5 * boundary.getWidth(d); if((i / div) % 2 == 1) new_corner[d] = boundary.getCorner(d) - .5 * boundary.getWidth(d); else new_corner[d] = boundary.getCorner(d) + .5 * boundary.getWidth(d); div *= 2; } children[i] = new SPTree(this, data, new_corner, new_width); } // Move existing points to correct children for(unsigned int i = 0; i < size; i++) { bool success = false; for(unsigned int j = 0; j < no_children; j++) { if(!success) success = children[j]->insert(index[i]); } index[i] = -1; } // Empty parent node size = 0; is_leaf = false; } // Build SPTree on dataset template void SPTree::fill(unsigned int N) { for(unsigned int i = 0; i < N; i++) insert(i); } // Checks whether the specified tree is correct template bool SPTree::isCorrect() { for(unsigned int n = 0; n < size; n++) { double* point = data + index[n] * NDims; if(!boundary.containsPoint(point)) return false; } if(!is_leaf) { bool correct = true; for(int i = 0; i < no_children; i++) correct = correct && children[i]->isCorrect(); return correct; } else return true; } // Build a list of all indices in SPTree template void SPTree::getAllIndices(unsigned int* indices) { getAllIndices(indices, 0); } // Build a list of all indices in SPTree template unsigned int SPTree::getAllIndices(unsigned int* indices, unsigned int loc) { // Gather indices in current quadrant for(unsigned int i = 0; i < size; i++) indices[loc + i] = index[i]; loc += size; // Gather indices in children if(!is_leaf) { for(int i = 0; i < no_children; i++) loc = children[i]->getAllIndices(indices, loc); } return loc; } template unsigned int SPTree::getDepth() { if(is_leaf) return 1; int depth = 0; for(unsigned int i = 0; i < no_children; i++) depth = max_tsne(depth, children[i]->getDepth()); return 1 + depth; } // Compute non-edge forces using Barnes-Hut algorithm template double SPTree::computeNonEdgeForces(unsigned int point_index, double theta, double neg_f[]) const { double resultSum = 0; double buff[NDims]; // make buff local for parallelization // Make sure that we spend no time on empty nodes or self-interactions if(cum_size == 0 || (is_leaf && size == 1 && index[0] == point_index)) return resultSum; // Compute distance between point and center-of-mass double sqdist = .0; unsigned int ind = point_index * NDims; for(unsigned int d = 0; d < NDims; d++) { buff[d] = data[ind + d] - center_of_mass[d]; sqdist += buff[d] * buff[d]; } // Check whether we can use this node as a "summary" double max_width = 0.0; double cur_width; for(unsigned int d = 0; d < NDims; d++) { cur_width = boundary.getWidth(d); max_width = (max_width > cur_width) ? max_width : cur_width; } if(is_leaf || max_width / sqrt(sqdist) < theta) { // Compute and add t-SNE force between point and current node sqdist = 1.0 / (1.0 + sqdist); double mult = cum_size * sqdist; resultSum += mult; mult *= sqdist; for(unsigned int d = 0; d < NDims; d++) neg_f[d] += mult * buff[d]; } else { // Recursively apply Barnes-Hut to children for(unsigned int i = 0; i < no_children; i++){ resultSum += children[i]->computeNonEdgeForces(point_index, theta, neg_f); } } return resultSum; } // Computes edge forces template void SPTree::computeEdgeForces(unsigned int* row_P, unsigned int* col_P, double* val_P, unsigned int N, double* pos_f, int num_threads) const { // Loop over all edges in the graph #pragma omp parallel for schedule(static) num_threads(num_threads) for(unsigned int n = 0; n < N; n++) { unsigned int ind1 = n * NDims; for(unsigned int i = row_P[n]; i < row_P[n + 1]; i++) { double buff[NDims]; // make buff local for parallelization // Compute pairwise distance and Q-value double sqdist = 1.0; unsigned int ind2 = col_P[i] * NDims; for(unsigned int d = 0; d < NDims; d++) { buff[d] = data[ind1 + d] - data[ind2 + d]; sqdist += buff[d] * buff[d]; } sqdist = val_P[i] / sqdist; // Sum positive force for(unsigned int d = 0; d < NDims; d++) pos_f[ind1 + d] += sqdist * buff[d]; } } } // Print out tree template void SPTree::print() { if(cum_size == 0) { Rprintf("Empty node\n"); return; } if(is_leaf) { Rprintf("Leaf node; data = ["); for(unsigned int i = 0; i < size; i++) { double* point = data + index[i] * NDims; for(int d = 0; d < NDims; d++) Rprintf("%f, ", point[d]); Rprintf(" (index = %d)", index[i]); if(i < size - 1) Rprintf("\n"); else Rprintf("]\n"); } } else { Rprintf("Intersection node with center-of-mass = ["); for(int d = 0; d < NDims; d++) Rprintf("%f, ", center_of_mass[d]); Rprintf("]; children are:\n"); for(int i = 0; i < no_children; i++) children[i]->print(); } } // declare templates explicitly template class SPTree<1>; template class SPTree<2>; template class SPTree<3>; Rtsne/src/Makevars0000644000176200001440000000212114531100546013623 0ustar liggesusers## Use the R_HOME indirection to support installations of multiple R version ## PKG_LIBS = `$(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()"` $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) $(SHLIB_OPENMP_CXXFLAGS) PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS) ## As an alternative, one can also add this code in a file 'configure' ## ## PKG_LIBS=`${R_HOME}/bin/Rscript -e "Rcpp:::LdFlags()"` ## ## sed -e "s|@PKG_LIBS@|${PKG_LIBS}|" \ ## src/Makevars.in > src/Makevars ## ## which together with the following file 'src/Makevars.in' ## ## PKG_LIBS = @PKG_LIBS@ ## ## can be used to create src/Makevars dynamically. This scheme is more ## powerful and can be expanded to also check for and link with other ## libraries. It should be complemented by a file 'cleanup' ## ## rm src/Makevars ## ## which removes the autogenerated file src/Makevars. ## ## Of course, autoconf can also be used to write configure files. This is ## done by a number of packages, but recommended only for more advanced users ## comfortable with autoconf and its related tools. Rtsne/src/Makevars.win0000644000176200001440000000047314531100546014427 0ustar liggesusers ## Use the R_HOME indirection to support installations of multiple R version ## PKG_LIBS = $(shell "${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe" -e "Rcpp:::LdFlags()") $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) $(SHLIB_OPENMP_CXXFLAGS) PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS) Rtsne/src/normalize_input.cpp0000644000176200001440000000206414531100546016060 0ustar liggesusers#include "Rcpp.h" #include // Function that performs the matrix normalization. // [[Rcpp::export]] Rcpp::NumericMatrix normalize_input_cpp(Rcpp::NumericMatrix input) { // Rows are observations, columns are variables. Rcpp::NumericMatrix output=Rcpp::clone(input); const int N=output.nrow(), D=output.ncol(); // Running through each column and centering it. Rcpp::NumericMatrix::iterator oIt=output.begin(); for (int d=0; d max_X) max_X = tmp; } for (oIt=output.begin(); oIt!=output.end(); ++oIt) { *oIt /= max_X; } return output; } Rtsne/src/tsne.cpp0000644000176200001440000007242214531100546013617 0ustar liggesusers/* * * Copyright (c) 2014, Laurens van der Maaten (Delft University of Technology) * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the Delft University of Technology. * 4. Neither the name of the Delft University of Technology nor the names of * its contributors may be used to endorse or promote products derived from * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY LAURENS VAN DER MAATEN ''AS IS'' AND ANY EXPRESS * OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO * EVENT SHALL LAURENS VAN DER MAATEN BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY * OF SUCH DAMAGE. * */ #define USE_FC_LEN_T #include #include #include #include #include #include #include #include "sptree.h" #include "vptree.h" #include "tsne.h" #ifdef _OPENMP #include #endif extern "C" { #include #include } #ifndef FCONE # define FCONE #endif using namespace std; template TSNE::TSNE(double Perplexity, double Theta, bool Verbose, int Max_iter, bool Init, int Stop_lying_iter, int Mom_switch_iter, double Momentum, double Final_momentum, double Eta, double Exaggeration_factor, int Num_threads) : perplexity(Perplexity), theta(Theta), momentum(Momentum), final_momentum(Final_momentum), eta(Eta), exaggeration_factor(Exaggeration_factor), max_iter(Max_iter), stop_lying_iter(Stop_lying_iter), mom_switch_iter(Mom_switch_iter), num_threads(Num_threads), verbose(Verbose), init(Init), exact(theta==.0) { #ifdef _OPENMP int threads = num_threads; if (num_threads==0) { threads = omp_get_max_threads(); } // Print notice whether OpenMP is used if (verbose) Rprintf("OpenMP is working. %d threads.\n", threads); #endif return; } // Perform t-SNE template void TSNE::run(double* X, unsigned int N, int D, double* Y, bool distance_precomputed, double* cost, double* itercost) { if(N - 1 < 3 * perplexity) { Rcpp::stop("Perplexity too large for the number of data points!\n"); } if (verbose) Rprintf("Using no_dims = %d, perplexity = %f, and theta = %f\n", NDims, perplexity, theta); if (verbose) Rprintf("Computing input similarities...\n"); clock_t start = clock(); // Compute input similarities for exact t-SNE if(exact) { // Compute similarities computeGaussianPerplexity(X, N, D, distance_precomputed); // Symmetrize input similarities if (verbose) Rprintf("Symmetrizing...\n"); for(unsigned long n = 0; n < N; n++) { for(unsigned long m = n + 1; m < N; m++) { P[n * N + m] += P[m * N + n]; P[m * N + n] = P[n * N + m]; } } double sum_P = 0; for(size_t i = 0; i < P.size(); i++) sum_P += P[i]; for(size_t i = 0; i < P.size(); i++) P[i] /= sum_P; } // Compute input similarities for approximate t-SNE else { int K=3*perplexity; // Compute asymmetric pairwise input similarities if (distance_precomputed) { computeGaussianPerplexity(X, N, D, K); } else { computeGaussianPerplexity(X, N, D, K); } // Symmetrize input similarities symmetrizeMatrix(N); double sum_P = .0; for(unsigned int i = 0; i < row_P[N]; i++) sum_P += val_P[i]; for(unsigned int i = 0; i < row_P[N]; i++) val_P[i] /= sum_P; } if (verbose) { clock_t end = clock(); if(exact) Rprintf("Done in %4.2f seconds!\nLearning embedding...\n", (float) (end - start) / CLOCKS_PER_SEC); else Rprintf("Done in %4.2f seconds (sparsity = %f)!\nLearning embedding...\n", (float) (end - start) / CLOCKS_PER_SEC, (double) row_P[N] / ((double) N * (double) N)); } trainIterations(N, Y, cost, itercost); return; } // Perform t-SNE with nearest neighbor results. template void TSNE::run(const int* nn_index, const double* nn_dist, unsigned int N, int K, double* Y, double* cost, double* itercost) { if(N - 1 < 3 * perplexity) { Rcpp::stop("Perplexity too large for the number of data points!\n"); } if (verbose) Rprintf("Using no_dims = %d, perplexity = %f, and theta = %f\n", NDims, perplexity, theta); if (verbose) Rprintf("Computing input similarities...\n"); clock_t start = clock(); // Compute asymmetric pairwise input similarities computeGaussianPerplexity(nn_index, nn_dist, N, K); // Symmetrize input similarities symmetrizeMatrix(N); double sum_P = .0; for(unsigned int i = 0; i < row_P[N]; i++) sum_P += val_P[i]; for(unsigned int i = 0; i < row_P[N]; i++) val_P[i] /= sum_P; if (verbose) { clock_t end = clock(); if(exact) Rprintf("Done in %4.2f seconds!\nLearning embedding...\n", (float) (end - start) / CLOCKS_PER_SEC); else Rprintf("Done in %4.2f seconds (sparsity = %f)!\nLearning embedding...\n", (float) (end - start) / CLOCKS_PER_SEC, (double) row_P[N] / ((double) N * (double) N)); } trainIterations(N, Y, cost, itercost); return; } // Perform main training loop template void TSNE::trainIterations(unsigned int N, double* Y, double* cost, double* itercost) { // Allocate some memory double* dY = (double*) malloc(N * NDims * sizeof(double)); double* uY = (double*) malloc(N * NDims * sizeof(double)); double* gains = (double*) malloc(N * NDims * sizeof(double)); if(dY == NULL || uY == NULL || gains == NULL) { Rcpp::stop("Memory allocation failed!\n"); } for(unsigned int i = 0; i < N * NDims; i++) uY[i] = .0; for(unsigned int i = 0; i < N * NDims; i++) gains[i] = 1.0; // Lie about the P-values if(exact) { for(unsigned long i = 0; i < (unsigned long)N * N; i++) P[i] *= exaggeration_factor; } else { for(unsigned long i = 0; i < row_P[N]; i++) val_P[i] *= exaggeration_factor; } // Initialize solution (randomly), if not already done if (!init) { for(unsigned int i = 0; i < N * NDims; i++) Y[i] = randn() * .0001; } clock_t start = clock(), end; float total_time=0; int costi = 0; //iterator for saving the total costs for the iterations for(int iter = 0; iter < max_iter; iter++) { // Stop lying about the P-values after a while, and switch momentum if(iter == stop_lying_iter) { if(exact) { for(unsigned long i = 0; i < (unsigned long)N * N; i++) P[i] /= exaggeration_factor; } else { for(unsigned int i = 0; i < row_P[N]; i++) val_P[i] /= exaggeration_factor; } } if(iter == mom_switch_iter) momentum = final_momentum; // Compute (approximate) gradient if(exact) computeExactGradient(P.data(), Y, N, NDims, dY); else computeGradient(P.data(), row_P.data(), col_P.data(), val_P.data(), Y, N, NDims, dY, theta); // Update gains for(unsigned int i = 0; i < N * NDims; i++) gains[i] = (sign_tsne(dY[i]) != sign_tsne(uY[i])) ? (gains[i] + .2) : (gains[i] * .8); for(unsigned int i = 0; i < N * NDims; i++) if(gains[i] < .01) gains[i] = .01; // Perform gradient update (with momentum and gains) for(unsigned int i = 0; i < N * NDims; i++) uY[i] = momentum * uY[i] - eta * gains[i] * dY[i]; for(unsigned int i = 0; i < N * NDims; i++) Y[i] = Y[i] + uY[i]; // Make solution zero-mean zeroMean(Y, N, NDims); // Print out progress if((iter > 0 && (iter+1) % 50 == 0) || iter == max_iter - 1) { end = clock(); double C = .0; if(exact) C = evaluateError(P.data(), Y, N, NDims); else C = evaluateError(row_P.data(), col_P.data(), val_P.data(), Y, N, NDims, theta); // doing approximate computation here! if(iter == 0) { if (verbose) Rprintf("Iteration %d: error is %f\n", iter + 1, C); } else { total_time += (float) (end - start) / CLOCKS_PER_SEC; if (verbose) Rprintf("Iteration %d: error is %f (50 iterations in %4.2f seconds)\n", iter+1, C, (float) (end - start) / CLOCKS_PER_SEC); } itercost[costi] = C; costi++; start = clock(); } } end = clock(); total_time += (float) (end - start) / CLOCKS_PER_SEC; if(exact) getCost(P.data(), Y, N, NDims, cost); else getCost(row_P.data(), col_P.data(), val_P.data(), Y, N, NDims, theta, cost); // doing approximate computation here! // Clean up memory free(dY); free(uY); free(gains); if (verbose) Rprintf("Fitting performed in %4.2f seconds.\n", total_time); return; } // Compute gradient of the t-SNE cost function (using Barnes-Hut algorithm) template void TSNE::computeGradient(double* P, unsigned int* inp_row_P, unsigned int* inp_col_P, double* inp_val_P, double* Y, unsigned int N, int D, double* dC, double theta) { // Construct space-partitioning tree on current map SPTree* tree = new SPTree(Y, N); // Compute all terms required for t-SNE gradient double* pos_f = (double*) calloc(N * D, sizeof(double)); double* neg_f = (double*) calloc(N * D, sizeof(double)); if(pos_f == NULL || neg_f == NULL) { Rcpp::stop("Memory allocation failed!\n"); } tree->computeEdgeForces(inp_row_P, inp_col_P, inp_val_P, N, pos_f, num_threads); // Storing the output to sum in single-threaded mode; avoid randomness in rounding errors. std::vector output(N); #pragma omp parallel for schedule(guided) num_threads(num_threads) for (unsigned int n = 0; n < N; n++) { output[n]=tree->computeNonEdgeForces(n, theta, neg_f + n * D); } double sum_Q = .0; for (unsigned int n=0; n void TSNE::computeExactGradient(double* P, double* Y, unsigned int N, int D, double* dC) { // Make sure the current gradient contains zeros for(unsigned int i = 0; i < N * D; i++) dC[i] = 0.0; // Compute the squared Euclidean distance matrix double* DD = (double*) malloc((unsigned long)N * N * sizeof(double)); if(DD == NULL) { Rcpp::stop("Memory allocation failed!\n"); } computeSquaredEuclideanDistance(Y, N, D, DD); // Compute Q-matrix and normalization sum double* Q = (double*) malloc((unsigned long)N * N * sizeof(double)); if(Q == NULL) { Rcpp::stop("Memory allocation failed!\n"); } double sum_Q = .0; for(unsigned long n = 0; n < N; n++) { for(unsigned long m = 0; m < N; m++) { if(n != m) { Q[n * N + m] = 1 / (1 + DD[n * N + m]); sum_Q += Q[n * N + m]; } } } // Perform the computation of the gradient for(unsigned long n = 0; n < N; n++) { for(unsigned long m = 0; m < N; m++) { if(n != m) { double mult = (P[n * N + m] - (Q[n * N + m] / sum_Q)) * Q[n * N + m]; for(int d = 0; d < D; d++) { dC[n * D + d] += (Y[n * D + d] - Y[m * D + d]) * mult; } } } } // Free memory free(DD); DD = NULL; free(Q); Q = NULL; } // Evaluate t-SNE cost function (exactly) template double TSNE::evaluateError(double* P, double* Y, unsigned int N, int D) { // Compute the squared Euclidean distance matrix double* DD = (double*) malloc((unsigned long)N * N * sizeof(double)); double* Q = (double*) malloc((unsigned long)N * N * sizeof(double)); if(DD == NULL || Q == NULL) { Rcpp::stop("Memory allocation failed!\n"); } computeSquaredEuclideanDistance(Y, N, D, DD); // Compute Q-matrix and normalization sum double sum_Q = DBL_MIN; for(unsigned long n = 0; n < N; n++) { for(unsigned long m = 0; m < N; m++) { if(n != m) { Q[n * N + m] = 1 / (1 + DD[n * N + m]); sum_Q += Q[n * N + m]; } else Q[n * N + m] = DBL_MIN; } } for(unsigned long i = 0; i < (unsigned long)N * N; i++) Q[i] /= sum_Q; // Sum t-SNE error double C = .0; for(unsigned long n = 0; n < N; n++) { for(unsigned long m = 0; m < N; m++) { C += P[n * N + m] * log((P[n * N + m] + 1e-9) / (Q[n * N + m] + 1e-9)); } } // Clean up memory free(DD); free(Q); return C; } // Evaluate t-SNE cost function (approximately) template double TSNE::evaluateError(unsigned int* row_P, unsigned int* col_P, double* val_P, double* Y, unsigned int N, int D, double theta) { // Get estimate of normalization term SPTree* tree = new SPTree(Y, N); double* buff = (double*) calloc(D, sizeof(double)); double sum_Q = .0; for(unsigned int n = 0; n < N; n++) sum_Q += tree->computeNonEdgeForces(n, theta, buff); // Loop over all edges to compute t-SNE error int ind1, ind2; double C = .0, Q; for(unsigned int n = 0; n < N; n++) { ind1 = n * D; for(unsigned int i = row_P[n]; i < row_P[n + 1]; i++) { Q = .0; ind2 = col_P[i] * D; for(int d = 0; d < D; d++) buff[d] = Y[ind1 + d]; for(int d = 0; d < D; d++) buff[d] -= Y[ind2 + d]; for(int d = 0; d < D; d++) Q += buff[d] * buff[d]; Q = (1.0 / (1.0 + Q)) / sum_Q; C += val_P[i] * log((val_P[i] + FLT_MIN) / (Q + FLT_MIN)); } } // Clean up memory free(buff); delete tree; return C; } // Evaluate t-SNE cost function (exactly) template void TSNE::getCost(double* P, double* Y, unsigned int N, int D, double* costs) { // Compute the squared Euclidean distance matrix double* DD = (double*) malloc((unsigned long)N * N * sizeof(double)); double* Q = (double*) malloc((unsigned long)N * N * sizeof(double)); if(DD == NULL || Q == NULL) { Rcpp::stop("Memory allocation failed!\n"); } computeSquaredEuclideanDistance(Y, N, D, DD); // Compute Q-matrix and normalization sum double sum_Q = DBL_MIN; for(unsigned long n = 0; n < N; n++) { for(unsigned long m = 0; m < N; m++) { if(n != m) { Q[n * N + m] = 1 / (1 + DD[n * N + m]); sum_Q += Q[n * N + m]; } else Q[n * N + m] = DBL_MIN; } } for(unsigned long i = 0; i < (unsigned long)N * N; i++) Q[i] /= sum_Q; // Sum t-SNE error for(unsigned long n = 0; n < N; n++) { costs[n] = 0.0; for(unsigned long m = 0; m < N; m++) { costs[n] += P[n * N + m] * log((P[n * N + m] + 1e-9) / (Q[n * N + m] + 1e-9)); } } // Clean up memory free(DD); free(Q); } // Evaluate t-SNE cost function (approximately) template void TSNE::getCost(unsigned int* row_P, unsigned int* col_P, double* val_P, double* Y, unsigned int N, int D, double theta, double* costs) { // Get estimate of normalization term SPTree* tree = new SPTree(Y, N); double* buff = (double*) calloc(D, sizeof(double)); double sum_Q = .0; for(unsigned int n = 0; n < N; n++) sum_Q += tree->computeNonEdgeForces(n, theta, buff); // Loop over all edges to compute t-SNE error int ind1, ind2; double Q; for(unsigned int n = 0; n < N; n++) { ind1 = n * D; costs[n] = 0.0; for(unsigned int i = row_P[n]; i < row_P[n + 1]; i++) { Q = .0; ind2 = col_P[i] * D; for(int d = 0; d < D; d++) buff[d] = Y[ind1 + d]; for(int d = 0; d < D; d++) buff[d] -= Y[ind2 + d]; for(int d = 0; d < D; d++) Q += buff[d] * buff[d]; Q = (1.0 / (1.0 + Q)) / sum_Q; costs[n] += val_P[i] * log((val_P[i] + FLT_MIN) / (Q + FLT_MIN)); } } // Clean up memory free(buff); delete tree; } // Compute input similarities with a fixed perplexity template void TSNE::computeGaussianPerplexity(double* X, unsigned int N, int D, bool distance_precomputed) { size_t N2=N; N2*=N; P.resize(N2); // Compute the squared Euclidean distance matrix double* DD = (double*) malloc(N2 * sizeof(double)); if(DD == NULL) { Rcpp::stop("Memory allocation failed!\n"); } if (distance_precomputed) { DD = X; } else { computeSquaredEuclideanDistanceDirect(X, N, D, DD); // Needed to make sure the results are exactly equal to distance calculation in R for (size_t n=0; n 0) { min_beta = beta; if(max_beta == DBL_MAX || max_beta == -DBL_MAX) beta *= 2.0; else beta = (beta + max_beta) / 2.0; } else { max_beta = beta; if(min_beta == -DBL_MAX || min_beta == DBL_MAX) beta /= 2.0; else beta = (beta + min_beta) / 2.0; } } // Update iteration counter iter++; } // Row normalize P for(unsigned long m = 0; m < N; m++) P[n * N + m] /= sum_P; } // Clean up memory if (!distance_precomputed) { free(DD); } DD = NULL; } // Compute input similarities with a fixed perplexity using ball trees (this function allocates memory another function should free) template template void TSNE::computeGaussianPerplexity(double* X, unsigned int N, int D, int K) { if(perplexity > K) Rprintf("Perplexity should be lower than K!\n"); // Allocate the memory we need setupApproximateMemory(N, K); // Build ball tree on data set VpTree* tree = new VpTree(); vector obj_X(N, DataPoint(D, -1, X)); for(unsigned int n = 0; n < N; n++) obj_X[n] = DataPoint(D, n, X + n * D); tree->create(obj_X); // Loop over all points to find nearest neighbors if (verbose) Rprintf("Building tree...\n"); int steps_completed = 0; #pragma omp parallel for schedule(guided) num_threads(num_threads) for(unsigned int n = 0; n < N; n++) { vector indices; vector distances; indices.reserve(K+1); distances.reserve(K+1); // Find nearest neighbors tree->search(obj_X[n], K + 1, &indices, &distances); double * cur_P = val_P.data() + row_P[n]; computeProbabilities(perplexity, K, distances.data()+1, cur_P); // +1 to avoid self. unsigned int * cur_col_P = col_P.data() + row_P[n]; for (int m=0; m void TSNE::computeGaussianPerplexity(const int* nn_idx, const double* nn_dist, unsigned int N, int K) { if(perplexity > K) Rprintf("Perplexity should be lower than K!\n"); // Allocate the memory we need setupApproximateMemory(N, K); // Loop over all points to find nearest neighbors int steps_completed = 0; #pragma omp parallel for schedule(guided) num_threads(num_threads) for(unsigned int n = 0; n < N; n++) { double * cur_P = val_P.data() + row_P[n]; computeProbabilities(perplexity, K, nn_dist + row_P[n], cur_P); const int * cur_idx = nn_idx + row_P[n]; unsigned int * cur_col_P = col_P.data() + row_P[n]; for (int m=0; m void TSNE::setupApproximateMemory(unsigned int N, int K) { row_P.resize(N+1); col_P.resize(N*K); val_P.resize(N*K); row_P[0] = 0; for(unsigned int n = 0; n < N; n++) row_P[n + 1] = row_P[n] + K; return; } template void TSNE::computeProbabilities (const double perplexity, const int K, const double* distances, double* cur_P) { // Initialize some variables for binary search bool found = false; double beta = 1.0; double min_beta = -DBL_MAX; double max_beta = DBL_MAX; double tol = 1e-5; // Iterate until we found a good perplexity int iter = 0; double sum_P; while(!found && iter < 200) { // Compute Gaussian kernel row for(int m = 0; m < K; m++) cur_P[m] = exp(-beta * distances[m] * distances[m]); // Compute entropy of current row sum_P = DBL_MIN; for(int m = 0; m < K; m++) sum_P += cur_P[m]; double H = .0; for(int m = 0; m < K; m++) H += beta * (distances[m] * distances[m] * cur_P[m]); H = (H / sum_P) + log(sum_P); // Evaluate whether the entropy is within the tolerance level double Hdiff = H - log(perplexity); if(Hdiff < tol && -Hdiff < tol) { found = true; } else { if(Hdiff > 0) { min_beta = beta; if(max_beta == DBL_MAX || max_beta == -DBL_MAX) beta *= 2.0; else beta = (beta + max_beta) / 2.0; } else { max_beta = beta; if(min_beta == -DBL_MAX || min_beta == DBL_MAX) beta /= 2.0; else beta = (beta + min_beta) / 2.0; } } // Update iteration counter iter++; } // Row-normalize current row of P. for(int m = 0; m < K; m++) { cur_P[m] /= sum_P; } return; } template void TSNE::symmetrizeMatrix(unsigned int N) { // Count number of elements and row counts of symmetric matrix int* row_counts = (int*) calloc(N, sizeof(int)); if(row_counts == NULL) { Rcpp::stop("Memory allocation failed!\n"); } for(unsigned int n = 0; n < N; n++) { for(unsigned int i = row_P[n]; i < row_P[n + 1]; i++) { // Check whether element (col_P[i], n) is present bool present = false; for(unsigned int m = row_P[col_P[i]]; m < row_P[col_P[i] + 1]; m++) { if(col_P[m] == n) present = true; } if(present) row_counts[n]++; else { row_counts[n]++; row_counts[col_P[i]]++; } } } int no_elem = 0; for(unsigned int n = 0; n < N; n++) no_elem += row_counts[n]; // Allocate memory for symmetrized matrix std::vector sym_row_P(N+1), sym_col_P(no_elem); std::vector sym_val_P(no_elem); // Construct new row indices for symmetric matrix sym_row_P[0] = 0; for(unsigned int n = 0; n < N; n++) sym_row_P[n + 1] = sym_row_P[n] + row_counts[n]; // Fill the result matrix int* offset = (int*) calloc(N, sizeof(int)); if(offset == NULL) { Rcpp::stop("Memory allocation failed!\n"); } for(unsigned int n = 0; n < N; n++) { for(unsigned int i = row_P[n]; i < row_P[n + 1]; i++) { // considering element(n, col_P[i]) // Check whether element (col_P[i], n) is present bool present = false; for(unsigned int m = row_P[col_P[i]]; m < row_P[col_P[i] + 1]; m++) { if(col_P[m] == n) { present = true; if(n <= col_P[i]) { // make sure we do not add elements twice sym_col_P[sym_row_P[n] + offset[n]] = col_P[i]; sym_col_P[sym_row_P[col_P[i]] + offset[col_P[i]]] = n; sym_val_P[sym_row_P[n] + offset[n]] = val_P[i] + val_P[m]; sym_val_P[sym_row_P[col_P[i]] + offset[col_P[i]]] = val_P[i] + val_P[m]; } } } // If (col_P[i], n) is not present, there is no addition involved if(!present) { sym_col_P[sym_row_P[n] + offset[n]] = col_P[i]; sym_col_P[sym_row_P[col_P[i]] + offset[col_P[i]]] = n; sym_val_P[sym_row_P[n] + offset[n]] = val_P[i]; sym_val_P[sym_row_P[col_P[i]] + offset[col_P[i]]] = val_P[i]; } // Update offsets if(!present || (present && n <= col_P[i])) { offset[n]++; if(col_P[i] != n) offset[col_P[i]]++; } } } // Divide the result by two for(int i = 0; i < no_elem; i++) sym_val_P[i] /= 2.0; // Return symmetrized matrices row_P.swap(sym_row_P); col_P.swap(sym_col_P); val_P.swap(sym_val_P); // Free up some memery free(offset); offset = NULL; free(row_counts); row_counts = NULL; } // Compute squared Euclidean distance matrix (using BLAS) template void TSNE::computeSquaredEuclideanDistance(double* X, unsigned int N, int D, double* DD) { double* dataSums = (double*) calloc(N, sizeof(double)); if(dataSums == NULL) { Rcpp::stop("Memory allocation failed!\n"); } for(unsigned int n = 0; n < N; n++) { for(int d = 0; d < D; d++) { dataSums[n] += (X[n * D + d] * X[n * D + d]); } } for(unsigned long n = 0; n < N; n++) { for(unsigned long m = 0; m < N; m++) { DD[n * N + m] = dataSums[n] + dataSums[m]; } } double a1 = -2.0; double a2 = 1.0; int Nsigned = N; dgemm_("T", "N", &Nsigned, &Nsigned, &D, &a1, X, &D, X, &D, &a2, DD, &Nsigned FCONE FCONE); free(dataSums); dataSums = NULL; } // Compute squared Euclidean distance matrix (No BLAS) template void TSNE::computeSquaredEuclideanDistanceDirect(double* X, unsigned int N, int D, double* DD) { const double* XnD = X; for(unsigned int n = 0; n < N; ++n, XnD += D) { const double* XmD = XnD + D; double* curr_elem = &DD[n*N + n]; *curr_elem = 0.0; double* curr_elem_sym = curr_elem + N; for(unsigned int m = n + 1; m < N; ++m, XmD+=D, curr_elem_sym+=N) { *(++curr_elem) = 0.0; for(int d = 0; d < D; ++d) { *curr_elem += (XnD[d] - XmD[d]) * (XnD[d] - XmD[d]); } *curr_elem_sym = *curr_elem; } } } // Makes data zero-mean template void TSNE::zeroMean(double* X, unsigned int N, int D) { // Compute data mean double* mean = (double*) calloc(D, sizeof(double)); if(mean == NULL) { Rcpp::stop("Memory allocation failed!\n"); } int nD = 0; for(unsigned int n = 0; n < N; n++) { for(int d = 0; d < D; d++) { mean[d] += X[nD + d]; } nD += D; } for(int d = 0; d < D; d++) { mean[d] /= (double) N; } // Subtract data mean nD = 0; for(unsigned int n = 0; n < N; n++) { for(int d = 0; d < D; d++) { X[nD + d] -= mean[d]; } nD += D; } free(mean); mean = NULL; } // Generates a Gaussian random number template double TSNE::randn() { Rcpp::RNGScope scope; double x, y, radius; do { x = 2 * (double)R::runif(0,1) - 1; y = 2 * (double)R::runif(0,1) - 1; radius = (x * x) + (y * y); } while((radius >= 1.0) || (radius == 0.0)); radius = sqrt(-2 * log(radius) / radius); x *= radius; y *= radius; return x; } // declare templates explicitly template class TSNE<1>; template class TSNE<2>; template class TSNE<3>; Rtsne/src/sptree.h0000644000176200001440000001044514531100546013612 0ustar liggesusers/* * * Copyright (c) 2014, Laurens van der Maaten (Delft University of Technology) * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the Delft University of Technology. * 4. Neither the name of the Delft University of Technology nor the names of * its contributors may be used to endorse or promote products derived from * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY LAURENS VAN DER MAATEN ''AS IS'' AND ANY EXPRESS * OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO * EVENT SHALL LAURENS VAN DER MAATEN BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY * OF SUCH DAMAGE. * */ #ifndef SPTREE_H #define SPTREE_H using namespace std; static inline double max_tsne(double x, double y) { return (x <= y ? y : x); } template class Cell { double corner[NDims]; double width[NDims]; public: Cell(); Cell(double* inp_corner, double* inp_width); ~Cell(); double getCorner(unsigned int d) const; double getWidth(unsigned int d) const; void setCorner(unsigned int d, double val); void setWidth(unsigned int d, double val); bool containsPoint(double point[]) const; }; template class SPTree { public: enum { no_children = 2 * SPTree::no_children }; private: // Fixed constants static const unsigned int QT_NODE_CAPACITY = 1; // Properties of this node in the tree SPTree* parent; unsigned int dimension; bool is_leaf; unsigned int size; unsigned int cum_size; int num_threads; // Axis-aligned bounding box stored as a center with half-dimensions to represent the boundaries of this quad tree Cell boundary; // Indices in this space-partitioning tree node, corresponding center-of-mass, and list of all children double* data; double center_of_mass[NDims]; unsigned int index[QT_NODE_CAPACITY]; // Children SPTree* children[no_children]; public: SPTree(double* inp_data, unsigned int N); SPTree(double* inp_data, double* inp_corner, double* inp_width); SPTree(double* inp_data, unsigned int N, double* inp_corner, double* inp_width); SPTree(SPTree* inp_parent, double* inp_data, unsigned int N, double* inp_corner, double* inp_width); SPTree(SPTree* inp_parent, double* inp_data, double* inp_corner, double* inp_width); ~SPTree(); void setData(double* inp_data); SPTree* getParent(); void construct(Cell boundary); bool insert(unsigned int new_index); void subdivide(); bool isCorrect(); void rebuildTree(); void getAllIndices(unsigned int* indices); unsigned int getDepth(); double computeNonEdgeForces(unsigned int point_index, double theta, double neg_f[]) const; void computeEdgeForces(unsigned int* row_P, unsigned int* col_P, double* val_P, unsigned int N, double* pos_f, int num_threads) const; void print(); private: void init(SPTree* inp_parent, double* inp_data, double* inp_corner, double* inp_width); void fill(unsigned int N); unsigned int getAllIndices(unsigned int* indices, unsigned int loc); bool isChild(unsigned int test_index, unsigned int start, unsigned int end); }; template <> struct SPTree<0> { enum { no_children = 1 }; }; #endif Rtsne/src/RcppExports.cpp0000644000176200001440000001311714531113117015131 0ustar liggesusers// Generated by using Rcpp::compileAttributes() -> do not edit by hand // Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 #include using namespace Rcpp; #ifdef RCPP_USE_GLOBAL_ROSTREAM Rcpp::Rostream& Rcpp::Rcout = Rcpp::Rcpp_cout_get(); Rcpp::Rostream& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get(); #endif // Rtsne_cpp Rcpp::List Rtsne_cpp(NumericMatrix X, int no_dims, double perplexity, double theta, bool verbose, int max_iter, bool distance_precomputed, NumericMatrix Y_in, bool init, int stop_lying_iter, int mom_switch_iter, double momentum, double final_momentum, double eta, double exaggeration_factor, unsigned int num_threads); RcppExport SEXP _Rtsne_Rtsne_cpp(SEXP XSEXP, SEXP no_dimsSEXP, SEXP perplexitySEXP, SEXP thetaSEXP, SEXP verboseSEXP, SEXP max_iterSEXP, SEXP distance_precomputedSEXP, SEXP Y_inSEXP, SEXP initSEXP, SEXP stop_lying_iterSEXP, SEXP mom_switch_iterSEXP, SEXP momentumSEXP, SEXP final_momentumSEXP, SEXP etaSEXP, SEXP exaggeration_factorSEXP, SEXP num_threadsSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< NumericMatrix >::type X(XSEXP); Rcpp::traits::input_parameter< int >::type no_dims(no_dimsSEXP); Rcpp::traits::input_parameter< double >::type perplexity(perplexitySEXP); Rcpp::traits::input_parameter< double >::type theta(thetaSEXP); Rcpp::traits::input_parameter< bool >::type verbose(verboseSEXP); Rcpp::traits::input_parameter< int >::type max_iter(max_iterSEXP); Rcpp::traits::input_parameter< bool >::type distance_precomputed(distance_precomputedSEXP); Rcpp::traits::input_parameter< NumericMatrix >::type Y_in(Y_inSEXP); Rcpp::traits::input_parameter< bool >::type init(initSEXP); Rcpp::traits::input_parameter< int >::type stop_lying_iter(stop_lying_iterSEXP); Rcpp::traits::input_parameter< int >::type mom_switch_iter(mom_switch_iterSEXP); Rcpp::traits::input_parameter< double >::type momentum(momentumSEXP); Rcpp::traits::input_parameter< double >::type final_momentum(final_momentumSEXP); Rcpp::traits::input_parameter< double >::type eta(etaSEXP); Rcpp::traits::input_parameter< double >::type exaggeration_factor(exaggeration_factorSEXP); Rcpp::traits::input_parameter< unsigned int >::type num_threads(num_threadsSEXP); rcpp_result_gen = Rcpp::wrap(Rtsne_cpp(X, no_dims, perplexity, theta, verbose, max_iter, distance_precomputed, Y_in, init, stop_lying_iter, mom_switch_iter, momentum, final_momentum, eta, exaggeration_factor, num_threads)); return rcpp_result_gen; END_RCPP } // Rtsne_nn_cpp Rcpp::List Rtsne_nn_cpp(IntegerMatrix nn_dex, NumericMatrix nn_dist, int no_dims, double perplexity, double theta, bool verbose, int max_iter, NumericMatrix Y_in, bool init, int stop_lying_iter, int mom_switch_iter, double momentum, double final_momentum, double eta, double exaggeration_factor, unsigned int num_threads); RcppExport SEXP _Rtsne_Rtsne_nn_cpp(SEXP nn_dexSEXP, SEXP nn_distSEXP, SEXP no_dimsSEXP, SEXP perplexitySEXP, SEXP thetaSEXP, SEXP verboseSEXP, SEXP max_iterSEXP, SEXP Y_inSEXP, SEXP initSEXP, SEXP stop_lying_iterSEXP, SEXP mom_switch_iterSEXP, SEXP momentumSEXP, SEXP final_momentumSEXP, SEXP etaSEXP, SEXP exaggeration_factorSEXP, SEXP num_threadsSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< IntegerMatrix >::type nn_dex(nn_dexSEXP); Rcpp::traits::input_parameter< NumericMatrix >::type nn_dist(nn_distSEXP); Rcpp::traits::input_parameter< int >::type no_dims(no_dimsSEXP); Rcpp::traits::input_parameter< double >::type perplexity(perplexitySEXP); Rcpp::traits::input_parameter< double >::type theta(thetaSEXP); Rcpp::traits::input_parameter< bool >::type verbose(verboseSEXP); Rcpp::traits::input_parameter< int >::type max_iter(max_iterSEXP); Rcpp::traits::input_parameter< NumericMatrix >::type Y_in(Y_inSEXP); Rcpp::traits::input_parameter< bool >::type init(initSEXP); Rcpp::traits::input_parameter< int >::type stop_lying_iter(stop_lying_iterSEXP); Rcpp::traits::input_parameter< int >::type mom_switch_iter(mom_switch_iterSEXP); Rcpp::traits::input_parameter< double >::type momentum(momentumSEXP); Rcpp::traits::input_parameter< double >::type final_momentum(final_momentumSEXP); Rcpp::traits::input_parameter< double >::type eta(etaSEXP); Rcpp::traits::input_parameter< double >::type exaggeration_factor(exaggeration_factorSEXP); Rcpp::traits::input_parameter< unsigned int >::type num_threads(num_threadsSEXP); rcpp_result_gen = Rcpp::wrap(Rtsne_nn_cpp(nn_dex, nn_dist, no_dims, perplexity, theta, verbose, max_iter, Y_in, init, stop_lying_iter, mom_switch_iter, momentum, final_momentum, eta, exaggeration_factor, num_threads)); return rcpp_result_gen; END_RCPP } // normalize_input_cpp Rcpp::NumericMatrix normalize_input_cpp(Rcpp::NumericMatrix input); RcppExport SEXP _Rtsne_normalize_input_cpp(SEXP inputSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< Rcpp::NumericMatrix >::type input(inputSEXP); rcpp_result_gen = Rcpp::wrap(normalize_input_cpp(input)); return rcpp_result_gen; END_RCPP } static const R_CallMethodDef CallEntries[] = { {"_Rtsne_Rtsne_cpp", (DL_FUNC) &_Rtsne_Rtsne_cpp, 16}, {"_Rtsne_Rtsne_nn_cpp", (DL_FUNC) &_Rtsne_Rtsne_nn_cpp, 16}, {"_Rtsne_normalize_input_cpp", (DL_FUNC) &_Rtsne_normalize_input_cpp, 1}, {NULL, NULL, 0} }; RcppExport void R_init_Rtsne(DllInfo *dll) { R_registerRoutines(dll, NULL, CallEntries, NULL, NULL); R_useDynamicSymbols(dll, FALSE); } Rtsne/R/0000755000176200001440000000000014534214252011550 5ustar liggesusersRtsne/R/utils.R0000644000176200001440000000724314531100546013036 0ustar liggesusers#' Normalize input data matrix #' #' Mean centers each column of an input data matrix so that it has a mean of zero. #' Scales the entire matrix so that the largest absolute of the centered matrix is equal to unity. #' #' @param X matrix; Input data matrix with rows as observations and columns as variables/dimensions. #' #' @details #' Normalization avoids numerical problems when the coordinates (and thus the distances between observations) are very large. #' Directly computing distances on this scale may lead to underflow when computing the probabilities in the t-SNE algorithm. #' Rescaling the input values mitigates these problems to some extent. #' #' @return A numeric matrix of the same dimensions as \code{X} but centred by column and scaled to have a maximum deviation of 1. #' #' @author #' Aaron Lun #' #' @examples #' iris_unique <- unique(iris) # Remove duplicates #' iris_matrix <- as.matrix(iris_unique[,1:4]) #' X <- normalize_input(iris_matrix) #' colMeans(X) #' range(X) #' @export normalize_input <- function(X) { # Using the original C++ code from bhtsne to do mean centering, even though it would be simple to do with sweep(). # This is because R's sums are more numerically precise, so for consistency with the original code, we need to use the naive C++ version. normalize_input_cpp(X) } is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) # Checks if the input is a whole number. { abs(x - round(x)) < tol } .check_tsne_params <- function(nsamples, dims, perplexity, theta, max_iter, verbose, Y_init, stop_lying_iter, mom_switch_iter, momentum, final_momentum, eta, exaggeration_factor) # Checks parameters for the t-SNE algorithm that are independent of # the format of the input data (e.g., distance matrix or neighbors). { if (!is.wholenumber(dims) || dims < 1 || dims > 3) { stop("dims should be either 1, 2 or 3") } if (!is.wholenumber(max_iter) || !(max_iter>0)) { stop("number of iterations should be a positive integer")} if (!is.null(Y_init) && (nsamples!=nrow(Y_init) || ncol(Y_init)!=dims)) { stop("incorrect format for Y_init") } if (!is.numeric(perplexity) || perplexity <= 0) { stop("perplexity should be a positive number") } if (!is.numeric(theta) || (theta<0.0) || (theta>1.0) ) { stop("theta should lie in [0, 1]")} if (!is.wholenumber(stop_lying_iter) || stop_lying_iter<0) { stop("stop_lying_iter should be a positive integer")} if (!is.wholenumber(mom_switch_iter) || mom_switch_iter<0) { stop("mom_switch_iter should be a positive integer")} if (!is.numeric(momentum) || momentum < 0) { stop("momentum should be a positive number") } if (!is.numeric(final_momentum) || final_momentum < 0) { stop("final momentum should be a positive number") } if (!is.numeric(eta) || eta <= 0) { stop("eta should be a positive number") } if (!is.numeric(exaggeration_factor) || exaggeration_factor <= 0) { stop("exaggeration_factor should be a positive number")} if (nsamples - 1 < 3 * perplexity) { stop("perplexity is too large for the number of samples")} init <- !is.null(Y_init) if (init) { Y_init <- t(Y_init) # transposing for rapid column-major access. } else { Y_init <- matrix() } list(no_dims=dims, perplexity=perplexity, theta=theta, max_iter=max_iter, verbose=verbose, init=init, Y_in=Y_init, stop_lying_iter=stop_lying_iter, mom_switch_iter=mom_switch_iter, momentum=momentum, final_momentum=final_momentum, eta=eta, exaggeration_factor=exaggeration_factor) } .clear_unwanted_params <- function(args) # Removing parameters that we do not want to store in the output. { args$Y_in <- args$init <- args$no_dims <- args$verbose <- NULL args } Rtsne/R/neighbors.R0000644000176200001440000000253114531100546013651 0ustar liggesusers#' @rdname Rtsne #' @export Rtsne_neighbors <- function(index, distance, dims=2, perplexity=30, theta=0.5, max_iter=1000,verbose=getOption("verbose", FALSE), Y_init=NULL, stop_lying_iter=ifelse(is.null(Y_init),250L,0L), mom_switch_iter=ifelse(is.null(Y_init),250L,0L), momentum=0.5, final_momentum=0.8, eta=200.0, exaggeration_factor=12.0, num_threads=1, ...) { if (!is.matrix(index)) { stop("Input index is not a matrix") } if (!identical(dim(index), dim(distance))) { stop("index and distance matrices should have the same dimensions") } R <- range(index) if (any(R < 1 | R > nrow(index) | !is.finite(R))) { stop("invalid indices") } tsne.args <- .check_tsne_params(nrow(index), dims=dims, perplexity=perplexity, theta=theta, max_iter=max_iter, verbose=verbose, Y_init=Y_init, stop_lying_iter=stop_lying_iter, mom_switch_iter=mom_switch_iter, momentum=momentum, final_momentum=final_momentum, eta=eta, exaggeration_factor=exaggeration_factor) # Transposing is necessary for fast column-major access to each sample, -1 for zero-indexing. out <- do.call(Rtsne_nn_cpp, c(list(nn_dex=t(index - 1L), nn_dist=t(distance), num_threads=num_threads), tsne.args)) out$Y <- t(out$Y) # Transposing back. c(list(N=nrow(index)), out, .clear_unwanted_params(tsne.args)) } Rtsne/R/Rtsne.R0000644000176200001440000003362414531105126012772 0ustar liggesusers#' Barnes-Hut implementation of t-Distributed Stochastic Neighbor Embedding #' #' Wrapper for the C++ implementation of Barnes-Hut t-Distributed Stochastic Neighbor Embedding. t-SNE is a method for constructing a low dimensional embedding of high-dimensional data, distances or similarities. Exact t-SNE can be computed by setting theta=0.0. #' #' Given a distance matrix \eqn{D} between input objects (which by default, is the euclidean distances between two objects), we calculate a similarity score in the original space: \deqn{ p_{j | i} = \frac{\exp(-\|D_{ij}\|^2 / 2 \sigma_i^2)}{\sum_{k \neq i} \exp(-\|D_{ij}\|^2 / 2 \sigma_i^2)} } which is then symmetrized using: \deqn{ p_{i j}=\frac{p_{j|i} + p_{i|j}}{2n}.} The \eqn{\sigma} for each object is chosen in such a way that the perplexity of \eqn{p_{j|i}} has a value that is close to the user defined perplexity. This value effectively controls how many nearest neighbours are taken into account when constructing the embedding in the low-dimensional space. #' For the low-dimensional space we use the Cauchy distribution (t-distribution with one degree of freedom) as the distribution of the distances to neighbouring objects: #' \deqn{ q_{i j} = \frac{(1+ \| y_i-y_j\|^2)^{-1}}{\sum_{k \neq l} 1+ \| y_k-y_l\|^2)^{-1}}.} #' By changing the location of the objects y in the embedding to minimize the Kullback-Leibler divergence between these two distributions \eqn{ q_{i j}} and \eqn{ p_{i j}}, we create a map that focusses on small-scale structure, due to the asymmetry of the KL-divergence. The t-distribution is chosen to avoid the crowding problem: in the original high dimensional space, there are potentially many equidistant objects with moderate distance from a particular object, more than can be accounted for in the low dimensional representation. The t-distribution makes sure that these objects are more spread out in the new representation. #' #' For larger datasets, a problem with the a simple gradient descent to minimize the Kullback-Leibler divergence is the computational complexity of each gradient step (which is \eqn{O(n^2)}). The Barnes-Hut implementation of the algorithm attempts to mitigate this problem using two tricks: (1) approximating small similarities by 0 in the \eqn{p_{ij}} distribution, where the non-zero entries are computed by finding 3*perplexity nearest neighbours using an efficient tree search. (2) Using the Barnes-Hut algorithm in the computation of the gradient which approximates large distance similarities using a quadtree. This approximation is controlled by the \code{theta} parameter, with smaller values leading to more exact approximations. When \code{theta=0.0}, the implementation uses a standard t-SNE implementation. The Barnes-Hut approximation leads to a \eqn{O(n log(n))} computational complexity for each iteration. #' #' During the minimization of the KL-divergence, the implementation uses a trick known as early exaggeration, which multiplies the \eqn{p_{ij}}'s by 12 during the first 250 iterations. This leads to tighter clustering and more distance between clusters of objects. This early exaggeration is not used when the user gives an initialization of the objects in the embedding by setting \code{Y_init}. During the early exaggeration phase, a momentum term of 0.5 is used while this is changed to 0.8 after the first 250 iterations. All these default parameters can be changed by the user. #' #' After checking the correctness of the input, the \code{Rtsne} function (optionally) does an initial reduction of the feature space using \code{\link{prcomp}}, before calling the C++ TSNE implementation. Since R's random number generator is used, use \code{\link{set.seed}} before the function call to get reproducible results. #' #' If \code{X} is a data.frame, it is transformed into a matrix using \code{\link{model.matrix}}. If \code{X} is a \code{\link{dist}} object, it is currently first expanded into a full distance matrix. #' #' @param X matrix; Data matrix (each row is an observation, each column is a variable) #' @param index integer matrix; Each row contains the identity of the nearest neighbors for each observation #' @param distance numeric matrix; Each row contains the distance to the nearest neighbors in \code{index} for each observation #' @param dims integer; Output dimensionality (default: 2) #' @param initial_dims integer; the number of dimensions that should be retained in the initial PCA step (default: 50) #' @param perplexity numeric; Perplexity parameter (should not be bigger than 3 * perplexity < nrow(X) - 1, see details for interpretation) #' @param theta numeric; Speed/accuracy trade-off (increase for less accuracy), set to 0.0 for exact TSNE (default: 0.5) #' @param check_duplicates logical; Checks whether duplicates are present. It is best to make sure there are no duplicates present and set this option to FALSE, especially for large datasets (default: TRUE) #' @param pca logical; Whether an initial PCA step should be performed (default: TRUE) #' @param partial_pca logical; Whether truncated PCA should be used to calculate principal components (requires the irlba package). This is faster for large input matrices (default: FALSE) #' @param max_iter integer; Number of iterations (default: 1000) #' @param verbose logical; Whether progress updates should be printed (default: global "verbose" option, or FALSE if that is not set) #' @param ... Other arguments that can be passed to Rtsne #' @param is_distance logical; Indicate whether X is a distance matrix (default: FALSE) #' @param Y_init matrix; Initial locations of the objects. If NULL, random initialization will be used (default: NULL). Note that when using this, the initial stage with exaggerated perplexity values and a larger momentum term will be skipped. #' @param pca_center logical; Should data be centered before pca is applied? (default: TRUE) #' @param pca_scale logical; Should data be scaled before pca is applied? (default: FALSE) #' @param normalize logical; Should data be normalized internally prior to distance calculations with \code{\link{normalize_input}}? (default: TRUE) #' @param stop_lying_iter integer; Iteration after which the perplexities are no longer exaggerated (default: 250, except when Y_init is used, then 0) #' @param mom_switch_iter integer; Iteration after which the final momentum is used (default: 250, except when Y_init is used, then 0) #' @param momentum numeric; Momentum used in the first part of the optimization (default: 0.5) #' @param final_momentum numeric; Momentum used in the final part of the optimization (default: 0.8) #' @param eta numeric; Learning rate (default: 200.0) #' @param exaggeration_factor numeric; Exaggeration factor used to multiply the P matrix in the first part of the optimization (default: 12.0) #' @param num_threads integer; Number of threads to use when using OpenMP, default is 1. Setting to 0 corresponds to detecting and using all available cores #' #' @return List with the following elements: #' \item{Y}{Matrix containing the new representations for the objects} #' \item{N}{Number of objects} #' \item{origD}{Original Dimensionality before TSNE (only when \code{X} is a data matrix)} #' \item{perplexity}{See above} #' \item{theta}{See above} #' \item{costs}{The cost for every object after the final iteration} #' \item{itercosts}{The total costs (KL-divergence) for all objects in every 50th + the last iteration} #' \item{stop_lying_iter}{Iteration after which the perplexities are no longer exaggerated} #' \item{mom_switch_iter}{Iteration after which the final momentum is used} #' \item{momentum}{Momentum used in the first part of the optimization} #' \item{final_momentum}{Momentum used in the final part of the optimization} #' \item{eta}{Learning rate} #' \item{exaggeration_factor}{Exaggeration factor used to multiply the P matrix in the first part of the optimization} #' #' @section Supplying precomputed distances: #' If a distance matrix is already available, this can be directly supplied to \code{Rtsne} by setting \code{is_distance=TRUE}. #' This improves efficiency by avoiding recalculation of distances, but requires some work to get the same results as running default \code{Rtsne} on a data matrix. #' Specifically, Euclidean distances should be computed from a normalized data matrix - see \code{\link{normalize_input}} for details. #' PCA arguments will also be ignored if \code{is_distance=TRUE}. #' #' NN search results can be directly supplied to \code{Rtsne_neighbors} to avoid repeating the (possibly time-consuming) search. #' To achieve the same results as \code{Rtsne} on the data matrix, the search should be conducted on the normalized data matrix. #' The number of nearest neighbors should also be equal to three-fold the \code{perplexity}, rounded down to the nearest integer. #' Note that pre-supplied NN results cannot be used when \code{theta=0} as they are only relevant for the approximate algorithm. #' #' Any kind of distance metric can be used as input. #' In contrast, running \code{Rtsne} on a data matrix will always use Euclidean distances. #' #' @references Maaten, L. Van Der, 2014. Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research, 15, p.3221-3245. #' @references van der Maaten, L.J.P. & Hinton, G.E., 2008. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research, 9, pp.2579-2605. #' #' @examples #' iris_unique <- unique(iris) # Remove duplicates #' iris_matrix <- as.matrix(iris_unique[,1:4]) #' #' # Set a seed if you want reproducible results #' set.seed(42) #' tsne_out <- Rtsne(iris_matrix,pca=FALSE,perplexity=30,theta=0.0) # Run TSNE #' #' # Show the objects in the 2D tsne representation #' plot(tsne_out$Y,col=iris_unique$Species, asp=1) #' #' # data.frame as input #' tsne_out <- Rtsne(iris_unique,pca=FALSE, theta=0.0) #' #' # Using a dist object #' set.seed(42) #' tsne_out <- Rtsne(dist(normalize_input(iris_matrix)), theta=0.0) #' plot(tsne_out$Y,col=iris_unique$Species, asp=1) #' #' set.seed(42) #' tsne_out <- Rtsne(as.matrix(dist(normalize_input(iris_matrix))),theta=0.0) #' plot(tsne_out$Y,col=iris_unique$Species, asp=1) #' #' # Supplying starting positions (example: continue from earlier embedding) #' set.seed(42) #' tsne_part1 <- Rtsne(iris_unique[,1:4], theta=0.0, pca=FALSE, max_iter=350) #' tsne_part2 <- Rtsne(iris_unique[,1:4], theta=0.0, pca=FALSE, max_iter=650, Y_init=tsne_part1$Y) #' plot(tsne_part2$Y,col=iris_unique$Species, asp=1) #' \dontrun{ #' # Fast PCA and multicore #' #' tsne_out <- Rtsne(iris_matrix, theta=0.1, partial_pca = TRUE, initial_dims=3) #' tsne_out <- Rtsne(iris_matrix, theta=0.1, num_threads = 2) #' } #' @useDynLib Rtsne, .registration = TRUE #' @import Rcpp #' @importFrom stats model.matrix na.fail prcomp #' #' @export Rtsne <- function (X, ...) { UseMethod("Rtsne", X) } #' @describeIn Rtsne Default Interface #' @export Rtsne.default <- function(X, dims=2, initial_dims=50, perplexity=30, theta=0.5, check_duplicates=TRUE, pca=TRUE, partial_pca=FALSE, max_iter=1000,verbose=getOption("verbose", FALSE), is_distance=FALSE, Y_init=NULL, pca_center=TRUE, pca_scale=FALSE, normalize=TRUE, stop_lying_iter=ifelse(is.null(Y_init),250L,0L), mom_switch_iter=ifelse(is.null(Y_init),250L,0L), momentum=0.5, final_momentum=0.8, eta=200.0, exaggeration_factor=12.0, num_threads=1, ...) { if (!is.logical(is_distance)) { stop("is_distance should be a logical variable")} if (!is.matrix(X)) { stop("Input X is not a matrix")} if (is_distance & !(is.matrix(X) & (nrow(X)==ncol(X)))) { stop("Input is not an accepted distance matrix") } if (!(is.logical(pca_center) && is.logical(pca_scale)) ) { stop("pca_center and pca_scale should be TRUE or FALSE")} if (!is.wholenumber(initial_dims) || initial_dims<=0) { stop("Incorrect initial dimensionality.")} if (!is.wholenumber(num_threads) || num_threads<0) { stop("Incorrect number of threads.")} tsne.args <- .check_tsne_params(nrow(X), dims=dims, perplexity=perplexity, theta=theta, max_iter=max_iter, verbose=verbose, Y_init=Y_init, stop_lying_iter=stop_lying_iter, mom_switch_iter=mom_switch_iter, momentum=momentum, final_momentum=final_momentum, eta=eta, exaggeration_factor=exaggeration_factor) # Check for missing values X <- na.fail(X) # Apply PCA if (!is_distance) { if (pca) { if(verbose) cat("Performing PCA\n") if(partial_pca){ if (!requireNamespace("irlba", quietly = TRUE)) {stop("Package \"irlba\" is required for partial PCA. Please install it.", call. = FALSE)} X <- irlba::prcomp_irlba(X, n = initial_dims, center = pca_center, scale = pca_scale)$x }else{ if(verbose & min(dim(X))>2500) cat("Consider setting partial_pca=TRUE for large matrices\n") X <- prcomp(X, retx=TRUE, center = pca_center, scale. = pca_scale, rank. = initial_dims)$x } } if (check_duplicates) { if (any(duplicated(X))) { stop("Remove duplicates before running TSNE.") } } if (normalize) { X <- normalize_input(X) } X <- t(X) # transposing for rapid column-major access. } else { # Compute Squared distance if we are using exact TSNE if (theta==0.0) { X <- X^2 } } out <- do.call(Rtsne_cpp, c(list(X=X, distance_precomputed=is_distance, num_threads=num_threads), tsne.args)) out$Y <- t(out$Y) # Transposing back. info <- list(N=ncol(X)) if (!is_distance) { out$origD <- nrow(X) } # 'origD' is unknown for distance matrices. out <- c(info, out, .clear_unwanted_params(tsne.args)) class(out) <- c("Rtsne","list") out } #' @describeIn Rtsne tsne on given dist object #' @export Rtsne.dist <- function(X,...,is_distance=TRUE) { X <- as.matrix(na.fail(X)) Rtsne(X, ..., is_distance=is_distance) } #' @describeIn Rtsne tsne on data.frame #' @export Rtsne.data.frame <- function(X,...) { X <- model.matrix(~.-1,na.fail(X)) Rtsne(X, ...) } Rtsne/R/RcppExports.R0000644000176200001440000000203614534214252014165 0ustar liggesusers# Generated by using Rcpp::compileAttributes() -> do not edit by hand # Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 Rtsne_cpp <- function(X, no_dims, perplexity, theta, verbose, max_iter, distance_precomputed, Y_in, init, stop_lying_iter, mom_switch_iter, momentum, final_momentum, eta, exaggeration_factor, num_threads) { .Call(`_Rtsne_Rtsne_cpp`, X, no_dims, perplexity, theta, verbose, max_iter, distance_precomputed, Y_in, init, stop_lying_iter, mom_switch_iter, momentum, final_momentum, eta, exaggeration_factor, num_threads) } Rtsne_nn_cpp <- function(nn_dex, nn_dist, no_dims, perplexity, theta, verbose, max_iter, Y_in, init, stop_lying_iter, mom_switch_iter, momentum, final_momentum, eta, exaggeration_factor, num_threads) { .Call(`_Rtsne_Rtsne_nn_cpp`, nn_dex, nn_dist, no_dims, perplexity, theta, verbose, max_iter, Y_in, init, stop_lying_iter, mom_switch_iter, momentum, final_momentum, eta, exaggeration_factor, num_threads) } normalize_input_cpp <- function(input) { .Call(`_Rtsne_normalize_input_cpp`, input) } Rtsne/NEWS.md0000644000176200001440000000105714531100546012445 0ustar liggesusers# Rtsne 0.15 * Substantial speed increase by fixing the possible embedding dimensionalities to 1, 2 or 3 * Clarification of the licensing of different parts of the package * Support for using, more efficient, partial PCA (by Daniel Wells) * Made the normalization optional and the normalization function used available (by Aaron Lun) * Support for using precomputed nearest neighbour matrices (by Aaron Lun) * Added OpenMP support * Default verbose value is now the global setting (by Richard Cotton) * Added a `NEWS.md` file to track changes to the package. Rtsne/MD50000644000176200001440000000246414534330563011671 0ustar liggesusers6cc250f0de08298bd88d809e88d6ab48 *DESCRIPTION 49fc1d4c2793cb7e3f05d44249b7e27b *LICENSE 1c735967f4f81eb8e783fd0034e71403 *NAMESPACE b945559fefc649a5c564c892b2da8ca3 *NEWS.md 084cfe0c64d846b9cc5f37ed17057d8a *R/RcppExports.R 01e00e3145ca1f3974ae2a03416548f2 *R/Rtsne.R 4e9ec6d249295c36908a210edf740e53 *R/neighbors.R acc89602417f900edc1155767873f277 *R/utils.R f7cf46b9667e131792841f134c0efb0a *README.md d86c9361970f4d71462ead471ae295b5 *inst/CITATION fba54a6ca5579b4ed9901e2aea801389 *man/Rtsne.Rd 534b6db9f30e8d2ccc33ce366180e0e8 *man/normalize_input.Rd fccdbee9a90c6ec50412daaf3f641201 *src/Makevars cd626b9e988ce6dd307bf23e23e5f26c *src/Makevars.win c76804e93f56f72ad478b3610860ea5c *src/RcppExports.cpp 89a933631b2a530ae8866d831ed33ad1 *src/Rtsne.cpp 18ddc006d1ce092a4636748c5873a4a4 *src/datapoint.h 4c911f51eb82db749ffb654223fd6b9c *src/normalize_input.cpp 965a08920d0d7afc760a154f84c24c2c *src/sptree.cpp 8cc6063c4e6b8d4f51fe30a8a9436ebb *src/sptree.h 41551e8cdaddfdd94c828372d10a7127 *src/tsne.cpp e99c1b8c5682d35a6a5c698f9bcad673 *src/tsne.h 9b4912c92ddeb9daef0a3178680e6580 *src/vptree.h 99cf22df0ba208958da6b07a31a21a74 *tests/testthat.R 7e61928cebf8128619d5b22c75de49af *tests/testthat/test_Rtsne.R 6951d26d437058a846e9a2029e6c08b6 *tests/testthat/test_neighbors.R 4fc4884b693e3b0d2bece5033c0ccb4e *tools/example-1.png Rtsne/inst/0000755000176200001440000000000014534213762012331 5ustar liggesusersRtsne/inst/CITATION0000644000176200001440000000231714534214163013465 0ustar liggesusersbibentry( bibtype="Manual", title = "{Rtsne}: T-Distributed Stochastic Neighbor Embedding using Barnes-Hut Implementation", author = "Jesse H. Krijthe", year = "2015", note = sprintf("R package version %s", meta$Version), url = "https://github.com/jkrijthe/Rtsne", textVersion="Jesse H. Krijthe (2015). Rtsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation, URL: https://github.com/jkrijthe/Rtsne" ) bibentry( bibtype="Article", title="Visualizing High-Dimensional Data Using t-SNE", volume= "9", pages="2579-2605", year="2008", author="L.J.P. van der Maaten and G.E. Hinton", journal="Journal of Machine Learning Research", textVersion="L.J.P. van der Maaten and G.E. Hinton. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9(Nov):2579-2605, 2008.") bibentry( bibtype="Article", title="Accelerating t-SNE using Tree-Based Algorithms", volume= "15", pages="3221-3245", year="2014", author="L.J.P. van der Maaten", journal="Journal of Machine Learning Research", textVersion="L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research 15(Oct):3221-3245, 2014." )