pwalign/build/
pwalign/build/vignette.rds
pwalign/data/BLOSUM45.rda
pwalign/data/BLOSUM50.rda
pwalign/data/BLOSUM62.rda
pwalign/data/BLOSUM80.rda
pwalign/data/PAM120.rda
pwalign/data/PAM250.rda
pwalign/data/PAM40.rda
pwalign/data/srPhiX174.rda
[binary contents of the archive members listed above omitted]
[remaining binary contents of pwalign/data/srPhiX174.rda omitted; the package metadata below begins mid-field]
= 2.71.5)
Imports: methods, utils
LinkingTo: S4Vectors, IRanges, XVector, Biostrings
Enhances: Rmpi
Suggests: RUnit
Collate: 00datacache.R utils.R InDel-class.R
        AlignedXStringSet-class.R PairwiseAlignments-class.R
        PairwiseAlignmentsSingleSubject-class.R PairwiseAlignments-io.R
        align-utils.R pid.R substitution_matrices.R pairwiseAlignment.R
        stringDist.R zzz.R
git_url: https://git.bioconductor.org/packages/pwalign
git_branch: RELEASE_3_19
git_last_commit: d161c8d
git_last_commit_date: 2024-04-30
Repository: Bioconductor 3.19
Date/Publication: 2024-04-30
NeedsCompilation: yes
Packaged: 2024-05-01 05:26:32 UTC; biocbuild
Author: Patrick Aboyoun [aut], Robert Gentleman [aut], Hervé Pagès [cre]
Maintainer: Hervé Pagès
pwalign/inst/
pwalign/inst/doc/
pwalign/inst/doc/PairwiseAlignments.pdf
i!W:B^T!E+$?OB8Ee#[Qc+J*7/dޖ=vrS(Yǹr1=<pE%͓vt'H &X2۪֝M2z;v}[]$=g_ޛjaIQ6GluկǪYVoy1R%q3f҉p֮DVU ǖ_ʛ/K);&n˝ tB>6#IpRf_E-+hR$ZdK(t\77kS03K R | fewZq\ѐZ7jnvrhn{ EL Lqۀ%7PxJ0?Ghz-FiHxbv66>4^ E&v[adqA4&%™JKf.]^ΘXZM4`Fn"2 і vMǞV*b\a3sqi$fp0 рDUb*£0B1 VS/>fh&fthR|,y:teĻT330CJCӦL@3\g:k 0,v_"<5!69L]3:~'Pk e\$swC.0G11_;flB1@>$h <CN(B<(I*S{HM%>D B}܀8 NB6J=q&2w]^qzՃFvӄ ߔ-Jn4d8$)l|[II߽sI]_owʝavRb‡`f&bCdN丸LnkDrB: Rຩ0]ς^ \ xswZpX|hCC:gɕfQvݶn(b#:clkIUYԬM?{Ʀ/cOKp}[67Ug5,>& XZÛ!E'C= {l=6:ƝFtH0ǦTxuUwUeIzƐvBI@4ءj]_W$`cI`Q ꍞ4>AL +1y|մ}iprdH-ϰ1*qč'EcLQH ۻR*;ϏwjO'@g”Bp !:"+3/%.^ Cʅǩ쥂< NG' ˞L< *ђ=<4|l|졏+D)~jq*LQ LxgNl>“l8q>$O< )?X.>4O L<|1w\q)R7 byr3dJIq|pv:3+?+Qkj|s)+g-94P E0O/`醙j<a"",0\nܗUAMiV2<{o fT96;V)s #Fal:ib}la^61dL3u3lcL'¦bSx=e1>d & .ؿAtnNL[Ic0)nҦ:TOfIc>vӷV' A[r L?6eI!C˶3bN%uU<# ܠ_% WC.! endstream endobj 174 0 obj << /Length 2864 /Filter /FlateDecode >> stream x˒6P͉X |Ʃlىw=>>PFÍDjI*co7 |F/yK!y'J9vb^Jcı<_Qk(P>(#sY:=H@J,4LsD( N HOx0bΥ={{E:}a|ga<* (:T+)L>:dZׅ)y1 cG t-VYd.'?~Y260i]a<75׌de<@Rbm$3P/O3 G28l[h41AN*#Cg2=;g +9 D9]̧΢xUR#^3{\>dG[b ge1䀌}1?` c c 5|~Ǡ}4 .(/؟/Z.  
Q5Σ<*͵{#Q[ ~ՕQV| 278:_N|ia؛ о9ɞ#PŰ;>y8Kas IUVuL͹芽Y9 |To+잏;,EWĬXA5j~vSc?@ͩޞ#[Yu1h=h/~q o} .0OǪҗVl0_lVn!I97-OJdU;3bxr'TB*"q{eo.L)g[[ŸQ-lI ]M`Vyyg_Ւ#Ltv>GS=oL60ب!I`Y[-h R :e`e9RQWu9jh[0f5{M f{ύpAJu>ЮH@'Б{zlVf YP+̎td.4)ϐ̴C,B`;#z1`CŮԆtxnꭶw]nTd\{q6U6uzhHƲ]4J6%3]#jZo7}`lT҆|pl>Q?~x}J>F3ة^a G0~'xp4G?Դኪ|TYZZ ʱ<34Md,a q{I\[!h9~t[bGPG>QG1i̋HrEu['t{[Ӿpf{xNJ 4&nҗ6Wi Wet&IPf^]<^ Sማ:&Icxz_;$}>cD Rr&A-i|e4絑m{܍5F8AI喞HiqF^B/(u~ErM'ޏ nG0SKz同Ǜ'8R endstream endobj 2 0 obj << /Type /ObjStm /N 100 /First 828 /Length 1990 /Filter /FlateDecode >> stream xZߏ6~_Ep lk &9>8^'Xyo2\[7EQ"3 3#)+Ҥ"&){4VAycEcTTXÆQ5BLy-* Ѧ,qKAYWR"29尢]g]@pȢ&/yM&#*l"*_d)燇+33y͏4"7Sp?ުWg=CzAI!y*y_*,x Snp̷yV#W7Q3U]a E5xЫXN kib ?͑SqZotS 0}nWlB[ZVpY5{祚\Vv*)ad5{]mۦ.r`L<i|س"Yw$ xr|aE<jmxؑ_1.l:OF}ˁP+kC^&޴Xh+*aXU)(&I#jRh3~-be@ u@y|30n*#mx=GGNy8Hw bfxH_q&^&fҶh]]ߏZ1!!df9 j?]ga>E+Gy#`AjN%~\xV<;bDǛ7]1P70бsOtsI?=:,mu0sۑ]G渶tvoJa>:]ypM>ؽmn@߶qv,KqjDQ\hz+M.Fꜭ{- endstream endobj 182 0 obj << /Length 2717 /Filter /FlateDecode >> stream xr۸=_5@lmN:l8Ӈl( H-I՗9x5d;L_ o_~'&Zq- BiJŗEΊJQ}6oUߔSͧ)̧iTRrXc˯YŊ1t5u[֜VYv5VqG5MU- ½~'Ĉ^&`OфPGfe6mޞڼ*q[bٛ`Zpjzv_㶳zdyw k%V9盽f?)+~΀hW"k 2ޑ($ N-+K 0kIwe-IR;/ɔ0;p#(+"DUi܊e/,L)[2Yac&"XgXdBemMbTFȒJRcoޜʪcɫƮbڗSOH"w Ev+G0GвK7rFD>C)[wb쟨HIՃΎov? 
eXW:A<~8i?juKLO4 2k')QW)L[U )My;'KEY:rpXL@šUb7Nd[ ThR2#S}:tE.–ކci/4 ID?.Dl1%< ـjbJ(p}]2Z*%IņaM’$tBm6\p˩v-Хq N6NB%$p2/OLPM@Eꯟޞkc1.F>&7Y=C\K"&.?`0fz|N-[еصbw%,|}VoϏ Ǿ.f=_ w6O޸'P[XVטX}G'xJ}¢n›kpGe; !5Ɇ?}SG5!)<GDn/6 WrBjEFLjB>J_a>isIm#s'g0y7vb";b:[2P8P<uAi!z2= OȅV$~p+!5eb2Nb2tf ~rO 6cޮޗP,sw@H;ٗm8|v^vZ'ضɂK}l/$ԏ endstream endobj 188 0 obj << /Length 1533 /Filter /FlateDecode >> stream xڽɎ60|UKh@ڇI3V#[:ӯ#y䩧D>oћݫ`$2*$Z1%aVlѓQJ ^ק+m_>}lm4n6w I+sƒa+pkL7ح:06KaWH'˴:8F*>1R idL<>c$V iKɼ/ZWRgkct(-F#Z@%m{15Ors5 t'qAI]F?LM'"Vn7DC,Ifs@n8| >!હOrmvw]PH紐DQu3)pH%a!#&z+@gllw,F>>9<:`T :?VDɳ /eFE/p" 2³bE}]ܪ]T'fqBvwew`XPpbVB]iӝ^b>ҾA>Q:m%ar>8) e}0Ȭ{Ovd2$*X* w^%mcӃAMz1"FN=Ф}6rp3- C9Šr^WXv T]B$c's4RMm*ņ@wU\JT$?^%-{Y0|RITl}kH3CbtK6vG=2?(^ FKEG9^zH`zw)8L^|o2B )Bm7qƊѨOIdHnXqY Fh>&.I)ʻp[|v= vQߺÝIӸfNOޓrA^y5AP!myܼϏc *#ĦI͉:`+Yd>-A]7<&*Xބ0 <_ʚ]v!iA kEPK˰GrD'++'>tyctegp }ej!x1F!uf*_VM#7p8WBx'KH.L\\ #HF5 $k^|eWJGχHٞ-_J4].|XBK}wqT&$lҗ'xҡbk j̛ BDحE}\ N|`"{ڋF}fn>z`h痓~mus1!{C]as]*8;|?%2fm@=F]A(T^llWh.gEG۩L\,w 1t*yѵ%aϮCYWF(Qy_{/- endstream endobj 196 0 obj << /Length 1180 /Filter /FlateDecode >> stream xVKo6W9IŊIK@YZ "1:r(Er4[֞H= <]$Xlv HyLV9]lބcQ)Rw6\7_"䜣XaL)-ŝps>K$Y(h΍oˆfp lEtuלp=.oʦ٪C]VJN{VQ?Oتc4ȭ m̆/᷅NJKWma='RR/L'K.)K3B犑xam9ԿL`"^VqWZY'w5X0S1ʪz&2Lo(FablINFҜ#&սB% l M6D( # ~^cSڮ40{S{í8N 25;$PTCO1RM>4OصDpA11z ѹff\qR*A&Wq6-C;3yLrMɖ`mzGW`&2:mj}R9kBʃ/j+q=@rFR&"^= >D(ҟgY[g·wAW84$R<.-Ia0 ;@S54gQ:)Gz- i, 󌾌 W9ab[hfDX @S);`FD z T.?%8 6{pIY5;V[H{,P 1U{H,MWA7ƈrkkx%F])I zLҷf:Lz'AM'gR9z8 ahAy5>=)H!Bϩay6BX'x}\|P8P4Gx*tljS(|,|P¡12>EB(MXkvh%[; DJ, lLnY9;u+`3 !]N`wR=V=JW~WZL!ee~:F|X_ `7Y_gp []h_7S㉽ RcxSTxqfZ endstream endobj 203 0 obj << /Length 1318 /Filter /FlateDecode >> stream xڽXKs6WprH3&7:Ӵ!δn4 [PMRu|MATft\`]| iA< ?iW^1{vqs}Vs8Hcr]/Z|\/HK*%b IMV$nYT%lu^nC]clsd5}HO_"rSƴB(Ř}`R5Y{i LTp}#)vyT#B|)0J'MXTF G{Iˆy-C:[8?ua{Qot$g S@f9ai֐++!󜝅+Zcf'kH Α`zhpGo=e~WHʂIDU5]bX>smȊnWLE9DG/'JDXi"!,)"{,"MU4.>dLu;֪o &4s<0 8o7)O}gXJz\x `R5ȇ_ʗHuzYEˀPu-uY<4QA^zCjI/&")XI1)oʗ/Y?3L Wϭfs{!LT;ɶ:ҵ5:Niv|L?kz.x='q5h#j 
<`>obt@LJκ{`xwp)nFJ#W'7WJJJJJ]q,n TJшd<ߎ̭j\2hDppߗLRRo0h>1-<ɉHF33^Ƒ㳓#];23OyN2h":`7Ƀ endstream endobj 208 0 obj << /Length 1438 /Filter /FlateDecode >> stream xڵWY6~ X5[R lӤh-6A*lёwxHdցa"gșoN~Ï!FLD0`$VO8Ha][q<MDQ`_r(T 1Ș84xWuVOw7fY2ы(8x/ C?wUli/Y0a$cd=*=2pa./tץJjUMgT(8GTj 򇹪-.[UI=\e*Rv1]U:{T\% 6%)iD\-@duxHT9Pؗ&gBǤLV5sqe!pq;1_jUҞihK8vT0 (w[s?oE8_  p2 3n2D|RyT7B^ e%fI+>@bd$mQB2j3"sNvL6[;tN~i]i펺/t?cEɃ:(^ٷ{'ZNGTPn@9[MVVy$#``{sm,xieNbCr&ᰐT.U o$ZRmKy*m{"sxfjq@>C*. }$Qj䠯3Fuf:KIQj>AÇeϭ9uWJZr(%j;0t E.(e؂ڥGGx刈\H{C ( ]ZD87?~ACûsG˃fHQq3%dBϝ4i% }ݔKB̊mZ3U}]]IzD.F.ɨxS+''ZLb M( ăH:?Rgoc]DۮA=vR?L4aPX-+*GU׊٨d9IAf͇jV=J<#{EpL@gN= 78WAw|:ӣ} Bbڢn\XٯEc,;Ǻ-vois%2[gW+xH1 ѨQDfn]ć/RN[9vՌعcom^. vy]$:HIT~<1\t:)&Q4˶;:ҐZ#JgXZݺ<:b掙؈Y8fUwLq1ozGPJ fڶ_)y endstream endobj 213 0 obj << /Length 949 /Filter /FlateDecode >> stream xڥVn8+ YR$%ڢ] 0@@#ӎqM "Es_^7o?sH$qW C*׈n88o7iUM ߷jC;rrFf@f@[20YxQyYK5Eu͌  0> H@@|H9!?+^?%C.iNk IWce?=ǣl^aO2 4`"s&ޜdש {sۢ+\`XUM(^~*\ )ҶFx/Sepp+Z8Qd Oxz :r%^֍ɥN|6Vsvx "[OPrKam -`W担p'XzA/0g&CM`v9lT`IڻL"c㤾RIlUnO7G:A£F4[oH$P;x֑uN}-6(˗hjʣ궃nQ6HqS9W#N.tP_>S =Co3mhV7yUյZf1".,څMB߻y*$l &b6S6{i%NNu\)S#iW=:e r~/>NJɐG&H2 J6@r̳It4IvebD:{.vS qMiTĶwf|&\s 7λڱ~, lduPz®^%/eۺ7e+,/i8[dC9GO6>SaM֓i#YKU0f9!<@=>U:0ڨ]6 >5kZ\vU3H XƵ5' % endstream endobj 218 0 obj << /Length 1741 /Filter /FlateDecode >> stream xڽYKoFW9I@.s(!%p|B"-Β)Qaݙ=oW$G E0FB ^]`>PU@$eps۫O! 
%c*D4 $ 5<wO4.,~x{aLB2d @P|)cH.Lɺl9K փXH2Vs$\KGqYLgTIdbm+fiV&ǽ2Yv{Kf0paeuvj}GdN UP`8l#ᐌH $Tz'Y2OEqZCkh=rK(=kM&BIr"(rWY=`[G%`ZtMxVIxƈQL'4C!!Hv;pQ/).xY|;TB6PEfΖ5zc~V܅u[;X̨RHC YwՅ4Bb3DM]} pGo֥#QrEťRS1k3 ȼQ$8vDA}EucF (Fcp%bxU8yrP`G yìIjҢك)21bx8]k]=9pGLqZ"es28T[(}֮Ķ'6Vjß:"#5M1^}; Zc/Qx_{t{%saH{(°UC4TnOjͮ:9a%^YwWr%Ɠs-LLZu뎼d:qfuID] SH\K-F#b[4rQ!5%0X?t1E~; ?@)$z'w{[MCyr?iCزa|(ɟ+06&]To('ߛb]lPhanP6^ۗĚxzN {Va}HT0:3,]-.LiY&s^{[3XY.1#h::Ni`0ɓJתEA eu纗W΍u Hj]=.ڏ۩T& uW]ѐCy14QnnmGQs0!NCYh`VoO)%Uv9SСS:hf}O14e NǬ Q~ ꜧ/mex"տf7Mbo> stream xXK6W>ɨňzP-@Ɂ+kmrN!%[fC˒ ͋Wg߄lBC̟,',$ɄS:Y,'.~$%)M\'>2;qTjeN69 w\BnA%x!V&/$ Ǣ߭^I(HFh%/RH2s9?xLS!/_&IIKnrP()%iYUJ|QɛZdsUVJR2^YٮQ0$,EZvSkcx`4X#hnck#CA-ԑAk]xOSGIUN 5:OXrPC ʡ7s^JTn:R . Xp%U|YCM=4Ȩ'RQI),7g R?>Ao f|@ ?&?8!?0 N1E{30&q??|6g=8iBڦ~YItC?UЖ7%`B7ci: g4qV;GfZmXQ5Y=-”0ڎ b.uJT wh~DZ ~s vd.o^޼6HGS{UZZT;Y~va;وf'DXgjgTE#<w+v??̅*}suX|Gg_I[Ԓorx6/XgUuO׶AI^F> ӶGkKdĶRZ'öŧE3]//B#]:8`ߖa\ZdE%O:>W!9 H{Jу>=fHNֿ5/NKes *07/ugeO?_j@6a]%>^<?e endstream endobj 233 0 obj << /Length 1744 /Filter /FlateDecode >> stream xڽXKo8W=h%wŢi&Ei8ڵ%$q~pH[r4 =c͛3Dν4 z^2R) '1g~+)O!s_LR/D\xAO\xQBܛ!|sr}J)+4N"#n{]l_Z R/!@`8rsS7su@I [ 7+V#e#v:ou OW]_k~:V*pGZ\#)>s ٖT{h6J%E*[.xW *Y5-$FlY !ߐ i,8$؅rK] D`D>LV8`~PbHd_K/Å ;#ŸCv6$7B/<iNA>yDzEYY$7w.@1dZ@CH/C4ҵCw{Mj{Yh]w[F/ʼncw *ʋꕉQBh9{BEOB=ǜVwseGVRKS Xte9P*f™V U_dzY{}S㚺oZ;4;[h`$ ]FƋBqL[ߩŪq!3a삿wiKxvp~߷:̑Gs$$ή_['Vx@zXbGL fuA)䪦֦`&Alv^wut2;?CF?oFnv|Dboc[[*M "k>5R=84؞NT0 H~r<f;994Tw r-%zӋ xO&{#K9d.ZFM3k'VjR蘁C&yCMLCVy7ell/(/§EY6K`T`PipJUJ5f'+xQ%-^½0h\9=QZ8&pٲD=˅94c4%"08-T@Sֽ'du]/zp\{lTG> stream xڽXmo6_a٪Hk7K3 E0 Fmr`_֋%IN=w˞=("/}Sw5 ?h'2Y۞}s9LãhA.[7ZؖnK!^HABLvjB[}h IT5֨s0zÉ҅\9b%a,dő7KeCn6ZJ?}xXRzƋTVg+=&`Ne& [}^}ջx(Y2#I`nK~Gf_prqH"L$Vb]ڪ=vV}em Lvaڕ9&3'Ķީ%_Ivp_΅64zdyWP@h?V-n;io,f< -HNN/Q6R$qV+&/s۔B:;MM ySI['(q9Ɛ zSV@hNpJQjt@A=s/~;:;cǴ}e\Z}cMZ*U.z;mWfwcyj,μ5i ޅ]-hG .H2L@Ջ0H  gA@!H<)fAp(٪c F=(:CM@ f?Ph$ԓ}p{y11!)h|Ce;_5R:6> stream x]o6콿␢o]Y8hmh֡bi0JΘvm_"u\'H {9IQIrIR/y<[^KfqʼEϖiƌ9"u'˫i?_v3ٽgyş_,}~K得 V 6 Ɍ>;!V܏C1̢0];AhZYG.jK$f]'c+2"? 
%.ԑg9NExu v(tQf`"}WsW9ׅVE^JZ-ys;Y%1Ue[m*+&LιqxQո6˭u ΄C3Za0>HdvhPڑ1Ʀ!n8f1\@uq !JV ϣ׿|xy/Q -P~D)az+Ӏ~J ?"X}w 9sQÓK(l)Mum EIUQw<809U~}+R,Ǡ BGf9ģʧ,o;-.p1@ػQC, Qb ʐnbUtbn)d 9'>ƒaf(Q ̽&`őAwڼە"qވH8>-B(.U,HMQY@2ĤDpS/v#)t"gR SqHؙ뜈L)Oa|+$ ڀG/-'OR;yW;&įkX"9z}vlqR}[m@ߜDMFp1җ H_lOEP(5w!D^C*hN5|TW. H_@]ca8"f&-7#5/&Ijׅ7zMB'`̤5 YO6I8OI//\ kUtO%Ik5i%Y!i{UܘV3`>mߴBU5ϥ44@S㘡s ubea* C@#˅h澚Ȑr]?ڤ`a>O!0v׮e깢BN_AH34U3>|I^۠*<æwG"݁$Uǿy2{_G3=gRTkQ7qډ5>|b,zcUCl4'1y>Sn> stream xXYo8~ϯ0Rkc#cY&.Zhܧ4d$KZi(~spFΏޝpIHz~< G W3/Dhҭ<)1s{40ƌ3NJM.B~C 6ʊqb _b2;ŧT-7-{JrbvGHrK8/i~ Gܤ[^(EcK^Z]`"bYVwcW]ZTi*7ZRIh)[^el%WWRʘgLFG9i>(J@h4u0  2!h # ԶgMX.,z|d[lZ#o`ߛ1qȔQI ҄+퓘c>i(G[3u1cϤ'={Y/vnpkAbgFI򁫤 H55L$U񥄤T.^J2aJud13&]^ZOҧ֓-t XJ2'us+,i Y< B>Bp-IF8I| I70膬0q7G 5&}4sHkECSS e yDv_8G7N'6NIt8\805Nn{Hdm:M4R3lO8u7k/r|,'V@<q<a|cC7NZjM; '4;`&ZowAӸ|FT)F9l~9 endstream endobj 254 0 obj << /Length 1210 /Filter /FlateDecode >> stream xڵWIo8W):1E YL a0 4bb6Hr򨈲;IH|ONΦQ4!OH2=Rj!(L9b,'$o 8 0gㇲzs_7EiRv)ph𱨪.+uVTۢY-6Ba-G몜x^m ^4&w?GB 0yۨ*QoZ H"Ddn,ZTexM㤓84 Ҏ$H`ߢYGD t^vmUԫ/tznTRZiRBi[4 w5zQ'\ru3/Ub"[O_L}upEbZJ.$¬ϭ8-jOaC]W.xf_BOL6Rt8IB!簂APƜAUkک5C~98 (R9qgL:cGq'-f c~iIHSՌRߴ,x:cI'B#)XTKNI04d)!ܴ VH3H"c 6#Q)kQ:m7ӻ=@O+I@R11!$sJ"& L8d AAaYC⯛+Z n.I~?-z9w{q: qMy(/OI jBڭVF=ښ(]>.d8Ί'sX+B?T\Cf.;i޶1/ 1x<NBθȗRrM%d>krZv=N߆:i&|)2)3M7Q&O%׏B*kBM[bB?u;PTnDHX{9u{e!>Eo _0ɪ WD=O[W3x fغiSEVeseeEJmvw]Wv5N @Og:}^,®#,ݟ7e&wǀ)SO^+vV Xt;|υ]!vS> stream xڝXmo6_O ,Iz) jka@ dɑ#)RZ'x+_.7>g,y. f^HYnM[FGZ}o~ Unڭ.u2l 0y16A-CB,gnb]`aO4N$J(i1 DZjЬR&?IJ8MdRӉp<+4XD@ )DĀ$= KQiP-I5)Itu j+`BbՔg x7t|zƨ]ze N_~Ɋ텥g#v("&ΰc? aCA>X( NFW_U`|7 T:z' #@'UJXyE;oݡm^u('niPLqXIꓒY3,ā}(OZ9V.jVJ! 
{=UW}!tDQAe~z!p,t@Uuc: jDM)Cok|A\v|m?#ǫF;"{\Hu(тANngtͭ,7]0џKΈl,o91gj Xle?#wO-DžztYv8Dڶ}uxo’Fr4Zyi1[zutT`{VA$R |Qh|kPo=!`]v%4E)xN.HSrSVzAN-bY+nMRodǭ*'t}V2_O=}*dZvz8ڙo粪7Bi%2Bp!<#KLo?j%3}{G80G>&)4 %{BtwP$MW'z`A/ ,7%7W'NL3 brjx׆ϩxוQp ǺHRDb1j25rs{4ws9\oz endstream endobj 265 0 obj << /Length 1621 /Filter /FlateDecode >> stream xڕXr6}Wh$M"fj'I튵K%y%HfM : ;'QH^&;ԙcM$M>N?Bϳ9Ls܁vZ-  {EPfˉ9bb!9 ,zXNEGU\xAF˥R[ņX=~t%d:2%K}R]R+2Úriv/l{GoP1<%{wĂ>!C ߄x >aU&D %sWj@!NxeT˶ F>z5jR1DQa>#4}ޡ~ps(b2?۷0X-s!Hͦ1 2{{.sF#D c([y M4E\ U ;xOm>*a@4]۳>NI"B{wq*9vqwaЉO TS:]&,`۸8+aBb_%>˿x+佟 UTJEʆ~Uk9-S_)n!j݃*mVevP<҇shhN`I 2pGF:Qyq$]ãMZC)LhNU~k 9 u}pD?<߂blyEu'#\&LYR@N~IW[% /)ɴNyxS<츳N0=j'x'5':tG*u,mjuknqٸrPUf=yŞbIR}u%P# Kp|f%kN?(ַa}{|:CE35~祬Cf.snw()Ab:\)?jK:7J;7fI5DbW| o05 endstream endobj 270 0 obj << /Length 1112 /Filter /FlateDecode >> stream xڽVn6}Wyk.E[ v M6FQ YfldHr2%ۧXC<tFBJ3TI%ND$GQ9$tb +*n:Hst>q}u#Tp\2]~">vzWlmnҼ(g ͇'@0SM"0<_0'um=JSǪ g]0߾&Ko^Sj;Jx W8&pws*RAm] *OUSE.?lDB@jDLum&V-6}ZҦc_u?t@'.JӪ݇ܥc9+\̊↰/^rjy8u/#Z3Á@ z_W0`#CS'#G`$Vff,'W}0 ;>FȡLҺ6e#x#P) CUw@A(( sF>".q-N$L:d'L41L&8@"2u5y8iо:w m endstream endobj 276 0 obj << /Length 1026 /Filter /FlateDecode >> stream xuUێ6}WQ*7݀^mR}X恖iYD9Cq/ 93CQ(ω2͗ )`j:o1t…7,z;l_J*͛e1F,$v}+’QJwߴMXg4.?!*R#IA=91D ;եPίmL?Íx,Y#z5I֌Ӎ9jmHk8`4~[ 좇A'<Ѵ5I0n[To,,rC²V5oو Rf'`b/s.J8|kk}#`t8zj|'JyH/bLGoN-K?Xx9yrBdӬ4Wtc8OcׯI'ۚ&Ej;x;*xvT* 1nE/+&2axJX|wL.ZUQ6NJS&*M;$pQ Sm;N5s4.NZ퉊NN(eYY!}*2^ocɹu,9zT ":!L5ǶA-BP ӬKo_zܺ8mZZJpC" 0C?=H' DŴ&DBH hhm4y%!1mb#u8O%:暑996]BIZ}RB.]QcU*tF]O/7g&Z2I?&0X1,Gi|J0zgez߇ MIxk`0+u%ɱE]> /ExtGState << >>/ColorSpace << /sRGB 283 0 R >>>> /Length 1044 /Filter /FlateDecode >> stream xVoE flk#@d%Uh4j3RUUCelϳY+KzOepԌs&MI~w۷.)%^_Bn^S7+nքQ\V8+ν]'la;?琫)Ρw4%o 3 iZ>O.\2ֈ'5l{a5p2KqQZ/6KЫ$/Sh8iBY^jC8~!03|_zgwJq]xCȷG[iZ?^>8oq_{L<g?0)7q*=pBX)2Uoc8 x96[~B-Ρ4j a~{_W&=o+> 1zA/~A?/\]hA/4ŐA/ u//ݺxŘ? 
12v k_c5g ?:Y:^t``챳;'w/o?Q7xʾW|?:W-xOm5]v?-P?upW,ĽwwgS|v__¾*Af F=L3X1O+)xKC*m)dd >/ Aq`E~)ׄ5 9_exo[ ›pxx|ۇwxJ<K\`W b9?nQJ endstream endobj 285 0 obj << /Alternate /DeviceRGB /N 3 /Length 2596 /Filter /FlateDecode >> stream xwTSϽ7PkhRH H.*1 J"6DTpDQ2(C"QDqpId߼y͛~kg}ֺLX Xňg` lpBF|،l *?Y"1P\8=W%Oɘ4M0J"Y2Vs,[|e92<se'9`2&ctI@o|N6(.sSdl-c(2-yH_/XZ.$&\SM07#1ؙYrfYym";8980m-m(]v^DW~ emi]P`/u}q|^R,g+\Kk)/C_|Rax8t1C^7nfzDp 柇u$/ED˦L L[B@ٹЖX!@~(* {d+} G͋љς}WL$cGD2QZ4 E@@A(q`1D `'u46ptc48.`R0) @Rt CXCP%CBH@Rf[(t CQhz#0 Zl`O828.p|O×X ?:0FBx$ !i@ڐH[EE1PL ⢖V6QP>U(j MFkt,:.FW8c1L&ӎ9ƌaX: rbl1 {{{;}#tp8_\8"Ey.,X%%Gщ1-9ҀKl.oo/O$&'=JvMޞxǥ{=Vs\x ‰N柜>ucKz=s/ol|ϝ?y ^d]ps~:;/;]7|WpQoH!ɻVsnYs}ҽ~4] =>=:`;cܱ'?e~!ańD#G&}'/?^xI֓?+\wx20;5\ӯ_etWf^Qs-mw3+?~O~ endstream endobj 178 0 obj << /Type /ObjStm /N 100 /First 869 /Length 1837 /Filter /FlateDecode >> stream xYYOI~s"%$=x VQWcx<8i+a:nTpL2UL͢a**C7101'S9ѵń_Ǽ(fb`|bɁ ~E3% X_ jAe ц^Khb4h`)3 %i ,k>>̰ 0I3lR200(1ܱИI8f@<3hx΄vhj$$E%fj`LkHޣПl`P5pd֓z8N[ #sId9X*ZYlb5! g`,r.@m8yU8\.KD Fр('v Ok, ` Asl!_G4HM>hQY0$aD#@`t@ O$O@dox$NX{eqoN/G/^īz>-ku)ވB#/IZ p`{JA9`?0>|byI=2&[*Br=uRPhx2.d|z8nn0bZQڶ`^15`/pkj2z{p{L=Ԡ XM+5X9l [N5y FqrQi$=mBDrP bヒ4մ'C*nԽJFQBhFx/ͨedb!&#ʔSbY%Rj瞔ډodwcJ✘$A !TёI&b/e0]c9r$M$d)N47D IV\8Tq~? 
frCJz8vs#Wp:t5cJ9OSv*/ke < J !JYC P3!|Z)zOOeʌzU<+3Gp^'nؓn7CYR Y6+ /Rɍ]9e9pFTV*EݽJbJ:O?xlѻ P{Lk+d_8X4\}/P=Ŝ3RYR_(Ŷ<㠳;68Ĝ8a bfW~SʡD͠QsT(j2˨Srw= y34B vK`W=PC?tOֲr(-Ey=WiT(ä:xb={=/>Nm-V4k9ꪣ^5 u5&n%+T#kqKsyXTśjj&e1?z<ݴ[B=9!1sirk+V 7br?}:`Av2[.PeuSMN+5_,&f^|DQ.JyV_D3+q[z&h(K8G'b1Yߌk1ޢGb3q9)ilD}SϪO,(zNz_}-uHa7̗.!%^C=JrxJ{CiQv endstream endobj 290 0 obj << /Length 751 /Filter /FlateDecode >> stream xUmo0_&vLjC,|*Քen)oKAa۱c;/Cc<bz;4 oK!<ʣL_$Kv (nD:@E?)*-(DIZ]"W9l.Q|8vWAJ"}l9l*bՍPBeeJ~5x~6_+[떫Ċ((zF/ ˸EFe=N;Neg +d1z/f@TߒZdg"om:ffGpiqjEjn_ࢁ< a]L/!rkqŽZk^Z_i\ vDpR^'rfEq'NĊOq ĩ%شeYTk H[)1^k˳GU` 0ǹ5 ҢTgѴԪ(I?U9>4]~ u^;<9cmP/0g8 RhCcBv~Ϣ3FH<b6#*3΄{A{"TNyb ?!eZ4f=Ƿ\w;S0󢹵VE S=O,4~Ng}wm+ғ :q]3]W}j7Y$Bs$lr7R55Fqra笘|5m<8W/ 8e|C~ endstream endobj 287 0 obj << /Type /XObject /Subtype /Form /FormType 1 /PTEX.FileName (/tmp/RtmpPMH7GH/Rbuild3eb483282f4dfd/pwalign/vignettes/PairwiseAlignments-profiling1.pdf) /PTEX.PageNumber 1 /PTEX.InfoDict 292 0 R /BBox [0 0 432 432] /Resources << /ProcSet [ /PDF /Text ] /Font << /F2 293 0 R/F3 294 0 R>> /ExtGState << >>/ColorSpace << /sRGB 295 0 R >>>> /Length 608 /Filter /FlateDecode >> stream xUMo0 W'ѧ/f'o+2iEW˛ͺ> stream xwTSϽ7PkhRH H.*1 J"6DTpDQ2(C"QDqpId߼y͛~kg}ֺLX Xňg` lpBF|،l *?Y"1P\8=W%Oɘ4M0J"Y2Vs,[|e92<se'9`2&ctI@o|N6(.sSdl-c(2-yH_/XZ.$&\SM07#1ؙYrfYym";8980m-m(]v^DW~ emi]P`/u}q|^R,g+\Kk)/C_|Rax8t1C^7nfzDp 柇u$/ED˦L L[B@ٹЖX!@~(* {d+} G͋љς}WL$cGD2QZ4 E@@A(q`1D `'u46ptc48.`R0) @Rt CXCP%CBH@Rf[(t CQhz#0 Zl`O828.p|O×X ?:0FBx$ !i@ڐH[EE1PL ⢖V6QP>U(j MFkt,:.FW8c1L&ӎ9ƌaX: rbl1 {{{;}#tp8_\8"Ey.,X%%Gщ1-9ҀKl.oo/O$&'=JvMޞxǥ{=Vs\x ‰N柜>ucKz=s/ol|ϝ?y ^d]ps~:;/;]7|WpQoH!ɻVsnYs}ҽ~4] =>=:`;cܱ'?e~!ańD#G&}'/?^xI֓?+\wx20;5\ӯ_etWf^Qs-mw3+?~O~ endstream endobj 301 0 obj << /Length 1854 /Filter /FlateDecode >> stream xXmoF"'΋^Z5q{.6E.(&8ѝ,IPFmkĢ83|HCmͥ%^,ߞE#L%,ߜ]b)y4W-|Fv(Z~w-WEPNja4骳nkea7\J)8gꉫ2~b/~jajGk-?~}cqrKN~(r*KuԽ*?/x0wDЩibWGGl)'9s Dbk-ۥ}6z_0ھ8#6'+-Ncu'8QN7NSqɅ%,ҿ ȄSXt h:wĽj;mS}i(5ۋߐ]= -3 ESuY}{:g6t(<Ԃ y>{솑6z#Vף30i __nVUŁxv֞xrRl bʁ+?p -:y<$#CȈ;2zR"q 
L~OJ~ujCM8o-=7O/lƒyL #GrߑBdH:$òP86Z!9A{>K*@a;xz\koxp0&b ?WIL4mgp[n )pBi|d "J 懮Wv)&Õ#갻+ ӂȹZˇ;yHyR&c t|02e`T{˞t4!\8 xXTqO*DcѩbZo9UimꝕABCZ:>FcnB] E`A:S8Ù4[7F|bw($}4Nn0lEM F}+WQD׵8Cw0,( `Z: ˩Zx*C?& ܇싾^8]g,q4S c9ΕChtѮF4_U@V ?! !j NM "o-}>փmW=لܝ7C218riw;bSߋnr=D$7>q " aG  \u 8,ڃ5 e$b%~HFu/ꑚw gSn>UڃUW>Ś3]\A}yZ,qQfCy; &{[ګ՝>-jڹ ]lǭڸJ"TxI1#Ҳ1'z{Z:247$<h=^C5鋽 j8|iyyJ3xIq0_ᐝK#c> stream xڽVn6+bHJcH-ㅬP YtE9]{)~Ȕ)=qxFW7%( i=c`%L0=Lb Ca0 &_z^PF=BP˜6z( (h"/`NC; 4aņ/||d=5gL||-VQ_+Ak^W> yqkLK7aX&P+HwOU#e#'s|:.R0z'@ ,x$Aa`=2ܟi=^DY֓T1-N3Af{BI0C#rU[;|.B (w:P.oAt?o=W?xLm~ \JoWxnKo;+7xpp i`q[Lr@%/JmPo-+1t+x%^tUp&Νϫi|N~_+^fs1]Jn\| \-I^! |xƀD~7PE卩U&n4 LQ8̲E_qSPT9vcԱJP("aD୬oxԜ}DPHjLoMHRBvFp kN3g( B@Or^(A/y w|V.Uv`G{ 泤\|ߏlj0+4- vܴ*/W ]onf1}>oQJL:ƠP?rY׎8F>6LQ8 -zlo獭b4ek|U/y]XF*[Qw럪eM#XOίMoVRO@i]|aMNj;qAWTH`+;3KGf_;+cjzs>{9$ͱxiŅYto`phr.HDx4Aql> stream xڭWnF}W~"sW^6E@ `E䁦W[dI*.)ZE왙3_.\$$ yXQD0$Jby8Ek"^ZLy{3Vh/-ueךlkmD+5"'봾|tUigP)Ywhڲ[mƄI>h >+).`ae~ryΈ G- t{pR>7im4ob@UQwkkqHX` ʈ+b"J,XPoP`۵i]Ŭ$<-7iQTfHz]EbxpN]W,v HIUkrsV>$Q,D3 ~̿5rb;*853Un.۠0n f { ^S~D졳ڻB1,&l8pXőفq6¶knZ'$c !p8.1y\QSm,$6 Ah4}JOY8O;BIwu;L!Bb7)KX(HޟlzRb^k@.Ve>n;>˻}hxk)筴2Oǁy_9e; $=^"Nu)o'\529ק dM(ADgfOql%J\&l endstream endobj 322 0 obj << /Length 1195 /Filter /FlateDecode >> stream xڽWKo8W9؈)Q[))==(2mkH$7v~a.CP'糓Wh,fh% a<Y@f7σWbc;},0)#ا4݊vLS(qgiJbcc`BfAͣV}SOܲ 0D.ȄR.z`8J7sxj0Nh8;6E!|nE!`cҬ 1hd,L-g#GN8]Lp}ҬJaCډ"K$D4.Ix"<(Šʁ $\7^-V^j۵;0Uz) \A;?3rmCeME F[x&^} U#U&"9zdjZlja* P*gQBY,tv endstream endobj 332 0 obj << /Length 1464 /Filter /FlateDecode >> stream xڭXo6޿HQj==mDl3[r%Ic$QrQgGYSlf9$X,t~,ۢ%e]e?A`v_4)vjH)ꠦ>EH)8z*v8V6ḗj%}' *'Ѓ%^w܆ 5 L' ' 4Tͥn3<1xwWypf .x.SNS)_aI@1%fН9ws4R娱j*EȦqt噲'As2gK:W7 )ѻ'&B 3?SۢlW?g]>رq@z endstream endobj 337 0 obj << /Length 1030 /Filter /FlateDecode >> stream xXo6~_!(`_aС aE} L,\~xBZr&EAy WIhAj33N̿f.b3S&RύQ0Sp3jw2K^,W\;u}Y> stream xڽX[o6~ϯ\tX%{lpŐCaQ(6m %EP"e9nmơs?qvw$ŎA#'`Gh<B#<^~LЉL0vbW,ӛ$\\r=)5P߮@!Aď'%0|&;A ?*7oLm-4ˊ'n]R/$4_ ^W[--Ht\+Prau?*n-2\3 VB 9Aڹ8~;KDZ51֮o$v( xHum0qBXe:- ! 
"CQ!1"uCE`,S?L-zJ;|*)kZS\R VYR5*N@XF B\GX*!çtUo.,DtcM 4>F\P'D(C!(#:v"@i1 m.<F7l^|d]ԡa'qɧs:ҙ&ZxƓEdWJ{>~vJ_ {/81dn~^.>{E.)4;7oR]-xr[0K䲂I ݲ }+B+WW [8y\_g&x";Od2OWc1hIW*zBbZT0F G1}&|:#Kh>_֩ϤBdeܼpkM+p+Ko|#8)QN'4o}Dú`#~ӧ]-$ПzΕ}B蹤g^6u)DjE`)` &f{$̻guZ[Dws&)q;mR1{bu,V*YrlPU9^,^2$:3rV[-)fj4?#z E)$؟OFs%? oGi,*ű3?Y6sX.^H%K2{C1}Hp%1AƜLmzyx1 %jvu[!o< 8o+xeuzǫ\Zf>zΛծ/0,%AXNv~`o$]Hr/Of8ul Pr23I'_FZf0]ӌO4/vܝn7i29>u,qtyƤCC;򏆳֝ endstream endobj 346 0 obj << /Length 1110 /Filter /FlateDecode >> stream xWKo6W9,lREJh dPqiMʒVM| =m4HۋGgf8>/!;qk b cd웹;+Y> ɒ%4XhXh|I 6ŠQ$XiY1nOAmveUŊR bkķA  F1F#//c#V=4;ʧ@r^Uڨ]z7j֯:,y/wTM,HjWp"i}>}t)^N|-WA1ΎyŽ,$8+Vwe:eVU 0Y!$4z7?ӭ 9/21I3n&4v<ϙr;䛣-7Y*RӴ {=djG׉ xbjƶgN6ECYBS9 @* ?27 \~j2"Kx? Oܫ> stream xXY6~ϯ0d#1(z 6$ۢEj#w(e+ (917ϔ94׳3눡z%Ʊzvhd\S:oUjyy }MUO:x-#NΏ`n:+Eʊ`4_?Dfs՝/fE7qfێ,m32C `39<"@Ąď,yC7l uԬ>6z\pEެڃZb8!/DP֠wQ_u%G8}">2P 6Iv:5%Q([qƎJM֮kR A,2Kw(]ڡ4/ 9~]Tnc wem^۳m}]ి~w'::7 D2zUiEJ1HV?+w̕9ve )pt [+b[N3.e@AC$X 0LQۭSpwZٱ.tCY>i̻u3MrJS+?a_m9QEwd7:+7 U[Z>R~Ve{qRv<AM; ڷMq0 mGb4m %6*@t1Da. l!GU7~ĠQbb ίߊO4,i;80G#J4A-02+5$/[y>"5>H0I. cWt;D-C"!4={ ,YleMuuȭGl/XcRoڬn!_<)}a`\P$@BǮ'qsw}M>tD caR%{NP{Q ؄5F%O.H:["vPm:4>£ tItBp;C r3=~4*\ٺl)2K'uy /\܈curބv$DBy{J H{_|Rꞛi 7y~ON"+ƈbf9$+'+ u:<ԝ.ʘd`TRl^'\Yh{"Sv)%"qawe%t zS#Y2~g'#_j~Ot1[k-L endstream endobj 358 0 obj << /Length 1257 /Filter /FlateDecode >> stream xڽVKo6W>hGnζEiGD%No/,+=93gly&1LI (dz_|BYs#T?|>7eҭ| XN LY̦>wYUl:)VتL7Y6mq+]5q؝]Tv $ 󋛅=qe(!VB m+& jݚ%E8 3(2b$M^#pƀ>P xVUSӐ{cZҠ-7UݐIVVՌbѝS tr"\+sV AJ&ZU5b$錅$=R+$s$>]O& ,֠Om|&bXp!CF? <7J[Q6USCl/VYx.|R\C fޯRb0CeahiY A8A, W֐FӼZ_$Wou t,(=;CSpZvQ\o7Ex?u\YRO VB< s|bTT >_5J jF=؏]bՐ1BPO47MR#WʲyT檶{=o@Sm-Z^U D2kS!N%r~ǝ5w0|\˜8=$2pXކ!M,D!f`M}[C#{}Z=IK6IPKd?ຽ6P7|nOm{N쵴^z-뵼^Rk #lb0*FQ^ޠ/ I6m-!\jftr:Ϭk?sj {Xn iqʺU&sAS*:>Пcb8ێbd6|yXl[? 
+bH2)g/YF3c }f>K*'i6 dz [Y-ln9I6sLKl#GVNrxM{1D&_2L=p35{[ z}E{S|`]}Uu2&a> ~K&ϥ?r񞐮&(3QTRҋ&b׬3N !ac̛̽WmjS8C+ (R‡8:0]:Jw3ua>ڡ,gp6GAŢ5B6o侦mEwx& !\wWF}4v639t>cՠ'/$8lf_@dȕ2R~:Y I:j m4UXF4nu@Zb_+^aX8JDL\^FݢӢxz+ijm-D9=bT!"{"QoZ[HgrO=;“Vwrsg~BV%7=W7m*[M*dy[A/%SijiqQe%DrA*]R׌QPlbN)ÛHvtYtC]mm͔#A_GE\x;8 ˛g}~@kegHJUH*MVE×k\= @?Z+` jFuT?W ]'R%ΘY[~ɰ4ݻU3ht >U$ήD_<V|uh-۸ugk\>\x]voM1KaJ6/&SLyѓl9U: ;Y67:&r<m%j%Ƨg̖ tRa7]:6|d>=v"7~(r'IAxNڕZe|V ibEL=D1lsPhA:GMOpoN- K![ :6nL{ҎtECQZh3xnҗ]PLgkٕBd #]"\Lƽ[C۹ T{0)qᶌVv ;`{J ճ#{ZzB_NE=j}b$%9Cf^GGb_AtgDF@LRa  t'(e n!"{^ccX؛qNl▧ΥBݡˤb֟J.z:(%|@)}es;Tj׎T|uak&9(^aa>whx2͐qR"&%0pXA™njok% {iS +;O[[zz`\Wd)n) |sv@܀Gʃ,?]~agh1N֯pkAg``?f^g"C@; Im3ήJZvyl+nڸO_I?H!qŢyE%#]5G%&" T )=mTdr^7#rO+u%g@uM٠ 1i:BֈV l*H~7Uc.tp`|x5/ UVIZ]ʡ? =Ǯ'n;~D#nA-yN/Qh3&bԒsybknͪ f|IZDJOu4z7cP S9/m:2Db4 sسCKV>"| OoP|TFp`s~x8c^۔3>SU=-IH6uH8ۮ~6%+KǠm@N0rc;lQpk+ k%k# mI$[.,Eq_h}edKy!V|6U;8[k)Zg񴢀Zh|uňVVcOpȖMBmWҌvPǐRQ*,cx08Zs3F$JIAm]7wxs>,E* Lˆǀn|REns n^_858,2>GO3c[Im{w_a1Qͩn{2/pLϧEӅ@ơ'gYj_N4\0̉E.CkW~Æ0o*@1`ˀ3~0,g(]otU"sok-DN*4!!j/"9D7t7`&/Hc8vHmJ Õb=ezV9Ё{!o#ؘ|L=/ߙvξԓ6()q {1~z@x`4-ت{-oAvH.ȴsNգ%xcϗ[b %oiz %j:=*g1+*׾熫*Wb& Hv" tF(%)/w7Ev| da+ q۵ȿeD ~JV̓}K=u@HZDNVگ? 
݈Ң_<2J@<+$ r*.A(&`X9SJ>,(0lJ0z75amٲx#L4#?Դ=A^҆ןw#Ct:lqgon(b[2)Rd,mck걛ZTr9U36iA4feGνYoD5.>\#6n0S˗5Z6A#6؈U0C} Y6V/I/PjWeteTn_QH"\DaY CVі=ҕԫR9cMI(uNLAlժaȍ-1zF < uVX EY4RS; kRLǺ8'{8y[K; a2(dFE-x>ؔ~U2#+LjY)>}J=VCJvbgi K6:Hq!F:[nlrWG;ns{r><VbVSffS}[ R43.'wq)vNR .[E _^PP :=s Kqۺܔ;vHM?MI'~ȹU%"ԫD Z]9We?} 4py;M0-8 R!uc[@|"`dl>+M$8г= '60Rb]pCY)T]҂}+̒?-M͍_"8w)=ǔ%6۝އNazSp@1*jjHnVLJaSqt`]8Hryb띔(oIAu5@ÊPpM5XQtZ`J =[*sW@C|K(׃0)(>+)pDs0ԗ8 =SmmI ]o~ZyolEpeZR7Wێ[~06=a ŗ׬\%iGyR' TMV^zuE:wQ,X$;3j dꃑߜ)rb4~|Ndބf֬,Mj<F uwlػ)'bW ?}{W XeԀ&qS]Kzy L2tdFM^}%rqiW0^LE;j[؊ZJNKS8AΐQi{Ҫ&f mM%)ˤS7c9hl"Q=d5x(kL7ˤ'M<2'= ylE:_JMFgIͯ8\?{`3\&ִqr9V<3jߠv-(pfCTO J7G3EhK4Cgc#J8䰅#Jtg Q@kJt򽦿V{p wQE(a}T,LȗMD\q\6u>&]^RMc!%(Nj ղk#Is,' =ӫ 8z;EJ}~G igL@K=4(7u\ztPۼ I 3GXޡ&B@e6-jV!uEX-3nZa[Rsr?pLD4`ܗnZAF,hW<ɪ^ xX:V.8`@Q͙s!3Y2 G-f?~;:gt#B$đD<6wț[=װ2rJ2FBJ"phtZR)^BZa4J2NV4ܩ~vpJ,ܔ(b/eb0UC$5UMV%0~A6|WѸ涩|LuvJ? ia[^V)<[lyHDmᘶ8_FRBH bBDM@&=dc 㓜2C\Ug=5P#~qɈ8f Rl9E%=rިp0'GqZYL m͞I޽}X-e2y-k - fJBID$63N3֋j>4<|QPN!Qpd<N|{6cP?ܥZߤ҉8V􈈱y)+:z9=Kl~gwa[=dwU5L )zvgaz l0ߍSboFWT.̓?[|f܁ܬ$摝ѝ&kal>a(yI8@:.{"On$*T֠ f'ysMu"!*Q 8[QagkJ 1'\%I.’%(-x<[J&T,Ih2znrpGF #~CQ?z\M9#Y-g[(Zׅ:@oYFÊTͭAiSы`>>1xêZ: Lkz,DfƹΫÁ9Mz"t®4q.Ēwq7&~Ӛ1alBΞlC!ƿNE\tUZ7 Fnb92A~ɟi{Qt\|b0 /0fq"uG_h@#|J_/q}-{vqRmAL%v [o@ =1[lrD{E1*i꛷ɢ4 I |ԫk3OćI0RWG(}Wv_hyv>RE原 PX&CoO˽wᘆXIQÏtRW'$=}Nڞi_\Z{ƘJPht$8zz/- S@ њ*qʕ 5EP7UxcaH'W Zj@=!,AdĖ젱r37'MMj=W˜va3y]ŦoHbu,'~T{ry@9[4[gm2*1znک 5:Le^\kvF4쨬.zqe(Vêd ̵ӫV~iSmTcXɟm%7W6Ⴅ_lNA&1%7;_q`呗YťmF9QSvgt8MMtSkU_|Ѱ](Ւ@WCPRR) Ljlat._dZAu߬\U>[PwoVQnچQ7Edj˂]Jylu ΞaOxNw*z~ENf68?3 L "ߏ"mam>\A `Z:&@i7Shd F=h\ĔH\֙Y 4NQuY>wCnW':G@苿Ш/i.1Nr5VśiR-f =A)rTRR:I)~ s%yi9v?5= cmю{2 [*4dmI%i<-ϧjnÔ*)U,ύ%$"[nZnȬnR’6cU-kyGB f!a6?FZYlKG4 Og'׊R 7/(̐ZcYU0!#ӞuM2vݯWHwVYG: ]u=W.Ľ|էQfk7NߜV.GeƱREȆ0)kGb޷i( Q%nx&KCe#6ṗJ+dd. 
;K endstream endobj 424 0 obj << /Length1 1630 /Length2 19198 /Length3 0 /Length 20030 /Filter /FlateDecode >> stream xڬctem&ۨ8qR1+TTc۩ضm۪ضm_=ۧOu>?랸&yϱ&'VI'hdcvcgșY89(Xp)Mlp@}G3k}G @ hLpa[7{3SG5 J1毧53 h'p4,ay I9q @h ׷(8Xd @j=?9tlf݀@T[g^omfֆNF$WnllmZXSqpt07u "<M`W 1kidcOIu7v8]e9Z_i89Yg{%/_umm-m/ҘoLCǿM̬IkcFNsAT $l-F@c89ǿ!Tw,@ -_9_.}bNrV;w[ Ecoѷ2t?yWk5?I:m_j-4s3s)9-_rk#5/j+M -!*/]ʟAEVCI\`ewl?1`l\tL:foB̬^bϳ+@o݌LZ3:?N6tKߪ]p+6)驎5Xك"Zzm ~WtlsW7Lr}-~K}?fIٝ"CݠhAsǠSzq5/Ψz3SE0bsD97٣-ar]4fJZME#e@ d>MV4,9>w)q}'f*xwE'"mt B/A=FEhi (8>ZQ)e)ڎEɳe:>e(3ÆEOTU*t͠X# UFTE pީ|H5Gv #uax@*Sj3y"]x4̃E lrBrO62b,Zs-ߏN$ n[c1 ښ bP@BϐXlnЌ%G%NeLyY#)xx0o NԐڭ<֏3oo25 PS hW"#QV74o |. :#(NIr`cGiı~2yIH ʨf >,p2Y n吸ߖrAyuӟ\č5_y9W8w F# סiC7 Go?Lpn#B$EvjG `VZ4> % =ilBF:ZTJف^R:Tf3ZZl\|L: ֚H&CDcb1 .EAb*V5=>duׯKκ"Od{ Oi{ >sM W,7kt8)b;?@}}o=c"5$:y/)T%}Ae~Agy+M99!gQ;DJ,qZ!ڼ;*=AD5#p=\֓ ڪpqL`@Iwzv+mq[VWM:SL^Ox_!b2 _Ŀ5P.Ow ewnn[TeayNG+Qg7 @TҸ!w~Q9Y)\!t"7Ҳ2(?Ү#u o!4\s!M+?5y8 fVt/ƍc%}`ГC6#m_ϑ#ru8x:LpC-͝hT})}Ӈ DHW RS WE[yBZeW0`[唯T4f8Z'/~kؤuzRHl}EtkgFm o&.Բ&K|0!M- 0Ɓ'qC6.)foh Xqؾwoކi>+Yѯ1#S }8jm{B+k `In{0kɊ|{F>(;5(FX֌q51BӎU1 >JO?Ӗbt'Լ0UW͗h''v5:2ṹOr{T5bn)E(xwj"w;ȻUcmyyFaDì2׆5h#s~Leh>gFEޖG' D#?.-*^SEjaC>lE*d>"++k^dഉDJTTbK_uDԏEO+,ԟO77]:g{o4+?q4GݰdEVCrruTrj;̑sj?nPo]~C*[殅,H#*ps6f}~Cm~% ; y%l{m?Ƕª#睠Ӿ[dY][sEDH7Vђ;L'뷪W8:) -=(Y֣ҩPzLr{-"_m,e9JBd" ]-\Y= (%]n O6VL>fN[3E:*;siA ]_ ٸ$8YWĦ` P[Ny.0,ރ-9L}vË3@+x ޟa=.nƎ,!]$%U1qorƣRFPr ]jlX$cqXM?~&[l`O)?BDK3d"<qzC;GYd I"wțx|i >Nt>e|I0mnH];K fxv%eVgv%ޅd|@ڿ2!k8ߧ^TNGCeÈo!p|ӈnl!cx*~A;ĉnDn߳̏UDn6dѾ L$1i:y8nk+*~ix ĖO{~)IV = Xn[#`Yv˚j..iCFCqBI]4"㸒dˍ4s%U]ލXpH\bπpo}C[gmeIlh[*[8}Q]>rtqlFsmq>P$m@zO_O5.꾅vٜ&=g.uXkRT&DdОl fpLlF*[ ' 2e>J: hh\hnB>i;H2Gr<(@<>F8D7mG9n[*=TC\Xwk v_$j[fpGgQȍB8Р7xP|Pnst9C MүUx&K=˾x/ucz)칵 >&ɠḦS۽"|nDojB4pqiI ǽ) {%:F0C弨IsR9[C^G\|`b:eq?׿R5LGlRV ) CR?,$㹕&d˨>%M[Q_]]'ѥ=\ILMF_#->LMg%T& v-ݢnRɆ\y1Ϋ|ѲһV}};9/S' ?>@1vA(5b}P-n]WxVh-0iGw&Z#t+Tcg}McvƆBw WB5Ҟ-"Rbf&ӍCr)EreY>'}Wی'HΐaܲHkArXGvm7IU8AIn~Pb6ֈ, L5-f!Imw¨ڵC.9j[I2!vi&9Zy+2KyOڳV¯hޖO nJS|bֈuSz2X}s1UB לkfƫ9qODk 
%I$g5c6lfFww~O){\[fx>LY_zr9A7^+%"Cמ{/Jc)Q=p8\ӂ_WJ#=hYqY ϭl|9/q(2Z+o{&,a l<@FC /L|Wv9kjpi#ߙ9#oC# Bly0%(S:ֽ5zL^Áta G6,ҦѦzr_IAR]h+`MO WFz\45Q[p@>Il /ThSީ<msO{ m#ڟᐝ#{j2>)l #Diov6K9KxnHM䨑Qt5͠Iً唶Kro +qjP#0eNUVħݹA3BA A+zX mk3A~PrFY'%_ә]>1D=6)4Qԣ&'㩏< 2U=u0U`"\KfT^Au(?]aBL<í0_u-HDo^:cEULs`ިZJCwoj>DDvJ]nw<=5_3poͺAGH.̼Mj(`E=hrW 筹&\L^rܛyEW)8C -R.N2ྲglO7+4*Bn%$D*mWD ؏_"6ŀTߺYl VaWA~g,|dGnHwɨLCOΓ01h)|M}?!Au]JFҤԍUjuYrC\j- 6#ʢzO'*sڴ,*9cUTcn |$I: %#ٍyr4;UNPL: F[؍'*?4ߌ&4 51e+Z a*ڵ ORCZ_4f_tf7iH%}uRiH_R5SRͩf\Dj:*7X k_vkc:ys;@8JBT79;AN'C"is[^KM s`j #-S'}lWGmf3d,qqQ >8O;/ִ D>$EAD\OC/˟%̀[3s~)'*O*MQkVr"~> .5ς2T#06 &s5,L['a%u]!2׎ԏ*`fc6øGٹ&\ ޳H黤JCcY. I;rس@LjȀ:$3Nj[)*˱ddTڀ*BK`G0 xAfGYX4xA#ȱ'sDѱuTj[ʹ&W*,Gƛ.D?2]$ ?aA&iЄTmIneڷ ۃfKCbPdz 4J3O%?+sD ]rBXr6` ]yxQ2_^ [B${ጇXo \֑*\PQ0QtdycҬY,sWOB2C1[+9zTVIT^fs$}pd9SCCۯy!>zo+󡪶<]pjoIAP~vERdpSNww z뮄vu}I~2(;Fk~cױv^"$=+=;B*yNBѸyьt,<ڕ5BCcGLU&2v$"[z`NL1txK2зz&|^sד^@ 1 xܷ?6B~TA(bfuQ3µVbг?1˚•7 Mך !cۄx$-4a^zU¬ A\Ѐt:w[yYvXzra%[3H/m%0Ue~,`ME o bȺSIEH'3xe;l\Dag9вQlId<;hsΫlh~&P$M: zދPVG2ȯ}uT@i*.$C!H l=>oĸw-&O =IYUuGh0">X?  HF*Z7+ne8P+jI2'Pam4zr 0F/~Aa0?6'%57}Y\-GL(rּf/ܨ_#EL =Y -K=Ūif.~hA}MH£bAXCXqaz`bҿ+mpϓ/H^Uy͑hM9pONCWn7o_fsM(2{ƗB}^fpzv+66c3諩~M ^1wVŗOKJ-5y`=5.Z@ǚ$B럊#Cx-6`gH61Tzxa+>tmaIe5*A/ 'ԼS]tGB*,')K~g婜%`ę#̀m }>9BK;YL:GOFAB!N®xpLD7rZcmMof B~G>EMR;bG_7uk8.*0Q x@2Q#өz&Zҿ)Ug,65T(5ZtbzҴrW1P];[e{\w1 ٟ r ɱ lssXH̥{EÊfWg\=tHj]))qIw_?xX4 F}"cS2h?Bg!j-h \hEnV?5"c;\/S߶ӱh`Z!)<2,j' Gǵ!~[/ܜ19 յF&NX7nˬ|▽nSwzapq3x!䎧Qc ̼&JLY3Rln"M6iϴC~v ~[tP5jՒˣ&E3Kc 1$gJ<)X0HvCdj+֓|j^ǢY=+8C ?  FqZ&"V} ڽSxǟ^Լ#M ZXSdLbmmeA-3&!^̘{K f^pA2H'u5%9 MT;dSЪ/ݙ =iަ"=dM޺e` &QV.tO+95XR_t yòb\$!vOe(,h8]\¹|ȫCKX~ \NzKyW,whlCA$Y=v*uWw@Ɉ5 HqKm[ssCZ'^%J\zнi;*H6+yZ90p|~2@#VC,YG8b T TIсV?*@iVȃ大UٴXc=>@9}Cݜ\Q& ]+]hnfy_ 0:w)TPFWܝGdWWeJ"@ROj X&T1a4 sp1tx_^kmHPF==GF64L7AUGL\zljqo?άm$Us 87Uz¥֡8qzi);[mzT|[>> Qǡaװ,O=.' 
̛(rQJWIBa5 }S5!.[bD5 6q:kAmc7uRt7(㕕:RTmu@bE>Н/Y=9ܧu41w=>iΤ@;CNt7j[P3&Rc~Yn{XWqen@I⭇3v|%"#Kmm?m.qCYzoE%z]湅yNj\f-+Z1OLD*veBtc#Tepߪ<~6UUh($+f4#7P1pjbut 4h*_8ȹ \Spz?ǵVփ^'FwyB^ZA33 lm$?9!7<@;ݎ/􁁁*ڜ '.b~S]b$#␻"*6 ņ_JtW"sZ)ܯ9žQYxfTtO|oUyA3ٸAby_Znr& tZ jbY_q*l35X0&pliε;ΔOU0Bb) )C0YWr>=̲Z)&D60nEPY)i[]$ݪi*)g},C6;6JXj(-YW+Hҷqym؃ng@F˨C~J g>MN?ݪ%c oʘYd _-"rOި*r艹oDums͏Z[4x,0qFN) uEě,gg4M+|tf]w7AdeT>R EjC3C.E}m|w] ėW̓[Hm"hS DP$w\j+Y'ypgͦw:N$>nkE۞<% gO~f\ؽѷ>,WaVP!8c zpv:US3UXq[(>ӑ o3lċZ-Tk8B7%a-=v/Qp.Zm D*K1WM̵"l6bGPa#s\`I wxK:Nl;{.d!H/ޟ.e Hم'QAձr " Nj*0H:~D٥ۚQBՖ@ 5ؼ]$më1$d(4Kw: k_j[32(C fvcZ%m+=?U)- O|d!#*{,趬?zl6A+XIJdY4]Bf+#]# f;I,㣣~{d㻫>j"SP "@29ʸt 9ZcMt>c|0rRkZn͌IzTwJmxEN& ~D7U~r8pO6F 2Ѻv=o}D}z8~Uf>E~q3roEvr,k5~'Q܄0[ڹN]TBSVPO@3,`zt4{lV,eQ*8HEV>E$ A^dAư1j׭X_жDwTW7<5+ʣ3W2 Zp8mieI3녮Oy-sŌ2{ggF*ܗ==.D3@ьZe1܀ Q%n] 9+wĕyޑh6>g#׻3#l E*0SilJK>_Fކ_3B$&҇@zKP^ľ7GHýXr"YqE$$%m!4]r[&(V$%iS& C,jLKYX5-Y@̒ YE=t\H6(5ha$$-mt=݌mܨvAKPrAӪ0H5'!w|shM}&I[Fڲ9N.VfT&ymO(_|hLq^4.}uCkq!Ik}%իM(\n6mP{``dJѤ[>nΥdӝcC(F@^h`ꭒT+SA6qZ,-Z "УK4Io(Q))?`>sFӬG?!lcZ7pP'Iڷ=_`]"{;}|s> sHÏJ W{^0ZO6|:7m]5rYm^?4PX RΨb8 nD3_ȓ>h_i?ӊ!tGF\}̫ӻMK{쳶]#i@CP^xQkR=^ Q":?W&ιˎƋԓ[_UH$}}# `V@=1B NzD{)KޞukXjŒ'k{O۫k11ED/0 qGpav\HyGeQɒ;#˧,%j pg< z?(s 3Np;پ%,Aqc{GjGz. /%%U!ύ\w<4*golv̢hGǚxKL^!)m8iUrRޔy5:= &a2rf0MpA(QM Q LuV._-1B@<R yrh'ЊHo{JgHXs)VSfq> |6J9IJ='3}eU83(oyc^ei+t:9_n}J.OF,8oU8EX9k 9nAkhUr=9da!$8Wr0Ih<2;8pw:22O+fdHlo1U.JJz[yQNz~WNܚ?O>.ju<k N] d_G۫k%3)|"0Go!I³'}cog~[Gv6j4߮$PmVUl\Ӂ!ލ-KbB-62RQm6-4o0CGh(03Xíystբ+} aѕ0 5Y@Xŗ³|F-DYlZaH2NByo1lc9UB͹9"pկSBRw;C`S 炱Nnh9_{vO3'1; ln.o$MxJbԩLчJ@Ṕu׾#{f޳ w`vJ[z@0,ӆRm' JΈQl9acC&̑xRW` 11@iSayJofXk{?MJ{]4́u-/!tloҋ84[AVKPuoíپ| %7Ls[ƺl[¦i]aBf/:9h=*wx5E̢3Aי`%3ߐDԿW=VoCNw5сP\^`dc)q' JEtK\!JbfFZ`/7~N @ba=}/aLjb%TT)"#8RAz릭Elnpފ<ԳU Wjs@̜F9 sqף9E!Z_=meǽpаf^RBl;G2FNU@z}C*q> dgR횚?7hDS+jgp$|ᢑ-y{L7/LqƺYu\3GHpW_p %Dpx(U$ 2y/lpbd LrSb9u%.s>jue/2U2 `9xd "UvFT_6A1rDJ8!ۨqccYE }uJ빒)`;`GXu$. 
]nڻM ;8=_unl璷x8T <=yq39jmJ1ng_z,xui*>=TNOM^ľΝ$N5"Y{W2]e# SG )4f?GӨiR{.l;c%6%Avmׄ4`8yVihOuT?M~b0*Nں)Ϟk`"α} e6j KTYϔIC+ nTlEm1vtaCEh'ZOѷ&-7mhzǩ)13Wb~^X1hxa p67C_rǣ-xL%3vGpJWm:}SO( x?:]g[CdKʹE$BSo-l,!Yg73 R ֣rQM]!"SQ D59ޡWC>)T|c}#Oy^=X Noq ׎)jN~2ƾQ~7:((3D#Cqp/V[VfN#.;!27AaŮ>Ԭu5 Cѝ֬6&L Qʥ(6HqtWjwLvӀ>m? 7_r8-"0d uqJu {jŞt(#;&KQcKod*$V}ziނ|o;>3I04$ItU3(vb* FR?xr˩к/m*EKȶOj~k.T@tk#C\;\2"C0Cfkj3r vńR;/r2p;(TCBs_qtc ]Ⱥ}#=Ruْ: q9q*4P/#%!8j+1^z|U}v3#$M ^R[l^91}fz& OfR _O37Ժe.h#VV^CSfh(B=O4L,ӻÓԚS`koH<4Jz' =~ #`89n %)ז|GP'3A0Nx0莸zy(]1 !`.nIxT`VG##O!W+MI dZ5.h [-3v3o)IH.5;D1O{9plElGcrI<bv pSYs&fX#&Τ*TifTAC^!_'f61h9 7f1 endstream endobj 426 0 obj << /Length1 1647 /Length2 12686 /Length3 0 /Length 13538 /Filter /FlateDecode >> stream xڭxeXݒ-wwwww-@p!kp'Kpw.w̙;Ώݵvګv4%:P7swUsWrU`TZ˺>0NJJqA Z$66+///%@deTӦg-3" +ǃA?vTn@%WVѕUH+i@"T@9H ttc0wtU++4}́NA '=rX:}#`nnWvKǿrrqa}8QU$_]A0c_%}|n WXf@# 4]AVʀ2u~|pu:_7ur]hgɄ#~utnO3IZ8:y,Jn!43D_Eyw%N-ngdj3Ac5__ ;wkp;V 1r2q ry-T@nKSۮ`t9?D|?XX Ӱ:? ſ0hjK7*]G$mEG\E%&ed0q|\Ə7֊n. />  + F>Rw3uh4|(4/4GX^p4ILwf u*i(v L0y ej{k?vzݕƶJ䣭Qq3 ю=S؀bSU3*~!hcw; "¤sB0OՎ^[p*dhpg>+<%)N@1iM;'/F;g4G:G /b ֍,>7pЗ nRqaW(,r:A K)H~R4'^Wڦ:RiV2Uc~Zt̛%i(rN8|?y(S81ǎfBtDvw"Vn,t\R\asRo-0ى1o/[ 7t|:_@|,7ThYbsv\Eu<< WvF<] XYEN08rB :TxHUVD9Ypa@lf\, ]qǯ B|]fȲJBW0!K#RZ;zeSq3?a pEπ7JuPGiDOITw8-2Iy_{+z"CѼ+OO7/DVxgMLQ*. 
D^!RW5֊wy4a+R$t11.AF}b&Jӻ}OFwEc_ѣoG"y0-rgՁlȚOxq Wazh7C^= VB"|P*iG|]f=u^]knRdV:tgQ tYh9Ugܯx*5[} ZMxtFMv\mNȃt0⍨r|~^π^UqpmI^ɏ爬Q''5PXzx_vau\{&;\U^'*)(7Cq\},򋅅}8IȵuFt2HuwCևN91Rs) (Ӎv_FЭLeX\ļq,շ'Mtg,x7{"}B/9BIwLv+>Ե6:k/i,$fuSӣ7I=;I3x#/?=_ZcmP=#R2T[9{W(7t^dL1*2T?~,R-dV3 WluO[S;A<1ӥ($c5=S9Vk*K:t~+c1r%z>iݗFVj6dW&˂&Vp*ciEG(' g@XIh/9m!W`FSN$ZHV: qhZ<%Ъߣ'b.[FSӓ_|QG&9Џ|BM1V!MH v.QA'n4P%FƾD]sS>o S[W>ws@N<6:ðJ J7~Rf}ã'tF: 9Գn"iN/wiP7+BmdvM_<5lj~ɣ 3vXRØ}i YW'H{O~RU9O$;&i%<wo!DΟ_^:͗cDcǐKmKʀ kr0| >~uk'a8=tҼEcYa,lчwN$ `)!;C?1xu؝FQC xx(70taӻ/OvYD&x}6EWR +pNZ"F [U][`wY`O:⦘!%C8tag ,ϘV*> d[0r>hpZISqwRUAMST?!<<}H ;+^y"//c"9ھ"%ˬ_[;WȽ]+a ՀAy7Ir:Ջ=m `{ʙL.qъE[xOwK4PCA[>ND y!R5e&3[j>ŽOSK{q!%X/@tI&hQsHi*/$ۊuV;y\" wgP xhٺ_\yKLRWy# q @ȡ;"%<Ė9IѬ"3N[ Wy,z&؇Lbc}@pwU(lzpA6mU&M1D4Ҹąڣ/6Mչt[0 }"r6|) wS ۺ34)1kk[ ~QP a7IܯYO`eJ4RRo[z,O6P;\m6GI!eғ rkEeE5FpsLɞ*< &>$/vԚ~dZC!9Wf+H9[#M(K;JCDJޘ#{Ɠ.c`ܿ/CtR 9a2ARn/!?uõUF0u1_/N]RYWB7| ns>BHn%[l1G?^|V9~LY+KnAGj!͡U'ف[k˰0.\[#DE:UDݝ;z=c2%tMC%R  h e`<O3;8+nCbq7hl3a9 )jW_BX,<,J!fef1 *^"j Z2N, bdG'Lk/;xrQ\'իg13"{n2ISGsiל4ȶu_NmA:s f=Ժu]f}{2#jO'eYVgUvla`*w Iɟ9,?Y{JT6"{$n{מ¸WUlМVvMFi=|ϑ{Fp*z[]6{1 HL:@іά QgґvHrS7T RloϨCDk҅A2p?}h3}_WٱdLlTwoNzaڂ_%?YHbbAuDQhA^hobP[_D텰 |9Ҁ0ec:s+6쥥5;i V =ЅXWMNWE|\XX^5tIÀ$hkgr}Yz%%LXK9 = 4eYg!pɜTY6?3o+>rX@9tmj~T..4UΣĺ${f?kEu`<πy4t>-G{/,-xަ_"6lER5l30t sK!\Ԥ iiY.o\  k%ewd2ayXֹõЅ\q 9 ΪxҒ4zle"uGKNX VVo' zHNtA0m6VH[S$h|EP8ǜrbQ fie+}|K?Q8+;kPhޠf7}q7f eͻc1׽,T(!u@Gʊ.H\vȈ4ۚdҔpx&¤x$ "[eja{l59omI0`=t(2]14:bn1I!NBn z oP#B'QlMLly5=$3h+D'awH7Ekp7:]n<ɿ)S{hSvt :?_$^47?7[R#$h!JD5 @i0%0XgHyGvW~=}oc@=RBUU.H)Lnວ*"4#*q =8Roז1!2>d(wgj]q]"X++$?Ql|voyԢy/ 4^!gvJ- p11X]_%\\1U [ybZ3h vBQ%>?\VEAU%gr, e*[R(@X9^"iY8H% 02#Iq,=..`IV(g_SG_Gߦ5O&{Zh΢@igl⩧+.Fx3bUƻ$DHH&Gէ~#BXA?md lL 5 Ԉ'37vܓ>0!)ѻZp96o0 1*b+dkq|4.`*?a"d]DTvMʆ>Y#Iǘt4,N´u;(ma^ΗYl WXBq7"T!9(/\=sْR絾s4֯_enGKj @*m(öNAl\.'m|Ce@/]ӽl?l d'49vw*b:e!1O\ls \kN}&v{α ey{'C ^[Фo kBg8tfn:z o s'Iu͹^ Dl\ bx7s}ߋ) z|sX۹$J1/@H5퇣nAյx\TQ޸g´roQj) ł;qql6Bɘ)/3&Ə[BU63^(vw/R[Zs2H aI8h3)4*CZ+<8'drrsN3}vxWzf]!:Bk]eae= 
P7&G;b#@$.ó>jnCC8ۊZu>K2ipu]'2v;KK81)C]M,`bִ ~p8Tl3D5wG莏qi@8'eۆCvȼYaH=ԥHMRa)O ٮH:}'ܨkf.h'R@QG6/M6cг筃ޥ p.QI{Q6PI멆ii4A˥v9P)1:/QA3GYYOՐ3sgPeKw&;)|8?r$gϡ#E{waC2;I|kKz??$3f΅Z"H1)clkSZ&`EzEfmV dH#PyyU+Y{݂Tf1~5C<,?&r7nJN&ZdقQ2E6/,X5]T =7[{bB^}}o^f Lho>9kN>HMi| /JV;T%LRcv SάL>-C@i@]F3f} zhQ *M4YK~A 6Y ^ igRhd5ɶJ -D/tҠ , jU Dip4VOП4l殅n۝C?@}1͍M/W/kWZk!ke\Vtϥڷdᠠ`',s~͎SRܒeZ\%BK BM՜dZX5.Ua[DF췟ӪE?ڹ_: .4~}}G~ .Y- ma: m-hq^MPW^te]"r@R)kveԓyMO'ylH^`XmD'@ޛ{?RhXw`qIhGV(N܁|5yv긃و4L(CQ+l0(Z5~Dy708ftWGwIKVnY2f}y5Lvf\3Y{|wUR}YKz t1sLlž 5 A.X T0q+QOVrǸé>mg3[ B1yK:zS׮P= >ی_6KЬU~)_ഭKj^c^3cqsc4!K26'/g59紫tF.+Xv6y|fF r-nm1ˏڳ}u "}aG|3h,| 8Ɣ-X⮙?oN*[v,_b.1 +Z4ۺ^gD! Y搵1]*]/+Gu"kߺ3p#i$`$'؁JZ \Fuv!ަ/v T'3PM``2 ۻ Xs#c3b`i_Y_d%)E{rPS@2rlh".ǘ zZ[D92 gУ-H*Hj^սt ՏŻduO.5,7HׇvWzB#ލu`1_ElcAQUrí h\Nüw+ӿam"bqCP;Z֍,k3A˜ h]+'>U-}qKxOKWa:E- -rV_)>B g5F$0VٴߌS‘&yD e%(>]&<sR??5;`R(L [S{PvVʣ.".v-~NyPp&|kN0uRp{`/y7'6u>KN)K1ju葄II(ԩ*)ݸ}eGs7y,KtX M0P;v䭁 S:>o.epgkR*T>mf Xr>n$ώ7F^ ReأտMGDOu!O(9&+mrSlG(T,AIx̥EN5sd=g0vv5qvP1glA)}@+ΚxH33VhyG}x8!rVgd  ߈ڰ:d/^ }g#hVa>zB&̅ =vi+<N8p$>fm{,avYi?s9GP -oDTj% if+XuBv挰~P}Z9Yd' a@luGs9BMӯ闔<D N֍Je'w:?ahXnAp!^?s/|nxtVy{gvv,;EȟL!V89̦N)볛]U1~@.[rwO>?)1-I3ܖn-Ld;׈bLGne[nh#I])K ^>$dgQ.,UMsxJD|=KN_5}ڂI[,4y4h[˺~wgqgH=,rȄ7O&Le+Gg;VW>{OI]5JcBqkld Wv\7X)s.j^[*u.3g*?ujpgr6g0~Jz*ödwy=§M Ak^wh R 9ˀ߸(DI#C Đ43;m G."}G8rRdh rƯţJ=Y$J~b4Vr-Fߠg@$JjQjl|gN/UDmOPQOc7 Q Ei_%ZASV6b Ə5N+"r+AQyiTe-b em BCX5C\ýkM{4@apPˋnNĚb2&$pw,% 7rDY LJvꗚ?ʃioz c#f endstream endobj 428 0 obj << /Length 696 /Filter /FlateDecode >> stream xmTMo0Wx$ ! 8l[jWHL7IPV=M̼ su;Uٛ=w]yil;<[[j<=?׾+v`&ߴț<^*;~&Q>MS >_P{=s@dkx;`VY`s4JaQܡn.Uu9\Y6><ٴ.Z.4>Dӗ}~r:-d0VWk,8yLһʮӮђ[*mLr?q 5F8@=@)& 8Rx uD\j2HV0CzL] bctI g$`htы0\F0s jd< I6zg W qȐ+#k .bsrbmXK7ǵH7Gnb>&jؐu1VljOu$՟qWS/%1{\xB!K(hHTЖ枃Jρϯv=k2UKς_:~$/ ~E+7ˢ/ l(/} -+ZXukoԝE?ZKq endstream endobj 429 0 obj << /Length 695 /Filter /FlateDecode >> stream xmTMo0Wx$ ! 
8l[jWHL7IPV=M̼ su;Uٛ=w]yil;<[[j<=?׾+v`&ߴț<^*;~&Q>MS>u;q~:fc_0F)lGιmu f8Gӫ6b"!YUe.`M{My?IC4}+̝l/Bj*{pϻƲO('$ *{>J-9_eQ"V$)MP:^9 ^` br @ {@(\,RH&ti m+3ԅ ,;F$БzFFieD(0A1a8yΠFpnù[w6p@ )9r9b_ia|F-(:(nQHY^`nA|n(戥K}s\}sԑoA&vqc⠦ YK^ʛ!_my_)=^ ^{TGRw1RDž'xJzImi9j'pͽܳ/-_Z,N_: ~iyY2q,nЪ5QN Y58.] endstream endobj 430 0 obj << /Length 739 /Filter /FlateDecode >> stream xmUMo0WxvHUdCmU^!1H#x?gx]OTm$|͜s_Iss :L;<Sz==׾f`*_`ɫڟk3'iѴ}=M;7rfnj-eSӵOLg~8 )ok A8 $`I\3`Af<Z]! xNky"7 _㓧q H`nḱRONH=CpB:# =%888QA~!*zƜАT?!~> tw8y*sύ }nFE>7*QύR>7G];~<6OIyktg>O:yұϓN|I/|yIg>O:y҅ϓ.}2 L> stream xmTMo0+J!m$d!mT&t@32U1~3~˻rr\i$^ںQg|6'oxdG2: lic$Pߛ)? _CtPRJ(:Nps0I֡iDAWj~:ytM{47xO_ M! K2XE?iڝ]]TʵHrS0QOKx&Z=1>bqb0q&d'H1[Q/c0&տp*I(kÆ2$l/#A cΘ :X"^fF~NK rJ_dP !@+MTH`ԩ3NE7kfBqxIA2Gs6AEYe/O3рI?kM'WGff@$%~S s셑(wr͂n"&}7dXz s)d?X~`5`?؈`cMv~+5k6c?؜` -d?diCNa\`͡2 ~DSim@]Yd8|pJ endstream endobj 432 0 obj << /Length 740 /Filter /FlateDecode >> stream xmUMo0WxvH UdC۪TBb B8߯{ .@=/ۙڽs{K;K.k6/k+[M'ҷ>dyӔKe'$cS`vfSfK}fƁVGGf\bu<19w|擬CTAW $rG]IyMsh$aW7y̟u? sK-`θtJ!'c83?NaO<Dg!;IX 0z)rЃ@kpBQ]^Z7! / U <ɉ#W m/%]cX! gȀhID8QN~ACT/sQQRs 穅ύ>7: F+}n4eE=zG~<6OɈy2kLd>O&y2ϓQ>OfdV>OF<dR'<>O)yJS*}𗏿tx>z{O->tՍ]*3>cC~ endstream endobj 433 0 obj << /Length 900 /Filter /FlateDecode >> stream xmUMo:W5?$R. d9M eCkmCp;;w~>|3E_?O]5߶w]Occ]=~?}Oyh9%?۹׬B|Ɯ>);vw%g43>\ 6 EJ78 1{~`W(-;]%=xe_,b+-O;q\L}UI--=BKE1p[! Mߊyu>.N5K)Wb٬8i[_uʕMzQ)V(Txޢjy!Z2P="Zd0\ÃGR\).2*Шa!U,H`+j.5Nα@VK-x%3%AYӀzΚ>kP#5m0Woþj.ZT$X/)n)#Wo(oRZ $Kp4Z-b\1ܰJ P"GXQi/8k^Zq:Zs9dB )sL-7xJ`aɽ)f$1 dъcCZC<73JgznHȰYɚTa,_-O87}KԴܗLloK+gJ.GZyVc48Wt]:P~`rZq.n1] S/Pu7Ue:?&?!d&1yHn5)yғBx#1ޞ]Go׏M?X endstream endobj 434 0 obj << /Length 900 /Filter /FlateDecode >> stream xmUMo:W5?$R. 
d9M eCkmCp;;w~>|3E_?O]5߶w]Occ]=~?}Oyh9%?۹׬B|Ɯ>);vw7{>oaI> ѲH8U/RǾ0ñ_x0ӅxBiE.͏S=/b_ixމbc4fi|8EXD_R4.GRQhV̪xvqڎXJfUıkM;rͭSlҏ֋jU,N2@ ",   T[<5 1"àcvG@mg K | +T|5flxZ1YP^ꠦdb}[ה_Q>kUbw88]k|'%Ǿjց{ g䈏rsqk:n87xIue.Aft0!?4ɳ4mFtӔ^z1?z .~lP}L endstream endobj 435 0 obj << /Length 664 /Filter /FlateDecode >> stream xmTMo0WxNB+8l[+ML7RI";onDo3ތ?n~<&yݽIr/ŋ=wWIG77eW]Nm=ij몝m-m3Q/oMq'}vIֿ/ ˺sӵBK)ɱn;A9n1vAxHŢn!XN4$>΃=mc-bB}hjM^Uwww BF˥푊QM]1ʫڞCeݡ}BʥXl6ȶ5R^clFrJՒk ;%9& }8K|y091x&GϹPT#Z%)&!lRvDr䨑\#G|bǚHUʸ4'22| ^Dm=^sS<cLUي_3;S}Ш2?}LN=8g,u..Q/)87l _??q Zqб<4 4谡Цg~ѧ,I 4sY^y?4hv5O#ܵy7S4 &*s0P.9S0׬p~ne8|p\ouqn6|kq_^~& am endstream endobj 436 0 obj << /Length 665 /Filter /FlateDecode >> stream xmTn0CB*D rضj^SpH ;olvR3ތm~<&yݽIr+œG۞m=ģ몝=b[ntC۶z;vʾ6%:svI>77 N!._ M u+$bEw!y1 vxHŢnSX: {Nm]XNDW[״bݹ,,-FVL"~C۷6ZHfٶ )/16X9CjIxļ$Bi#cΓ@l MDϹPT#ZC%)&!lR&TG5k䨑}WLԌ]Uz@K~bo#?қHљ<-+`q}ʂbI2_́Y_%X?Na~ZjGcrj59c+ϳEHDܰ%~WLz9ܓ2ƛFϲ`'I&se?zyxмj5F̹k#niM7>T20P-9SA˰֬p~ne8|p99[ڴw=ߣ& c endstream endobj 437 0 obj << /Length 665 /Filter /FlateDecode >> stream xmTMk0WhFG*! miʲVZCcYy#9햅ļ{3񸟤e&Oo]&C]]Mq>zwt߉Ǯ)n.pCx?nڽVgx=itO"i [\l\WM}'ԭ̚t4pXeȉeU oq yM\-CnCW_Ey}wP dZz891euB)] W-\v\]~[S!8&+Zce"'2Ɍ5I@|"B2AQhSlLء28a}ɑFq5ҍnnbfǮCG= Wܢe$g;A,:sx l=NOTƘ$0_س/vЧQ%~Zx pX2]$^qnaK??q FqMyc0=) &l(mi,3|d &\c ]͹&ӈ9w{d-tx\ \cΜekqLJs?<@>qhx .׷8wl~1V<*m"mmDa endstream endobj 438 0 obj << /Length 664 /Filter /FlateDecode >> stream xmTMo0WxvB+8l[jWHL7RI;onDo3ތ?n~<&Y$ŝK_IsE77E[^N\5sߖ;7|[lzmS_*7F?h3΃;mc-bB`ew\_7oK׽;(2Z.ETz}ܟ~o9V^MVK7-\f\S}[S!pcSs|TXo1/ȡ aeuC> stream xmTMo0WxvB+8l[+ML7RI;onDo3ތ?n~<&YվI|/ŋ;t硋nn\3<:Wj\=?-wn6pGۦ|Tnʽgxté7~qzxKlqrnX7UޞMjuSAxHiQ,'wͱ 1}hW7q{UEݥ-rG*F>NNL7u]tNhWS;wE )b,#TTHy=)9>*QKr7P:MȡQ^s$LD6aȑ*s.$S56`>ƄmÁ#TL 5kd}WXssc*zRh/#? 
bE$L|ږ8^y>eSQc̯bV̯cNa'_OAJ195kd3EH@8ܰ%~As*=F 0`{RLPh33Y$LƹǬ oqMsȼ tx\ \cΜ-eksL ?"@>qhx ׷=l~1֍>*]!MBa endstream endobj 440 0 obj << /Length 665 /Filter /FlateDecode >> stream xmTn0C6U@"mTt@;olvR3ތm~<&YվI|+œ;t羋<]3;Wj|{}[ mmᆂMv{Kt=c_~B?zxoBS6wBJ)X7UaMuSxHiQV,4$O;nC-bD/OCnC_n^ѻs׽9X2Z.ET~{~ʶrn_~߼h!R,6ew*ؔb%k e+Kӄ$a"1x*s.$S56P>Ƅm„A Fs 5577vرϾ+uaя6R:!,əCxg+ѧy*JcL|*m:fvuiWUꧏɩ\g%<Ϛ"sÖ0_:3x0kjhyIYx0aCnOg3$cx0<<v5O#ܵu7A 6*sZ ZcΜ-ܠeYksL ?"@>qh|tngk;dGGM@c endstream endobj 469 0 obj << /Producer (pdfTeX-1.40.22) /Author()/Title()/Subject()/Creator(LaTeX with hyperref)/Keywords() /CreationDate (D:20240501012631-04'00') /ModDate (D:20240501012631-04'00') /Trapped /False /PTEX.Fullbanner (This is pdfTeX, Version 3.141592653-2.6-1.40.22 (TeX Live 2022/dev/Debian) kpathsea version 6.3.4/dev) >> endobj 394 0 obj << /Type /ObjStm /N 98 /First 912 /Length 4518 /Filter /FlateDecode >> stream x\rG}Wy'Ȯ}#DmbC "q |SKzDКy֓'OeV7(K/+VI+cb*,]mřTpלSWY[)*bZs*P?Ѥ+);P0 MιJpCP4 JKE.i x%le& UYgt7TpWjd1S,0r *YIVQ@v(X`M~ )~bv7 cy2؜69O`ȑ|f/COuP !C]{$2dQMJ (7iJ:IjЇm/S :ѽ-kR=VqwMs=k(.A=|웯a-jiy ! S+cZچ2>ަy?CmI*$e? }s1.kǥO~YAt {H˚{-C.َC!h*SD%ik[upnZφ!AQ-kҺp3H* fǻ1A3iTnW;__&`̏IQ=?y2Fɻ[sߌXn^lvMsΑ ;"#NJ/]XF|-#~=cIB(UӳG^9eHv>LYvP㦩꧋q:_No֋A9'>>yG''?rF RÇ/Gt?_p~:oxhr^^-=XmG?'嬡8]744dyڬ7i=|q}=YMg9/ i7roj6Y]katJ*&xɛv '2Jƃ%RˣC>ɣ26ӳhr;R4~gz|^dVԟK}:oEɲQnWKه?{#[dّnv؝qف{}o`U emwqdycǓg'/O&n#`qJVo%﷍jd@z nZ7IVgj5=1af\t9+͋w_N6Mx߃"H-Mת/d1qbҘ+:p^(\m3>#7PG7A-w>1t;fB kYExpjEp-RRp]h]1W ~Gܷ 褹ز7Sؑ 8itN4˃`~żM|6I%޷g^w#Y#h0rXWZj{'?ǫ;z Uk]^;2w zGٕsBLYv8Bo.{\^ۤCgVKVtp[d?o (>ϓpvCW)p-9%5fU"f92(RS:(emT^aTrm bf&˒-Q.B~HK Z ["(=. 
      quantile(randomScores1, 0.999), experiment[["width"]])


###################################################
### code chunk number 18: adapter4
###################################################
## Method 2: Use consecutive matches anywhere in string with an FDR of 1e-03
submat2 <-
  nucleotideSubstitutionMatrix(match = 1, mismatch = -Inf, baseOnly = TRUE)
randomScores2 <-
  pairwiseAlignment(randomStrings, adapter, substitutionMatrix = submat2,
                    type = "local", gapOpening = 0, gapExtension = Inf,
                    scoreOnly = TRUE)
quantile(randomScores2, seq(0.99, 1, by = 0.001))
adapterAligns2 <-
  pairwiseAlignment(adapterStrings, adapter, substitutionMatrix = submat2,
                    type = "local", gapOpening = 0, gapExtension = Inf)
table(score(adapterAligns2) > quantile(randomScores2, 0.999),
      experiment[["width"]])
# Determine if the correct end was chosen
table(start(pattern(adapterAligns2)) > 37 - end(pattern(adapterAligns2)),
      experiment[["side"]])


###################################################
### code chunk number 19: adapter5
###################################################
## Method 3: Use consecutive matches on the ends with an FDR of 1e-03
submat3 <-
  nucleotideSubstitutionMatrix(match = 1, mismatch = -Inf, baseOnly = TRUE)
randomScores3 <-
  pairwiseAlignment(randomStrings, adapter, substitutionMatrix = submat3,
                    type = "overlap", gapOpening = 0, gapExtension = Inf,
                    scoreOnly = TRUE)
quantile(randomScores3, seq(0.99, 1, by = 0.001))
adapterAligns3 <-
  pairwiseAlignment(adapterStrings, adapter, substitutionMatrix = submat3,
                    type = "overlap", gapOpening = 0, gapExtension = Inf)
table(score(adapterAligns3) > quantile(randomScores3, 0.999),
      experiment[["width"]])
# Determine if the correct end was chosen
table(end(pattern(adapterAligns3)) == 36, experiment[["side"]])


###################################################
### code chunk number 20: adapter6
###################################################
## Method 4: Allow mismatches and indels on the ends with an FDR of 1e-03
randomScores4 <-
  pairwiseAlignment(randomStrings, adapter, type = "overlap", scoreOnly = TRUE)
quantile(randomScores4, seq(0.99, 1, by = 0.001))
adapterAligns4 <-
  pairwiseAlignment(adapterStrings, adapter, type = "overlap")
table(score(adapterAligns4) > quantile(randomScores4, 0.999),
      experiment[["width"]])
# Determine if the correct end was chosen
table(end(pattern(adapterAligns4)) == 36, experiment[["side"]])


###################################################
### code chunk number 21: adapter7
###################################################
## Method 4 continued: Remove adapter fragments
fragmentFound <-
  score(adapterAligns4) > quantile(randomScores4, 0.999)
fragmentFoundAt1 <-
  fragmentFound & (start(pattern(adapterAligns4)) == 1)
fragmentFoundAt36 <-
  fragmentFound & (end(pattern(adapterAligns4)) == 36)
cleanedStrings <- as.character(adapterStrings)
cleanedStrings[fragmentFoundAt1] <-
  as.character(narrow(adapterStrings[fragmentFoundAt1], end = 36,
                      width = 36 - end(pattern(adapterAligns4[fragmentFoundAt1]))))
cleanedStrings[fragmentFoundAt36] <-
  as.character(narrow(adapterStrings[fragmentFoundAt36], start = 1,
                      width = start(pattern(adapterAligns4[fragmentFoundAt36])) - 1))
cleanedStrings <- DNAStringSet(cleanedStrings)
cleanedStrings


###################################################
### code chunk number 22: genome1
###################################################
data(phiX174Phage)
genBankPhage <- phiX174Phage[[1]]
nchar(genBankPhage)
data(srPhiX174)
srPhiX174
quPhiX174
summary(wtPhiX174)
fullShortReads <- rep(srPhiX174, wtPhiX174)
srPDict <- PDict(fullShortReads)
table(countPDict(srPDict, genBankPhage))


###################################################
### code chunk number 23: genome2
###################################################
genBankSubstring <- substring(genBankPhage, 2793-34, 2811+34)
genBankAlign <-
  pairwiseAlignment(srPhiX174, genBankSubstring,
                    patternQuality = SolexaQuality(quPhiX174),
                    subjectQuality = SolexaQuality(99L),
                    type = "global-local")
summary(genBankAlign, weight = wtPhiX174)
revisedPhage <-
  replaceLetterAt(genBankPhage, c(2793, 2811), "TT")
table(countPDict(srPDict, revisedPhage))


###################################################
### code chunk number 24: genome3
###################################################
genBankCoverage <- coverage(genBankAlign, weight = wtPhiX174)
plot((2793-34):(2811+34), as.integer(genBankCoverage),
     xlab = "Position", ylab = "Coverage", type = "l")
nchar(genBankSubstring)
slice(genBankCoverage, lower = 1)


###################################################
### code chunk number 25: profiling1
###################################################
N <- as.integer(seq(500, 5000, by = 500))
timings <- rep(0, length(N))
names(timings) <- as.character(N)
for (i in seq_len(length(N))) {
  string1 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE),
                             collapse = ""))
  string2 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE),
                             collapse = ""))
  timings[i] <-
    system.time(pairwiseAlignment(string1, string2,
                                  type = "global"))[["user.self"]]
}
timings
coef(summary(lm(timings ~ poly(N, 2))))
plot(N, timings, xlab = "String Size, Both Strings", ylab = "Timing (sec.)",
     type = "l", main = "Global Pairwise Sequence Alignment Timings")


###################################################
### code chunk number 26: profiling2
###################################################
scoreOnlyTimings <- rep(0, length(N))
names(scoreOnlyTimings) <- as.character(N)
for (i in seq_len(length(N))) {
  string1 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE),
                             collapse = ""))
  string2 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE),
                             collapse = ""))
  scoreOnlyTimings[i] <-
    system.time(pairwiseAlignment(string1, string2, type = "global",
                                  scoreOnly = TRUE))[["user.self"]]
}
scoreOnlyTimings
round((timings - scoreOnlyTimings) / timings, 2)


###################################################
### code chunk number 27: doal
###################################################
file <- system.file("extdata", "someORF.fa", package="Biostrings")
orf <- readDNAStringSet(file)
orf
orf10 <- DNAStringSet(orf, end=10)
consensusMatrix(orf10, as.prob=TRUE, baseOnly=TRUE)


###################################################
### code chunk number 28: infco
###################################################
informationContent <- function(Lmers) {
  zlog <- function(x) ifelse(x==0,0,log(x))
  co <- consensusMatrix(Lmers, as.prob=TRUE)
  lets <- rownames(co)
  fr <- alphabetFrequency(Lmers, collapse=TRUE)[lets]
  fr <- fr / sum(fr)
  sum(co*zlog(co/fr), na.rm=TRUE)
}
informationContent(orf10)


###################################################
### code chunk number 29: ans1a
###################################################
pairwiseAlignment("zyzzyx", "syzygy")
pairwiseAlignment("zyzzyx", "syzygy", type = "local")
pairwiseAlignment("zyzzyx", "syzygy", type = "overlap")


###################################################
### code chunk number 30: ans1b
###################################################
pairwiseAlignment("zyzzyx", "syzygy", type = "overlap", gapExtension = Inf)


###################################################
### code chunk number 31: ans2a
###################################################
ex2 <- summary(pairwiseAlignment("zyzzyx", "syzygy"))
nmatch(ex2) / nmismatch(ex2)


###################################################
### code chunk number 32: ans3
###################################################
ex3 <- pairwiseAlignment("zyzzyx", "syzygy", type = "overlap")


###################################################
### code chunk number 33: ans3a
###################################################
nmatch(ex3)
nmismatch(ex3)


###################################################
### code chunk number 34: ans3b
###################################################
compareStrings(ex3)


###################################################
### code chunk number 35: ans3c
###################################################
as.character(ex3)


###################################################
### code chunk number 36: ans3d
###################################################
mismatch(pattern(ex3))


###################################################
### code chunk number 37: ans3e
###################################################
aligned(subject(ex3))


###################################################
### code chunk number 38: ans4a
###################################################
submat <- matrix(-1, nrow = 26, ncol = 26, dimnames = list(letters, letters))
diag(submat) <- 0
- pairwiseAlignment("zyzzyx", "syzygy", substitutionMatrix = submat,
                    gapOpening = 0, gapExtension = 1, scoreOnly = TRUE)
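
## Cross-check sketch (an addition, not part of the original vignette code).
## Assumption: with match = 0, mismatch = -1, gapOpening = 0 and
## gapExtension = 1, the negated global alignment score computed in chunk
## ans4a should equal the Levenshtein edit distance between the two strings.
## The helper name `levenshteinSketch` is hypothetical; it fills a plain
## dynamic-programming table in base R and does not use pwalign or Biostrings.
## utils::adist() computes the same quantity and serves as an independent check.
levenshteinSketch <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  # d[i + 1, j + 1] holds the edit distance between the first i letters of a
  # and the first j letters of b
  d <- matrix(0L, nrow = length(x) + 1L, ncol = length(y) + 1L)
  d[, 1] <- seq_len(nrow(d)) - 1L
  d[1, ] <- seq_len(ncol(d)) - 1L
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      d[i + 1L, j + 1L] <- min(d[i, j] + (x[i] != y[j]),  # match or substitution
                               d[i, j + 1L] + 1L,         # drop a letter of a
                               d[i + 1L, j] + 1L)         # add a letter of b
    }
  }
  d[nrow(d), ncol(d)]
}
levenshteinSketch("zyzzyx", "syzygy")
drop(utils::adist("zyzzyx", "syzygy"))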


###################################################
### code chunk number 39: ans4b
###################################################
stringDist(c("zyzzyx", "syzygy", "succeed", "precede", "supersede"))


###################################################
### code chunk number 40: ans5a
###################################################
data(BLOSUM62)
pairwiseAlignment(AAString("PAWHEAE"), AAString("HEAGAWGHEE"),
                  substitutionMatrix = BLOSUM62,
                  gapOpening = 12, gapExtension = 4)


###################################################
### code chunk number 41: ans6a
###################################################
adapter <- DNAString("GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA")
set.seed(123)
N <- 1000
experiment <-
  list(side = rbinom(N, 1, 0.5), width = sample(0:36, N, replace = TRUE))
table(experiment[["side"]], experiment[["width"]])
ex6Strings <-
  simulateReads(N, adapter, experiment,
                substitutionRate = 0.005, gapRate = 0.0005)
ex6Strings <- DNAStringSet(ex6Strings)
ex6Strings

## Method 1: Use edit distance with an FDR of 1e-03
submat1 <-
  nucleotideSubstitutionMatrix(match = 0, mismatch = -1, baseOnly = TRUE)
quantile(randomScores1, seq(0.99, 1, by = 0.001))
ex6Aligns1 <-
  pairwiseAlignment(ex6Strings, adapter, substitutionMatrix = submat1,
                    gapOpening = 0, gapExtension = 1)
table(score(ex6Aligns1) > quantile(randomScores1, 0.999),
      experiment[["width"]])

## Method 2: Use consecutive matches anywhere in string with an FDR of 1e-03
submat2 <-
  nucleotideSubstitutionMatrix(match = 1, mismatch = -Inf, baseOnly = TRUE)
quantile(randomScores2, seq(0.99, 1, by = 0.001))
ex6Aligns2 <-
  pairwiseAlignment(ex6Strings, adapter, substitutionMatrix = submat2,
                    type = "local", gapOpening = 0, gapExtension = Inf)
table(score(ex6Aligns2) > quantile(randomScores2, 0.999),
      experiment[["width"]])
# Determine if the correct end was chosen
table(start(pattern(ex6Aligns2)) > 37 - end(pattern(ex6Aligns2)),
      experiment[["side"]])

## Method 3: Use consecutive matches on the ends with an FDR of 1e-03
submat3 <- nucleotideSubstitutionMatrix(match = 1, mismatch = -Inf, baseOnly = TRUE) ex6Aligns3 <- pairwiseAlignment(ex6Strings, adapter, substitutionMatrix = submat3, type = "overlap", gapOpening = 0, gapExtension = Inf) table(score(ex6Aligns3) > quantile(randomScores3, 0.999), experiment[["width"]]) # Determine if the correct end was chosen table(end(pattern(ex6Aligns3)) == 36, experiment[["side"]]) ## Method 4: Allow mismatches and indels on the ends with an FDR of 1e-03 quantile(randomScores4, seq(0.99, 1, by = 0.001)) ex6Aligns4 <- pairwiseAlignment(ex6Strings, adapter, type = "overlap") table(score(ex6Aligns4) > quantile(randomScores4, 0.999), experiment[["width"]]) # Determine if the correct end was chosen table(end(pattern(ex6Aligns4)) == 36, experiment[["side"]]) ################################################### ### code chunk number 42: ans6b ################################################### simulateReads <- function(N, left, right = left, experiment, substitutionRate = 0.01, gapRate = 0.001) { leftChars <- strsplit(as.character(left), "")[[1]] rightChars <- strsplit(as.character(right), "")[[1]] if (length(leftChars) != length(rightChars)) stop("left and right adapters must have the same number of characters") nChars <- length(leftChars) sapply(seq_len(N), function(i) { width <- experiment[["width"]][i] side <- experiment[["side"]][i] randomLetters <- function(n) sample(DNA_ALPHABET[1:4], n, replace = TRUE) randomLettersWithEmpty <- function(n) sample(c("", DNA_ALPHABET[1:4]), n, replace = TRUE, prob = c(1 - gapRate, rep(gapRate/4, 4))) if (side) { value <- paste(ifelse(rbinom(nChars,1,substitutionRate), randomLetters(nChars), rightChars), randomLettersWithEmpty(nChars), sep = "", collapse = "") value <- paste(c(randomLetters(36 - width), substring(value, 1, width)), sep = "", collapse = "") } else { value <- paste(ifelse(rbinom(nChars,1,substitutionRate), randomLetters(nChars), leftChars), randomLettersWithEmpty(nChars), sep = "", collapse = "") 
value <- paste(c(substring(value, 37 - width, 36), randomLetters(36 - width)), sep = "", collapse = "") } value }) } leftAdapter <- adapter rightAdapter <- reverseComplement(adapter) ex6LeftRightStrings <- simulateReads(N, leftAdapter, rightAdapter, experiment) ex6LeftAligns4 <- pairwiseAlignment(ex6LeftRightStrings, leftAdapter, type = "overlap") ex6RightAligns4 <- pairwiseAlignment(ex6LeftRightStrings, rightAdapter, type = "overlap") scoreCutoff <- quantile(randomScores4, 0.999) leftAligned <- start(pattern(ex6LeftAligns4)) == 1 & score(ex6LeftAligns4) > pmax(scoreCutoff, score(ex6RightAligns4)) rightAligned <- end(pattern(ex6RightAligns4)) == 36 & score(ex6RightAligns4) > pmax(scoreCutoff, score(ex6LeftAligns4)) table(leftAligned, rightAligned) table(leftAligned | rightAligned, experiment[["width"]]) ################################################### ### code chunk number 43: ans7a ################################################### genBankFullAlign <- pairwiseAlignment(srPhiX174, genBankPhage, patternQuality = SolexaQuality(quPhiX174), subjectQuality = SolexaQuality(99L), type = "global-local") summary(genBankFullAlign, weight = wtPhiX174) ################################################### ### code chunk number 44: ans7b ################################################### genBankFullCoverage <- coverage(genBankFullAlign, weight = wtPhiX174) plot(as.integer(genBankFullCoverage), xlab = "Position", ylab = "Coverage", type = "l") slice(genBankFullCoverage, lower = 1) ################################################### ### code chunk number 45: ans7c ################################################### genBankFullAlignRevComp <- pairwiseAlignment(srPhiX174, reverseComplement(genBankPhage), patternQuality = SolexaQuality(quPhiX174), subjectQuality = SolexaQuality(99L), type = "global-local") table(score(genBankFullAlignRevComp) > score(genBankFullAlign)) ################################################### ### code chunk number 46: ans8a 
################################################### N <- as.integer(seq(5000, 50000, by = 5000)) newTimings <- rep(0, length(N)) names(newTimings) <- as.character(N) for (i in seq_len(length(N))) { string1 <- DNAString(paste(sample(DNA_ALPHABET[1:4], 35, replace = TRUE), collapse = "")) string2 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE), collapse = "")) newTimings[i] <- system.time(pairwiseAlignment(string1, string2, type = "global"))[["user.self"]] } newTimings coef(summary(lm(newTimings ~ poly(N, 2)))) plot(N, newTimings, xlab = "Larger String Size", ylab = "Timing (sec.)", type = "l", main = "Global Pairwise Sequence Alignment Timings") ################################################### ### code chunk number 47: ans8b ################################################### newScoreOnlyTimings <- rep(0, length(N)) names(newScoreOnlyTimings) <- as.character(N) for (i in seq_len(length(N))) { string1 <- DNAString(paste(sample(DNA_ALPHABET[1:4], 35, replace = TRUE), collapse = "")) string2 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE), collapse = "")) newScoreOnlyTimings[i] <- system.time(pairwiseAlignment(string1, string2, type = "global", scoreOnly = TRUE))[["user.self"]] } newScoreOnlyTimings round((newTimings - newScoreOnlyTimings) / newTimings, 2) ################################################### ### code chunk number 48: sessinfo ################################################### sessionInfo() pwalign/inst/doc/PairwiseAlignments.Rnw0000644000175100017510000015023414614311433021257 0ustar00biocbuildbiocbuild%\VignetteIndexEntry{Pairwise Sequence Alignments} %\VignetteKeywords{DNA, RNA, Sequence, Biostrings, Sequence alignment} %\VignettePackage{pwalign} % % NOTE -- ONLY EDIT THE .Rnw FILE!!! The .tex file is % likely to be overwritten. 
% \documentclass[10pt]{article} \usepackage{times} \usepackage{hyperref} \textwidth=6.5in \textheight=8.5in %\parskip=.3cm \oddsidemargin=-.1in \evensidemargin=-.1in \headheight=-.3in \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\R}{{\textsf{R}}} \newcommand{\code}[1]{{\texttt{#1}}} \newcommand{\term}[1]{{\emph{#1}}} \newcommand{\Rpackage}[1]{\textsf{#1}} \newcommand{\Rfunction}[1]{\texttt{#1}} \newcommand{\Robject}[1]{\texttt{#1}} \newcommand{\Rclass}[1]{{\textit{#1}}} \newcommand{\Rmethod}[1]{{\textit{#1}}} \newcommand{\Rfunarg}[1]{{\textit{#1}}} \bibliographystyle{plainnat} \begin{document} %\setkeys{Gin}{width=0.55\textwidth} \title{Pairwise Sequence Alignments} \author{Patrick Aboyoun \\ Gentleman Lab \\ Fred Hutchinson Cancer Research Center \\ Seattle, WA} \date{\today} \maketitle \tableofcontents \section{Introduction} In this document we illustrate how to perform pairwise sequence alignments using the \Rpackage{pwalign} package's central function \Rfunction{pairwiseAlignment}. This function aligns a set of \Rfunarg{pattern} strings to a \Rfunarg{subject} string in a global, local, or overlap (ends-free) fashion with or without affine gaps using either a fixed or quality-based substitution scoring scheme. This function's computation time is proportional to the product of the two string lengths being aligned. \section{Pairwise Sequence Alignment Problems} The (Needleman-Wunsch) global, the (Smith-Waterman) local, and (ends-free) overlap pairwise sequence alignment problems are described as follows. Let string $S_i$ have $n_i$ characters $c_{(i,j)}$ with $j \in \left\{1, \ldots, n_i\right\}$. 
A pairwise sequence alignment is a mapping of strings $S_1$ and $S_2$ to gapped substrings ${S'}_1$ and ${S'}_2$ that are defined by
\begin{eqnarray*}
{S'}_1 & = & g_{\left(1,a_1\right)}c_{\left(1,a_1\right)} \cdots g_{\left(1,b_1\right)}c_{\left(1,b_1\right)}g_{\left(1,b_1+1\right)}\\
{S'}_2 & = & g_{\left(2,a_2\right)}c_{\left(2,a_2\right)} \cdots g_{\left(2,b_2\right)}c_{\left(2,b_2\right)}g_{\left(2,b_2+1\right)}
\end{eqnarray*}
\begin{tabbing}
where \= \\
\> $a_i, b_i \in \{1, \ldots, n_i\}$ with $a_i \leq b_i$ \\
\> $g_{(i,j)} = 0$ or more gaps at the specified position $j$ for aligned string $i$ \\
\> $length({S'}_1) = length({S'}_2)$
\end{tabbing}

Each of these pairwise sequence alignment problems is solved by maximizing the alignment \textit{score}. An alignment score is determined by the type of pairwise sequence alignment (global, local, overlap), which sets the $[a_i, b_i]$ ranges for the substrings; the substitution scoring scheme, which sets the distance between aligned characters; and the gap penalties, which are divided into opening and extension components. The optimal pairwise sequence alignment is the pairwise sequence alignment with the largest score for the specified alignment type, substitution scoring scheme, and gap penalties. The pairwise sequence alignment types, substitution scoring schemes, and gap penalties influence alignment scores in the following manner:

\begin{description}
\item{Pairwise Sequence Alignment Types: } The type of pairwise sequence alignment determines the substring ranges to which the substitution scoring and gap penalty schemes are applied.
For the three primary (global, local, overlap) and two derivative (subject overlap, pattern overlap) pairwise sequence alignment types, the resulting substring ranges are as follows: \begin{description} \item{Global - } $[a_1, b_1] = [1, n_1]$ and $[a_2, b_2] = [1, n_2]$ \item{Local - } $[a_1, b_1]$ and $[a_2, b_2]$ \item{Overlap - } $\left\{[a_1, b_1] = [a_1, n_1], [a_2, b_2] = [1, b_2]\right\}$ or $\left\{[a_1, b_1] = [1, b_1], [a_2, b_2] = [a_2, n_2]\right\}$ \item{Subject Overlap - } $[a_1, b_1] = [1, n_1]$ and $[a_2, b_2]$ \item{Pattern Overlap - } $[a_1, b_1]$ and $[a_2, b_2] = [1, n_2]$ \end{description} \item{Substitution Scoring Schemes: } The substitution scoring scheme sets the values for the aligned character pairings within the substring ranges determined by the type of pairwise sequence alignment. This scoring scheme can be fixed for character pairings or quality-dependent for character pairings. (Characters that align with a gap are penalized according to the ``Gap Penalty'' framework.) \begin{description} \item{Fixed substitution scoring - } Fixed substitution scoring schemes associate each aligned character pairing with a value. These schemes are very common and include awarding one value for a match and another for a mismatch, Point Accepted Mutation (PAM) matrices, and Block Substitution Matrix (BLOSUM) matrices. \item{Quality-based substitution scoring - } Quality-based substitution scoring schemes derive the value for the aligned character pairing based on the probabilities of character recording errors \cite{Malde:2008}. Let $\epsilon_i$ be the probability of a character recording error. Assuming independence within and between recordings and a uniform background frequency of the different characters, the combined error probability of a mismatch when the underlying characters do match is $\epsilon_c = \epsilon_1 + \epsilon_2 - (n/(n-1)) * \epsilon_1 * \epsilon_2$, where $n$ is the number of characters in the underlying alphabet (e.g. 
in DNA and RNA, $n = 4$). Using $\epsilon_c$, the substitution score is given by $b * \log_2(\gamma_{(x,y)} * (1 - \epsilon_c) * n + (1 - \gamma_{(x,y)}) * \epsilon_c * (n/(n-1)))$, where $b$ is the bit-scaling for the scoring and $\gamma_{(x,y)}$ is the probability that characters $x$ and $y$ represent the same underlying letter (e.g. using IUPAC, $\gamma_{(A,A)} = 1$ and $\gamma_{(A,N)} = 1/4$).
\end{description}
\item{Gap Penalties: } Gap penalties are the values associated with the gaps within the substring ranges determined by the type of pairwise sequence alignment. These penalties are divided into \textit{gap opening} and \textit{gap extension} components, where the gap opening penalty is the cost for adding a new gap and the gap extension penalty is the incremental cost incurred along the length of the gap. A \textit{constant gap penalty} occurs when there is a cost associated with opening a gap, but no cost for the length of a gap (i.e. gap extension is zero). A \textit{linear gap penalty} occurs when there is no cost associated with opening a gap (i.e. gap opening is zero), but there is a cost for the length of the gap. An \textit{affine gap penalty} occurs when both the gap opening and gap extension have a non-zero associated cost.
\end{description}

\section{Main Pairwise Sequence Alignment Function}

The \Rfunction{pairwiseAlignment} function solves the pairwise sequence alignment problems mentioned above. It aligns one or more strings specified in the \Rfunarg{pattern} argument with a single string specified in the \Rfunarg{subject} argument.

<<>>=
options(width=72)
@
<<>>=
library(pwalign)
pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede")
@

The type of pairwise sequence alignment is set by specifying the \Rfunarg{type} argument to be one of \texttt{"global"}, \texttt{"local"}, \texttt{"overlap"}, \texttt{"global-local"}, and \texttt{"local-global"}.
<<>>=
pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede",
                  type = "local")
@

The gap penalties are regulated by the \Rfunarg{gapOpening} and \Rfunarg{gapExtension} arguments.

<<>>=
pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede",
                  gapOpening = 0, gapExtension = 1)
@

The substitution scoring scheme is set using three arguments, two of which are quality-based (\Rfunarg{patternQuality}, \Rfunarg{subjectQuality}) and one of which is for fixed substitution scoring (\Rfunarg{substitutionMatrix}). When the substitution scores are fixed by character pairing, the \Rfunarg{substitutionMatrix} argument takes a matrix with the appropriate alphabets as dimension names. The \Rfunction{nucleotideSubstitutionMatrix} function translates simple match and mismatch scores to the full spectrum of IUPAC nucleotide codes.

<<>>=
submat <-
  matrix(-1, nrow = 26, ncol = 26, dimnames = list(letters, letters))
diag(submat) <- 0
pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede",
                  substitutionMatrix = submat,
                  gapOpening = 0, gapExtension = 1)
@

When the substitution scores are quality-based, the \Rfunarg{patternQuality} and \Rfunarg{subjectQuality} arguments represent the equivalent of $[x-99]$ numeric quality values for the respective strings, and the optional \Rfunarg{fuzzyMatrix} argument represents how closely two characters match on a $[0,1]$ scale. The \Rfunarg{patternQuality} and \Rfunarg{subjectQuality} arguments accept quality measures in \Rclass{PhredQuality}, \Rclass{SolexaQuality}, or \Rclass{IlluminaQuality} scaling. For \Rclass{PhredQuality} and \Rclass{IlluminaQuality} measures $Q \in [0, 99]$, the probability of an error in the base read is given by $10^{-Q/10}$, and for \Rclass{SolexaQuality} measures $Q \in [-5, 99]$, it is given by $1 - 1/(1 + 10^{-Q/10})$.
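As a rough illustration of the two conversions just described, the following base \R{} sketch computes error probabilities from quality values. The helper names \code{phredError} and \code{solexaError} are hypothetical and are not part of \Rpackage{pwalign}; they only restate the formulas given in the text.

```r
# Hypothetical helpers (not pwalign functions) restating the formulas above.

# Phred/Illumina scaling: error probability 10^(-Q/10)
phredError <- function(Q) 10^(-Q / 10)

# Solexa scaling: error probability 1 - 1/(1 + 10^(-Q/10))
solexaError <- function(Q) 1 - 1 / (1 + 10^(-Q / 10))

# Tabulate both conversions for a few quality values
Q <- c(0, 10, 20, 30)
cbind(Q = Q, phred = phredError(Q), solexa = solexaError(Q))
```

At $Q = 0$ the Phred conversion gives an error probability of 1 while the Solexa conversion gives 0.5; for larger $Q$ the two scalings give nearly the same value.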
The \Rfunction{qualitySubstitutionMatrices} function maps the \Rfunarg{patternQuality} and \Rfunarg{subjectQuality} scores to match and mismatch penalties. These three arguments will be demonstrated in later sections.

The final argument, \Rfunarg{scoreOnly}, to the \Rfunction{pairwiseAlignment} function accepts a logical value to specify whether or not to return just the pairwise sequence alignment score. If \Rfunarg{scoreOnly} is \Robject{FALSE}, the pairwise alignment with the maximum alignment score is returned. If more than one pairwise alignment has the maximum alignment score, the first alignment along the subject is returned. If there are multiple pairwise alignments with the maximum alignment score at the chosen subject location, then at each location along the alignment mismatches are given preference over insertions/deletions. For example, \code{pattern: [1] ATTA; subject: [1] AT-A} is chosen above \code{pattern: [1] ATTA; subject: [1] A-TA} if they both have the maximum alignment score.

<<>>=
submat <-
  matrix(-1, nrow = 26, ncol = 26, dimnames = list(letters, letters))
diag(submat) <- 0
pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede",
                  substitutionMatrix = submat,
                  gapOpening = 0, gapExtension = 1, scoreOnly = TRUE)
@

\subsection{Exercise 1}
\begin{enumerate}
\item Using \Rfunction{pairwiseAlignment}, fit the global, local, and overlap pairwise sequence alignment of the strings \Robject{"syzygy"} and \Robject{"zyzzyx"} using the default settings.
\item Do any of the alignments change if the \Rfunarg{gapExtension} argument is set to \Robject{-Inf}?
\end{enumerate}
[Answers provided in section \ref{sec:Answers1}.]
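
To make the quality-based substitution score from the discussion of substitution scoring schemes more concrete, here is a minimal base \R{} sketch of that formula. The function \code{qualityScore} and its argument names are hypothetical and are not part of \Rpackage{pwalign}; the sketch assumes the formula exactly as stated in the text, and the values produced by \Rfunction{qualitySubstitutionMatrices} may differ in detail.

```r
# Hypothetical helper (not a pwalign function) restating the scoring formula.
# e1, e2: error probabilities of the two aligned characters
# gamma:  probability that the two characters represent the same letter
# n:      alphabet size (4 for DNA and RNA); b: bit scaling
qualityScore <- function(e1, e2, gamma, n = 4, b = 1) {
  ec <- e1 + e2 - (n / (n - 1)) * e1 * e2   # combined error probability
  b * log2(gamma * (1 - ec) * n + (1 - gamma) * ec * (n / (n - 1)))
}

e <- 10^(-20 / 10)                 # Phred quality 20 for both characters
qualityScore(e, e, gamma = 1)      # score when the characters match
qualityScore(e, e, gamma = 0)      # score when the characters mismatch
qualityScore(0, 0, gamma = 1)      # error-free match: log2(4) = 2
```

Under these assumptions a match between two reliable characters scores close to $\log_2 n$, while a mismatch between reliable characters receives a strongly negative score, so low-quality positions contribute less in either direction.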
\section{Pairwise Sequence Alignment Classes}

Following the design principles of Bioconductor and R, the pairwise sequence alignment functionality in the \Rpackage{pwalign} package keeps the end user close to their data through the use of five specialty classes: \Rclass{PairwiseAlignments}, \Rclass{PairwiseAlignmentsSingleSubject}, \Rclass{PairwiseAlignmentsSingleSubjectSummary}, \Rclass{AlignedXStringSet}, and \Rclass{QualityAlignedXStringSet}. The \Rclass{PairwiseAlignmentsSingleSubject} class inherits from the \Rclass{PairwiseAlignments} class and they both hold the results of a fit from the \Rfunction{pairwiseAlignment} function, with the former class being used to represent all patterns aligning to a single subject and the latter being used to represent elementwise alignments between a set of patterns and a set of subjects.

<<>>=
pa1 <- pairwiseAlignment(pattern = c("succeed", "precede"),
                         subject = "supersede")
class(pa1)
@

The \Rclass{PairwiseAlignmentsSingleSubjectSummary} class holds the results of a summarized pairwise sequence alignment.

<<>>=
summary(pa1)
class(summary(pa1))
@

The \Rclass{AlignedXStringSet} and \Rclass{QualityAlignedXStringSet} classes hold the ``gapped'' ${S'}_i$ substrings, with the former class holding the results when the pairwise sequence alignment is performed with a fixed substitution scoring scheme and the latter class holding the results for a quality-based scoring scheme.

<<>>=
class(pattern(pa1))
submat <-
  matrix(-1, nrow = 26, ncol = 26, dimnames = list(letters, letters))
diag(submat) <- 0
pa2 <- pairwiseAlignment(pattern = c("succeed", "precede"),
                         subject = "supersede",
                         substitutionMatrix = submat,
                         gapOpening = 0, gapExtension = 1)
class(pattern(pa2))
@

\subsection{Exercise 2}
\begin{enumerate}
\item What is the primary benefit of formal summary classes like \Rclass{PairwiseAlignmentsSingleSubjectSummary} and \Rclass{summary.lm} to end users?
\end{enumerate}
[Answer provided in section \ref{sec:Answers2}.]
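
The inheritance relationship described in this section can be checked directly. The sketch below assumes the \Rpackage{pwalign} package is installed and uses only functions that already appear in this document, together with \Rfunction{is} from the \Rpackage{methods} package.

```r
library(pwalign)  # assumes the pwalign package is installed

pa1 <- pairwiseAlignment(pattern = c("succeed", "precede"),
                         subject = "supersede")

# The single-subject class extends the more general alignment class,
# so an object of the former also passes as the latter
is(pa1, "PairwiseAlignmentsSingleSubject")
is(pa1, "PairwiseAlignments")

# One alignment, and therefore one score, per pattern
length(pa1)
score(pa1)
```

Because of this inheritance, code written against \Rclass{PairwiseAlignments} objects should also accept the single-subject results returned in the examples above.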
\section{Pairwise Sequence Alignment Helper Functions}

Tables \ref{table:helperfuns1}, \ref{table:helperfuns2} and \ref{table:alignfuns} show functions that interact with objects of class \Rclass{PairwiseAlignments}, \Rclass{PairwiseAlignmentsSingleSubject}, and \Rclass{AlignedXStringSet}. These functions should be used in preference to direct slot extraction from the alignment objects.

\begin{table}[ht]
\begin{center}
\begin{tabular}{l|l}
\hline
Function & Description \\
\hline
\Rfunction{[} & Extracts the specified elements of the alignment object \\
\Rfunction{alphabet} & Extracts the allowable characters in the original strings \\
\Rfunction{compareStrings} & Creates character string mashups of the alignments \\
\Rfunction{deletion} & Extracts the locations of the gaps inserted into the pattern for the alignments \\
\Rfunction{length} & Extracts the number of patterns aligned \\
\Rfunction{mismatchTable} & Creates a table for the mismatching positions \\
\Rfunction{nchar} & Computes the length of ``gapped'' substrings \\
\Rfunction{nedit} & Computes the Levenshtein edit distance of the alignments \\
\Rfunction{indel} & Extracts the locations of the insertion \& deletion gaps in the alignments \\
\Rfunction{insertion} & Extracts the locations of the gaps inserted into the subject for the alignments \\
\Rfunction{nindel} & Computes the number of insertions \& deletions in the alignments \\
\Rfunction{nmatch} & Computes the number of matching characters in the alignments \\
\Rfunction{nmismatch} & Computes the number of mismatching characters in the alignments \\
\Rfunction{pattern}, \Rfunction{subject} & Extracts the aligned pattern/subject \\
\Rfunction{pid} & Computes the percent sequence identity \\
\Rfunction{rep} & Replicates the elements of the alignment object \\
\Rfunction{score} & Extracts the pairwise sequence alignment scores \\
\Rfunction{type} & Extracts the type of pairwise sequence alignment \\
\hline
\end{tabular}
\end{center}
\caption{Functions
for \Rclass{PairwiseAlignments} and \Rclass{PairwiseAlignmentsSingleSubject} objects.}
\label{table:helperfuns1}
\end{table}

\begin{table}[ht]
\begin{center}
\begin{tabular}{l|l}
\hline
Function & Description \\
\hline
\Rfunction{aligned} & Creates an \Rclass{XStringSet} containing either ``filled-with-gaps'' or degapped aligned strings \\
\Rfunction{as.character} & Creates a character vector version of \Rfunction{aligned} \\
\Rfunction{as.matrix} & Creates an ``exploded'' character matrix version of \Rfunction{aligned} \\
\Rfunction{consensusMatrix} & Computes a consensus matrix for the alignments \\
\Rfunction{consensusString} & Creates the string based on a 50\% + 1 vote from the consensus matrix \\
\Rfunction{coverage} & Computes the alignment coverage along the subject \\
\Rfunction{mismatchSummary} & Summarizes the information of the \Rfunction{mismatchTable} \\
\Rfunction{summary} & Summarizes a pairwise sequence alignment \\
\Rfunction{toString} & Creates a concatenated string version of \Rfunction{aligned} \\
\Rfunction{Views} & Creates an \Rclass{XStringViews} representing the aligned region along the subject \\
\hline
\end{tabular}
\end{center}
\caption{Additional functions for \Rclass{PairwiseAlignmentsSingleSubject} objects.}
\label{table:helperfuns2}
\end{table}

The \Rfunction{score}, \Rfunction{nedit}, \Rfunction{nmatch}, \Rfunction{nmismatch}, and \Rfunction{nchar} functions return numeric vectors containing information on the pairwise sequence alignment score, Levenshtein edit distance, number of matches, number of mismatches, and number of aligned characters respectively.
<>= submat <- matrix(-1, nrow = 26, ncol = 26, dimnames = list(letters, letters)) diag(submat) <- 0 pa2 <- pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede", substitutionMatrix = submat, gapOpening = 0, gapExtension = 1) score(pa2) nedit(pa2) nmatch(pa2) nmismatch(pa2) nchar(pa2) aligned(pa2) as.character(pa2) as.matrix(pa2) consensusMatrix(pa2) @ The \Rfunction{summary}, \Rfunction{mismatchTable}, and \Rfunction{mismatchSummary} functions return various summaries of the pairwise sequence alignments. <>= summary(pa2) mismatchTable(pa2) mismatchSummary(pa2) @ \begin{table}[ht] \begin{center} \begin{tabular}{l|l} \hline Function & Description \\ \hline \Rfunction{[} & Extracts the specified elements of the alignment object \\ \Rfunction{aligned}, \Rfunction{unaligned} & Extracts the aligned/unaligned strings \\ \Rfunction{alphabet} & Extracts the allowable characters in the original strings \\ \Rfunction{as.character}, \Rfunction{toString} & Converts the alignments to character strings \\ \Rfunction{coverage} & Computes the alignment coverage \\ \Rfunction{end} & Extracts the ending index of the aligned range \\ \Rfunction{indel} & Extracts the insertion/deletion locations \\ \Rfunction{length} & Extracts the number of patterns aligned \\ \Rfunction{mismatch} & Extracts the position of the mismatches \\ \Rfunction{mismatchSummary} & Summarizes the information of the \Rfunction{mismatchTable} \\ \Rfunction{mismatchTable} & Creates a table for the mismatching positions \\ \Rfunction{nchar} & Computes the length of ``gapped'' substrings \\ \Rfunction{nindel} & Computes the number of insertions/deletions in the alignments \\ \Rfunction{nmismatch} & Computes the number of mismatching characters in the alignments \\ \Rfunction{rep} & Replicates the elements of the alignment object \\ \Rfunction{start} & Extracts the starting index of the aligned range \\ \Rfunction{toString} & Creates a concatenated string containing the alignments \\ 
\Rfunction{width} & Extracts the width of the aligned range \\
\hline
\end{tabular}
\end{center}
\caption{Functions for \Rclass{AlignedXStringSet} and \Rclass{QualityAlignedXStringSet} objects.}
\label{table:alignfuns}
\end{table}

The \Rfunction{pattern} and \Rfunction{subject} functions extract the aligned pattern and subject objects for further analysis. Most of the actions that can be performed on \Rclass{PairwiseAlignments} objects can also be performed on \Rclass{AlignedXStringSet} and \Rclass{QualityAlignedXStringSet} objects, along with operations such as \Rfunction{start}, \Rfunction{end}, and \Rfunction{width}, which extract the start, end, and width of the alignment ranges.

<<>>=
class(pattern(pa2))
aligned(pattern(pa2))
nindel(pattern(pa2))
start(subject(pa2))
end(subject(pa2))
@

\subsection{Exercise 3}
For the overlap pairwise sequence alignment of the strings \Robject{"syzygy"} and \Robject{"zyzzyx"} with the \Rfunction{pairwiseAlignment} default settings, perform the following operations:
\begin{enumerate}
\item Use \Rfunction{nmatch} and \Rfunction{nmismatch} to extract the number of matches and mismatches respectively.
\item Use the \Rfunction{compareStrings} function to get the symbolic representation of the alignment.
\item Use the \Rfunction{as.character} function to get the character string versions of the alignments.
\item Use the \Rfunction{pattern} function to extract the aligned pattern and apply the \Rfunction{mismatch} function to it to find the locations of the mismatches.
\item Use the \Rfunction{subject} function to extract the aligned subject and apply the \Rfunction{aligned} function to it to get the aligned strings.
\end{enumerate}
[Answers provided in section \ref{sec:Answers3}.]

\section{Edit Distances}

One of the earliest uses of pairwise sequence alignment is in the area of text analysis.
In 1965 Vladimir Levenshtein considered a metric, now called the \textit{Levenshtein edit distance}, that measures the difference between two strings. This distance metric is equivalent to the negative of the score of a pairwise sequence alignment with a match score of 0, a mismatch score of -1, a gap opening penalty of 0, and a gap extension penalty of 1. The \Rfunction{stringDist} function uses the internals of the \Rfunction{pairwiseAlignment} function to calculate the Levenshtein edit distance matrix for a set of strings.

There is also an implementation of approximate string matching using Levenshtein edit distance in the \Rfunction{agrep} (approximate grep) function of the \Rpackage{base} R package. As the following example shows, it is possible to replicate the \Rfunction{agrep} function using the \Rfunction{pairwiseAlignment} function. Since the \Rfunction{agrep} function is vectorized in \Rfunarg{x} rather than \Rfunarg{pattern}, these arguments are flipped in the call to \Rfunction{pairwiseAlignment}.
<<>>=
agrepBioC <- function(pattern, x, ignore.case = FALSE, value = FALSE,
                      max.distance = 0.1)
{
  if (!is.character(pattern)) pattern <- as.character(pattern)
  if (!is.character(x)) x <- as.character(x)
  ## As in agrep, a fractional max.distance is taken as a fraction of
  ## the pattern length
  if (max.distance < 1)
    max.distance <- ceiling(max.distance * nchar(pattern))
  characters <- unique(unlist(strsplit(c(pattern, x), "", fixed = TRUE)))
  if (ignore.case)
    substitutionMatrix <-
      outer(tolower(characters), tolower(characters),
            function(x,y) -as.numeric(x!=y))
  else
    substitutionMatrix <-
      outer(characters, characters, function(x,y) -as.numeric(x!=y))
  dimnames(substitutionMatrix) <- list(characters, characters)
  ## The edit distance is the negative of the alignment score
  distance <-
    - pairwiseAlignment(pattern = x, subject = pattern,
                        substitutionMatrix = substitutionMatrix,
                        type = "local-global",
                        gapOpening = 0, gapExtension = 1,
                        scoreOnly = TRUE)
  whichClose <- which(distance <= max.distance)
  if (value)
    whichClose <- x[whichClose]
  whichClose
}
cbind(base = agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, value = TRUE),
      bioc = agrepBioC("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, value = TRUE))
cbind(base = agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, ignore.case = TRUE),
      bioc = agrepBioC("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, ignore.case = TRUE))
@

\subsection{Exercise 4}
\begin{enumerate}
\item Use the \Rfunction{pairwiseAlignment} function to find the Levenshtein edit distance between \Robject{"syzygy"} and \Robject{"zyzzyx"}.
\item Use the \Rfunction{stringDist} function to find the Levenshtein edit distance for the vector \Robject{c("zyzzyx", "syzygy", "succeed", "precede", "supersede")}.
\end{enumerate}
[Answers provided in section \ref{sec:Answers4}.]

\section{Application: Using Evolutionary Models in Protein Alignments}

When proteins are believed to descend from a common ancestor, evolutionary models can be used as a guide in pairwise sequence alignments.
The two most common families of evolutionary models of proteins used in pairwise sequence alignments are Point Accepted Mutation (PAM) matrices, which are based on explicit evolutionary models, and Block Substitution Matrix (BLOSUM) matrices, which are based on data-derived evolution models. The \Rpackage{pwalign} package contains 5 PAM and 5 BLOSUM matrices (\Robject{PAM30}, \Robject{PAM40}, \Robject{PAM70}, \Robject{PAM120}, \Robject{PAM250}, \Robject{BLOSUM45}, \Robject{BLOSUM50}, \Robject{BLOSUM62}, \Robject{BLOSUM80}, and \Robject{BLOSUM100}) that can be used in the \Rfunarg{substitutionMatrix} argument to the \Rfunction{pairwiseAlignment} function.

Here is an example pairwise sequence alignment of amino acids from Durbin, Eddy et al. being fit by the \Rfunction{pairwiseAlignment} function using the \Robject{BLOSUM50} matrix:

<<>>=
data(BLOSUM50)
BLOSUM50[1:4,1:4]
nwdemo <-
  pairwiseAlignment(AAString("PAWHEAE"), AAString("HEAGAWGHEE"),
                    substitutionMatrix = BLOSUM50,
                    gapOpening = 0, gapExtension = 8)
nwdemo
compareStrings(nwdemo)
pid(nwdemo)
@

\subsection{Exercise 5}
\begin{enumerate}
\item Repeat the alignment exercise above using \Robject{BLOSUM62}, a gap opening penalty of 12, and a gap extension penalty of 4.
\item Explore to find out what caused the alignment to change.
\end{enumerate}
[Answers provided in section \ref{sec:Answers5}.]

\section{Application: Removing Adapters from Sequence Reads}

Finding and removing uninteresting experiment process-related fragments like adapters is a common problem in genetic sequencing, and pairwise sequence alignment is well-suited to address this issue. When adapters are used to anchor or extend a sequence during the experiment process, they either intentionally or unintentionally become sequenced during the read process. The following code simulates what sequences with adapter fragments at either end could look like during an experiment.
<<>>=
simulateReads <-
function(N, adapter, experiment, substitutionRate = 0.01, gapRate = 0.001) {
  chars <- strsplit(as.character(adapter), "")[[1]]
  sapply(seq_len(N), function(i, experiment, substitutionRate, gapRate) {
    width <- experiment[["width"]][i]
    side <- experiment[["side"]][i]
    randomLetters <-
      function(n) sample(DNA_ALPHABET[1:4], n, replace = TRUE)
    randomLettersWithEmpty <-
      function(n)
        sample(c("", DNA_ALPHABET[1:4]), n, replace = TRUE,
               prob = c(1 - gapRate, rep(gapRate/4, 4)))
    nChars <- length(chars)
    ## Adapter copy with random substitutions and insertions
    value <-
      paste(ifelse(rbinom(nChars,1,substitutionRate),
                   randomLetters(nChars), chars),
            randomLettersWithEmpty(nChars),
            sep = "", collapse = "")
    ## Attach the adapter fragment to one end of a 36-character read
    if (side)
      value <-
        paste(c(randomLetters(36 - width), substring(value, 1, width)),
              sep = "", collapse = "")
    else
      value <-
        paste(c(substring(value, 37 - width, 36), randomLetters(36 - width)),
              sep = "", collapse = "")
    value
  }, experiment = experiment, substitutionRate = substitutionRate,
     gapRate = gapRate)
}

adapter <- DNAString("GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA")
set.seed(123)
N <- 1000
experiment <-
  list(side = rbinom(N, 1, 0.5), width = sample(0:36, N, replace = TRUE))
table(experiment[["side"]], experiment[["width"]])
adapterStrings <-
  simulateReads(N, adapter, experiment, substitutionRate = 0.01,
                gapRate = 0.001)
adapterStrings <- DNAStringSet(adapterStrings)
@

The simulated strings above have 0 to 36 characters from the adapters attached to either end. We can use completely random strings as a baseline for any pairwise sequence alignment methodology we develop to remove the adapter characters.

<<>>=
M <- 5000
randomStrings <-
  apply(matrix(sample(DNA_ALPHABET[1:4], 36 * M, replace = TRUE),
               nrow = M), 1, paste, collapse = "")
randomStrings <- DNAStringSet(randomStrings)
@

Since the edit distance is easy to explain, it serves as a good starting point for developing an adapter removal methodology.
Unfortunately, given that this approach is based on a global alignment, it is only useful for filtering out sequences that are derived primarily from the adapter.

<<>>=
## Method 1: Use edit distance with an FDR of 1e-03
submat1 <- nucleotideSubstitutionMatrix(match = 0, mismatch = -1, baseOnly = TRUE)
randomScores1 <-
  pairwiseAlignment(randomStrings, adapter, substitutionMatrix = submat1,
                    gapOpening = 0, gapExtension = 1, scoreOnly = TRUE)
quantile(randomScores1, seq(0.99, 1, by = 0.001))
adapterAligns1 <-
  pairwiseAlignment(adapterStrings, adapter, substitutionMatrix = submat1,
                    gapOpening = 0, gapExtension = 1)
table(score(adapterAligns1) > quantile(randomScores1, 0.999), experiment[["width"]])
@

One improvement to removing adapters is to look at consecutive matches anywhere within the sequence. This is more versatile than the edit distance method, but it requires a relatively large number of consecutive matches and is susceptible to substitution and insertion/deletion errors.

<<>>=
## Method 2: Use consecutive matches anywhere in string with an FDR of 1e-03
submat2 <- nucleotideSubstitutionMatrix(match = 1, mismatch = -Inf, baseOnly = TRUE)
randomScores2 <-
  pairwiseAlignment(randomStrings, adapter, substitutionMatrix = submat2,
                    type = "local", gapOpening = 0, gapExtension = Inf,
                    scoreOnly = TRUE)
quantile(randomScores2, seq(0.99, 1, by = 0.001))
adapterAligns2 <-
  pairwiseAlignment(adapterStrings, adapter, substitutionMatrix = submat2,
                    type = "local", gapOpening = 0, gapExtension = Inf)
table(score(adapterAligns2) > quantile(randomScores2, 0.999), experiment[["width"]])
# Determine if the correct end was chosen
table(start(pattern(adapterAligns2)) > 37 - end(pattern(adapterAligns2)),
      experiment[["side"]])
@

Limiting consecutive matches to the ends provides better results, but it doesn't resolve the issues related to substitution and insertion/deletion errors.

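As a toy illustration of what restricting matches to the ends changes, the following sketch compares \Rfunarg{type = "local"} with \Rfunarg{type = "overlap"} under the same exact-match scoring. The short \Robject{toyAdapter} and \Robject{toyRead} strings are hypothetical and are not part of the simulation; they are chosen so that the read begins with a chance match to the middle of the adapter and ends with the first four letters of the adapter.

<<>>=
## Toy sketch with hypothetical strings (not part of the simulation)
library(Biostrings)
library(pwalign)
toyAdapter <- DNAString("GATCGGAAGAGC")
## "GAAGAG" matches the middle of the adapter; "GATC" is the adapter's prefix
toyRead <- DNAString("GAAGAGTTGATC")
toySubmat <- nucleotideSubstitutionMatrix(match = 1, mismatch = -Inf, baseOnly = TRUE)
toyLocal <-
  pairwiseAlignment(toyRead, toyAdapter, substitutionMatrix = toySubmat,
                    type = "local", gapOpening = 0, gapExtension = Inf)
toyOverlap <-
  pairwiseAlignment(toyRead, toyAdapter, substitutionMatrix = toySubmat,
                    type = "overlap", gapOpening = 0, gapExtension = Inf)
## Consecutive matches anywhere versus consecutive matches anchored at the ends
c(local = score(toyLocal), overlap = score(toyOverlap))
c(start(pattern(toyLocal)), end(pattern(toyLocal)))
c(start(pattern(toyOverlap)), end(pattern(toyOverlap)))
@

Under these assumptions the local alignment should pick the internal six-letter run, while the overlap alignment should report only the four-letter run at the right end of the read, which is the one that is safe to trim.
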
<<>>=
## Method 3: Use consecutive matches on the ends with an FDR of 1e-03
submat3 <- nucleotideSubstitutionMatrix(match = 1, mismatch = -Inf, baseOnly = TRUE)
randomScores3 <-
  pairwiseAlignment(randomStrings, adapter, substitutionMatrix = submat3,
                    type = "overlap", gapOpening = 0, gapExtension = Inf,
                    scoreOnly = TRUE)
quantile(randomScores3, seq(0.99, 1, by = 0.001))
adapterAligns3 <-
  pairwiseAlignment(adapterStrings, adapter, substitutionMatrix = submat3,
                    type = "overlap", gapOpening = 0, gapExtension = Inf)
table(score(adapterAligns3) > quantile(randomScores3, 0.999), experiment[["width"]])
# Determine if the correct end was chosen
table(end(pattern(adapterAligns3)) == 36, experiment[["side"]])
@

Allowing for substitution and insertion/deletion errors in the pairwise sequence alignments provides much better results for finding adapter fragments.

<<>>=
## Method 4: Allow mismatches and indels on the ends with an FDR of 1e-03
randomScores4 <-
  pairwiseAlignment(randomStrings, adapter, type = "overlap", scoreOnly = TRUE)
quantile(randomScores4, seq(0.99, 1, by = 0.001))
adapterAligns4 <- pairwiseAlignment(adapterStrings, adapter, type = "overlap")
table(score(adapterAligns4) > quantile(randomScores4, 0.999), experiment[["width"]])
# Determine if the correct end was chosen
table(end(pattern(adapterAligns4)) == 36, experiment[["side"]])
@

Using the results that allow for substitution and insertion/deletion errors, the cleaned sequence fragments can be generated as follows:

<<>>=
## Method 4 continued: Remove adapter fragments
fragmentFound <-
  score(adapterAligns4) > quantile(randomScores4, 0.999)
fragmentFoundAt1 <-
  fragmentFound & (start(pattern(adapterAligns4)) == 1)
fragmentFoundAt36 <-
  fragmentFound & (end(pattern(adapterAligns4)) == 36)
cleanedStrings <- as.character(adapterStrings)
cleanedStrings[fragmentFoundAt1] <-
  as.character(narrow(adapterStrings[fragmentFoundAt1], end = 36,
                      width = 36 - end(pattern(adapterAligns4[fragmentFoundAt1]))))
cleanedStrings[fragmentFoundAt36] <-
  as.character(narrow(adapterStrings[fragmentFoundAt36], start = 1,
                      width = start(pattern(adapterAligns4[fragmentFoundAt36])) - 1))
cleanedStrings <- DNAStringSet(cleanedStrings)
cleanedStrings
@

\subsection{Exercise 6}
\begin{enumerate}
\item Rerun the simulation, this time using the \Rfunction{simulateReads} function with a \Rfunarg{substitutionRate} of 0.005 and \Rfunarg{gapRate} of 0.0005. How do the different pairwise sequence alignment methods compare?
\item (Advanced) Modify the \Rfunction{simulateReads} function to accept different equal-length adapters on either side (left \& right) of the reads. How would the methods for trimming the reads change?
\end{enumerate}

[Answers provided in section \ref{sec:Answers6}.]

\section{Application: Quality Assurance in Sequencing Experiments}

Due to its flexibility, the \Rfunction{pairwiseAlignment} function is able to diagnose sequence matching-related issues that arise when \Rfunction{matchPDict} and its related functions don't find a match. This section contains an example involving a short read Solexa sequencing experiment of bacteriophage $\phi$X174 DNA produced by New England BioLabs (NEB). This experiment contains slightly fewer than 5000 unique short reads in \Robject{srPhiX174}, with quality measures in \Robject{quPhiX174}, and frequencies for those short reads in \Robject{wtPhiX174}. In order to demonstrate how to find sequence differences in the target, these short reads will be compared against the bacteriophage $\phi$X174 genome NC\_001422 from the GenBank database.

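As a toy illustration of the idea, the sketch below uses a short hypothetical subject and a read that differs from it by a single substitution. Exact matching reports no hit, whereas a \Rfunarg{type = "global-local"} alignment should still place the read and report where it differs. The objects \Robject{toyGenome} and \Robject{toyShortRead} are made up for this sketch, and the default quality-based scoring is assumed.

<<>>=
## Toy sketch with hypothetical sequences (not the phiX174 data)
library(Biostrings)
library(pwalign)
toyGenome <- DNAString("TTGACCGTAGGCTAACGTTAGC")
## Positions 5 to 16 of toyGenome with one substitution (G to T at read position 6)
toyShortRead <- DNAString("CCGTATGCTAAC")
## Exact matching finds nothing
countPattern(toyShortRead, toyGenome)
## The alignment places the whole read on a substring of the subject
toyAlign <- pairwiseAlignment(toyShortRead, toyGenome, type = "global-local")
toyAlign
nmismatch(toyAlign)
mismatchTable(toyAlign)
@

The same pattern of calls, applied to the real reads and weighted by their frequencies, is what the remainder of this section relies on.
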
<<>>=
data(phiX174Phage)
genBankPhage <- phiX174Phage[[1]]
nchar(genBankPhage)
data(srPhiX174)
srPhiX174
quPhiX174
summary(wtPhiX174)
fullShortReads <- rep(srPhiX174, wtPhiX174)
srPDict <- PDict(fullShortReads)
table(countPDict(srPDict, genBankPhage))
@

For these short reads, the \Rfunction{pairwiseAlignment} function finds that the small number of perfect matches is due to two locations on the bacteriophage $\phi$X174 genome.

Unlike the \Rfunction{countPDict} function from the \Rpackage{Biostrings} package, the \Rfunction{pairwiseAlignment} function works off of the original strings, rather than \Rfunction{PDict} processed strings. To be computationally efficient, it is recommended that the unique sequences be supplied to the \Rfunction{pairwiseAlignment} function, and that the frequencies of those sequences be supplied to the \Rfunarg{weight} argument of functions like \Rfunction{summary}, \Rfunction{mismatchSummary}, and \Rfunction{coverage}. For the purposes of this exercise, a substring of the GenBank bacteriophage $\phi$X174 genome is supplied to the \Rfunarg{subject} argument of the \Rfunction{pairwiseAlignment} function to reduce the computation time.

<<>>=
genBankSubstring <- substring(genBankPhage, 2793-34, 2811+34)
genBankAlign <-
  pairwiseAlignment(srPhiX174, genBankSubstring,
                    patternQuality = SolexaQuality(quPhiX174),
                    subjectQuality = SolexaQuality(99L),
                    type = "global-local")
summary(genBankAlign, weight = wtPhiX174)
revisedPhage <- replaceLetterAt(genBankPhage, c(2793, 2811), "TT")
table(countPDict(srPDict, revisedPhage))
@

The following plot shows the coverage of the aligned short reads along the substring of the bacteriophage $\phi$X174 genome. Applying the \Rfunction{slice} function to the coverage shows the entire substring is covered by aligned short reads.

<<>>=
genBankCoverage <- coverage(genBankAlign, weight = wtPhiX174)
plot((2793-34):(2811+34), as.integer(genBankCoverage), xlab = "Position",
     ylab = "Coverage", type = "l")
nchar(genBankSubstring)
slice(genBankCoverage, lower = 1)
@

\subsection{Exercise 7}
\begin{enumerate}
\item Rerun the global-local alignment of the short reads against the entire genome. (This may take a few minutes.)
\item Plot the coverage of these alignments and use the \Rfunction{slice} function to find the ranges of alignment. Are there any alignments outside of the substring region that was used above?
\item Use the \Rfunction{reverseComplement} function on the bacteriophage $\phi$X174 genome. Do any short reads have a higher alignment score on this new sequence than on the original sequence?
\end{enumerate}

[Answers provided in section \ref{sec:Answers7}.]

\section{Computation Profiling}

The \Rfunction{pairwiseAlignment} function uses a dynamic programming algorithm based on the Needleman-Wunsch and Smith-Waterman algorithms for global and local pairwise sequence alignments respectively. The algorithm consumes memory and computation time proportional to the product of the lengths of the two strings being aligned.

<<>>=
N <- as.integer(seq(500, 5000, by = 500))
timings <- rep(0, length(N))
names(timings) <- as.character(N)
for (i in seq_len(length(N))) {
  string1 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE), collapse = ""))
  string2 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE), collapse = ""))
  timings[i] <- system.time(pairwiseAlignment(string1, string2, type = "global"))[["user.self"]]
}
timings
coef(summary(lm(timings ~ poly(N, 2))))
plot(N, timings, xlab = "String Size, Both Strings", ylab = "Timing (sec.)", type = "l",
     main = "Global Pairwise Sequence Alignment Timings")
@

When a problem only requires the pairwise sequence alignment score, setting the \Rfunarg{scoreOnly} argument to \Robject{TRUE} will more than halve the computation time.

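One plausible reason a score-only run can be cheaper is that the dynamic programming recurrence needs only the previous row of the table when no traceback is kept. The following base-R sketch illustrates this for edit-distance scoring. It is a hypothetical illustration: the function \Rfunction{editScoreOnly} is not part of any package, and the compiled code behind \Rfunction{pairwiseAlignment} may be organised differently. The nested loops also show where the product-of-lengths running time comes from.

<<>>=
## Hypothetical base-R sketch; not the package's implementation
editScoreOnly <- function(s1, s2) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  ## Row for the empty prefix of s1: reaching column j costs j gaps
  previous <- 0:length(b)
  for (i in seq_along(a)) {
    current <- c(i, numeric(length(b)))
    for (j in seq_along(b)) {
      current[j + 1] <- min(previous[j] + (a[i] != b[j]),  # match or mismatch
                            previous[j + 1] + 1,           # gap in s2
                            current[j] + 1)                # gap in s1
    }
    ## Only two rows are held in memory at any time
    previous <- current
  }
  previous[length(b) + 1]
}
editScoreOnly("kitten", "sitting")
@

For these two words the function returns 3, the same value as \Rfunction{adist} in base R.
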
<<>>=
scoreOnlyTimings <- rep(0, length(N))
names(scoreOnlyTimings) <- as.character(N)
for (i in seq_len(length(N))) {
  string1 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE), collapse = ""))
  string2 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE), collapse = ""))
  scoreOnlyTimings[i] <- system.time(pairwiseAlignment(string1, string2, type = "global", scoreOnly = TRUE))[["user.self"]]
}
scoreOnlyTimings
round((timings - scoreOnlyTimings) / timings, 2)
@

\subsection{Exercise 8}
\begin{enumerate}
\item Rerun the first set of profiling code, but this time fix the number of characters in \Robject{string1} to 35 and have the number of characters in \Robject{string2} range from 5000 to 50000 in increments of 5000. What is the computational order of this simulation exercise?
\item Rerun the second set of profiling code using the simulations from the previous exercise with the \Rfunarg{scoreOnly} argument set to \Robject{TRUE}. Is it still twice as fast?
\end{enumerate}

[Answers provided in section \ref{sec:Answers8}.]

\section{Computing alignment consensus matrices}

The \Rfunction{consensusMatrix} function is provided for computing a consensus matrix for a set of equal-length strings assumed to be aligned.
To illustrate, the following application assumes the ORF data to be aligned for the first 10 positions (patently false):

<<>>=
file <- system.file("extdata", "someORF.fa", package="Biostrings")
orf <- readDNAStringSet(file)
orf
orf10 <- DNAStringSet(orf, end=10)
consensusMatrix(orf10, as.prob=TRUE, baseOnly=TRUE)
@

The information content as defined by Hertz and Stormo (1995) is computed as follows:

<<>>=
informationContent <- function(Lmers) {
  zlog <- function(x) ifelse(x==0,0,log(x))
  co <- consensusMatrix(Lmers, as.prob=TRUE)
  lets <- rownames(co)
  fr <- alphabetFrequency(Lmers, collapse=TRUE)[lets]
  fr <- fr / sum(fr)
  sum(co*zlog(co/fr), na.rm=TRUE)
}
informationContent(orf10)
@

\section{Exercise Answers}

\subsection{Exercise 1}
\label{sec:Answers1}
\begin{enumerate}
\item Using \Rfunction{pairwiseAlignment}, fit the global, local, and overlap pairwise sequence alignments of the strings \Robject{"syzygy"} and \Robject{"zyzzyx"} using the default settings.
<<>>=
pairwiseAlignment("zyzzyx", "syzygy")
pairwiseAlignment("zyzzyx", "syzygy", type = "local")
pairwiseAlignment("zyzzyx", "syzygy", type = "overlap")
@
\item Do any of the alignments change if the \Rfunarg{gapExtension} argument is set to \Robject{-Inf}?
\textit{Yes, the overlap pairwise sequence alignment changes.}
<<>>=
pairwiseAlignment("zyzzyx", "syzygy", type = "overlap", gapExtension = Inf)
@
\end{enumerate}

\subsection{Exercise 2}
\label{sec:Answers2}
\begin{enumerate}
\item What is the primary benefit of formal summary classes like \Rclass{PairwiseAlignmentsSingleSubjectSummary} and \Rclass{summary.lm} to end users?
\textit{These classes allow the end user to extract the summary output for further operations.}
<<>>=
ex2 <- summary(pairwiseAlignment("zyzzyx", "syzygy"))
nmatch(ex2) / nmismatch(ex2)
@
\end{enumerate}

\subsection{Exercise 3}
\label{sec:Answers3}
For the overlap pairwise sequence alignment of the strings \Robject{"syzygy"} and \Robject{"zyzzyx"} with the \Rfunction{pairwiseAlignment} default settings, perform the following operations:
<<>>=
ex3 <- pairwiseAlignment("zyzzyx", "syzygy", type = "overlap")
@
\begin{enumerate}
\item Use \Rfunction{nmatch} and \Rfunction{nmismatch} to extract the number of matches and mismatches respectively.
<<>>=
nmatch(ex3)
nmismatch(ex3)
@
\item Use the \Rfunction{compareStrings} function to get the symbolic representation of the alignment.
<<>>=
compareStrings(ex3)
@
\item Use the \Rfunction{as.character} function to get the character string versions of the alignments.
<<>>=
as.character(ex3)
@
\item Use the \Rfunction{pattern} function to extract the aligned pattern and apply the \Rfunction{mismatch} function to it to find the locations of the mismatches.
<<>>=
mismatch(pattern(ex3))
@
\item Use the \Rfunction{subject} function to extract the aligned subject and apply the \Rfunction{aligned} function to it to get the aligned strings.
<<>>=
aligned(subject(ex3))
@
\end{enumerate}

\subsection{Exercise 4}
\label{sec:Answers4}
\begin{enumerate}
\item Use the \Rfunction{pairwiseAlignment} function to find the Levenshtein edit distance between \Robject{"syzygy"} and \Robject{"zyzzyx"}.
<<>>=
submat <- matrix(-1, nrow = 26, ncol = 26, dimnames = list(letters, letters))
diag(submat) <- 0
- pairwiseAlignment("zyzzyx", "syzygy", substitutionMatrix = submat,
                    gapOpening = 0, gapExtension = 1, scoreOnly = TRUE)
@
\item Use the \Rfunction{stringDist} function to find the Levenshtein edit distance for the vector \Robject{c("zyzzyx", "syzygy", "succeed", "precede", "supersede")}.
<<>>=
stringDist(c("zyzzyx", "syzygy", "succeed", "precede", "supersede"))
@
\end{enumerate}

\subsection{Exercise 5}
\label{sec:Answers5}
\begin{enumerate}
\item Repeat the alignment exercise above using \Robject{BLOSUM62}, a gap opening penalty of 12, and a gap extension penalty of 4.
<<>>=
data(BLOSUM62)
pairwiseAlignment(AAString("PAWHEAE"), AAString("HEAGAWGHEE"), substitutionMatrix = BLOSUM62,
                  gapOpening = 12, gapExtension = 4)
@
\item Explore to find out what caused the alignment to change.
\textit{The shift in gap penalties favored infrequent long gaps over frequent short ones.}
\end{enumerate}

\subsection{Exercise 6}
\label{sec:Answers6}
\begin{enumerate}
\item Rerun the simulation, this time using the \Rfunction{simulateReads} function with a \Rfunarg{substitutionRate} of 0.005 and \Rfunarg{gapRate} of 0.0005. How do the different pairwise sequence alignment methods compare?
\textit{The different methods are much more comparable when the error rates are lower.}
<<>>=
adapter <- DNAString("GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA")
set.seed(123)
N <- 1000
experiment <-
  list(side = rbinom(N, 1, 0.5), width = sample(0:36, N, replace = TRUE))
table(experiment[["side"]], experiment[["width"]])
ex6Strings <-
  simulateReads(N, adapter, experiment, substitutionRate = 0.005, gapRate = 0.0005)
ex6Strings <- DNAStringSet(ex6Strings)
ex6Strings

## Method 1: Use edit distance with an FDR of 1e-03
submat1 <- nucleotideSubstitutionMatrix(match = 0, mismatch = -1, baseOnly = TRUE)
quantile(randomScores1, seq(0.99, 1, by = 0.001))
ex6Aligns1 <-
  pairwiseAlignment(ex6Strings, adapter, substitutionMatrix = submat1,
                    gapOpening = 0, gapExtension = 1)
table(score(ex6Aligns1) > quantile(randomScores1, 0.999), experiment[["width"]])

## Method 2: Use consecutive matches anywhere in string with an FDR of 1e-03
submat2 <- nucleotideSubstitutionMatrix(match = 1, mismatch = -Inf, baseOnly = TRUE)
quantile(randomScores2, seq(0.99, 1, by = 0.001))
ex6Aligns2 <-
  pairwiseAlignment(ex6Strings, adapter,
                    substitutionMatrix = submat2,
                    type = "local", gapOpening = 0, gapExtension = Inf)
table(score(ex6Aligns2) > quantile(randomScores2, 0.999), experiment[["width"]])
# Determine if the correct end was chosen
table(start(pattern(ex6Aligns2)) > 37 - end(pattern(ex6Aligns2)),
      experiment[["side"]])

## Method 3: Use consecutive matches on the ends with an FDR of 1e-03
submat3 <- nucleotideSubstitutionMatrix(match = 1, mismatch = -Inf, baseOnly = TRUE)
ex6Aligns3 <-
  pairwiseAlignment(ex6Strings, adapter, substitutionMatrix = submat3,
                    type = "overlap", gapOpening = 0, gapExtension = Inf)
table(score(ex6Aligns3) > quantile(randomScores3, 0.999), experiment[["width"]])
# Determine if the correct end was chosen
table(end(pattern(ex6Aligns3)) == 36, experiment[["side"]])

## Method 4: Allow mismatches and indels on the ends with an FDR of 1e-03
quantile(randomScores4, seq(0.99, 1, by = 0.001))
ex6Aligns4 <- pairwiseAlignment(ex6Strings, adapter, type = "overlap")
table(score(ex6Aligns4) > quantile(randomScores4, 0.999), experiment[["width"]])
# Determine if the correct end was chosen
table(end(pattern(ex6Aligns4)) == 36, experiment[["side"]])
@
\item (Advanced) Modify the \Rfunction{simulateReads} function to accept different equal length adapters on either side (left \& right) of the reads. How would the methods for trimming the reads change?
<<>>=
simulateReads <-
function(N, left, right = left, experiment, substitutionRate = 0.01, gapRate = 0.001) {
  leftChars <- strsplit(as.character(left), "")[[1]]
  rightChars <- strsplit(as.character(right), "")[[1]]
  if (length(leftChars) != length(rightChars))
    stop("left and right adapters must have the same number of characters")
  nChars <- length(leftChars)
  sapply(seq_len(N), function(i) {
    width <- experiment[["width"]][i]
    side <- experiment[["side"]][i]
    randomLetters <-
      function(n) sample(DNA_ALPHABET[1:4], n, replace = TRUE)
    randomLettersWithEmpty <-
      function(n)
        sample(c("", DNA_ALPHABET[1:4]), n, replace = TRUE,
               prob = c(1 - gapRate, rep(gapRate/4, 4)))
    if (side) {
      value <-
        paste(ifelse(rbinom(nChars,1,substitutionRate), randomLetters(nChars), rightChars),
              randomLettersWithEmpty(nChars),
              sep = "", collapse = "")
      value <-
        paste(c(randomLetters(36 - width), substring(value, 1, width)),
              sep = "", collapse = "")
    } else {
      value <-
        paste(ifelse(rbinom(nChars,1,substitutionRate), randomLetters(nChars), leftChars),
              randomLettersWithEmpty(nChars),
              sep = "", collapse = "")
      value <-
        paste(c(substring(value, 37 - width, 36), randomLetters(36 - width)),
              sep = "", collapse = "")
    }
    value
  })
}

leftAdapter <- adapter
rightAdapter <- reverseComplement(adapter)
ex6LeftRightStrings <- simulateReads(N, leftAdapter, rightAdapter, experiment)
ex6LeftAligns4 <-
  pairwiseAlignment(ex6LeftRightStrings, leftAdapter, type = "overlap")
ex6RightAligns4 <-
  pairwiseAlignment(ex6LeftRightStrings, rightAdapter, type = "overlap")
scoreCutoff <- quantile(randomScores4, 0.999)
leftAligned <-
  start(pattern(ex6LeftAligns4)) == 1 &
  score(ex6LeftAligns4) > pmax(scoreCutoff, score(ex6RightAligns4))
rightAligned <-
  end(pattern(ex6RightAligns4)) == 36 &
  score(ex6RightAligns4) > pmax(scoreCutoff, score(ex6LeftAligns4))
table(leftAligned, rightAligned)
table(leftAligned | rightAligned, experiment[["width"]])
@
\end{enumerate}

\subsection{Exercise 7}
\label{sec:Answers7}
\begin{enumerate}
\item Rerun the
global-local alignment of the short reads against the entire genome. (This may take a few minutes.)
<<>>=
genBankFullAlign <-
  pairwiseAlignment(srPhiX174, genBankPhage,
                    patternQuality = SolexaQuality(quPhiX174),
                    subjectQuality = SolexaQuality(99L),
                    type = "global-local")
summary(genBankFullAlign, weight = wtPhiX174)
@
\item Plot the coverage of these alignments and use the \Rfunction{slice} function to find the ranges of alignment. Are there any alignments outside of the substring region that was used above?
\textit{Yes, there are some alignments outside of the specified substring region.}
<<>>=
genBankFullCoverage <- coverage(genBankFullAlign, weight = wtPhiX174)
plot(as.integer(genBankFullCoverage), xlab = "Position", ylab = "Coverage", type = "l")
slice(genBankFullCoverage, lower = 1)
@
\item Use the \Rfunction{reverseComplement} function on the bacteriophage $\phi$X174 genome. Do any short reads have a higher alignment score on this new sequence than on the original sequence?
\textit{Yes, there are some strings with a higher score on the new sequence.}
<<>>=
genBankFullAlignRevComp <-
  pairwiseAlignment(srPhiX174, reverseComplement(genBankPhage),
                    patternQuality = SolexaQuality(quPhiX174),
                    subjectQuality = SolexaQuality(99L),
                    type = "global-local")
table(score(genBankFullAlignRevComp) > score(genBankFullAlign))
@
\end{enumerate}

\subsection{Exercise 8}
\label{sec:Answers8}
\begin{enumerate}
\item Rerun the first set of profiling code, but this time fix the number of characters in \Robject{string1} to 35 and have the number of characters in \Robject{string2} range from 5000 to 50000 in increments of 5000. What is the computational order of this simulation exercise?
\textit{As expected, the growth in time is now linear.}
<<>>=
N <- as.integer(seq(5000, 50000, by = 5000))
newTimings <- rep(0, length(N))
names(newTimings) <- as.character(N)
for (i in seq_len(length(N))) {
  string1 <- DNAString(paste(sample(DNA_ALPHABET[1:4], 35, replace = TRUE), collapse = ""))
  string2 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE), collapse = ""))
  newTimings[i] <- system.time(pairwiseAlignment(string1, string2, type = "global"))[["user.self"]]
}
newTimings
coef(summary(lm(newTimings ~ poly(N, 2))))
plot(N, newTimings, xlab = "Larger String Size", ylab = "Timing (sec.)", type = "l",
     main = "Global Pairwise Sequence Alignment Timings")
@
\item Rerun the second set of profiling code using the simulations from the previous exercise with the \Rfunarg{scoreOnly} argument set to \Robject{TRUE}. Is it still twice as fast?
\textit{Yes, it is still over twice as fast.}
<<>>=
newScoreOnlyTimings <- rep(0, length(N))
names(newScoreOnlyTimings) <- as.character(N)
for (i in seq_len(length(N))) {
  string1 <- DNAString(paste(sample(DNA_ALPHABET[1:4], 35, replace = TRUE), collapse = ""))
  string2 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE), collapse = ""))
  newScoreOnlyTimings[i] <- system.time(pairwiseAlignment(string1, string2, type = "global", scoreOnly = TRUE))[["user.self"]]
}
newScoreOnlyTimings
round((newTimings - newScoreOnlyTimings) / newTimings, 2)
@
\end{enumerate}

\section{Session Information}

All of the output in this vignette was produced under the following conditions:

<<>>=
sessionInfo()
@

\begin{thebibliography}{}

\bibitem{Durbin:1998}
{Durbin, R.}, {Eddy, S.}, {Krogh, A.}, and {Mitchison, G.}
\newblock {\em Biological Sequence Analysis}.
\newblock Cambridge UP 1998, sec 2.3.

\bibitem{Haubold:2006}
{Haubold, B.} and {Wiehe, T.}
\newblock {\em Introduction to Computational Biology}.
\newblock Birkh\"auser Verlag 2006, Chapter 2.

\bibitem{Malde:2008}
{Malde, K.}
\newblock The effect of sequence quality on sequence alignment.
\newblock {\em Bioinformatics}, 24(7):897-900, 2008.

\bibitem{NeedWun:1970}
{Needleman, S.} and {Wunsch, C.}
\newblock A general method applicable to the search for similarities in the amino acid sequence of two proteins.
\newblock {\em Journal of Molecular Biology}, 48, 443-453, 1970.

\bibitem{Smith:2003}
{Smith, H.}; {Hutchison, C.}; {Pfannkoch, C.}; and {Venter, C.}
\newblock Generating a synthetic genome by whole genome assembly: $\phi$X174 bacteriophage from synthetic oligonucleotides.
\newblock {\em Proceedings of the National Academy of Sciences}, 100(26): 15440-15445, 2003.

\bibitem{SmithWater:1981}
{Smith, T.F.} and {Waterman, M.S.}
\newblock Identification of common molecular subsequences.
\newblock {\em Journal of Molecular Biology}, 147, 195-197, 1981.

\end{thebibliography}

\end{document}
pwalign/inst/unitTests/0000755000175100017510000000000014614311433016212 5ustar00biocbuildbiocbuildpwalign/inst/unitTests/test_pairwiseAlignment.R0000644000175100017510000002413314614311433023061 0ustar00biocbuildbiocbuild### FIXME!! BROKEN_test_pairwiseAlignment_emptyString <- function() { string1 <- DNAStringSet("") string2 <- DNAStringSet("ACGT") ## Empty pattern. alignment <- pairwiseAlignment(string1, string2) checkEquals(as.character(aligned(pattern(alignment))), "") checkEquals(as.character(aligned(subject(alignment))), "") checkEquals(score(alignment), -26) checkEquals(pairwiseAlignment(string1, string2, scoreOnly = TRUE), -26) ## Empty subject. alignment <- pairwiseAlignment(string2, string1) checkEquals(as.character(aligned(pattern(alignment))), "") checkEquals(as.character(aligned(subject(alignment))), "") checkEquals(score(alignment), -26) checkEquals(pairwiseAlignment(string2, string1, scoreOnly = TRUE), -26) ## Empty pattern and subject.
alignment <- pairwiseAlignment(string1, string1) checkEquals(as.character(aligned(pattern(alignment))), "") checkEquals(score(alignment), 0) checkEquals(pairwiseAlignment(string1, string1, scoreOnly = TRUE), 0) } test_pairwiseAlignment_emptyLocalAlign <- function() { string1 <- DNAString("A") string2 <- DNAString("T") alignment <- pairwiseAlignment(string1, string2, type = "local") checkEquals(as.character(aligned(pattern(alignment))), "") checkEquals(as.character(aligned(subject(alignment))), "") checkEquals(score(alignment), 0) checkEquals(pairwiseAlignment(string1, string2, type = "local", scoreOnly = TRUE), 0) } test_pairwiseAlignment_gappedLocalAlign <- function() { string1 <- DNAString("TCAGTTGCCAAACCCGCT") string2 <- DNAString("AGGGTTGACATCCGTTTT") sigma <- nucleotideSubstitutionMatrix(match = 10, mismatch = -10, baseOnly = TRUE) alignment <- pairwiseAlignment(string1, string2, substitutionMatrix = sigma, gapOpening = 12, gapExtension = 3, type="local") checkEquals(as.character(aligned(pattern(alignment))), "GTTGCCAAACCCG") checkEquals(as.character(aligned(subject(alignment))), "GTTGACAT--CCG") checkEquals(score(alignment), 52) } test_pairwiseAlignment_backToBackIndel <- function() { mat <- nucleotideSubstitutionMatrix(match = 1, mismatch = -10, baseOnly = TRUE) string1 <- DNAString("AC") string2 <- DNAString("AT") alignment <- pairwiseAlignment(string1, string2, gapOpening = 0, substitutionMatrix = mat) alignmentScore <- pairwiseAlignment(string1, string2, gapOpening = 0, substitutionMatrix = mat, scoreOnly = TRUE) checkEquals(as.character(aligned(pattern(alignment))), "A-") checkEquals(as.character(aligned(subject(alignment))), "AT") checkEquals(score(alignment), -7) checkEquals(alignmentScore, -7) } test_pairwiseAlignment_editDistance <- function() { string1 <- DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG") string2 <- DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC") mat <- nucleotideSubstitutionMatrix(match = 
0, mismatch = -1, baseOnly = TRUE) globalAlign <- pairwiseAlignment(string1, string2, substitutionMatrix = mat, gapOpening = 0, gapExtension = 1) globalAlignScore <- pairwiseAlignment(string1, string2, substitutionMatrix = mat, gapOpening = 0, gapExtension = 1, scoreOnly = TRUE) checkEquals(as.character(pattern(globalAlign)), "ACTTCACCAGCTCCCTGGCGG-TAAGTTGATC-A-AAGGA-A-ACGCA-A-AGTTTTCAAG") checkEquals(as.character(subject(globalAlign)), "GTTTCACTA-CTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC") checkEquals(compareStrings(globalAlign), "??TTCAC?A+CT?CCT??CGG-TAAGT??AT?-A-AA??A-A-A???A-A-A?TTTTCA??") checkEquals(score(globalAlign), -25) checkEquals(globalAlignScore, -25) } test_pairwiseAlignment_zeroOpening <- function() { string1 <- DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG") string2 <- DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC") mat <- nucleotideSubstitutionMatrix(match = 1, mismatch = -3, baseOnly = TRUE) globalAlign <- pairwiseAlignment(string1, string2, substitutionMatrix = mat, gapOpening = 0, gapExtension = 5) globalAlignScore <- pairwiseAlignment(string1, string2, substitutionMatrix = mat, gapOpening = 0, gapExtension = 5, scoreOnly = TRUE) overlapAlign <- pairwiseAlignment(string1, string2, type = "overlap", substitutionMatrix = mat, gapOpening = 0, gapExtension = 5) overlapAlignScore <- pairwiseAlignment(string1, string2, type = "overlap", substitutionMatrix = mat, gapOpening = 0, gapExtension = 5, scoreOnly = TRUE) localAlign <- pairwiseAlignment(string1, string2, type = "local", substitutionMatrix = mat, gapOpening = 0, gapExtension = 5) localAlignScore <- pairwiseAlignment(string1, string2, type = "local", substitutionMatrix = mat, gapOpening = 0, gapExtension = 5, scoreOnly = TRUE) checkEquals(as.character(pattern(globalAlign)), "ACTTCACCAGCTCCCTGGCGG-TAAGTTGATC-A-AAGGA-A-ACGCA-A-AGTTTTCAAG") checkEquals(as.character(subject(globalAlign)), 
"GTTTCACTA-CTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC") checkEquals(compareStrings(globalAlign), "??TTCAC?A+CT?CCT??CGG-TAAGT??AT?-A-AA??A-A-A???A-A-A?TTTTCA??") checkEquals(score(globalAlign), -55) checkEquals(globalAlignScore, -55) checkEquals(as.character(pattern(overlapAlign)), "G") checkEquals(as.character(subject(overlapAlign)), "G") checkEquals(score(overlapAlign), 1) checkEquals(overlapAlignScore, 1) checkEquals(as.character(pattern(localAlign)), "GGTAAGT") checkEquals(as.character(subject(localAlign)), "GGTAAGT") checkEquals(score(localAlign), 7) checkEquals(localAlignScore, 7) } test_pairwiseAlignment_fixedSubstitutionMatrix <- function() { string1 <- DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG") string2 <- DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC") mat <- nucleotideSubstitutionMatrix(match = 1, mismatch = -3, baseOnly = TRUE) globalAlign <- pairwiseAlignment(string1, string2, substitutionMatrix = mat, gapOpening = 5, gapExtension = 2) globalAlignScore <- pairwiseAlignment(string1, string2, substitutionMatrix = mat, gapOpening = 5, gapExtension = 2, scoreOnly = TRUE) overlapAlign <- pairwiseAlignment(string1, string2, type = "overlap", substitutionMatrix = mat, gapOpening = 5, gapExtension = 2) overlapAlignScore <- pairwiseAlignment(string1, string2, type = "overlap", substitutionMatrix = mat, gapOpening = 5, gapExtension = 2, scoreOnly = TRUE) localAlign <- pairwiseAlignment(string1, string2, type = "local", substitutionMatrix = mat, gapOpening = 5, gapExtension = 2) localAlignScore <- pairwiseAlignment(string1, string2, type = "local", substitutionMatrix = mat, gapOpening = 5, gapExtension = 2, scoreOnly = TRUE) checkEquals(as.character(pattern(globalAlign)), "ACTTCACCAGCTCCCTGGCGGTAAGTTGATC---AAAGG---AAACGCAAAGTTTTCAAG") checkEquals(as.character(subject(globalAlign)), "GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC") checkEquals(compareStrings(globalAlign), 
"??TTCAC?A??TCC?T???GGTAAGT??AT?---AAA??---AAA???A?A?TTTTCA??") checkEquals(score(globalAlign), -52) checkEquals(globalAlignScore, -52) checkEquals(as.character(pattern(overlapAlign)), "G") checkEquals(as.character(subject(overlapAlign)), "G") checkEquals(score(overlapAlign), 1) checkEquals(overlapAlignScore, 1) checkEquals(as.character(pattern(localAlign)), "GGTAAGT") checkEquals(as.character(subject(localAlign)), "GGTAAGT") checkEquals(score(localAlign), 7) checkEquals(localAlignScore, 7) } test_pairwiseAlignment_qualityScoring <- function() { string1 <- DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG") string2 <- DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC") classes <- c("PhredQuality", "SolexaQuality", "IlluminaQuality") for (qualityClass in classes) { scoring <- qualitySubstitutionMatrices(qualityClass = qualityClass)["22", "22", c("1", "0")] stringQuality <- do.call(qualityClass, list(22L)) globalAlign <- pairwiseAlignment(string1, string2, patternQuality = stringQuality, subjectQuality = stringQuality) globalAlignScore <- pairwiseAlignment(string1, string2, scoreOnly = TRUE, patternQuality = stringQuality, subjectQuality = stringQuality) overlapAlign <- pairwiseAlignment(string1, string2, type = "overlap", patternQuality = stringQuality, subjectQuality = stringQuality) overlapAlignScore <- pairwiseAlignment(string1, string2, type = "overlap", scoreOnly = TRUE, patternQuality = stringQuality, subjectQuality = stringQuality) localAlign <- pairwiseAlignment(string1, string2, type = "local", patternQuality = stringQuality, subjectQuality = stringQuality) localAlignScore <- pairwiseAlignment(string1, string2, type = "local", scoreOnly = TRUE, patternQuality = stringQuality, subjectQuality = stringQuality) checkEquals(as.character(pattern(globalAlign)), "ACTTCACCAGCTCCCTGGCGGTAAGTTGATC---AAAGG---AAACGCAAAGTTTTCAAG") checkEquals(as.character(subject(globalAlign)), 
"GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC") checkEquals(compareStrings(globalAlign), "??TTCAC?A??TCC?T???GGTAAGT??AT?---AAA??---AAA???A?A?TTTTCA??") checkEquals(score(globalAlign), sum(c(33, 21) * scoring) - 44, tolerance = 1e-6) checkEquals(globalAlignScore, sum(c(33, 21) * scoring) - 44, tolerance = 1e-6) checkEquals(as.character(pattern(overlapAlign)), "G") checkEquals(as.character(subject(overlapAlign)), "G") checkEquals(score(overlapAlign), scoring[[1]], tolerance = 1e-6) checkEquals(overlapAlignScore, scoring[[1]], tolerance = 1e-6) checkEquals(as.character(pattern(localAlign)), "GGTAAGT") checkEquals(as.character(subject(localAlign)), "GGTAAGT") checkEquals(score(localAlign), 7 * scoring[[1]], tolerance = 1e-6) checkEquals(localAlignScore, 7 * scoring[[1]], tolerance = 1e-6) } TRUE } pwalign/man/0000755000175100017510000000000014614311433014006 5ustar00biocbuildbiocbuildpwalign/man/align-utils.Rd0000644000175100017510000001333514614311433016532 0ustar00biocbuildbiocbuild\name{align-utils} \alias{align-utils} \alias{mismatch,AlignedXStringSet0,missing-method} \alias{nmatch,PairwiseAlignments,missing-method} \alias{nmatch,PairwiseAlignmentsSingleSubjectSummary,missing-method} \alias{nmismatch,AlignedXStringSet0,missing-method} \alias{nmismatch,PairwiseAlignments,missing-method} \alias{nmismatch,PairwiseAlignmentsSingleSubjectSummary,missing-method} \alias{nedit} \alias{nedit,PairwiseAlignments-method} \alias{nedit,PairwiseAlignmentsSingleSubjectSummary-method} \alias{mismatchTable} \alias{mismatchTable,AlignedXStringSet0-method} \alias{mismatchTable,QualityAlignedXStringSet-method} \alias{mismatchTable,PairwiseAlignments-method} \alias{mismatchSummary} \alias{mismatchSummary,AlignedXStringSet0-method} \alias{mismatchSummary,QualityAlignedXStringSet-method} \alias{mismatchSummary,PairwiseAlignmentsSingleSubject-method} \alias{mismatchSummary,PairwiseAlignmentsSingleSubjectSummary-method} \alias{coverage,AlignedXStringSet0-method} 
\alias{coverage,PairwiseAlignmentsSingleSubject-method} \alias{coverage,PairwiseAlignmentsSingleSubjectSummary-method} \alias{compareStrings} \alias{compareStrings,character,character-method} \alias{compareStrings,XString,XString-method} \alias{compareStrings,XStringSet,XStringSet-method} \alias{compareStrings,AlignedXStringSet0,AlignedXStringSet0-method} \alias{compareStrings,PairwiseAlignments,missing-method} \alias{consensusMatrix,PairwiseAlignmentsSingleSubject-method} \title{Utility functions related to sequence alignment} \description{ A variety of different functions used to deal with sequence alignments. } \usage{ nedit(x) # also nmatch and nmismatch mismatchTable(x, shiftLeft=0L, shiftRight=0L, \dots) mismatchSummary(x, \dots) \S4method{coverage}{AlignedXStringSet0}(x, shift=0L, width=NULL, weight=1L) \S4method{coverage}{PairwiseAlignmentsSingleSubject}(x, shift=0L, width=NULL, weight=1L) compareStrings(pattern, subject) \S4method{consensusMatrix}{PairwiseAlignmentsSingleSubject}(x, as.prob=FALSE, shift=0L, width=NULL, baseOnly=FALSE, gapCode="-", endgapCode="-") } \arguments{ \item{x}{ A \code{character} vector or matrix, \code{XStringSet}, \code{XStringViews}, \code{PairwiseAlignments}, or \code{list} of FASTA records containing the equal-length strings. } \item{shiftLeft, shiftRight}{ Non-positive and non-negative integers respectively that specify how many preceding and succeeding characters to and from the mismatch position to include in the mismatch substrings. } \item{\dots}{ Further arguments to be passed to or from other methods. } \item{shift, width}{ See \code{?\link[IRanges]{coverage}}. } \item{weight}{ An integer vector specifying how much each element in \code{x} counts. } \item{pattern, subject}{ The strings to compare. Can be of type \code{character}, \code{XString}, \code{XStringSet}, \code{AlignedXStringSet}, or, in the case of \code{pattern}, \code{PairwiseAlignments}. 
    If the first argument of \code{compareStrings()} (\code{pattern}) is a
    \code{PairwiseAlignments} object, then the second argument
    (\code{subject}) must be missing. In this case \code{compareStrings(x)}
    is equivalent to \code{compareStrings(pattern(x), subject(x))}.
  }
  \item{as.prob}{
    If \code{TRUE} then probabilities are reported, otherwise counts
    (the default).
  }
  \item{baseOnly}{
    \code{TRUE} or \code{FALSE}. If \code{TRUE}, the returned vector only
    contains frequencies for the letters in the "base" alphabet i.e. "A",
    "C", "G", "T" if \code{x} is a "DNA input", and "A", "C", "G", "U" if
    \code{x} is "RNA input". When \code{x} is a \link{BString} object (or an
    \link{XStringViews} object with a \link{BString} subject, or a
    \link{BStringSet} object), then the \code{baseOnly} argument is ignored.
  }
  \item{gapCode, endgapCode}{
    The codes in the appropriate \code{\link{alphabet}} to use for the
    internal and end gaps.
  }
}

\value{
  \code{nedit()}: An integer vector of the same length as the input
  \code{PairwiseAlignments} object reporting the number of edits (i.e.
  number of mismatches + number of indels) for each alignment.

  \code{mismatchTable()}: A data.frame containing the positions and
  substrings of the mismatches for the \code{AlignedXStringSet} or
  \code{PairwiseAlignments} object.

  \code{mismatchSummary()}: A list of data.frame objects containing counts
  and frequencies of the mismatches for the \code{AlignedXStringSet} or
  \code{PairwiseAlignmentsSingleSubject} object.

  \code{compareStrings()}: Combines two equal-length strings that are
  assumed to be aligned into a single character string that replaces
  mismatches with \code{"?"}, insertions with \code{"+"}, and deletions
  with \code{"-"}.
}

\author{P. 
Aboyoun} \seealso{ \code{\link{pairwiseAlignment}}, \code{\link{consensusMatrix}}, \link{XString-class}, \link{XStringSet-class}, \link{XStringViews-class}, \link{AlignedXStringSet-class}, \link{PairwiseAlignments-class}, \link{match-utils} } \examples{ ## Compare two globally aligned strings string1 <- "ACTTCACCAGCTCCCTGGCGGTAAGTTGATC---AAAGG---AAACGCAAAGTTTTCAAG" string2 <- "GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC" compareStrings(string1, string2) ## Create a consensus matrix nw1 <- pairwiseAlignment( AAStringSet(c("HLDNLKGTF", "HVDDMPNAL")), AAString("SMDDTEKMSMKL"), substitutionMatrix = "BLOSUM50", gapOpening = 3, gapExtension = 1) consensusMatrix(nw1) ## Examine the consensus between the bacteriophage phi X174 genomes data(phiX174Phage) phageConsmat <- consensusMatrix(phiX174Phage, baseOnly = TRUE) phageDiffs <- which(apply(phageConsmat, 2, max) < length(phiX174Phage)) phageDiffs phageConsmat[,phageDiffs] } \keyword{methods} pwalign/man/AlignedXStringSet-class.Rd0000644000175100017510000001230414614311433020736 0ustar00biocbuildbiocbuild\name{AlignedXStringSet-class} \docType{class} % Classes \alias{class:AlignedXStringSet0} \alias{AlignedXStringSet0-class} \alias{AlignedXStringSet0} \alias{class:AlignedXStringSet} \alias{AlignedXStringSet-class} \alias{AlignedXStringSet} \alias{class:QualityAlignedXStringSet} \alias{QualityAlignedXStringSet-class} \alias{QualityAlignedXStringSet} % Accessor methods: \alias{unaligned} \alias{unaligned,AlignedXStringSet0-method} \alias{aligned} \alias{aligned,AlignedXStringSet0-method} \alias{start,AlignedXStringSet0-method} \alias{end,AlignedXStringSet0-method} \alias{width,AlignedXStringSet0-method} \alias{indel} \alias{indel,AlignedXStringSet0-method} \alias{nindel} \alias{nindel,AlignedXStringSet0-method} \alias{nchar,AlignedXStringSet0-method} \alias{seqtype,AlignedXStringSet0-method} \alias{ranges,AlignedXStringSet0-method} % Standard generic methods: \alias{show,AlignedXStringSet0-method} 
\alias{as.character,AlignedXStringSet0-method}
\alias{toString,AlignedXStringSet0-method}
% Internal:
\alias{parallel_slot_names,AlignedXStringSet0-method}
\alias{parallelVectorNames,AlignedXStringSet0-method}

\title{AlignedXStringSet and QualityAlignedXStringSet objects}

\description{
  The \code{AlignedXStringSet} and \code{QualityAlignedXStringSet} classes are
  containers for storing an aligned \code{XStringSet}.
}

\details{
  Before we define the notion of alignment, we introduce the notion of
  "filled-with-gaps subsequence". A "filled-with-gaps subsequence" of a
  string string1 is obtained by inserting 0 or any number of gaps in a
  subsequence of string1. For example L-A--ND and A--N-D are
  "filled-with-gaps subsequences" of LAND. An alignment between two strings
  string1 and string2 results in two strings (align1 and align2) that have
  the same length and are "filled-with-gaps subsequences" of string1 and
  string2.

  For example, this is an alignment between LAND and LEAVES:
  \preformatted{
    L-A
    LEA
  }

  An alignment can be seen as a compact representation of one set of basic
  operations that transforms string1 into align1. There are 3 different
  kinds of basic operations: "insertions" (gaps in align1), "deletions"
  (gaps in align2), "replacements".
  The above alignment represents the following basic operations:
  \preformatted{
    insert E at pos 2
    insert V at pos 4
    insert E at pos 5
    replace by S at pos 6 (N is replaced by S)
    delete at pos 7 (D is deleted)
  }
  Note that "insert X at pos i" means that all letters at a position >= i
  are moved 1 place to the right before X is actually inserted.

  There are many possible alignments between two given strings string1 and
  string2 and a common problem is to find the one (or those ones) with the
  highest score, i.e. with the lowest total cost in terms of basic
  operations.
}

\section{Accessor methods}{
  In the code snippets below, \code{x} is an \code{AlignedXStringSet} or
  \code{QualityAlignedXStringSet} object.
\describe{ \item{\code{unaligned(x)}:}{ The original string. } \item{\code{aligned(x, degap = FALSE)}:}{ If \code{degap = FALSE}, the "filled-with-gaps subsequence" representing the aligned substring. If \code{degap = TRUE}, the "gap-less subsequence" representing the aligned substring. } \item{\code{ranges(x)}:}{ The bounds of the aligned substring. } \item{\code{start(x)}:}{ The start of the aligned substring. } \item{\code{end(x)}:}{ The end of the aligned substring. } \item{\code{width(x)}:}{ The width of the aligned substring, ignoring gaps. } \item{\code{indel(x)}:}{ The positions, in the form of an \code{IRanges} object, of the insertions or deletions (depending on what \code{x} represents). } \item{\code{nindel(x)}:}{ A two-column matrix containing the length and sum of the widths for each of the elements returned by \code{indel}. } \item{\code{length(x)}:}{ The length of the \code{aligned(x)}. } \item{\code{nchar(x)}:}{ The nchar of the \code{aligned(x)}. } \item{\code{alphabet(x)}:}{ Equivalent to \code{alphabet(unaligned(x))}. } \item{\code{as.character(x)}:}{ Converts \code{aligned(x)} to a character vector. } \item{\code{toString(x)}:}{ Equivalent to \code{toString(as.character(x))}. } } } \section{Subsetting methods}{ \describe{ \item{\code{x[i]}:}{ Returns a new \code{AlignedXStringSet} or \code{QualityAlignedXStringSet} object made of the selected elements. } \item{\code{rep(x, times)}:}{ Returns a new \code{AlignedXStringSet} or \code{QualityAlignedXStringSet} object made of the repeated elements. } } } \author{P. 
Aboyoun} \seealso{ \code{\link{pairwiseAlignment}}, \code{\link{PairwiseAlignments-class}}, \code{\link{XStringSet-class}} } \examples{ pattern <- AAString("LAND") subject <- AAString("LEAVES") pa1 <- pairwiseAlignment(pattern, subject, substitutionMatrix="BLOSUM50", gapOpening=3, gapExtension=1) alignedPattern <- pattern(pa1) class(alignedPattern) # AlignedXStringSet object unaligned(alignedPattern) aligned(alignedPattern) as.character(alignedPattern) nchar(alignedPattern) } \keyword{methods} \keyword{classes} pwalign/man/InDel-class.Rd0000644000175100017510000000174114614311433016376 0ustar00biocbuildbiocbuild\name{InDel-class} \docType{class} % Classes \alias{class:InDel} \alias{InDel-class} \alias{InDel} % Accessor methods: \alias{insertion} \alias{insertion,InDel-method} \alias{deletion} \alias{deletion,InDel-method} \title{InDel objects} \description{ The \code{InDel} class is a container for storing insertion and deletion information. } \details{ This is a generic class that stores any insertion and deletion information. } \section{Accessor methods}{ In the code snippets below, \code{x} is a \code{InDel} object. \describe{ \item{\code{insertion(x)}:}{ The insertion information. } \item{\code{deletion(x)}:}{ The deletion information. } } } \author{P. 
Aboyoun} \seealso{ \code{\link{pairwiseAlignment}}, \code{\link{PairwiseAlignments-class}} } \examples{ pa <- PairwiseAlignments("-PA--W-HEAE", "HEAGAWGHE-E") pa_indel <- indel(pa) # an InDel object insertion(pa_indel) deletion(pa_indel) } \keyword{methods} \keyword{classes} pwalign/man/pairwiseAlignment.Rd0000644000175100017510000002213014614311433017755 0ustar00biocbuildbiocbuild\name{pairwiseAlignment} \alias{pairwiseAlignment} \alias{pairwiseAlignment,ANY,ANY-method} \alias{pairwiseAlignment,ANY,QualityScaledXStringSet-method} \alias{pairwiseAlignment,QualityScaledXStringSet,ANY-method} \alias{pairwiseAlignment,QualityScaledXStringSet,QualityScaledXStringSet-method} \title{Optimal Pairwise Alignment} \description{ Solves (Needleman-Wunsch) global alignment, (Smith-Waterman) local alignment, and (ends-free) overlap alignment problems. } \usage{ pairwiseAlignment(pattern, subject, \dots) \S4method{pairwiseAlignment}{ANY,ANY}(pattern, subject, patternQuality=PhredQuality(22L), subjectQuality=PhredQuality(22L), type="global", substitutionMatrix=NULL, fuzzyMatrix=NULL, gapOpening=10, gapExtension=4, scoreOnly=FALSE) \S4method{pairwiseAlignment}{QualityScaledXStringSet,QualityScaledXStringSet}(pattern, subject, type="global", substitutionMatrix=NULL, fuzzyMatrix=NULL, gapOpening=10, gapExtension=4, scoreOnly=FALSE) } \arguments{ \item{pattern}{a character vector or \code{\link{XStringSet}} derivative of any length, or an \code{\link{XString}} derivative.} \item{subject}{a character vector or \code{\link{XStringSet}} derivative of length 1 or \code{length(pattern)}, or an \code{\link{XString}} derivative.} \item{patternQuality, subjectQuality}{objects of class \code{\link{XStringQuality}} representing the respective quality scores for \code{pattern} and \code{subject} that are used in a quality-based method for generating a substitution matrix. 
    These two arguments are ignored if \code{!is.null(substitutionMatrix)}
    or if its respective string set (\code{pattern}, \code{subject}) is of
    class \code{\link{QualityScaledXStringSet}}.}
  \item{type}{type of alignment. One of \code{"global"}, \code{"local"},
    \code{"overlap"}, \code{"global-local"}, and \code{"local-global"} where
      \code{"global"} = align whole strings with end gap penalties,
      \code{"local"} = align string fragments,
      \code{"overlap"} = align whole strings without end gap penalties,
      \code{"global-local"} = align whole strings in \code{pattern} with
        consecutive subsequence of \code{subject},
      \code{"local-global"} = align consecutive subsequence of
        \code{pattern} with whole strings in \code{subject}.}
  \item{substitutionMatrix}{substitution matrix representing the fixed
    substitution scores for an alignment. It cannot be used in conjunction
    with \code{patternQuality} and \code{subjectQuality} arguments.}
  \item{fuzzyMatrix}{fuzzy match matrix for quality-based alignments. It
    takes values between 0 and 1; where 0 is an unambiguous mismatch, 1 is
    an unambiguous match, and values in between represent a fraction of
    "matchiness". (See details section below.)}
  \item{gapOpening}{the cost for opening a gap in the alignment.}
  \item{gapExtension}{the incremental cost incurred along the length of the
    gap in the alignment.}
  \item{scoreOnly}{logical to denote whether or not to return just the
    scores of the optimal pairwise alignment.}
  \item{\dots}{optional arguments to generic function to support additional
    methods.}
}

\details{
  Quality-based alignments are based on the Bioinformatics article by
  Ketil Malde listed in the References section below. Let
  \eqn{\epsilon_i} be the probability of an error in the base read. For
  \code{"Phred"} quality measures \eqn{Q} in \eqn{[0, 99]}, these error
  probabilities are given by \eqn{\epsilon_i = 10^{-Q/10}}. 
  For \code{"Solexa"} quality measures \eqn{Q} in \eqn{[-5, 99]}, they are
  given by \eqn{\epsilon_i = 1 - 1/(1 + 10^{-Q/10})}.
  Assuming independence within and between base reads, the combined error
  probability of a mismatch when the underlying bases do match is
  \eqn{\epsilon_c = \epsilon_1 + \epsilon_2 - (n/(n-1)) * \epsilon_1 * \epsilon_2},
  where \eqn{n} is the number of letters in the underlying alphabet (i.e.
  \eqn{n = 4} for DNA input, \eqn{n = 20} for amino acid input, otherwise
  \eqn{n} is the number of distinct letters in the input).
  Using \eqn{\epsilon_c}, the substitution score is given by
  \eqn{b * \log_2(\gamma_{x,y} * (1 - \epsilon_c) * n + (1 - \gamma_{x,y}) * \epsilon_c * (n/(n-1)))},
  where \eqn{b} is the bit-scaling for the scoring and \eqn{\gamma_{x,y}} is
  the probability that characters \eqn{x} and \eqn{y} represent the same
  underlying information (e.g. using IUPAC, \eqn{\gamma_{A,A} = 1} and
  \eqn{\gamma_{A,N} = 1/4}).
  In the arguments listed above, \code{fuzzyMatrix} represents
  \eqn{\gamma_{x,y}}, and \code{patternQuality} and \code{subjectQuality}
  represent \eqn{\epsilon_1} and \eqn{\epsilon_2} respectively.

  If \code{scoreOnly == FALSE}, a pairwise alignment with the maximum
  alignment score is returned. If more than one pairwise alignment produces
  the maximum alignment score, then the alignment with the smallest initial
  deletion whose mismatches occur before its insertions and deletions is
  chosen. For example, if \code{pattern = "AGTA"} and
  \code{subject = "AACTAACTA"}, then the alignment
  \code{pattern: [1] AG-TA; subject: [1] AACTA} is chosen over
  \code{pattern: [1] A-GTA; subject: [1] AACTA} or
  \code{pattern: [1] AG-TA; subject: [5] AACTA} if they all achieve the
  maximum alignment score.
}

\value{
  If \code{scoreOnly == FALSE} (the default), the function returns a
  \code{\link{PairwiseAlignmentsSingleSubject}} object (if a single subject
  was supplied) or a \code{\link{PairwiseAlignments}} object (if more than
  one subject was supplied). 
  In both cases, the returned object contains N
  \emph{optimal pairwise alignments} where N is the number of supplied
  patterns, that is, N = \code{length(pattern)} if \code{pattern} is a
  character vector or \code{\link{XStringSet}} derivative, or N = 1 if it's
  an \code{\link{XString}} derivative.
  If more than one subject was supplied, the alignments in the returned
  \code{\link{PairwiseAlignments}} object are obtained by aligning
  \code{pattern[[1]]} to \code{subject[[1]]},
  \code{pattern[[2]]} to \code{subject[[2]]},
  \code{pattern[[3]]} to \code{subject[[3]]}, etc.

  If \code{scoreOnly == TRUE}, a numeric vector containing the scores for
  the N \emph{optimal pairwise alignments} is returned.
}

\references{
  R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis,
  Cambridge UP 1998, sec 2.3.

  B. Haubold, T. Wiehe, Introduction to Computational Biology,
  Birkhauser Verlag 2006, Chapter 2.

  K. Malde, The effect of sequence quality on sequence alignment,
  Bioinformatics 2008 24(7):897-900.
}

\note{
  Use \code{\link{matchPattern}} or \code{\link{vmatchPattern}} if you need
  to find all the occurrences (possibly with indels) of a given pattern in
  a reference sequence or set of sequences.

  Use \code{\link{matchPDict}} if you need to match a (big) set of patterns
  against a reference sequence.
}

\author{P. 
Aboyoun}

\seealso{
  \code{\link{writePairwiseAlignments}},
  \code{\link{stringDist}},
  \link{PairwiseAlignments-class},
  \link{XStringQuality-class},
  \link{substitution_matrices},
  \code{\link{matchPattern}}
}

\examples{
  ## Nucleotide global, local, and overlap alignments
  s1 <- DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG")
  s2 <- DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC")

  # First use a fixed substitution matrix
  mat <- nucleotideSubstitutionMatrix(match = 1, mismatch = -3, baseOnly = TRUE)
  globalAlign <- pairwiseAlignment(s1, s2, substitutionMatrix = mat,
                                   gapOpening = 5, gapExtension = 2)
  localAlign <- pairwiseAlignment(s1, s2, type = "local",
                                  substitutionMatrix = mat,
                                  gapOpening = 5, gapExtension = 2)
  overlapAlign <- pairwiseAlignment(s1, s2, type = "overlap",
                                    substitutionMatrix = mat,
                                    gapOpening = 5, gapExtension = 2)

  # Then use quality-based method for generating a substitution matrix
  pairwiseAlignment(s1, s2,
                    patternQuality = SolexaQuality(rep(c(22L, 12L), times = c(36, 18))),
                    subjectQuality = SolexaQuality(rep(c(22L, 12L), times = c(40, 20))),
                    scoreOnly = TRUE)

  # Now assume can't distinguish between C/T and G/A
  pairwiseAlignment(s1, s2,
                    patternQuality = SolexaQuality(rep(c(22L, 12L), times = c(36, 18))),
                    subjectQuality = SolexaQuality(rep(c(22L, 12L), times = c(40, 20))),
                    type = "local")
  mapping <- diag(4)
  dimnames(mapping) <- list(DNA_BASES, DNA_BASES)
  mapping["C", "T"] <- mapping["T", "C"] <- 1
  mapping["G", "A"] <- mapping["A", "G"] <- 1
  pairwiseAlignment(s1, s2,
                    patternQuality = SolexaQuality(rep(c(22L, 12L), times = c(36, 18))),
                    subjectQuality = SolexaQuality(rep(c(22L, 12L), times = c(40, 20))),
                    fuzzyMatrix = mapping,
                    type = "local")

  ## Amino acid global alignment
  pairwiseAlignment(AAString("PAWHEAE"), AAString("HEAGAWGHEE"),
                    substitutionMatrix = "BLOSUM50",
                    gapOpening = 0, gapExtension = 8)
}

\keyword{models}
\keyword{methods}
pwalign/man/PairwiseAlignments-class.Rd0000644000175100017510000002757314614311433021223 0ustar00biocbuildbiocbuild\name{PairwiseAlignments-class} \docType{class} % Classes \alias{class:PairwiseAlignments} \alias{PairwiseAlignments-class} \alias{PairwiseAlignments} \alias{class:PairwiseAlignmentsSingleSubject} \alias{PairwiseAlignmentsSingleSubject-class} \alias{PairwiseAlignmentsSingleSubject} \alias{class:PairwiseAlignmentsSingleSubjectSummary} \alias{PairwiseAlignmentsSingleSubjectSummary-class} \alias{PairwiseAlignmentsSingleSubjectSummary} \alias{parallel_slot_names,PairwiseAlignments-method} % Constructor-like functions and generics: \alias{PairwiseAlignments,XString,XString-method} \alias{PairwiseAlignments,XStringSet,missing-method} \alias{PairwiseAlignments,character,missing-method} \alias{PairwiseAlignments,character,character-method} \alias{PairwiseAlignmentsSingleSubject,XString,XString-method} \alias{PairwiseAlignmentsSingleSubject,XStringSet,missing-method} \alias{PairwiseAlignmentsSingleSubject,character,missing-method} \alias{PairwiseAlignmentsSingleSubject,character,character-method} % Accessor methods: \alias{pattern,PairwiseAlignments-method} \alias{subject,PairwiseAlignments-method} \alias{type} \alias{type,PairwiseAlignments-method} \alias{alignedPattern} \alias{alignedSubject} \alias{alignedPattern,PairwiseAlignments-method} \alias{alignedSubject,PairwiseAlignments-method} \alias{score,PairwiseAlignments-method} \alias{insertion,PairwiseAlignments-method} \alias{deletion,PairwiseAlignments-method} \alias{indel,PairwiseAlignments-method} \alias{nindel,PairwiseAlignments-method} \alias{nchar,PairwiseAlignments-method} \alias{seqtype,PairwiseAlignments-method} % Standard generic methods: \alias{show,PairwiseAlignments-method} % Methods for PairwiseAlignmentsSingleSubject: \alias{summary,PairwiseAlignmentsSingleSubject-method} \alias{Views,PairwiseAlignmentsSingleSubject-method} \alias{aligned,PairwiseAlignmentsSingleSubject-method} 
\alias{as.character,PairwiseAlignmentsSingleSubject-method} \alias{toString,PairwiseAlignmentsSingleSubject-method} \alias{as.matrix,PairwiseAlignmentsSingleSubject-method} % Methods for PairwiseAlignmentsSingleSubjectSummary: \alias{type,PairwiseAlignmentsSingleSubjectSummary-method} \alias{score,PairwiseAlignmentsSingleSubjectSummary-method} \alias{nindel,PairwiseAlignmentsSingleSubjectSummary-method} \alias{length,PairwiseAlignmentsSingleSubjectSummary-method} \alias{nchar,PairwiseAlignmentsSingleSubjectSummary-method} \alias{show,PairwiseAlignmentsSingleSubjectSummary-method} \title{PairwiseAlignments, PairwiseAlignmentsSingleSubject, and PairwiseAlignmentsSingleSubjectSummary objects} \description{ The \code{PairwiseAlignments} class is a container for storing a set of pairwise alignments. The \code{PairwiseAlignmentsSingleSubject} class is a container for storing a set of pairwise alignments with a single subject. The \code{PairwiseAlignmentsSingleSubjectSummary} class is a container for storing the summary of a set of pairwise alignments. 
} \usage{ ## Constructors: ## When subject is missing, pattern must be of length 2 \S4method{PairwiseAlignments}{XString,XString}(pattern, subject, type = "global", substitutionMatrix = NULL, gapOpening = 0, gapExtension = 1) \S4method{PairwiseAlignments}{XStringSet,missing}(pattern, subject, type = "global", substitutionMatrix = NULL, gapOpening = 0, gapExtension = 1) \S4method{PairwiseAlignments}{character,character}(pattern, subject, type = "global", substitutionMatrix = NULL, gapOpening = 0, gapExtension = 1, baseClass = "BString") \S4method{PairwiseAlignments}{character,missing}(pattern, subject, type = "global", substitutionMatrix = NULL, gapOpening = 0, gapExtension = 1, baseClass = "BString") } \arguments{ \item{pattern}{a character vector of length 1 or 2, an \code{\link{XString}}, or an \code{\link{XStringSet}} object of length 1 or 2.} \item{subject}{a character vector of length 1 or an \code{\link{XString}} object.} \item{type}{type of alignment. One of \code{"global"}, \code{"local"}, \code{"overlap"}, \code{"global-local"}, and \code{"local-global"} where \code{"global"} = align whole strings with end gap penalties, \code{"local"} = align string fragments, \code{"overlap"} = align whole strings without end gap penalties, \code{"global-local"} = align whole strings in \code{pattern} with consecutive subsequence of \code{subject}, \code{"local-global"} = align consecutive subsequence of \code{pattern} with whole strings in \code{subject}.} \item{substitutionMatrix}{substitution matrix for the alignment. If NULL, the diagonal values and off-diagonal values are set to 0 and 1 respectively.} \item{gapOpening}{the cost for opening a gap in the alignment.} \item{gapExtension}{the incremental cost incurred along the length of the gap in the alignment.} \item{baseClass}{the base \code{\link{XString}} class to use in the alignment.} } \details{ Before we define the notion of alignment, we introduce the notion of "filled-with-gaps subsequence". 
  A "filled-with-gaps subsequence" of a string string1 is obtained by
  inserting 0 or any number of gaps in a subsequence of string1. For example
  L-A--ND and A--N-D are "filled-with-gaps subsequences" of LAND. An
  alignment between two strings string1 and string2 results in two strings
  (align1 and align2) that have the same length and are "filled-with-gaps
  subsequences" of string1 and string2.

  For example, this is an alignment between LAND and LEAVES:
  \preformatted{
    L-A
    LEA
  }

  An alignment can be seen as a compact representation of one set of basic
  operations that transforms string1 into align1. There are 3 different
  kinds of basic operations: "insertions" (gaps in align1), "deletions"
  (gaps in align2), "replacements".
  The above alignment represents the following basic operations:
  \preformatted{
    insert E at pos 2
    insert V at pos 4
    insert E at pos 5
    replace by S at pos 6 (N is replaced by S)
    delete at pos 7 (D is deleted)
  }
  Note that "insert X at pos i" means that all letters at a position >= i
  are moved 1 place to the right before X is actually inserted.

  There are many possible alignments between two given strings string1 and
  string2 and a common problem is to find the one (or those ones) with the
  highest score, i.e. with the lowest total cost in terms of basic
  operations.
}

\section{Object extraction methods}{
  In the code snippets below, \code{x} is a \code{PairwiseAlignments} object,
  except otherwise noted.

  \describe{
    \item{\code{alignedPattern(x), alignedSubject(x)}:}{
      Extract the aligned patterns or subjects as an \code{XStringSet}
      object. The 2 objects returned by \code{alignedPattern(x)} and
      \code{alignedSubject(x)} are guaranteed to have the same
      shape (i.e. same \code{length()} and \code{width()}).
    }
    \item{\code{pattern(x), subject(x)}:}{
      Extract the aligned patterns or subjects as an
      \code{AlignedXStringSet0} object.
    }
    \item{\code{summary(object, ...)}:}{
      Generates a summary for the \code{PairwiseAlignments} object.
} } } \section{General information methods}{ In the code snippets below, \code{x} is a \code{PairwiseAlignments} object, except otherwise noted. \describe{ \item{\code{alphabet(x)}:}{ Equivalent to \code{alphabet(unaligned(subject(x)))}. } \item{\code{length(x)}:}{ The common length of \code{alignedPattern(x)} and \code{alignedSubject(x)}. There is a method for \code{PairwiseAlignmentsSingleSubjectSummary} as well. } \item{\code{type(x)}:}{ The type of the alignment (\code{"global"}, \code{"local"}, \code{"overlap"}, \code{"global-local"}, or \code{"local-global"}). There is a method for \code{PairwiseAlignmentsSingleSubjectSummary} as well. } } } \section{Aligned sequence methods}{ In the code snippets below, \code{x} is a \code{PairwiseAlignmentsSingleSubject} object, except otherwise noted. \describe{ \item{\code{aligned(x, degap = FALSE, gapCode="-", endgapCode="-")}:}{ If \code{degap = FALSE}, "align" the alignments by returning an \code{XStringSet} object containing the aligned patterns without insertions. If \code{degap = TRUE}, returns \code{aligned(pattern(x), degap=TRUE)}. The \code{gapCode} and \code{endgapCode} arguments denote the code in the appropriate \code{\link{alphabet}} to use for the internal and end gaps. } \item{\code{as.character(x)}:}{ Equivalent to \code{as.character(alignedPattern(x))}. } \item{\code{as.matrix(x)}:}{ Returns an "exploded" character matrix representation of \code{aligned(x)}. } \item{\code{toString(x)}:}{ Equivalent to \code{toString(as.character(x))}. } } } \section{Subject position methods}{ In the code snippets below, \code{x} is a \code{PairwiseAlignmentsSingleSubject} object, except otherwise noted. \describe{ \item{\code{consensusMatrix(x, as.prob=FALSE, baseOnly=FALSE, gapCode="-", endgapCode="-")}:}{ See `\link{consensusMatrix}` for more information. } \item{\code{consensusString(x)}:}{ See `\link{consensusString}` for more information. 
} \item{\code{coverage(x, shift=0L, width=NULL, weight=1L)}:}{ See `\link{coverage,PairwiseAlignmentsSingleSubject-method}` for more information. } \item{\code{Views(subject, start=NULL, end=NULL, width=NULL, names=NULL)}:}{ The \code{XStringViews} object that represents the pairwise alignments along \code{unaligned(subject(subject))}. The \code{start} and \code{end} arguments must be either \code{NULL}/\code{NA} or an integer vector of length 1 that denotes the offset from \code{start(subject(subject))}. } } } \section{Numeric summary methods}{ In the code snippets below, \code{x} is a \code{PairwiseAlignments} object, except otherwise noted. \describe{ \item{\code{nchar(x)}:}{ The nchar of the \code{aligned(pattern(x))} and \code{aligned(subject(x))}. There is a method for \code{PairwiseAlignmentsSingleSubjectSummary} as well. } \item{\code{insertion(x)}:}{ An \code{\link[IRanges:IRangesList-class]{CompressedIRangesList}} object containing the locations of the insertions from the perspective of the \code{pattern}. } \item{\code{deletion(x)}:}{ An \code{\link[IRanges:IRangesList-class]{CompressedIRangesList}} object containing the locations of the deletions from the perspective of the \code{pattern}. } \item{\code{indel(x)}:}{ An \code{InDel} object containing the locations of the insertions and deletions from the perspective of the \code{pattern}. } \item{\code{nindel(x)}:}{ An \code{InDel} object containing the number of insertions and deletions. } \item{\code{score(x)}:}{ The score of the alignment. There is a method for \code{PairwiseAlignmentsSingleSubjectSummary} as well. } } } \section{Subsetting methods}{ \describe{ \item{\code{x[i]}:}{ Returns a new \code{PairwiseAlignments} object made of the selected elements. } \item{\code{rep(x, times)}:}{ Returns a new \code{PairwiseAlignments} object made of the repeated elements. } } } \author{P. 
Aboyoun} \seealso{ \code{\link{pairwiseAlignment}}, \code{\link{writePairwiseAlignments}}, \link{AlignedXStringSet-class}, \link{XString-class}, \link{XStringViews-class}, \link{align-utils}, \code{\link{pid}} } \examples{ PairwiseAlignments("-PA--W-HEAE", "HEAGAWGHE-E") pattern <- AAStringSet(c("HLDNLKGTF", "HVDDMPNAKLLL")) subject <- AAString("SHLDTEKMSMKLL") pa1 <- pairwiseAlignment(pattern, subject, substitutionMatrix="BLOSUM50", gapOpening=3, gapExtension=1) pa1 alignedPattern(pa1) alignedSubject(pa1) stopifnot(identical(width(alignedPattern(pa1)), width(alignedSubject(pa1)))) as.character(pa1) aligned(pa1) as.matrix(pa1) nchar(pa1) score(pa1) } \keyword{methods} \keyword{classes} pwalign/man/PairwiseAlignments-io.Rd0000644000175100017510000001402514614311433020511 0ustar00biocbuildbiocbuild\name{PairwiseAlignments-io} \alias{PairwiseAlignments-io} \alias{writePairwiseAlignments} \title{Write a PairwiseAlignments object to a file} \description{ The \code{writePairwiseAlignments} function writes a \link{PairwiseAlignments} object to a file. Only the "pair" format is supported at the moment. } \usage{ writePairwiseAlignments(x, file="", Matrix=NA, block.width=50) } \arguments{ \item{x}{ A \link{PairwiseAlignments} object, typically returned by the \code{\link{pairwiseAlignment}} function. } \item{file}{ A connection, or a character string naming the file to print to. If \code{""} (the default), \code{writePairwiseAlignments} prints to the standard output connection (aka the console) unless redirected by \code{sink}. If it is \code{"|cmd"}, the output is piped to the command given by \code{cmd}, by opening a pipe connection. } \item{Matrix}{ A single string containing the name of the substitution matrix (e.g. \code{"BLOSUM50"}) used for the alignment. See the \code{substitutionMatrix} argument of the \code{\link{pairwiseAlignment}} function for the details. 
See \code{?\link{substitution_matrices}} for a list of predefined substitution matrices available in the \pkg{pwalign} package. } \item{block.width}{ A single integer specifying the maximum number of sequence letters (including the "-" letter, which represents gaps) per line. } } \details{ The "pair" format is one of the numerous pairwise sequence alignment formats supported by the EMBOSS software. See \url{http://emboss.sourceforge.net/docs/themes/AlignFormats.html} for a brief (and rather informal) description of this format. } \value{ Nothing (invisible \code{NULL}). } \note{ This brief description of the "pair" format suggests that it is best suited for \emph{global} pairwise alignments, because, in that case, the original pattern and subject sequences can be inferred (by just removing the gaps). However, even though the "pair" format can also be used for non global pairwise alignments (i.e. for \emph{global-local}, \emph{local-global}, and \emph{local} pairwise alignments), in that case the original pattern and subject sequences \emph{cannot} be inferred. This is because the alignment written to the file doesn't necessarily span the entire pattern (if \code{type(x)} is \code{local-global} or \code{local}) or the entire subject (if \code{type(x)} is \code{global-local} or \code{local}). As a consequence, the \code{writePairwiseAlignments} function can be used on a \link{PairwiseAlignments} object \code{x} containing non global alignments (i.e. with \code{type(x) != "global"}), but with the 2 following caveats: \enumerate{ \item The type of the alignments (\code{type(x)}) is not written to the file. \item The original pattern and subject sequences cannot be inferred. Furthermore, there is no way to infer their lengths (because we don't know whether they were trimmed or not). } Also note that the \code{\link{pairwiseAlignment}} function interprets the \code{gapOpening} and \code{gapExtension} arguments differently than most other alignment tools. 
As a consequence, the values of the Gap_penalty and Extend_penalty fields written to the file are not the same as the values that were passed to the \code{gapOpening} and \code{gapExtension} arguments. The two sets of values are related as follows: \itemize{ \item Gap_penalty = gapOpening + gapExtension \item Extend_penalty = gapExtension } } \author{H. Pagès} \references{ \url{http://emboss.sourceforge.net/docs/themes/AlignFormats.html} } \seealso{ \itemize{ \item \code{\link{pairwiseAlignment}} \item \link{PairwiseAlignments-class} \item \link{substitution_matrices} } } \examples{ ## --------------------------------------------------------------------- ## A. WITH ONE PAIR ## --------------------------------------------------------------------- pattern <- DNAString("CGTACGTAACGTTCGT") subject <- DNAString("CGTCGTCGTCCGTAA") pa1 <- pairwiseAlignment(pattern, subject) pa1 writePairwiseAlignments(pa1) writePairwiseAlignments(pa1, block.width=10) ## The 2 bottom-right numbers (16 and 15) are the lengths of ## the original pattern and subject, respectively. pa2 <- pairwiseAlignment(pattern, subject, type="global-local") pa2 # score is different! writePairwiseAlignments(pa2) ## By just looking at the file, we can't tell the length of the ## original subject! Could be 13, could be more... pattern <- DNAString("TCAACTTAACTT") subject <- DNAString("GGGCAACAACGGG") pa3 <- pairwiseAlignment(pattern, subject, type="global-local", gapOpening=-2, gapExtension=-1) writePairwiseAlignments(pa3) ## --------------------------------------------------------------------- ## B. WITH MORE THAN ONE PAIR (AND NAMED PATTERNS) ## --------------------------------------------------------------------- pattern <- DNAStringSet(c(myp1="ACCA", myp2="ACGCA", myp3="ACGGCA")) pa4 <- pairwiseAlignment(pattern, subject) pa4 writePairwiseAlignments(pa4) ## --------------------------------------------------------------------- ## C.
REPRODUCING THE ALIGNMENT SHOWN AT ## http://emboss.sourceforge.net/docs/themes/alnformats/align.pair ## --------------------------------------------------------------------- pattern <- c("TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT", "GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG", "SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE") subject <- c("TSPASIRPPAGPSSRRPSPPGPRRPTGRPCCSAAPRRPQATGGWKTCSGT", "CTTSTSTRHRGRSGWRASRKSMRAACSRSAGSRPNRFAPTLMSSCITSTT", "GPPAWAGDRSHE") pattern <- unlist(AAStringSet(pattern)) subject <- unlist(AAStringSet(subject)) pattern # original pattern subject # original subject data(BLOSUM62) pa5 <- pairwiseAlignment(pattern, subject, substitutionMatrix=BLOSUM62, gapOpening=9.5, gapExtension=0.5) pa5 writePairwiseAlignments(pa5, Matrix="BLOSUM62") } \keyword{utilities} \keyword{manip} pwalign/man/phiX174Phage.Rd0000644000175100017510000000523414614311433016412 0ustar00biocbuildbiocbuild\name{phiX174Phage} \docType{data} \alias{phiX174Phage} \alias{srPhiX174} \alias{quPhiX174} \alias{wtPhiX174} \title{Versions of bacteriophage phiX174 complete genome and sample short reads} \description{ Six versions of the complete genome for bacteriophage \eqn{\phi} X174 as well as a small number of Solexa short reads, qualities associated with those short reads, and counts for the number times those short reads occurred. } \format{ \code{phiX174Phage}: A \code{DNAStringSet} containing the following six naturally occurring versions of the bacteriophage \eqn{\phi} X174 genome cited in Smith et al.: \enumerate{ \item Genbank: The version of the genome from GenBank (NC_001422.1, GI:9626372). \item RF70s: A preparation of \eqn{\phi} X double-stranded replicative form (RF) of DNA by Clyde A. Hutchison III from the late 1970s. \item SS78: A preparation of \eqn{\phi} X virion single-stranded DNA from 1978. \item Bull: The sequence of wild-type \eqn{\phi} X used by Bull et al. \item G'97: The \eqn{\phi} X replicative form (RF) of DNA from Bull et al. 
\item NEB'03: A \eqn{\phi} X replicative form (RF) of DNA from New England BioLabs (NEB). } \code{srPhiX174}: A \code{DNAStringSet} containing short reads from a Solexa machine. \code{quPhiX174}: A \code{BStringSet} containing Solexa quality scores associated with \code{srPhiX174}. \code{wtPhiX174}: An integer vector containing counts associated with \code{srPhiX174}. } \author{P. Aboyoun} \references{ \itemize{ \item \url{http://www.genome.jp/dbget-bin/www_bget?refseq+NC_001422} \item Bull, J. J., Badgett, M. R., Wichman, H. A., Huelsenbeck, Hillis, D. M., Gulati, A., Ho, C. & Molineux, J. (1997) Genetics 147, 1497-1507. \item Smith, Hamilton O.; Clyde A. Hutchison, Cynthia Pfannkoch, J. Craig Venter (2003-12-23). "Generating a synthetic genome by whole genome assembly: \{phi\}X174 bacteriophage from synthetic oligonucleotides". Proceedings of the National Academy of Sciences 100 (26): 15440-15445. doi:10.1073/pnas.2237126100. } } \examples{ data(phiX174Phage) nchar(phiX174Phage) genBankPhage <- phiX174Phage[[1]] genBankSubstring <- substring(genBankPhage, 2793-34, 2811+34) data(srPhiX174) srPhiX174 quPhiX174 summary(wtPhiX174) alignPhiX174 <- pairwiseAlignment(srPhiX174, genBankSubstring, patternQuality=SolexaQuality(quPhiX174), subjectQuality=SolexaQuality(99L), type="global-local") summary(alignPhiX174, weight=wtPhiX174) } \keyword{datasets} pwalign/man/pid.Rd0000644000175100017510000000337314614311433015057 0ustar00biocbuildbiocbuild\name{pid} \alias{pid} \alias{pid,PairwiseAlignments-method} \title{Percent Sequence Identity} \description{ Calculates the percent sequence identity for a pairwise sequence alignment. } \usage{ pid(x, type="PID1") } \arguments{ \item{x}{a \code{\link{PairwiseAlignments}} object.} \item{type}{the type of percent sequence identity to compute. One of \code{"PID1"}, \code{"PID2"}, \code{"PID3"}, and \code{"PID4"}.
See Details for more information.} } \details{ Since there is no universal definition of percent sequence identity, the \code{pid} function can calculate this statistic according to any of the following definitions: \describe{ \item{\code{"PID1"}:}{ 100 * (identical positions) / (aligned positions + internal gap positions) } \item{\code{"PID2"}:}{ 100 * (identical positions) / (aligned positions) } \item{\code{"PID3"}:}{ 100 * (identical positions) / (length of the shorter sequence) } \item{\code{"PID4"}:}{ 100 * (identical positions) / (average length of the two sequences) } } } \value{ A numeric vector containing the specified sequence identity measures. } \references{ A. May, Percent Sequence Identity: The Need to Be Explicit, Structure 2004, 12(5):737. G. Raghava and G. Barton, Quantification of the variation in percentage identity for protein sequence alignments, BMC Bioinformatics 2006, 7:415. } \author{P. Aboyoun} \seealso{ \link{pairwiseAlignment}, \link{PairwiseAlignments-class}, \link{match-utils} } \examples{ s1 <- DNAString("AGTATAGATGATAGAT") s2 <- DNAString("AGTAGATAGATGGATGATAGATA") palign1 <- pairwiseAlignment(s1, s2) palign1 pid(palign1) palign2 <- pairwiseAlignment(s1, s2, substitutionMatrix = nucleotideSubstitutionMatrix(match = 2, mismatch = 10, baseOnly = TRUE)) palign2 pid(palign2, type = "PID4") } \keyword{methods} pwalign/man/predefined_scoring_matrices.Rd0000644000175100017510000000525114614311433022020 0ustar00biocbuildbiocbuild\name{predefined_scoring_matrices} \docType{data} \alias{predefined_scoring_matrices} \alias{BLOSUM45} \alias{BLOSUM50} \alias{BLOSUM62} \alias{BLOSUM80} \alias{BLOSUM100} \alias{PAM30} \alias{PAM40} \alias{PAM70} \alias{PAM120} \alias{PAM250} \title{Predefined scoring matrices} \description{ Predefined scoring matrices for nucleotide and amino acid alignments.
} \usage{ data(BLOSUM45) data(BLOSUM50) data(BLOSUM62) data(BLOSUM80) data(BLOSUM100) data(PAM30) data(PAM40) data(PAM70) data(PAM120) data(PAM250) } \format{ The BLOSUM and PAM matrices are square symmetric matrices with integer coefficients, whose row and column names are identical and unique: each name is a single letter representing a nucleotide or an amino acid. } \details{ The BLOSUM and PAM matrices are not unique. For example, the definition of the widely used BLOSUM62 matrix varies depending on the source, and even a given source can provide different versions of "BLOSUM62" without keeping track of the changes over time. NCBI provides many matrices at \url{ftp://ftp.ncbi.nih.gov/blast/matrices/}, but their definitions don't match those of the matrices bundled with their stand-alone BLAST software, which is available at \url{ftp://ftp.ncbi.nih.gov/blast/}. The BLOSUM45, BLOSUM62, BLOSUM80, PAM30 and PAM70 matrices were taken from the NCBI stand-alone BLAST software. The BLOSUM50, BLOSUM100, PAM40, PAM120 and PAM250 matrices were taken from \url{ftp://ftp.ncbi.nih.gov/blast/matrices/}. } \author{H. Pagès and P.
Aboyoun} \seealso{ \code{\link{nucleotideSubstitutionMatrix}}, \code{\link{pairwiseAlignment}}, \link{PairwiseAlignments-class}, \link{DNAString-class}, \link{AAString-class}, \link{PhredQuality-class}, \link{SolexaQuality-class}, \link{IlluminaQuality-class} } \examples{ ## Align two amino acid sequences with the BLOSUM62 matrix: aa1 <- AAString("HXBLVYMGCHFDCXVBEHIKQZ") aa2 <- AAString("QRNYMYCFQCISGNEYKQN") pairwiseAlignment(aa1, aa2, substitutionMatrix="BLOSUM62", gapOpening=3, gapExtension=1) ## See how the gap penalty influences the alignment: pairwiseAlignment(aa1, aa2, substitutionMatrix="BLOSUM62", gapOpening=6, gapExtension=2) ## See how the substitution matrix influences the alignment: pairwiseAlignment(aa1, aa2, substitutionMatrix="BLOSUM50", gapOpening=3, gapExtension=1) if (interactive()) { ## Compare our BLOSUM62 with BLOSUM62 from ## ftp://ftp.ncbi.nih.gov/blast/matrices/: data(BLOSUM62) BLOSUM62["Q", "Z"] file <- "ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62" b62 <- as.matrix(read.table(file, check.names=FALSE)) b62["Q", "Z"] } } \keyword{data} \keyword{datasets} pwalign/man/stringDist.Rd0000644000175100017510000000762314614311433016437 0ustar00biocbuildbiocbuild\name{stringDist} \alias{stringDist} \alias{stringDist,character-method} \alias{stringDist,XStringSet-method} \alias{stringDist,QualityScaledXStringSet-method} \title{String Distance/Alignment Score Matrix} \description{ Computes the Levenshtein edit distance or pairwise alignment score matrix for a set of strings. 
} \usage{ stringDist(x, method = "levenshtein", ignoreCase = FALSE, diag = FALSE, upper = FALSE, \dots) \S4method{stringDist}{XStringSet}(x, method = "levenshtein", ignoreCase = FALSE, diag = FALSE, upper = FALSE, type = "global", quality = PhredQuality(22L), substitutionMatrix = NULL, fuzzyMatrix = NULL, gapOpening = 0, gapExtension = 1) \S4method{stringDist}{QualityScaledXStringSet}(x, method = "quality", ignoreCase = FALSE, diag = FALSE, upper = FALSE, type = "global", substitutionMatrix = NULL, fuzzyMatrix = NULL, gapOpening = 0, gapExtension = 1) } \arguments{ \item{x}{a character vector or an \code{\link{XStringSet}} object.} \item{method}{calculation method. One of \code{"levenshtein"}, \code{"hamming"}, \code{"quality"}, or \code{"substitutionMatrix"}.} \item{ignoreCase}{logical value indicating whether to ignore case during scoring.} \item{diag}{logical value indicating whether the diagonal of the matrix should be printed by \code{print.dist}.} \item{upper}{logical value indicating whether the upper triangle of the matrix should be printed by \code{print.dist}.} \item{type}{(applicable when \code{method = "quality"} or \code{method = "substitutionMatrix"}). type of alignment. One of \code{"global"}, \code{"local"}, and \code{"overlap"}, where \code{"global"} = align whole strings with end gap penalties, \code{"local"} = align string fragments, \code{"overlap"} = align whole strings without end gap penalties.} \item{quality}{(applicable when \code{method = "quality"}). object of class \code{\link{XStringQuality}} representing the quality scores for \code{x} that are used in a quality-based method for generating a substitution matrix.} \item{substitutionMatrix}{(applicable when \code{method = "substitutionMatrix"}). symmetric matrix representing the fixed substitution scores in the alignment.} \item{fuzzyMatrix}{(applicable when \code{method = "quality"}). fuzzy match matrix for quality-based alignments. 
It takes values between 0 and 1; where 0 is an unambiguous mismatch, 1 is an unambiguous match, and values in between represent a fraction of "matchiness".} \item{gapOpening}{(applicable when \code{method = "quality"} or \code{method = "substitutionMatrix"}). penalty for opening a gap in the alignment.} \item{gapExtension}{(applicable when \code{method = "quality"} or \code{method = "substitutionMatrix"}). penalty for extending a gap in the alignment} \item{\dots}{optional arguments to generic function to support additional methods.} } \details{ When \code{method = "hamming"}, uses the underlying \code{neditStartingAt} code to calculate the distances, where the Hamming distance is defined as the number of substitutions between two strings of equal length. Otherwise, uses the underlying \code{pairwiseAlignment} code to compute the distance/alignment score matrix. } \value{ Returns an object of class \code{"dist"}. } \author{P. Aboyoun} \seealso{ \link[stats]{dist}, \link[base]{agrep}, \link{pairwiseAlignment}, \link{substitution_matrices} } \examples{ stringDist(c("lazy", "HaZy", "crAzY")) stringDist(c("lazy", "HaZy", "crAzY"), ignoreCase = TRUE) data(phiX174Phage) plot(hclust(stringDist(phiX174Phage), method = "single")) data(srPhiX174) stringDist(srPhiX174[1:4]) stringDist(srPhiX174[1:4], method = "quality", quality = SolexaQuality(quPhiX174[1:4]), gapOpening = 10, gapExtension = 4) } \keyword{character} \keyword{multivariate} \keyword{cluster} pwalign/man/substitution_matrices.Rd0000644000175100017510000001236014614311433020742 0ustar00biocbuildbiocbuild\name{substitution_matrices} \alias{substitution_matrices} \alias{nucleotideSubstitutionMatrix} \alias{qualitySubstitutionMatrices} \alias{errorSubstitutionMatrices} \title{Utilities to generate substitution matrices} \description{ Utilities to generate substitution matrices. 
} \usage{ nucleotideSubstitutionMatrix(match=1, mismatch=0, baseOnly=FALSE, type="DNA", symmetric=TRUE) qualitySubstitutionMatrices(fuzzyMatch=c(0, 1), alphabetLength=4L, qualityClass="PhredQuality", bitScale=1) errorSubstitutionMatrices(errorProbability, fuzzyMatch=c(0, 1), alphabetLength=4L, bitScale=1) } \arguments{ \item{match}{the scoring for a nucleotide match.} \item{mismatch}{the scoring for a nucleotide mismatch.} \item{baseOnly}{\code{TRUE} or \code{FALSE}. If \code{TRUE}, only uses the letters in the "base" alphabet i.e. "A", "C", "G", "T".} \item{type}{either "DNA" or "RNA".} \item{symmetric}{\code{TRUE} or \code{FALSE}. Default is \code{TRUE}. If \code{FALSE}, the resulting matrix will be asymmetric.} \item{fuzzyMatch}{a named or unnamed numeric vector representing the base match probability.} \item{errorProbability}{a named or unnamed numeric vector representing the error probability.} \item{alphabetLength}{an integer representing the number of letters in the underlying string alphabet. For DNA and RNA, this would be 4L. For Amino Acids, this could be 20L.} \item{qualityClass}{a character string of \code{"PhredQuality"}, \code{"SolexaQuality"}, or \code{"IlluminaQuality"}.} \item{bitScale}{a numeric value to scale the quality-based substitution matrices. By default, this is 1, representing bit-scale scoring.} } \details{ The quality matrices computed in \code{qualitySubstitutionMatrices} are based on the paper by Ketil Malde. Let \eqn{\epsilon_i} be the probability of an error in the base read. For \code{"Phred"} quality measures \eqn{Q} in \eqn{[0, 99]}, these error probabilities are given by \eqn{\epsilon_i = 10^{-Q/10}}. For \code{"Solexa"} quality measures \eqn{Q} in \eqn{[-5, 99]}, they are given by \eqn{\epsilon_i = 1 - 1/(1 + 10^{-Q/10})}. 
Assuming independence within and between base reads, the combined error probability of a mismatch when the underlying bases do match is \eqn{\epsilon_c = \epsilon_1 + \epsilon_2 - (n/(n-1)) * \epsilon_1 * \epsilon_2}, where \eqn{n} is the number of letters in the underlying alphabet. Using \eqn{\epsilon_c}, the substitution score is given by \eqn{b * \log_2(\gamma_{x,y} * (1 - \epsilon_c) * n + (1 - \gamma_{x,y}) * \epsilon_c * (n/(n-1)))}, where \eqn{b} is the bit-scaling for the scoring and \eqn{\gamma_{x,y}} is the probability that characters \eqn{x} and \eqn{y} represent the same underlying information (e.g. using IUPAC, \eqn{\gamma_{A,A} = 1} and \eqn{\gamma_{A,N} = 1/4}). In the arguments listed above, \code{fuzzyMatch} represents \eqn{\gamma_{x,y}} and \code{errorProbability} represents \eqn{\epsilon_i}. } \value{ A matrix. } \references{ K. Malde, The effect of sequence quality on sequence alignment, Bioinformatics, Feb 23, 2008. } \author{P. Aboyoun, with contribution from Albert Vill (support for asymmetric matrices in \code{nucleotideSubstitutionMatrix()})} \seealso{ \link{predefined_scoring_matrices}, \code{\link{pairwiseAlignment}}, \link{PairwiseAlignments-class}, \link{DNAString-class}, \link{AAString-class}, \link{PhredQuality-class}, \link{SolexaQuality-class}, \link{IlluminaQuality-class} } \examples{ s1 <- DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG") s2 <- DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC") s3 <- DNAString("NCTTCRCCAGCTCCCTGGMGGTAAGTTGATCAAAGVAAACGCAAAGTTNTCAAG") ## Fit a global pairwise alignment using edit distance scoring: nsm <- nucleotideSubstitutionMatrix(0, -1, TRUE) pairwiseAlignment(s1, s2, substitutionMatrix=nsm, gapOpening=0, gapExtension=1) ## Align sequences using an asymmetric substitution matrix.
## The asymmetry of the matrix means that the query sequence is not ## penalized for ambiguous bases in the subject / consensus sequence: ansm <- nucleotideSubstitutionMatrix(symmetric=FALSE) ansm["M", c("A","C","G","T")] # A C G T # 0.5 0.5 0.0 0.0 ansm[c("A","C","G","T"), "M"] # A C G T # 1 1 0 0 ansm["M", "H"] # 1 ansm["H", "M"] # 0.6666667 ## Due to this asymmetry, the order of the sequences is important: pairwiseAlignment(s1, s3, substitutionMatrix=ansm) pairwiseAlignment(s3, s1, substitutionMatrix=ansm) ## Examine quality-based match and mismatch bit scores for DNA/RNA ## strings in pairwiseAlignment. By default patternQuality and ## subjectQuality are PhredQuality(22L): qualityMatrices <- qualitySubstitutionMatrices() qualityMatrices["22", "22", "1"] qualityMatrices["22", "22", "0"] pairwiseAlignment(s1, s2) ## Get the substitution scores when the error probability is 0.1: subscores <- errorSubstitutionMatrices(errorProbability=0.1) submat <- matrix(subscores[ , , "0"], 4, 4) diag(submat) <- subscores[ , , "1"] dimnames(submat) <- list(DNA_ALPHABET[1:4], DNA_ALPHABET[1:4]) submat pairwiseAlignment(s1, s2, substitutionMatrix=submat) } \keyword{utilities} pwalign/NAMESPACE0000644000175100017510000000475114614311433014461 0ustar00biocbuildbiocbuilduseDynLib(pwalign) import(methods) importFrom(utils, data, packageVersion) import(BiocGenerics) import(S4Vectors) import(IRanges) import(Biostrings) ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ### Export S4 classes ### exportClasses( InDel, AlignedXStringSet0, AlignedXStringSet, QualityAlignedXStringSet, PairwiseAlignments, PairwiseAlignmentsSingleSubject, PairwiseAlignmentsSingleSubjectSummary ) ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ### Export S4 methods for generics not defined in pwalign ### exportMethods( ## Methods for generics defined in the base package: length, as.character, as.matrix, toString, nchar, summary, ## Methods for generics 
defined in the methods package: show, ## Methods for generics defined in the BiocGenerics package: type, start, end, width, score, ## Methods for generics defined in the S4Vectors package: parallel_slot_names, parallelVectorNames, ## Methods for generics defined in the IRanges package: Views, subject, coverage, ## Methods for generics defined in the Biostrings package: pattern, consensusMatrix, mismatch, nmatch, nmismatch ) ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ### Export non-generic functions ### export( ## PairwiseAlignments-io.R: writePairwiseAlignments, ## substitution_matrices.R: nucleotideSubstitutionMatrix, errorSubstitutionMatrices, qualitySubstitutionMatrices ) ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ### Export S4 generics defined in pwalign, and corresponding methods ### export( ## InDel-class.R: insertion, deletion, ## AlignedXStringSet-class.R: unaligned, aligned, indel, nindel, ## PairwiseAlignments-class.R: PairwiseAlignments, alignedPattern, alignedSubject, ## PairwiseAlignmentsSingleSubject-class.R: PairwiseAlignmentsSingleSubject, ## align-utils.R: nedit, mismatchTable, mismatchSummary, compareStrings, ## pid.R: pid, ## pairwiseAlignment.R: pairwiseAlignment, ## stringDist.R: stringDist ) ### Same list as above. exportMethods( insertion, deletion, unaligned, aligned, indel, nindel, PairwiseAlignments, alignedPattern, alignedSubject, PairwiseAlignmentsSingleSubject, nedit, mismatchTable, mismatchSummary, compareStrings, pid, pairwiseAlignment, stringDist ) pwalign/NEWS0000644000175100017510000000014014614311433013725 0ustar00biocbuildbiocbuildVERSION 1.0.0 ------------- o First version of the package that is ready for general use. 
pwalign/R/0000755000175100017510000000000014614311433013434 5ustar00biocbuildbiocbuildpwalign/R/00datacache.R0000644000175100017510000000220014614311433015606 0ustar00biocbuildbiocbuild### ========================================================================= ### Serialized objects ### SERIALIZED_OBJNAMES <- c( "BLOSUM45", "BLOSUM50", "BLOSUM62", "BLOSUM80", "BLOSUM100", "PAM30", "PAM40", "PAM70", "PAM120", "PAM250" ) ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ### Objects created "on-the-fly" (not serialized) ### ### WARNING: Improper calls to getdata() by .createObject() can lead to ### infinite recursion! ### .createObject <- function(objname) { # add more here... stop("don't know how to create object '", objname, "'") } ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ### The "getdata" function (NOT exported) ### .datacache <- new.env(hash=TRUE, parent=emptyenv()) getdata <- function(objname) { if (!exists(objname, envir=.datacache)) { if (objname %in% SERIALIZED_OBJNAMES) { data(list=objname, package="pwalign", envir=.datacache) } else { assign(objname, .createObject(objname), envir=.datacache) } } get(objname, envir=.datacache) } pwalign/R/align-utils.R0000644000175100017510000003432114614311433016012 0ustar00biocbuildbiocbuild### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ### The "mismatch", "nmatch" and "nmismatch" methods. 
### setMethod("mismatch", c(pattern = "AlignedXStringSet0", x = "missing"), function(pattern, x, fixed) pattern@mismatch ) setMethod("nmatch", c(pattern = "PairwiseAlignments", x = "missing"), function(pattern, x, fixed) .Call2("PairwiseAlignments_nmatch", nchar(pattern), nmismatch(pattern), nindel(subject(pattern))[,"WidthSum"], nindel(pattern(pattern))[,"WidthSum"], PACKAGE="pwalign") ) setMethod("nmatch", c(pattern = "PairwiseAlignmentsSingleSubjectSummary", x = "missing"), function(pattern, x, fixed) pattern@nmatch ) setMethod("nmismatch", c(pattern = "AlignedXStringSet0", x = "missing"), function(pattern, x, fixed) elementNROWS(mismatch(pattern)) ) setMethod("nmismatch", c(pattern = "PairwiseAlignments", x = "missing"), function(pattern, x, fixed) nmismatch(pattern(pattern)) ) setMethod("nmismatch", c(pattern = "PairwiseAlignmentsSingleSubjectSummary", x = "missing"), function(pattern, x, fixed) pattern@nmismatch ) setGeneric("nedit", function(x) standardGeneric("nedit")) setMethod("nedit", "PairwiseAlignments", function(x) nmismatch(x) + unname(nindel(subject(x))[,"WidthSum"]) + unname(nindel(pattern(x))[,"WidthSum"]) ) setMethod("nedit", "PairwiseAlignmentsSingleSubjectSummary", function(x) nmismatch(x) + unname(insertion(nindel(x))[,"WidthSum"]) + unname(deletion(nindel(x))[,"WidthSum"]) ) ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ### The "mismatchTable" generic and methods. ### setGeneric("mismatchTable", signature = "x", function(x, shiftLeft = 0L, shiftRight = 0L, ...) 
standardGeneric("mismatchTable") ) setMethod("mismatchTable", "AlignedXStringSet0", function(x, shiftLeft = 0L, shiftRight = 0L, prefixColNames = "") { if (!isSingleNumber(shiftLeft) || shiftLeft > 0) stop("'shiftLeft' must be a non-positive integer") if (!isSingleNumber(shiftRight) || shiftRight < 0) stop("'shiftRight' must be a non-negative integer") shiftLeft <- as.integer(shiftLeft) shiftRight <- as.integer(shiftRight) nMismatch <- nmismatch(x) id <- rep.int(seq_len(length(nMismatch)), nMismatch) singleString <- (length(unaligned(x)) == 1) if (singleString) subset <- unaligned(x)[[1]] else subset <- unaligned(x)[id] position <- unlist(mismatch(x)) if (shiftLeft == 0L) start <- position else start <- pmax(position + shiftLeft, 1L) if (shiftRight == 0L) end <- position else end <- pmin(position + shiftRight, nchar(subset)) if (length(subset) == 0) substring <- character(0) else if (singleString) substring <- Views(subset, start=start, end=end) else substring <- narrow(subset, start=start, end=end) output <- data.frame("Id" = id, "Start" = start, "End" = end, "Substring" = as.character(substring)) if (any(nchar(prefixColNames) > 0)) names(output) <- paste(prefixColNames, names(output), sep = "") output } ) setMethod("mismatchTable", "QualityAlignedXStringSet", function(x, shiftLeft = 0L, shiftRight = 0L, prefixColNames = "") { output <- callNextMethod(x, shiftLeft = shiftLeft, shiftRight = shiftRight, prefixColNames = "") if (nrow(output) == 0) { output <- cbind(output, "Quality" = character(0)) } else { x_unaligned_quality <- quality(unaligned(x)) if (length(x_unaligned_quality) != length(x)) { ## Just for sanity but should never fail. 
stopifnot(length(x_unaligned_quality) == 1L) x_unaligned_quality <- rep.int(x_unaligned_quality, length(x)) } start <- output[["Start"]] end <- output[["End"]] quality <- narrow(x_unaligned_quality[output[["Id"]]], start=start, end=end) output <- cbind(output, "Quality"=as.character(quality)) } if (any(nchar(prefixColNames) > 0)) names(output) <- paste(prefixColNames, names(output), sep = "") output } ) setMethod("mismatchTable", "PairwiseAlignments", function(x, shiftLeft = 0L, shiftRight = 0L) { cbind(mismatchTable(pattern(x), shiftLeft = shiftLeft, shiftRight = shiftRight, prefixColNames = "Pattern"), mismatchTable(subject(x), shiftLeft = shiftLeft, shiftRight = shiftRight, prefixColNames = "Subject")[,-1]) } ) ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ### The "mismatchSummary" generic and methods. ### setGeneric("mismatchSummary", signature = "x", function(x, ...) standardGeneric("mismatchSummary") ) setMethod("mismatchSummary", "AlignedXStringSet0", function(x, weight=1L, .mismatchTable=mismatchTable(x)) { if (!is.numeric(weight) || !(length(weight) %in% c(1, length(x)))) stop("'weight' must be an integer vector with length 1 or 'length(x)'") if (!is.integer(weight)) weight <- as.integer(weight) coverageTable <- as.vector(coverage(x, weight = weight)) n <- length(coverageTable) if (length(weight) == 1) endTable <- weight * table(.mismatchTable[["End"]]) else endTable <- table(rep(.mismatchTable[["End"]], weight[.mismatchTable[["Id"]]])) countTable <- rep(0L, n) countTable[as.integer(names(endTable))] <- endTable list("position" = data.frame("Position" = seq_len(n), "Count" = countTable, "Probability" = countTable / coverageTable)) } ) setMethod("mismatchSummary", "QualityAlignedXStringSet", function(x, weight=1L, .mismatchTable=mismatchTable(x)) { if (!is.numeric(weight) || !(length(weight) %in% c(1, length(x)))) stop("'weight' must be an integer vector with length 1 or 'length(x)'") if (!is.integer(weight)) weight <- 
as.integer(weight) qualityValues <- (minQuality(quality(unaligned(x))) + offset(quality(unaligned(x)))): (maxQuality(quality(unaligned(x))) + offset(quality(unaligned(x)))) qualityZero <- offset(quality(unaligned(x))) if ((length(quality(unaligned(x))) == 1) && (nchar(quality(unaligned(x))) == 1)) qualityAll <- sum(as.numeric(weight) * width(x)) * alphabetFrequency(quality(unaligned(x)), collapse = TRUE)[qualityValues + 1] else { nonEmptyAlignment <- (width(x) > 0) if (length(weight) == 1) qualityAll <- as.numeric(weight) * alphabetFrequency(narrow(quality(unaligned(x))[nonEmptyAlignment], start = start(x)[nonEmptyAlignment], end = end(x)[nonEmptyAlignment]), collapse = TRUE)[qualityValues + 1] else qualityAll <- colSums(as.numeric(weight)[nonEmptyAlignment] * alphabetFrequency(narrow(quality(unaligned(x))[nonEmptyAlignment], start = start(x)[nonEmptyAlignment], end = end(x)[nonEmptyAlignment]) )[, qualityValues + 1, drop=FALSE]) } names(qualityAll) <- vapply(as.raw(qualityValues), rawToChar, character(1)) qualityAll <- qualityAll[qualityAll > 0] if (length(weight) == 1) qualityTable <- weight * table(.mismatchTable[["Quality"]]) else qualityTable <- table(rep(.mismatchTable[["Quality"]], weight[.mismatchTable[["Id"]]])) qualityCounts <- rep(0L, length(qualityAll)) names(qualityCounts) <- names(qualityAll) qualityCounts[names(qualityTable)] <- qualityTable c(callNextMethod(x, weight = weight, .mismatchTable = .mismatchTable), list("quality" = data.frame("Quality" = unlist(lapply(names(qualityAll), utf8ToInt)) - qualityZero, "Count" = qualityCounts, "Probability" = qualityCounts / qualityAll))) } ) setMethod("mismatchSummary", "PairwiseAlignmentsSingleSubject", function(x, weight=1L) { if (!is.numeric(weight) || !(length(weight) %in% c(1, length(x)))) stop("'weight' must be an integer vector with length 1 or 'length(x)'") if (!is.integer(weight)) weight <- as.integer(weight) mismatchTable <- list("pattern" = mismatchTable(pattern(x)), "subject" = 
mismatchTable(subject(x))) combinedInfo <- paste(mismatchTable[["subject"]][["End"]], mismatchTable[["pattern"]][["Substring"]], sep = "\001") if (length(weight) == 1) subjectTable <- weight * table(combinedInfo) else subjectTable <- table(rep(combinedInfo, weight[mismatchTable[["pattern"]][["Id"]]])) if (length(subjectTable) == 0) { subjectTableLabels <- character(0) subjectPosition <- integer(0) } else { subjectTableLabels <- strsplit(names(subjectTable), split = "\001") subjectPosition <- as.integer(unlist(lapply(subjectTableLabels, "[", 1))) } output <- list("pattern" = mismatchSummary(pattern(x), weight = weight, .mismatchTable = mismatchTable[["pattern"]]), "subject" = data.frame("SubjectPosition" = subjectPosition, "Subject" = safeExplode(letter(unaligned(subject(x))[[1]], subjectPosition)), "Pattern" = unlist(lapply(subjectTableLabels, "[", 2)), "Count" = as.vector(subjectTable), "Probability" = as.vector(subjectTable) / coverage(subject(x), weight = weight)[subjectPosition, drop = TRUE])) output[["subject"]] <- output[["subject"]][order(output[["subject"]][[1]], output[["subject"]][[2]]),] rownames(output[["subject"]]) <- as.character(seq_len(nrow(output[["subject"]]))) output } ) setMethod("mismatchSummary", "PairwiseAlignmentsSingleSubjectSummary", function(x) x@mismatchSummary ) ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ### The "coverage" methods. 
###

setMethod("coverage", "AlignedXStringSet0",
    function(x, shift=0L, width=NULL, weight=1L)
    {
        shift <- recycleIntegerArg(shift, "shift", length(x@range))
        if (is.null(width))
            width <- max(nchar(unaligned(x))) + max(shift)
        coverage(x@range, shift=shift, width=width, weight=weight)
    }
)

setMethod("coverage", "PairwiseAlignmentsSingleSubject",
    function(x, shift=0L, width=NULL, weight=1L)
        coverage(subject(x), shift=shift, width=width, weight=weight)
)

setMethod("coverage", "PairwiseAlignmentsSingleSubjectSummary",
    function(x, shift=0L, width=NULL, weight=1L)
    {
        if (shift != 0L)
            stop("'shift' argument is not supported for 'PairwiseAlignmentsSingleSubjectSummary' objects")
        if (weight != 1L)
            stop("'weight' argument is not supported for 'PairwiseAlignmentsSingleSubjectSummary' objects")
        window(x@coverage, width=width)
    }
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### Compare strings
###
### This is a symmetric operation but the names of the arguments suggest
### otherwise which is unfortunate. Would have been better to call
### them 'x' and 'y'.
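###
### Illustration only (a sketch; the input strings are made up and this is
### not taken from the package documentation). compareStrings() expects two
### already aligned strings with the same number of characters. Judging from
### the "+", "-" and "?" markers passed to the C routine below, a gap in
### 'subject' should be reported as "+", a gap in 'pattern' as "-", and a
### mismatch as "?":
###
###   compareStrings("ACT-GA", "AC-TGT")
###
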
setGeneric("compareStrings", signature=c("pattern", "subject"),
    function(pattern, subject) standardGeneric("compareStrings")
)

.compareStrings <- function(pattern, subject)
{
    if (!is.character(pattern))
        pattern <- as.character(pattern)
    if (!is.character(subject))
        subject <- as.character(subject)
    if (length(pattern) != length(subject))
        stop(wmsg("'pattern' and 'subject' must have the same length"))
    ncharPattern <- nchar(pattern)
    if (any(ncharPattern != nchar(subject)))
        stop(wmsg("the strings in 'pattern' and 'subject' must have ",
                  "the same number of characters"))
    .Call2("align_compareStrings",
           pattern, subject, max(ncharPattern), "+", "-", "?",
           PACKAGE="pwalign")
}

setMethod("compareStrings", c("character", "character"), .compareStrings)
setMethod("compareStrings", c("XString", "XString"), .compareStrings)
setMethod("compareStrings", c("XStringSet", "XStringSet"), .compareStrings)
setMethod("compareStrings", c("AlignedXStringSet0", "AlignedXStringSet0"),
    .compareStrings
)

setMethod("compareStrings", c("PairwiseAlignments", "missing"),
    function(pattern, subject)
        .compareStrings(pattern@pattern, pattern@subject)
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### "consensusMatrix" method for PairwiseAlignmentsSingleSubject objects.
###

setMethod("consensusMatrix", "PairwiseAlignmentsSingleSubject",
    function(x, as.prob=FALSE, shift=0L, width=NULL,
             baseOnly=FALSE, gapCode="-", endgapCode="-")
    {
        if (!identical(shift, 0L) || !identical(width, NULL))
            stop(wmsg("\"consensusMatrix\" method for ",
                      "PairwiseAlignmentsSingleSubject objects ",
                      "doesn't support the 'shift' and 'width' arguments"))
        consensusMatrix(aligned(x, gapCode=gapCode, endgapCode=endgapCode),
                        as.prob=as.prob, baseOnly=baseOnly)
    }
)
pwalign/R/AlignedXStringSet-class.R0000644000175100017510000001374014614311433020225 0ustar00biocbuildbiocbuild
### =========================================================================
### AlignedXStringSet objects
### -------------------------------------------------------------------------
###
### An AlignedXStringSet object contains aligned sequences.
###
### NOTE: The 'unaligned' slot of an AlignedXStringSet object must not be a
### QualityScaledXStringSet object (see Validity below). For that reason the
### QualityAlignedXStringSet class cannot contain the AlignedXStringSet
### class. Otherwise, any QualityAlignedXStringSet object would be invalid!
###

setClass("AlignedXStringSet0",
    contains="Vector",
    representation(
        "VIRTUAL",
        range="IRanges",                   # of length N
        mismatch="CompressedIntegerList",  # of length N
        indel="CompressedIRangesList",     # of length N
        unaligned="XStringSet"             # of length 1 or N
    )
)

### Combine the new "parallel slots" with those of the parent class. Make
### sure to put the new parallel slots **first**. See R/Vector-class.R file
### in the S4Vectors package for what slots should or should not be considered
### "parallel".
setMethod("parallel_slot_names", "AlignedXStringSet0", function(x) { ans <- callNextMethod() if (length(x@unaligned) != 1L) ans <- c("unaligned", ans) c("range", "mismatch", "indel", ans) } ) setClass("AlignedXStringSet", contains="AlignedXStringSet0") setClass("QualityAlignedXStringSet", contains="AlignedXStringSet0", representation( unaligned="QualityScaledXStringSet" ) ) ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ### Validity. ### .valid.AlignedXStringSet <- function(object) { message <- NULL if (is(object@unaligned, "QualityScaledXStringSet")) message <- c(message, "'unaligned' must not be a QualityScaledXStringSet object") message } setValidity("AlignedXStringSet", function(object) { problems <- .valid.AlignedXStringSet(object) if (is.null(problems)) TRUE else problems } ) .valid.QualityAlignedXStringSet <- function(object) { message <- NULL ## FIXME: surely something different is meant here because the ## QualityAlignedXStringSet class has no 'quality' slot! #if (!(length(object@quality) %in% c(1, length(object@range)))) # message <- c(message, "length(quality) != 1 or length(range)") message } setValidity("QualityAlignedXStringSet", function(object) { problems <- .valid.QualityAlignedXStringSet(object) if (is.null(problems)) TRUE else problems } ) ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ### Accessor methods. ### setGeneric("unaligned", function(x) standardGeneric("unaligned")) setMethod("unaligned", "AlignedXStringSet0", function(x) x@unaligned) setGeneric("aligned", function(x, ...) 
standardGeneric("aligned"))
setMethod("aligned", "AlignedXStringSet0",
    function(x, degap = FALSE)
    {
        if (degap) {
            if (length(unaligned(x)) == 1) {
                value <- Views(unaligned(x)[[1]],
                               start=start(x@range), end=end(x@range))
            } else {
                value <- narrow(as(unaligned(x), "XStringSet"),
                                start=start(x@range), end=end(x@range))
            }
        } else {
            codecX <- xscodec(x)
            if (is.null(codecX)) {
                gapCode <- charToRaw("-")
            } else {
                letters2codes <- codecX@codes
                names(letters2codes) <- codecX@letters
                gapCode <- as.raw(letters2codes[["-"]])
            }
            value <- .Call2("AlignedXStringSet_align_aligned",
                            x, gapCode, PACKAGE="pwalign")
        }
        value
    })

setMethod("start", "AlignedXStringSet0", function(x) start(x@range))
setMethod("end", "AlignedXStringSet0", function(x) end(x@range))
setMethod("width", "AlignedXStringSet0", function(x) width(x@range))
setMethod("ranges", "AlignedXStringSet0", function(x) x@range)

setGeneric("indel", function(x) standardGeneric("indel"))
setMethod("indel", "AlignedXStringSet0", function(x) x@indel)

setGeneric("nindel", function(x) standardGeneric("nindel"))
setMethod("nindel", "AlignedXStringSet0", function(x) summary(indel(x)))

setMethod("nchar", "AlignedXStringSet0",
    function(x, type="chars", allowNA=FALSE)
        .Call2("AlignedXStringSet_nchar", x, PACKAGE="pwalign"))

setMethod("seqtype", "AlignedXStringSet0", function(x) seqtype(unaligned(x)))

setMethod("parallelVectorNames", "AlignedXStringSet0",
    function(x)
        c("unaligned", "range", "mismatch", "indel",
          "start", "end", "width", "nindel"))


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "show" method.
###
### TODO: Make the "show" method format the alignment in SGD fashion,
### i.e. split it into 60-letter blocks and use the "|" character to
### highlight exact matches.
setMethod("show", "AlignedXStringSet0",
    function(object)
    {
        if (length(object) == 0)
            cat("Empty ", class(object), "\n", sep = "")
        else {
            if (length(object) > 1L)
                cat(class(object), " (1 of ", length(object), ")\n", sep = "")
            if (width(object)[1] == 0) {
                cat("[1] \"\"\n")
            } else {
                snippet <- toSeqSnippet(aligned(object)[[1L]],
                                        getOption("width") - 8L)
                cat(paste("[", start(object)[1], "]", sep = ""),
                    add_colors(snippet), "\n")
            }
        }
    }
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### Coercion.
###

setMethod("as.character", "AlignedXStringSet0",
    function(x)
    {
        as.character(aligned(x))
    }
)

setMethod("toString", "AlignedXStringSet0",
    function(x, ...) toString(as.character(x), ...)
)
pwalign/R/InDel-class.R0000644000175100017510000000126314614311433015657 0ustar00biocbuildbiocbuild
### ==========================================================================
### InDel objects
### --------------------------------------------------------------------------
### An InDel object contains the insertion and deletion information.

setClass("InDel",
    representation(
        insertion="ANY",
        deletion="ANY"
    )
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### Accessor methods.
###

setGeneric("insertion", function(x) standardGeneric("insertion"))
setMethod("insertion", "InDel", function(x) x@insertion)

setGeneric("deletion", function(x) standardGeneric("deletion"))
setMethod("deletion", "InDel", function(x) x@deletion)
pwalign/R/pairwiseAlignment.R0000644000175100017510000005752714614311433017261 0ustar00biocbuildbiocbuild
### =========================================================================
### The pairwiseAlignment() generic & related functions
### -------------------------------------------------------------------------
###
### The pairwiseAlignment() function provides optimal pairwise alignment of
### the following types:
###   - Global alignment
###   - Local alignment
###   - Overlap alignment
###   - Local-global alignment (formerly "Pattern Overlap")
###   - Global-local alignment (formerly "Subject Overlap")
###
### -------------------------------------------------------------------------


XStringSet.pairwiseAlignment <-
function(pattern,
         subject,
         type = "global",
         substitutionMatrix = NULL,
         gapOpening = 10,
         gapExtension = 4,
         scoreOnly = FALSE)
{
  ## Check arguments
  if (seqtype(pattern) != seqtype(subject))
    stop("'pattern' and 'subject' must contain ",
         "sequences of the same type")
  if (!(length(subject) %in% c(1, length(pattern))))
    stop("'length(subject)' must equal 1 or 'length(pattern)'")
  type <- match.arg(type,
                    c("global", "local", "overlap",
                      "global-local", "local-global",
                      "subjectOverlap", "patternOverlap"))
  if (type == "subjectOverlap") {
    warning("type = 'subjectOverlap' has been renamed type = 'global-local'")
    type <- "global-local"
  }
  if (type == "patternOverlap") {
    warning("type = 'patternOverlap' has been renamed type = 'local-global'")
    type <- "local-global"
  }
  typeCode <-
    c("global" = 1L, "local" = 2L, "overlap" = 3L,
      "global-local" = 4L, "local-global" = 5L)[[type]]
  gapOpening <- as.double(abs(gapOpening))
  if (length(gapOpening) != 1 || is.na(gapOpening))
    stop("'gapOpening' must be a non-negative numeric vector of length 1")
  gapExtension <- as.double(abs(gapExtension))
  if
(length(gapExtension) != 1 || is.na(gapExtension)) stop("'gapExtension' must be a non-negative numeric vector of length 1") scoreOnly <- as.logical(scoreOnly) if (length(scoreOnly) != 1 || any(is.na(scoreOnly))) stop("'scoreOnly' must be a non-missing logical value") ## Process string information if (is.null(xscodec(pattern))) { unique_letters <- unique(c(uniqueLetters(pattern), uniqueLetters(subject))) #Even if safeLettersToInt() will deal properly with embedded nuls, I #suspect bad things would probably happen downstream in case there are any. alphabetToCodes <- safeLettersToInt(unique_letters, letters.as.names=TRUE) } else { alphabetToCodes <- xscodes(pattern) } useQuality <- FALSE if (is.character(substitutionMatrix)) { if (length(substitutionMatrix) != 1) stop("'substitutionMatrix' is a character vector of length != 1") tempMatrix <- substitutionMatrix substitutionMatrix <- try(getdata(tempMatrix), silent = TRUE) if (is(substitutionMatrix, "try-error")) stop("unknown scoring matrix \"", tempMatrix, "\"") } if (!is.matrix(substitutionMatrix) || !is.numeric(substitutionMatrix)) stop("'substitutionMatrix' must be a numeric matrix") if (!identical(rownames(substitutionMatrix), colnames(substitutionMatrix))) stop("row and column names differ for matrix 'substitutionMatrix'") if (is.null(rownames(substitutionMatrix))) stop("matrix 'substitutionMatrix' must have row and column names") if (any(duplicated(rownames(substitutionMatrix)))) stop("matrix 'substitutionMatrix' has duplicated row names") availableLetters <- intersect(names(alphabetToCodes), rownames(substitutionMatrix)) substitutionMatrix <- matrix(as.double(substitutionMatrix[availableLetters, availableLetters]), nrow = length(availableLetters), ncol = length(availableLetters), dimnames = list(availableLetters, availableLetters)) substitutionArray <- array(unlist(substitutionMatrix, substitutionMatrix), dim = c(dim(substitutionMatrix), 2), dimnames = list(availableLetters, availableLetters, c("0", "1"))) 
substitutionLookupTable <- buildLookupTable(alphabetToCodes[availableLetters], 0:(length(availableLetters) - 1)) fuzzyMatrix <- matrix(0L, length(availableLetters), length(availableLetters), dimnames = list(availableLetters, availableLetters)) diag(fuzzyMatrix) <- 1L fuzzyLookupTable <- buildLookupTable(alphabetToCodes[availableLetters], 0:(length(availableLetters) - 1)) .Call2("XStringSet_align_pairwiseAlignment", pattern, subject, type, typeCode, scoreOnly, gapOpening, gapExtension, useQuality, substitutionArray, dim(substitutionArray), substitutionLookupTable, fuzzyMatrix, dim(fuzzyMatrix), fuzzyLookupTable, PACKAGE="pwalign") } .normargFuzzyMatrix <- function(fuzzyMatrix, rownames) { if (is.null(fuzzyMatrix)) { fuzzyMatrix <- diag(length(rownames)) dimnames(fuzzyMatrix) <- list(rownames, rownames) return(fuzzyMatrix) } if (!is.matrix(fuzzyMatrix) || !is.numeric(fuzzyMatrix) || any(is.na(fuzzyMatrix)) || any(fuzzyMatrix < 0) || any(fuzzyMatrix > 1)) stop("'fuzzyMatrix' must be a numeric matrix with values ", "between 0 and 1 inclusive") if (!identical(rownames(fuzzyMatrix), colnames(fuzzyMatrix))) stop("row and column names differ for matrix 'fuzzyMatrix'") if (is.null(rownames(fuzzyMatrix))) stop("matrix 'fuzzyMatrix' must have row and column names") if (any(duplicated(rownames(fuzzyMatrix)))) stop("matrix 'fuzzyMatrix' has duplicated row names") availableLetters <- intersect(rownames, rownames(fuzzyMatrix)) fuzzyMatrix[availableLetters, availableLetters, drop = FALSE] } .makeSubstitutionLookupTable <- function(qpattern) { keys <- (minQuality(qpattern) + offset(qpattern)): (maxQuality(qpattern) + offset(qpattern)) vals <- 0:(maxQuality(qpattern) - minQuality(qpattern)) buildLookupTable(keys, vals) } QualityScaledXStringSet.pairwiseAlignment <- function(pattern, subject, type = "global", fuzzyMatrix = NULL, gapOpening = 10, gapExtension = 4, scoreOnly = FALSE) { ## Check arguments if (class(pattern) != class(subject)) stop("'pattern' and 'subject' must be of the 
same class") if (!(length(subject) %in% c(1L, length(pattern)))) stop("'length(subject)' must equal 1 or 'length(pattern)'") type <- match.arg(type, c("global", "local", "overlap", "global-local", "local-global", "subjectOverlap", "patternOverlap")) if (type == "subjectOverlap") { warning("type \"subjectOverlap\" has been renamed \"global-local\"") type <- "global-local" } if (type == "patternOverlap") { warning("type \"patternOverlap\" has been renamed \"local-global\"") type <- "local-global" } typeCode <- c("global" = 1L, "local" = 2L, "overlap" = 3L, "global-local" = 4L, "local-global" = 5L)[[type]] gapOpening <- as.double(abs(gapOpening)) if (length(gapOpening) != 1L || is.na(gapOpening)) stop("'gapOpening' must be a non-negative numeric vector of length 1") gapExtension <- as.double(abs(gapExtension)) if (length(gapExtension) != 1L || is.na(gapExtension)) stop("'gapExtension' must be a non-negative numeric vector of length 1") scoreOnly <- as.logical(scoreOnly) if (length(scoreOnly) != 1L || any(is.na(scoreOnly))) stop("'scoreOnly' must be a non-missing logical value") if (class(quality(pattern)) != class(quality(subject))) stop("'quality(pattern)' and 'quality(subject)' must be ", "of the same class") ## Process string information if (is.null(xscodec(pattern))) { unique_letters <- unique(c(uniqueLetters(pattern), uniqueLetters(subject))) #Even if safeLettersToInt() will deal properly with embedded nuls, I #suspect bad things will happen downstream in case there are any. 
alphabetToCodes <- safeLettersToInt(unique_letters, letters.as.names=TRUE) } else { alphabetToCodes <- xscodes(pattern) } useQuality <- TRUE fuzzyMatrix <- .normargFuzzyMatrix(fuzzyMatrix, names(alphabetToCodes)) uniqueFuzzyValues <- sort(unique(fuzzyMatrix)) fuzzyReferenceMatrix <- matrix(match(fuzzyMatrix, uniqueFuzzyValues) - 1L, nrow = nrow(fuzzyMatrix), ncol = ncol(fuzzyMatrix), dimnames = dimnames(fuzzyMatrix)) fuzzyLookupTable <- buildLookupTable(alphabetToCodes[rownames(fuzzyMatrix)], seq_len(nrow(fuzzyMatrix)) - 1L) alphabetLength <- switch(class(pattern), QualityScaledDNAStringSet =, QualityScaledRNAStringSet = 4L, QualityScaledAAStringSet = 20L, length(alphabetToCodes)) substitutionArray <- qualitySubstitutionMatrices(fuzzyMatch = uniqueFuzzyValues, alphabetLength = alphabetLength, qualityClass = class(quality(pattern))) substitutionLookupTable <- .makeSubstitutionLookupTable(quality(pattern)) .Call2("XStringSet_align_pairwiseAlignment", pattern, subject, type, typeCode, scoreOnly, gapOpening, gapExtension, useQuality, substitutionArray, dim(substitutionArray), substitutionLookupTable, fuzzyReferenceMatrix, dim(fuzzyReferenceMatrix), fuzzyLookupTable, PACKAGE="pwalign") } mpi.collate.pairwiseAlignment <- function(mpiOutput, pattern, subject) { value <- mpiOutput[[1]] value@score <- unlist(lapply(mpiOutput, score)) value@pattern@unaligned <- pattern value@pattern@range <- do.call(c, lapply(mpiOutput, function(x) x@pattern@range)) value@pattern@mismatch <- do.call(c, lapply(mpiOutput, function(x) x@pattern@mismatch)) value@pattern@indel <- do.call(c, lapply(mpiOutput, function(x) x@pattern@indel)) value@subject@unaligned <- subject value@subject@range <- do.call(c, lapply(mpiOutput, function(x) x@subject@range)) value@subject@mismatch <- do.call(c, lapply(mpiOutput, function(x) x@subject@mismatch)) value@subject@indel <- do.call(c, lapply(mpiOutput, function(x) x@subject@indel)) value } mpi.XStringSet.pairwiseAlignment <- function(pattern, subject, type = 
"global", substitutionMatrix = NULL, gapOpening = 10, gapExtension = 4, scoreOnly = FALSE) { n <- length(pattern) if (n > 1 && is.loaded("mpi_comm_size")) { ## 'get()' are to quieten R CMD check, and for no other reason mpi.comm.size <- get("mpi.comm.size", mode="function") mpi.remote.exec <- get("mpi.remote.exec", mode="function") mpi.parLapply <- get("mpi.parLapply", mode="function") k <- min(mpi.comm.size() - 1, n) useMpi <- (k > 1) } else { useMpi <- FALSE } ## Use of Rmpi temporarily disabled. #if (useMpi) { if (FALSE) { perNode <- n %/% k subsets <- vector("list", k) for (i in seq_len(k)) { indices <- ((i-1)*perNode+1):ifelse(i < k, i*perNode, n) if (length(subject) == 1) { subsets[[i]] <- list(pattern = XStringSet(seqtype(pattern), as.character(pattern[indices])), subject = subject) } else { subsets[[i]] <- list(pattern = XStringSet(seqtype(pattern), as.character(pattern[indices])), subject = XStringSet(seqtype(subject), as.character(subject[indices]))) } } ## Commented out for now to quieten BiocCheck. Uncomment when ## re-enabling Rmpi. 
#mpi.remote.exec(library(pwalign), ret = FALSE) mpiOutput <- mpi.parLapply(subsets, function(x, type = "global", substitutionMatrix = NULL, gapOpening = 10, gapExtension = 4, scoreOnly = FALSE) { output <- XStringSet.pairwiseAlignment(pattern = x$pattern, subject = x$subject, type = type, substitutionMatrix = substitutionMatrix, gapOpening = gapOpening, gapExtension = gapExtension, scoreOnly = scoreOnly) if (!scoreOnly) { output@pattern@unaligned <- BStringSet("") output@subject@unaligned <- BStringSet("") } output }, type = type, substitutionMatrix = substitutionMatrix, gapOpening = gapOpening, gapExtension = gapExtension, scoreOnly = scoreOnly) if (scoreOnly) { value <- unlist(mpiOutput) } else { value <- mpi.collate.pairwiseAlignment(mpiOutput, pattern, subject) } } else { value <- XStringSet.pairwiseAlignment(pattern = pattern, subject = subject, type = type, substitutionMatrix = substitutionMatrix, gapOpening = gapOpening, gapExtension = gapExtension, scoreOnly = scoreOnly) } value } mpi.QualityScaledXStringSet.pairwiseAlignment <- function(pattern, subject, type = "global", fuzzyMatrix = NULL, gapOpening = 10, gapExtension = 4, scoreOnly = FALSE) { n <- length(pattern) if (n > 1 && is.loaded("mpi_comm_size")) { ## 'get()' are to quieten R CMD check, and for no other reason mpi.comm.size <- get("mpi.comm.size", mode="function") mpi.remote.exec <- get("mpi.remote.exec", mode="function") mpi.parLapply <- get("mpi.parLapply", mode="function") k <- min(mpi.comm.size() - 1, n) useMpi <- (k > 1) } else { useMpi <- FALSE } ## Use of Rmpi temporarily disabled. 
#if (useMpi) { if (FALSE) { perNode <- n %/% k subsets <- vector("list", k) for (i in seq_len(k)) { indices <- ((i-1)*perNode+1):ifelse(i < k, i*perNode, n) if (length(subject) == 1) { subsets[[i]] <- list(pattern = QualityScaledXStringSet(XStringSet(seqtype(pattern), as.character(pattern[indices])), do.call(class(quality(pattern)), list(as.character(quality(pattern[indices]))))), subject = subject) } else { subsets[[i]] <- list(pattern = QualityScaledXStringSet(XStringSet(seqtype(pattern), as.character(pattern[indices])), do.call(class(quality(pattern)), list(as.character(quality(pattern[indices]))))), subject = QualityScaledXStringSet(XStringSet(seqtype(subject), as.character(subject[indices])), do.call(class(quality(subject)), list(as.character(quality(subject[indices])))))) } } ## Commented out for now to quieten BiocCheck. Uncomment when ## re-enabling Rmpi. #mpi.remote.exec(library(pwalign), ret = FALSE) mpiOutput <- mpi.parLapply(subsets, function(x, type = "global", fuzzyMatrix = NULL, gapOpening = 10, gapExtension = 4, scoreOnly = FALSE) { output <- QualityScaledXStringSet.pairwiseAlignment(pattern = x$pattern, subject = x$subject, type = type, fuzzyMatrix = fuzzyMatrix, gapOpening = gapOpening, gapExtension = gapExtension, scoreOnly = scoreOnly) if (!scoreOnly) { output@pattern@unaligned <- BStringSet("") output@subject@unaligned <- BStringSet("") } output }, type = type, fuzzyMatrix = fuzzyMatrix, gapOpening = gapOpening, gapExtension = gapExtension, scoreOnly = scoreOnly) if (scoreOnly) { value <- unlist(mpiOutput) } else { value <- mpi.collate.pairwiseAlignment(mpiOutput, pattern, subject) } } else { value <- QualityScaledXStringSet.pairwiseAlignment(pattern = pattern, subject = subject, type = type, fuzzyMatrix = fuzzyMatrix, gapOpening = gapOpening, gapExtension = gapExtension, scoreOnly = scoreOnly) } value } ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ### pairwiseAlignment() generic and methods. 
### ### We want pairwiseAlignment() to work when 'pattern' and 'subject' are any ### of the 4 following objects: character vector, XString, XStringSet, and ### QualityScaledXStringSet. With the 4 methods defined below, we cover the 16 ### type pairs we want to support (and more). ### TODO: Maybe consider making pairwiseAlignment() just an ordinary function? ### setGeneric("pairwiseAlignment", function(pattern, subject, ...) standardGeneric("pairwiseAlignment") ) setMethod("pairwiseAlignment", c("ANY", "ANY"), function(pattern, subject, patternQuality=PhredQuality(22L), subjectQuality=PhredQuality(22L), type="global", substitutionMatrix=NULL, fuzzyMatrix=NULL, gapOpening=10, gapExtension=4, scoreOnly=FALSE) { ## Turn each of 'pattern' and 'subject' into an instance of one of ## the 4 direct concrete subclasses of the XStringSet virtual class. pattern_seqtype <- try(seqtype(pattern), silent=TRUE) if (is(pattern_seqtype, "try-error")) pattern_seqtype <- "B" subject_seqtype <- try(seqtype(subject), silent=TRUE) if (is(subject_seqtype, "try-error")) subject_seqtype <- "B" if (pattern_seqtype == "B") pattern_seqtype <- subject_seqtype if (subject_seqtype == "B") subject_seqtype <- pattern_seqtype pattern <- XStringSet(pattern_seqtype, pattern) subject <- XStringSet(subject_seqtype, subject) if (!is.null(substitutionMatrix)) { mpi.XStringSet.pairwiseAlignment(pattern, subject, type=type, substitutionMatrix=substitutionMatrix, gapOpening=gapOpening, gapExtension=gapExtension, scoreOnly=scoreOnly) } else { pattern <- QualityScaledXStringSet(pattern, patternQuality) subject <- QualityScaledXStringSet(subject, subjectQuality) mpi.QualityScaledXStringSet.pairwiseAlignment(pattern, subject, type=type, fuzzyMatrix=fuzzyMatrix, gapOpening=gapOpening, gapExtension=gapExtension, scoreOnly=scoreOnly) } } ) setMethod("pairwiseAlignment", c("ANY", "QualityScaledXStringSet"), function(pattern, subject, patternQuality=PhredQuality(22L), type="global", substitutionMatrix=NULL, 
fuzzyMatrix=NULL, gapOpening=10, gapExtension=4, scoreOnly=FALSE) { if (is.character(pattern)) { pattern <- XStringSet(seqtype(subject), pattern) } else { pattern <- as(pattern, "XStringSet") } if (!is.null(substitutionMatrix)) { subject <- as(subject, "XStringSet") mpi.XStringSet.pairwiseAlignment(pattern, subject, type=type, substitutionMatrix=substitutionMatrix, gapOpening=gapOpening, gapExtension=gapExtension, scoreOnly=scoreOnly) } else { pattern <- QualityScaledXStringSet(pattern, patternQuality) mpi.QualityScaledXStringSet.pairwiseAlignment(pattern, subject, type=type, fuzzyMatrix=fuzzyMatrix, gapOpening=gapOpening, gapExtension=gapExtension, scoreOnly=scoreOnly) } } ) setMethod("pairwiseAlignment", c("QualityScaledXStringSet", "ANY"), function(pattern, subject, subjectQuality=PhredQuality(22L), type="global", substitutionMatrix=NULL, fuzzyMatrix=NULL, gapOpening=10, gapExtension=4, scoreOnly=FALSE) { if (is.character(subject)) { subject <- XStringSet(seqtype(pattern), subject) } else { subject <- as(subject, "XStringSet") } if (!is.null(substitutionMatrix)) { pattern <- as(pattern, "XStringSet") mpi.XStringSet.pairwiseAlignment(pattern, subject, type=type, substitutionMatrix=substitutionMatrix, gapOpening=gapOpening, gapExtension=gapExtension, scoreOnly=scoreOnly) } else { subject <- QualityScaledXStringSet(subject, subjectQuality) mpi.QualityScaledXStringSet.pairwiseAlignment(pattern, subject, type=type, fuzzyMatrix=fuzzyMatrix, gapOpening=gapOpening, gapExtension=gapExtension, scoreOnly=scoreOnly) } } ) setMethod("pairwiseAlignment", c("QualityScaledXStringSet", "QualityScaledXStringSet"), function(pattern, subject, type="global", substitutionMatrix=NULL, fuzzyMatrix=NULL, gapOpening=10, gapExtension=4, scoreOnly=FALSE) { if (!is.null(substitutionMatrix)) { pattern <- as(pattern, "XStringSet") subject <- as(subject, "XStringSet") mpi.XStringSet.pairwiseAlignment(pattern, subject, type=type, substitutionMatrix=substitutionMatrix, gapOpening=gapOpening, 
gapExtension=gapExtension, scoreOnly=scoreOnly) } else { mpi.QualityScaledXStringSet.pairwiseAlignment(pattern, subject, type=type, fuzzyMatrix=fuzzyMatrix, gapOpening=gapOpening, gapExtension=gapExtension, scoreOnly=scoreOnly) } } ) pwalign/R/PairwiseAlignments-class.R0000644000175100017510000004420714614311433020476 0ustar00biocbuildbiocbuild### ========================================================================= ### PairwiseAlignments objects ### ------------------------------------------------------------------------- ### ### A PairwiseAlignments object contains two aligned XStringSet objects. ### setClass("PairwiseAlignments", contains="Vector", representation( pattern="AlignedXStringSet0", # of length N subject="AlignedXStringSet0", # of length N score="numeric", # of length N type="character", gapOpening="numeric", gapExtension="numeric" ) ) ### Combine the new "parallel slots" with those of the parent class. Make ### sure to put the new parallel slots **first**. See R/Vector-class.R file ### in the S4Vectors package for what slots should or should not be considered ### "parallel". setMethod("parallel_slot_names", "PairwiseAlignments", function(x) c("pattern", "subject", "score", callNextMethod()) ) ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ### Constructors. 
### newPairwiseAlignments <- function(pattern, subject, type = "global", substitutionMatrix = NULL, gapOpening = 0, gapExtension = 1, baseClass = "BString", pwaClass = "PairwiseAlignments") { seqtype <- substr(baseClass, 1, nchar(baseClass) - 6) # remove "String" suffix getMismatches <- function(x) { whichMismatches <- which(x[["values"]] == "?") if (length(whichMismatches) == 0) { value <- integer(0) } else { start <- cumsum(x[["lengths"]])[whichMismatches] end <- start + (x[["lengths"]][whichMismatches] - 1L) value <- eval(parse(text = paste("c(", paste(start, ":", end, sep = "", collapse = ", "), ")"))) } IntegerList(value) } getRange <- function(x) { if (!(x[["values"]][1] %in% c("-", "+"))) { start <- 1L } else if (length(x[["values"]]) == 1) { start <- integer(0) } else { start <- x[["lengths"]][1] + 1L } if (!(x[["values"]][length(x[["values"]])] %in% c("-", "+"))) { end <- sum(x[["lengths"]]) } else if (length(x[["values"]]) == 1) { end <- integer(0) } else { end <- sum(x[["lengths"]][-length(x[["lengths"]])]) } IRanges(start = start, end = end) } getIndels <- function(x, indelChar) { if (x[["values"]][1] %in% c("-", "+")) { x[["values"]] <- x[["values"]][-1] x[["lengths"]] <- x[["lengths"]][-1] } if (x[["values"]][length(x[["values"]])] %in% c("-", "+")) { x[["values"]] <- x[["values"]][-length(x[["values"]])] x[["lengths"]] <- x[["lengths"]][-length(x[["lengths"]])] } isIndels <- (x[["values"]] == indelChar) if (!any(isIndels)) IRangesList(IRanges(integer(0), integer(0))) else IRangesList(IRanges( cumsum(c(1L, ifelse(isIndels, 0L, x[["lengths"]])[-length(x[["lengths"]])]))[isIndels], width = x[["lengths"]][isIndels])) } if (length(pattern) != 1 || length(subject) != 1) stop("'pattern' and 'subject' must both be of length 1") if (nchar(pattern) != nchar(subject)) stop("'pattern' and 'subject' must have the same number of characters") type <- match.arg(type, c("global", "local", "overlap", "global-local", "local-global", "subjectOverlap", "patternOverlap")) 
if (type == "subjectOverlap") { warning("type = 'subjectOverlap' has been renamed type = 'global-local'") type <- "global-local" } if (type == "patternOverlap") { warning("type = 'patternOverlap' has been renamed type = 'local-global'") type <- "local-global" } gapOpening <- as.double(abs(gapOpening)) if (length(gapOpening) != 1 || is.na(gapOpening)) stop("'gapOpening' must be a non-negative numeric vector of length 1") gapExtension <- as.double(abs(gapExtension)) if (length(gapExtension) != 1 || is.na(gapExtension)) stop("'gapExtension' must be a non-negative numeric vector of length 1") explodedPattern <- safeExplode(pattern) explodedSubject <- safeExplode(subject) degappedPattern <- explodedPattern[explodedPattern != "-"] degappedSubject <- explodedSubject[explodedSubject != "-"] availableLetters <- sort(unique(c(unique(degappedPattern), unique(degappedSubject)))) if (is.null(substitutionMatrix)) { substitutionMatrix <- diag(length(availableLetters)) - 1 dimnames(substitutionMatrix) <- list(availableLetters, availableLetters) } else if (is.character(substitutionMatrix)) { if (length(substitutionMatrix) != 1) stop("'substitutionMatrix' is a character vector of length != 1") tempMatrix <- substitutionMatrix substitutionMatrix <- try(getdata(tempMatrix), silent = TRUE) if (is(substitutionMatrix, "try-error")) stop("unknown scoring matrix \"", tempMatrix, "\"") } if (!is.matrix(substitutionMatrix) || !is.numeric(substitutionMatrix)) stop("'substitutionMatrix' must be a numeric matrix") if (!identical(rownames(substitutionMatrix), colnames(substitutionMatrix))) stop("row and column names differ for matrix 'substitutionMatrix'") if (is.null(rownames(substitutionMatrix))) stop("matrix 'substitutionMatrix' must have row and column names") if (any(duplicated(rownames(substitutionMatrix)))) stop("matrix 'substitutionMatrix' has duplicated row names") availableLetters <- intersect(availableLetters, rownames(substitutionMatrix)) substitutionMatrix <- 
        matrix(as.double(substitutionMatrix[availableLetters, availableLetters]),
               nrow = length(availableLetters),
               ncol = length(availableLetters),
               dimnames = list(availableLetters, availableLetters))

    comparison <- rle(safeExplode(compareStrings(pattern, subject)))
    whichPattern <- which(comparison[["values"]] != "-")
    patternRle <-
        structure(list(lengths = comparison[["lengths"]][whichPattern],
                       values = comparison[["values"]][whichPattern]),
                  class = "rle")
    whichSubject <- which(comparison[["values"]] != "+")
    subjectRle <-
        structure(list(lengths = comparison[["lengths"]][whichSubject],
                       values = comparison[["values"]][whichSubject]),
                  class = "rle")
    substitutionIndices <- (explodedPattern != "-") & (explodedSubject != "-")
    new(pwaClass,
        pattern =
            new("AlignedXStringSet",
                unaligned = XStringSet(seqtype, paste(degappedPattern, collapse = "")),
                range = getRange(patternRle),
                mismatch = getMismatches(patternRle),
                indel = getIndels(comparison, "-")),
        subject =
            new("AlignedXStringSet",
                unaligned = XStringSet(seqtype, paste(degappedSubject, collapse = "")),
                range = getRange(subjectRle),
                mismatch = getMismatches(subjectRle),
                indel = getIndels(comparison, "+")),
        type = type,
        score =
            sum(substitutionMatrix[
                    match(explodedPattern[substitutionIndices], availableLetters) +
                    length(availableLetters) *
                        (match(explodedSubject[substitutionIndices], availableLetters) - 1)]) +
            gapOpening * sum(comparison[["values"]] %in% c("+", "-")) +
            gapExtension * sum(comparison[["lengths"]][comparison[["values"]] %in% c("+", "-")]),
        gapOpening = gapOpening,
        gapExtension = gapExtension)
}

setGeneric("PairwiseAlignments",
    function(pattern, subject, type = "global", substitutionMatrix = NULL,
             gapOpening = 0, gapExtension = 1, ...)
    standardGeneric("PairwiseAlignments"))

setMethod("PairwiseAlignments", signature(pattern = "XString", subject = "XString"),
    function(pattern, subject, type = "global", substitutionMatrix = NULL,
             gapOpening = 0, gapExtension = 1)
    {
        if (seqtype(pattern) != seqtype(subject))
            stop("'pattern' and 'subject' must contain ",
                 "sequences of the same type")
        PairwiseAlignments(as.character(pattern), as.character(subject),
                           type = type, substitutionMatrix = substitutionMatrix,
                           gapOpening = gapOpening, gapExtension = gapExtension,
                           baseClass = xsbaseclass(pattern))
    }
)

setMethod("PairwiseAlignments", signature(pattern = "XStringSet", subject = "missing"),
    function(pattern, subject, type = "global", substitutionMatrix = NULL,
             gapOpening = 0, gapExtension = 1)
    {
        if (length(pattern) != 2)
            stop("'pattern' must be of length 2 when 'subject' is missing")
        if (diff(nchar(pattern)) != 0)
            stop("'pattern' elements must have the same number of characters")
        PairwiseAlignments(as.character(pattern[1]), as.character(pattern[2]),
                           type = type, substitutionMatrix = substitutionMatrix,
                           gapOpening = gapOpening, gapExtension = gapExtension,
                           baseClass = xsbaseclass(pattern))
    }
)

setMethod("PairwiseAlignments", signature(pattern = "character", subject = "missing"),
    function(pattern, subject, type = "global", substitutionMatrix = NULL,
             gapOpening = 0, gapExtension = 1, baseClass = "BString")
    {
        if (length(pattern) != 2)
            stop("'pattern' must be of length 2 when 'subject' is missing")
        if (diff(nchar(pattern)) != 0)
            stop("'pattern' elements must have the same number of characters")
        PairwiseAlignments(pattern[1], pattern[2],
                           type = type, substitutionMatrix = substitutionMatrix,
                           gapOpening = gapOpening, gapExtension = gapExtension,
                           baseClass = baseClass)
    }
)

setMethod("PairwiseAlignments", signature(pattern = "character", subject = "character"),
    function(pattern, subject, type = "global", substitutionMatrix = NULL,
             gapOpening = 0, gapExtension = 1, baseClass = "BString")
    {
        newPairwiseAlignments(pattern =
                              pattern, subject = subject, type = type,
                              substitutionMatrix = substitutionMatrix,
                              gapOpening = gapOpening, gapExtension = gapExtension,
                              baseClass = baseClass, pwaClass = "PairwiseAlignments")
    }
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### Validity.
###

.valid.PairwiseAlignments <- function(object)
{
    message <- NULL
    if (!identical(class(unaligned(pattern(object))), class(unaligned(subject(object)))))
        message <- c(message, "'unaligned(pattern)' and 'unaligned(subject)' must be XString objects of the same base type")
    if (length(object@type) != 1 || !(object@type %in% c("global", "local", "overlap", "global-local", "local-global")))
        message <- c(message, "'type' must be one of 'global', 'local', 'overlap', 'global-local', or 'local-global'")
    if (!isSingleNumber(object@gapOpening) || object@gapOpening < 0)
        message <- c(message, "'gapOpening' must be a non-negative numeric vector of length 1")
    if (!isSingleNumber(object@gapExtension) || object@gapExtension < 0)
        message <- c(message, "'gapExtension' must be a non-negative numeric vector of length 1")
    message
}

setValidity("PairwiseAlignments",
    function(object)
    {
        problems <- .valid.PairwiseAlignments(object)
        if (is.null(problems)) TRUE else problems
    }
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### get_aligned_pattern()
###
### 'x_pattern' and 'x_subject' must come from the "pattern" and "subject"
### slots of a PairwiseAlignments object of length 1. They're both expected
### to be AlignedXStringSet0 objects of length 1.
### 'global.pattern' and 'global.subject' must indicate whether the pattern
### and/or subject were globally aligned or not.
get_aligned_pattern <- function(x_pattern, x_subject,
                                global.pattern=TRUE, global.subject=TRUE,
                                check=FALSE)
{
    if (!is(x_pattern, "AlignedXStringSet0") || length(x_pattern) != 1L)
        stop("'x_pattern' must be an AlignedXStringSet0 object of length 1")
    if (!is(x_subject, "AlignedXStringSet0") || length(x_subject) != 1L)
        stop("'x_subject' must be an AlignedXStringSet0 object of length 1")
    aligned_pattern <- aligned(x_pattern)[[1L]]  # XString object
    ## Sanity check:
    if (check) {
        aligned_subject <- aligned(x_subject)[[1L]]  # XString object
        stopifnot(identical(length(aligned_pattern), length(aligned_subject)))
    }
    ans <- aligned_pattern
    original_pattern <- x_pattern@unaligned[[1L]]  # XString object
    ## We only need 'original_subject' for its length.
    original_subject <- x_subject@unaligned[[1L]]  # XString object
    if (global.pattern) {
        start1 <- start(x_pattern@range)
        if (start1 > 1L) {
            prefix1 <- subseq(original_pattern, end=start1 - 1L)
            ans <- c(prefix1, ans)
        }
        end1 <- end(x_pattern@range)
        if (end1 < length(original_pattern)) {
            suffix1 <- subseq(original_pattern, start=end1 + 1L)
            ans <- c(ans, suffix1)
        }
    }
    if (global.subject) {
        start2 <- start(x_subject@range)
        if (start2 > 1L) {
            prefix2 <- rep.int(XString(seqtype(ans), "-"), start2 - 1L)
            ans <- c(prefix2, ans)
        }
        end2 <- end(x_subject@range)
        if (end2 < length(original_subject)) {
            suffix2 <- rep.int(XString(seqtype(ans), "-"),
                               length(original_subject) - end2)
            ans <- c(ans, suffix2)
        }
    }
    ans
}


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### Accessor methods.
###

setMethod("pattern", "PairwiseAlignments", function(x) x@pattern)
setMethod("subject", "PairwiseAlignments", function(x) x@subject)

setMethod("type", "PairwiseAlignments", function(x) x@type)

setGeneric("alignedPattern", function(x) standardGeneric("alignedPattern"))
setGeneric("alignedSubject", function(x) standardGeneric("alignedSubject"))

setMethod("alignedPattern", "PairwiseAlignments",
    function(x)
    {
        x_pattern <- pattern(x)
        x_subject <- subject(x)
        x_type <- type(x)
        global.pattern <- x_type %in% c("global", "global-local")
        global.subject <- x_type %in% c("global", "local-global")
        ans <- do.call(c,
            lapply(seq_along(x),
                function(i) {
                    as(get_aligned_pattern(x_pattern[i], x_subject[i],
                                           global.pattern, global.subject),
                       "XStringSet")
                }
            )
        )
        names(ans) <- names(x_pattern@unaligned)
        ans
    }
)

setMethod("alignedSubject", "PairwiseAlignments",
    function(x)
    {
        x_pattern <- pattern(x)
        x_subject <- subject(x)
        x_type <- type(x)
        global.pattern <- x_type %in% c("global", "global-local")
        global.subject <- x_type %in% c("global", "local-global")
        ans <- do.call(c,
            lapply(seq_along(x),
                function(i) {
                    as(get_aligned_pattern(x_subject[i], x_pattern[i],
                                           global.subject, global.pattern),
                       "XStringSet")
                }
            )
        )
        names(ans) <- names(x_subject@unaligned)
        ans
    }
)

setMethod("score", "PairwiseAlignments", function(x) x@score)

setMethod("insertion", "PairwiseAlignments", function(x) indel(subject(x)))
setMethod("deletion", "PairwiseAlignments", function(x) indel(pattern(x)))
setMethod("indel", "PairwiseAlignments",
    function(x) new("InDel", insertion = insertion(x), deletion = deletion(x)))
setMethod("nindel", "PairwiseAlignments",
    function(x) new("InDel", insertion = nindel(subject(x)), deletion = nindel(pattern(x))))

setMethod("nchar", "PairwiseAlignments",
    function(x, type="chars", allowNA=FALSE) nchar(subject(x)))

setMethod("seqtype", "PairwiseAlignments", function(x) seqtype(subject(x)))


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "show" method
###
### TODO: Maybe make the "show" method format the alignment in a SGD fashion
### i.e. split in 60-letter blocks and use the "|" character to highlight
### exact matches.
###

.show_PairwiseAlignments <- function(x)
{
    x_len <- length(x)
    if (x_len == 0L)
        cat("Empty ")
    cat(switch(type(x), "global"="Global", "overlap"="Overlap",
               "local"="Local", "global-local" = "Global-Local",
               "local-global"="Local-Global"),
        " ", class(x), sep="")
    if (x_len == 0L) {
        cat("\n")
        return()
    }
    cat(" (1 of ", x_len, ")\n", sep="")
    x1 <- x[1L]
    x_type <- type(x)
    global.pattern <- x_type %in% c("global", "global-local")
    global.subject <- x_type %in% c("global", "local-global")
    p1start <- if (global.pattern) "" else
                   paste0("[", start(x1@pattern@range), "]")
    s1start <- if (global.subject) "" else
                   paste0("[", start(x1@subject@range), "]")
    width <- max(nchar(p1start), nchar(s1start))
    if (width != 0L) {
        width <- width + 1L
        p1start <- format(p1start, justify="right", width=width)
        s1start <- format(s1start, justify="right", width=width)
    }
    width <- getOption("width") - 9L - width
    pattern1 <- toSeqSnippet(alignedPattern(x1)[[1L]], width)
    subject1 <- toSeqSnippet(alignedSubject(x1)[[1L]], width)
    cat("pattern:", p1start, " ", add_colors(pattern1), "\n", sep="")
    cat("subject:", s1start, " ", add_colors(subject1), "\n", sep="")
    cat("score:", score(x1), "\n")
}

setMethod("show", "PairwiseAlignments",
    function(object) .show_PairwiseAlignments(object)
)

pwalign/R/PairwiseAlignments-io.R
### =========================================================================
### Input/output of PairwiseAlignments objects
### -------------------------------------------------------------------------
###
### Only output is supported at the moment.
###

.pre2postaligned <- function(pos, axset)
{
    if (!is(axset, "AlignedXStringSet0") || length(axset) != 1L)
        stop("'axset' must be an AlignedXStringSet0 object of length 1")
    ## .postaligned_gap_ranges() below sometimes needs to call
    ## .pre2postaligned() with a 'pos' that is 'end(axset@range) + 1L'.
    ## This happens when there is a gap at the end of the alignment like in
    ##   x <- DNAString("TCAACTTAACTT")
    ##   y <- DNAString("GGGCAACAACGGG")
    ##   pa <- pairwiseAlignment(x, y, type="global-local",
    ##                           gapOpening=-2, gapExtension=-1)
    ##   writePairwiseAlignments(pa)
    stopifnot(all(pos >= start(axset@range)),
              all(pos <= end(axset@range) + 1L))
    lkup <- integer(width(axset@range) + 1L)
    gap_ranges <- indel(axset)[[1L]]
    lkup[start(gap_ranges)] <- width(gap_ranges)
    lkup <- cumsum(lkup + 1L)
    lkup[pos - start(axset@range) + 1L]
}

.test.pre2postaligned <- function(axset)
{
    if (!is(axset, "AlignedXStringSet0") || length(axset) != 1L)
        stop("'axset' must be an AlignedXStringSet0 object of length 1")
    target <- subseq(unaligned(axset)[[1L]],
                     start(axset@range), end(axset@range))
    pos <- as.integer(axset@range)
    current <- aligned(axset)[[1L]][.pre2postaligned(pos, axset)]
    identical(as.character(target), as.character(current))
}

.postaligned_gap_ranges <- function(axset)
{
    if (!is(axset, "AlignedXStringSet0") || length(axset) != 1L)
        stop("'axset' must be an AlignedXStringSet0 object of length 1")
    gap_ranges <- indel(axset)[[1L]]
    prealigned_gap_start <- start(gap_ranges) + start(axset@range) - 1L
    postaligned_gap_start <- .pre2postaligned(prealigned_gap_start, axset) -
                             width(gap_ranges)
    IRanges(postaligned_gap_start, width=width(gap_ranges))
}

.postaligned_match_ranges <- function(axset)
{
    if (!is(axset, "AlignedXStringSet0") || length(axset) != 1L)
        stop("'axset' must be an AlignedXStringSet0 object of length 1")
    aligned_axset <- aligned(axset)
    postaligned_width <- width(aligned_axset)
    prealigned_mismatches <- mismatch(axset)[[1L]]
    is_in_range <- prealigned_mismatches >= start(axset@range) &
                   prealigned_mismatches <= end(axset@range)
    prealigned_mismatches <- prealigned_mismatches[is_in_range]
    postaligned_mismatches <- .pre2postaligned(prealigned_mismatches, axset)
    postaligned_mismatch_ranges <- as(postaligned_mismatches, "IRanges")
    postaligned_gap_ranges <- .postaligned_gap_ranges(axset)
    postaligned_mismatches_or_indels <- union(postaligned_mismatch_ranges,
                                              postaligned_gap_ranges)
    setdiff(IRanges(1L, postaligned_width), postaligned_mismatches_or_indels)
}

.make_pipes <- function(x)
{
    if (!is(x, "PairwiseAlignments") || length(x) != 1L)
        stop("'x' must be a PairwiseAlignments object of length 1")
    x_pattern <- pattern(x)  # QualityAlignedXStringSet object
    x_subject <- subject(x)  # QualityAlignedXStringSet object
    postaligned_pattern <- aligned(x_pattern)[[1L]]
    postaligned_subject <- aligned(x_subject)[[1L]]
    postaligned_pattern_match_ranges <- .postaligned_match_ranges(x_pattern)
    postaligned_subject_match_ranges <- .postaligned_match_ranges(x_subject)
    postaligned_match_ranges <- intersect(postaligned_pattern_match_ranges,
                                          postaligned_subject_match_ranges)
    ## Sanity check:
    ii <- IRanges:::unlist_as_integer(postaligned_match_ranges)
    if (!identical(as.character(postaligned_pattern[ii]),
                   as.character(postaligned_subject[ii])))
        stop("pwalign internal error: mismatches and/or indels ",
             "reported in 'pattern(x)' and 'subject(x)' are inconsistent")
    tmp <- rep.int(" ", length(postaligned_pattern))
    tmp[ii] <- "|"
    BString(paste(tmp, collapse=""))
}

.make_blank_BString <- function(nblank)
{
    rep.int(BString(" "), nblank)
}

.makePostalignedSeqs <- function(x)
{
    if (!is(x, "PairwiseAlignments") || length(x) != 1L)
        stop("'x' must be a PairwiseAlignments object of length 1")
    x_type <- type(x)
    global.pattern <- x_type %in% c("global", "global-local")
    global.subject <- x_type %in% c("global", "local-global")
    aligned_pattern <- get_aligned_pattern(x@pattern, x@subject,
                                           global.pattern, global.subject,
                                           check=TRUE)
    aligned_subject <- get_aligned_pattern(x@subject, x@pattern,
                                           global.subject, global.pattern)
    ans_seqs <- c(as(aligned_pattern, "XStringSet"),
                  as(aligned_subject, "XStringSet"))

    original_pattern <- x@pattern@unaligned
    pattern_name <- names(original_pattern)
    if (is.null(pattern_name))
        pattern_name <- ""
    original_subject <- x@subject@unaligned
    subject_name <- names(original_subject)
    if (is.null(subject_name))
        subject_name <- ""
    names(ans_seqs) <- c(pattern_name, subject_name)

    ans_ranges <- c(x@pattern@range, x@subject@range)

    ans_pipes <- .make_pipes(x)
    if (global.pattern) {
        original_pattern <- original_pattern[[1L]]
        start1 <- start(ans_ranges)[1L]
        if (start1 > 1L) {
            prefix <- .make_blank_BString(start1 - 1L)
            ans_pipes <- c(prefix, ans_pipes)
            start(ans_ranges)[1L] <- 1L
        }
        end1 <- end(ans_ranges)[1L]
        if (end1 < length(original_pattern)) {
            suffix <- .make_blank_BString(length(original_pattern) - end1)
            ans_pipes <- c(ans_pipes, suffix)
            end(ans_ranges)[1L] <- length(original_pattern)
        }
    }
    if (global.subject) {
        original_subject <- original_subject[[1L]]
        start2 <- start(ans_ranges)[2L]
        if (start2 > 1L) {
            prefix <- .make_blank_BString(start2 - 1L)
            ans_pipes <- c(prefix, ans_pipes)
            start(ans_ranges)[2L] <- 1L
        }
        end2 <- end(ans_ranges)[2L]
        if (end2 < length(original_subject)) {
            suffix <- .make_blank_BString(length(original_subject) - end2)
            ans_pipes <- c(ans_pipes, suffix)
            end(ans_ranges)[2L] <- length(original_subject)
        }
    }

    list(ans_seqs, ans_ranges, ans_pipes)
}

.writePairHeader <- function(x, alignment.length, Identity, Similarity, Gaps,
                             pattern.name="P1", subject.name="S1",
                             Matrix=NA, file="")
{
    if (!is(x, "PairwiseAlignments") || length(x) != 1L)
        stop("'x' must be a PairwiseAlignments object of length 1")
    if (!isSingleNumber(alignment.length))
        stop("'alignment.length' must be a single number")
    if (!is.integer(alignment.length))
        alignment.length <- as.integer(alignment.length)
    if (!isSingleStringOrNA(Matrix))
        stop("'Matrix' must be a single string or NA")

    Gap_penalty <- sprintf("%.1f", (x@gapOpening + x@gapExtension))
    Extend_penalty <-
        sprintf("%.1f", x@gapExtension)
    prettyPercentage <- function(ratio)
        sprintf("%.1f%%", round(ratio * 100, digits=1L))
    Identity_percentage <- prettyPercentage(Identity / alignment.length)
    Identity <- paste(format(Identity, width=7L), "/", alignment.length,
                      " (", Identity_percentage, ")", sep="")
    Similarity_percentage <- prettyPercentage(Similarity / alignment.length)
    Similarity <- paste(format(Similarity, width=5L), "/", alignment.length,
                        " (", Similarity_percentage, ")", sep="")
    Gaps_percentage <- prettyPercentage(Gaps / alignment.length)
    Gaps <- paste(format(Gaps, width=11L), "/", alignment.length,
                  " (", Gaps_percentage, ")", sep="")
    Score <- x@score

    cat("#=======================================\n", file=file)
    cat("#\n", file=file)
    cat("# Aligned_sequences: 2\n", file=file)
    cat("# 1: ", pattern.name, "\n", sep="", file=file)
    cat("# 2: ", subject.name, "\n", sep="", file=file)
    cat("# Matrix: ", Matrix, "\n", sep="", file=file)
    cat("# Gap_penalty: ", Gap_penalty, "\n", sep="", file=file)
    cat("# Extend_penalty: ", Extend_penalty, "\n", sep="", file=file)
    cat("#\n", file=file)
    cat("# Length: ", alignment.length, "\n", sep="", file=file)
    cat("# Identity: ", Identity, "\n", sep="", file=file)
    cat("# Similarity: ", Similarity, "\n", sep="", file=file)
    cat("# Gaps: ", Gaps, "\n", sep="", file=file)
    cat("# Score: ", Score, "\n", sep="", file=file)
    cat("#\n#\n", file=file)
    cat("#=======================================\n", file=file)
}

.writePairSequences <- function(top.string, bottom.string, middle.string,
                                top.name="P1", bottom.name="S1",
                                top.start=1L, bottom.start=1L,
                                block.width=50, file="")
{
    if (!isSingleNumber(block.width))
        stop("'block.width' must be a single number")
    if (!is.integer(block.width))
        block.width <- as.integer(block.width)
    alignment_length <- length(top.string)
    start_width <- max(nchar(as.character(top.start + alignment_length)),
                       nchar(as.character(bottom.start + alignment_length)))
    name_width <- max(20L - start_width - 1L,
                      nchar(top.name),
                      nchar(bottom.name))
    nblock <- alignment_length %/% block.width
    if (alignment_length %% block.width != 0L)
        nblock <- nblock + 1L
    start1 <- top.start
    start3 <- bottom.start
    for (i in seq_len(nblock)) {
        to <- i * block.width
        from <- to - block.width + 1L
        if (to > alignment_length)
            to <- alignment_length
        string1 <- as.character(subseq(top.string, from, to))
        string2 <- as.character(subseq(middle.string, from, to))
        string3 <- as.character(subseq(bottom.string, from, to))
        if (i != 1L)
            cat("\n", file=file)
        ## 1st line
        cat(format(top.name, width=name_width), " ",
            format(start1, justify="right", width=start_width), " ",
            string1, file=file, sep="")
        end1 <- start1 + to - from - countPattern("-", string1)
        cat(format(end1, justify="right", width=7L), "\n", sep="", file=file)
        ## 2nd line
        cat(format("", width=name_width), " ",
            format("", width=start_width), " ",
            string2, "\n", file=file, sep="")
        ## 3rd line
        cat(format(bottom.name, width=name_width), " ",
            format(start3, justify="right", width=start_width), " ",
            string3, file=file, sep="")
        end3 <- start3 + to - from - countPattern("-", string3)
        cat(format(end3, justify="right", width=7L), "\n", sep="", file=file)
        start1 <- end1 + 1L
        start3 <- end3 + 1L
    }
}

writePairwiseAlignments <- function(x, file="", Matrix=NA, block.width=50)
{
    if (!is(x, "PairwiseAlignments"))
        stop("'x' must be a PairwiseAlignments object")
    if (isSingleString(file)) {
        if (file == "") {
            file <- stdout()
        } else if (substring(file, 1L, 1L) == "|") {
            file <- pipe(substring(file, 2L), "w")
            on.exit(close(file))
        } else {
            file <- file(file, "w")
            on.exit(close(file))
        }
    } else if (!is(file, "connection")) {
        stop("'file' must be a single string or a connection object")
    }
    pkgversion <- as.character(packageVersion("pwalign"))
    Program <- paste("pwalign (version ", pkgversion, "), ",
                     "a Bioconductor package", sep="")
    cat("########################################\n", file=file)
    cat("# Program: ", Program, "\n", sep="", file=file)
    cat("# Rundate: ", date(), "\n", sep="",
        file=file)
    cat("########################################\n", file=file)
    x_len <- length(x)
    if (x_len == 0L)
        warning("'x' is an empty PairwiseAlignments object ",
                "-> nothing to write")
    #else if (x_len >= 2L)
    #    warning("'x' contains more than 1 pairwise alignment")
    x_type <- type(x)
    is_pattern_global <- x_type %in% c("global", "global-local")
    is_subject_global <- x_type %in% c("global", "local-global")
    if (length(unaligned(subject(x))) != 1L) {
        bottom_name0 <- ""
    } else {
        bottom_name0 <- "S1"
    }
    for (i in seq_len(x_len)) {
        #if (i != 1L)
        #    cat("\n\n", file=file)
        xi <- x[i]
        postaligned_seqs <- .makePostalignedSeqs(xi)
        seqs <- postaligned_seqs[[1L]]
        ranges <- postaligned_seqs[[2L]]
        pipes <- postaligned_seqs[[3L]]
        name1 <- names(seqs)[1L]
        if (name1 == "")
            name1 <- paste("P", i, sep="")
        name2 <- names(seqs)[2L]
        if (name2 == "") {
            if (bottom_name0 == "") {
                name2 <- paste("S", i, sep="")
            } else {
                name2 <- bottom_name0
            }
        }
        Identity <- countPattern("|", pipes)
        Gaps <- sum(width(union(ranges(matchPattern("-", seqs[[1L]])),
                                ranges(matchPattern("-", seqs[[2L]])))))
        .writePairHeader(xi, length(seqs[[1L]]), Identity, NA, Gaps,
                         pattern.name=name1, subject.name=name2,
                         Matrix=Matrix, file=file)
        cat("\n", file=file)
        start1 <- start(ranges)[1L]
        start2 <- start(ranges[2L])
        .writePairSequences(seqs[[1L]], seqs[[2L]], pipes,
                            top.name=name1, bottom.name=name2,
                            top.start=start1, bottom.start=start2,
                            block.width=block.width, file=file)
        cat("\n\n", file=file)
    }
    cat("#---------------------------------------\n", file=file)
    cat("#---------------------------------------\n", file=file)
    invisible(NULL)
}

pwalign/R/PairwiseAlignmentsSingleSubject-class.R
### ==========================================================================
### PairwiseAlignmentsSingleSubject objects
### --------------------------------------------------------------------------
### A PairwiseAlignmentsSingleSubject object contains the result of the
### pairwise alignment of many patterns to one subject.
###

setClass("PairwiseAlignmentsSingleSubject",
    contains="PairwiseAlignments"
)

setClass("PairwiseAlignmentsSingleSubjectSummary",
    representation(
        type="character",
        score="numeric",
        nmatch="numeric",
        nmismatch="numeric",
        ninsertion="matrix",
        ndeletion="matrix",
        coverage="Rle",
        mismatchSummary="list"
    )
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### Constructors.
###

setGeneric("PairwiseAlignmentsSingleSubject",
    function(pattern, subject, type = "global", substitutionMatrix = NULL,
             gapOpening = 0, gapExtension = 1, ...)
    standardGeneric("PairwiseAlignmentsSingleSubject"))

setMethod("PairwiseAlignmentsSingleSubject", signature(pattern = "XString", subject = "XString"),
    function(pattern, subject, type = "global", substitutionMatrix = NULL,
             gapOpening = 0, gapExtension = 1)
    {
        if (seqtype(pattern) != seqtype(subject))
            stop("'pattern' and 'subject' must contain ",
                 "sequences of the same type")
        PairwiseAlignmentsSingleSubject(as.character(pattern), as.character(subject),
                                        type = type, substitutionMatrix = substitutionMatrix,
                                        gapOpening = gapOpening, gapExtension = gapExtension,
                                        baseClass = xsbaseclass(pattern))
    }
)

setMethod("PairwiseAlignmentsSingleSubject", signature(pattern = "XStringSet", subject = "missing"),
    function(pattern, subject, type = "global", substitutionMatrix = NULL,
             gapOpening = 0, gapExtension = 1)
    {
        if (length(pattern) != 2)
            stop("'pattern' must be of length 2 when 'subject' is missing")
        if (diff(nchar(pattern)) != 0)
            stop("'pattern' elements must have the same number of characters")
        PairwiseAlignmentsSingleSubject(as.character(pattern[1]), as.character(pattern[2]),
                                        type = type, substitutionMatrix = substitutionMatrix,
                                        gapOpening = gapOpening, gapExtension = gapExtension,
                                        baseClass = xsbaseclass(pattern))
    }
)

setMethod("PairwiseAlignmentsSingleSubject", signature(pattern = "character", subject = "missing"),
    function(pattern, subject, type = "global",
             substitutionMatrix = NULL,
             gapOpening = 0, gapExtension = 1, baseClass = "BString")
    {
        if (length(pattern) != 2)
            stop("'pattern' must be of length 2 when 'subject' is missing")
        if (diff(nchar(pattern)) != 0)
            stop("'pattern' elements must have the same number of characters")
        PairwiseAlignmentsSingleSubject(pattern[1], pattern[2],
                                        type = type, substitutionMatrix = substitutionMatrix,
                                        gapOpening = gapOpening, gapExtension = gapExtension,
                                        baseClass = baseClass)
    }
)

setMethod("PairwiseAlignmentsSingleSubject", signature(pattern = "character", subject = "character"),
    function(pattern, subject, type = "global", substitutionMatrix = NULL,
             gapOpening = 0, gapExtension = 1, baseClass = "BString")
    {
        newPairwiseAlignments(pattern = pattern, subject = subject, type = type,
                              substitutionMatrix = substitutionMatrix,
                              gapOpening = gapOpening, gapExtension = gapExtension,
                              baseClass = baseClass,
                              pwaClass = "PairwiseAlignmentsSingleSubject")
    }
)


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### Accessor methods.
###

### TODO: Support the 'width' argument.
setMethod("Views", signature = c(subject = "PairwiseAlignmentsSingleSubject"),
    function(subject, start=NULL, end=NULL, width=NULL, names=NULL)
    {
        if (!is.null(width))
            stop("\"Views\" method for PairwiseAlignmentsSingleSubject objects ",
                 "does not support the 'width' argument yet, sorry!")
        if (is.null(start))
            start <- NA
        if (all(is.na(start)))
            start <- start(subject(subject))
        else if (!is.numeric(start) || length(start) > 1)
            stop("'start' must be either NA or an integer vector of length 1")
        else
            start <- as.integer(start) + start(subject(subject))
        if (is.null(end))
            end <- NA
        if (all(is.na(end)))
            end <- end(subject(subject))
        else if (!is.numeric(end) || length(end) > 1)
            stop("'end' must be either NA or an integer vector of length 1")
        else
            end <- as.integer(end) + start(subject(subject))
        tmp <- unaligned(subject(subject))
        stopifnot(length(tmp) == 1L)  # sanity check
        ans_subject <- tmp[[1L]]
        Views(ans_subject, start=start, end=end, names=names)
    })


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "summary" method.
###

setMethod("summary", "PairwiseAlignmentsSingleSubject",
    function(object, weight=1L, ...)
    {
        if (!is.numeric(weight) || !(length(weight) %in% c(1, length(object))))
            stop("'weight' must be an integer vector with length 1 or 'length(object)'")
        if (!is.integer(weight))
            weight <- as.integer(weight)
        if (all(weight == 1))
            new("PairwiseAlignmentsSingleSubjectSummary",
                type = type(object),
                score = score(object),
                nmatch = nmatch(object),
                nmismatch = nmismatch(object),
                ninsertion = nindel(subject(object)),
                ndeletion = nindel(pattern(object)),
                coverage = coverage(object),
                mismatchSummary = mismatchSummary(object))
        else
            new("PairwiseAlignmentsSingleSubjectSummary",
                type = type(object),
                score = rep(score(object), weight),
                nmatch = rep(nmatch(object), weight),
                nmismatch = rep(nmismatch(object), weight),
                ninsertion = nindel(subject(object))[rep(seq_len(length(object)), weight), , drop = FALSE],
                ndeletion = nindel(pattern(object))[rep(seq_len(length(object)), weight), , drop = FALSE],
                coverage = coverage(object, weight = weight),
                mismatchSummary = mismatchSummary(object, weight = weight))
    })

setMethod("show", "PairwiseAlignmentsSingleSubjectSummary", function(object)
    {
        cat(switch(type(object), "global" = "Global", "overlap" = "Overlap",
                   "local" = "Local", "global-local" = "Global-Local",
                   "local-global" = "Local-Global"),
            " Single Subject Pairwise Alignments\n", sep = "")
        cat("Number of Alignments: ", length(score(object)), "\n", sep = "")
        cat("\nScores:\n")
        print(summary(score(object)))
        cat("\nNumber of matches:\n")
        print(summary(nmatch(object)))
        n <- min(nrow(mismatchSummary(object)[["subject"]]), 10)
        cat(paste("\nTop", n, "Mismatch Counts:\n"))
        mmtable <-
            mismatchSummary(object)[["subject"]][
                order(mismatchSummary(object)[["subject"]][["Count"]],
                      mismatchSummary(object)[["subject"]][["Probability"]],
                      decreasing = TRUE)[seq_len(n)],,drop=FALSE]
        rownames(mmtable) <- NULL
        print(mmtable)
    })

setMethod("type", "PairwiseAlignmentsSingleSubjectSummary", function(x) x@type)
setMethod("score", "PairwiseAlignmentsSingleSubjectSummary", function(x) x@score)
setMethod("nindel",
    "PairwiseAlignmentsSingleSubjectSummary",
    function(x) new("InDel", insertion = x@ninsertion, deletion = x@ndeletion))
setMethod("length", "PairwiseAlignmentsSingleSubjectSummary", function(x) length(score(x)))
setMethod("nchar", "PairwiseAlignmentsSingleSubjectSummary",
    function(x, type="chars", allowNA=FALSE)
        unname(nmatch(x) + nmismatch(x) + x@ninsertion[,"WidthSum"] + x@ndeletion[,"WidthSum"]))


### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
### The "as.character" method.
###

setMethod("aligned", "PairwiseAlignmentsSingleSubject",
    function(x, degap=FALSE, gapCode="-", endgapCode="-")
    {
        if (degap) {
            value <- aligned(pattern(x), degap = degap)
        } else {
            codecX <- xscodec(x)
            if (is.null(codecX)) {
                gapCode <- charToRaw(gapCode)
                endgapCode <- charToRaw(endgapCode)
            } else {
                letters2codes <- codecX@codes
                names(letters2codes) <- codecX@letters
                gapCode <- as.raw(letters2codes[[gapCode]])
                endgapCode <- as.raw(letters2codes[[endgapCode]])
            }
            value <- .Call2("PairwiseAlignmentsSingleSubject_align_aligned",
                            x, gapCode, endgapCode, PACKAGE="pwalign")
        }
        value
    })

setMethod("as.character", "PairwiseAlignmentsSingleSubject",
    function(x)
    {
        as.character(alignedPattern(x))
    })

setMethod("toString", "PairwiseAlignmentsSingleSubject",
    function(x, ...)
        toString(as.character(x), ...))

setMethod("as.matrix", "PairwiseAlignmentsSingleSubject",
    function(x)
    {
        as.matrix(aligned(x))
    })

pwalign/R/pid.R
### =========================================================================
### Percent Sequence Identity
### -------------------------------------------------------------------------
###

setGeneric("pid", signature="x",
    function(x, type="PID1") standardGeneric("pid")
)

setMethod("pid", "PairwiseAlignments",
    function(x, type="PID1")
    {
        type <- match.arg(type, c("PID1", "PID2", "PID3", "PID4"))
        denom <- switch(type,
            PID1 = nchar(x),
            PID2 = nmatch(x) + nmismatch(x),
            PID3 = pmin(nchar(unaligned(pattern(x))),
                        nchar(unaligned(subject(x)))),
            PID4 = (nchar(unaligned(pattern(x))) +
                    nchar(unaligned(subject(x)))) / 2
        )
        100 * nmatch(x)/denom
    })

pwalign/R/stringDist.R
### =========================================================================
### The stringDist() generic
### -------------------------------------------------------------------------

XStringSet.stringDist <-
function(x, method = "levenshtein", ignoreCase = FALSE, diag = FALSE,
         upper = FALSE, type = "global", substitutionMatrix = NULL,
         gapOpening = 0, gapExtension = 1)
{
    ## Check arguments
    method <- match.arg(method, c("levenshtein", "hamming", "quality", "substitutionMatrix"))
    if (method == "hamming") {
        if (ignoreCase)
            stop("'ignoreCase' must be FALSE when 'method = \"hamming\"'")
        answer <- .Call2("XStringSet_dist_hamming", x, PACKAGE="Biostrings")
    } else {
        ## Process string information
        if (is.null(xscodec(x))) {
            unique_letters <- uniqueLetters(x)
            #Even if safeLettersToInt() will deal properly with embedded NULLs, I
            #suspect bad things will happen downstream in case there are any.
alphabetToCodes <- safeLettersToInt(unique_letters, letters.as.names=TRUE) } else { alphabetToCodes <- xscodes(x) } ## Set parameters when method == "levenshtein" if (method == "levenshtein") { type <- "global" typeCode <- 1L gapOpening <- 0 gapExtension <- 1 if (ignoreCase) caseAdjustedAlphabet <- tolower(names(alphabetToCodes)) else caseAdjustedAlphabet <- names(alphabetToCodes) substitutionMatrix <- outer(caseAdjustedAlphabet, caseAdjustedAlphabet, function(x,y) -as.numeric(x!=y)) dimnames(substitutionMatrix) <- list(names(alphabetToCodes), names(alphabetToCodes)) } else { type <- match.arg(type, c("global", "local", "overlap")) typeCode <- c("global" = 1L, "local" = 2L, "overlap" = 3L)[[type]] gapOpening <- as.double(abs(gapOpening)) if (length(gapOpening) != 1 || is.na(gapOpening)) stop("'gapOpening' must be a non-negative numeric vector of length 1") gapExtension <- as.double(abs(gapExtension)) if (length(gapExtension) != 1 || is.na(gapExtension)) stop("'gapExtension' must be a non-negative numeric vector of length 1") } useQuality <- FALSE if (is.character(substitutionMatrix)) { if (length(substitutionMatrix) != 1) stop("'substitutionMatrix' is a character vector of length != 1") tempMatrix <- substitutionMatrix substitutionMatrix <- try(getdata(tempMatrix), silent = TRUE) if (is(substitutionMatrix, "try-error")) stop("unknown scoring matrix \"", tempMatrix, "\"") } if (!is.matrix(substitutionMatrix) || !is.numeric(substitutionMatrix)) stop("'substitutionMatrix' must be a numeric matrix") if (!identical(rownames(substitutionMatrix), colnames(substitutionMatrix))) stop("row and column names differ for matrix 'substitutionMatrix'") if (is.null(rownames(substitutionMatrix))) stop("matrix 'substitutionMatrix' must have row and column names") if (any(duplicated(rownames(substitutionMatrix)))) stop("matrix 'substitutionMatrix' has duplicated row names") if (!isSymmetric(substitutionMatrix)) stop("'substitutionMatrix' must be a symmetric matrix") availableLetters 
<- intersect(names(alphabetToCodes), rownames(substitutionMatrix)) substitutionMatrix <- matrix(as.double(substitutionMatrix[availableLetters, availableLetters]), nrow = length(availableLetters), ncol = length(availableLetters), dimnames = list(availableLetters, availableLetters)) substitutionArray <- array(unlist(substitutionMatrix, substitutionMatrix), dim = c(dim(substitutionMatrix), 2), dimnames = list(availableLetters, availableLetters, c("0", "1"))) substitutionLookupTable <- buildLookupTable(alphabetToCodes[availableLetters], 0:(length(availableLetters) - 1)) fuzzyMatrix <- matrix(0L, length(availableLetters), length(availableLetters), dimnames = list(availableLetters, availableLetters)) diag(fuzzyMatrix) <- 1L fuzzyLookupTable <- buildLookupTable(alphabetToCodes[availableLetters], 0:(length(availableLetters) - 1)) answer <- .Call2("XStringSet_align_distance", x, type, typeCode, gapOpening, gapExtension, useQuality, substitutionArray, dim(substitutionArray), substitutionLookupTable, fuzzyMatrix, dim(fuzzyMatrix), fuzzyLookupTable, PACKAGE="pwalign") if (method %in% c("levenshtein", "substitutionMatrix")) answer <- -answer } attr(answer, "Size") <- length(x) attr(answer, "Labels") <- names(x) attr(answer, "Diag") <- diag attr(answer, "Upper") <- upper attr(answer, "method") <- method class(answer) <- "dist" return(answer) } QualityScaledXStringSet.stringDist <- function(x, ignoreCase = FALSE, diag = FALSE, upper = FALSE, type = "global", fuzzyMatrix = NULL, gapOpening = 0, gapExtension = 1) { ## Check arguments type <- match.arg(type, c("global", "local", "overlap")) typeCode <- c("global" = 1L, "local" = 2L, "overlap" = 3L)[[type]] gapOpening <- as.double(abs(gapOpening)) if (length(gapOpening) != 1 || is.na(gapOpening)) stop("'gapOpening' must be a non-negative numeric vector of length 1") gapExtension <- as.double(abs(gapExtension)) if (length(gapExtension) != 1 || is.na(gapExtension)) stop("'gapExtension' must be a non-negative numeric vector of length 
1") ## Process string information if (is.null(xscodec(x))) { unique_letters <- uniqueLetters(x) #Even if safeLettersToInt() will deal properly with embedded nuls, I #suspect bad things will happen downstream in case there are any. alphabetToCodes <- safeLettersToInt(unique_letters, letters.as.names=TRUE) } else { alphabetToCodes <- xscodes(x) } useQuality <- TRUE if (is.null(fuzzyMatrix)) { fuzzyMatrix <- diag(length(alphabetToCodes)) dimnames(fuzzyMatrix) <- list(names(alphabetToCodes), names(alphabetToCodes)) } else { if (!is.matrix(fuzzyMatrix) || !is.numeric(fuzzyMatrix) || any(is.na(fuzzyMatrix)) || any(fuzzyMatrix < 0) || any(fuzzyMatrix > 1)) stop("'fuzzyMatrix' must be a numeric matrix with values between 0 and 1 inclusive") if (!identical(rownames(fuzzyMatrix), colnames(fuzzyMatrix))) stop("row and column names differ for matrix 'fuzzyMatrix'") if (is.null(rownames(fuzzyMatrix))) stop("matrix 'fuzzyMatrix' must have row and column names") if (any(duplicated(rownames(fuzzyMatrix)))) stop("matrix 'fuzzyMatrix' has duplicated row names") } availableLetters <- intersect(names(alphabetToCodes), rownames(fuzzyMatrix)) fuzzyMatrix <- fuzzyMatrix[availableLetters, availableLetters] uniqueFuzzyValues <- sort(unique(fuzzyMatrix)) fuzzyReferenceMatrix <- matrix(match(fuzzyMatrix, uniqueFuzzyValues) - 1L, nrow = nrow(fuzzyMatrix), ncol = ncol(fuzzyMatrix), dimnames = dimnames(fuzzyMatrix)) fuzzyLookupTable <- buildLookupTable(alphabetToCodes[availableLetters], 0:(length(availableLetters) - 1)) alphabetLength <- switch(class(x), QualityScaledDNAStringSet =, QualityScaledRNAStringSet = 4L, QualityScaledAAStringSet = 20L, 256L) substitutionArray <- qualitySubstitutionMatrices(fuzzyMatch = uniqueFuzzyValues, alphabetLength = alphabetLength, qualityClass = class(quality(x))) substitutionLookupTable <- buildLookupTable((minQuality(quality(x)) + offset(quality(x))): (maxQuality(quality(x)) + offset(quality(x))), 0:(maxQuality(quality(x)) - minQuality(quality(x)))) answer <- 
.Call2("XStringSet_align_distance", x, type, typeCode, gapOpening, gapExtension, useQuality, substitutionArray, dim(substitutionArray), substitutionLookupTable, fuzzyReferenceMatrix, dim(fuzzyReferenceMatrix), fuzzyLookupTable, PACKAGE="pwalign") attr(answer, "Size") <- length(x) attr(answer, "Labels") <- names(x) attr(answer, "Diag") <- diag attr(answer, "Upper") <- upper attr(answer, "method") <- "quality" class(answer) <- "dist" return(answer) } setGeneric("stringDist", signature = "x", function(x, method = "levenshtein", ignoreCase = FALSE, diag = FALSE, upper = FALSE, ...) standardGeneric("stringDist"), where=asNamespace("pwalign")) setMethod("stringDist", signature(x = "character"), function(x, method = "levenshtein", ignoreCase = FALSE, diag = FALSE, upper = FALSE, type = "global", quality = PhredQuality(22L), substitutionMatrix = NULL, fuzzyMatrix = NULL, gapOpening = 0, gapExtension = 1) { if (method != "quality") { XStringSet.stringDist(x = BStringSet(x), method = method, ignoreCase = ignoreCase, diag = diag, upper = upper, type = type, substitutionMatrix = substitutionMatrix, gapExtension = gapExtension, gapOpening = gapOpening) } else { QualityScaledXStringSet.stringDist(x = QualityScaledBStringSet(x, quality), ignoreCase = ignoreCase, diag = diag, upper = upper, type = type, fuzzyMatrix = fuzzyMatrix, gapExtension = gapExtension, gapOpening = gapOpening) }}) setMethod("stringDist", signature(x = "XStringSet"), function(x, method = "levenshtein", ignoreCase = FALSE, diag = FALSE, upper = FALSE, type = "global", quality = PhredQuality(22L), substitutionMatrix = NULL, fuzzyMatrix = NULL, gapOpening = 0, gapExtension = 1) { if (method != "quality") { XStringSet.stringDist(x = x, method = method, ignoreCase = ignoreCase, diag = diag, upper = upper, type = type, substitutionMatrix = substitutionMatrix, gapExtension = gapExtension, gapOpening = gapOpening) } else { QualityScaledXStringSet.stringDist(x = QualityScaledXStringSet(x, quality), ignoreCase = 
ignoreCase, diag = diag, upper = upper, type = type, fuzzyMatrix = fuzzyMatrix, gapExtension = gapExtension, gapOpening = gapOpening) }}) setMethod("stringDist", signature(x = "QualityScaledXStringSet"), function(x, method = "quality", ignoreCase = FALSE, diag = FALSE, upper = FALSE, type = "global", substitutionMatrix = NULL, fuzzyMatrix = NULL, gapOpening = 0, gapExtension = 1) { if (method != "quality") { XStringSet.stringDist(x = as(x, "XStringSet"), method = method, ignoreCase = ignoreCase, diag = diag, upper = upper, type = type, substitutionMatrix = substitutionMatrix, gapExtension = gapExtension, gapOpening = gapOpening) } else { QualityScaledXStringSet.stringDist(x = x, ignoreCase = ignoreCase, diag = diag, upper = upper, type = type, fuzzyMatrix = fuzzyMatrix, gapExtension = gapExtension, gapOpening = gapOpening) }}) pwalign/R/substitution_matrices.R0000644000175100017510000000676614614311433020241 0ustar00biocbuildbiocbuild### ========================================================================= ### Utilities to generate substitution matrices ### ------------------------------------------------------------------------- ### nucleotideSubstitutionMatrix <- function(match = 1, mismatch = 0, baseOnly = FALSE, type = "DNA", symmetric = TRUE) { "%safemult%" <- function(x, y) ifelse(is.infinite(x) & y == 0, 0, x * y) type <- match.arg(type, c("DNA", "RNA")) if (!isSingleNumber(match) || !isSingleNumber(mismatch)) { stop("'match' and 'mismatch' must be non-missing numbers") } if (baseOnly) { letters <- IUPAC_CODE_MAP[DNA_BASES] } else { letters <- IUPAC_CODE_MAP } if (type == "RNA") { names(letters) <- chartr("T", "U", names(letters)) } nLetters <- length(letters) splitLetters <- strsplit(letters,split="") submat <- matrix(0, nrow = nLetters, ncol = nLetters, dimnames = list(names(letters), names(letters))) if (symmetric) { for (i in seq_len(nLetters)) { for (j in i:nLetters) { submat[i,j] <- submat[j,i] <- mean(outer(splitLetters[[i]], splitLetters[[j]], 
"==")) } } } else { for (i in seq_len(nLetters)) { for (j in i:nLetters) { submat[i,j] <- mean(outer(splitLetters[[i]], splitLetters[[j]], "%in%")) submat[j,i] <- mean(outer(splitLetters[[j]], splitLetters[[i]], "%in%")) } } } abs(match) * submat - abs(mismatch) %safemult% (1 - submat) } errorSubstitutionMatrices <- function(errorProbability, fuzzyMatch = c(0, 1), alphabetLength = 4L, bitScale = 1) { if (!is.numeric(errorProbability) || !all(!is.na(errorProbability) & errorProbability >= 0 & errorProbability <= 1)) stop("'errorProbability' must be a numeric vector with values between 0 and 1 inclusive") if (!is.numeric(fuzzyMatch) || !all(!is.na(fuzzyMatch) & fuzzyMatch >= 0 & fuzzyMatch <= 1)) stop("'fuzzyMatch' must be a numeric vector with values between 0 and 1 inclusive") errorMatrix <- outer(errorProbability, errorProbability, function(e1,e2,n) e1 + e2 - (n/(n - 1)) * e1 * e2, n = alphabetLength) adjMatchProbs <- lapply(list(match = (1 - errorMatrix) * alphabetLength, mismatch = errorMatrix * (alphabetLength / (alphabetLength - 1))), function(x) {dimnames(x) <- list(names(errorProbability), names(errorProbability)); x}) output <- array(NA_real_, dim = c(length(errorProbability), length(errorProbability), length(fuzzyMatch)), dimnames = list(names(errorProbability), names(errorProbability), as.character(fuzzyMatch))) for (i in seq_len(length(fuzzyMatch))) { output[,,i] <- bitScale * log2(fuzzyMatch[i] * adjMatchProbs[["match"]] + (1 - fuzzyMatch[i]) * adjMatchProbs[["mismatch"]]) } output } qualitySubstitutionMatrices <- function(fuzzyMatch = c(0, 1), alphabetLength = 4L, qualityClass = "PhredQuality", bitScale = 1) { if (!is.numeric(fuzzyMatch) || !all(!is.na(fuzzyMatch) & fuzzyMatch >= 0 & fuzzyMatch <= 1)) stop("'fuzzyMatch' must be a numeric vector with values between 0 and 1 inclusive") if (!is(new(qualityClass), "XStringQuality")) stop("'qualityClass' must be one of the 'XStringQuality' classes") qualityIntegers <- 
        minQuality(new(qualityClass)):maxQuality(new(qualityClass))
    errorProbability <- qualityConverter(qualityIntegers, qualityClass, "numeric")
    names(errorProbability) <- as.character(qualityIntegers)

    errorSubstitutionMatrices(errorProbability, fuzzyMatch,
                              alphabetLength = alphabetLength,
                              bitScale = bitScale)
}

pwalign/R/utils.R0000644000175100017510000000134214614311433014717 0ustar00biocbuildbiocbuild
add_colors <- Biostrings:::add_colors
add_colors.default <- Biostrings:::add_colors.default
add_colors.DNA <- Biostrings:::add_colors.DNA
add_colors.RNA <- Biostrings:::add_colors.RNA
add_colors.AA <- Biostrings:::add_colors.AA
safeLettersToInt <- Biostrings:::safeLettersToInt
buildLookupTable <- Biostrings:::buildLookupTable
xscodec <- Biostrings:::xscodec
xsbaseclass <- Biostrings:::xsbaseclass
XString <- Biostrings:::XString
XStringSet <- Biostrings:::XStringSet
QualityScaledXStringSet <- Biostrings:::QualityScaledXStringSet
toSeqSnippet <- Biostrings:::toSeqSnippet
offset <- Biostrings:::offset
minQuality <- Biostrings:::minQuality
maxQuality <- Biostrings:::maxQuality
qualityConverter <- Biostrings:::qualityConverter

pwalign/R/zzz.R0000644000175100017510000000026714614311433014421 0ustar00biocbuildbiocbuild
.onLoad <- function(libname, pkgname)
{
}

.onUnload <- function(libpath)
{
    library.dynam.unload("pwalign", libpath)
}

.test <- function() BiocGenerics:::testPackage("pwalign")

pwalign/README.md0000644000175100017510000000063214614311433014513 0ustar00biocbuildbiocbuild
[Bioconductor](https://bioconductor.org/)

**pwalign** is an R/Bioconductor package for performing pairwise sequence alignments.

See https://bioconductor.org/packages/pwalign for more information including how to install the release version of the package (please refrain from installing directly from GitHub).
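
A minimal usage sketch, assuming the release versions of **pwalign** and Biostrings are installed. The input sequences and scoring values are illustrative choices, not package defaults or recommendations:

```r
library(Biostrings)  # provides DNAString()
library(pwalign)

# Illustrative inputs (hypothetical, chosen only for this sketch).
pattern <- DNAString("ACGTACGT")
subject <- DNAString("ACGTTCGT")

# Match/mismatch scoring restricted to A, C, G, T; the mismatch value is
# applied as a penalty whatever its sign.
submat <- nucleotideSubstitutionMatrix(match = 1, mismatch = -1, baseOnly = TRUE)

aln <- pairwiseAlignment(pattern, subject, type = "global",
                         substitutionMatrix = submat,
                         gapOpening = 5, gapExtension = 2)

score(aln)               # alignment score
pid(aln, type = "PID1")  # percent identity relative to the alignment length

# Levenshtein (edit) distances between plain character strings.
d <- stringDist(c("kitten", "sitting"))
as.matrix(d)
```

`pid()` also accepts `"PID2"`, `"PID3"` and `"PID4"`, which use different denominators for the same match count.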
pwalign/src/0000755000175100017510000000000014614351210014017 5ustar00biocbuildbiocbuild
pwalign/src/align_pairwiseAlignment.c0000644000175100017510000014120214614311433021022 0ustar00biocbuildbiocbuild
#include "pwalign.h"

#include "S4Vectors_interface.h"
#include "IRanges_interface.h"
#include "Biostrings_interface.h"

#include <R_ext/Utils.h> /* R_CheckUserInterrupt */

#include
#include

#define MAX(x, y) ((x) > (y) ? (x) : (y))
#define MIN(x, y) ((x) < (y) ? (x) : (y))

#define POSITIVE_INFINITY R_PosInf
#define NEGATIVE_INFINITY R_NegInf

#define MAX_BUF_SIZE 1048576

#define GLOBAL_ALIGNMENT 1
#define LOCAL_ALIGNMENT 2
#define OVERLAP_ALIGNMENT 3
#define GLOBAL_LOCAL_ALIGNMENT 4
#define LOCAL_GLOBAL_ALIGNMENT 5

#define SUBSTITUTION 'S'
#define DELETION 'D'
#define INSERTION 'I'
#define TERMINATION 'T'

#define CURR_MATRIX(i, j) (currMatrix[(i) + nCharString1Plus1 * (j)])
#define PREV_MATRIX(i, j) (prevMatrix[(i) + nCharString1Plus1 * (j)])
#define S_TRACE_MATRIX(i, j) (sTraceMatrix[(i) + nCharString1 * (j)])
#define D_TRACE_MATRIX(i, j) (dTraceMatrix[(i) + nCharString1 * (j)])
#define I_TRACE_MATRIX(i, j) (iTraceMatrix[(i) + nCharString1 * (j)])
#define FUZZY_MATRIX(i, j) (fuzzyMatrix[(i) + fuzzyMatrixDim[0] * (j)])
#define SUBSTITUTION_ARRAY(i, j, k) (substitutionArray[(i) + substitutionArrayDim[0] * ((j) + substitutionArrayDim[1] * (k))])

#define SET_LOOKUP_VALUE(lookupTable, length, key) \
{ \
    unsigned char lookupKey = (unsigned char) (key); \
    if (lookupKey >= (length) || (lookupValue = (lookupTable)[lookupKey]) == NA_INTEGER) { \
        error("key %d not in lookup table", (int) lookupKey); \
    } \
}

/* Structure to hold alignment information */
struct AlignInfo {
    /* Initialized before passing the AlignInfo structure to
     * pairwiseAlignment(). Not modified by pairwiseAlignment(). */
    Chars_holder string;
    Chars_holder quality;
    int endGap;

    /* Allocated (but not initialized) before passing the AlignInfo
     * structure to pairwiseAlignment(). Filled by pairwiseAlignment().
*/ int* mismatch; int* startIndel; int* widthIndel; /* Not initialized before passing the AlignInfo structure to * pairwiseAlignment(). Set by pairwiseAlignment(). */ int lengthMismatch; int lengthIndel; int startRange; int widthRange; }; void function1(struct AlignInfo *); void print_AlignInfo(const struct AlignInfo *alignInfoPtr) { int string_len, i; const char *string_seq, *c; Rprintf("- string: "); string_len = alignInfoPtr->string.length; string_seq = alignInfoPtr->string.ptr; for (i = 0, c = string_seq; i < string_len; i++, c++) Rprintf("%c", *c); Rprintf("\n"); Rprintf("- quality: "); string_len = alignInfoPtr->quality.length; string_seq = alignInfoPtr->quality.ptr; for (i = 0, c = string_seq; i < string_len; i++, c++) Rprintf("%c", *c); Rprintf("\n"); Rprintf("- endGap: %d\n", alignInfoPtr->endGap); Rprintf("- lengthMismatch: %d\n", alignInfoPtr->lengthMismatch); Rprintf("- lengthIndel: %d\n", alignInfoPtr->lengthIndel); Rprintf("- startRange: %d\n", alignInfoPtr->startRange); Rprintf("- widthRange: %d\n", alignInfoPtr->widthRange); return; } /* Structure to hold alignment buffers */ struct AlignBuffer { float *currMatrix; float *prevMatrix; char *sTraceMatrix; char *iTraceMatrix; char *dTraceMatrix; }; void function2(struct AlignBuffer *); /* Structure to hold mismatch buffers */ struct MismatchBuffer { int *pattern; int *subject; int usedSpace; int totalSpace; }; void function3(struct MismatchBuffer *); /* Structure to hold indel buffers */ struct IndelBuffer { int *start; int *width; int usedSpace; int totalSpace; }; void function4(struct IndelBuffer *); /* Traceback through the score matrices */ static void traceback(const struct AlignBuffer *alignBufferPtr, char currTraceMatrix, struct AlignInfo *align1InfoPtr, struct AlignInfo *align2InfoPtr) { int i, j; char prevTraceMatrix = '?'; const char *sTraceMatrix = alignBufferPtr->sTraceMatrix; const char *iTraceMatrix = alignBufferPtr->iTraceMatrix; const char *dTraceMatrix = alignBufferPtr->dTraceMatrix; 
const int nCharString1 = align1InfoPtr->string.length; const int nCharString2 = align2InfoPtr->string.length; const int nCharString1Minus1 = nCharString1 - 1; const int nCharString2Minus1 = nCharString2 - 1; //Rprintf("align1InfoPtr:\n"); //print_AlignInfo(align1InfoPtr); //Rprintf("align2InfoPtr:\n"); //print_AlignInfo(align2InfoPtr); i = nCharString1 - align1InfoPtr->startRange; j = nCharString2 - align2InfoPtr->startRange; while (currTraceMatrix != TERMINATION && i >= 0 && j >= 0) { switch (currTraceMatrix) { case INSERTION: if (I_TRACE_MATRIX(i, j) != TERMINATION) { if (j == nCharString2Minus1) { align1InfoPtr->startRange++; } else { align1InfoPtr->widthRange++; if (prevTraceMatrix != INSERTION) { align2InfoPtr->startIndel[align2InfoPtr->lengthIndel] = nCharString2 - j; align2InfoPtr->lengthIndel++; } align2InfoPtr->widthIndel[align2InfoPtr->lengthIndel - 1] += 1; } } prevTraceMatrix = currTraceMatrix; currTraceMatrix = I_TRACE_MATRIX(i, j); i--; break; case DELETION: if (D_TRACE_MATRIX(i, j) != TERMINATION) { if (i == nCharString1Minus1) { align2InfoPtr->startRange++; } else { align2InfoPtr->widthRange++; if (prevTraceMatrix != DELETION) { align1InfoPtr->startIndel[align1InfoPtr->lengthIndel] = nCharString1 - i; align1InfoPtr->lengthIndel++; } align1InfoPtr->widthIndel[align1InfoPtr->lengthIndel - 1] += 1; } } prevTraceMatrix = currTraceMatrix; currTraceMatrix = D_TRACE_MATRIX(i, j); j--; break; case SUBSTITUTION: prevTraceMatrix = currTraceMatrix; currTraceMatrix = S_TRACE_MATRIX(i, j); if (currTraceMatrix != TERMINATION) { align1InfoPtr->widthRange++; align2InfoPtr->widthRange++; } if (currTraceMatrix != TERMINATION && align1InfoPtr->string.ptr[nCharString1Minus1 - i] != align2InfoPtr->string.ptr[nCharString2Minus1 - j]) { align1InfoPtr->mismatch[align1InfoPtr->lengthMismatch] = nCharString1 - i; align2InfoPtr->mismatch[align2InfoPtr->lengthMismatch] = nCharString2 - j; align1InfoPtr->lengthMismatch++; align2InfoPtr->lengthMismatch++; } i--; j--; break; 
default: error("unknown traceback code %d", currTraceMatrix); break; } } const int offset1 = align1InfoPtr->startRange - 1; if (offset1 > 0 && align1InfoPtr->lengthIndel > 0) { for (i = 0; i < align1InfoPtr->lengthIndel; i++) align1InfoPtr->startIndel[i] -= offset1; } const int offset2 = align2InfoPtr->startRange - 1; if (offset2 > 0 && align2InfoPtr->lengthIndel > 0) { for (j = 0; j < align2InfoPtr->lengthIndel; j++) align2InfoPtr->startIndel[j] -= offset2; } return; } /* Returns the score of the optimal pairwise alignment */ static double pairwiseAlignment( struct AlignInfo *align1InfoPtr, struct AlignInfo *align2InfoPtr, const int localAlignment, const int scoreOnly, const float gapOpening, const float gapExtension, const int useQuality, const double *substitutionArray, const int *substitutionArrayDim, const int *substitutionLookupTable, const int substitutionLookupTableLength, const int *fuzzyMatrix, const int *fuzzyMatrixDim, const int *fuzzyLookupTable, const int fuzzyLookupTableLength, struct AlignBuffer *alignBufferPtr) { int i, j, iMinus1, jMinus1; /* Step 1: Get information on input XString objects */ const int nCharString1 = align1InfoPtr->string.length; const int nCharString2 = align2InfoPtr->string.length; const int nCharString1Plus1 = nCharString1 + 1; const int nCharString1Minus1 = nCharString1 - 1; const int nCharString2Minus1 = nCharString2 - 1; align1InfoPtr->startRange = -1; align2InfoPtr->startRange = -1; align1InfoPtr->widthRange = 0; align2InfoPtr->widthRange = 0; if (nCharString1 < 1 || nCharString2 < 1) { double zeroCharScore; if (nCharString1 >= 1 && align1InfoPtr->endGap) zeroCharScore = - gapOpening - nCharString1 * gapExtension; else if (nCharString2 >= 1 && align2InfoPtr->endGap) zeroCharScore = - gapOpening - nCharString2 * gapExtension; else zeroCharScore = 0.0; align1InfoPtr->lengthMismatch = 0; align2InfoPtr->lengthMismatch = 0; align1InfoPtr->lengthIndel = 0; align2InfoPtr->lengthIndel = 0; return zeroCharScore; } /* Step 2: Create 
objects for scores values */ /* Rows of currMatrix and prevMatrix = (0) substitution, (1) deletion, and (2) insertion */ float *currMatrix = alignBufferPtr->currMatrix; float *prevMatrix = alignBufferPtr->prevMatrix; CURR_MATRIX(0, 0) = 0.0; CURR_MATRIX(0, 1) = (align2InfoPtr->endGap ? - gapOpening : 0.0); for (i = 1, iMinus1 = 0; i <= nCharString1; i++, iMinus1++) { CURR_MATRIX(i, 0) = NEGATIVE_INFINITY; CURR_MATRIX(i, 1) = NEGATIVE_INFINITY; } if (align1InfoPtr->endGap) { for (i = 0; i <= nCharString1; i++) CURR_MATRIX(i, 2) = - gapOpening - i * gapExtension; } else { for (i = 0; i <= nCharString1; i++) CURR_MATRIX(i, 2) = 0.0; } /* Step 3: Perform main alignment operations */ Chars_holder sequence1, sequence2; int scalar1, scalar2; if (useQuality) { sequence1 = align1InfoPtr->quality; sequence2 = align2InfoPtr->quality; scalar1 = (align1InfoPtr->quality.length == 1); scalar2 = (align2InfoPtr->quality.length == 1); } else { sequence1 = align1InfoPtr->string; sequence2 = align2InfoPtr->string; scalar1 = (nCharString1 == 1); scalar2 = (nCharString2 == 1); } int lookupValue = 0, element1, element2, stringElt1, stringElt2, fuzzy, iElt, jElt; const int noEndGap1 = !align1InfoPtr->endGap; const int noEndGap2 = !align2InfoPtr->endGap; const float gapOpeningPlusExtension = gapOpening + gapExtension; const float endGapAddend = (align2InfoPtr->endGap ? 
- gapExtension : 0.0); float *tempMatrix, substitutionValue; double maxScore = NEGATIVE_INFINITY; if (scoreOnly) { /* Simplified calculations when only need the alignment score */ for (j = 1, jElt = nCharString2Minus1; j <= nCharString2; j++, jElt--) { tempMatrix = prevMatrix; prevMatrix = currMatrix; currMatrix = tempMatrix; CURR_MATRIX(0, 0) = NEGATIVE_INFINITY; CURR_MATRIX(0, 1) = PREV_MATRIX(0, 1) + endGapAddend; CURR_MATRIX(0, 2) = NEGATIVE_INFINITY; SET_LOOKUP_VALUE(fuzzyLookupTable, fuzzyLookupTableLength, align2InfoPtr->string.ptr[jElt]); stringElt2 = lookupValue; SET_LOOKUP_VALUE(substitutionLookupTable, substitutionLookupTableLength, sequence2.ptr[scalar2 ? 0 : jElt]); element2 = lookupValue; if (localAlignment) { for (i = 1, iMinus1 = 0, iElt = nCharString1Minus1; i <= nCharString1; i++, iMinus1++, iElt--) { SET_LOOKUP_VALUE(fuzzyLookupTable, fuzzyLookupTableLength, align1InfoPtr->string.ptr[iElt]); stringElt1 = lookupValue; SET_LOOKUP_VALUE(substitutionLookupTable, substitutionLookupTableLength, sequence1.ptr[scalar1 ? 0 : iElt]); element1 = lookupValue; fuzzy = FUZZY_MATRIX(stringElt1, stringElt2); substitutionValue = (float) SUBSTITUTION_ARRAY(element1, element2, fuzzy); CURR_MATRIX(i, 0) = MAX(0.0, MAX(PREV_MATRIX(iMinus1, 0), MAX(PREV_MATRIX(iMinus1, 1), PREV_MATRIX(iMinus1, 2))) + substitutionValue); CURR_MATRIX(i, 1) = MAX(MAX(PREV_MATRIX(i, 0), PREV_MATRIX(i, 2)) - gapOpeningPlusExtension, PREV_MATRIX(i, 1) - gapExtension); CURR_MATRIX(i, 2) = MAX(MAX(CURR_MATRIX(iMinus1, 0), CURR_MATRIX(iMinus1, 1)) - gapOpeningPlusExtension, CURR_MATRIX(iMinus1, 2) - gapExtension); maxScore = MAX(CURR_MATRIX(i, 0), maxScore); } } else { for (i = 1, iMinus1 = 0, iElt = nCharString1Minus1; i <= nCharString1; i++, iMinus1++, iElt--) { SET_LOOKUP_VALUE(fuzzyLookupTable, fuzzyLookupTableLength, align1InfoPtr->string.ptr[iElt]); stringElt1 = lookupValue; SET_LOOKUP_VALUE(substitutionLookupTable, substitutionLookupTableLength, sequence1.ptr[scalar1 ? 
0 : iElt]); element1 = lookupValue; fuzzy = FUZZY_MATRIX(stringElt1, stringElt2); substitutionValue = (float) SUBSTITUTION_ARRAY(element1, element2, fuzzy); CURR_MATRIX(i, 0) = MAX(PREV_MATRIX(iMinus1, 0), MAX(PREV_MATRIX(iMinus1, 1), PREV_MATRIX(iMinus1, 2))) + substitutionValue; CURR_MATRIX(i, 1) = MAX(MAX(PREV_MATRIX(i, 0), PREV_MATRIX(i, 2)) - gapOpeningPlusExtension, PREV_MATRIX(i, 1) - gapExtension); CURR_MATRIX(i, 2) = MAX(MAX(CURR_MATRIX(iMinus1, 0), CURR_MATRIX(iMinus1, 1)) - gapOpeningPlusExtension, CURR_MATRIX(iMinus1, 2) - gapExtension); } if (noEndGap2) { CURR_MATRIX(nCharString1, 1) = MAX(PREV_MATRIX(nCharString1, 0), MAX(PREV_MATRIX(nCharString1, 1), PREV_MATRIX(nCharString1, 2))); } if (noEndGap1 && j == nCharString2) { for (i = 1, iMinus1 = 0; i <= nCharString1; i++, iMinus1++) { CURR_MATRIX(i, 2) = MAX(MAX(CURR_MATRIX(iMinus1, 0), CURR_MATRIX(iMinus1, 1)), CURR_MATRIX(iMinus1, 2)); } } } } if (!localAlignment) { maxScore = MAX(CURR_MATRIX(nCharString1, 0), MAX(CURR_MATRIX(nCharString1, 1), CURR_MATRIX(nCharString1, 2))); } } else { /* Step 3a: Create objects for traceback values */ char *sTraceMatrix = alignBufferPtr->sTraceMatrix; char *iTraceMatrix = alignBufferPtr->iTraceMatrix; char *dTraceMatrix = alignBufferPtr->dTraceMatrix; /* Step 3b: Prepare the alignment info object for alignment */ const int alignmentBufferSize = nCharString1Plus1; align1InfoPtr->lengthMismatch = 0; align2InfoPtr->lengthMismatch = 0; align1InfoPtr->lengthIndel = 0; align2InfoPtr->lengthIndel = 0; memset(align1InfoPtr->mismatch, 0, alignmentBufferSize * sizeof(int)); memset(align2InfoPtr->mismatch, 0, alignmentBufferSize * sizeof(int)); memset(align1InfoPtr->startIndel, 0, alignmentBufferSize * sizeof(int)); memset(align2InfoPtr->startIndel, 0, alignmentBufferSize * sizeof(int)); memset(align1InfoPtr->widthIndel, 0, alignmentBufferSize * sizeof(int)); memset(align2InfoPtr->widthIndel, 0, alignmentBufferSize * sizeof(int)); for (j = 1, jMinus1 = 0, jElt = 
nCharString2Minus1; j <= nCharString2; j++, jMinus1++, jElt--) { tempMatrix = prevMatrix; prevMatrix = currMatrix; currMatrix = tempMatrix; CURR_MATRIX(0, 0) = NEGATIVE_INFINITY; CURR_MATRIX(0, 1) = PREV_MATRIX(0, 1) + endGapAddend; CURR_MATRIX(0, 2) = NEGATIVE_INFINITY; SET_LOOKUP_VALUE(fuzzyLookupTable, fuzzyLookupTableLength, align2InfoPtr->string.ptr[jElt]); stringElt2 = lookupValue; SET_LOOKUP_VALUE(substitutionLookupTable, substitutionLookupTableLength, sequence2.ptr[scalar2 ? 0 : jElt]); element2 = lookupValue; if (localAlignment) { for (i = 1, iMinus1 = 0, iElt = nCharString1Minus1; i <= nCharString1; i++, iMinus1++, iElt--) { SET_LOOKUP_VALUE(fuzzyLookupTable, fuzzyLookupTableLength, align1InfoPtr->string.ptr[iElt]); stringElt1 = lookupValue; SET_LOOKUP_VALUE(substitutionLookupTable, substitutionLookupTableLength, sequence1.ptr[scalar1 ? 0 : iElt]); element1 = lookupValue; fuzzy = FUZZY_MATRIX(stringElt1, stringElt2); substitutionValue = (float) SUBSTITUTION_ARRAY(element1, element2, fuzzy); /* Step 3c: Generate (0) substitution, (1) deletion, and (2) insertion scores * and traceback values */ if (PREV_MATRIX(iMinus1, 0) >= MAX(PREV_MATRIX(iMinus1, 1), PREV_MATRIX(iMinus1, 2))) { S_TRACE_MATRIX(iMinus1, jMinus1) = SUBSTITUTION; CURR_MATRIX(i, 0) = PREV_MATRIX(iMinus1, 0) + substitutionValue; } else if (PREV_MATRIX(iMinus1, 1) >= PREV_MATRIX(iMinus1, 2)) { S_TRACE_MATRIX(iMinus1, jMinus1) = DELETION; CURR_MATRIX(i, 0) = PREV_MATRIX(iMinus1, 1) + substitutionValue; } else { S_TRACE_MATRIX(iMinus1, jMinus1) = INSERTION; CURR_MATRIX(i, 0) = PREV_MATRIX(iMinus1, 2) + substitutionValue; } if (PREV_MATRIX(i, 1) > (MAX(PREV_MATRIX(i, 0), PREV_MATRIX(i, 2)) - gapOpening)) { D_TRACE_MATRIX(iMinus1, jMinus1) = DELETION; CURR_MATRIX(i, 1) = PREV_MATRIX(i, 1) - gapExtension; } else if (PREV_MATRIX(i, 0) >= PREV_MATRIX(i, 2)) { D_TRACE_MATRIX(iMinus1, jMinus1) = SUBSTITUTION; CURR_MATRIX(i, 1) = PREV_MATRIX(i, 0) - gapOpeningPlusExtension; } else { 
D_TRACE_MATRIX(iMinus1, jMinus1) = INSERTION; CURR_MATRIX(i, 1) = PREV_MATRIX(i, 2) - gapOpeningPlusExtension; } if (CURR_MATRIX(iMinus1, 2) > (MAX(CURR_MATRIX(iMinus1, 0), CURR_MATRIX(iMinus1, 1)) - gapOpening)) { I_TRACE_MATRIX(iMinus1, jMinus1) = INSERTION; CURR_MATRIX(i, 2) = CURR_MATRIX(iMinus1, 2) - gapExtension; } else if (CURR_MATRIX(iMinus1, 0) >= CURR_MATRIX(iMinus1, 1)) { I_TRACE_MATRIX(iMinus1, jMinus1) = SUBSTITUTION; CURR_MATRIX(i, 2) = CURR_MATRIX(iMinus1, 0) - gapOpeningPlusExtension; } else { I_TRACE_MATRIX(iMinus1, jMinus1) = DELETION; CURR_MATRIX(i, 2) = CURR_MATRIX(iMinus1, 1) - gapOpeningPlusExtension; } CURR_MATRIX(i, 0) = MAX(0.0, CURR_MATRIX(i, 0)); if (CURR_MATRIX(i, 0) == 0.0) S_TRACE_MATRIX(iMinus1, jMinus1) = TERMINATION; CURR_MATRIX(i, 1) = MAX(0.0, CURR_MATRIX(i, 1)); if (CURR_MATRIX(i, 1) == 0.0) D_TRACE_MATRIX(iMinus1, jMinus1) = TERMINATION; CURR_MATRIX(i, 2) = MAX(0.0, CURR_MATRIX(i, 2)); if (CURR_MATRIX(i, 2) == 0.0) I_TRACE_MATRIX(iMinus1, jMinus1) = TERMINATION; /* Step 3d: Get the optimal score for local alignments */ if (CURR_MATRIX(i, 0) >= maxScore) { align1InfoPtr->startRange = iElt + 1; align2InfoPtr->startRange = jElt + 1; maxScore = CURR_MATRIX(i, 0); } } } else { for (i = 1, iMinus1 = 0, iElt = nCharString1Minus1; i <= nCharString1; i++, iMinus1++, iElt--) { SET_LOOKUP_VALUE(fuzzyLookupTable, fuzzyLookupTableLength, align1InfoPtr->string.ptr[iElt]); stringElt1 = lookupValue; SET_LOOKUP_VALUE(substitutionLookupTable, substitutionLookupTableLength, sequence1.ptr[scalar1 ? 
0 : iElt]); element1 = lookupValue; fuzzy = FUZZY_MATRIX(stringElt1, stringElt2); substitutionValue = (float) SUBSTITUTION_ARRAY(element1, element2, fuzzy); /* Step 3c: Generate (0) substitution, (1) deletion, and (2) insertion scores * and traceback values */ if (PREV_MATRIX(iMinus1, 0) >= MAX(PREV_MATRIX(iMinus1, 1), PREV_MATRIX(iMinus1, 2))) { S_TRACE_MATRIX(iMinus1, jMinus1) = SUBSTITUTION; CURR_MATRIX(i, 0) = PREV_MATRIX(iMinus1, 0) + substitutionValue; } else if (PREV_MATRIX(iMinus1, 1) >= PREV_MATRIX(iMinus1, 2)) { S_TRACE_MATRIX(iMinus1, jMinus1) = DELETION; CURR_MATRIX(i, 0) = PREV_MATRIX(iMinus1, 1) + substitutionValue; } else { S_TRACE_MATRIX(iMinus1, jMinus1) = INSERTION; CURR_MATRIX(i, 0) = PREV_MATRIX(iMinus1, 2) + substitutionValue; } if (PREV_MATRIX(i, 1) > (MAX(PREV_MATRIX(i, 0), PREV_MATRIX(i, 2)) - gapOpening)) { D_TRACE_MATRIX(iMinus1, jMinus1) = DELETION; CURR_MATRIX(i, 1) = PREV_MATRIX(i, 1) - gapExtension; } else if (PREV_MATRIX(i, 0) >= PREV_MATRIX(i, 2)) { D_TRACE_MATRIX(iMinus1, jMinus1) = SUBSTITUTION; CURR_MATRIX(i, 1) = PREV_MATRIX(i, 0) - gapOpeningPlusExtension; } else { D_TRACE_MATRIX(iMinus1, jMinus1) = INSERTION; CURR_MATRIX(i, 1) = PREV_MATRIX(i, 2) - gapOpeningPlusExtension; } if (CURR_MATRIX(iMinus1, 2) > (MAX(CURR_MATRIX(iMinus1, 0), CURR_MATRIX(iMinus1, 1)) - gapOpening)) { I_TRACE_MATRIX(iMinus1, jMinus1) = INSERTION; CURR_MATRIX(i, 2) = CURR_MATRIX(iMinus1, 2) - gapExtension; } else if (CURR_MATRIX(iMinus1, 0) >= CURR_MATRIX(iMinus1, 1)) { I_TRACE_MATRIX(iMinus1, jMinus1) = SUBSTITUTION; CURR_MATRIX(i, 2) = CURR_MATRIX(iMinus1, 0) - gapOpeningPlusExtension; } else { I_TRACE_MATRIX(iMinus1, jMinus1) = DELETION; CURR_MATRIX(i, 2) = CURR_MATRIX(iMinus1, 1) - gapOpeningPlusExtension; } } } if (noEndGap2) { if (PREV_MATRIX(nCharString1, 1) >= MAX(PREV_MATRIX(nCharString1, 0), PREV_MATRIX(nCharString1, 2))) { D_TRACE_MATRIX(nCharString1Minus1, jMinus1) = DELETION; CURR_MATRIX(nCharString1, 1) = PREV_MATRIX(nCharString1, 1); } else 
if (PREV_MATRIX(nCharString1, 0) >= PREV_MATRIX(nCharString1, 2)) { D_TRACE_MATRIX(nCharString1Minus1, jMinus1) = SUBSTITUTION; CURR_MATRIX(nCharString1, 1) = PREV_MATRIX(nCharString1, 0); } else { D_TRACE_MATRIX(nCharString1Minus1, jMinus1) = INSERTION; CURR_MATRIX(nCharString1, 1) = PREV_MATRIX(nCharString1, 2); } } if (noEndGap1 && j == nCharString2) { for (i = 1, iMinus1 = 0; i <= nCharString1; i++, iMinus1++) { if (CURR_MATRIX(iMinus1, 2) >= MAX(CURR_MATRIX(iMinus1, 0), CURR_MATRIX(iMinus1, 1))) { I_TRACE_MATRIX(iMinus1, jMinus1) = INSERTION; CURR_MATRIX(i, 2) = CURR_MATRIX(iMinus1, 2); } else if (CURR_MATRIX(iMinus1, 0) >= CURR_MATRIX(iMinus1, 1)) { I_TRACE_MATRIX(iMinus1, jMinus1) = SUBSTITUTION; CURR_MATRIX(i, 2) = CURR_MATRIX(iMinus1, 0); } else { I_TRACE_MATRIX(iMinus1, jMinus1) = DELETION; CURR_MATRIX(i, 2) = CURR_MATRIX(iMinus1, 1); } } } } char currTraceMatrix = '?'; if (localAlignment) { if (maxScore == 0.0) currTraceMatrix = TERMINATION; else currTraceMatrix = SUBSTITUTION; } else { /* Step 3g: Get the optimal score for non-local alignments */ align1InfoPtr->startRange = 1; align2InfoPtr->startRange = 1; if (CURR_MATRIX(nCharString1, 0) >= MAX(CURR_MATRIX(nCharString1, 1), CURR_MATRIX(nCharString1, 2))) { currTraceMatrix = SUBSTITUTION; maxScore = CURR_MATRIX(nCharString1, 0); } else if (CURR_MATRIX(nCharString1, 1) >= CURR_MATRIX(nCharString1, 2)) { currTraceMatrix = DELETION; maxScore = CURR_MATRIX(nCharString1, 1); } else { currTraceMatrix = INSERTION; maxScore = CURR_MATRIX(nCharString1, 2); } } /* Step 4: Traceback through the score matrices */ traceback(alignBufferPtr, currTraceMatrix, align1InfoPtr, align2InfoPtr); } return (double) maxScore; } /* * INPUTS * 'pattern': XStringSet or QualityScaledXStringSet object for patterns * 'subject': XStringSet or QualityScaledXStringSet object for subject * 'type': type of pairwise alignment * (character vector of length 1; * 'global', 'local', 'overlap', 'global-local', * 'local-global') * 'typeCode': 
 *                            type of pairwise alignment
 *                            (integer vector of length 1;
 *                             1 = 'global', 2 = 'local', 3 = 'overlap',
 *                             4 = 'global-local', 5 = 'local-global')
 * 'scoreOnly':               denotes whether or not to only return the scores
 *                            of the optimal pairwise alignment
 *                            (logical vector of length 1)
 * 'gapOpening':              gap opening cost or penalty
 *                            (single non-negative double)
 * 'gapExtension':            gap extension cost or penalty
 *                            (single non-negative double)
 * 'useQuality':              denotes whether or not to use quality measures
 *                            in the optimal pairwise alignment
 *                            (logical vector of length 1)
 * 'substitutionArray':       a three-dimensional double array where the first two
 *                            dimensions are for substitutions and the
 *                            third is for fuzziness of matches
 * 'substitutionArrayDim':    dimension of 'substitutionArray'
 *                            (integer vector of length 3)
 * 'substitutionLookupTable': lookup table for translating XString bytes to
 *                            substitution indices
 *                            (integer vector)
 * 'fuzzyMatrix':             fuzzy matrix for matches
 *                            (integer matrix; read with INTEGER() below and
 *                             used as the third index of 'substitutionArray')
 * 'fuzzyMatrixDim':          dimension of 'fuzzyMatrix'
 *                            (integer vector of length 2)
 * 'fuzzyLookupTable':        lookup table for translating XString bytes to
 *                            fuzzy indices
 *                            (integer vector)
 *
 * OUTPUT
 * If scoreOnly = TRUE, returns a numeric vector of scores.
 * If scoreOnly = FALSE, returns an S4 PairwiseAlignments or
 * PairwiseAlignmentsSingleSubject object.
*/ SEXP XStringSet_align_pairwiseAlignment( SEXP pattern, SEXP subject, SEXP type, SEXP typeCode, SEXP scoreOnly, SEXP gapOpening, SEXP gapExtension, SEXP useQuality, SEXP substitutionArray, SEXP substitutionArrayDim, SEXP substitutionLookupTable, SEXP fuzzyMatrix, SEXP fuzzyMatrixDim, SEXP fuzzyLookupTable) { const int scoreOnlyValue = LOGICAL(scoreOnly)[0]; const int useQualityValue = LOGICAL(useQuality)[0]; const int localAlignment = (INTEGER(typeCode)[0] == LOCAL_ALIGNMENT); float gapOpeningValue = REAL(gapOpening)[0]; float gapExtensionValue = REAL(gapExtension)[0]; if (gapOpeningValue == POSITIVE_INFINITY || gapExtensionValue == POSITIVE_INFINITY) { gapOpeningValue = 0.0; gapExtensionValue = POSITIVE_INFINITY; } XStringSet_holder pattern_holder = hold_XStringSet(pattern); XStringSet_holder subject_holder = hold_XStringSet(subject); const int numberOfStrings = get_length_from_XStringSet_holder(&pattern_holder); const int multipleSubjects = get_length_from_XStringSet_holder(&subject_holder) > 1; int lengthOfPatternQualitySet = 0; int lengthOfSubjectQualitySet = 0; SEXP patternQuality, subjectQuality; XStringSet_holder patternQuality_holder, subjectQuality_holder; if (useQualityValue) { patternQuality = GET_SLOT(pattern, install("quality")); subjectQuality = GET_SLOT(subject, install("quality")); patternQuality_holder = hold_XStringSet(patternQuality); lengthOfPatternQualitySet = get_XStringSet_length(patternQuality); subjectQuality_holder = hold_XStringSet(subjectQuality); lengthOfSubjectQualitySet = get_length_from_XStringSet_holder(&subjectQuality_holder); } else { patternQuality = R_NilValue; subjectQuality = R_NilValue; } /* Create the alignment info objects */ struct AlignInfo align1Info, align2Info; align2Info.string = get_elt_from_XStringSet_holder(&subject_holder, 0); if (useQualityValue) align2Info.quality = get_elt_from_XStringSet_holder(&subjectQuality_holder, 0); align1Info.endGap = (INTEGER(typeCode)[0] == GLOBAL_ALIGNMENT || INTEGER(typeCode)[0] 
== GLOBAL_LOCAL_ALIGNMENT); align2Info.endGap = (INTEGER(typeCode)[0] == GLOBAL_ALIGNMENT || INTEGER(typeCode)[0] == LOCAL_GLOBAL_ALIGNMENT); SEXP output; int i, quality1Element = 0, quality2Element = 0; const int quality1Increment = ((lengthOfPatternQualitySet < numberOfStrings) ? 0 : 1); const int quality2Increment = ((lengthOfSubjectQualitySet < numberOfStrings) ? 0 : 1); /* Create the alignment buffer object */ struct AlignBuffer alignBuffer; int nCharString1 = 0, nCharString2 = 0, nCharProduct = 0; reset_ovflow_flag(); if (multipleSubjects) { for (i = 0; i < numberOfStrings; i++) { int nchar1 = get_elt_from_XStringSet_holder(&pattern_holder, i).length; int nchar2 = get_elt_from_XStringSet_holder(&subject_holder, i).length; nCharString1 = MAX(nCharString1, nchar1); nCharString2 = MAX(nCharString2, nchar2); nCharProduct = MAX(nCharProduct, safe_int_mult(nchar1, nchar2)); } } else { for (i = 0; i < numberOfStrings; i++) { nCharString1 = MAX(nCharString1, get_elt_from_XStringSet_holder(&pattern_holder, i).length); } nCharString2 = align2Info.string.length; nCharProduct = safe_int_mult(nCharString1, nCharString2); } if (get_ovflow_flag()) error("max(nchar(pattern) * nchar(subject)) is too big " "(must be <= %d)", INT_MAX); const int alignmentBufferSize = nCharString1 + 1; alignBuffer.currMatrix = (float *) R_alloc((long) 3 * alignmentBufferSize, sizeof(float)); alignBuffer.prevMatrix = (float *) R_alloc((long) 3 * alignmentBufferSize, sizeof(float)); struct MismatchBuffer mismatchBuffer; struct IndelBuffer indel1Buffer; struct IndelBuffer indel2Buffer; int mismatchBufferSize = 0, indelBufferSize = 0; if (!scoreOnlyValue) { align1Info.mismatch = (int *) R_alloc((long) alignmentBufferSize, sizeof(int)); align2Info.mismatch = (int *) R_alloc((long) alignmentBufferSize, sizeof(int)); align1Info.startIndel = (int *) R_alloc((long) alignmentBufferSize, sizeof(int)); align2Info.startIndel = (int *) R_alloc((long) alignmentBufferSize, sizeof(int)); align1Info.widthIndel = 
(int *) R_alloc((long) alignmentBufferSize, sizeof(int)); align2Info.widthIndel = (int *) R_alloc((long) alignmentBufferSize, sizeof(int)); alignBuffer.sTraceMatrix = (char *) R_alloc((long) nCharProduct, sizeof(char)); alignBuffer.iTraceMatrix = (char *) R_alloc((long) nCharProduct, sizeof(char)); alignBuffer.dTraceMatrix = (char *) R_alloc((long) nCharProduct, sizeof(char)); mismatchBufferSize = MIN(MAX_BUF_SIZE, alignmentBufferSize + numberOfStrings * (alignmentBufferSize/4)); mismatchBuffer.pattern = (int *) R_alloc((long) mismatchBufferSize, sizeof(int)); mismatchBuffer.subject = (int *) R_alloc((long) mismatchBufferSize, sizeof(int)); mismatchBuffer.usedSpace = 0; mismatchBuffer.totalSpace = mismatchBufferSize; indelBufferSize = MIN(MAX_BUF_SIZE, alignmentBufferSize + numberOfStrings * (alignmentBufferSize/12)); indel1Buffer.start = (int *) R_alloc((long) indelBufferSize, sizeof(int)); indel1Buffer.width = (int *) R_alloc((long) indelBufferSize, sizeof(int)); indel1Buffer.usedSpace = 0; indel1Buffer.totalSpace = indelBufferSize; indel2Buffer.start = (int *) R_alloc((long) indelBufferSize, sizeof(int)); indel2Buffer.width = (int *) R_alloc((long) indelBufferSize, sizeof(int)); indel2Buffer.usedSpace = 0; indel2Buffer.totalSpace = indelBufferSize; } double *score; if (scoreOnlyValue) { PROTECT(output = NEW_NUMERIC(numberOfStrings)); for (i = 0, score = REAL(output); i < numberOfStrings; i++, score++) { R_CheckUserInterrupt(); align1Info.string = get_elt_from_XStringSet_holder(&pattern_holder, i); if (useQualityValue) { align1Info.quality = get_elt_from_XStringSet_holder(&patternQuality_holder, quality1Element); quality1Element += quality1Increment; } if (multipleSubjects) { align2Info.string = get_elt_from_XStringSet_holder(&subject_holder, i); if (useQualityValue) { align2Info.quality = get_elt_from_XStringSet_holder(&subjectQuality_holder, quality2Element); quality2Element += quality2Increment; } } *score = pairwiseAlignment( &align1Info, &align2Info, 
localAlignment, scoreOnlyValue, gapOpeningValue, gapExtensionValue, useQualityValue, REAL(substitutionArray), INTEGER(substitutionArrayDim), INTEGER(substitutionLookupTable), LENGTH(substitutionLookupTable), INTEGER(fuzzyMatrix), INTEGER(fuzzyMatrixDim), INTEGER(fuzzyLookupTable), LENGTH(fuzzyLookupTable), &alignBuffer); } UNPROTECT(1); } else { SEXP alignedPattern; SEXP alignedPatternRange, alignedPatternRangeStart, alignedPatternRangeWidth; SEXP alignedPatternMismatch; SEXP alignedPatternMismatchPartitioning, alignedPatternMismatchValues; SEXP alignedPatternMismatchEnds; SEXP alignedPatternIndel; SEXP alignedPatternIndelPartitioning, alignedPatternIndelRange; SEXP alignedPatternIndelRangeStart, alignedPatternIndelRangeWidth; SEXP alignedPatternIndelEnds; SEXP alignedSubject; SEXP alignedSubjectRange, alignedSubjectRangeStart, alignedSubjectRangeWidth; SEXP alignedSubjectMismatch; SEXP alignedSubjectMismatchPartitioning, alignedSubjectMismatchValues; SEXP alignedSubjectMismatchEnds; SEXP alignedSubjectIndel; SEXP alignedSubjectIndelPartitioning, alignedSubjectIndelRange; SEXP alignedSubjectIndelRangeStart, alignedSubjectIndelRangeWidth; SEXP alignedSubjectIndelEnds; SEXP alignedScore; PROTECT(alignedPatternRangeStart = NEW_INTEGER(numberOfStrings)); PROTECT(alignedPatternRangeWidth = NEW_INTEGER(numberOfStrings)); PROTECT(alignedPatternMismatchEnds = NEW_INTEGER(numberOfStrings)); PROTECT(alignedPatternIndelEnds = NEW_INTEGER(numberOfStrings)); PROTECT(alignedSubjectRangeStart = NEW_INTEGER(numberOfStrings)); PROTECT(alignedSubjectRangeWidth = NEW_INTEGER(numberOfStrings)); PROTECT(alignedSubjectMismatchEnds = NEW_INTEGER(numberOfStrings)); PROTECT(alignedSubjectIndelEnds = NEW_INTEGER(numberOfStrings)); PROTECT(alignedScore = NEW_NUMERIC(numberOfStrings)); int align1MismatchPrevEnd = 0, align1IndelPrevEnd = 0; int align2MismatchPrevEnd = 0, align2IndelPrevEnd = 0; int *tempIntPtr; int *align1RangeStart, *align1RangeWidth, *align1MismatchEnds, *align1IndelEnds; 
int *align2RangeStart, *align2RangeWidth, *align2MismatchEnds, *align2IndelEnds; for (i = 0, score = REAL(alignedScore), align1RangeStart = INTEGER(alignedPatternRangeStart), align1RangeWidth = INTEGER(alignedPatternRangeWidth), align1MismatchEnds = INTEGER(alignedPatternMismatchEnds), align1IndelEnds = INTEGER(alignedPatternIndelEnds), align2RangeStart = INTEGER(alignedSubjectRangeStart), align2RangeWidth = INTEGER(alignedSubjectRangeWidth), align2MismatchEnds = INTEGER(alignedSubjectMismatchEnds), align2IndelEnds = INTEGER(alignedSubjectIndelEnds); i < numberOfStrings; i++, score++, align1RangeStart++, align1RangeWidth++, align1MismatchEnds++, align1IndelEnds++, align2RangeStart++, align2RangeWidth++, align2MismatchEnds++, align2IndelEnds++) { R_CheckUserInterrupt(); align1Info.string = get_elt_from_XStringSet_holder(&pattern_holder, i); if (useQualityValue) { align1Info.quality = get_elt_from_XStringSet_holder(&patternQuality_holder, quality1Element); quality1Element += quality1Increment; } if (multipleSubjects) { align2Info.string = get_elt_from_XStringSet_holder(&subject_holder, i); if (useQualityValue) { align2Info.quality = get_elt_from_XStringSet_holder(&subjectQuality_holder, quality2Element); quality2Element += quality2Increment; } } *score = pairwiseAlignment( &align1Info, &align2Info, localAlignment, scoreOnlyValue, gapOpeningValue, gapExtensionValue, useQualityValue, REAL(substitutionArray), INTEGER(substitutionArrayDim), INTEGER(substitutionLookupTable), LENGTH(substitutionLookupTable), INTEGER(fuzzyMatrix), INTEGER(fuzzyMatrixDim), INTEGER(fuzzyLookupTable), LENGTH(fuzzyLookupTable), &alignBuffer); *align1MismatchEnds = align1Info.lengthMismatch + align1MismatchPrevEnd; *align2MismatchEnds = align2Info.lengthMismatch + align2MismatchPrevEnd; if (align1Info.lengthMismatch > 0) { if ((mismatchBuffer.usedSpace + align1Info.lengthMismatch) > mismatchBuffer.totalSpace) { mismatchBuffer.totalSpace = mismatchBuffer.totalSpace + MIN(MAX_BUF_SIZE, 
alignmentBufferSize + (numberOfStrings - (i+1)) * (alignmentBufferSize/4)); tempIntPtr = (int *) R_alloc((long) mismatchBuffer.totalSpace, sizeof(int)); memcpy(tempIntPtr, mismatchBuffer.pattern, mismatchBuffer.usedSpace * sizeof(int)); mismatchBuffer.pattern = tempIntPtr; tempIntPtr = (int *) R_alloc((long) mismatchBuffer.totalSpace, sizeof(int)); memcpy(tempIntPtr, mismatchBuffer.subject, mismatchBuffer.usedSpace * sizeof(int)); mismatchBuffer.subject = tempIntPtr; } memcpy(&mismatchBuffer.pattern[mismatchBuffer.usedSpace], align1Info.mismatch, align1Info.lengthMismatch * sizeof(int)); memcpy(&mismatchBuffer.subject[mismatchBuffer.usedSpace], align2Info.mismatch, align1Info.lengthMismatch * sizeof(int)); mismatchBuffer.usedSpace = mismatchBuffer.usedSpace + align1Info.lengthMismatch; } *align1RangeStart = align1Info.startRange; *align1RangeWidth = align1Info.widthRange; *align1IndelEnds = align1Info.lengthIndel + align1IndelPrevEnd; if (align1Info.lengthIndel > 0) { if ((indel1Buffer.usedSpace + align1Info.lengthIndel) > indel1Buffer.totalSpace) { indel1Buffer.totalSpace = indel1Buffer.totalSpace + MIN(MAX_BUF_SIZE, alignmentBufferSize + (numberOfStrings - (i+1)) * (alignmentBufferSize/12)); tempIntPtr = (int *) R_alloc((long) indel1Buffer.totalSpace, sizeof(int)); memcpy(tempIntPtr, indel1Buffer.start, indel1Buffer.usedSpace * sizeof(int)); indel1Buffer.start = tempIntPtr; tempIntPtr = (int *) R_alloc((long) indel1Buffer.totalSpace, sizeof(int)); memcpy(tempIntPtr, indel1Buffer.width, indel1Buffer.usedSpace * sizeof(int)); indel1Buffer.width = tempIntPtr; } memcpy(&indel1Buffer.start[indel1Buffer.usedSpace], align1Info.startIndel, align1Info.lengthIndel * sizeof(int)); memcpy(&indel1Buffer.width[indel1Buffer.usedSpace], align1Info.widthIndel, align1Info.lengthIndel * sizeof(int)); indel1Buffer.usedSpace = indel1Buffer.usedSpace + align1Info.lengthIndel; } *align2RangeStart = align2Info.startRange; *align2RangeWidth = align2Info.widthRange; *align2IndelEnds = 
align2Info.lengthIndel + align2IndelPrevEnd; if (align2Info.lengthIndel > 0) { if ((indel2Buffer.usedSpace + align2Info.lengthIndel) > indel2Buffer.totalSpace) { indel2Buffer.totalSpace = indel2Buffer.totalSpace + MIN(MAX_BUF_SIZE, alignmentBufferSize + (numberOfStrings - (i+1)) * (alignmentBufferSize/12)); tempIntPtr = (int *) R_alloc((long) indel2Buffer.totalSpace, sizeof(int)); memcpy(tempIntPtr, indel2Buffer.start, indel2Buffer.usedSpace * sizeof(int)); indel2Buffer.start = tempIntPtr; tempIntPtr = (int *) R_alloc((long) indel2Buffer.totalSpace, sizeof(int)); memcpy(tempIntPtr, indel2Buffer.width, indel2Buffer.usedSpace * sizeof(int)); indel2Buffer.width = tempIntPtr; } memcpy(&indel2Buffer.start[indel2Buffer.usedSpace], align2Info.startIndel, align2Info.lengthIndel * sizeof(int)); memcpy(&indel2Buffer.width[indel2Buffer.usedSpace], align2Info.widthIndel, align2Info.lengthIndel * sizeof(int)); indel2Buffer.usedSpace = indel2Buffer.usedSpace + align2Info.lengthIndel; } align1MismatchPrevEnd = *align1MismatchEnds; align2MismatchPrevEnd = *align2MismatchEnds; align1IndelPrevEnd = *align1IndelEnds; align2IndelPrevEnd = *align2IndelEnds; } /* Create the output object */ if (multipleSubjects) { PROTECT(output = NEW_OBJECT(MAKE_CLASS("PairwiseAlignments"))); } else { PROTECT(output = NEW_OBJECT(MAKE_CLASS("PairwiseAlignmentsSingleSubject"))); } /* Set the "pattern" slot */ if (useQualityValue) { PROTECT(alignedPattern = NEW_OBJECT(MAKE_CLASS("QualityAlignedXStringSet"))); } else { PROTECT(alignedPattern = NEW_OBJECT(MAKE_CLASS("AlignedXStringSet"))); } SET_SLOT(alignedPattern, mkChar("unaligned"), pattern); /* Set the "range" sub-slot */ PROTECT(alignedPatternRange = new_IRanges("IRanges", alignedPatternRangeStart, alignedPatternRangeWidth, R_NilValue)); SET_SLOT(alignedPattern, mkChar("range"), alignedPatternRange); /* Set the "mismatch" sub-slot */ PROTECT(alignedPatternMismatch = NEW_OBJECT(MAKE_CLASS("CompressedIntegerList"))); 
PROTECT(alignedPatternMismatchPartitioning = NEW_OBJECT(MAKE_CLASS("PartitioningByEnd"))); PROTECT(alignedPatternMismatchValues = NEW_INTEGER(mismatchBuffer.usedSpace)); memcpy(INTEGER(alignedPatternMismatchValues), mismatchBuffer.pattern, mismatchBuffer.usedSpace * sizeof(int)); SET_SLOT(alignedPatternMismatchPartitioning, mkChar("end"), alignedPatternMismatchEnds); SET_SLOT(alignedPatternMismatch, mkChar("partitioning"), alignedPatternMismatchPartitioning); SET_SLOT(alignedPatternMismatch, mkChar("unlistData"), alignedPatternMismatchValues); SET_SLOT(alignedPattern, mkChar("mismatch"), alignedPatternMismatch); /* Set the "indel" sub-slot */ PROTECT(alignedPatternIndel = NEW_OBJECT(MAKE_CLASS("CompressedIRangesList"))); PROTECT(alignedPatternIndelPartitioning = NEW_OBJECT(MAKE_CLASS("PartitioningByEnd"))); PROTECT(alignedPatternIndelRangeStart = NEW_INTEGER(indel1Buffer.usedSpace)); PROTECT(alignedPatternIndelRangeWidth = NEW_INTEGER(indel1Buffer.usedSpace)); memcpy(INTEGER(alignedPatternIndelRangeStart), indel1Buffer.start, indel1Buffer.usedSpace * sizeof(int)); memcpy(INTEGER(alignedPatternIndelRangeWidth), indel1Buffer.width, indel1Buffer.usedSpace * sizeof(int)); PROTECT(alignedPatternIndelRange = new_IRanges("IRanges", alignedPatternIndelRangeStart, alignedPatternIndelRangeWidth, R_NilValue)); SET_SLOT(alignedPatternIndelPartitioning, mkChar("end"), alignedPatternIndelEnds); SET_SLOT(alignedPatternIndel, mkChar("partitioning"), alignedPatternIndelPartitioning); SET_SLOT(alignedPatternIndel, mkChar("unlistData"), alignedPatternIndelRange); SET_SLOT(alignedPattern, mkChar("indel"), alignedPatternIndel); SET_SLOT(output, mkChar("pattern"), alignedPattern); /* Set the "subject" slot */ if (useQualityValue) { PROTECT(alignedSubject = NEW_OBJECT(MAKE_CLASS("QualityAlignedXStringSet"))); } else { PROTECT(alignedSubject = NEW_OBJECT(MAKE_CLASS("AlignedXStringSet"))); } SET_SLOT(alignedSubject, mkChar("unaligned"), subject); /* Set the "range" sub-slot */ 
PROTECT(alignedSubjectRange = new_IRanges("IRanges", alignedSubjectRangeStart, alignedSubjectRangeWidth, R_NilValue)); SET_SLOT(alignedSubject, mkChar("range"), alignedSubjectRange); /* Set the "mismatch" sub-slot */ PROTECT(alignedSubjectMismatch = NEW_OBJECT(MAKE_CLASS("CompressedIntegerList"))); PROTECT(alignedSubjectMismatchPartitioning = NEW_OBJECT(MAKE_CLASS("PartitioningByEnd"))); PROTECT(alignedSubjectMismatchValues = NEW_INTEGER(mismatchBuffer.usedSpace)); memcpy(INTEGER(alignedSubjectMismatchValues), mismatchBuffer.subject, mismatchBuffer.usedSpace * sizeof(int)); SET_SLOT(alignedSubjectMismatchPartitioning, mkChar("end"), alignedSubjectMismatchEnds); SET_SLOT(alignedSubjectMismatch, mkChar("partitioning"), alignedSubjectMismatchPartitioning); SET_SLOT(alignedSubjectMismatch, mkChar("unlistData"), alignedSubjectMismatchValues); SET_SLOT(alignedSubject, mkChar("mismatch"), alignedSubjectMismatch); /* Set the "indel" sub-slot */ PROTECT(alignedSubjectIndel = NEW_OBJECT(MAKE_CLASS("CompressedIRangesList"))); PROTECT(alignedSubjectIndelPartitioning = NEW_OBJECT(MAKE_CLASS("PartitioningByEnd"))); PROTECT(alignedSubjectIndelRangeStart = NEW_INTEGER(indel2Buffer.usedSpace)); PROTECT(alignedSubjectIndelRangeWidth = NEW_INTEGER(indel2Buffer.usedSpace)); memcpy(INTEGER(alignedSubjectIndelRangeStart), indel2Buffer.start, indel2Buffer.usedSpace * sizeof(int)); memcpy(INTEGER(alignedSubjectIndelRangeWidth), indel2Buffer.width, indel2Buffer.usedSpace * sizeof(int)); PROTECT(alignedSubjectIndelRange = new_IRanges("IRanges", alignedSubjectIndelRangeStart, alignedSubjectIndelRangeWidth, R_NilValue)); SET_SLOT(alignedSubjectIndelPartitioning, mkChar("end"), alignedSubjectIndelEnds); SET_SLOT(alignedSubjectIndel, mkChar("partitioning"), alignedSubjectIndelPartitioning); SET_SLOT(alignedSubjectIndel, mkChar("unlistData"), alignedSubjectIndelRange); SET_SLOT(alignedSubject, mkChar("indel"), alignedSubjectIndel); SET_SLOT(output, mkChar("subject"), alignedSubject); /* Set the 
"score" slot */ SET_SLOT(output, mkChar("score"), alignedScore); /* Set the "type" slot */ SET_SLOT(output, mkChar("type"), type); /* Set the "gapOpening" slot */ SET_SLOT(output, mkChar("gapOpening"), gapOpening); /* Set the "gapExtension" slot */ SET_SLOT(output, mkChar("gapExtension"), gapExtension); /* Output is ready */ UNPROTECT(30); } return output; } /* * INPUTS * 'string': XStringSet object for strings * 'type': type of pairwise alignment * (character vector of length 1; * 'global', 'local', 'overlap') * 'typeCode': type of pairwise alignment * (integer vector of length 1; * 1 = 'global', 2 = 'local', 3 = 'overlap') * 'gapOpening': gap opening cost or penalty * (single non-negative double) * 'gapExtension': gap extension cost or penalty * (single non-negative double) * 'useQuality': denotes whether or not to use quality measures * in the optimal pairwise alignment * (logical vector of length 1) * 'substitutionArray': a three-dimensional double array where the first two * dimensions are for substitutions and the * third is for fuzziness of matches * 'substitutionArrayDim': dimension of 'substitutionArray' * (integer vector of length 3) * 'substitutionLookupTable': lookup table for translating XString bytes to * substitution indices * (integer vector) * 'fuzzyMatrix': fuzzy matrix for matches * (double matrix) * 'fuzzyMatrixDim': dimension of 'fuzzyMatrix' * (integer vector of length 2) * 'fuzzyLookupTable': lookup table for translating XString bytes to * fuzzy indices * (integer vector) * * OUTPUT * Return a numeric vector containing the lower triangle of the score matrix. 
 */
SEXP XStringSet_align_distance(
    SEXP string,
    SEXP type,
    SEXP typeCode,
    SEXP gapOpening,
    SEXP gapExtension,
    SEXP useQuality,
    SEXP substitutionArray,
    SEXP substitutionArrayDim,
    SEXP substitutionLookupTable,
    SEXP fuzzyMatrix,
    SEXP fuzzyMatrixDim,
    SEXP fuzzyLookupTable)
{
  int scoreOnlyValue = 1;
  int useQualityValue = LOGICAL(useQuality)[0];
  float gapOpeningValue = REAL(gapOpening)[0];
  float gapExtensionValue = REAL(gapExtension)[0];
  if (gapOpeningValue == POSITIVE_INFINITY || gapExtensionValue == POSITIVE_INFINITY) {
    gapOpeningValue = 0.0;
    gapExtensionValue = POSITIVE_INFINITY;
  }
  int localAlignment = (INTEGER(typeCode)[0] == LOCAL_ALIGNMENT);

  /* Create the alignment info objects */
  struct AlignInfo align1Info, align2Info;
  align1Info.endGap = (INTEGER(typeCode)[0] == GLOBAL_ALIGNMENT);
  align2Info.endGap = (INTEGER(typeCode)[0] == GLOBAL_ALIGNMENT);

  int numberOfStrings = get_XStringSet_length(string);
  int lengthOfStringQualitySet = 0;
  SEXP stringQuality = R_NilValue;
  XStringSet_holder string_holder = hold_XStringSet(string);
  XStringSet_holder stringQuality_holder;
  if (useQualityValue) {
    stringQuality = GET_SLOT(string, install("quality"));
    stringQuality_holder = hold_XStringSet(stringQuality);
    lengthOfStringQualitySet = get_XStringSet_length(stringQuality);
  }

  SEXP output;
  int i, j, iQualityElement = 0, jQualityElement = 0;
  int qualityIncrement = ((lengthOfStringQualitySet < numberOfStrings) ?
      0 : 1);

  /* Create the alignment buffer object */
  struct AlignBuffer alignBuffer;
  int nCharString = 0;
  for (i = 0; i < numberOfStrings; i++) {
    nCharString = MAX(nCharString, get_elt_from_XStringSet_holder(&string_holder, i).length);
  }
  int alignmentBufferSize = nCharString + 1;
  alignBuffer.currMatrix = (float *) R_alloc((long) 3 * alignmentBufferSize, sizeof(float));
  alignBuffer.prevMatrix = (float *) R_alloc((long) 3 * alignmentBufferSize, sizeof(float));

  double *score;
  PROTECT(output = NEW_NUMERIC((numberOfStrings * (numberOfStrings - 1)) / 2));
  score = REAL(output);
  if (!useQualityValue) {
    for (i = 0; i < numberOfStrings; i++) {
      R_CheckUserInterrupt();
      align1Info.string = get_elt_from_XStringSet_holder(&string_holder, i);
      for (j = i + 1; j < numberOfStrings; j++) {
        align2Info.string = get_elt_from_XStringSet_holder(&string_holder, j);
        *score = pairwiseAlignment(
            &align1Info,
            &align2Info,
            localAlignment,
            scoreOnlyValue,
            gapOpeningValue,
            gapExtensionValue,
            useQualityValue,
            REAL(substitutionArray),
            INTEGER(substitutionArrayDim),
            INTEGER(substitutionLookupTable),
            LENGTH(substitutionLookupTable),
            INTEGER(fuzzyMatrix),
            INTEGER(fuzzyMatrixDim),
            INTEGER(fuzzyLookupTable),
            LENGTH(fuzzyLookupTable),
            &alignBuffer);
        score++;
      }
    }
  } else {
    for (i = 0; i < numberOfStrings; i++) {
      R_CheckUserInterrupt();
      align1Info.string = get_elt_from_XStringSet_holder(&string_holder, i);
      align1Info.quality = get_elt_from_XStringSet_holder(&stringQuality_holder, iQualityElement);
      jQualityElement = iQualityElement + qualityIncrement;
      iQualityElement += qualityIncrement;
      for (j = i + 1; j < numberOfStrings; j++) {
        align2Info.string = get_elt_from_XStringSet_holder(&string_holder, j);
        align2Info.quality = get_elt_from_XStringSet_holder(&stringQuality_holder, jQualityElement);
        jQualityElement += qualityIncrement;
        *score = pairwiseAlignment(
            &align1Info,
            &align2Info,
            localAlignment,
            scoreOnlyValue,
            gapOpeningValue,
            gapExtensionValue,
            useQualityValue,
            REAL(substitutionArray),
            INTEGER(substitutionArrayDim),
            INTEGER(substitutionLookupTable),
            LENGTH(substitutionLookupTable),
            INTEGER(fuzzyMatrix),
            INTEGER(fuzzyMatrixDim),
            INTEGER(fuzzyLookupTable),
            LENGTH(fuzzyLookupTable),
            &alignBuffer);
        score++;
      }
    }
  }
  UNPROTECT(1);

  return output;
}

/* pwalign/src/align_utils.c */
#include "pwalign.h"
#include "S4Vectors_interface.h"
#include "IRanges_interface.h"
#include "XVector_interface.h"
#include "Biostrings_interface.h"

const char* get_qualityless_classname(SEXP object)
{
  const char *classname = get_classname(object);
  const char *outputClassname;
  if (strcmp(classname, "QualityScaledBStringSet") == 0) {
    outputClassname = "BStringSet";
  } else if (strcmp(classname, "QualityScaledDNAStringSet") == 0) {
    outputClassname = "DNAStringSet";
  } else if (strcmp(classname, "QualityScaledRNAStringSet") == 0) {
    outputClassname = "RNAStringSet";
  } else {
    outputClassname = classname;
  }
  return outputClassname;
}

/*
 * --- .Call ENTRY POINT ---
 */
SEXP PairwiseAlignments_nmatch(SEXP nchar, SEXP nmismatch, SEXP ninsertion, SEXP ndeletion)
{
  int ans_len, i, *ans_elt;
  const int *nchar_elt, *nmismatch_elt, *ninsertion_elt, *ndeletion_elt;
  SEXP ans;

  ans_len = LENGTH(nchar);
  PROTECT(ans = NEW_INTEGER(ans_len));
  for (i = 0, nchar_elt = INTEGER(nchar), nmismatch_elt = INTEGER(nmismatch),
       ninsertion_elt = INTEGER(ninsertion), ndeletion_elt = INTEGER(ndeletion),
       ans_elt = INTEGER(ans);
       i < ans_len;
       i++, nchar_elt++, nmismatch_elt++, ninsertion_elt++, ndeletion_elt++, ans_elt++)
  {
    *ans_elt = *nchar_elt - *nmismatch_elt - *ninsertion_elt - *ndeletion_elt;
  }
  UNPROTECT(1);
  return ans;
}

SEXP AlignedXStringSet_nchar(SEXP alignedXStringSet)
{
  SEXP range = GET_SLOT(alignedXStringSet, install("range"));
  int numberOfAlignments = get_IRanges_length(range);

  SEXP indel = GET_SLOT(alignedXStringSet, install("indel"));
  CompressedIRangesList_holder indel_holder = hold_CompressedIRangesList(indel);

  SEXP output;
  PROTECT(output
      = NEW_INTEGER(numberOfAlignments));
  int i, j, *outputPtr;
  const int *rangeWidth;
  for (i = 0, rangeWidth = INTEGER(get_IRanges_width(range)), outputPtr = INTEGER(output);
       i < numberOfAlignments;
       i++, rangeWidth++, outputPtr++)
  {
    IRanges_holder indelElement = get_elt_from_CompressedIRangesList_holder(&indel_holder, i);
    int numberOfIndels = get_length_from_IRanges_holder(&indelElement);
    *outputPtr = *rangeWidth;
    for (j = 0; j < numberOfIndels; j++)
      *outputPtr += get_width_elt_from_IRanges_holder(&indelElement, j);
  }
  UNPROTECT(1);
  return output;
}

SEXP AlignedXStringSet_align_aligned(SEXP alignedXStringSet, SEXP gapCode)
{
  int i, j, k;
  char gapCodeValue = (char) RAW(gapCode)[0];

  SEXP unaligned = GET_SLOT(alignedXStringSet, install("unaligned"));
  XStringSet_holder unaligned_holder = hold_XStringSet(unaligned);
  SEXP range = GET_SLOT(alignedXStringSet, install("range"));
  int numberOfAlignments = get_IRanges_length(range);
  SEXP indel = GET_SLOT(alignedXStringSet, install("indel"));
  CompressedIRangesList_holder indel_holder = hold_CompressedIRangesList(indel);

  const char *stringSetClass = get_qualityless_classname(unaligned);
  const char *stringClass = get_List_elementType(unaligned);

  int numberOfStrings = get_XStringSet_length(unaligned);

  SEXP output;
  SEXP alignedRanges, alignedStart, alignedWidth;
  PROTECT(alignedWidth = AlignedXStringSet_nchar(alignedXStringSet));
  PROTECT(alignedStart = NEW_INTEGER(LENGTH(alignedWidth)));
  int totalNChars = 0;
  const int *width_i;
  int *start_i;
  for (i = 0, start_i = INTEGER(alignedStart), width_i = INTEGER(alignedWidth);
       i < LENGTH(alignedWidth);
       i++, start_i++, width_i++)
  {
    *start_i = 1 + totalNChars;
    totalNChars += *width_i;
  }
  SEXP alignedStringTag;
  PROTECT(alignedStringTag = NEW_RAW(totalNChars));
  PROTECT(alignedRanges = new_IRanges("IRanges", alignedStart, alignedWidth, R_NilValue));
  char *alignedStringPtr = (char *) RAW(alignedStringTag);
  PROTECT(output = new_XRawList_from_tag(stringSetClass, stringClass, alignedStringTag,
      alignedRanges));
  int stringIncrement = (numberOfStrings == 1 ? 0 : 1);
  int index = 0, stringElement = 0;
  const int *rangeStart, *rangeWidth;
  for (i = 0, rangeStart = INTEGER(get_IRanges_start(range)),
       rangeWidth = INTEGER(get_IRanges_width(range));
       i < numberOfAlignments;
       i++, rangeStart++, rangeWidth++)
  {
    Chars_holder origString = get_elt_from_XStringSet_holder(&unaligned_holder, stringElement);
    char *origStringPtr = (char *) (origString.ptr + (*rangeStart - 1));
    IRanges_holder indelElement = get_elt_from_CompressedIRangesList_holder(&indel_holder, i);
    int numberOfIndel = get_length_from_IRanges_holder(&indelElement);
    if (numberOfIndel == 0) {
      /* No gaps: copy the aligned range as-is */
      memcpy(&alignedStringPtr[index], origStringPtr, *rangeWidth * sizeof(char));
      index += *rangeWidth;
    } else {
      int prevStart = 0;
      for (j = 0; j < numberOfIndel; j++) {
        int currStart = get_start_elt_from_IRanges_holder(&indelElement, j) - 1;
        int currWidth = get_width_elt_from_IRanges_holder(&indelElement, j);
        int copyElements = currStart - prevStart;
        if (copyElements > 0) {
          memcpy(&alignedStringPtr[index], origStringPtr, copyElements * sizeof(char));
          index += copyElements;
          origStringPtr += copyElements;
        }
        for (k = 0; k < currWidth; k++) {
          alignedStringPtr[index] = gapCodeValue;
          index++;
        }
        prevStart = currStart;
      }
      /* Copy whatever remains after the last indel */
      int copyElements = *rangeWidth - prevStart;
      memcpy(&alignedStringPtr[index], origStringPtr, copyElements * sizeof(char));
      index += copyElements;
    }
    stringElement += stringIncrement;
  }
  UNPROTECT(5);

  return output;
}

SEXP PairwiseAlignmentsSingleSubject_align_aligned(SEXP alignment, SEXP gapCode, SEXP endgapCode)
{
  int i, j, k;
  char gapCodeValue = (char) RAW(gapCode)[0];
  char endgapCodeValue = (char) RAW(endgapCode)[0];

  SEXP pattern = GET_SLOT(alignment, install("pattern"));
  SEXP unalignedPattern = GET_SLOT(pattern, install("unaligned"));
  XStringSet_holder unalignedPattern_holder = hold_XStringSet(unalignedPattern);
  SEXP rangePattern = GET_SLOT(pattern, install("range"));
  SEXP namesPattern =
get_IRanges_names(rangePattern); SEXP indelPattern = GET_SLOT(pattern, install("indel")); CompressedIRangesList_holder indelPattern_holder = hold_CompressedIRangesList(indelPattern); SEXP subject = GET_SLOT(alignment, install("subject")); SEXP rangeSubject = GET_SLOT(subject, install("range")); SEXP indelSubject = GET_SLOT(subject, install("indel")); CompressedIRangesList_holder indelSubject_holder = hold_CompressedIRangesList(indelSubject); const char *stringSetClass = get_qualityless_classname(unalignedPattern); const char *stringClass = get_List_elementType(unalignedPattern); int numberOfAlignments = get_IRanges_length(rangePattern); int numberOfChars = INTEGER(get_XStringSet_width(GET_SLOT(subject, install("unaligned"))))[0]; SEXP output; SEXP mappedRanges, mappedStart, mappedWidth; PROTECT(mappedWidth = NEW_INTEGER(numberOfAlignments)); PROTECT(mappedStart = NEW_INTEGER(numberOfAlignments)); int totalNChars = numberOfAlignments * numberOfChars; int *width_i, *start_i; for (i = 0, start_i = INTEGER(mappedStart), width_i = INTEGER(mappedWidth); i < numberOfAlignments; i++, start_i++, width_i++) { *start_i = i * numberOfChars + 1; *width_i = numberOfChars; } SEXP mappedStringTag; PROTECT(mappedStringTag = NEW_RAW(totalNChars)); PROTECT(mappedRanges = new_IRanges("IRanges", mappedStart, mappedWidth, namesPattern)); char *mappedStringPtr = (char *) RAW(mappedStringTag); PROTECT(output = new_XRawList_from_tag(stringSetClass, stringClass, mappedStringTag, mappedRanges)); int index = 0; const int *rangeStartPattern, *rangeWidthPattern, *rangeStartSubject, *rangeWidthSubject; for (i = 0, rangeStartPattern = INTEGER(get_IRanges_start(rangePattern)), rangeWidthPattern = INTEGER(get_IRanges_width(rangePattern)), rangeStartSubject = INTEGER(get_IRanges_start(rangeSubject)), rangeWidthSubject = INTEGER(get_IRanges_width(rangeSubject)); i < numberOfAlignments; i++, rangeStartPattern++, rangeWidthPattern++, rangeStartSubject++, rangeWidthSubject++) { Chars_holder origString = 
get_elt_from_XStringSet_holder(&unalignedPattern_holder, i); char *origStringPtr = (char *) (origString.ptr + (*rangeStartPattern - 1)); IRanges_holder indelElementPattern = get_elt_from_CompressedIRangesList_holder(&indelPattern_holder, i); IRanges_holder indelElementSubject = get_elt_from_CompressedIRangesList_holder(&indelSubject_holder, i); int numberOfIndelPattern = get_length_from_IRanges_holder(&indelElementPattern); int numberOfIndelSubject = get_length_from_IRanges_holder(&indelElementSubject); for (j = 0; j < *rangeStartSubject - 1; j++) { mappedStringPtr[index] = endgapCodeValue; index++; } int jPattern = 1, jp = 0, js = 0; int indelStartPattern, indelWidthPattern, indelStartSubject, indelWidthSubject; if (numberOfIndelPattern > 0) { indelStartPattern = get_start_elt_from_IRanges_holder(&indelElementPattern, jp); indelWidthPattern = get_width_elt_from_IRanges_holder(&indelElementPattern, jp); } if (numberOfIndelSubject > 0) { indelStartSubject = get_start_elt_from_IRanges_holder(&indelElementSubject, js); indelWidthSubject = get_width_elt_from_IRanges_holder(&indelElementSubject, js); } for (j = 1; j <= *rangeWidthSubject; j++) { if ((numberOfIndelSubject == 0) || (j < indelStartSubject)) { if ((numberOfIndelPattern == 0) || (jPattern < indelStartPattern)) { mappedStringPtr[index] = *origStringPtr; index++; origStringPtr++; jPattern++; } else { for (k = 0; k < indelWidthPattern; k++) { mappedStringPtr[index] = gapCodeValue; index++; } j += indelWidthPattern - 1; jp++; indelStartPattern = get_start_elt_from_IRanges_holder(&indelElementPattern, jp); indelWidthPattern = get_width_elt_from_IRanges_holder(&indelElementPattern, jp); numberOfIndelPattern--; } } else { origStringPtr += indelWidthSubject; jPattern += indelWidthSubject; j--; js++; indelStartSubject = get_start_elt_from_IRanges_holder(&indelElementSubject, js); indelWidthSubject = get_width_elt_from_IRanges_holder(&indelElementSubject, js); numberOfIndelSubject--; } } for (j = *rangeStartSubject + 
(*rangeWidthSubject - 1); j < numberOfChars; j++) { mappedStringPtr[index] = endgapCodeValue; index++; } } UNPROTECT(5); return(output); } SEXP align_compareStrings(SEXP patternStrings, SEXP subjectStrings, SEXP maxNChar, SEXP insertionCode, SEXP deletionCode, SEXP mismatchCode) { char insertionChar = CHAR(STRING_ELT(insertionCode, 0))[0]; char deletionChar = CHAR(STRING_ELT(deletionCode, 0))[0]; char mismatchChar = CHAR(STRING_ELT(mismatchCode, 0))[0]; int numberOfStrings = LENGTH(patternStrings); char *outputPtr = (char *) R_alloc((long) (INTEGER(maxNChar)[0] + 1), sizeof(char)); SEXP output; PROTECT(output = NEW_CHARACTER(numberOfStrings)); int i, j; char *output_j; const char *subject_j; for (i = 0; i < numberOfStrings; i++) { const char *patternPtr = (char *) CHAR(STRING_ELT(patternStrings, i)); const char *subjectPtr = (char *) CHAR(STRING_ELT(subjectStrings, i)); int numberOfChars = strlen(patternPtr); memcpy(outputPtr, patternPtr, numberOfChars * sizeof(char)); outputPtr[numberOfChars] = '\0'; for (j = 0, output_j = outputPtr, subject_j = subjectPtr; j < numberOfChars; j++, output_j++, subject_j++) { if (*output_j != deletionChar) { if (*subject_j == deletionChar) { *output_j = insertionChar; } else if (*subject_j != *output_j) { *output_j = mismatchChar; } } } SET_STRING_ELT(output, i, mkChar(outputPtr)); } UNPROTECT(1); return(output); } pwalign/src/Biostrings_stubs.c0000644000175100017510000000003714614311433017531 0ustar00biocbuildbiocbuild#include "_Biostrings_stubs.c" pwalign/src/IRanges_stubs.c0000644000175100017510000000003414614311433016733 0ustar00biocbuildbiocbuild#include "_IRanges_stubs.c" pwalign/src/pwalign.h0000644000175100017510000000233414614311433015636 0ustar00biocbuildbiocbuild#ifndef _PWALIGN_H_ #define _PWALIGN_H_ #include /* align_utils.c */ SEXP PairwiseAlignments_nmatch( SEXP nchar, SEXP nmismatch, SEXP ninsertion, SEXP ndeletion ); SEXP AlignedXStringSet_nchar(SEXP alignedXStringSet); SEXP AlignedXStringSet_align_aligned( SEXP 
alignedXStringSet, SEXP gapCode ); SEXP PairwiseAlignmentsSingleSubject_align_aligned( SEXP alignment, SEXP gapCode, SEXP endgapCode ); SEXP align_compareStrings( SEXP patternStrings, SEXP subjectStrings, SEXP maxNChar, SEXP insertionCode, SEXP deletionCode, SEXP mismatchCode ); /* align_pairwiseAlignment.c */ SEXP XStringSet_align_pairwiseAlignment( SEXP pattern, SEXP subject, SEXP type, SEXP typeCode, SEXP scoreOnly, SEXP gapOpening, SEXP gapExtension, SEXP useQuality, SEXP substitutionArray, SEXP substitutionArrayDim, SEXP substitutionLookupTable, SEXP fuzzyMatrix, SEXP fuzzyMatrixDim, SEXP fuzzyLookupTable ); SEXP XStringSet_align_distance( SEXP string, SEXP type, SEXP typeCode, SEXP gapOpening, SEXP gapExtension, SEXP useQuality, SEXP substitutionArray, SEXP substitutionArrayDim, SEXP substitutionLookupTable, SEXP fuzzyMatrix, SEXP fuzzyMatrixDim, SEXP fuzzyLookupTable ); #endif /* _PWALIGN_H_ */ pwalign/src/R_init_pairwiseAlignment.c0000644000175100017510000000131014614311433021147 0ustar00biocbuildbiocbuild#include "pwalign.h" #define CALLMETHOD_DEF(fun, numArgs) {#fun, (DL_FUNC) &fun, numArgs} static const R_CallMethodDef callMethods[] = { /* align_utils.c */ CALLMETHOD_DEF(PairwiseAlignments_nmatch, 4), CALLMETHOD_DEF(AlignedXStringSet_nchar, 1), CALLMETHOD_DEF(AlignedXStringSet_align_aligned, 2), CALLMETHOD_DEF(PairwiseAlignmentsSingleSubject_align_aligned, 3), CALLMETHOD_DEF(align_compareStrings, 6), /* align_pairwiseAlignment.c */ CALLMETHOD_DEF(XStringSet_align_pairwiseAlignment, 14), CALLMETHOD_DEF(XStringSet_align_distance, 12), {NULL, NULL, 0} }; void R_init_pwalign(DllInfo *info) { R_registerRoutines(info, NULL, callMethods, NULL, NULL); R_useDynamicSymbols(info, 0); return; } pwalign/src/S4Vectors_stubs.c0000644000175100017510000000003614614311433017241 0ustar00biocbuildbiocbuild#include "_S4Vectors_stubs.c" pwalign/src/XVector_stubs.c0000644000175100017510000000003414614311433016775 0ustar00biocbuildbiocbuild#include "_XVector_stubs.c" 
pwalign/tests/run_unitTests.R

require("pwalign") || stop("unable to load pwalign package")
pwalign:::.test()

pwalign/TODO

TODO list
---------

- pairwiseAlignment() is broken on BString objects:

    pairwiseAlignment(BString("xxYYxx"), BString("YY"))
    # Global PairwiseAlignmentsSingleSubject (1 of 1)
    # pattern: xxYYxx
    # subject: --YY--
    # score: -34.03641

    pairwiseAlignment(BString("YYYYYY"), BString("YY"))
    # Global PairwiseAlignmentsSingleSubject (1 of 1)
    # pattern: --YYYYYY
    # subject: YY------
    # score: -52

  This 2nd alignment doesn't look like the best one.

  Furthermore, aligning the 2 patterns above with a single call to
  pairwiseAlignment() gives a different alignment for the 2nd pattern:

    pa <- pairwiseAlignment(BStringSet(c("xxYYxx", "YYYYYY")), BString("YY"))
    pa[1]
    # Global PairwiseAlignmentsSingleSubject (1 of 1)
    # pattern: xxYYxx
    # subject: --YY--
    # score: -34.03641
    pa[2]
    # Global PairwiseAlignmentsSingleSubject (1 of 1)
    # pattern: YYYYYY
    # subject: Y----Y
    # score: -24.03641

  This 2nd alignment looks correct now.

- Extend pairwiseAlignment to return the set of maximum alignments, rather
  than just one element from that set.

pwalign/vignettes/PairwiseAlignments.Rnw

%\VignetteIndexEntry{Pairwise Sequence Alignments}
%\VignetteKeywords{DNA, RNA, Sequence, Biostrings, Sequence alignment}
%\VignettePackage{pwalign}
%
% NOTE -- ONLY EDIT THE .Rnw FILE!!! The .tex file is
% likely to be overwritten.
%
\documentclass[10pt]{article}

\usepackage{times}
\usepackage{hyperref}

\textwidth=6.5in
\textheight=8.5in
%\parskip=.3cm
\oddsidemargin=-.1in
\evensidemargin=-.1in
\headheight=-.3in

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}

\newcommand{\R}{{\textsf{R}}}
\newcommand{\code}[1]{{\texttt{#1}}}
\newcommand{\term}[1]{{\emph{#1}}}
\newcommand{\Rpackage}[1]{\textsf{#1}}
\newcommand{\Rfunction}[1]{\texttt{#1}}
\newcommand{\Robject}[1]{\texttt{#1}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\textit{#1}}}
\newcommand{\Rfunarg}[1]{{\textit{#1}}}

\bibliographystyle{plainnat}

\begin{document}
%\setkeys{Gin}{width=0.55\textwidth}

\title{Pairwise Sequence Alignments}
\author{Patrick Aboyoun \\
  Gentleman Lab \\
  Fred Hutchinson Cancer Research Center \\
  Seattle, WA}
\date{\today}
\maketitle

\tableofcontents

\section{Introduction}

In this document we illustrate how to perform pairwise sequence alignments
using the \Rpackage{pwalign} package's central function
\Rfunction{pairwiseAlignment}. This function aligns a set of
\Rfunarg{pattern} strings to a \Rfunarg{subject} string in a global, local,
or overlap (ends-free) fashion with or without affine gaps using either a
fixed or quality-based substitution scoring scheme. This function's
computation time is proportional to the product of the two string lengths
being aligned.

\section{Pairwise Sequence Alignment Problems}

The (Needleman-Wunsch) global, the (Smith-Waterman) local, and (ends-free)
overlap pairwise sequence alignment problems are described as follows.
Let string $S_i$ have $n_i$ characters $c_{(i,j)}$ with
$j \in \left\{1, \ldots, n_i\right\}$.
A pairwise sequence alignment is a mapping of strings $S_1$ and $S_2$ to
gapped substrings ${S'}_1$ and ${S'}_2$ that are defined by
\begin{eqnarray*}
{S'}_1 & = & g_{\left(1,a_1\right)}c_{\left(1,a_1\right)} \cdots g_{\left(1,b_1\right)}c_{\left(1,b_1\right)}g_{\left(1,b_1+1\right)}\\
{S'}_2 & = & g_{\left(2,a_2\right)}c_{\left(2,a_2\right)} \cdots g_{\left(2,b_2\right)}c_{\left(2,b_2\right)}g_{\left(2,b_2+1\right)}
\end{eqnarray*}
\begin{tabbing}
where \= \\
\> $a_i, b_i \in \{1, \ldots, n_i\}$ with $a_i \leq b_i$ \\
\> $g_{(i,j)} = 0$ or more gaps at the specified position $j$ for aligned string $i$ \\
\> $length({S'}_1) = length({S'}_2)$
\end{tabbing}

Each of these pairwise sequence alignment problems is solved by maximizing
the alignment \textit{score}. An alignment score is determined by the type
of pairwise sequence alignment (global, local, overlap), which sets the
$[a_i, b_i]$ ranges for the substrings; the substitution scoring scheme,
which sets the distance between aligned characters; and the gap penalties,
which are divided into opening and extension components. The optimal
pairwise sequence alignment is the pairwise sequence alignment with the
largest score for the specified alignment type, substitution scoring scheme,
and gap penalties. The pairwise sequence alignment types, substitution
scoring schemes, and gap penalties influence alignment scores in the
following manner:

\begin{description}
\item{Pairwise Sequence Alignment Types: }
The type of pairwise sequence alignment determines the substring ranges to
apply the substitution scoring and gap penalty schemes.
For the three primary (global, local, overlap) and two derivative (subject
overlap, pattern overlap) pairwise sequence alignment types, the resulting
substring ranges are as follows:
\begin{description}
\item{Global - } $[a_1, b_1] = [1, n_1]$ and $[a_2, b_2] = [1, n_2]$
\item{Local - } $[a_1, b_1]$ and $[a_2, b_2]$
\item{Overlap - }
  $\left\{[a_1, b_1] = [a_1, n_1], [a_2, b_2] = [1, b_2]\right\}$ or
  $\left\{[a_1, b_1] = [1, b_1], [a_2, b_2] = [a_2, n_2]\right\}$
\item{Subject Overlap - } $[a_1, b_1] = [1, n_1]$ and $[a_2, b_2]$
\item{Pattern Overlap - } $[a_1, b_1]$ and $[a_2, b_2] = [1, n_2]$
\end{description}
\item{Substitution Scoring Schemes: }
The substitution scoring scheme sets the values for the aligned character
pairings within the substring ranges determined by the type of pairwise
sequence alignment. This scoring scheme can be fixed for character pairings
or quality-dependent for character pairings. (Characters that align with a
gap are penalized according to the ``Gap Penalty'' framework.)
\begin{description}
\item{Fixed substitution scoring - }
Fixed substitution scoring schemes associate each aligned character pairing
with a value. These schemes are very common and include awarding one value
for a match and another for a mismatch, Point Accepted Mutation (PAM)
matrices, and Block Substitution Matrix (BLOSUM) matrices.
\item{Quality-based substitution scoring - }
Quality-based substitution scoring schemes derive the value for the aligned
character pairing based on the probabilities of character recording errors
\cite{Malde:2008}. Let $\epsilon_i$ be the probability of a character
recording error. Assuming independence within and between recordings and a
uniform background frequency of the different characters, the combined
error probability of a mismatch when the underlying characters do match is
$\epsilon_c = \epsilon_1 + \epsilon_2 - (n/(n-1)) * \epsilon_1 * \epsilon_2$,
where $n$ is the number of characters in the underlying alphabet (e.g.
in DNA and RNA, $n = 4$). Using $\epsilon_c$, the substitution score is
given by
$b * \log_2(\gamma_{(x,y)} * (1 - \epsilon_c) * n + (1 - \gamma_{(x,y)}) * \epsilon_c * (n/(n-1)))$,
where $b$ is the bit-scaling for the scoring and $\gamma_{(x,y)}$ is the
probability that characters $x$ and $y$ represent the same underlying
letter (e.g. using IUPAC, $\gamma_{(A,A)} = 1$ and $\gamma_{(A,N)} = 1/4$).
\end{description}
\item{Gap Penalties: }
Gap penalties are the values associated with the gaps within the substring
ranges determined by the type of pairwise sequence alignment. These
penalties are divided into \textit{gap opening} and \textit{gap extension}
components, where the gap opening penalty is the cost for adding a new gap
and the gap extension penalty is the incremental cost incurred along the
length of the gap. A \textit{constant gap penalty} occurs when there is a
cost associated with opening a gap, but no cost for the length of a gap
(i.e. gap extension is zero). A \textit{linear gap penalty} occurs when
there is no cost associated with opening a gap (i.e. gap opening is zero),
but there is a cost for the length of the gap. An
\textit{affine gap penalty} occurs when both the gap opening and gap
extension have a non-zero associated cost.
\end{description}

\section{Main Pairwise Sequence Alignment Function}

The \Rfunction{pairwiseAlignment} function solves the pairwise sequence
alignment problems mentioned above. It aligns one or more strings specified
in the \Rfunarg{pattern} argument with a single string specified in the
\Rfunarg{subject} argument.

<<>>=
options(width=72)
@

<<>>=
library(pwalign)
pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede")
@

The type of pairwise sequence alignment is set by specifying the
\Rfunarg{type} argument to be one of \texttt{"global"}, \texttt{"local"},
\texttt{"overlap"}, \texttt{"global-local"}, and \texttt{"local-global"}.
<<>>=
pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede",
                  type = "local")
@

The gap penalties are regulated by the \Rfunarg{gapOpening} and
\Rfunarg{gapExtension} arguments.

<<>>=
pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede",
                  gapOpening = 0, gapExtension = 1)
@

The substitution scoring scheme is set using three arguments, two of which
relate to quality-based scoring (\Rfunarg{patternQuality},
\Rfunarg{subjectQuality}) and one of which relates to fixed substitution
scoring (\Rfunarg{substitutionMatrix}). When the substitution scores are
fixed by character pairing, the \Rfunarg{substitutionMatrix} argument takes
a matrix with the appropriate alphabets as dimension names. The
\Rfunction{nucleotideSubstitutionMatrix} function translates simple match
and mismatch scores to the full spectrum of IUPAC nucleotide codes.

<<>>=
submat <-
  matrix(-1, nrow = 26, ncol = 26, dimnames = list(letters, letters))
diag(submat) <- 0
pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede",
                  substitutionMatrix = submat,
                  gapOpening = 0, gapExtension = 1)
@

When the substitution scores are quality-based, the
\Rfunarg{patternQuality} and \Rfunarg{subjectQuality} arguments represent
the equivalent of $[x-99]$ numeric quality values for the respective
strings, and the optional \Rfunarg{fuzzyMatrix} argument represents how
closely two characters match on a $[0,1]$ scale. The
\Rfunarg{patternQuality} and \Rfunarg{subjectQuality} arguments accept
quality measures in either a \Rclass{PhredQuality}, \Rclass{SolexaQuality},
or \Rclass{IlluminaQuality} scaling. For \Rclass{PhredQuality} and
\Rclass{IlluminaQuality} measures $Q \in [0, 99]$, the probability of an
error in the base read is given by $10^{-Q/10}$ and for
\Rclass{SolexaQuality} measures $Q \in [-5, 99]$, they are given by
$1 - 1/(1 + 10^{-Q/10})$.
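
As a small worked illustration of the formulas quoted so far, the base-\R{}
sketch below converts quality values to error probabilities and then to
quality-based substitution scores. It is only a sketch under stated
assumptions: the helper names (\code{phred\_error}, \code{solexa\_error},
\code{combined\_error}, \code{quality\_score}) are hypothetical and not part
of \Rpackage{pwalign}, the bit-scaling $b$ is assumed to be 1, and $n = 4$
is assumed for a nucleotide alphabet.

```r
# Hypothetical helpers (not part of pwalign) that transcribe the formulas
# given in the text.

# Phred/Illumina-scaled quality Q -> error probability 10^(-Q/10)
phred_error <- function(Q) 10^(-Q / 10)

# Solexa-scaled quality Q -> error probability 1 - 1/(1 + 10^(-Q/10))
solexa_error <- function(Q) 1 - 1 / (1 + 10^(-Q / 10))

# Combined error probability for two independent recordings over an
# alphabet of n characters (n = 4 assumed for DNA/RNA)
combined_error <- function(e1, e2, n = 4) e1 + e2 - (n / (n - 1)) * e1 * e2

# Substitution score; gamma is the probability that the two characters
# represent the same underlying letter, b is the bit-scaling (assumed 1)
quality_score <- function(gamma, eps_c, n = 4, b = 1) {
  b * log2(gamma * (1 - eps_c) * n + (1 - gamma) * eps_c * (n / (n - 1)))
}

e1 <- phred_error(20)                # 0.01
e2 <- phred_error(30)                # 0.001
eps_c <- combined_error(e1, e2)

match_score    <- quality_score(gamma = 1,   eps_c)  # just below log2(4) = 2
mismatch_score <- quality_score(gamma = 0,   eps_c)  # negative
fuzzy_score    <- quality_score(gamma = 1/4, eps_c)  # approximately 0 (A vs. N)
```

With $\gamma = 1/4$ the two terms inside the logarithm sum to 1 regardless
of $\epsilon_c$, so under these formulas an \texttt{A} aligned with an
\texttt{N} neither gains nor loses score.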
The \Rfunction{qualitySubstitutionMatrices} function maps the
\Rfunarg{patternQuality} and \Rfunarg{subjectQuality} scores to match and
mismatch penalties. These three arguments will be demonstrated in later
sections.

The final argument, \Rfunarg{scoreOnly}, to the
\Rfunction{pairwiseAlignment} function accepts a logical value to specify
whether or not to return just the pairwise sequence alignment score. If
\Rfunarg{scoreOnly} is \Robject{FALSE}, the pairwise alignment with the
maximum alignment score is returned. If more than one pairwise alignment
with the maximum alignment score exists, the first alignment along the
subject is returned. If there are multiple pairwise alignments with the
maximum alignment score at the chosen subject location, then at each
location along the alignment mismatches are given preference to
insertions/deletions. For example,
\code{pattern: [1] ATTA; subject: [1] AT-A} is chosen above
\code{pattern: [1] ATTA; subject: [1] A-TA} if they both have the maximum
alignment score.

<<>>=
submat <-
  matrix(-1, nrow = 26, ncol = 26, dimnames = list(letters, letters))
diag(submat) <- 0
pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede",
                  substitutionMatrix = submat,
                  gapOpening = 0, gapExtension = 1, scoreOnly = TRUE)
@

\subsection{Exercise 1}
\begin{enumerate}
\item Using \Rfunction{pairwiseAlignment}, fit the global, local, and
overlap pairwise sequence alignment of the strings \Robject{"syzygy"} and
\Robject{"zyzzyx"} using the default settings.
\item Do any of the alignments change if the \Rfunarg{gapExtension}
argument is set to \Robject{-Inf}?
\end{enumerate}
[Answers provided in section \ref{sec:Answers1}.]
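
To make the score-only computation above more concrete, the following
base-\R{} sketch fills the dynamic programming table for a global alignment
with a linear gap penalty (gap opening of 0), which is the setting used in
the \Rfunarg{scoreOnly} example. It is a minimal illustration, not the
package's implementation: \code{global\_score} and its arguments are
hypothetical names, and affine gaps, quality-based scoring, and the
local/overlap types are deliberately left out. The two nested loops also
show why the running time grows with the product of the two string lengths.

```r
# Hypothetical helper (not part of pwalign): global alignment score with a
# fixed match/mismatch score and a linear gap penalty (no gap opening cost).
global_score <- function(pattern, subject, match = 0, mismatch = -1, gap = 1) {
  p <- strsplit(pattern, "", fixed = TRUE)[[1]]
  s <- strsplit(subject, "", fixed = TRUE)[[1]]
  n1 <- length(p)
  n2 <- length(s)

  # dp[i + 1, j + 1] holds the best score for aligning the first i pattern
  # characters with the first j subject characters
  dp <- matrix(0, nrow = n1 + 1, ncol = n2 + 1)
  dp[, 1] <- -gap * (0:n1)   # pattern prefix aligned against gaps only
  dp[1, ] <- -gap * (0:n2)   # subject prefix aligned against gaps only

  for (i in seq_len(n1)) {
    for (j in seq_len(n2)) {
      sub <- if (p[i] == s[j]) match else mismatch
      dp[i + 1, j + 1] <- max(dp[i, j] + sub,      # align p[i] with s[j]
                              dp[i, j + 1] - gap,  # p[i] against a gap
                              dp[i + 1, j] - gap)  # s[j] against a gap
    }
  }
  dp[n1 + 1, n2 + 1]
}

global_score("kitten", "sitting")  # -3
```

With a match score of 0, a mismatch score of $-1$, and a gap extension
penalty of 1, the returned value is the negative of the Levenshtein edit
distance between the two strings.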

\section{Pairwise Sequence Alignment Classes}

Following the design principles of Bioconductor and R, the pairwise sequence
alignment functionality in the \Rpackage{pwalign} package keeps the end user
close to their data through the use of five specialty classes:
\Rclass{PairwiseAlignments}, \Rclass{PairwiseAlignmentsSingleSubject},
\Rclass{PairwiseAlignmentsSingleSubjectSummary}, \Rclass{AlignedXStringSet},
and \Rclass{QualityAlignedXStringSet}. The
\Rclass{PairwiseAlignmentsSingleSubject} class inherits from the
\Rclass{PairwiseAlignments} class and they both hold the results of a fit
from the \Rfunction{pairwiseAlignment} function, with the former class
being used to represent all patterns aligning to a single subject and the
latter being used to represent elementwise alignments between a set of
patterns and a set of subjects.

<<>>=
pa1 <- pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede")
class(pa1)
@

and the \Rclass{PairwiseAlignmentsSingleSubjectSummary} class holds the
results of a summarized pairwise sequence alignment.

<<>>=
summary(pa1)
class(summary(pa1))
@

The \Rclass{AlignedXStringSet} and \Rclass{QualityAlignedXStringSet}
classes hold the ``gapped'' ${S'}_i$ substrings with the former class
holding the results when the pairwise sequence alignment is performed with a
fixed substitution scoring scheme and the latter class a quality-based
scoring scheme.

<<>>=
class(pattern(pa1))
submat <-
  matrix(-1, nrow = 26, ncol = 26, dimnames = list(letters, letters))
diag(submat) <- 0
pa2 <-
  pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede",
                    substitutionMatrix = submat,
                    gapOpening = 0, gapExtension = 1)
class(pattern(pa2))
@

\subsection{Exercise 2}
\begin{enumerate}
\item What is the primary benefit of formal summary classes like
\Rclass{PairwiseAlignmentsSingleSubjectSummary} and \Rclass{summary.lm} to
end users?
\end{enumerate}
[Answer provided in section \ref{sec:Answers2}.]

\section{Pairwise Sequence Alignment Helper Functions}

Tables \ref{table:helperfuns1}, \ref{table:helperfuns2} and
\ref{table:alignfuns} show functions that interact with objects of class
\Rclass{PairwiseAlignments}, \Rclass{PairwiseAlignmentsSingleSubject}, and
\Rclass{AlignedXStringSet}. These functions should be used in preference to
direct slot extraction from the alignment objects.

\begin{table}[ht]
\begin{center}
\begin{tabular}{l|l}
\hline
Function & Description \\
\hline
\Rfunction{[} & Extracts the specified elements of the alignment object \\
\Rfunction{alphabet} & Extracts the allowable characters in the original strings \\
\Rfunction{compareStrings} & Creates character string mashups of the alignments \\
\Rfunction{deletion} & Extracts the locations of the gaps inserted into the pattern for the alignments \\
\Rfunction{length} & Extracts the number of patterns aligned \\
\Rfunction{mismatchTable} & Creates a table for the mismatching positions \\
\Rfunction{nchar} & Computes the length of ``gapped'' substrings \\
\Rfunction{nedit} & Computes the Levenshtein edit distance of the alignments \\
\Rfunction{indel} & Extracts the locations of the insertion \& deletion gaps in the alignments \\
\Rfunction{insertion} & Extracts the locations of the gaps inserted into the subject for the alignments \\
\Rfunction{nindel} & Computes the number of insertions \& deletions in the alignments \\
\Rfunction{nmatch} & Computes the number of matching characters in the alignments \\
\Rfunction{nmismatch} & Computes the number of mismatching characters in the alignments \\
\Rfunction{pattern}, \Rfunction{subject} & Extracts the aligned pattern/subject \\
\Rfunction{pid} & Computes the percent sequence identity \\
\Rfunction{rep} & Replicates the elements of the alignment object \\
\Rfunction{score} & Extracts the pairwise sequence alignment scores \\
\Rfunction{type} & Extracts the type of pairwise sequence alignment \\
\hline
\end{tabular}
\end{center}
\caption{Functions
for \Rclass{PairwiseAlignments} and \Rclass{PairwiseAlignmentsSingleSubject} objects.}
\label{table:helperfuns1}
\end{table}

\begin{table}[ht]
\begin{center}
\begin{tabular}{l|l}
\hline
Function & Description \\
\hline
\Rfunction{aligned} & Creates an \Rclass{XStringSet} containing either ``filled-with-gaps'' or degapped aligned strings \\
\Rfunction{as.character} & Creates a character vector version of \Rfunction{aligned} \\
\Rfunction{as.matrix} & Creates an ``exploded'' character matrix version of \Rfunction{aligned} \\
\Rfunction{consensusMatrix} & Computes a consensus matrix for the alignments \\
\Rfunction{consensusString} & Creates the string based on a 50\% + 1 vote from the consensus matrix \\
\Rfunction{coverage} & Computes the alignment coverage along the subject \\
\Rfunction{mismatchSummary} & Summarizes the information of the \Rfunction{mismatchTable} \\
\Rfunction{summary} & Summarizes a pairwise sequence alignment \\
\Rfunction{toString} & Creates a concatenated string version of \Rfunction{aligned} \\
\Rfunction{Views} & Creates an \Rclass{XStringViews} representing the aligned region along the subject \\
\hline
\end{tabular}
\end{center}
\caption{Additional functions for \Rclass{PairwiseAlignmentsSingleSubject} objects.}
\label{table:helperfuns2}
\end{table}

The \Rfunction{score}, \Rfunction{nedit}, \Rfunction{nmatch},
\Rfunction{nmismatch}, and \Rfunction{nchar} functions return numeric
vectors containing information on the pairwise sequence alignment score,
edit distance, number of matches, number of mismatches, and number of
aligned characters respectively.
<<>>=
submat <-
  matrix(-1, nrow = 26, ncol = 26, dimnames = list(letters, letters))
diag(submat) <- 0
pa2 <-
  pairwiseAlignment(pattern = c("succeed", "precede"), subject = "supersede",
                    substitutionMatrix = submat,
                    gapOpening = 0, gapExtension = 1)
score(pa2)
nedit(pa2)
nmatch(pa2)
nmismatch(pa2)
nchar(pa2)
aligned(pa2)
as.character(pa2)
as.matrix(pa2)
consensusMatrix(pa2)
@

The \Rfunction{summary}, \Rfunction{mismatchTable}, and
\Rfunction{mismatchSummary} functions return various summaries of the
pairwise sequence alignments.

<<>>=
summary(pa2)
mismatchTable(pa2)
mismatchSummary(pa2)
@

\begin{table}[ht]
\begin{center}
\begin{tabular}{l|l}
\hline
Function & Description \\
\hline
\Rfunction{[} & Extracts the specified elements of the alignment object \\
\Rfunction{aligned}, \Rfunction{unaligned} & Extracts the aligned/unaligned strings \\
\Rfunction{alphabet} & Extracts the allowable characters in the original strings \\
\Rfunction{as.character}, \Rfunction{toString} & Converts the alignments to character strings \\
\Rfunction{coverage} & Computes the alignment coverage \\
\Rfunction{end} & Extracts the ending index of the aligned range \\
\Rfunction{indel} & Extracts the insertion/deletion locations \\
\Rfunction{length} & Extracts the number of patterns aligned \\
\Rfunction{mismatch} & Extracts the position of the mismatches \\
\Rfunction{mismatchSummary} & Summarizes the information of the \Rfunction{mismatchTable} \\
\Rfunction{mismatchTable} & Creates a table for the mismatching positions \\
\Rfunction{nchar} & Computes the length of ``gapped'' substrings \\
\Rfunction{nindel} & Computes the number of insertions/deletions in the alignments \\
\Rfunction{nmismatch} & Computes the number of mismatching characters in the alignments \\
\Rfunction{rep} & Replicates the elements of the alignment object \\
\Rfunction{start} & Extracts the starting index of the aligned range \\
\Rfunction{toString} & Creates a concatenated string containing the alignments \\
\Rfunction{width} & Extracts the width of the aligned range \\
\hline
\end{tabular}
\end{center}
\caption{Functions for \Rclass{AlignedXStringSet} and \Rclass{QualityAlignedXStringSet} objects.}
\label{table:alignfuns}
\end{table}

The \Rfunction{pattern} and \Rfunction{subject} functions extract the
aligned pattern and subject objects for further analysis. Most of the
actions that can be performed on \Rclass{PairwiseAlignments} objects can
also be performed on \Rclass{AlignedXStringSet} and
\Rclass{QualityAlignedXStringSet} objects, as can operations such as
\Rfunction{start}, \Rfunction{end}, and \Rfunction{width}, which extract
the start, end, and width of the alignment ranges.

<<>>=
class(pattern(pa2))
aligned(pattern(pa2))
nindel(pattern(pa2))
start(subject(pa2))
end(subject(pa2))
@

\subsection{Exercise 3}
For the overlap pairwise sequence alignment of the strings
\Robject{"syzygy"} and \Robject{"zyzzyx"} with the
\Rfunction{pairwiseAlignment} default settings, perform the following
operations:
\begin{enumerate}
\item Use \Rfunction{nmatch} and \Rfunction{nmismatch} to extract the
number of matches and mismatches respectively.
\item Use the \Rfunction{compareStrings} function to get the symbolic
representation of the alignment.
\item Use the \Rfunction{as.character} function to get the character
string versions of the alignments.
\item Use the \Rfunction{pattern} function to extract the aligned pattern
and apply the \Rfunction{mismatch} function to it to find the locations of
the mismatches.
\item Use the \Rfunction{subject} function to extract the aligned subject
and apply the \Rfunction{aligned} function to it to get the aligned strings.
\end{enumerate}
[Answers provided in section \ref{sec:Answers3}.]

\section{Edit Distances}

One of the earliest uses of pairwise sequence alignment is in the area of
text analysis.
In 1965 Vladimir Levenshtein considered a metric, now called the
\textit{Levenshtein edit distance}, that measures the similarity between two
strings. This distance metric is equivalent to the negative of the score of
a pairwise sequence alignment with a match score of 0, a mismatch score of
-1, a gap opening penalty of 0, and a gap extension penalty of 1.

The \Rfunction{stringDist} function uses the internals of the
\Rfunction{pairwiseAlignment} function to calculate the Levenshtein edit
distance matrix for a set of strings.

There is also an implementation of approximate string matching using
Levenshtein edit distance in the \Rfunction{agrep} (approximate grep)
function of the \Rpackage{base} R package. As the following example shows,
it is possible to replicate the \Rfunction{agrep} function using the
\Rfunction{pairwiseAlignment} function. Since the \Rfunction{agrep}
function is vectorized in \Rfunarg{x} rather than \Rfunarg{pattern}, these
arguments are flipped in the call to \Rfunction{pairwiseAlignment}.
<<>>=
agrepBioC <-
  function(pattern, x, ignore.case = FALSE, value = FALSE, max.distance = 0.1)
{
  if (!is.character(pattern)) pattern <- as.character(pattern)
  if (!is.character(x)) x <- as.character(x)
  # A fractional max.distance is a fraction of the pattern length,
  # rounded up to the next integer (as in agrep)
  if (max.distance < 1)
    max.distance <- ceiling(max.distance * nchar(pattern))
  characters <- unique(unlist(strsplit(c(pattern, x), "", fixed = TRUE)))
  if (ignore.case)
    substitutionMatrix <-
      outer(tolower(characters), tolower(characters),
            function(x,y) -as.numeric(x!=y))
  else
    substitutionMatrix <-
      outer(characters, characters, function(x,y) -as.numeric(x!=y))
  dimnames(substitutionMatrix) <- list(characters, characters)
  distance <-
    - pairwiseAlignment(pattern = x, subject = pattern,
                        substitutionMatrix = substitutionMatrix,
                        type = "local-global",
                        gapOpening = 0, gapExtension = 1,
                        scoreOnly = TRUE)
  whichClose <- which(distance <= max.distance)
  if (value)
    whichClose <- x[whichClose]
  whichClose
}
cbind(base = agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, value = TRUE),
      bioc = agrepBioC("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, value = TRUE))
cbind(base = agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, ignore.case = TRUE),
      bioc = agrepBioC("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, ignore.case = TRUE))
@

\subsection{Exercise 4}
\begin{enumerate}
\item Use the \Rfunction{pairwiseAlignment} function to find the
Levenshtein edit distance between \Robject{"syzygy"} and \Robject{"zyzzyx"}.
\item Use the \Rfunction{stringDist} function to find the Levenshtein edit
distance for the vector
\Robject{c("zyzzyx", "syzygy", "succeed", "precede", "supersede")}.
\end{enumerate}
[Answers provided in section \ref{sec:Answers4}.]

\section{Application: Using Evolutionary Models in Protein Alignments}

When proteins are believed to descend from a common ancestor, evolutionary
models can be used as a guide in pairwise sequence alignments.
The two most common families of evolutionary models of proteins used in pairwise sequence alignments are Point Accepted Mutation (PAM) matrices, which are based on explicit evolutionary models, and Block Substitution Matrix (BLOSUM) matrices, which are based on data-derived evolution models. The \Rpackage{pwalign} package contains 5 PAM and 5 BLOSUM matrices (\Robject{PAM30}, \Robject{PAM40}, \Robject{PAM70}, \Robject{PAM120}, \Robject{PAM250}, \Robject{BLOSUM45}, \Robject{BLOSUM50}, \Robject{BLOSUM62}, \Robject{BLOSUM80}, and \Robject{BLOSUM100}) that can be used in the \Rfunarg{substitutionMatrix} argument to the \Rfunction{pairwiseAlignment} function.

Here is an example pairwise sequence alignment of amino acids from Durbin, Eddy et al., fit by the \Rfunction{pairwiseAlignment} function using the \Robject{BLOSUM50} matrix:

<<>>=
data(BLOSUM50)
BLOSUM50[1:4,1:4]
nwdemo <-
  pairwiseAlignment(AAString("PAWHEAE"), AAString("HEAGAWGHEE"),
                    substitutionMatrix = BLOSUM50,
                    gapOpening = 0, gapExtension = 8)
nwdemo
compareStrings(nwdemo)
pid(nwdemo)
@

\subsection{Exercise 5}
\begin{enumerate}
\item Repeat the alignment exercise above using \Robject{BLOSUM62}, a gap opening penalty of 12, and a gap extension penalty of 4.
\item Explore to find out what caused the alignment to change.
\end{enumerate}

[Answers provided in section \ref{sec:Answers5}.]

\section{Application: Removing Adapters from Sequence Reads}

Finding and removing uninteresting fragments that stem from the experimental process, such as adapters, is a common problem in genetic sequencing, and pairwise sequence alignment is well-suited to address this issue. When adapters are used to anchor or extend a sequence during the experiment process, they either intentionally or unintentionally become sequenced during the read process. The following code simulates what sequences with adapter fragments at either end could look like during an experiment.

<<>>=
simulateReads <-
function(N, adapter, experiment, substitutionRate = 0.01, gapRate = 0.001) {
  chars <- strsplit(as.character(adapter), "")[[1]]
  sapply(seq_len(N), function(i, experiment, substitutionRate, gapRate) {
    width <- experiment[["width"]][i]
    side <- experiment[["side"]][i]
    randomLetters <-
      function(n) sample(DNA_ALPHABET[1:4], n, replace = TRUE)
    randomLettersWithEmpty <-
      function(n) sample(c("", DNA_ALPHABET[1:4]), n, replace = TRUE,
                         prob = c(1 - gapRate, rep(gapRate/4, 4)))
    nChars <- length(chars)
    ## Adapter with random substitutions and random inserted letters
    value <-
      paste(ifelse(rbinom(nChars,1,substitutionRate),
                   randomLetters(nChars), chars),
            randomLettersWithEmpty(nChars),
            sep = "", collapse = "")
    ## Attach 'width' adapter characters to one end of a 36-letter read
    if (side)
      value <-
        paste(c(randomLetters(36 - width), substring(value, 1, width)),
              sep = "", collapse = "")
    else
      value <-
        paste(c(substring(value, 37 - width, 36), randomLetters(36 - width)),
              sep = "", collapse = "")
    value
  }, experiment = experiment, substitutionRate = substitutionRate, gapRate = gapRate)
}

adapter <- DNAString("GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA")
set.seed(123)
N <- 1000
experiment <-
  list(side = rbinom(N, 1, 0.5), width = sample(0:36, N, replace = TRUE))
table(experiment[["side"]], experiment[["width"]])
adapterStrings <-
  simulateReads(N, adapter, experiment, substitutionRate = 0.01, gapRate = 0.001)
adapterStrings <- DNAStringSet(adapterStrings)
@

The simulated strings above have 0 to 36 characters from the adapter attached to either end. We can use completely random strings as a baseline for any pairwise sequence alignment methodology we develop to remove the adapter characters.

<<>>=
M <- 5000
randomStrings <-
  apply(matrix(sample(DNA_ALPHABET[1:4], 36 * M, replace = TRUE), nrow = M),
        1, paste, collapse = "")
randomStrings <- DNAStringSet(randomStrings)
@

Since edit distances are easy to explain, they serve as a good starting point for developing an adapter removal methodology.
Unfortunately, because the edit distance is based on a global alignment, it is only useful for filtering out sequences that are derived primarily from the adapter.

<<>>=
## Method 1: Use edit distance with an FDR of 1e-03
submat1 <- nucleotideSubstitutionMatrix(match = 0, mismatch = -1, baseOnly = TRUE)
randomScores1 <-
  pairwiseAlignment(randomStrings, adapter, substitutionMatrix = submat1,
                    gapOpening = 0, gapExtension = 1, scoreOnly = TRUE)
quantile(randomScores1, seq(0.99, 1, by = 0.001))
adapterAligns1 <-
  pairwiseAlignment(adapterStrings, adapter, substitutionMatrix = submat1,
                    gapOpening = 0, gapExtension = 1)
table(score(adapterAligns1) > quantile(randomScores1, 0.999), experiment[["width"]])
@

One improvement to removing adapters is to look for consecutive matches anywhere within the sequence. This is more versatile than the edit distance method, but it requires a relatively large number of consecutive matches and is sensitive to substitution and insertion/deletion errors.

<<>>=
## Method 2: Use consecutive matches anywhere in string with an FDR of 1e-03
submat2 <- nucleotideSubstitutionMatrix(match = 1, mismatch = -Inf, baseOnly = TRUE)
randomScores2 <-
  pairwiseAlignment(randomStrings, adapter, substitutionMatrix = submat2,
                    type = "local", gapOpening = 0, gapExtension = Inf,
                    scoreOnly = TRUE)
quantile(randomScores2, seq(0.99, 1, by = 0.001))
adapterAligns2 <-
  pairwiseAlignment(adapterStrings, adapter, substitutionMatrix = submat2,
                    type = "local", gapOpening = 0, gapExtension = Inf)
table(score(adapterAligns2) > quantile(randomScores2, 0.999), experiment[["width"]])
# Determine if the correct end was chosen
table(start(pattern(adapterAligns2)) > 37 - end(pattern(adapterAligns2)),
      experiment[["side"]])
@

Limiting consecutive matches to the ends provides better results, but it doesn't resolve the issues related to substitution and insertion/deletion errors.

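The difference between matching anywhere (\Robject{type = "local"}) and matching only at the ends (\Robject{type = "overlap"}) can be seen on a pair of toy reads. The following is a minimal sketch under stated assumptions: the 12-letter adapter and the two reads are hypothetical and chosen only so that the same 6-letter adapter prefix sits once in the interior of a read and once at its right end.

<<>>=
library(Biostrings)
library(pwalign)
## Hypothetical adapter and reads, for illustration only
toyAdapter <- DNAString("GATCGGAAGAGC")
toyReads <- DNAStringSet(c("TTTGATCGGTTT",   # adapter prefix in the interior
                           "TTTTTTGATCGG"))  # adapter prefix at the right end
## Exact consecutive matches only: mismatches and gaps are forbidden
toySubmat <- nucleotideSubstitutionMatrix(match = 1, mismatch = -Inf,
                                          baseOnly = TRUE)
toyLocal <-
  pairwiseAlignment(toyReads, toyAdapter, substitutionMatrix = toySubmat,
                    type = "local", gapOpening = 0, gapExtension = Inf,
                    scoreOnly = TRUE)
toyOverlap <-
  pairwiseAlignment(toyReads, toyAdapter, substitutionMatrix = toySubmat,
                    type = "overlap", gapOpening = 0, gapExtension = Inf,
                    scoreOnly = TRUE)
data.frame(read = c("interior", "rightEnd"),
           local = toyLocal, overlap = toyOverlap)
@

The local score does not depend on where the shared fragment sits within the read, whereas the overlap score only credits a fragment that reaches an end of the read, which is the behavior wanted for adapter trimming.
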
<<>>=
## Method 3: Use consecutive matches on the ends with an FDR of 1e-03
submat3 <- nucleotideSubstitutionMatrix(match = 1, mismatch = -Inf, baseOnly = TRUE)
randomScores3 <-
  pairwiseAlignment(randomStrings, adapter, substitutionMatrix = submat3,
                    type = "overlap", gapOpening = 0, gapExtension = Inf,
                    scoreOnly = TRUE)
quantile(randomScores3, seq(0.99, 1, by = 0.001))
adapterAligns3 <-
  pairwiseAlignment(adapterStrings, adapter, substitutionMatrix = submat3,
                    type = "overlap", gapOpening = 0, gapExtension = Inf)
table(score(adapterAligns3) > quantile(randomScores3, 0.999), experiment[["width"]])
# Determine if the correct end was chosen
table(end(pattern(adapterAligns3)) == 36, experiment[["side"]])
@

Allowing for substitution and insertion/deletion errors in the pairwise sequence alignments provides much better results for finding adapter fragments.

<<>>=
## Method 4: Allow mismatches and indels on the ends with an FDR of 1e-03
randomScores4 <-
  pairwiseAlignment(randomStrings, adapter, type = "overlap", scoreOnly = TRUE)
quantile(randomScores4, seq(0.99, 1, by = 0.001))
adapterAligns4 <-
  pairwiseAlignment(adapterStrings, adapter, type = "overlap")
table(score(adapterAligns4) > quantile(randomScores4, 0.999), experiment[["width"]])
# Determine if the correct end was chosen
table(end(pattern(adapterAligns4)) == 36, experiment[["side"]])
@

Using the results that allow for substitution and insertion/deletion errors, the cleaned sequence fragments can be generated as follows:

<<>>=
## Method 4 continued: Remove adapter fragments
fragmentFound <-
  score(adapterAligns4) > quantile(randomScores4, 0.999)
fragmentFoundAt1 <-
  fragmentFound & (start(pattern(adapterAligns4)) == 1)
fragmentFoundAt36 <-
  fragmentFound & (end(pattern(adapterAligns4)) == 36)
cleanedStrings <- as.character(adapterStrings)
cleanedStrings[fragmentFoundAt1] <-
  as.character(narrow(adapterStrings[fragmentFoundAt1], end = 36,
                      width = 36 - end(pattern(adapterAligns4[fragmentFoundAt1]))))
cleanedStrings[fragmentFoundAt36] <-
  as.character(narrow(adapterStrings[fragmentFoundAt36], start = 1,
                      width = start(pattern(adapterAligns4[fragmentFoundAt36])) - 1))
cleanedStrings <- DNAStringSet(cleanedStrings)
cleanedStrings
@

\subsection{Exercise 6}
\begin{enumerate}
\item Rerun the simulation using the \Rfunction{simulateReads} function with a \Rfunarg{substitutionRate} of 0.005 and \Rfunarg{gapRate} of 0.0005. How do the different pairwise sequence alignment methods compare?
\item (Advanced) Modify the \Rfunction{simulateReads} function to accept different equal-length adapters on either side (left \& right) of the reads. How would the methods for trimming the reads change?
\end{enumerate}

[Answers provided in section \ref{sec:Answers6}.]

\section{Application: Quality Assurance in Sequencing Experiments}

Due to its flexibility, the \Rfunction{pairwiseAlignment} function is able to diagnose sequence matching-related issues that arise when \Rfunction{matchPDict} and its related functions don't find a match. This section contains an example involving a short read Solexa sequencing experiment of bacteriophage $\phi$ X174 DNA produced by New England BioLabs (NEB). This experiment contains slightly fewer than 5000 unique short reads in \Robject{srPhiX174}, with quality measures in \Robject{quPhiX174}, and frequency for those short reads in \Robject{wtPhiX174}.

In order to demonstrate how to find sequence differences in the target, these short reads will be compared against the bacteriophage $\phi$ X174 genome NC\_001422 from the GenBank database.

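Before turning to the real data, a toy case shows the kind of diagnosis meant here. The following is a minimal sketch under stated assumptions: the reference sequence and the two reads are hypothetical, with the second read differing from the reference by a single substitution. Exact dictionary matching simply reports no hit for that read, while the alignment reports where it differs.

<<>>=
library(Biostrings)
library(pwalign)
## Hypothetical reference and reads, for illustration only
toyReference <- DNAString("ACGTTGCAAGGCTTACCGATAGGCTA")
toyReads <- DNAStringSet(c("TGCAAGGCTT",   # exact copy of positions 5 to 14
                           "TGCAAGGATT"))  # same stretch with one C -> A change
## Exact matching: the altered read is not found
toyPDict <- PDict(toyReads)
toyCounts <- countPDict(toyPDict, toyReference)
toyCounts
## Alignment: the altered read is placed and its mismatch is reported
toyAlign <- pairwiseAlignment(toyReads, toyReference, type = "global-local")
toyMismatches <- nmismatch(toyAlign)
toyMismatches
mismatchTable(toyAlign)
@

The same pattern, a count of zero from \Rfunction{countPDict} followed by an inspection of the mismatches in the alignment, is what the remainder of this section applies to the full set of short reads.
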
<<>>=
data(phiX174Phage)
genBankPhage <- phiX174Phage[[1]]
nchar(genBankPhage)
data(srPhiX174)
srPhiX174
quPhiX174
summary(wtPhiX174)
fullShortReads <- rep(srPhiX174, wtPhiX174)
srPDict <- PDict(fullShortReads)
table(countPDict(srPDict, genBankPhage))
@

For these short reads, the \Rfunction{pairwiseAlignment} function finds that the small number of perfect matches is due to two locations on the bacteriophage $\phi$ X174 genome.

Unlike the \Rfunction{countPDict} function from the \Rpackage{Biostrings} package, the \Rfunction{pairwiseAlignment} function works off of the original strings rather than \Rfunction{PDict} processed strings. To be computationally efficient, it is recommended that the unique sequences be supplied to the \Rfunction{pairwiseAlignment} function and that the frequencies of those sequences be supplied to the \Rfunarg{weight} argument of functions like \Rfunction{summary}, \Rfunction{mismatchSummary}, and \Rfunction{coverage}. For the purposes of this exercise, a substring of the GenBank bacteriophage $\phi$ X174 genome is supplied to the \Rfunarg{subject} argument of the \Rfunction{pairwiseAlignment} function to reduce the computation time.

<<>>=
genBankSubstring <- substring(genBankPhage, 2793-34, 2811+34)
genBankAlign <-
  pairwiseAlignment(srPhiX174, genBankSubstring,
                    patternQuality = SolexaQuality(quPhiX174),
                    subjectQuality = SolexaQuality(99L),
                    type = "global-local")
summary(genBankAlign, weight = wtPhiX174)
revisedPhage <- replaceLetterAt(genBankPhage, c(2793, 2811), "TT")
table(countPDict(srPDict, revisedPhage))
@

The following plot shows the coverage of the aligned short reads along the substring of the bacteriophage $\phi$ X174 genome. Applying the \Rfunction{slice} function to the coverage shows the entire substring is covered by aligned short reads.

<<>>=
genBankCoverage <- coverage(genBankAlign, weight = wtPhiX174)
plot((2793-34):(2811+34), as.integer(genBankCoverage),
     xlab = "Position", ylab = "Coverage", type = "l")
nchar(genBankSubstring)
slice(genBankCoverage, lower = 1)
@

\subsection{Exercise 7}
\begin{enumerate}
\item Rerun the global-local alignment of the short reads against the entire genome. (This may take a few minutes.)
\item Plot the coverage of these alignments and use the \Rfunction{slice} function to find the ranges of alignment. Are there any alignments outside of the substring region that was used above?
\item Use the \Rfunction{reverseComplement} function on the bacteriophage $\phi$ X174 genome. Do any short reads have a higher alignment score on this new sequence than on the original sequence?
\end{enumerate}

[Answers provided in section \ref{sec:Answers7}.]

\section{Computation Profiling}

The \Rfunction{pairwiseAlignment} function uses a dynamic programming algorithm based on the Needleman-Wunsch and Smith-Waterman algorithms for global and local pairwise sequence alignments respectively. The algorithm consumes memory and computation time proportional to the product of the lengths of the two strings being aligned.

<<>>=
N <- as.integer(seq(500, 5000, by = 500))
timings <- rep(0, length(N))
names(timings) <- as.character(N)
for (i in seq_len(length(N))) {
  string1 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE), collapse = ""))
  string2 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE), collapse = ""))
  timings[i] <- system.time(pairwiseAlignment(string1, string2, type = "global"))[["user.self"]]
}
timings
coef(summary(lm(timings ~ poly(N, 2))))
plot(N, timings, xlab = "String Size, Both Strings", ylab = "Timing (sec.)",
     type = "l", main = "Global Pairwise Sequence Alignment Timings")
@

When a problem only requires the pairwise sequence alignment score, setting the \Rfunarg{scoreOnly} argument to \Robject{TRUE} will more than halve the computation time.

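Since \Rfunarg{scoreOnly} only skips the construction of the alignment itself, the score it returns should be the same as the one extracted from a full alignment. The following is a minimal sketch of that check, assuming the \Rpackage{pwalign} package is installed; the helper \Rfunction{randomDNA}, the seed, and the string length of 200 are illustrative choices.

<<>>=
library(Biostrings)
library(pwalign)
set.seed(1)
## Hypothetical helper: a random DNA string of length n
randomDNA <- function(n)
  DNAString(paste(sample(DNA_ALPHABET[1:4], n, replace = TRUE), collapse = ""))
stringA <- randomDNA(200)
stringB <- randomDNA(200)
## Full alignment, then extract its score
fullAlignment <- pairwiseAlignment(stringA, stringB, type = "global")
## Score only, without building the alignment object
scoreAlone <- pairwiseAlignment(stringA, stringB, type = "global",
                                scoreOnly = TRUE)
all.equal(score(fullAlignment), scoreAlone)
@

When only a ranking or a threshold on scores is needed, as in the adapter removal baselines above, the cheaper call is therefore sufficient.
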
<<>>=
scoreOnlyTimings <- rep(0, length(N))
names(scoreOnlyTimings) <- as.character(N)
for (i in seq_len(length(N))) {
  string1 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE), collapse = ""))
  string2 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE), collapse = ""))
  scoreOnlyTimings[i] <- system.time(pairwiseAlignment(string1, string2, type = "global", scoreOnly = TRUE))[["user.self"]]
}
scoreOnlyTimings
round((timings - scoreOnlyTimings) / timings, 2)
@

\subsection{Exercise 8}
\begin{enumerate}
\item Rerun the first set of profiling code, but this time fix the number of characters in \Robject{string1} to 35 and have the number of characters in \Robject{string2} range from 5000 to 50000 in increments of 5000. What is the computational order of this simulation exercise?
\item Rerun the second set of profiling code using the simulations from the previous exercise with the \Rfunarg{scoreOnly} argument set to \Robject{TRUE}. Is it still twice as fast?
\end{enumerate}

[Answers provided in section \ref{sec:Answers8}.]

\section{Computing alignment consensus matrices}

The \Rfunction{consensusMatrix} function is provided for computing a consensus matrix for a set of equal-length strings assumed to be aligned.
To illustrate, the following application assumes the ORF data to be aligned for the first 10 positions (patently false):

<<>>=
file <- system.file("extdata", "someORF.fa", package="Biostrings")
orf <- readDNAStringSet(file)
orf
orf10 <- DNAStringSet(orf, end=10)
consensusMatrix(orf10, as.prob=TRUE, baseOnly=TRUE)
@

The information content as defined by Hertz and Stormo 1995 is computed as follows:

<<>>=
informationContent <- function(Lmers) {
  ## log() with the convention 0 * log(0) = 0
  zlog <- function(x) ifelse(x==0,0,log(x))
  co <- consensusMatrix(Lmers, as.prob=TRUE)
  lets <- rownames(co)
  ## Background letter frequencies over all positions
  fr <- alphabetFrequency(Lmers, collapse=TRUE)[lets]
  fr <- fr / sum(fr)
  sum(co*zlog(co/fr), na.rm=TRUE)
}
informationContent(orf10)
@

\section{Exercise Answers}

\subsection{Exercise 1}
\label{sec:Answers1}
\begin{enumerate}
\item Using \Rfunction{pairwiseAlignment}, fit the global, local, and overlap pairwise sequence alignment of the strings \Robject{"syzygy"} and \Robject{"zyzzyx"} using the default settings.

<<>>=
pairwiseAlignment("zyzzyx", "syzygy")
pairwiseAlignment("zyzzyx", "syzygy", type = "local")
pairwiseAlignment("zyzzyx", "syzygy", type = "overlap")
@

\item Do any of the alignments change if the \Rfunarg{gapExtension} argument is set to \Robject{Inf}?
\textit{Yes, the overlap pairwise sequence alignment changes.}

<<>>=
pairwiseAlignment("zyzzyx", "syzygy", type = "overlap", gapExtension = Inf)
@

\end{enumerate}

\subsection{Exercise 2}
\label{sec:Answers2}
\begin{enumerate}
\item What is the primary benefit of formal summary classes like \Rclass{PairwiseAlignmentsSingleSubjectSummary} and \Rclass{summary.lm} to end users?
\textit{These classes allow the end user to extract the summary output for further operations.}

<<>>=
ex2 <- summary(pairwiseAlignment("zyzzyx", "syzygy"))
nmatch(ex2) / nmismatch(ex2)
@

\end{enumerate}

\subsection{Exercise 3}
\label{sec:Answers3}
For the overlap pairwise sequence alignment of the strings \Robject{"syzygy"} and \Robject{"zyzzyx"} with the \Rfunction{pairwiseAlignment} default settings, perform the following operations:

<<>>=
ex3 <- pairwiseAlignment("zyzzyx", "syzygy", type = "overlap")
@

\begin{enumerate}
\item Use \Rfunction{nmatch} and \Rfunction{nmismatch} to extract the number of matches and mismatches respectively.

<<>>=
nmatch(ex3)
nmismatch(ex3)
@

\item Use the \Rfunction{compareStrings} function to get the symbolic representation of the alignment.

<<>>=
compareStrings(ex3)
@

\item Use the \Rfunction{as.character} function to get the character string versions of the alignments.

<<>>=
as.character(ex3)
@

\item Use the \Rfunction{pattern} function to extract the aligned pattern and apply the \Rfunction{mismatch} function to it to find the locations of the mismatches.

<<>>=
mismatch(pattern(ex3))
@

\item Use the \Rfunction{subject} function to extract the aligned subject and apply the \Rfunction{aligned} function to it to get the aligned strings.

<<>>=
aligned(subject(ex3))
@

\end{enumerate}

\subsection{Exercise 4}
\label{sec:Answers4}
\begin{enumerate}
\item Use the \Rfunction{pairwiseAlignment} function to find the Levenshtein edit distance between \Robject{"syzygy"} and \Robject{"zyzzyx"}.

<<>>=
submat <- matrix(-1, nrow = 26, ncol = 26, dimnames = list(letters, letters))
diag(submat) <- 0
- pairwiseAlignment("zyzzyx", "syzygy", substitutionMatrix = submat,
                    gapOpening = 0, gapExtension = 1, scoreOnly = TRUE)
@

\item Use the \Rfunction{stringDist} function to find the Levenshtein edit distance for the vector \Robject{c("zyzzyx", "syzygy", "succeed", "precede", "supersede")}.

<<>>=
stringDist(c("zyzzyx", "syzygy", "succeed", "precede", "supersede"))
@

\end{enumerate}

\subsection{Exercise 5}
\label{sec:Answers5}
\begin{enumerate}
\item Repeat the alignment exercise above using \Robject{BLOSUM62}, a gap opening penalty of 12, and a gap extension penalty of 4.

<<>>=
data(BLOSUM62)
pairwiseAlignment(AAString("PAWHEAE"), AAString("HEAGAWGHEE"),
                  substitutionMatrix = BLOSUM62,
                  gapOpening = 12, gapExtension = 4)
@

\item Explore to find out what caused the alignment to change.
\textit{The shift in gap penalties favored infrequent long gaps over frequent short ones.}
\end{enumerate}

\subsection{Exercise 6}
\label{sec:Answers6}
\begin{enumerate}
\item Rerun the simulation using the \Rfunction{simulateReads} function with a \Rfunarg{substitutionRate} of 0.005 and \Rfunarg{gapRate} of 0.0005. How do the different pairwise sequence alignment methods compare?
\textit{The different methods are much more comparable when the error rates are lower.}

<<>>=
adapter <- DNAString("GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA")
set.seed(123)
N <- 1000
experiment <-
  list(side = rbinom(N, 1, 0.5), width = sample(0:36, N, replace = TRUE))
table(experiment[["side"]], experiment[["width"]])
ex6Strings <-
  simulateReads(N, adapter, experiment, substitutionRate = 0.005, gapRate = 0.0005)
ex6Strings <- DNAStringSet(ex6Strings)
ex6Strings

## Method 1: Use edit distance with an FDR of 1e-03
submat1 <- nucleotideSubstitutionMatrix(match = 0, mismatch = -1, baseOnly = TRUE)
quantile(randomScores1, seq(0.99, 1, by = 0.001))
ex6Aligns1 <-
  pairwiseAlignment(ex6Strings, adapter, substitutionMatrix = submat1,
                    gapOpening = 0, gapExtension = 1)
table(score(ex6Aligns1) > quantile(randomScores1, 0.999), experiment[["width"]])

## Method 2: Use consecutive matches anywhere in string with an FDR of 1e-03
submat2 <- nucleotideSubstitutionMatrix(match = 1, mismatch = -Inf, baseOnly = TRUE)
quantile(randomScores2, seq(0.99, 1, by = 0.001))
ex6Aligns2 <-
  pairwiseAlignment(ex6Strings, adapter, substitutionMatrix = submat2,
                    type = "local", gapOpening = 0, gapExtension = Inf)
table(score(ex6Aligns2) > quantile(randomScores2, 0.999), experiment[["width"]])
# Determine if the correct end was chosen
table(start(pattern(ex6Aligns2)) > 37 - end(pattern(ex6Aligns2)),
      experiment[["side"]])

## Method 3: Use consecutive matches on the ends with an FDR of 1e-03
submat3 <- nucleotideSubstitutionMatrix(match = 1, mismatch = -Inf, baseOnly = TRUE)
ex6Aligns3 <-
  pairwiseAlignment(ex6Strings, adapter, substitutionMatrix = submat3,
                    type = "overlap", gapOpening = 0, gapExtension = Inf)
table(score(ex6Aligns3) > quantile(randomScores3, 0.999), experiment[["width"]])
# Determine if the correct end was chosen
table(end(pattern(ex6Aligns3)) == 36, experiment[["side"]])

## Method 4: Allow mismatches and indels on the ends with an FDR of 1e-03
quantile(randomScores4, seq(0.99, 1, by = 0.001))
ex6Aligns4 <- pairwiseAlignment(ex6Strings, adapter, type = "overlap")
table(score(ex6Aligns4) > quantile(randomScores4, 0.999), experiment[["width"]])
# Determine if the correct end was chosen
table(end(pattern(ex6Aligns4)) == 36, experiment[["side"]])
@

\item (Advanced) Modify the \Rfunction{simulateReads} function to accept different equal-length adapters on either side (left \& right) of the reads. How would the methods for trimming the reads change?

<<>>=
simulateReads <-
function(N, left, right = left, experiment, substitutionRate = 0.01, gapRate = 0.001) {
  leftChars <- strsplit(as.character(left), "")[[1]]
  rightChars <- strsplit(as.character(right), "")[[1]]
  if (length(leftChars) != length(rightChars))
    stop("left and right adapters must have the same number of characters")
  nChars <- length(leftChars)
  sapply(seq_len(N), function(i) {
    width <- experiment[["width"]][i]
    side <- experiment[["side"]][i]
    randomLetters <-
      function(n) sample(DNA_ALPHABET[1:4], n, replace = TRUE)
    randomLettersWithEmpty <-
      function(n) sample(c("", DNA_ALPHABET[1:4]), n, replace = TRUE,
                         prob = c(1 - gapRate, rep(gapRate/4, 4)))
    if (side) {
      ## Right adapter fragment at the end of the read
      value <-
        paste(ifelse(rbinom(nChars,1,substitutionRate),
                     randomLetters(nChars), rightChars),
              randomLettersWithEmpty(nChars),
              sep = "", collapse = "")
      value <-
        paste(c(randomLetters(36 - width), substring(value, 1, width)),
              sep = "", collapse = "")
    } else {
      ## Left adapter fragment at the start of the read
      value <-
        paste(ifelse(rbinom(nChars,1,substitutionRate),
                     randomLetters(nChars), leftChars),
              randomLettersWithEmpty(nChars),
              sep = "", collapse = "")
      value <-
        paste(c(substring(value, 37 - width, 36), randomLetters(36 - width)),
              sep = "", collapse = "")
    }
    value
  })
}

leftAdapter <- adapter
rightAdapter <- reverseComplement(adapter)
ex6LeftRightStrings <- simulateReads(N, leftAdapter, rightAdapter, experiment)
ex6LeftAligns4 <-
  pairwiseAlignment(ex6LeftRightStrings, leftAdapter, type = "overlap")
ex6RightAligns4 <-
  pairwiseAlignment(ex6LeftRightStrings, rightAdapter, type = "overlap")
scoreCutoff <- quantile(randomScores4, 0.999)
leftAligned <-
  start(pattern(ex6LeftAligns4)) == 1 &
  score(ex6LeftAligns4) > pmax(scoreCutoff, score(ex6RightAligns4))
rightAligned <-
  end(pattern(ex6RightAligns4)) == 36 &
  score(ex6RightAligns4) > pmax(scoreCutoff, score(ex6LeftAligns4))
table(leftAligned, rightAligned)
table(leftAligned | rightAligned, experiment[["width"]])
@

\end{enumerate}

\subsection{Exercise 7}
\label{sec:Answers7}
\begin{enumerate}
\item Rerun the global-local alignment of the short reads against the entire genome. (This may take a few minutes.)

<<>>=
genBankFullAlign <-
  pairwiseAlignment(srPhiX174, genBankPhage,
                    patternQuality = SolexaQuality(quPhiX174),
                    subjectQuality = SolexaQuality(99L),
                    type = "global-local")
summary(genBankFullAlign, weight = wtPhiX174)
@

\item Plot the coverage of these alignments and use the \Rfunction{slice} function to find the ranges of alignment. Are there any alignments outside of the substring region that was used above?
\textit{Yes, there are some alignments outside of the specified substring region.}

<<>>=
genBankFullCoverage <- coverage(genBankFullAlign, weight = wtPhiX174)
plot(as.integer(genBankFullCoverage), xlab = "Position", ylab = "Coverage", type = "l")
slice(genBankFullCoverage, lower = 1)
@

\item Use the \Rfunction{reverseComplement} function on the bacteriophage $\phi$ X174 genome. Do any short reads have a higher alignment score on this new sequence than on the original sequence?
\textit{Yes, there are some strings with a higher score on the new sequence.}

<<>>=
genBankFullAlignRevComp <-
  pairwiseAlignment(srPhiX174, reverseComplement(genBankPhage),
                    patternQuality = SolexaQuality(quPhiX174),
                    subjectQuality = SolexaQuality(99L),
                    type = "global-local")
table(score(genBankFullAlignRevComp) > score(genBankFullAlign))
@

\end{enumerate}

\subsection{Exercise 8}
\label{sec:Answers8}
\begin{enumerate}
\item Rerun the first set of profiling code, but this time fix the number of characters in \Robject{string1} to 35 and have the number of characters in \Robject{string2} range from 5000 to 50000 in increments of 5000. What is the computational order of this simulation exercise?
\textit{As expected, the growth in time is now linear.}

<<>>=
N <- as.integer(seq(5000, 50000, by = 5000))
newTimings <- rep(0, length(N))
names(newTimings) <- as.character(N)
for (i in seq_len(length(N))) {
  string1 <- DNAString(paste(sample(DNA_ALPHABET[1:4], 35, replace = TRUE), collapse = ""))
  string2 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE), collapse = ""))
  newTimings[i] <- system.time(pairwiseAlignment(string1, string2, type = "global"))[["user.self"]]
}
newTimings
coef(summary(lm(newTimings ~ poly(N, 2))))
plot(N, newTimings, xlab = "Larger String Size", ylab = "Timing (sec.)",
     type = "l", main = "Global Pairwise Sequence Alignment Timings")
@

\item Rerun the second set of profiling code using the simulations from the previous exercise with the \Rfunarg{scoreOnly} argument set to \Robject{TRUE}. Is it still twice as fast?
\textit{Yes, it is still over twice as fast.}

<<>>=
newScoreOnlyTimings <- rep(0, length(N))
names(newScoreOnlyTimings) <- as.character(N)
for (i in seq_len(length(N))) {
  string1 <- DNAString(paste(sample(DNA_ALPHABET[1:4], 35, replace = TRUE), collapse = ""))
  string2 <- DNAString(paste(sample(DNA_ALPHABET[1:4], N[i], replace = TRUE), collapse = ""))
  newScoreOnlyTimings[i] <- system.time(pairwiseAlignment(string1, string2, type = "global", scoreOnly = TRUE))[["user.self"]]
}
newScoreOnlyTimings
round((newTimings - newScoreOnlyTimings) / newTimings, 2)
@

\end{enumerate}

\section{Session Information}

All of the output in this vignette was produced under the following conditions:

<<>>=
sessionInfo()
@

\begin{thebibliography}{}

\bibitem{Durbin:1998}
{Durbin, R.}, {Eddy, S.}, {Krogh, A.}, and {Mitchison, G.}
\newblock {\em Biological Sequence Analysis}.
\newblock Cambridge UP 1998, sec 2.3.

\bibitem{Haubold:2006}
{Haubold, B.} and {Wiehe, T.}
\newblock {\em Introduction to Computational Biology}.
\newblock Birkhauser Verlag 2006, Chapter 2.

\bibitem{Malde:2008}
{Malde, K.}
\newblock The effect of sequence quality on sequence alignment.
\newblock {\em Bioinformatics}, 24(7):897-900, 2008.

\bibitem{NeedWun:1970}
{Needleman, S.} and {Wunsch, C.}
\newblock A general method applicable to the search for similarities in the amino acid sequence of two proteins.
\newblock {\em Journal of Molecular Biology}, 48, 443-453, 1970.

\bibitem{Smith:2003}
{Smith, H.}, {Hutchison, C.}, {Pfannkoch, C.}, and {Venter, C.}
\newblock Generating a synthetic genome by whole genome assembly: $\phi$X174 bacteriophage from synthetic oligonucleotides.
\newblock {\em Proceedings of the National Academy of Sciences}, 100(26): 15440-15445, 2003.

\bibitem{SmithWater:1981}
{Smith, T.F.} and {Waterman, M.S.}
\newblock Identification of common molecular subsequences.
\newblock {\em Journal of Molecular Biology}, 147, 195-197, 1981.

\end{thebibliography}

\end{document}